You have 126 million rows in that table. It's going to take more than a second on a single dc1.large node.

Here are some ways you could improve performance:

More nodes

Spreading data across more nodes allows more parallelization. Each node adds additional processing and storage. Even if your data volume only justifies one node, if you want more performance, add more nodes.

SORTKEY

For the right type of query, the SORTKEY can be the best way to improve query speed. Sorting data on disk allows Redshift to skip over blocks that it knows do not contain relevant data.

For example, your query has WHERE brandID = 3927, so having brandID as the SORTKEY would make this extremely efficient because very few disk blocks would contain data for one brand.

Interleaved sorting is rarely the best method to use: it is less efficient than a single or compound sort key and takes a long time to VACUUM. If the query you have shown is typical of the queries you run, use a compound sort key of (brandId, ti) or (ti, brandId). It will be much more efficient.
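As a sketch of how that change could be applied (the table name "clicks" and the column types are assumptions, not taken from your question), the usual pattern on Redshift is a deep copy into a new table defined with the compound sort key:

```sql
-- Sketch only: "clicks" and its columns are assumed; adjust to your schema.
-- Changing the sort key generally requires a deep copy into a new table.
CREATE TABLE clicks_new (
    brandId INTEGER,
    ti      TIMESTAMP
    -- ...remaining columns...
)
COMPOUND SORTKEY (brandId, ti);

INSERT INTO clicks_new SELECT brandId, ti FROM clicks;

ALTER TABLE clicks     RENAME TO clicks_old;
ALTER TABLE clicks_new RENAME TO clicks;
```

Because the INSERT ... SELECT writes the rows in one pass, the new table comes out fully sorted without needing an immediate VACUUM.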

A SORTKEY is typically a date column, since dates often appear in a WHERE clause and the table stays sorted automatically if data is always appended in time order.

Your interleaved sort is likely causing Redshift to read many more disk blocks to find your data, thereby significantly increasing query time.

DISTKEY

The DISTKEY should typically be set to the field most used in JOIN statements on the table, because rows with the same DISTKEY value are stored on the same slice. This won't have as large an impact on a single-node cluster, but it is still worth getting right.

Again, you have only shown one type of query, so it is hard to recommend a DISTKEY. Based on this query alone, I would recommend DISTSTYLE EVEN so that all slices participate in the query. (It is also the default distribution style if none is specified.) Alternatively, set the DISTKEY to a field not shown, but certainly don't use brandId as the DISTKEY; otherwise only one slice will participate in the query shown.
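A minimal sketch of the DDL (again, the table name and columns are assumptions) combining an even distribution style with the compound sort key:

```sql
-- Sketch: DISTSTYLE EVEN spreads rows round-robin across slices,
-- so every slice participates in a scan filtered on brandId.
CREATE TABLE clicks_new (
    brandId INTEGER,
    ti      TIMESTAMP
    -- ...remaining columns...
)
DISTSTYLE EVEN
COMPOUND SORTKEY (brandId, ti);
```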

VACUUM

VACUUM your tables regularly so that the data is stored in SORTKEY order and deleted data is removed from storage.
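VACUUM can be run per table; for example (table name assumed):

```sql
-- FULL (the default mode) re-sorts rows into SORTKEY order
-- and reclaims space from deleted rows.
VACUUM FULL clicks;

-- Refresh the planner's statistics afterwards.
ANALYZE clicks;
```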

Experiment!

Optimal settings depend upon your data and the queries you typically run. Perform some tests to compare SORTKEY and DISTKEY values and choose the settings that perform best. Then test again in 3 months to see if your queries or data have changed enough to make other settings more efficient.
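One way to check a table's current state before and after experimenting is the SVV_TABLE_INFO system view, which reports each table's distribution style, first sort key column, row skew, and percentage of unsorted rows (the table name in the filter is an assumption):

```sql
-- System view available on all Redshift clusters.
SELECT "table", diststyle, sortkey1, skew_rows, unsorted
FROM   svv_table_info
WHERE  "table" = 'clicks'   -- table name assumed
ORDER  BY unsorted DESC;
```

A high `unsorted` percentage suggests a VACUUM is due; a high `skew_rows` value suggests the DISTKEY is concentrating data on a few slices.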

Answer from John Rotenstein on Stack Overflow


Sometimes the issue can be due to locks acquired by other processes. You can refer to: https://aws.amazon.com/premiumsupport/knowledge-center/prevent-locks-blocking-queries-redshift/
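To check for blocking locks, Redshift exposes the STV_LOCKS system table; a quick sketch:

```sql
-- Shows current table locks; long-held exclusive locks from other
-- sessions can make an otherwise fast query appear slow.
SELECT table_id, last_update, lock_owner, lock_status
FROM   stv_locks
ORDER  BY last_update ASC;
```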
