redshift leader node high cpu

1 of 2

6

The Redshift leader node is the same size and class of compute as the compute nodes. Typically this means the leader is overprovisioned for the role it plays, but since that role is so important, and the impact so large when things slow down, it is good that it is overprovisioned. The leader needs to compile and optimize queries and perform the final steps of queries (a final sort, for example). It communicates with the session clients and handles all their requests. If the leader becomes overloaded, all these activities slow down, creating significant performance issues. It is not good that your leader is hitting 100% CPU often enough for you to notice. I bet the cluster seems sluggish when this happens.

There are a number of ways I've seen "leader abuse" and it usually becomes a problem when bad patterns are copied between users. In no particular order:

Large data literals in queries (INSERT ... VALUES ...). This puts your data through the query compiler on the leader node. This is not what it is designed to do and is very expensive for the leader. Use the COPY command to bring data into the cluster. (Just bad, don't do this.)
Overuse of COMMIT. A commit causes an update to the coherent state of the database, has to run through the "commit queue", and creates work for the leader and the compute nodes. Issuing a COMMIT every other statement can cause this queue, and work in general, to back up.
Too many slots defined in the WLM. Redshift can typically only run between 1 and 2 dozen queries efficiently at once. Setting the total slot count very high (like 50) can lead to very inefficient operation and high CPU loads. Depending on workload, this can show up on the compute nodes or occasionally on the leader node.
Large data output through SELECT statements. SELECTs return data, but when this data is many GBs in size the management of this data movement (and sorting) is done by the leader node. If large amounts of data need to be extracted from Redshift, it should be done with an UNLOAD statement.
Overuse of large cursors. Cursors can be an important tool and are needed by many BI tools, but cursors are located on the leader, and overuse can reduce the attention the leader gives to other tasks.
Many / large UNLOADs with parallel off. UNLOADs generally go from the compute nodes straight to S3, but with "parallel off" all the data is routed to the leader node, where it is combined (sorted) and sent to S3.
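
As a rough illustration of how a few of the patterns above could be spotted in a batch of SQL before it is sent to the cluster, here is a minimal Python sketch. The function name, the thresholds, and the regex heuristics are all assumptions made for this example (they are not part of Redshift or any library), and the row count is only approximate because it does not parse string literals.

```python
import re


def find_leader_heavy_patterns(statements, max_literal_rows=100, max_commit_ratio=0.25):
    """Return (index, warning) pairs for statements that look leader-heavy.

    Hypothetical helper: the thresholds are illustrative defaults, not
    Redshift limits. Index is None for warnings about the whole batch.
    """
    warnings = []
    commits = 0
    for i, sql in enumerate(statements):
        # Normalise whitespace and case so the simple matching below works.
        text = " ".join(sql.split()).upper()
        if text.rstrip("; ") == "COMMIT":
            commits += 1
        elif text.startswith("INSERT") and re.search(r"\bVALUES\b", text):
            # Approximate row count: one row plus one per "), (" separator.
            values_part = re.split(r"\bVALUES\b", text, maxsplit=1)[1]
            rows = len(re.findall(r"\)\s*,\s*\(", values_part)) + 1
            if rows > max_literal_rows:
                warnings.append((i, f"{rows} literal rows in INSERT ... VALUES; consider COPY"))
        elif text.startswith("UNLOAD") and re.search(r"\bPARALLEL\s+(OFF|FALSE)\b", text):
            warnings.append((i, "UNLOAD with PARALLEL OFF routes data through the leader"))
    # Flag the batch as a whole if too large a share of it is COMMITs.
    if statements and commits / len(statements) > max_commit_ratio:
        warnings.append((None, f"{commits} of {len(statements)} statements are COMMITs"))
    return warnings


if __name__ == "__main__":
    batch = [
        "INSERT INTO t VALUES (1),(2),(3)",
        "COMMIT;",
        "UNLOAD ('select * from t') TO 's3://bucket/prefix' IAM_ROLE 'role-arn' PARALLEL OFF",
    ]
    for index, message in find_leader_heavy_patterns(batch, max_literal_rows=2):
        print(index, message)
```

A check like this only catches the obvious cases; the cluster's own query history is the better source of truth for what is actually loading the leader.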

While none of the above are problems in and of themselves, it is when they are overused, used in ways they were not intended, or all used at once that the leader starts to be impacted. It also comes down to what you intend to do with your cluster: if it supports BI tools then you may have a lot of cursors, but this load on the leader is part of the cluster's intent. Issues often arise when the cluster's intent is to be all things to everybody.

If your workload for Redshift is heavy on leader functions and you are using the leader node efficiently (no large literals, using COPY and UNLOAD, etc.) then a high leader workload is what you want. You're getting the most out of the critical resource. However, most people use Redshift to perform analytics on large data, which is the function of the compute nodes. A highly loaded leader can detract significantly from this mission and needs to be addressed.

Another way the leader can get stressed is when clusters are configured with many smaller nodes instead of fewer bigger nodes. Since the leader is the same size as the compute nodes, many smaller nodes means you have a small leader doing the work. This is something to consider, but I'd make sure you don't have unneeded leader node stressors before investing in a resize.

2 of 2

Whenever you execute commands that require work on the leader node, whether for dispatching data, computing statistics, or aggregating results from the workers (COPY, UNLOAD, VACUUM, ANALYZE, and so on), you'll see an increase in CPU usage. More information about this here: https://docs.aws.amazon.com/redshift/latest/dg/c_high_level_system_architecture.html

repost.aws › knowledge-center › redshift-high-cpu

Troubleshoot high CPU usage on Amazon Redshift's leader node | AWS re:Post

October 11, 2022 - Amazon Redshift generates and compiles code for each query execution plan. Query compilation and recompilation are resource-intensive operations, and this can result in high CPU usage of the leader node.

High CPU usage of one Redshift node (not leader). How understand what is causing this imbalance?

Hi there, I have a problem that I can't solve yet. One node has high CPU load almost all the time, but I can't find any significant skew in data storage. What could it be? Is it possible to track ...

repost.aws › knowledge-center › redshift-high-cpu-usage

Troubleshoot high CPU usage in Amazon Redshift | AWS re:Post

April 27, 2022 - Review your Redshift cluster workload. Maintain your data hygiene. Update your table design. Check for maintenance updates. Check for spikes in your leader node CPU usage. Use Amazon CloudWatch to monitor spikes in CPU utilization. ... An increased workload (due to more queries running). The increase in workload increases the number of database connections, causing higher query concurrency.

AWS

docs.aws.amazon.com › amazon redshift › management guide › amazon redshift provisioned clusters › monitoring amazon redshift cluster performance › viewing performance data › viewing cluster performance data

Viewing cluster performance data - Amazon Redshift

The following examples show some of the graphs that are displayed in the new Amazon Redshift console. CPU utilization – Shows the percentage of CPU utilization for all nodes (leader and compute).

repost.aws › questions › QU6TudAtMOSlasnDrHeuu-mA › high-cpu-utilization-on-redshift-leader-node-despite-no-active-queries

High CPU Utilization on Redshift Leader Node Despite No Active Queries. | AWS re:Post

April 21, 2024 - I'm experiencing an issue with my Amazon Redshift cluster where the leader node is consistently showing 99-100% CPU utilization, while the compute nodes remain below 30%. This issue has persisted for over 8 hours and began around the time of an automatic cluster restart during a maintenance window at 3:30 AM. Despite pausing all external data ingestion and manually restarting the cluster, the high CPU usage continues with no active user queries running.

Medium

medium.com › @israel.jerome › overcoming-aws-redshift-leader-node-bottleneck-strategies-for-enhanced-write-performance-b7c2304cdcc0

Overcoming AWS Redshift Leader Node Bottleneck: Strategies for Enhanced Write Performance | by Jerome Israel | Medium

August 10, 2023 - Common symptoms of a leader node bottleneck include slower query execution times and increased query commit queues. ... Redshift uses the Single Commit Queue Architecture to manage writes, handle query coordination and optimization in a distributed ...

reddit.com › r/aws › redshift problems with sigma?

r/aws on Reddit: Redshift problems with sigma?

March 13, 2024 -

I have inherited a redshift DW that is used by another team via sigma for data stuff. I noticed today that the leader node has been at 100% cpu for at least a month. sure enough, sigma is running crazy queries all day that take several minutes to execute. the 4 compute nodes hover at around 5%. These are all dc2.large. I'm a software engineer and not a database guy, so this stuff isn't my strong suit. But from what I see in the documentation, queries will only be executed on the compute nodes if the nodes contain data relevant to the query (?). So other than the usual suspects (indices, bad queries, etc.), could this have something to do with whatever strategy is being used to replicate data to the compute nodes? Can we control that with redshift? Any insights greatly appreciated.

My guess would be it’s an issue with either your WLM configuration, or the queries are somehow bogging down the leader node as they aren’t efficient for Redshift. A leader at 100% for a month means someone is doing something wrong. Without providing queries or metrics it’d be tough to get actual help other than examples, but with some googling you’ll find a ton of answers like in this thread: https://stackoverflow.com/a/70217381

1 of 2

3

2 of 2

I don't know anything about sigma. But I wonder if it's trying to extract large volumes of data via a select query. Are you able to see the queries it's running and get an idea of the number of rows returned? When you run a select, any data that is returned to the client has to go via the leader node. The only efficient way to get bulk data out of Redshift is by using UNLOAD. Doing this will parallelize the operation across the compute nodes.

"queries will only be executed on the compute nodes if the nodes contain data relevant to the query (?)"

Pretty much every query is executed on compute nodes. However, poor data distribution can mean that one compute node is doing all the work. But that's not what you're seeing here.
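
To make the UNLOAD suggestion concrete, here is a small Python sketch that wraps an existing SELECT in an UNLOAD statement. The helper name, the bucket prefix, and the role ARN are placeholders invented for this example; the statement leaves PARALLEL at its default so the compute nodes write to S3 directly.

```python
def build_unload(select_sql, s3_prefix, iam_role_arn, file_format="PARQUET"):
    """Wrap a SELECT in an UNLOAD statement (hypothetical helper).

    PARALLEL is deliberately not specified, so the default (ON) applies
    and the data does not have to be funnelled through the leader node.
    """
    # UNLOAD takes the query as a quoted string, so any single quotes
    # inside the query have to be doubled.
    escaped = select_sql.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {file_format};"
    )


if __name__ == "__main__":
    # Placeholder values; substitute your own query, bucket, and role.
    print(build_unload(
        "select * from sales where region = 'EU'",
        "s3://example-bucket/exports/sales_",
        "arn:aws:iam::123456789012:role/ExampleUnloadRole",
    ))
```

The generated statement can then be run through whatever SQL client is already in use; the client only receives a status back rather than the rows themselves.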

1 Billion Technology

1billiontech.com › blog_AWS_Redshift_optimization.php

AWS Redshift Optimization – A Case Study

When a request comes to the leader node, it parses the query and generates an execution plan and a compiled code to be executed in the compute nodes. The compute nodes process the incoming requests in parallel. Each compute node has a dedicated CPU, memory and a storage. Each compute node can scale out/in and scale up/down (resizing the Redshift cluster).

YouTube

youtube.com › watch

Understanding the Main Causes for High CPU Usage on Leader Nodes in Amazon Redshift - YouTube

Discover the key factors leading to high CPU usage on Amazon Redshift leader nodes and learn practical solutions to optimize your cluster performance.

Published March 31, 2025

Medium

medium.com › @KuldeepsinhVaghela › amazon-redshift-architecture-explained-leader-node-compute-nodes-and-performance-tuning-197ec98c6e7a

Amazon Redshift Architecture Explained: Leader Node, Compute Nodes, and Performance Tuning | by Kuldeepsinh Vaghela | Medium

April 24, 2025 - Amazon Redshift offers different node types, optimized for different workloads: Dense Compute (DC) Nodes: These nodes are designed for compute-intensive workloads with smaller data volumes, utilizing fast CPUs and SSDs for high performance.

repost.aws › questions › QUyalUXnVeQVGZ15sGD0gmmQ › high-cpu-usage-of-one-redshift-node-not-leader-how-understand-what-is-causing-this-imbalance

High CPU usage of one Redshift node (not leader). How understand what is causing this imbalance? | AWS re:Post

October 11, 2024 - Examine data distribution: Although you mentioned not finding significant skew in data storage, it's worth double-checking the data distribution across nodes. Run a query to identify tables with data skew or unsorted rows in your Redshift cluster. This can help pinpoint if certain tables are causing uneven workload distribution. Investigate longest-running queries: Use a diagnostic query to identify the longest-running queries in your cluster. This can help you pinpoint specific queries that might be causing the high CPU usage on the affected node.

Hacker News

news.ycombinator.com › item

Running Redshift at Scale | Hacker News

November 18, 2023 - On using ra3, I agree generally, but would recommend Redshift Serverless in Dev/QA for cost reasons unless you have steady workloads there · On CPU, even 1 query will cause CPU to hit 100% so I don’t consider it that helpful a metric on a well used cluster.

Amazon Web Services

docs.amazonaws.cn › 亚马逊云科技 › amazon redshift › management guide › amazon redshift provisioned clusters › monitoring amazon redshift cluster performance › performance data in amazon redshift

Performance data in Amazon Redshift - Amazon Redshift

Amazon Redshift has the following two dimensions: Metrics that have a NodeID dimension are metrics that provide performance data for nodes of a cluster. This set of metrics includes leader and compute nodes. Examples of these metrics include CPUUtilization, ReadIOPS, WriteIOPS.
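
As a sketch of how the NodeID dimension could be used to pull only the leader's CPUUtilization, the Python below builds the arguments for CloudWatch's `get_metric_statistics` call. It assumes the leader is reported under the NodeID value "Leader" (my understanding of the Redshift metrics, worth confirming in your own CloudWatch console), and the function names and cluster identifier are made up for the example. `boto3` is a third-party package and is imported only inside the function that actually calls AWS.

```python
from datetime import datetime, timedelta, timezone


def leader_cpu_request(cluster_id, hours=3, period_seconds=300):
    """Build get_metric_statistics arguments for leader-node CPU (hypothetical helper)."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Redshift",
        "MetricName": "CPUUtilization",
        "Dimensions": [
            {"Name": "ClusterIdentifier", "Value": cluster_id},
            # Assumption: the leader node is exposed as NodeID "Leader".
            {"Name": "NodeID", "Value": "Leader"},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Average", "Maximum"],
    }


def fetch_leader_cpu(cluster_id):
    """Query CloudWatch; needs boto3 installed and AWS credentials configured."""
    import boto3  # third-party, imported lazily so the builder works without it

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**leader_cpu_request(cluster_id))
    # Datapoints are not guaranteed to arrive in time order.
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

Comparing these datapoints with the same request for the compute nodes is one way to confirm that the leader, rather than the cluster as a whole, is the bottleneck.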

Artie

artie.com › blogs › best-practices-on-running-redshift-at-scale

Best Practices on Running Redshift at Scale

November 15, 2023 - Resize the cluster by adding more nodes or upgrading to a more powerful node type. Set up alerts to notify you when CPU utilization exceeds a threshold so you can take proactive steps. Use workload management (WLM) to prioritize workloads better such that fast running queries are not backlogged ...

AllCloud

allcloud.io › home › blog › 5 areas to consider for running an optimized redshift-based cloud data warehouse

5 Areas to Consider for Running an Optimized Redshift-Based Cloud Data Warehouse | AllCloud

March 24, 2020 - In addition, Redshift supports multi-node clusters so when your requirement grows, you can scale by just adding a node. You also get a leader node compute engine without any extra cost when you run your data warehouse in a multi-node cluster. Storage of each node is used for storing data in a distributed fashion to achieve high degree of parallel processing.

repost.aws › questions › QUH7rPexFJQGmJOf7qGbUnkg › redshift-copy-command-cpu-spikes

Redshift COPY command CPU spikes | AWS re:Post

1 of 1

While proper splitting of files is very important and highly recommended, it shouldn't cause a CPU spike across the cluster. The usual cause of a CPU spike like the one you're describing is loading into a table without any compression settings. The default setting for COPY is that [COMPUPDATE][1] is ON. What happens is that Redshift takes the incoming rows, runs them through every compression setting we have, and returns the appropriate (smallest) compression. To fix the issue, it's best to make sure that compression is applied to the target table of the COPY statement. Run the [Analyze Compression][2] command if necessary to figure out what the compression should be and manually apply it to the DDL. For temporary tables LZO can be an excellent choice because it's faster to encode on these transient tables than, say, ZSTD. Just to be sure, also set COMPUPDATE OFF in the COPY statement. [1]: http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-load.html#copy-compupdate [2]: http://docs.aws.amazon.com/redshift/latest/dg/r_ANALYZE_COMPRESSION.html
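
The advice above could look roughly like the following Python sketch, which builds an ANALYZE COMPRESSION statement and a COPY statement with COMPUPDATE OFF. The helper names, table name, S3 prefix, and role ARN are placeholders assumed for illustration; the encodings suggested by ANALYZE COMPRESSION still have to be applied to the table DDL by hand.

```python
def build_analyze_compression(table):
    """Statement that reports suggested column encodings for a table."""
    return f"ANALYZE COMPRESSION {table};"


def build_copy(table, s3_prefix, iam_role_arn, file_format="CSV"):
    """Build a COPY that skips automatic compression analysis (hypothetical helper).

    Assumes the target table already has explicit column encodings,
    so COMPUPDATE OFF does not leave the data uncompressed.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {file_format}\n"
        "COMPUPDATE OFF;"
    )


if __name__ == "__main__":
    # Placeholder values; substitute your own table, bucket, and role.
    print(build_analyze_compression("staging.events"))
    print(build_copy(
        "staging.events",
        "s3://example-bucket/loads/events/",
        "arn:aws:iam::123456789012:role/ExampleCopyRole",
    ))
```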

repost.aws › questions › QUZzYMbU3lS2OPW8i2IO6wuw › redshift-cluster-what-type-of-node-leader-is

Redshift Cluster - what type of node Leader is? | AWS re:Post

1 of 3