AWS
docs.aws.amazon.com › aws glue › user guide › aws glue programming guide › programming spark scripts › program aws glue etl scripts in pyspark › aws glue pyspark extensions reference › dynamicframewriter class
DynamicFrameWriter class - AWS Glue
txId = glueContext.start_transaction(read_only=False)
glueContext.write_dynamic_frame.from_catalog(
    frame=dyf,
    database=db,
    table_name=tbl,
    transformation_ctx="datasource0",
    additional_options={"transactionId": txId})
...
glueContext.commit_transaction(txId)
Writes a DynamicFrame using the specified JDBC connection information.
frame – The DynamicFrame to write.
catalog_connection – A catalog connection to use.
connection_options – Connection options, such as path and database table (optional).
redshift_tmp_dir – An Amazon Redshift temporary directory to use (optional).
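Putting those from_jdbc_conf parameters together, a minimal call might look like the sketch below. The catalog database, source table, Glue connection name, and target dbtable are hypothetical placeholders, not values from the AWS documentation.

    import sys
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.utils import getResolvedOptions

    # TempDir is a job argument commonly configured for Glue jobs; it is used
    # here as the S3 staging area for the Redshift COPY.
    args = getResolvedOptions(sys.argv, ["TempDir"])
    glueContext = GlueContext(SparkContext.getOrCreate())

    # Hypothetical source table already registered in the Data Catalog.
    dyf = glueContext.create_dynamic_frame.from_catalog(
        database="my_catalog_db", table_name="my_source_table")

    # Write through a Glue connection defined in the console (name is a placeholder).
    glueContext.write_dynamic_frame.from_jdbc_conf(
        frame=dyf,
        catalog_connection="my-redshift-connection",
        connection_options={"dbtable": "public.my_target_table", "database": "dev"},
        redshift_tmp_dir=args["TempDir"],
        transformation_ctx="datasink")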
AWS
docs.aws.amazon.com › aws glue › user guide › aws glue programming guide › programming spark scripts › features and optimizations for programming aws glue for spark etl scripts › connection types and options for etl in aws glue for spark › redshift connections
Redshift connections - AWS Glue
You will identify your Amazon S3 temporary directory with redshift_tmp_dir. You will also provide rs-role-name using the aws_iam_role key in the additional_options parameter.
glueContext.write_dynamic_frame.from_catalog(
    frame=input_dynamic_frame,
    database="redshift-dc-database-name",
    table_name="redshift-table-name",
    redshift_tmp_dir=args["temp-s3-dir"],
    additional_options={"aws_iam_role": "arn:aws:iam::account-id:role/rs-role-name"})
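A fuller version of that call, as a sketch only: the Redshift database and table names mirror the placeholders in the snippet above, while the source table, the account ID, and the use of the standard TempDir job argument (in place of the doc's temp-s3-dir) are assumptions.

    import sys
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.utils import getResolvedOptions

    args = getResolvedOptions(sys.argv, ["TempDir"])
    glueContext = GlueContext(SparkContext.getOrCreate())

    # Hypothetical input data read from the Data Catalog.
    input_dynamic_frame = glueContext.create_dynamic_frame.from_catalog(
        database="source-database-name", table_name="source-table-name")

    # COPY into Redshift using an attached IAM role instead of user/password credentials.
    glueContext.write_dynamic_frame.from_catalog(
        frame=input_dynamic_frame,
        database="redshift-dc-database-name",
        table_name="redshift-table-name",
        redshift_tmp_dir=args["TempDir"],
        additional_options={"aws_iam_role": "arn:aws:iam::123456789012:role/rs-role-name"})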
Discussions

Glue etl job fails to write to Redshift using dynamic frame - reason ?
We are observing that writing to Redshift using a Glue dynamic frame errors out when the input file is >1 GB. Setup: Redshift cluster: 2-node DC2. Glue job: temp_df = glueContext.create_dyn...
🌐 repost.aws
April 25, 2020
Write AWS Glue DynamicFrame to redshift table - Stack Overflow
I have a dynamic frame with the following schema:
root
 |-- source_id: long
 |-- scrape_timestamp_last: timestamp
 |-- scrap_timestamp_orig: timestamp
 |-- job_id_init: string
 |-- post_date: timestamp
 |-- ...
🌐 stackoverflow.com
Glue and the write_dynamic_frame preactions and postactions options help
You can use a lambda to send your S3 put events to an SQS queue throughout the day. When it's time for your Glue batch to run, have it read the messages from SQS, create a list of keys you want to read, and ingest them into a dataframe at once. You can dedupe at this stage or after loading to Redshift. My hunch is you should be running the pre/post stages once (i.e., doing one big write to Redshift rather than one per file). There are some caveats to doing it this way, like visibility timeouts on SQS, but it's a decent way to batch up work.
🌐 r/aws
February 15, 2020
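A rough sketch of the batching idea described in the Reddit answer above, assuming the list of S3 keys has already been collected (for example by draining an SQS queue); the bucket, table names, and Glue connection name are placeholders, and the dedupe step is a plain dropDuplicates rather than anything from the original thread.

    import sys
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.dynamicframe import DynamicFrame
    from awsglue.utils import getResolvedOptions

    args = getResolvedOptions(sys.argv, ["TempDir"])
    glueContext = GlueContext(SparkContext.getOrCreate())

    # Keys collected earlier in the day (e.g. from SQS); the values are made up.
    paths = ["s3://my-bucket/incoming/file1.csv", "s3://my-bucket/incoming/file2.csv"]

    # One DynamicFrame over all pending files, so the load into Redshift happens once.
    dyf = glueContext.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": paths},
        format="csv",
        format_options={"withHeader": True, "separator": ","})

    # Drop exact duplicates before loading, since the files can contain overlapping records.
    deduped_dyf = DynamicFrame.fromDF(dyf.toDF().dropDuplicates(), glueContext, "deduped_dyf")

    # Single write, so any preactions/postactions run once rather than per file.
    glueContext.write_dynamic_frame.from_jdbc_conf(
        frame=deduped_dyf,
        catalog_connection="my-redshift-connection",   # placeholder Glue connection
        connection_options={
            "preactions": "truncate table staging.events;",
            "dbtable": "staging.events",
            "database": "dev"},
        redshift_tmp_dir=args["TempDir"])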
AWS GLUE - JOB ERROR
Hello, I'm stuck with this error and I can't find anything helpful. I'm trying to migrate data between S3 and Redshift. Note: I crawled both and both tables are in my Glue databases, but when I'm r...
🌐 repost.aws
February 17, 2023
People also ask

Does AWS Glue support Redshift?
Yes, AWS Glue supports Amazon Redshift. You can use AWS Glue to extract, transform, and load (ETL) data into Redshift, as well as to read data from Redshift for further processing or analysis.
hevodata.com
hevodata.com › home › learn › data integration
AWS Glue to Redshift Integration: 4 Easy Steps (With Code)
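As the answer above notes, Glue can read from Redshift as well as write to it. A minimal read sketch, assuming a crawler has already registered the Redshift table in the Data Catalog; the catalog database and table names are placeholders.

    import sys
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.utils import getResolvedOptions

    args = getResolvedOptions(sys.argv, ["TempDir"])
    glueContext = GlueContext(SparkContext.getOrCreate())

    # Reading from Redshift is also staged through S3, so a temp dir is passed here too.
    dyf = glueContext.create_dynamic_frame.from_catalog(
        database="redshift_catalog_db",        # placeholder catalog database
        table_name="dev_public_my_table",      # placeholder crawled table name
        redshift_tmp_dir=args["TempDir"])
    print("row count:", dyf.count())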
What role does S3 play in AWS Glue to Redshift migration?
Amazon S3 acts as a temporary storage or staging layer during the data transfer. AWS Glue writes data to S3 before Redshift’s COPY command ingests it into the Data Warehouse.
Can I automate Redshift data loading without writing code?
Yes. Using a no-code platform like Hevo Data, you can automate Redshift data loading from 150+ sources, apply transformations, handle schema changes, and monitor pipelines in real time—without any coding or manual setup.
GitHub
github.com › awsdocs › aws-glue-developer-guide › blob › master › doc_source › aws-glue-programming-etl-redshift.md
aws-glue-developer-guide/doc_source/aws-glue-programming-etl-redshift.md at master · awsdocs/aws-glue-developer-guide
You can also specify a role when you use a dynamic frame and you use copy_from_options. The syntax is similar, but you put the additional parameter in the connection_options map.
my_conn_options = {
    "url": "jdbc:redshift://host:port/redshift database name",
    "dbtable": "redshift table name",
    "user": "username",
    "password": "password",
    "redshiftTmpDir": args["TempDir"],
    "aws_iam_role": "arn:aws:iam::account id:role/role name"
}
df = glueContext.create_dynamic_frame_from_options("redshift", my_conn_options)
Author   awsdocs
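The write side can use the same style of connection_options map. The sketch below assumes the "redshift" connection type also works with write_dynamic_frame.from_options; the endpoint, credentials, role ARN, and table names are placeholders, not values from the guide above.

    import sys
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.utils import getResolvedOptions

    args = getResolvedOptions(sys.argv, ["TempDir"])
    glueContext = GlueContext(SparkContext.getOrCreate())

    # Hypothetical frame to load; in a real job this would come from an earlier step.
    dyf = glueContext.create_dynamic_frame.from_catalog(
        database="my_catalog_db", table_name="my_source_table")

    my_conn_options = {
        "url": "jdbc:redshift://host:5439/dev",                       # placeholder endpoint
        "dbtable": "public.my_target_table",                          # placeholder table
        "user": "username",
        "password": "password",
        "redshiftTmpDir": args["TempDir"],
        "aws_iam_role": "arn:aws:iam::123456789012:role/my-rs-role"}  # placeholder role

    glueContext.write_dynamic_frame.from_options(
        frame=dyf,
        connection_type="redshift",
        connection_options=my_conn_options)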
Hevo
hevodata.com › home › learn › data integration
AWS Glue to Redshift Integration: 4 Easy Steps (With Code)
December 15, 2025 -
glueContext.write_dynamic_frame.from_catalog(
    database="database-name",
    table_name="table-name",
    redshift_tmp_dir=args["TempDir"],
    additional_options={"aws_iam_role": "arn:aws:iam::account-id:role/role-name"})
AWS re:Post
repost.aws › questions › QUWbskjPo9SOK7otb_eeTv5A › glue-etl-job-fails-to-write-to-redshift-using-dynamic-frame-reason
Glue etl job fails to write to Redshift using dynamic frame - reason ? | AWS re:Post
April 25, 2020 - Setup: Redshift cluster: 2-node DC2. Glue job:
temp_df = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    format="csv",
    connection_options={"paths": [source]},
    format_options={"withHeader": True, "separator": ","},
    transformation_ctx="path={}".format(source)).toDF()
redshift_df = DynamicFrame.fromDF(output_df, glueContext, "redshift_df")
datasink4 = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=redshift_df,
    catalog_connection="pilot-rs",
    connection_options={"preactions": "truncate table tablename;", "dbtable": "tablename", "database": "dev"},
    redshift_tmp_dir='s3://bucket/path/',
    transformation_ctx="datasink4")
GitHub
github.com › aws-samples › aws-glue-samples › blob › master › examples › join_and_relationalize.py
aws-glue-samples/examples/join_and_relationalize.py at master · aws-samples/aws-glue-samples
print("Writing to Redshift table: ", df_name, " ...") glueContext.write_dynamic_frame.from_jdbc_conf(frame = m_df, catalog_connection = "redshift3", connection_options = {"dbtable": df_name, "database": "testdb"}, redshift_tmp_dir = redshift_temp_dir)
Author   aws-samples
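In the sample that snippet comes from, m_df is produced by iterating over the frames returned by a relationalize() call. Roughly, the surrounding loop looks like the sketch below; dfc, glueContext, and redshift_temp_dir are assumed to be defined earlier in that script, and the connection and database names are the ones from the snippet above.

    # dfc is assumed to be a DynamicFrameCollection returned by relationalize().
    for df_name in dfc.keys():
        m_df = dfc.select(df_name)   # one DynamicFrame per relationalized table
        print("Writing to Redshift table: ", df_name, " ...")
        glueContext.write_dynamic_frame.from_jdbc_conf(
            frame=m_df,
            catalog_connection="redshift3",
            connection_options={"dbtable": df_name, "database": "testdb"},
            redshift_tmp_dir=redshift_temp_dir)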
Aws-dojo
aws-dojo.com › ws9 › labs › script-to-move-data-s3-to-s3
AWS Dojo - Workshop - Building AWS Glue Job using PySpark - Part:2(of 2)
For instance, the following code snippet will load productlineDF to a Redshift database which is connected using dojoconnection Glue connection.
glueContext.write_dynamic_frame.from_jdbc_conf(
    productlineDF,
    catalog_connection="dojoconnection",
    connection_options={"dbtable": "products", "database": "dojodatabase"},
    redshift_tmp_dir="s3://dojo-data-lake/data/script")
Aws-dojo
aws-dojo.com › ws30 › labs › write-code
AWS Dojo - Workshop - Using Amazon Redshift in AWS based Data Lake
glueContext.write_dynamic_frame.from_options(
    dojodfmini,
    connection_type="s3",
    connection_options={"path": "s3://dojo-rs-bkt/data"},
    format="csv")
Next, run the following PySpark code snippet to write dojodfmini data to the Redshift database with the table name dojotablemini.
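The Redshift write snippet that the text refers to is not included in this excerpt. Based on the earlier aws-dojo example on this page it presumably resembles the sketch below; the dojoconnection connection, dojodatabase database, and S3 temp path are assumptions carried over from that example, and only dojotablemini comes from the text itself.

    # Hypothetical reconstruction of the missing workshop snippet.
    glueContext.write_dynamic_frame.from_jdbc_conf(
        dojodfmini,
        catalog_connection="dojoconnection",
        connection_options={"dbtable": "dojotablemini", "database": "dojodatabase"},
        redshift_tmp_dir="s3://dojo-data-lake/data/script")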
Stack Overflow
stackoverflow.com › questions › 71977566 › write-aws-glue-dynamicframe-to-redshift-table
Write AWS Glue DynamicFrame to redshift table - Stack Overflow
I am writing this frame to this Redshift table with the following code snippet:
dest_table = "<redshift_schema>.<redshift_table>"
pre_actions = f"DELETE FROM {dest_table} WHERE 1=1"
datasink = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=<data_frame>,
    catalog_connection="redshift_connection",
    connection_options={
        "preactions": pre_actions,
        "dbtable": dest_table,
        "database": "<redshift_database>",
    },
    redshift_tmp_dir=args["TempDir"],
    transformation_ctx="datasink",
)
job.commit()
AWS
docs.aws.amazon.com › aws glue › user guide › aws glue programming guide › programming spark scripts › program aws glue etl scripts in pyspark › aws glue pyspark extensions reference › dynamicframereader class
DynamicFrameReader class - AWS Glue
March 12, 2026 -
connection_options = {
    "url": "jdbc-url/database",
    "user": "username",
    "password": passwordVariable,
    "dbtable": "table-name",
    "redshiftTmpDir": "s3-tempdir-path"}
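Those options line up with a JDBC-style Redshift read through the DynamicFrameReader. A sketch of using them with create_dynamic_frame.from_options follows; the endpoint, credentials, and table are placeholders, and the password is hard-coded only for brevity (a real job would resolve it from a secret).

    import sys
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.utils import getResolvedOptions

    args = getResolvedOptions(sys.argv, ["TempDir"])
    glueContext = GlueContext(SparkContext.getOrCreate())

    passwordVariable = "password"   # placeholder; use Secrets Manager in practice

    connection_options = {
        "url": "jdbc:redshift://host:5439/dev",   # placeholder jdbc-url/database
        "user": "username",
        "password": passwordVariable,
        "dbtable": "public.my_table",             # placeholder table-name
        "redshiftTmpDir": args["TempDir"]}        # placeholder s3-tempdir-path

    dyf = glueContext.create_dynamic_frame.from_options(
        connection_type="redshift",
        connection_options=connection_options)
    dyf.printSchema()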
AWS re:Post
repost.aws › knowledge-center › sql-commands-redshift-glue-job
Run SQL commands on Amazon Redshift for an AWS Glue job | AWS re:Post
December 10, 2024 - Truncate an Amazon Redshift table before inserting records in AWS Glue: use the preactions parameter.
Python example:
datasink4 = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=datasource0,
    catalog_connection="test_red",
    connection_options={
        "preactions": "truncate table schema.target_table;",
        "dbtable": "schema.target_table",
        "database": "redshiftdb"},
    redshift_tmp_dir="s3://s3path",
    transformation_ctx="datasink4")
Scala example:
val options = JsonOptions(Map(
    "dbtable" -> "schema.target_table",
    "database" -> "redshiftdb",
    "preactions" -> "truncate table schema.target_table;"))
glueContext.getJDBCSink(
    catalogConnection = "test_red",
    options = options,
    redshiftTmpDir = "s3://s3path",
    transformationContext = "datasource0"
).writeDynamicFrame(datasource0)
In the preceding examples, replace the following values:
test_red: the catalog connection to use.
GitHub
github.com › awsdocs › aws-glue-developer-guide › blob › master › doc_source › aws-glue-api-crawler-pyspark-extensions-dynamic-frame-writer.md
aws-glue-developer-guide/doc_source/aws-glue-api-crawler-pyspark-extensions-dynamic-frame-writer.md at master · awsdocs/aws-glue-developer-guide
December 11, 2018 -
redshift_tmp_dir – An Amazon Redshift temporary directory to use (optional).
transformation_ctx – A transformation context to use (optional).
This example writes the output locally using a connection_type of S3 with a POSIX path argument in connection_options, which allows writing to local storage.
glueContext.write_dynamic_frame.from_options(
    frame=dyf_splitFields,
    connection_options={'path': '/home/glue/GlueLocalOutput/'},
    connection_type='s3',
    format='json')
Author   awsdocs
AWS
docs.aws.amazon.com › aws glue › user guide › aws glue programming guide › programming spark scripts › program aws glue etl scripts in pyspark › aws glue pyspark extensions reference › gluecontext class
GlueContext class - AWS Glue
frame_or_dfc – The DynamicFrame or DynamicFrameCollection to write.
catalog_connection – A catalog connection to use.
connection_options – Connection options, such as path and database table (optional). For more information, see Connection types and options for ETL in AWS Glue for Spark.
redshift_tmp_dir – An Amazon Redshift temporary directory to use (optional).
Reddit
reddit.com › r/aws › glue and the write_dynamic_frame preactions and postactions options help
r/aws on Reddit: Glue and the write_dynamic_frame preactions and postactions options help
February 15, 2020 -

I am working with a large number of files that hit S3 throughout the day from several sources. They are all the same format but can have overlapping records; the good news is that when the records do overlap they are duplicates.

The destination for my ETL is Redshift and I am very comfortable with the stage / dedupe / merge techniques.

To help control costs I want to fire my Glue jobs on a schedule rather than triggering on files arriving. So when the job runs I may have 10-100 files to process, all with potential for some duplicate records. I am typically using bookmarks and this all works nicely when I do not have the potential for duplicates.

My goal is to use the options above as per https://aws.amazon.com/premiumsupport/knowledge-center/sql-commands-redshift-glue-job/ but this only works if the pre and post jobs are run PER FILE. So is my Glue job issuing a COPY command per file, or is it reading all of the available files into the dataframe and performing a single COPY with the pre and post commands run one time?

Medium
medium.com › codex › using-aws-glue-to-stream-dynamodb-to-redshift-serverless-d339f79c34ff
Using AWS Glue to Stream DynamoDB to Redshift Serverless | by Zijing Zhao | CodeX | Medium
December 6, 2024 - Write the data into Redshift via glueContext.write_dynamic_frame.from_options · For the second step of transforming data, the script utilizes the Nested Map mentioned earlier.
AWS re:Post
repost.aws › questions › QUXjM5rP_CRti4OEiukbUCTA › aws-glue-job-error
AWS GLUE - JOB ERROR | AWS re:Post
February 17, 2023 - Sounds like the job gives up waiting for Redshift to load the temporary CSV files from S3. Please check the full stack trace and check in Redshift what happened with the "COPY" command: did it finish or error? How long did it take? ... Adding to Gonzalo's comment: you need more information from the logs to identify the exact issue. This error happens when the write happens - your statement probably uses this method: glueContext.write_dynamic_frame.from_options
Top answer (1 of 6) · 23

Job bookmarks are the key. Just edit the job and enable "Job bookmarks" and it won't process already processed data. Note that the job has to rerun once before it will detect it does not have to reprocess the old data again.

For more info see: http://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html

The name "bookmark" is a bit far fetched in my opinion. I would have never looked at it if I did not coincidentally stumble upon it during my search.

Answer 2 of 6 · 10

This was the solution I got from AWS Glue Support:

As you may know, although you can create primary keys, Redshift doesn't enforce uniqueness. Therefore, if you are rerunning Glue jobs then duplicate rows can get inserted. Some of the ways to maintain uniqueness are:

  1. Use a staging table to insert all rows and then perform an upsert/merge [1] into the main table; this has to be done outside of Glue (a sketch of this pattern follows the references below).

  2. Add another column in your Redshift table [1], like an insert timestamp, to allow duplicates but to know which one came first or last, and then delete the duplicates afterwards if you need to.

  3. Load the previously inserted data into a dataframe and then compare it with the data to be inserted, to avoid inserting duplicates [3].

[1] - http://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-upsert.html and http://www.silota.com/blog/amazon-redshift-upsert-support-staging-table-replace-rows/

[2] - https://github.com/databricks/spark-redshift/issues/238

[3] - https://kb.databricks.com/data/join-two-dataframes-duplicated-columns.html
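One common way to realize option 1 from inside the Glue write itself is to use the preactions/postactions connection options shown elsewhere on this page: COPY into a staging table, then delete-and-insert into the main table in postactions. This is a sketch of that pattern, not the support answer's own method; every table, key, and connection name here is a placeholder.

    import sys
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.utils import getResolvedOptions

    args = getResolvedOptions(sys.argv, ["TempDir"])
    glueContext = GlueContext(SparkContext.getOrCreate())

    # Hypothetical frame holding the new rows to merge.
    dyf = glueContext.create_dynamic_frame.from_catalog(
        database="my_catalog_db", table_name="my_source_table")

    # Recreate/empty the staging table before the COPY runs.
    pre_sql = (
        "create table if not exists public.events_staging (like public.events); "
        "truncate table public.events_staging;")

    # Merge staged rows into the main table in one transaction after the COPY.
    post_sql = (
        "begin; "
        "delete from public.events using public.events_staging "
        "where public.events.event_id = public.events_staging.event_id; "
        "insert into public.events select * from public.events_staging; "
        "end;")

    glueContext.write_dynamic_frame.from_jdbc_conf(
        frame=dyf,
        catalog_connection="my-redshift-connection",   # placeholder Glue connection
        connection_options={
            "preactions": pre_sql,
            "dbtable": "public.events_staging",
            "database": "dev",
            "postactions": post_sql},
        redshift_tmp_dir=args["TempDir"])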

Stack Overflow
stackoverflow.com › questions › 79323274 › writing-data-from-aws-glue-catalog-to-redshift
jdbc - writing data from aws glue catalog to redshift - Stack Overflow
dyf = glueContext.create_dynamic_frame.from_catalog(
    database=catalog_db,
    table_name=f"{table}",
    push_down_predicate=f"day = {day}")
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=DynamicFrame.fromDF(df, glueContext, "trx_df"),
    catalog_connection="redshift_connection",
    connection_options={
        "database": database,
        "dbtable": f"{schema}.{table}"
    },
    redshift_tmp_dir= "s3://"
Medium
medium.com › @kundansingh0619 › aws-glue-3-aae089693d5a
AWS_Glue_3: Glue(DynamicFrame). GlueContext is the entry point for… | by Kundan Singh | Medium
February 12, 2025 - glueContext.create_dynamic_frame_from_options: creates a DynamicFrame with the specified connection and format. Example: the connection type, such as Amazon S3, Amazon Redshift, and JDBC.
Progress Software
progress.com › tutorials › jdbc › accessing-data-using-jdbc-on-aws-glue
Accessing Data using JDBC on AWS Glue Example Tutorial
December 2, 2024 - ## Write Dynamic Frames to S3 in CSV format. You can write to any RDS/Redshift database by using the connection that you have defined previously in Glue.
datasink4 = glueContext.write_dynamic_frame.from_options(frame=dynamic_dframe, connection_type ...