
Answer from Fang Zhang on Stack Overflow
Top answer
1 of 4
44

DynamicFrame is safer when handling memory-intensive jobs. Per the AWS docs, "The executor memory with AWS Glue dynamic frames never exceeds the safe threshold," whereas a Spark DataFrame can hit an "Out of memory" error on executors. (https://docs.aws.amazon.com/glue/latest/dg/monitor-profile-debug-oom-abnormalities.html)

DynamicFrames are designed to provide maximum flexibility when dealing with messy data that may lack a declared schema. Records are represented in a flexible self-describing way that preserves information about schema inconsistencies in the data.

For example, with changing requirements, an address column stored as a string in some records might be stored as a struct in later rows. Rather than failing or falling back to a string, DynamicFrames track both types and give users a number of options for resolving the inconsistency, with fine-grained control via the ResolveChoice transform.
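In Glue itself this is typically a call like `dyf.resolveChoice(specs=[("address", "cast:string")])`. As a rough pure-Python sketch of what a cast-to-string resolution does to such a mixed column (the records and helper below are invented for illustration; this is not Glue's implementation):

```python
import json

# Hypothetical records where "address" arrived as either a plain string
# or a struct (represented here as a dict) -- a "choice" column.
records = [
    {"id": 1, "address": "12 Main St"},
    {"id": 2, "address": {"street": "34 Oak Ave", "city": "Austin"}},
]

def resolve_choice_cast_to_string(rows, column):
    """Sketch of ResolveChoice's cast-to-string strategy: leave string
    values as-is and serialize struct values, so the column ends up
    with a single consistent type."""
    out = []
    for row in rows:
        row = dict(row)  # don't mutate the caller's records
        if isinstance(row[column], dict):
            row[column] = json.dumps(row[column], sort_keys=True)
        out.append(row)
    return out

resolved = resolve_choice_cast_to_string(records, "address")
```

Glue's ResolveChoice also offers other strategies, e.g. make_cols (split into one column per type) and make_struct (keep both readings in a struct).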

DynamicFrames also provide a number of powerful high-level ETL operations that are not found in DataFrames. For example, the Relationalize transform can flatten and pivot complex nested data into tables suitable for transfer to a relational database. In addition, the ApplyMapping transform supports complex renames and casts in a declarative fashion.
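To make the two transforms concrete, here is a small pure-Python sketch of the ideas behind them (the helpers and sample data are invented; Glue's real transforms operate on DynamicFrames, not lists of dicts):

```python
def relationalize(rows, array_col):
    """Sketch of Relationalize: split rows with a nested list column into
    a flat parent table and a child table joined on a synthetic key."""
    parent, child = [], []
    for key, row in enumerate(rows):
        flat = {k: v for k, v in row.items() if k != array_col}
        flat[array_col] = key  # synthetic join key replaces the nested array
        parent.append(flat)
        for i, item in enumerate(row.get(array_col, [])):
            child.append({"id": key, "index": i, **item})
    return parent, child

def apply_mapping(rows, mapping):
    """Sketch of ApplyMapping: (source, target, cast) triples rename
    and cast columns declaratively."""
    return [{tgt: cast(row[src]) for src, tgt, cast in mapping} for row in rows]

orders = [
    {"order_id": "A1", "items": [{"sku": "x", "qty": 2}, {"sku": "y", "qty": 1}]},
    {"order_id": "B2", "items": [{"sku": "z", "qty": 5}]},
]
parent, items = relationalize(orders, "items")
mapped = apply_mapping(parent, [("order_id", "order", str), ("items", "items_fk", int)])
```

After this, `parent` and `items` are flat tables linked by the synthetic key, which is the shape a relational target expects.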

DynamicFrames are also integrated with the AWS Glue Data Catalog, so creating frames from tables is a simple operation. Writing to databases can be done through connections without specifying the password. Moreover, DynamicFrames are integrated with job bookmarks, so running these scripts in the job system lets the script implicitly keep track of what was read and written. (https://github.com/aws-samples/aws-glue-samples/blob/master/FAQ_and_How_to.md)
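As a sketch of how those pieces fit together in a job script (this only runs inside an AWS Glue job, and the database, table, and connection names are placeholders):

```python
import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)  # required for job bookmarks to take effect

# Read from a Data Catalog table; transformation_ctx ties this read to the
# job bookmark so previously processed data can be skipped on the next run.
dyf = glue_context.create_dynamic_frame.from_catalog(
    database="my_database",        # placeholder
    table_name="my_table",         # placeholder
    transformation_ctx="read_source",
)

# Write through a catalog connection; credentials come from the connection
# definition, so no password appears in the script.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=dyf,
    catalog_connection="my_jdbc_connection",  # placeholder
    connection_options={"dbtable": "target_table", "database": "target_db"},
)

job.commit()  # advances the bookmark for the next run
```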

2 of 4
11

You can refer to the documentation here: DynamicFrame Class. It says,

A DynamicFrame is similar to a DataFrame, except that each record is self-describing, so no schema is required initially.

You want to use a DynamicFrame when:

  • your data does not conform to a fixed schema.

Note: You can also convert a DynamicFrame to a DataFrame using toDF().

  • Refer here: def toDF
DEV Community
Spark on AWS Glue: Performance Tuning 2 (Glue DynamicFrame vs Spark DataFrame) - DEV Community
December 8, 2023 - Let's compare them using the Parquet files created in part 1. We will read a single large Parquet file and a highly partitioned Parquet file.

with timer('df'):
    dyf = glueContext.create_dynamic_frame.from_options(
        "s3",
        {"paths": ["s3://.../parquet-chunk-high/"]},
        "parquet",
    )
    print(dyf.count())

with timer('df partition'):
    dyf = glueContext.create_dynamic_frame.from_options(
        "s3",
        {"paths": ["s3://.../parquet-partition-high/"]},
        "parquet",
    )
    print(dyf.count())

324917265
[df] done in 125.9965 s
324917265
[df partition] done in 55.9798 s
Discussions

Glue dynamicframes vs spark dataframes
Two big differences I know of are that DynamicFrames have tie-ins with the Glue catalog, and the relationalize() method is very nice for handling nested JSON data without having to write a bunch of explodes and such. Each has its own use cases though; neither is strictly always better than the other. More on reddit.com
r/dataengineering
February 2, 2025
amazon web services - write a spark dataframe or write a glue dynamic frame, which option is better in AWS Glue? - Stack Overflow
In AWS Glue, I read the data from data catalog in a glue dynamic frame. Then convert the dynamic frame to spark dataframe to apply schema transformations. To write the data back to s3 I have seen More on stackoverflow.com
stackoverflow.com
Performance of PySpark DataFrames vs Glue DynamicFrames - Stack Overflow
So I recently started using Glue and PySpark for the first time. The task was to create a Glue job that does the following: Load data from parquet files residing in an S3 bucket Apply a filter to ... More on stackoverflow.com
stackoverflow.com
April 20, 2022
AWS Glue, is using toDF costly?

Yes, converting to/from a DataFrame is costly in terms of time, and you also lose some Glue-specific features, like better reliability in high-memory-pressure scenarios.

That being said, if you use Glue Studio to do "add a column" you'll see that even AWS uses toDF(), do a thing, fromDF(), everywhere.

I know lots of people just skip over glue’s dynamic frame entirely or they use it to get a dyf from the glue data catalog then immediately convert to a DF and just use that.

DynamicFrames do also have the advantage of being forgiving when your data set has mixed data types in a single column. While this typically isn't an issue once you're past the data-cleaning stage of an ETL pipeline, it can be invaluable during data cleaning.

More on reddit.com
r/aws
June 30, 2024
LinkedIn
AWS Glue DynamicFrames: Advantages over Spark DataFrames
July 21, 2023 - AWS Glue DynamicFrames are optimized for performance within the AWS Glue ecosystem. They leverage Glue's query optimization capabilities to enhance data processing speed and resource utilization.
AWS
DynamicFrame class - AWS Glue
A DynamicFrame is similar to a DataFrame, except that each record is self-describing, so no schema is required initially. Instead, AWS Glue computes a schema on-the-fly when required, and explicitly encodes schema inconsistencies using a choice (or union) type.
Medium
AWS Glue DynamicFrame vs Spark DataFrame: When to Use Which? | by Mahak Goyal | Medium
August 17, 2025 - DataFrames are statically typed with an enforced schema, which improves performance due to Spark’s Catalyst optimizer. A DynamicFrame is an AWS Glue-specific abstraction built on top of DataFrames, designed for semi-structured data and schema-evolving datasets commonly found in data lakes.
GitHub
aws-glue-samples/FAQ_and_How_to.md at master · aws-samples/aws-glue-samples
Since DataFrames do not have the type flexibility that DynamicFrames do, you have to resolve the choice type in your DynamicFrame before conversion. Glue provides a transformation called ResolveChoice with the following signature: ResolveChoice.apply(self, frame, specs = None, choice = "", ...
Author   aws-samples
Medium
Glue Dynamic frame vs Apache spark scala Data frames | by Harsha Chunduru | Medium
March 22, 2023 - Both frameworks can handle big data processing, but Glue Dynamic Frame is better suited for larger datasets as it is built on top of AWS Glue, which is a serverless ETL service that can scale automatically based on the workload.
Jayendra's Cloud Certification Blog
Dynamic Frames - AWS Glue
Conversion can be done between Dynamic frames and Spark dataframes, to take advantage of both AWS Glue and Spark transformations to do the kinds of analysis needed. AWS Glue enables performing ETL operations on streaming data using continuously-running jobs.
Stack Overflow
Performance of PySpark DataFrames vs Glue DynamicFrames - Stack Overflow
April 20, 2022 -

def AddColumn(r):
    if r["option_type"] == 'S':
        r["option_code_derived"] = 'S' + r["option_code_4"]
    elif r["option_type"] == 'P':
        r["option_code_derived"] = 'F' + r["option_code_4"][1:]
    elif r["option_type"] == 'L':
        r["option_code_derived"] = 'P' + r["option_code_4"]
    else:
        r["option_code_derived"] = None
    return r

glueContext = GlueContext(create_spark_context(role_arn=args['role_arn']))
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
inputGDF = glueContext.create_dynamic_frame_from_options(connection_type = "s3", connection_options = {"paths": [source_path], "r
Medium
Difference between pyspark dataframe and aws glue dynamicframe | by Irshad Alam | Medium
November 26, 2024 - PySpark DataFrame: 1. Requires a schema to be defined upfront 2. Can handle nested data, but it can be cumbersome to work with 3. Can handle schema evolution, but it can be complex and error-prone 4. Generally faster than DynamicFrame · AWS Glue ...
BMC Software
AWS Glue ETL Transformations – BMC Software | Blogs
August 21, 2020 - As we can turn DynamicFrames into Spark Dataframes, we can go the other way around.
GitHub
AWS Glue error converting data frame to dynamic frame · Issue #49 aws-samples/aws-glue-samples · GitHub
June 11, 2019 - Here's my code where I am trying to create a new data frame out of the result set of my left join on other 2 data frames and then trying to convert it to a dynamic frame.

dfs = sqlContext.read.format(SNOWFLAKE_SOURCE_NAME).options(**sfOptions).option("query", "SELECT hashkey as hash From randomtable").load()

#Source
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "test", table_name = "randomtable", transformation_ctx = "datasource0")

#add hash value
df = datasource0.toDF()
df.cache()
df = df.withColumn("hashkey", sha2(concat_ws("||", *df.columns), 256))

#drop dupes
df1 = df.dropDuplicates(subset=['hashkey'])

#read incremental data
inc = df1.join(dfs, df1["hashkey"] == dfs["hash"], how='left').filter(col('hash').isNull())

#convert it back to glue context
datasource1 = DynamicFrame.fromDF(inc, glueContext, "datasource1")
Author   shanmukhakota
AWS re:Post
Using Pandas in Glue ETL Job ( How to convert Dynamic DataFrame or PySpark Dataframe to Pandas Dataframe) | AWS re:Post
April 29, 2022 - Accepted answer (to bfeeny's question, 13K views): convert the DynamicFrame to a Spark DataFrame using the .toDF() method, then convert the Spark DataFrame to a pandas DataFrame: https://sparkbyexamples.com/pyspark/convert-pyspark-dataframe-to-pandas/
Kindatechnical
Kinda Technical | A Guide to AWS Glue - Understanding DynamicFrame
# Relationalizing a DynamicFrame ... important to consider performance optimizations: Parallel Processing: DynamicFrames are designed to work in parallel, which can significantly speed up processing....
Medium
DynamicFrame vs Spark DataFrame: Why AWS Glue Wins at Schema Drift?
July 31, 2025 - So, What should be the Ideal Strategy? -> Start with DynamicFrame for messy data + schema handling -> Convert to DataFrame for performance-heavy logic -> Convert back to DynamicFrame if you need to write to the Glue Catalog