🌐
AWS
docs.aws.amazon.com › aws glue › user guide › aws glue programming guide › programming spark scripts › program aws glue etl scripts in pyspark › aws glue pyspark extensions reference › dynamicframe class
DynamicFrame class - AWS Glue
If the staging frame has matching ... in AWS Glue. stage_dynamic_frame – The staging DynamicFrame to merge. primary_keys – The list of primary key fields to match records from the source and staging dynamic frames. transformation_ctx – A unique string that is used to retrieve metadata about the current transformation (optional)...
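The merge behavior described above (a staging record replaces the source record whose primary keys match; staging records with no match are appended) can be sketched in plain Python. This is a conceptual stand-in, not the awsglue API; the record shapes are hypothetical.

```python
# Conceptual sketch in plain Python (NOT the awsglue API): mergeDynamicFrame
# replaces source records whose primary keys match a staging record and
# appends staging records that have no match in the source.
def merge_records(source, staging, primary_keys):
    key = lambda rec: tuple(rec[k] for k in primary_keys)
    staged = {key(rec): rec for rec in staging}
    # A matching source record is replaced by its staging counterpart.
    merged = [staged.pop(key(rec), rec) for rec in source]
    # Staging records with no match in the source are appended.
    merged.extend(staged.values())
    return merged

source = [{"id": 1, "v": "old"}, {"id": 2, "v": "keep"}]
staging = [{"id": 1, "v": "new"}, {"id": 3, "v": "added"}]
print(merge_records(source, staging, ["id"]))
# [{'id': 1, 'v': 'new'}, {'id': 2, 'v': 'keep'}, {'id': 3, 'v': 'added'}]
```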
🌐
AWS
docs.aws.amazon.com › aws glue › user guide › aws glue programming guide › programming spark scripts › program aws glue etl scripts in pyspark › aws glue pyspark extensions reference › dynamicframewriter class
DynamicFrameWriter class - AWS Glue
Writes a DynamicFrame using the specified connection and format. ... connection_type – The connection type. Valid values include s3, mysql, postgresql, redshift, sqlserver, and oracle. connection_options – Connection options, such as path and database table (optional).
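As a hedged sketch of the argument shape for such a write: the bucket path below is hypothetical, and the actual `write_dynamic_frame.from_options` call (shown only as a comment) needs a live GlueContext and a DynamicFrame, so only the options are built and checked here.

```python
# Sketch of the argument shape for write_dynamic_frame.from_options.
# The valid connection types are taken from the docs snippet above;
# the bucket path is a hypothetical placeholder.
VALID_CONNECTION_TYPES = {"s3", "mysql", "postgresql", "redshift", "sqlserver", "oracle"}

write_args = {
    "connection_type": "s3",
    "connection_options": {"path": "s3://my-bucket/output/"},  # hypothetical path
    "format": "parquet",
}
assert write_args["connection_type"] in VALID_CONNECTION_TYPES

# With a real Glue job, the write would look like (not executed here):
# glueContext.write_dynamic_frame.from_options(
#     frame=dyf,
#     connection_type=write_args["connection_type"],
#     connection_options=write_args["connection_options"],
#     format=write_args["format"],
# )
```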
Discussions

amazon web services - AWS Glue - Create Dynamic Frame for Dynamo DB Table having columns with mixed type - Stack Overflow
0 · What can be an efficient design for a rule-based engine in PySpark running in AWS Glue, with the rule repository in DynamoDB? 1 · Create dynamic frame from options (from RDS - MySQL) providing a custom query with a where clause. More on stackoverflow.com
🌐 stackoverflow.com
dataframe - Create dynamic frame from options (from rds - mysql) providing a custom query with where clause - Stack Overflow
I want to create a DynamicFrame in my Glue job from an Aurora RDS MySQL table. Can I create a DynamicFrame from my RDS table using a custom query with a where clause? I don't want to read the entire ... More on stackoverflow.com
🌐 stackoverflow.com
🌐
Sqlandhadoop
sqlandhadoop.com › aws-glue-create-dynamic-frame
AWS Glue create dynamic frame – SQL & Hadoop
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

glueContext = GlueContext(SparkContext.getOrCreate())

# creating dynamic frame from S3 data
dyn_frame_s3 = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={
        "paths": ["s3://<bucket name>/data/sales/"],
        "inferSchema": "true"
    },
    format="csv",
    format_options={"separator": "\t"},
    transformation_ctx="")
print(dyn_frame_s3.count())

# creating dynamic frame from Glue catalog table
dyn_frame_catalog = glueContext.create_dynamic_frame_from_catalog(
    database="db_readfile",
    table_name="sales",
    transformation_ctx="")
print(dyn_frame_catalog.count())
🌐
GitHub
github.com › awslabs › aws-glue-libs › blob › master › awsglue › dynamicframe.py
aws-glue-libs/awsglue/dynamicframe.py at master · awslabs/aws-glue-libs
def from_catalog(self, frame, database ... = {}, catalog_id = None, **kwargs): """Creates a DynamicFrame with the specified catalog name space and table name....
Author: awslabs
🌐
AWS
docs.aws.amazon.com › aws glue › user guide › aws glue programming guide › programming spark scripts › program aws glue etl scripts in pyspark › aws glue pyspark extensions reference › dynamicframereader class
DynamicFrameReader class - AWS Glue
March 12, 2026 - Valid values include s3, mysql, postgresql, redshift, sqlserver, oracle, dynamodb, and snowflake. connection_options – Connection options, such as path and database table (optional). For more information, see Connection types and options for ETL in AWS Glue for Spark.
🌐
Medium
medium.com › @kundansingh0619 › aws-glue-3-aae089693d5a
AWS_Glue_3: Glue(DynamicFrame). GlueContext is the entry point for… | by Kundan Singh | Medium
February 12, 2025 -

# create DynamicFrame from S3 parquet files
datasource0 = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={"paths": [S3_location]},
    format="parquet",
    transformation_ctx="datasource0")

# create DynamicFrame from glue catalog
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database="demo",
    table_name="testtable",
    transformation_ctx="datasource0")

# convert to spark DataFrame and back to a Glue DynamicFrame
df1 = datasource0.toDF()
df2 = DynamicFrame.fromDF(df1, glueContext, "df2")
df = df2.toDF()
df.show()
print("Dataframe converted")
🌐
Medium
medium.com › today-i-learnt › til-aws-glue-dynamic-dataframe-tips-todf-use-resolvechoice-for-mixed-data-types-in-a-column-374775d0c092
TIL: AWS Glue Dynamic Dataframe Tips toDf() — Use ResolveChoice for Mixed Data types in a column | by Satyaprakash Bommaraju | Today I Learnt | Medium
February 18, 2023 -

raw_data_dydf = glueContext.create_dynamic_frame.from_options(
    format_options={"multiline": False},
    connection_type="s3",
    format="json",
    connection_options={
        "paths": [input_path],
        "recurse": False,
    },
    transformation_ctx="raw_data",
)
🌐
Stack Overflow
stackoverflow.com › questions › 74734233 › create-dynamic-frame-from-s3-bucket-aws-glue
json - Create dynamic frame from S3 bucket AWS Glue - Stack Overflow
Question: I am trying to create a dynamic frame from options where the source is S3 and the type is JSON. I'm using the following code; however, it is not returning any value. Where am I going wrong?

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from functools import reduce
from awsglue.dynamicframe import DynamicFrame

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

df = glueContext.create_dynamic_frame.from_options(
    connection_type='s3',
    connection_options={'paths': ['Location for S3 folder']},
    format='json',
    # formatOptions=$..*
)
print('Total Count:')
df.count()
🌐
AWS re:Post
repost.aws › questions › QU3rukJUaHRpiMNjydfqLZgw › aws-glue-create-dynamic-frame-from-data-in-postgresql-with-custom-bookmark-key
aws glue create_dynamic_frame from data in PostgreSQL with custom bookmark key | AWS re:Post
March 18, 2023 - Hi AWS expert, I have code that reads data from AWS Aurora PostgreSQL. I want to bookmark the table with a custom column named 'ceres_mono_index', but it seems the bookmark still uses the primary key as the bookmark key instead of the column 'ceres_mono_index'. Here is the code:

cb_ceres = glueContext.create_dynamic_frame.from_options(
    connection_type="postgresql",
    connection_options={
        "url": f"jdbc:postgresql://{ENDPOINT}:5432/{DBNAME}",
        "dbtable": "xxxxx_raw_ceres",
        "user": username,
        "password": password,
    },
    additional_options={"jobBookmarkKeys": ["ceres_mono_index"], "jobBookmarkKeysSortOrder": "asc"},
    transformation_ctx="cb_ceres_bookmark",
)
🌐
Medium
swapnil-bhoite.medium.com › aws-glue-dynamicframe-transformations-with-example-code-and-output-26e14d13145f
AWS Glue DynamicFrame transformations with example code and output | by Swapnil Bhoite | Medium
April 28, 2022 -

dyF = glueContext.create_dynamic_frame.from_options(
    's3',
    {'paths': ['s3://awsglue-datasets/examples/medicare/Medicare_Hospital_Provider.csv']},
    'csv',
    {'withHeader': True})
dyF.printSchema()

root
 |-- DRG Definition: string
 |-- Provider Id: string
 |-- Provider Name: string
 |-- Provider Street Address: string
 |-- Provider City: string
 |-- Provider State: string
 |-- Provider Zip Code: string
 |-- Hospital Referral Region Description: string
 |-- Total Discharges: string
 |-- Average Covered Charges: string
 |-- Average Total Payments: string
 |-- Average Medicare Payments: string
🌐
AWS
docs.aws.amazon.com › aws glue › user guide › aws glue programming guide › programming spark scripts › program aws glue etl scripts in pyspark › aws glue python code samples › code example: data preparation using resolvechoice, lambda, and applymapping
Code example: Data preparation using ResolveChoice, Lambda, and ApplyMapping - AWS Glue
AWS Glue makes it easy to write the data in a format such as Apache Parquet that relational databases can effectively consume:

glueContext.write_dynamic_frame.from_options(
    frame=medicare_nest_dyf,
    connection_type="s3",
    connection_options={"path": "s3://glue-sample-target/output-dir/medicare_parquet"},
    format="parquet")
🌐
AWS re:Post
repost.aws › articles › ARQSOCWRuiSI6KdxyvcVBKPw › aws-glue-dynamic-frame-jdbc-performance-tuning-configuration
AWS Glue Dynamic Frame – JDBC Performance Tuning Configuration | AWS re:Post
June 2, 2023 - 3. Query Push Down: Queries with multiple tables can be executed with a similar approach to the second option, as given below. 'hashexpression': AWS Glue generates SQL queries to read the JDBC data in parallel, using the hashexpression in the WHERE clause to partition data. For queries with multiple tables, assign a numeric attribute with a distinct identifier (table_name.column_name, as given below). Code snippet:

JDBC_DF_QUERY = glueContext.create_dynamic_frame.from_catalog(
    database="dms_large",
    table_name="dms_large_dbo_sporting_event",
    transformation_ctx="JDBC_DF_QUERY",
    additional_options={
        "hashpartitions": "20",
        "hashfield": "pr.id",
        "hashexpression": "pr.id",
        "enablePartitioningForSampleQuery": True,
        "sampleQuery": "select pr.id, fl.full_name from dms_large.dbo.person pr inner join dms_large.dbo.person_full fl on pr.id = fl.id and"
    }
)
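The hashexpression mechanism above can be illustrated in plain Python: Glue fans the JDBC read out into one query per partition, each constrained by a predicate over the hash expression. The exact SQL Glue generates may differ; this only sketches the predicate shape.

```python
# Plain-Python illustration of hashexpression/hashpartitions: one WHERE
# predicate per parallel JDBC query. The MOD(...) shape is an assumption
# about the generated SQL, shown only to convey the partitioning idea.
def partition_predicates(hashexpression, hashpartitions):
    return [
        f"MOD({hashexpression}, {hashpartitions}) = {i}"
        for i in range(hashpartitions)
    ]

for pred in partition_predicates("pr.id", 4):
    print(pred)
# prints MOD(pr.id, 4) = 0 through MOD(pr.id, 4) = 3
```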
Top answer (1 of 2) · 3 votes

Apologies, I would have made a comment but I do not have sufficient reputation. I was able to make the solution that Guillermo AMS provided work within AWS Glue, but it did require two changes:

  • The "jdbc" format was unrecognized (the provided error was: "py4j.protocol.Py4JJavaError: An error occurred while calling o79.load. : java.lang.ClassNotFoundException: Failed to find data source: jbdc. Please find packages at http://spark.apache.org/third-party-projects.html"); I had to use the full name: "org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider"
  • The query option was not working for me (the provided error was: "py4j.protocol.Py4JJavaError: An error occurred while calling o72.load. : java.sql.SQLSyntaxErrorException: ORA-00911: invalid character"), but fortunately the "dbtable" option supports passing in either a table or a subquery, that is, a query wrapped in parentheses.

In my solution below I have also added a bit of context around the needed objects and imports.
My solution ended up looking like:

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

tmp_data_frame = glue_context.spark_session.read\
  .format("org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider")\
  .option("url", jdbc_url)\
  .option("user", username)\
  .option("password", password)\
  .option("dbtable", "(select * from test where id<100)")\
  .load()

Answer 2 of 2 · 0 votes

The way I was able to provide a custom query was by creating a Spark DataFrame and specifying it with options: https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#manually-specifying-options

Then transform that DataFrame into a DynamicFrame using said class: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html

tmp_data_frame = spark.read.format("jdbc") \
    .option("url", jdbc_url) \
    .option("user", username) \
    .option("password", password) \
    .option("query", "select * from test where id<100") \
    .load()

dynamic_frame = DynamicFrame.fromDF(tmp_data_frame, glueContext, "dynamic_frame")
🌐
Stack Overflow
stackoverflow.com › questions › 74974372 › gluecontext-create-dynamic-frame-from-options-exclude-one-file-type-from-loading
amazon web services - glueContext create_dynamic_frame_from_options exclude one file type from loading? - Stack Overflow
The exclusions parameter on the connection_options object will help you exclude files: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect.html#aws-glue-programming-etl-connect-s3

raw_data_input_path = "s3a://{}/logs/application_id={}/component_id={}/".format(s3BucketName, application_id, component_id)

df = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={
        "paths": [raw_data_input_path],
        "recurse": True,
        "exclusions": ["**.txt"],
    },
    format="json",
    transformation_ctx=dbInstance,
)
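Python's fnmatch can illustrate how an exclusions glob such as "**.txt" filters object keys. AWS Glue actually uses Hadoop-style globs, but fnmatch is close enough here because its "*" also crosses "/" boundaries; the key names below are hypothetical.

```python
# Illustration of how an "exclusions" glob like "**.txt" filters S3 keys.
# fnmatch stands in for Glue's Hadoop-style glob matching; the effect on
# these sample keys is the same.
from fnmatch import fnmatch

keys = [
    "logs/component_id=1/ts=1/part-0000.json",
    "logs/component_id=1/watermark.txt",
]
kept = [k for k in keys if not fnmatch(k, "**.txt")]
print(kept)  # ['logs/component_id=1/ts=1/part-0000.json']
```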
🌐
AWS
docs.aws.amazon.com › aws glue › user guide › aws glue programming guide › programming spark scripts › tutorial: writing an aws glue for spark script
Tutorial: Writing an AWS Glue for Spark script - AWS Glue
You perform this operation by creating a target node in the AWS Glue Studio visual editor. In this step, you provide the write_dynamic_frame.from_options method a connection_type, connection_options, format, and format_options to load data into a target bucket in Amazon S3.
🌐
AWS
docs.aws.amazon.com › aws glue › user guide › aws glue programming guide › programming spark scripts › programming aws glue etl scripts in scala › apis in the aws glue scala library › aws glue scala dynamicframe apis › aws glue scala dynamicframe class
AWS Glue Scala DynamicFrame class - AWS Glue
DynamicFrames provide a range of transformations for data cleaning and ETL. They also support conversion to and from SparkSQL DataFrames to integrate with existing code and the many analytics operations that DataFrames provide. The following parameters are shared across many of the AWS Glue transformations that construct DynamicFrames:
🌐
Reddit
reddit.com › r/aws › gluecontext create_dynamic_frame_from_options exclude one file?
r/aws on Reddit: glueContext create_dynamic_frame_from_options exclude one file?
January 1, 2023 -
    raw_data_input_path = "s3a://{}/logs/application_id={}/component_id={}/".format(s3BucketName, application_id, component_id)

    df = glueContext.create_dynamic_frame_from_options(connection_type="s3",
                                                                connection_options={"paths": [raw_data_input_path],
                                                                                    "recurse": True},
                                                                format="json",
                                                                transformation_ctx=dbInstance)

Structure under component_id: the folders below contain JSONs.

But now I've added a watermark.txt file. How do I exclude this particular file from the paths to recurse on inside component_id?

I can't put this file in any other folder. Is the only way to accomplish this to put all ts=.. folders inside one folder called data?

🌐
AWS
docs.aws.amazon.com › aws glue › user guide › aws glue programming guide › programming spark scripts › program aws glue etl scripts in pyspark › aws glue pyspark extensions reference › dynamicframecollection class
DynamicFrameCollection class - AWS Glue
Uses a passed-in function to create and return a new DynamicFrameCollection based on the DynamicFrames in this collection. callable – A function that takes a DynamicFrame and the specified transformation context as parameters and returns a DynamicFrame. transformation_ctx – A transformation ...
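A conceptual stand-in (plain Python, not the awsglue API) for the behavior described above: a callable is applied to every frame in the collection along with a transformation context, and the results form a new collection. Frames are mocked as lists of dicts.

```python
# Mock of DynamicFrameCollection-style mapping: the callable receives each
# frame plus a per-frame transformation context and returns a new frame;
# the results are gathered into a new collection. NOT the real awsglue API.
class FrameCollection:
    def __init__(self, frames):
        self.frames = frames  # dict: name -> frame

    def map(self, callable_, transformation_ctx=""):
        return FrameCollection({
            name: callable_(frame, f"{transformation_ctx}:{name}")
            for name, frame in self.frames.items()
        })

def drop_empty(frame, ctx):
    # Example callable: remove empty records from a frame.
    return [rec for rec in frame if rec]

coll = FrameCollection({"a": [{"x": 1}, {}], "b": [{}]})
out = coll.map(drop_empty, "clean")
print(out.frames)  # {'a': [{'x': 1}], 'b': []}
```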