Apologies, I would have made a comment but I do not have sufficient reputation. I was able to make the solution that Guillermo AMS provided work within AWS Glue, but it did require two changes:

  • The "jdbc" format was unrecognized (the provided error was: "py4j.protocol.Py4JJavaError: An error occurred while calling o79.load. : java.lang.ClassNotFoundException: Failed to find data source: jbdc. Please find packages at http://spark.apache.org/third-party-projects.html") -- I had to use the full name: "org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider"
  • The "query" option was not working for me (the provided error was: "py4j.protocol.Py4JJavaError: An error occurred while calling o72.load. : java.sql.SQLSyntaxErrorException: ORA-00911: invalid character"), but fortunately, the "dbtable" option accepts either a table name or a subquery -- that is, a query wrapped in parentheses.

In my solution below I have also added a bit of context around the needed objects and imports.
My solution ended up looking like:

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# jdbc_url, username, and password hold the source database connection details
tmp_data_frame = glue_context.spark_session.read\
  .format("org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider")\
  .option("url", jdbc_url)\
  .option("user", username)\
  .option("password", password)\
  .option("dbtable", "(select * from test where id<100)")\
  .load()
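The table-or-subquery behavior of "dbtable" can be captured in a tiny helper. This is a hypothetical convenience function, not part of the original answer, assuming an Oracle-style source where a bare parenthesized query is accepted as the dbtable value:

```python
def as_dbtable(query_or_table):
    """Return a value usable for the JDBC "dbtable" option: table names
    pass through unchanged, SELECT queries get wrapped in parentheses."""
    text = query_or_table.strip()
    # Anything that starts with SELECT is treated as a subquery.
    if text.lower().startswith("select"):
        return "({})".format(text)
    return text
```

For example, `as_dbtable("select * from test where id<100")` produces the parenthesized form passed to `.option("dbtable", ...)` above, while `as_dbtable("test")` passes the table name through unchanged.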

Answer from Jon Legendre on Stack Overflow
DynamicFrameReader class - AWS Glue
March 12, 2026 - Reads a DynamicFrame from a Resilient Distributed Dataset (RDD). ... Reads a DynamicFrame using the specified connection and format. connection_type – The connection type. Valid values include s3, mysql, postgresql, redshift, sqlserver, oracle, dynamodb, and snowflake. connection_options – Connection options, such as path and database table (optional).
GlueContext class - AWS Glue
Returns a DynamicFrame that is created using a Data Catalog database and table name. When using this method, you provide format_options through table properties on the specified AWS Glue Data Catalog table and other options through the additional_options argument. Database – The database ...
Discussions

json - Create dynamic frame from S3 bucket AWS Glue - Stack Overflow
Question I am trying to create dynamic frame from options where source is S3 and type is JSON. I'm using following code however it is not returning any value. More on stackoverflow.com
dataframe - Create dynamic frame from options (from rds - mysql) providing a custom query with where clause - Stack Overflow
I want to create a DynamicFrame in my Glue job from an Aurora-rds mysql table. Can I create DynamicFrame from my rds table using a custom query - having a where clause? I dont want to read the entire More on stackoverflow.com
amazon web services - glueContext create_dynamic_frame_from_options exclude one file type from loading? - Stack Overflow
my Bucket key contains 10 json files 1 txt file, i want to only include json files in the dynamic frame. Is that what the 'format' param is for in create_dynamic_frame_from_options More on stackoverflow.com
aws glue create_dynamic_frame from data in PostgreSQL with custom bookmark key
Move the arguments to 'connection_options' works ... if I use crawler to crawl rds database, how do I do the incremental load? in Crawler definition, I can only CRAWL NEW FOLDER ONLY and it is not for RDS databases, correct? ... AWS glue creates full data from oracle source to s3 target every time even when there is a job bookmark ... How to replicate Glue bookmark in custom spark aws glue script, where i am not able to use dynamic frame ... More on repost.aws
AWS Glue create dynamic frame – SQL & Hadoop
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

glueContext = GlueContext(SparkContext.getOrCreate())

# creating dynamic frame from S3 data
dyn_frame_s3 = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://<bucket name>/data/sales/"], "inferSchema": "true"},
    format="csv",
    format_options={"separator": "\t"},
    transformation_ctx="")
print(dyn_frame_s3.count())

# creating dynamic frame from Glue catalog table
dyn_frame_catalog = glueContext.create_dynamic_frame_from_catalog(
    database="db_readfile",
    table_name="sales",
    transformation_ctx="")
print(dyn_frame_catalog.count())
json - Create dynamic frame from S3 bucket AWS Glue - Stack Overflow
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from functools import reduce
from awsglue.dynamicframe import DynamicFrame

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

df = glueContext.create_dynamic_frame.from_options(
    connection_type='s3',
    connection_options={'paths': ['Location for S3 folder']},
    format='json',
    # formatOptions=$..*
)
print('Total Count:')
df.count()

The way I was able to provide a custom query was by creating a Spark DataFrame, manually specifying the data source and its options: https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#manually-specifying-options

Then I transformed that DataFrame into a DynamicFrame using the DynamicFrame class: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html

from awsglue.dynamicframe import DynamicFrame

# spark and glueContext are the SparkSession and GlueContext set up earlier
tmp_data_frame = spark.read.format("jdbc")\
  .option("url", jdbc_url)\
  .option("user", username)\
  .option("password", password)\
  .option("query", "select * from test where id<100")\
  .load()

dynamic_frame = DynamicFrame.fromDF(tmp_data_frame, glueContext, "dynamic_frame")
AWS_Glue_3: Glue(DynamicFrame). GlueContext is the entry point for… | by Kundan Singh | Medium
February 12, 2025 -
# create DynamicFrame from S3 parquet files
datasource0 = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={"paths": [S3_location]},
    format="parquet",
    transformation_ctx="datasource0")

# create DynamicFrame from Glue catalog
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database="demo",
    table_name="testtable",
    transformation_ctx="datasource0")

# convert to Spark DataFrame, then back to a Glue DynamicFrame
df1 = datasource0.toDF()
df2 = DynamicFrame.fromDF(df1, glueContext, "df2")
df = dynamic_frame.toDF()
df.show()
print("Dataframe converted")
DynamicFrame class - AWS Glue
Performs an equality join with another DynamicFrame and returns the resulting DynamicFrame. paths1 – A list of the keys in this frame to join. paths2 – A list of the keys in the other frame to join. ... stageThreshold – The number of errors encountered during this transformation at which the process should error out (optional).
TIL: AWS Glue Dynamic Dataframe Tips toDf() — Use ResolveChoice for Mixed Data types in a column | by Satyaprakash Bommaraju | Today I Learnt | Medium
February 18, 2023 - raw_data_dydf = glueContext.create_dynamic_frame.from_options( format_options={"multiline": False}, connection_type="s3", format="json", connection_options={ "paths": [input_path], "recurse": False, }, transformation_ctx="raw_data", )
aws-glue-samples/examples/resolve_choice.py at master · aws-samples/aws-glue-samples
glueContext.write_dynamic_frame.from_options(frame = medicare_res_make_struct, connection_type = "s3", connection_options = {"path": medicare_struct}, format = "json")
aws-glue-developer-guide/doc_source/monitor-debug-multiple.md at master · awsdocs/aws-glue-developer-guide
datasource0 = glueContext.create_dynamic_frame.from_options(connection_type="s3", connection_options = {"paths": [staging_path], "useS3ListImplementation":True,"recurse":True}, format="json", transformation_ctx = "bookmark_ctx")
AWS Glue Dynamic Frame – JDBC Performance Tuning Configuration | AWS re:Post
June 2, 2023 - Code Snippet: JDBC_DF_PDP = glueContext.create_dynamic_frame.from_catalog( database="dms", table_name="dms_large_dbo_person", transformation_ctx="JDBC_DF_PDP", additional_options = { "hashexpression":"id", "enablePartitioningForSampleQuery":True, "sampleQuery":"select * from person where last_name <> 'rb' and"} )
DynamicFrameWriter class - AWS Glue
This example writes the output locally using a connection_type of S3 with a POSIX path argument in connection_options, which allows writing to local storage. glueContext.write_dynamic_frame.from_options(\ frame = dyf_splitFields,\ connection_options = {'path': '/home/glue/GlueLocalOutput/'},\ connection_type = 's3',\ format = 'json') Document Conventions ·
AWS Glue DynamicFrame transformations with example code and output | by Swapnil Bhoite | Medium
April 28, 2022 - The SelectFields operation takes fields from a DynamicFrame to keep, just like a ‘SELECT’ query on SQL would do. The output is a DynamicFrame with only the selected fields. You provide the paths in the schema to the fields to keep. Let’s start by checking one of the tables’ schema: persons = glueContext.create_dynamic_frame.from_catalog(database="legislators", table_name="persons_json") print("Count: ", persons.count()) persons.printSchema()Count: 1961 root |-- family_name: string |-- name: string |-- links: array | |-- element: struct | | |-- note: string | | |-- url: string |-- gende
amazon web services - glueContext create_dynamic_frame_from_options exclude one file type from loading? - Stack Overflow
raw_data_input_path = "s3a://{}/logs/application_id={}/component_id={}/".format(s3BucketName, application_id, component_id) df = glueContext.create_dynamic_frame_from_options(connection_type="s3", connection_options={"paths": [raw_data_input_path], "recurse": True}, format="json", transformation_ctx=dbInstance)
aws glue create_dynamic_frame from data in PostgreSQL with custom bookmark key | AWS re:Post
March 18, 2023 - If you look into the documentation link, you would notice this - You can specify jobBookmarkKeys and jobBookmarkKeysSortOrder in the following ways: create_dynamic_frame.from_catalog — Use additional_options. create_dynamic_frame.from_options — Use connection_options.
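A minimal sketch of what the snippet above describes, showing only the option dictionaries. The column name "updated_at" and the connection details are hypothetical; the keys jobBookmarkKeys and jobBookmarkKeysSortOrder are the ones the documentation names:

```python
# For create_dynamic_frame.from_options: bookmark keys go in connection_options.
connection_options = {
    "url": "jdbc:postgresql://host:5432/mydb",  # placeholder connection details
    "dbtable": "public.events",                 # placeholder table
    "jobBookmarkKeys": ["updated_at"],          # hypothetical bookmark column
    "jobBookmarkKeysSortOrder": "asc",
}

# For create_dynamic_frame.from_catalog: the same keys go in additional_options.
additional_options = {
    "jobBookmarkKeys": ["updated_at"],
    "jobBookmarkKeysSortOrder": "asc",
}
```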
r/aws on Reddit: glueContext create_dynamic_frame_from_options exclude one file?
January 1, 2023 -
    raw_data_input_path = "s3a://{}/logs/application_id={}/component_id={}/".format(s3BucketName, application_id, component_id)

    df = glueContext.create_dynamic_frame_from_options(connection_type="s3",
                                                                connection_options={"paths": [raw_data_input_path],
                                                                                    "recurse": True},
                                                                format="json",
                                                                transformation_ctx=dbInstance)

Structure under component_id: The below folders contains jsons.

But now i've added a watermark.txt file, how do i exclude this particular file from the paths to recurse on, inside component_id?

I can't put this file in any other folder, is the only way to accomplish this is to put all ts=.. folders inside one folder called data?

AWS Glue PySpark Extensions Reference - Spark By {Examples}
March 27, 2024 - # Creating a DynamicFrameCollection dynamic_frame_collection = { "frame1": dynamic_frame1, "frame2": dynamic_frame2 } DynamicFrameWriter class allows you to write out DynamicFrame objects to a variety of data sources. # Writing a DynamicFrame to an S3 bucket in CSV format glueContext.write_dynamic_frame.from_options(frame = dynamic_frame, connection_type = "s3", connection_options = {"path": "s3://mybucket/output"}, format = "csv")
How to remove Unnamed column while creating dynamic frame from catalog options | AWS re:Post
April 4, 2023 - To remove the unnamed column while creating a dynamic frame from the catalog options, you can use the ApplyMapping class from the awsglue.transforms module.
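As a rough illustration of the idea in plain Python (not the awsglue API itself): ApplyMapping takes (source, source_type, target, target_type) tuples, so dropping the unnamed column just means leaving it out of the mapping list. The schema and column names below are hypothetical:

```python
# Hypothetical source schema: an index column that arrived unnamed ("col0")
# plus two real columns.
source_columns = [("col0", "long"), ("name", "string"), ("price", "double")]

# Build ApplyMapping-style tuples, skipping the unnamed column.
mappings = [
    (name, dtype, name, dtype)
    for name, dtype in source_columns
    if name != "col0"
]

# The list would then be passed as:
# ApplyMapping.apply(frame=dyf, mappings=mappings)
```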