Apologies, I would have left a comment, but I do not have sufficient reputation. I was able to make the solution that Guillermo AMS provided work within AWS Glue, but it required two changes:

  • The "jdbc" format was unrecognized (the error was: "py4j.protocol.Py4JJavaError: An error occurred while calling o79.load. : java.lang.ClassNotFoundException: Failed to find data source: jbdc. Please find packages at http://spark.apache.org/third-party-projects.html"), so I had to use the fully qualified class name instead: "org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider".
  • The "query" option was not working for me (the error was: "py4j.protocol.Py4JJavaError: An error occurred while calling o72.load. : java.sql.SQLSyntaxErrorException: ORA-00911: invalid character"). Fortunately, the "dbtable" option accepts either a table name or a subquery, that is, a query wrapped in parentheses.

In my solution below I have also added a bit of context around the required objects and imports.
My solution ended up looking like this:

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# jdbc_url, username, and password hold your JDBC connection settings
tmp_data_frame = glue_context.spark_session.read\
  .format("org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider")\
  .option("url", jdbc_url)\
  .option("user", username)\
  .option("password", password)\
  .option("dbtable", "(select * from test where id<100)")\
  .load()
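As a small convenience on top of the workaround above, a helper can wrap an arbitrary query for the "dbtable" option. This is my own sketch, not part of the original answer; the function name and default alias are hypothetical:

```python
def as_dbtable(query, alias="q"):
    """Wrap a SQL query in parentheses so Spark's JDBC reader accepts it
    via the "dbtable" option as a derived table.

    Oracle accepts a bare parenthesized query; most other databases
    (e.g. MySQL, PostgreSQL) require an alias, so one is appended.
    """
    return "({}) {}".format(query.strip().rstrip(";"), alias)

# Usage: .option("dbtable", as_dbtable("select * from test where id<100"))
```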

Answer from Jon Legendre on Stack Overflow
DynamicFrameReader class - AWS Glue
March 12, 2026 - Reads a DynamicFrame from a Resilient Distributed Dataset (RDD). ... Reads a DynamicFrame using the specified connection and format. connection_type – The connection type. Valid values include s3, mysql, postgresql, redshift, sqlserver, oracle, dynamodb, and snowflake. connection_options – Connection options, such as path and database table (optional).
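As a sketch of the reader described above, the options for a JDBC-style read via from_options might be assembled like this. The URL, credentials, and table name are placeholders, not taken from any source here:

```python
# Hypothetical connection settings -- substitute your own endpoint and
# credentials before running this inside a Glue job.
connection_options = {
    "url": "jdbc:oracle:thin://@host:1521/ORCL",
    "user": "username",
    "password": "password",
    "dbtable": "test",
}

# With a GlueContext in scope, the read would then be:
# dyf = glueContext.create_dynamic_frame.from_options(
#     connection_type="oracle",
#     connection_options=connection_options,
# )
```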
GlueContext class - AWS Glue
Returns a DynamicFrame that is created using a Data Catalog database and table name. When using this method, you provide format_options through table properties on the specified AWS Glue Data Catalog table and other options through the additional_options argument. Database – The database ...
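A minimal sketch of such a catalog read, with the extra options passed through additional_options as the excerpt describes. The database and table names are hypothetical:

```python
# Hypothetical Data Catalog coordinates; format_options come from the
# catalog table's properties, everything else goes in additional_options.
catalog_read_args = {
    "database": "db_readfile",
    "table_name": "sales",
    "additional_options": {"useS3ListImplementation": True},
    "transformation_ctx": "sales_read",
}

# Inside a Glue job:
# dyf = glueContext.create_dynamic_frame.from_catalog(**catalog_read_args)
```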
Discussions

json - Create dynamic frame from S3 bucket AWS Glue - Stack Overflow
Question I am trying to create dynamic frame from options where source is S3 and type is JSON. I'm using following code however it is not returning any value.
dataframe - Create dynamic frame from options (from rds - mysql) providing a custom query with where clause - Stack Overflow
I want to create a DynamicFrame in my Glue job from an Aurora-rds mysql table. Can I create DynamicFrame from my rds table using a custom query - having a where clause? I dont want to read the entire
amazon web services - glueContext create_dynamic_frame_from_options exclude one file type from loading? - Stack Overflow
my Bucket key contains 10 json files 1 txt file, i want to only include json files in the dynamic frame. Is that what the 'format' param is for in create_dynamic_frame_from_options
glueContext create_dynamic_frame_from_options exclude one file?
Did you try it out? Did it return an error? I know Glue doesn't allow creation of tables if the source location has data in different formats. Not sure about dynamic frames, though.
r/aws, January 1, 2023
AWS Glue create dynamic frame – SQL & Hadoop
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

glueContext = GlueContext(SparkContext.getOrCreate())

# creating dynamic frame from S3 data
dyn_frame_s3 = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://<bucket name>/data/sales/"], "inferSchema": "true"},
    format="csv",
    format_options={"separator": "\t"},
    transformation_ctx="")
print(dyn_frame_s3.count())

# creating dynamic frame from Glue catalog table
dyn_frame_catalog = glueContext.create_dynamic_frame_from_catalog(
    database="db_readfile",
    table_name="sales",
    transformation_ctx="")
print(dyn_frame_catalog.count())
json - Create dynamic frame from S3 bucket AWS Glue - Stack Overflow
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from functools import reduce
from awsglue.dynamicframe import DynamicFrame

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

df = glueContext.create_dynamic_frame.from_options(
    connection_type='s3',
    connection_options={'paths': ['Location for S3 folder']},
    format='json',
    # formatOptions=$..*
)
print('Total Count:')
df.count()
Top answer, 1 of 2 (score 3): the answer from Jon Legendre quoted in full at the top of this page.

Answer 2 of 2 (score 0):

The way I was able to provide a custom query was by creating a Spark DataFrame and specifying it with options: https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#manually-specifying-options

Then transform that DataFrame into a DynamicFrame using the DynamicFrame class: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html

tmp_data_frame = spark.read.format("jbdc")\
  .option("url", jdbc_url)\
  .option("user", username)\
  .option("password", password)\
  .option("query", "select * from test where id<100")\
  .load()

dynamic_frame = DynamicFrame.fromDF(tmp_data_frame, glueContext)
AWS_Glue_3: Glue(DynamicFrame). GlueContext is the entry point for… | by Kundan Singh | Medium
February 12, 2025 -

# create DynamicFrame from S3 parquet files
datasource0 = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={"paths": [S3_location]},
    format="parquet",
    transformation_ctx="datasource0")

# create DynamicFrame from glue catalog
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database="demo",
    table_name="testtable",
    transformation_ctx="datasource0")

# convert between Spark DataFrame and Glue DynamicFrame
df1 = datasource0.toDF()
df2 = DynamicFrame.fromDF(df1, glueContext, "df2")
df = dynamic_frame.toDF()
df.show()
print("Dataframe converted")
DynamicFrame class - AWS Glue
Performs an equality join with another DynamicFrame and returns the resulting DynamicFrame. paths1 – A list of the keys in this frame to join. paths2 – A list of the keys in the other frame to join. ... stageThreshold – The number of errors encountered during this transformation at which the process should error out (optional).
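To make the join signature above concrete: paths1 and paths2 are paired positionally to form the equality conditions. A sketch under hypothetical frame and key names (the commented call only works inside a Glue job):

```python
paths1 = ["id"]          # keys in this frame
paths2 = ["person_id"]   # keys in the other frame

# Each pair (paths1[i], paths2[i]) becomes one equality condition:
key_pairs = list(zip(paths1, paths2))

# joined = persons_dyf.join(paths1, paths2, orders_dyf, stageThreshold=10)
```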
TIL: AWS Glue Dynamic Dataframe Tips toDf() — Use ResolveChoice for Mixed Data types in a column | by Satyaprakash Bommaraju | Today I Learnt | Medium
February 18, 2023 -

raw_data_dydf = glueContext.create_dynamic_frame.from_options(
    format_options={"multiline": False},
    connection_type="s3",
    format="json",
    connection_options={
        "paths": [input_path],
        "recurse": False,
    },
    transformation_ctx="raw_data",
)
aws-glue-samples/examples/resolve_choice.py at master · aws-samples/aws-glue-samples
medicare_df.createOrReplaceTempView("medicareTable")
medicare_sql_df = spark.sql("SELECT * FROM medicareTable WHERE `total discharges` > 30")
medicare_sql_dyf = DynamicFrame.fromDF(medicare_sql_df, glueContext, "medicare_sql_dyf")

# Write it out in Json
glueContext.write_dynamic_frame.from_options(frame = medicare_res_cast, connection_type = "s3", connection_options = {"path": medicare_cast}, format = "json")
glueContext.write_dynamic_frame.from_options(frame = medicare_res_project, connection_type = "s3", connection_options = {"path": medicare_project}, format = "json")
glueContext.wri
Author   aws-samples
aws-glue-developer-guide/doc_source/monitor-debug-multiple.md at master · awsdocs/aws-glue-developer-guide
datasource0 = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": [staging_path], "useS3ListImplementation": True, "recurse": True},
    format="json")
applymapping1 = ApplyMapping.apply(frame=datasource0, mappings=[map_spec])
datasink2 = glueContext.write_dynamic_frame.from_options(
    frame=applymapping1,
    connection_type="s3",
    connection_options={"path": output_path},
    format="json")
Author   awsdocs
AWS Glue Dynamic Frame – JDBC Performance Tuning Configuration | AWS re:Post
June 2, 2023 - Code Snippet:

JDBC_DF_PDP = glueContext.create_dynamic_frame.from_catalog(
    database="dms",
    table_name="dms_large_dbo_person",
    transformation_ctx="JDBC_DF_PDP",
    additional_options={
        "hashexpression": "id",
        "enablePartitioningForSampleQuery": True,
        "sampleQuery": "select * from person where last_name <> 'rb' and"})
AWS Glue DynamicFrame transformations with example code and output | by Swapnil Bhoite | Medium
April 28, 2022 - The SelectFields operation takes fields from a DynamicFrame to keep, just like a 'SELECT' query in SQL would do. The output is a DynamicFrame with only the selected fields. You provide the paths in the schema to the fields to keep. Let's start by checking one of the tables' schemas:

persons = glueContext.create_dynamic_frame.from_catalog(database="legislators", table_name="persons_json")
print("Count: ", persons.count())
persons.printSchema()

Count: 1961
root
|-- family_name: string
|-- name: string
|-- links: array
|    |-- element: struct
|    |    |-- note: string
|    |    |-- url: string
|-- gende
DynamicFrameWriter class - AWS Glue
This example writes the output locally using a connection_type of s3 with a POSIX path argument in connection_options, which allows writing to local storage.

glueContext.write_dynamic_frame.from_options(
    frame=dyf_splitFields,
    connection_options={'path': '/home/glue/GlueLocalOutput/'},
    connection_type='s3',
    format='json')
amazon web services - glueContext create_dynamic_frame_from_options exclude one file type from loading? - Stack Overflow
raw_data_input_path = "s3a://{}/logs/application_id={}/component_id={}/".format(s3BucketName, application_id, component_id)

df = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={"paths": [raw_data_input_path], "recurse": True},
    format="json",
    transformation_ctx=dbInstance)
r/aws on Reddit: glueContext create_dynamic_frame_from_options exclude one file?
January 1, 2023 -
    raw_data_input_path = "s3a://{}/logs/application_id={}/component_id={}/".format(s3BucketName, application_id, component_id)

    df = glueContext.create_dynamic_frame_from_options(connection_type="s3",
                                                                connection_options={"paths": [raw_data_input_path],
                                                                                    "recurse": True},
                                                                format="json",
                                                                transformation_ctx=dbInstance)

Structure under component_id: the folders below contain JSON files.

But now I've added a watermark.txt file. How do I exclude this particular file from the paths to recurse on, inside component_id?

I can't put this file in any other folder. Is the only way to accomplish this to put all the ts=.. folders inside one folder called data?
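One option not mentioned in the thread: Glue's S3 connection options support an "exclusions" key, a JSON-encoded list of glob patterns to skip while recursing. A sketch assuming a placeholder path in the same layout:

```python
import json

# Placeholder path standing in for the bucket/prefix in the question.
raw_data_input_path = "s3://my-bucket/logs/application_id=app1/component_id=c1/"

connection_options = {
    "paths": [raw_data_input_path],
    "recurse": True,
    # Skip the watermark file anywhere under the prefix.
    "exclusions": json.dumps(["**/watermark.txt"]),
}

# df = glueContext.create_dynamic_frame_from_options(
#     connection_type="s3", connection_options=connection_options, format="json")
```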

aws glue create_dynamic_frame from data in PostgreSQL with custom bookmark key | AWS re:Post
March 18, 2023 - If you look into the documentation link, you would notice this - You can specify jobBookmarkKeys and jobBookmarkKeysSortOrder in the following ways: create_dynamic_frame.from_catalog — Use additional_options. create_dynamic_frame.from_options — Use connection_options.
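A sketch of the two placements described above (the table name and bookmark key are hypothetical):

```python
# Bookmark keys go in connection_options for from_options,
# and in additional_options for from_catalog.
bookmark_opts = {
    "jobBookmarkKeys": ["updated_at"],
    "jobBookmarkKeysSortOrder": "asc",
}

connection_options = {"dbtable": "events", **bookmark_opts}   # from_options
additional_options = dict(bookmark_opts)                      # from_catalog

# dyf = glueContext.create_dynamic_frame.from_options(
#     connection_type="postgresql", connection_options=connection_options)
```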
AWS Glue PySpark Extensions Reference - Spark By {Examples}
March 27, 2024 -

# Creating a DynamicFrameCollection
dynamic_frame_collection = {
    "frame1": dynamic_frame1,
    "frame2": dynamic_frame2
}

The DynamicFrameWriter class allows you to write out DynamicFrame objects to a variety of data sources.

# Writing a DynamicFrame to an S3 bucket in CSV format
glueContext.write_dynamic_frame.from_options(
    frame=dynamic_frame,
    connection_type="s3",
    connection_options={"path": "s3://mybucket/output"},
    format="csv")
How to remove Unnamed column while creating dynamic frame from catalog options | AWS re:Post
April 4, 2023 - To remove the unnamed column while creating a dynamic frame from the catalog options, you can use the ApplyMapping class from the awsglue.transforms module.
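A sketch of that approach with hypothetical column names: any source column not listed in the mappings is dropped from the output, so simply omit the unnamed column:

```python
# Hypothetical schema: the source has an unnamed index column plus
# "id" and "name". Listing only the columns to keep drops the rest.
mappings = [
    ("id", "long", "id", "long"),
    ("name", "string", "name", "string"),
]

# from awsglue.transforms import ApplyMapping
# cleaned = ApplyMapping.apply(frame=dyf, mappings=mappings)

kept_columns = [target for (_, _, target, _) in mappings]
```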