Apologies, I would have made a comment but I do not have sufficient reputation. I was able to make the solution that Guillermo AMS provided work within AWS Glue, but it did require two changes:
- The "jdbc" format was unrecognized (the provided error was: "py4j.protocol.Py4JJavaError: An error occurred while calling o79.load. : java.lang.ClassNotFoundException: Failed to find data source: jbdc. Please find packages at http://spark.apache.org/third-party-projects.html") -- I had to use the full name: "org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider"
- The "query" option was not working for me (the provided error was: "py4j.protocol.Py4JJavaError: An error occurred while calling o72.load. : java.sql.SQLSyntaxErrorException: ORA-00911: invalid character"), but fortunately, the "dbtable" option supports passing in either a table name or a subquery -- that is, a query wrapped in parentheses.
In my solution below I have also added a bit of context around the needed objects and imports.
My solution ended up looking like:
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())
tmp_data_frame = glue_context.spark_session.read \
    .format("org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider") \
    .option("url", jdbc_url) \
    .option("user", username) \
    .option("password", password) \
    .option("dbtable", "(select * from test where id<100)") \
    .load()
Answer from Jon Legendre on Stack Overflow
The way I was able to provide a custom query was by creating a Spark DataFrame and specifying it with options: https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#manually-specifying-options
Then transform that DataFrame into a DynamicFrame using the DynamicFrame class: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html
# Assumes an existing SparkSession `spark` and GlueContext `glueContext`.
from awsglue.dynamicframe import DynamicFrame

tmp_data_frame = spark.read.format("jdbc") \
    .option("url", jdbc_url) \
    .option("user", username) \
    .option("password", password) \
    .option("query", "select * from test where id<100") \
    .load()
dynamic_frame = DynamicFrame.fromDF(tmp_data_frame, glueContext, "dynamic_frame")
raw_data_input_path = "s3a://{}/logs/application_id={}/component_id={}/".format(s3BucketName, application_id, component_id)
df = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={"paths": [raw_data_input_path],
                        "recurse": True},
    format="json",
    transformation_ctx=dbInstance)

Structure under component_id: the folders below contain JSON files.
But now I've added a watermark.txt file. How do I exclude this particular file from the paths to recurse on inside component_id?
I can't put this file in any other folder; is the only way to accomplish this to put all the ts=.. folders inside one folder called data?
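One possible direction, sketched here as an assumption rather than a tested answer: AWS Glue's S3 connection options document an "exclusions" key that takes a JSON list of Unix-style glob patterns, which should let the file be skipped without moving it. The Glue call is shown commented out (it needs a live Glue environment); the glob pattern itself can be sanity-checked locally with Python's fnmatch, which uses similar (though not identical) matching rules:

```python
# Hypothetical sketch -- assumes Glue's "exclusions" S3 connection option:
# df = glueContext.create_dynamic_frame_from_options(
#     connection_type="s3",
#     connection_options={"paths": [raw_data_input_path],
#                         "recurse": True,
#                         "exclusions": "[\"**/watermark.txt\"]"},
#     format="json",
#     transformation_ctx=dbInstance)

# Local sanity check of the pattern against paths shaped like the
# question's layout (ts=.. folders plus the stray watermark.txt):
from fnmatch import fnmatch

paths = [
    "logs/application_id=1/component_id=2/ts=2020/part-0000.json",
    "logs/application_id=1/component_id=2/watermark.txt",
]
kept = [p for p in paths if not fnmatch(p, "**/watermark.txt")]
print(kept)  # only the JSON file survives
```

Note that fnmatch treats "**" as two ordinary wildcards rather than Glue's cross-directory glob, but the exclusion behaviour for this pattern is the same: only paths ending in /watermark.txt are dropped.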