gluecontext.create_dynamic_frame.from_catalog additional options

glueContext.create_dynamic_frame.from_catalog(...) not using supplied JDBC connection

repost.aws › questions › QUZPA6LwI5Ty6TQGPu6JXSjQ › gluecontext-create-dynamic-frame-from-catalog-not-using-supplied-jdbc-connection

I don't know if you found the answer, but I was having the same error yesterday. I downgraded my Glue version from 5.0 to 4.0 in Job Details, and it worked as expected. AWS needs to fix this bug; it has been open for at least 8 months. It's impossible to make the PostgreSQL connection work in 5.0. Answer from Gabriel Zarpelon Oldakoski on repost.aws

AWS

docs.aws.amazon.com › aws glue › user guide › aws glue programming guide › programming spark scripts › program aws glue etl scripts in pyspark › aws glue pyspark extensions reference › gluecontext class

GlueContext class - AWS Glue

... Database – The Data Catalog ... transformation_ctx – The transformation context to use (optional). additional_options – A collection of optional name-value pairs....

AWS

docs.aws.amazon.com › aws glue › user guide › aws glue programming guide › programming spark scripts › program aws glue etl scripts in pyspark › aws glue pyspark extensions reference › dynamicframe class

DynamicFrame class - AWS Glue

For CSV parsing and other format options, specify these in the from_options method when creating the DynamicFrame, not in the toDF method. Here's an example of the correct way to handle CSV format options: from awsglue.context import GlueContext from awsglue.dynamicframe import DynamicFrame ...

Discussions

aws glue create_dynamic_frame from data in PostgreSQL with custom bookmark key

Hi AWS expert, I have code that reads data from AWS Aurora PostgreSQL, and I want to bookmark the table with a custom column named 'ceres_mono_index'. But it seems like the bookmark still uses the primar... More on repost.aws

repost.aws


March 18, 2023

How to create dynamic dataframe from AWS Glue catalog in local environment?

Now, I try to create a dynamic dataframe with the from_catalog method in this way: import sys from pyspark.context import SparkContext from awsglue.context import GlueContext from awsglue.job import Job from awsglue.dynamicframe import DynamicFrame source_activities = glueContext.create_dynamic_frame... More on repost.aws

repost.aws


May 27, 2022

pyspark - create_dynamic_frame_from_catalog returning zero results - Stack Overflow

I'm trying to create a Glue DynamicFrame from an Athena table, but I keep getting an empty data frame. The Athena table is part of my Glue Data Catalog. The create_dynamic_frame_method call does... More on stackoverflow.com

stackoverflow.com

dataframe - Create dynamic frame from options (from rds - mysql) providing a custom query with where clause - Stack Overflow

I want to create a DynamicFrame in my Glue job from an Aurora RDS MySQL table. Can I create a DynamicFrame from my RDS table using a custom query with a WHERE clause? I don't want to read the entire More on stackoverflow.com

stackoverflow.com

AWS re:Post

repost.aws › questions › QUZPA6LwI5Ty6TQGPu6JXSjQ › gluecontext-create-dynamic-frame-from-catalog-not-using-supplied-jdbc-connection

glueContext.create_dynamic_frame.from_catalog(...) not using supplied JDBC connection | AWS re:Post

Top answer

1 of 2

1

I don't know if you found the answer, but I was having the same error yesterday. I downgraded my Glue version from 5.0 to 4.0 in Job Details, and it worked as expected. AWS needs to fix this bug; it has been open for at least 8 months. It's impossible to make the PostgreSQL connection work in 5.0.

2 of 2

-1

Yes, AWS Glue does support reading from Data Catalog tables that are backed by PostgreSQL databases. However, the error you're encountering suggests there might be an issue with how the connection is being utilized or configured. When working with PostgreSQL in AWS Glue, it's important to ensure that the connection is properly set up and that the necessary information is being passed to the Glue job. Here are a few things to check:

1. Verify that the connection type for your PostgreSQL RDS instance is correctly set to "JDBC" in the Glue connection settings.
2. Make sure the JDBC URL, username, and password are correctly configured in the connection properties.
3. Check that the VPC, subnet, and security group settings in the Glue connection match those of your RDS instance.
4. Ensure that the IAM role associated with your Glue job has the necessary permissions to access both the Glue Data Catalog and the RDS instance.
5. When creating your dynamic frame, you may need to explicitly specify the connection name. Try modifying your code to include the connection_type parameter:

```python
node_name = glueContext.create_dynamic_frame.from_catalog(
    database="default",
    table_name="my_table_name",
    transformation_ctx="ctx_name",
    connection_type="postgresql"
)
```

6. If you're using a custom JDBC driver for PostgreSQL, make sure it's properly uploaded and configured in your Glue environment.
7. Check if there are any version compatibility issues between your PostgreSQL database and the JDBC driver being used by Glue.

If you've verified all these points and are still encountering issues, it may be worth testing the connection directly using the AWS Glue connection test feature or the AWSSupport-TroubleshootGlueConnection runbook in AWS Systems Manager. These tools can help identify any network, IAM, or credential issues that might be preventing the connection from working properly.

Remember that AWS Glue Studio creates unified connections for PostgreSQL data sources, which may require additional steps for accessing Secrets Manager and VPC resources. Make sure you've completed any necessary additional configuration for these unified connections.

If the problem persists after checking these items, you may need to contact AWS support for further assistance, as there could be an underlying issue with the Glue service or how it's interacting with your specific PostgreSQL setup.

**Sources**
AWS Glue connection properties - AWS Glue
Troubleshoot AWS Glue connection to a JDBC data source | AWS re:Post
Using custom connectors and connections with AWS Glue Studio - AWS Glue

AWS

docs.aws.amazon.com › aws glue › user guide › aws glue programming guide › programming spark scripts › program aws glue etl scripts in pyspark › aws glue pyspark extensions reference › dynamicframewriter class

DynamicFrameWriter class - AWS Glue

txId = glueContext.start_transaction(read_only=False) glueContext.write_dynamic_frame.from_catalog( frame=dyf, database = db, table_name = tbl, transformation_ctx = "datasource0", additional_options={"transactionId":txId}) ...
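The snippet above starts a transaction and passes its ID through `additional_options`, but it does not show how the transaction ends. A minimal sketch of the full lifecycle follows, assuming `glue_context` is a `GlueContext` and that committing on success and cancelling on failure is the behaviour you want. The function name `write_in_transaction` is hypothetical and not part of the Glue API.

```python
def write_in_transaction(glue_context, frame, database, table_name):
    """Write `frame` to a catalog table inside one transaction.

    Hypothetical helper: commits when the write succeeds and cancels
    the transaction when the write raises.
    """
    tx_id = glue_context.start_transaction(read_only=False)
    try:
        glue_context.write_dynamic_frame.from_catalog(
            frame=frame,
            database=database,
            table_name=table_name,
            transformation_ctx="datasource0",
            additional_options={"transactionId": tx_id},
        )
        glue_context.commit_transaction(tx_id)
    except Exception:
        # Roll back so the table is not left with a dangling transaction.
        glue_context.cancel_transaction(tx_id)
        raise
    return tx_id
```

Taking the context as a parameter keeps the helper easy to exercise with a stand-in object outside a Glue job.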

AWS re:Post

repost.aws › articles › ARQSOCWRuiSI6KdxyvcVBKPw › aws-glue-dynamic-frame-jdbc-performance-tuning-configuration

AWS Glue Dynamic Frame – JDBC Performance Tuning Configuration | AWS re:Post

June 2, 2023 - 'hashexpression' can be used instead of 'hashfield' too.

Code Snippet:

JDBC_DF = glueContext.create_dynamic_frame.from_catalog(
    database="dms",
    table_name="dms_large_dbo_person",
    transformation_ctx="JDBC_DF",
    additional_options = {
        'hashfield': 'last_name',
        'hashpartitions': '10'
    }
)
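Since the article says `hashexpression` can stand in for `hashfield`, one way to keep the two from being mixed up is to build the options dictionary in a small helper. The sketch below is an illustration only: `jdbc_parallel_options` and `read_jdbc_table` are hypothetical names, and insisting on exactly one of the two keys is a design choice made here, not a documented Glue rule.

```python
def jdbc_parallel_options(partitions, hashfield=None, hashexpression=None):
    """Build additional_options for a parallel JDBC read.

    Hypothetical helper: requires exactly one of `hashfield` (a column
    name) or `hashexpression` (a SQL expression) to split the read on.
    """
    if (hashfield is None) == (hashexpression is None):
        raise ValueError("pass exactly one of hashfield or hashexpression")
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    # The article passes the partition count as a string.
    options = {"hashpartitions": str(partitions)}
    if hashfield is not None:
        options["hashfield"] = hashfield
    else:
        options["hashexpression"] = hashexpression
    return options


def read_jdbc_table(glue_context, database, table_name, options):
    """Read a JDBC-backed catalog table using the options built above."""
    return glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
        transformation_ctx=table_name,
        additional_options=options,
    )
```

For the table in the article, `jdbc_parallel_options(10, hashfield="last_name")` yields the same dictionary as the snippet's literal.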

GitHub

github.com › awslabs › aws-glue-libs › blob › master › awsglue › dynamicframe.py

aws-glue-libs/awsglue/dynamicframe.py at master · awslabs/aws-glue-libs

def from_catalog(self, frame, database ... = {}, catalog_id = None, **kwargs): """Creates a DynamicFrame with the specified catalog name space and table name....

Author awslabs

AWS re:Post

repost.aws › questions › QU3rukJUaHRpiMNjydfqLZgw › aws-glue-create-dynamic-frame-from-data-in-postgresql-with-custom-bookmark-key

aws glue create_dynamic_frame from data in PostgreSQL with custom bookmark key | AWS re:Post

March 18, 2023 - Alternatively, you can also try ... using 'glueContext.create_dynamic_frame.from_catalog' function and pass in bookmark keys in 'additional_options' param....
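The suggestion to pass bookmark keys in `additional_options` can be sketched as follows. The option names `jobBookmarkKeys` and `jobBookmarkKeysSortOrder` are the ones the Glue documentation lists for JDBC sources; the helper names and the `read_<table>` context naming are hypothetical. This also assumes job bookmarks are enabled on the job and that the script calls `job.init(...)` and `job.commit()`, since bookmarks are keyed on `transformation_ctx`.

```python
def bookmark_options(keys, sort_order="asc"):
    """Build additional_options that override the default bookmark key.

    Hypothetical helper: `keys` is a list of column names, for example
    the 'ceres_mono_index' column from the question.
    """
    if not keys:
        raise ValueError("at least one bookmark key is required")
    if sort_order not in ("asc", "desc"):
        raise ValueError("sort_order must be 'asc' or 'desc'")
    return {
        "jobBookmarkKeys": list(keys),
        "jobBookmarkKeysSortOrder": sort_order,
    }


def read_with_bookmark(glue_context, database, table_name, keys):
    """Read a catalog table, tracking progress on the given columns."""
    return glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
        # Bookmark state is stored per transformation_ctx, so keep it stable.
        transformation_ctx=f"read_{table_name}",
        additional_options=bookmark_options(keys),
    )
```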

Sqlandhadoop

sqlandhadoop.com › aws-glue-create-dynamic-frame

AWS Glue create dynamic frame – SQL & Hadoop

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

glueContext = GlueContext(SparkContext.getOrCreate())

# creating dynamic frame from S3 data
dyn_frame_s3 = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options = {
        "paths": ["s3://<bucket name>/data/sales/"],
        "inferSchema": "true"
    },
    format = "csv",
    format_options={
        "separator": "\t"
    },
    transformation_ctx="")
print (dyn_frame_s3.count())

# creating dynamic frame from Glue catalog table
dyn_frame_catalog = glueContext.create_dynamic_frame_from_catalog(
    database = "db_readfile",
    table_name = "sales",
    transformation_ctx = "")
print (dyn_frame_catalog.count())


Medium

swapnil-bhoite.medium.com › aws-glue-dynamicframe-transformations-with-example-code-and-output-26e14d13145f

AWS Glue DynamicFrame transformations with example code and output | by Swapnil Bhoite | Medium

April 28, 2022 - Some transforms have collection-specific versions that allow them to be applied to all DynamicFrames within the collection simultaneously (MapToCollection, FlatMap), and the SelectFromCollection operation lets users pick an individual item from the collection:

frame_collection.select('low').toDF().show()
frame_collection.select('high').toDF().show()

+---+-----+------------------------+-------------------------+
| id|index|contact_details.val.type|contact_details.val.value|
+---+-----+------------------------+-------------------------+
| 11|    0|                   phone|             202-224-3542|
| 11|    1|                 twitter| sencortezmas

AWS re:Post

repost.aws › questions › QU9wdyXEKlTby0QQGotAnhQQ › how-to-create-dynamic-dataframe-from-aws-glue-catalog-in-local-environment

How to create dynamic dataframe from AWS Glue catalog in local environment? | AWS re:Post

Top answer

1 of 1

1

It looks like you're missing the MySQL driver. You can provide your own JAR files via the "Dependent jars path" parameter. Your code looks fine, so I'd assume the error is right: missing drivers/libraries.

GitHub

github.com › aws-samples › aws-glue-samples › blob › master › examples › join_and_relationalize.py

aws-glue-samples/examples/join_and_relationalize.py at master · aws-samples/aws-glue-samples

glueContext.write_dynamic_frame.from_options(frame = l_history, connection_type = "s3", connection_options = {"path": output_history_dir}, format = "parquet")

Author aws-samples

GitHub

github.com › awslabs › aws-glue-libs › blob › master › awsglue › context.py

aws-glue-libs/awsglue/context.py at master · awslabs/aws-glue-libs

def create_sample_dynamic_frame_from_catalog(self, database = None, table_name = None, num = None, sample_options = {}, redshift_tmp_dir = "",

Author awslabs

AWS

docs.aws.amazon.com › aws glue › user guide › aws glue programming guide › programming spark scripts › program aws glue etl scripts in pyspark › aws glue pyspark extensions reference › dynamicframereader class

DynamicFrameReader class - AWS Glue

March 12, 2026 - To pass a catalog expression to filter based on the index columns, you can see the catalogPartitionPredicate option.
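As a rough illustration of the `catalogPartitionPredicate` option mentioned above, the sketch below builds a simple equality predicate and passes it through `additional_options`. The helper names and the `year`/`month` partition columns in the usage note are assumptions for the example; the predicate is evaluated on the catalog side, so the columns must be partition keys of the table.

```python
def partition_predicate(**equals):
    """Build a predicate such as "year='2021' and month='06'".

    Hypothetical helper: supports equality on partition columns only and
    rejects values containing a single quote rather than escaping them.
    """
    clauses = []
    for column, value in equals.items():
        text = str(value)
        if "'" in text:
            raise ValueError(f"unsupported quote in value for {column}")
        clauses.append(f"{column}='{text}'")
    return " and ".join(clauses)


def read_partitions(glue_context, database, table_name, predicate):
    """Read only the partitions that match the catalog-side predicate."""
    return glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
        additional_options={"catalogPartitionPredicate": predicate},
    )
```

For example, `partition_predicate(year="2021", month="06")` produces a string that can be handed straight to `read_partitions`.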

Stack Overflow

stackoverflow.com › questions › 53137425 › create-dynamic-frame-from-catalog-returning-zero-results

pyspark - create_dynamic_frame_from_catalog returning zero results - Stack Overflow

Top answer

1 of 2

4

There are several poorly documented features/gotchas in Glue, which is sometimes frustrating.

I would suggest investigating the following configurations of your Glue job:

Does the S3 bucket name have the aws-glue-* prefix?
Put the files in an S3 folder and make sure the crawler table definition is on the folder rather than the actual file.

I have also written a blog on LinkedIn about other Glue gotchas if that helps.

2 of 2

2

Do you have subfolders under the path where your Athena table points to? glueContext.create_dynamic_frame.from_catalog does not recursively read the data. Either put the data in the root of where the table is pointing to or add additional_options = {"recurse": True} to your from_catalog call.

credit: https://stackoverflow.com/a/56873939/5112418
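The `recurse` fix from the answer above, written out as a complete call. This is a minimal sketch: `read_recursively` is a hypothetical wrapper, and `glue_context` is assumed to be an existing `GlueContext`.

```python
def read_recursively(glue_context, database, table_name):
    """Read a catalog table, including files in subfolders of its location.

    Hypothetical wrapper around from_catalog that adds the `recurse`
    option described in the answer above.
    """
    return glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
        additional_options={"recurse": True},
    )
```

If the frame is still empty with `recurse` set, the other checks in this thread (bucket naming and crawler table location) are the next things to look at.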

GitHub

github.com › aws-samples › aws-glue-samples › blob › master › examples › resolve_choice.py

aws-glue-samples/examples/resolve_choice.py at master · aws-samples/aws-glue-samples

medicare_dyf = glueContext.create_dynamic_frame.from_catalog(database = db_name, table_name = tbl_name)

# The `provider id` field will be choice between long and string

# Cast choices into integers, those values that cannot cast result in null

Author aws-samples

Spark By {Examples}

sparkbyexamples.com › home › amazon aws › aws glue pyspark extensions reference

AWS Glue PySpark Extensions Reference - Spark By {Examples}

March 27, 2024 -

# Create a DynamicFrame from a catalog table
dynamic_frame = glueContext.create_dynamic_frame.from_catalog(database = "mydatabase", table_name = "mytable")

# Convert a DynamicFrame to DataFrame
data_frame = dynamic_frame.toDF()

# Convert a DataFrame to DynamicFrame
dynamic_frame = DynamicFrame.fromDF(data_frame, glueContext, "dynamic_frame")

Medium

medium.com › bazaar-tech › aws-glue-hands-on-520cd8e6b4b0

AWS Glue: Hands-on. This article is in continuation of my… | by Syeda Marium Faheem | Bazaar Engineering | Medium

September 18, 2021 -

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()  # SparkContext() creates the Spark cluster
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "gluedb", table_name = "mytbl", transformation_ctx = "datasource0")

Medium

medium.com › @kundansingh0619 › aws-glue-3-aae089693d5a

AWS_Glue_3: Glue(DynamicFrame). GlueContext is the entry point for… | by Kundan Singh | Medium

February 12, 2025 -

# Import required libraries
from awsglue.context import GlueContext
from pyspark.context import SparkContext

# Create a GlueContext
sc = SparkContext()
glueContext = GlueContext(sc)

# Read data from the data source
dynamic_frame = glueContext.create_dynamic_frame.from_catalog(
    database="my_database",
    table_name="my_table"
)

# Apply data transformations using PySpark
transformed_data = dynamic_frame.apply_mapping([
    ("column_name", "string", "new_column_name", "string"),
    # Add more transformations as needed
])

df = dynamic_frame.toDF()
df.show()
print("Dataframe converted")

# convert column names

Stack Overflow

stackoverflow.com › questions › 60251975 › create-dynamic-frame-from-options-from-rds-mysql-providing-a-custom-query-wi

dataframe - Create dynamic frame from options (from rds - mysql) providing a custom query with where clause - Stack Overflow

Top answer

1 of 2

3

Apologies, I would have made a comment but I do not have sufficient reputation. I was able to make the solution that Guillermo AMS provided work within AWS Glue, but it did require two changes:

The "jdbc" format was unrecognized (the provided error was: "py4j.protocol.Py4JJavaError: An error occurred while calling o79.load. : java.lang.ClassNotFoundException: Failed to find data source: jbdc. Please find packages at http://spark.apache.org/third-party-projects.html") -- I had to use the full name: "org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider"
The query option was not working for me (the provided error was: "py4j.protocol.Py4JJavaError: An error occurred while calling o72.load. : java.sql.SQLSyntaxErrorException: ORA-00911: invalid character"), but fortunately, the "dbtable" option supports passing in either a table or a subquery -- that is using parentheses around a query.

In my solution below I have also added a bit of context around the needed objects and imports.
My solution ended up looking like:

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

tmp_data_frame = glue_context.spark_session.read\
  .format("org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider")\
  .option("url", jdbc_url)\
  .option("user", username)\
  .option("password", password)\
  .option("dbtable", "(select * from test where id<100)")\
  .load()
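The `dbtable` workaround relies on wrapping the query in parentheses so the database treats it as a derived table. A small sketch of that wrapping step follows. `as_dbtable` and the default alias `subq` are hypothetical, and adding an alias without the `AS` keyword is an assumption made so the result is acceptable to MySQL, PostgreSQL, and Oracle alike.

```python
def as_dbtable(query, alias="subq"):
    """Wrap a SELECT statement so it can be passed as the `dbtable` option.

    Hypothetical helper: strips surrounding whitespace and a trailing
    semicolon, then returns "(<query>) <alias>".
    """
    cleaned = query.strip().rstrip(";").strip()
    if not cleaned:
        raise ValueError("query must not be empty")
    return f"({cleaned}) {alias}"
```

The result would be passed as `.option("dbtable", as_dbtable("select * from test where id<100"))` in the reader chain above.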

2 of 2

0

The way I was able to provide a custom query was by creating a Spark DataFrame and specifying it with options: https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#manually-specifying-options

Then transform that DataFrame into a DynamicFrame using said class: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html

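from awsglue.dynamicframe import DynamicFrame

# Parentheses let the chained calls span several lines.
tmp_data_frame = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("user", username)
    .option("password", password)
    .option("query", "select * from test where id<100")
    .load()
)

# fromDF takes a name for the resulting DynamicFrame as its third argument.
dynamic_frame = DynamicFrame.fromDF(tmp_data_frame, glueContext, "dynamic_frame")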
