GlueContext class - AWS Glue (docs.aws.amazon.com)
__init__ — creating: getSource, create_dynamic_frame_from_rdd, create_dynamic_frame_from_catalog, create_dynamic_frame_from_options, create_sample_dynamic_frame_from_catalog, create_sample_dynamic_frame_from_options, add_ingestion_time_columns, create_data_frame_from_catalog, create_data_frame_from_options, forEachBatch — Amazon S3 datasets: purge_table, purge_s3_path, transition_table, transition_s3_path — extracting: extract_jdbc_conf — transactions: start_transaction, commit_transaction, cancel_transaction — writing: getSink, write_dynamic_frame_from_options, write_from_options, write_dynamic_frame_from_catalog, write_data_frame_from_catalog, write_dynamic_frame_from_jdbc_conf, write_from_jdbc_conf
DynamicFrame class - AWS Glue (docs.aws.amazon.com)
For CSV parsing and other format options, specify these in the from_options method when creating the DynamicFrame, not in the toDF method. Here's an example of the correct way to handle CSV format options:

```python
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext

sc = SparkContext()
glueContext = GlueContext(sc)

# Correct: specify format options in from_options
csv_dyf = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-bucket/path/to/csv/"]},
    format="csv",
    format_options={
        "withHeader": True,
        "separator": ",",
        "inferSchema": True
    }
)

# Convert to DataFrame (no format options needed here)
csv_df = csv_dyf.toDF()
```
Discussions

glueContext.create_dynamic_frame.from_catalog(...) not using supplied JDBC connection (repost.aws, March 19, 2025)
I have created a Glue Connection to my PostgreSQL (RDS) instance. I have created and run a Crawler for a table to create a corresponding table in my Glue Data Catalog. I have been unable to read fr...
pyspark - create_dynamic_frame_from_catalog returning zero results - Stack Overflow (stackoverflow.com)
I'm trying to create a dynamic glue dataframe from an athena table but I keep getting an empty data frame. The athena table is part of my glue data catalog. The create_dynamic_frame_from_catalog method call does...
How to create dynamic dataframe from AWS Glue catalog in local environment? (repost.aws, May 27, 2022)
Now, I try to create a dynamic dataframe with the from_catalog method in this way: import sys from pyspark.context import SparkContext from awsglue.context import GlueContext from awsglue.job import Job from awsglue.dynamicframe import DynamicFrame source_activities = glueContext.create_dynamic_frame...
aws glue create_dynamic_frame from data in PostgreSQL with custom bookmark key (repost.aws, March 18, 2023)
Hi AWS expert, I have code that reads data from AWS Aurora PostgreSQL, and I want to bookmark the table with a custom column named 'ceres_mono_index'. But it seems like the bookmark still uses the primar...
AWS_Glue_3: Glue(DynamicFrame). GlueContext is the entry point for… | by Kundan Singh (medium.com)
February 12, 2025 -

```python
# create DynamicFrame from S3 parquet files
datasource0 = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={"paths": [S3_location]},
    format="parquet",
    transformation_ctx="datasource0")

# create DynamicFrame from glue catalog
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database="demo",
    table_name="testtable",
    transformation_ctx="datasource0")

# convert to spark DataFrame
df1 = datasource0.toDF()
# convert back to Glue DynamicFrame
df2 = DynamicFrame.fromDF(df1, glueContext, "df2")

df = datasource0.toDF()
df.show()
print("Dataframe converted")
```
Top answer — 1 of 2 (score 1)
Don't know if you found the answer, but I was having the same error yesterday. I downgraded my Glue version from 5.0 to 4.0 in Job Details, and it worked as expected. AWS needs to fix this bug; it has been open for at least 8 months. It's impossible to make the PostgreSQL connection work in 5.0. — Gabriel Zarpelon Oldakoski, repost.aws
2 of 2 (score -1)
Yes, AWS Glue does support reading from Data Catalog tables that are backed by PostgreSQL databases. However, the error you're encountering suggests there might be an issue with how the connection is being utilized or configured. When working with PostgreSQL in AWS Glue, it's important to ensure that the connection is properly set up and that the necessary information is passed to the Glue job. Here are a few things to check:

1. Verify that the connection type for your PostgreSQL RDS instance is correctly set to "JDBC" in the Glue connection settings.
2. Make sure the JDBC URL, username, and password are correctly configured in the connection properties.
3. Check that the VPC, subnet, and security group settings in the Glue connection match those of your RDS instance.
4. Ensure that the IAM role associated with your Glue job has the necessary permissions to access both the Glue Data Catalog and the RDS instance.
5. When creating your dynamic frame, you may need to explicitly specify the connection. Try modifying your code to include the connection_type parameter:

```python
node_name = glueContext.create_dynamic_frame.from_catalog(
    database="default",
    table_name="my_table_name",
    transformation_ctx="ctx_name",
    connection_type="postgresql"
)
```

6. If you're using a custom JDBC driver for PostgreSQL, make sure it's properly uploaded and configured in your Glue environment.
7. Check if there are any version compatibility issues between your PostgreSQL database and the JDBC driver being used by Glue.

If you've verified all these points and are still encountering issues, it may be worth testing the connection directly using the AWS Glue connection test feature or the AWSSupport-TroubleshootGlueConnection runbook in AWS Systems Manager. These tools can help identify any network, IAM, or credential issues that might be preventing the connection from working properly.
Remember that AWS Glue Studio creates unified connections for PostgreSQL data sources, which may require additional steps for accessing Secrets Manager and VPC resources. Make sure you've completed any necessary additional configuration for these unified connections. If the problem persists after checking these items, you may need to contact AWS support for further assistance, as there could be an underlying issue with the Glue service or how it's interacting with your specific PostgreSQL setup.

**Sources**
- AWS Glue connection properties - AWS Glue
- Troubleshoot AWS Glue connection to a JDBC data source | AWS re:Post
- Using custom connectors and connections with AWS Glue Studio - AWS Glue
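If the catalog route keeps failing, another common workaround is to bypass the catalog and read the table directly over JDBC with `create_dynamic_frame.from_options(connection_type="postgresql", ...)`. A minimal sketch of the connection options that call expects; the host, database, table, and credentials below are illustrative placeholders, not values from the thread:

```python
# Hypothetical connection settings for a direct JDBC read.
# All concrete values here are placeholders for illustration only.
def build_postgres_options(host, port, database, table, user, password):
    """Assemble the connection_options dict Glue expects for a JDBC source."""
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
    }

options = build_postgres_options(
    "mydb.abc123.us-east-1.rds.amazonaws.com", 5432,
    "sales", "public.orders", "glue_user", "example-password",
)

# Inside a Glue job this dict would then be passed as:
# dyf = glueContext.create_dynamic_frame.from_options(
#     connection_type="postgresql", connection_options=options)
```

In production the credentials would come from Secrets Manager rather than being inlined; the dict shape is the point here.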
AWS Glue create dynamic frame – SQL & Hadoop (sqlandhadoop.com)
```python
# creating dynamic frame from S3 data
dyn_frame_s3 = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={
        "paths": ["s3://<bucket name>/data/sales/"],
        "inferSchema": "true"
    },
    format="csv",
    format_options={"separator": "\t"},
    transformation_ctx="")
print(dyn_frame_s3.count())
```

We will use one of the tables we created in the previous tutorial.

```python
# creating dynamic frame from Glue catalog table
dyn_frame_catalog = glueContext.create_dynamic_frame_from_catalog(
    database="db_readfile",
    table_name="sales",
    transformation_ctx="")
print(dyn_frame_catalog.count())
```
AWS Glue DynamicFrame transformations with example code and output | by Swapnil Bhoite (swapnil-bhoite.medium.com)
April 28, 2022 - In order to test this transform we are going to load a table in the payments database in the catalog with a type discrepancy in one of its columns:

```python
medicare_dynamicframe = glueContext.create_dynamic_frame.from_catalog(
    database="payment",
    table_name="medicare")
medicare_dynamicframe.printSchema()
medicare_res.toDF().select("provider id").show()
```

```
root
 |-- drg definition: string
 |-- provider id: choice
 |    |-- long
 |    |-- string
 |-- provider name: string
 |-- provider street address: string
 |-- provider city: string
 |-- provider state: string
 |-- provider zip code: long
 |-- hospital referral region
```
DynamicFrameWriter class - AWS Glue (docs.aws.amazon.com)
```python
txId = glueContext.start_transaction(read_only=False)
glueContext.write_dynamic_frame.from_catalog(
    frame=dyf,
    database=db,
    table_name=tbl,
    transformation_ctx="datasource0",
    additional_options={"transactionId": txId})
...
```
aws-glue-samples/examples/join_and_relationalize.py at master · aws-samples/aws-glue-samples (github.com)
```python
persons = glueContext.create_dynamic_frame.from_catalog(database=db_name, table_name=tbl_persons)
memberships = glueContext.create_dynamic_frame.from_catalog(database=db_name, table_name=tbl_membership)
orgs = glueContext.create_dynamic_frame.from_catalog(database=db_name, table_name=tbl_organization)
```
aws-glue-libs/awsglue/dynamicframe.py at master · awslabs/aws-glue-libs (github.com)
```python
def from_catalog(self, frame, database ... = {}, catalog_id = None, **kwargs):
    """Creates a DynamicFrame with the specified catalog name space and table name...."""
```
AWS Glue PySpark Extensions Reference - Spark By {Examples} (sparkbyexamples.com)
March 27, 2024 -

```python
# Create a DynamicFrame from a catalog table
dynamic_frame = glueContext.create_dynamic_frame.from_catalog(database="mydatabase", table_name="mytable")

# Convert a DynamicFrame to DataFrame
data_frame = dynamic_frame.toDF()

# Convert a DataFrame to DynamicFrame
dynamic_frame = DynamicFrame.fromDF(data_frame, glueContext, "dynamic_frame")
```
aws glue create_dynamic_frame from data in PostgreSQL with custom bookmark key | AWS re:Post (repost.aws)
March 18, 2023 - Alternatively, you can also try another way: create a Glue crawler to crawl your PostgreSQL datastore and run it to create a Glue catalog metadata table. You can pass in classifiers in the crawler config, or even map columns to a different name/type using the applyMapping feature. Then create the dynamic frame using the 'glueContext.create_dynamic_frame.from_catalog' function and pass in bookmark keys in the 'additional_options' param.
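The custom-bookmark approach described above comes down to passing the documented `jobBookmarkKeys` and `jobBookmarkKeysSortOrder` options alongside the catalog read. A minimal sketch of that options dict, using the `ceres_mono_index` column from the question; the database and table names in the comment are placeholders:

```python
# Bookmark options for glueContext.create_dynamic_frame.from_catalog(...).
# 'ceres_mono_index' is the custom bookmark column named in the question.
bookmark_options = {
    "jobBookmarkKeys": ["ceres_mono_index"],
    "jobBookmarkKeysSortOrder": "asc",  # process rows in ascending key order
}

# Usage inside a Glue job (requires a transformation_ctx and job
# bookmarks enabled on the job; "mydb"/"mytable" are placeholders):
# dyf = glueContext.create_dynamic_frame.from_catalog(
#     database="mydb", table_name="mytable",
#     transformation_ctx="src", additional_options=bookmark_options)
```

Without these keys, Glue falls back to the table's primary key for bookmarking, which matches the behaviour reported in the question.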
What I wish somebody had explained to me before I started to use AWS Glue | tecRacer Amazon AWS Blog (tecracer.com)
June 22, 2021 - We can then use this GlueContext to read data from our data stores. The create_dynamic_frame.from_catalog method uses the Glue data catalog to figure out where the actual data is stored and reads it from there.
python - creating dynamic frame using glue context catalog from Athena view - Stack Overflow (stackoverflow.com)
```python
from awsglue.context import GlueContext

dataframe = glueContext.create_dynamic_frame.from_catalog(
    database=db_name,
    table_name=view_name,
    push_down_predicate=f"year='2023' and month='1' and date='12'",
)
```

However I get the following error:

```
An error occurred while calling o117.getDynamicFrame.
```
aws-glue-samples/examples/resolve_choice.py at master · aws-samples/aws-glue-samples (github.com)
```python
medicare_dyf = glueContext.create_dynamic_frame.from_catalog(database = db_name, table_name = tbl_name)
# The `provider id` field will be choice between long and string
# Cast choices into integers, those values that cannot cast result in null
```
AWS Glue Dynamic Frame – JDBC Performance Tuning Configuration | AWS re:Post (repost.aws)
June 2, 2023 - Code Snippet:

```python
JDBC_DF_PDP = glueContext.create_dynamic_frame.from_catalog(
    database="dms",
    table_name="dms_large_dbo_person",
    transformation_ctx="JDBC_DF_PDP",
    additional_options={
        "hashexpression": "id",
        "enablePartitioningForSampleQuery": True,
        "sampleQuery": "select * from person where last_name <> 'rb' and"})
```
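The trailing `and` in `sampleQuery` is intentional: when `enablePartitioningForSampleQuery` is set, Glue appends its own partitioning predicate (derived from `hashexpression`) after the query, so it must end mid-clause. A small sketch of the mechanics; the modulo-based filter below is a simplification for illustration, not Glue's exact internal SQL:

```python
# Why sampleQuery must end with "and": the JDBC reader appends a
# hash-based partition filter so each executor reads a disjoint slice.
# The modulo predicate is a simplified stand-in for Glue's internals.
def partition_queries(sample_query, hashexpression, num_partitions):
    return [
        f"{sample_query} {hashexpression} % {num_partitions} = {i}"
        for i in range(num_partitions)
    ]

queries = partition_queries(
    "select * from person where last_name <> 'rb' and", "id", 4)
# Each query reads only rows whose id hashes to one partition slot.
```

If the query ended in a complete WHERE clause, the appended predicate would produce invalid SQL, which is why the tuning article leaves the dangling `and`.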
Title: Mastering PySpark in AWS Glue: 5 Best Practices with Examples | by Gokulnath Raghavan (medium.com)
March 15, 2024 - Example:

```python
# Read data with predicate pushdown
dynamic_frame = glueContext.create_dynamic_frame.from_catalog(
    database="my_database",
    table_name="my_table",
    push_down_predicate="year = 2023"
)
```
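Since `push_down_predicate` is just a SQL-like string matched against the table's partition columns, it is often assembled programmatically. A minimal sketch, assuming hypothetical `year`/`month` partition columns:

```python
# Build a push_down_predicate string from partition key/value pairs.
# Only partition columns of the catalog table may appear in the
# predicate; the column names here are illustrative.
def build_predicate(**partitions):
    return " and ".join(f"{k} = '{v}'" for k, v in partitions.items())

predicate = build_predicate(year=2023, month=1)
# dyf = glueContext.create_dynamic_frame.from_catalog(
#     database="my_database", table_name="my_table",
#     push_down_predicate=predicate)
```

Pushing the predicate down this way prunes partitions before any data is read, instead of loading everything and filtering afterwards.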
python - AttributeError: 'GlueContext' object has no attribute 'create_sample_dynamic_frame' - Stack Overflow (stackoverflow.com)
If you carefully look at the 'create_sample_dynamic_frame_from_catalog' definition in the official document, you will see the three mandatory parameters: database, table_name, num. You have missed the 'num' parameter, which is the reason why you are getting the above error message. I was able to create a sample dynamic frame using the below code:

```python
df1 = glueContext.create_sample_dynamic_frame_from_catalog(
    database="default",
    table_name="test",
    num=100,
    transformation_ctx="df1",
)
```