Don't know if you found the answer, but I was having the same error yesterday. I downgraded my Glue version from 5.0 to 4.0 in Job Details, and it worked as expected. AWS needs to fix this bug; it's been open for at least 8 months. It's impossible to make the PostgreSQL connection work in 5.0. Answer from Gabriel Zarpelon Oldakoski on repost.aws
GlueContext class - AWS Glue
... Database – The Data Catalog ... transformation_ctx – The transformation context to use (optional). additional_options – A collection of optional name-value pairs....
DynamicFrame class - AWS Glue
For CSV parsing and other format options, specify these in the from_options method when creating the DynamicFrame, not in the toDF method. Here's an example of the correct way to handle CSV format options: from awsglue.context import GlueContext from awsglue.dynamicframe import DynamicFrame ...
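The truncated snippet above can be fleshed out as a hedged sketch: CSV options belong in `format_options` on the DynamicFrame read, not on `toDF`. The bucket path and option values below are hypothetical placeholders, and the Glue call itself is shown as a comment since it only runs inside a Glue job.

```python
# Sketch: CSV format options go on the DynamicFrame read (from_options),
# not on toDF(). All concrete values here are hypothetical.

csv_format_options = {
    "withHeader": True,   # treat the first row as a header
    "separator": ",",     # field delimiter
}
print(csv_format_options["separator"])

# Inside a Glue job (requires the awsglue libraries):
# dyf = glueContext.create_dynamic_frame.from_options(
#     connection_type="s3",
#     connection_options={"paths": ["s3://my-bucket/data/"]},
#     format="csv",
#     format_options=csv_format_options,
# )
# df = dyf.toDF()  # no format options here
```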
Discussions

aws glue create_dynamic_frame from data in PostgreSQL with custom bookmark key
Hi AWS expert, I have code that reads data from AWS Aurora PostgreSQL, and I want to bookmark the table with a custom column named 'ceres_mono_index'. But it seems like the bookmark still uses the primar... More on repost.aws
March 18, 2023
How to create dynamic dataframe from AWS Glue catalog in local environment?
Now, I try to create a dynamic dataframe with the from_catalog method in this way: import sys from pyspark.context import SparkContext from awsglue.context import GlueContext from awsglue.job import Job from awsglue.dynamicframe import DynamicFrame source_activities = glueContext.create_dynamic_frame... More on repost.aws
May 27, 2022
pyspark - create_dynamic_frame_from_catalog returning zero results - Stack Overflow
I'm trying to create a dynamic glue dataframe from an athena table but I keep getting an empty data frame. The athena table is part of my glue data catalog The create_dynamic_frame_method call does... More on stackoverflow.com
dataframe - Create dynamic frame from options (from rds - mysql) providing a custom query with where clause - Stack Overflow
I want to create a DynamicFrame in my Glue job from an Aurora-rds mysql table. Can I create a DynamicFrame from my rds table using a custom query, having a where clause? I don't want to read the entire... More on stackoverflow.com
Top answer
1 of 2
Don't know if you found the answer, but I was having the same error yesterday. I downgraded my Glue version from 5.0 to 4.0 in Job Details, and it worked as expected. AWS needs to fix this bug; it's been open for at least 8 months. It's impossible to make the PostgreSQL connection work in 5.0.
2 of 2
Yes, AWS Glue does support reading from Data Catalog tables that are backed by PostgreSQL databases. However, the error you're encountering suggests there may be an issue with how the connection is configured or used. When working with PostgreSQL in AWS Glue, make sure the connection is properly set up and that the necessary information is passed to the Glue job. Here are a few things to check:

1. Verify that the connection type for your PostgreSQL RDS instance is set to "JDBC" in the Glue connection settings.
2. Make sure the JDBC URL, username, and password are correctly configured in the connection properties.
3. Check that the VPC, subnet, and security group settings in the Glue connection match those of your RDS instance.
4. Ensure that the IAM role associated with your Glue job has the necessary permissions to access both the Glue Data Catalog and the RDS instance.
5. When creating your dynamic frame, you may need to explicitly specify the connection type. Try modifying your code to include the connection_type parameter:

```python
node_name = glueContext.create_dynamic_frame.from_catalog(
    database="default",
    table_name="my_table_name",
    transformation_ctx="ctx_name",
    connection_type="postgresql"
)
```

6. If you're using a custom JDBC driver for PostgreSQL, make sure it's properly uploaded and configured in your Glue environment.
7. Check for version compatibility issues between your PostgreSQL database and the JDBC driver Glue is using.

If you've verified all these points and are still encountering issues, test the connection directly using the AWS Glue connection test feature or the AWSSupport-TroubleshootGlueConnection runbook in AWS Systems Manager. These tools can help identify network, IAM, or credential issues that might be preventing the connection from working properly.

Remember that AWS Glue Studio creates unified connections for PostgreSQL data sources, which may require additional steps for accessing Secrets Manager and VPC resources. Make sure you've completed any necessary additional configuration for these unified connections. If the problem persists after checking these items, contact AWS Support, as there could be an underlying issue with the Glue service or how it interacts with your specific PostgreSQL setup.

**Sources**
AWS Glue connection properties - AWS Glue
Troubleshoot AWS Glue connection to a JDBC data source | AWS re:Post
Using custom connectors and connections with AWS Glue Studio - AWS Glue
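If the catalog route keeps failing, reading the table directly with from_options is a common workaround. Below is a minimal sketch of the connection options involved; the host, database, table, and credential values are hypothetical placeholders, and the Glue call itself is shown as a comment since it only runs inside a Glue job.

```python
# Sketch: building the connection_options dict that
# glueContext.create_dynamic_frame.from_options expects for a
# PostgreSQL source. All concrete values here are hypothetical.

def postgres_connection_options(host, port, database, table, user, password):
    """Assemble JDBC connection options for a Glue PostgreSQL read."""
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
    }

opts = postgres_connection_options(
    "my-rds.example.com", 5432, "appdb", "public.my_table", "glue_user", "secret"
)
print(opts["url"])  # jdbc:postgresql://my-rds.example.com:5432/appdb

# Inside a Glue job (requires the awsglue libraries and a live connection):
# dyf = glueContext.create_dynamic_frame.from_options(
#     connection_type="postgresql",
#     connection_options=opts,
# )
```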
DynamicFrameWriter class - AWS Glue
txId = glueContext.start_transaction(read_only=False) glueContext.write_dynamic_frame.from_catalog( frame=dyf, database = db, table_name = tbl, transformation_ctx = "datasource0", additional_options={"transactionId":txId}) ...
AWS Glue Dynamic Frame – JDBC Performance Tuning Configuration | AWS re:Post
June 2, 2023 - ‘hashexpression’ can be used instead of the ‘hashfield’ too Code Snippet: JDBC_DF = glueContext.create_dynamic_frame.from_catalog( database="dms", table_name="dms_large_dbo_person", transformation_ctx="JDBC_DF", additional_options = { 'hashfield': 'last_name', 'hashpartitions': '10' } )
aws-glue-libs/awsglue/dynamicframe.py at master · awslabs/aws-glue-libs
def from_catalog(self, frame, database ... = {}, catalog_id = None, **kwargs): """Creates a DynamicFrame with the specified catalog name space and table name....
aws glue create_dynamic_frame from data in PostgreSQL with custom bookmark key | AWS re:Post
March 18, 2023 - Alternatively, you can also try ... using 'glueContext.create_dynamic_frame.from_catalog' function and pass in bookmark keys in 'additional_options' param....
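The suggestion above can be made concrete. This is a hedged sketch of passing a custom bookmark key through additional_options; jobBookmarkKeys and jobBookmarkKeysSortOrder are the documented Glue option names, 'ceres_mono_index' is the custom column from the question, and the database and table names are hypothetical.

```python
# Sketch: custom job-bookmark key passed via additional_options.
# 'ceres_mono_index' comes from the question above; database/table
# names are hypothetical placeholders.

bookmark_options = {
    "jobBookmarkKeys": ["ceres_mono_index"],
    "jobBookmarkKeysSortOrder": "asc",
}
print(bookmark_options["jobBookmarkKeys"])

# Inside a Glue job:
# dyf = glueContext.create_dynamic_frame.from_catalog(
#     database="mydb",
#     table_name="my_postgres_table",
#     transformation_ctx="src",
#     additional_options=bookmark_options,
# )
```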
AWS Glue create dynamic frame – SQL & Hadoop
```python
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

glueContext = GlueContext(SparkContext.getOrCreate())

# creating dynamic frame from S3 data
dyn_frame_s3 = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options={
        "paths": ["s3://<bucket name>/data/sales/"],
        "inferSchema": "true"
    },
    format="csv",
    format_options={"separator": "\t"},
    transformation_ctx="")
print(dyn_frame_s3.count())

# creating dynamic frame from Glue catalog table
dyn_frame_catalog = glueContext.create_dynamic_frame_from_catalog(
    database="db_readfile",
    table_name="sales",
    transformation_ctx="")
print(dyn_frame_catalog.count())
```
AWS Glue DynamicFrame transformations with example code and output | by Swapnil Bhoite | Medium
April 28, 2022 - Some transforms have collection-specific versions that allow them to be applied to all DynamicFrames within the collection simultaneously (MapToCollection, FlatMap), and the SelectFromCollection operation lets users pick an individual item from the collection:

```python
frame_collection.select('low').toDF().show()
frame_collection.select('high').toDF().show()
```

```
+---+-----+------------------------+-------------------------+
| id|index|contact_details.val.type|contact_details.val.value|
+---+-----+------------------------+-------------------------+
| 11| 0| phone| 202-224-3542|
| 11| 1| twitter| sencortezmas
```
aws-glue-samples/examples/join_and_relationalize.py at master · aws-samples/aws-glue-samples
glueContext.write_dynamic_frame.from_options(frame = l_history, connection_type = "s3", connection_options = {"path": output_history_dir}, format = "parquet")
aws-glue-libs/awsglue/context.py at master · awslabs/aws-glue-libs
def create_sample_dynamic_frame_from_catalog(self, database = None, table_name = None, num = None, sample_options = {}, redshift_tmp_dir = "",
aws-glue-samples/examples/resolve_choice.py at master · aws-samples/aws-glue-samples
medicare_dyf = glueContext.create_dynamic_frame.from_catalog(database = db_name, table_name = tbl_name) · # The `provider id` field will be choice between long and string · · # Cast choices into integers, those values that cannot cast result in null ·
AWS Glue PySpark Extensions Reference - Spark By {Examples}
March 27, 2024 -

```python
# Create a DynamicFrame from a catalog table
dynamic_frame = glueContext.create_dynamic_frame.from_catalog(database="mydatabase", table_name="mytable")

# Convert a DynamicFrame to DataFrame
data_frame = dynamic_frame.toDF()

# Convert a DataFrame to DynamicFrame
dynamic_frame = DynamicFrame.fromDF(data_frame, glueContext, "dynamic_frame")
```
AWS Glue: Hands-on. This article is in continuation of my… | by Syeda Marium Faheem | Bazaar Engineering | Medium
September 18, 2021 -

```python
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()  # SparkContext() creates the Spark cluster
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
datasource0 = glueContext.create_dynamic_frame.from_catalog(database="gluedb", table_name="mytbl", transformation_ctx="datasource0")
```
AWS_Glue_3: Glue(DynamicFrame). GlueContext is the entry point for… | by Kundan Singh | Medium
February 12, 2025 -

```python
# Import required libraries
from awsglue.context import GlueContext
from pyspark.context import SparkContext

# Create a GlueContext
sc = SparkContext()
glueContext = GlueContext(sc)

# Read data from the data source
dynamic_frame = glueContext.create_dynamic_frame.from_catalog(
    database="my_database",
    table_name="my_table"
)

# Apply data transformations using PySpark
transformed_data = dynamic_frame.apply_mapping([
    ("column_name", "string", "new_column_name", "string"),
    # Add more transformations as needed
])

df = dynamic_frame.toDF()
df.show()
print("Dataframe converted")
```
Top answer
1 of 2

Apologies, I would have made a comment but I do not have sufficient reputation. I was able to make the solution that Guillermo AMS provided work within AWS Glue, but it did require two changes:

  • The "jdbc" format was unrecognized (the provided error was: "py4j.protocol.Py4JJavaError: An error occurred while calling o79.load. : java.lang.ClassNotFoundException: Failed to find data source: jbdc. Please find packages at http://spark.apache.org/third-party-projects.html") -- I had to use the full name: "org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider"
  • The query option was not working for me (the provided error was: "py4j.protocol.Py4JJavaError: An error occurred while calling o72.load. : java.sql.SQLSyntaxErrorException: ORA-00911: invalid character"), but fortunately, the "dbtable" option supports passing in either a table or a subquery -- that is using parentheses around a query.

In my solution below I have also added a bit of context around the needed objects and imports.
My solution ended up looking like:

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# jdbc_url, username and password are defined elsewhere in the job
tmp_data_frame = glue_context.spark_session.read\
  .format("org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider")\
  .option("url", jdbc_url)\
  .option("user", username)\
  .option("password", password)\
  .option("dbtable", "(select * from test where id<100)")\
  .load()
```
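The "dbtable accepts a subquery" trick above can be factored into a tiny helper. Note this is a sketch: Spark's JDBC reader accepts any parenthesized query as a derived table, but some engines (MySQL, PostgreSQL) insist on an alias after the closing parenthesis, while Oracle, as in this answer, accepts the bare parentheses.

```python
# Sketch: wrap a SQL query so it can be passed as the JDBC "dbtable" option.
# Pass an alias for databases that require derived tables to be named
# (MySQL, PostgreSQL); omit it for engines like Oracle that accept the
# bare parenthesized form.

def as_dbtable_subquery(query, alias=None):
    """Return a query wrapped as a JDBC dbtable derived-table expression."""
    wrapped = f"({query})"
    return f"{wrapped} {alias}" if alias else wrapped

print(as_dbtable_subquery("select * from test where id<100"))
# (select * from test where id<100)
print(as_dbtable_subquery("select * from test where id<100", "subq"))
# (select * from test where id<100) subq
```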

2 of 2
0

The way I was able to provide a custom query was by creating a Spark DataFrame and specifying it with options: https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#manually-specifying-options

Then transform that DataFrame into a DynamicFrame using said class: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-dynamic-frame.html

```python
from awsglue.dynamicframe import DynamicFrame

tmp_data_frame = spark.read.format("jdbc") \
    .option("url", jdbc_url) \
    .option("user", username) \
    .option("password", password) \
    .option("query", "select * from test where id<100") \
    .load()

# fromDF takes the target frame name as its third argument
dynamic_frame = DynamicFrame.fromDF(tmp_data_frame, glueContext, "dynamic_frame")
```