I think the problem occurred because you didn't wrap the query in parentheses and provide an alias. In my opinion it should look like the following example:

```scala
val t = glueContext.read.format("jdbc")
  .option("url", "jdbc:mysql://serverIP:port/database")
  .option("user", "username")
  .option("password", "password")
  .option("dbtable", "(select * from table1 where 1=1) as t1")
  .option("driver", "com.mysql.jdbc.Driver")
  .load()
```

More information about parameters in SQL data sources:

https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html

As for Glue and the framework it provides, there is also the `push_down_predicate` option, but I have only used it with S3-based data sources. I believe it doesn't work with sources other than S3, or with non-partitioned data.

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-partitions.html
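The subquery-plus-alias requirement above can be captured in a small helper. This is an editorial sketch, not part of the answer or of any Glue/Spark API; `as_dbtable` is a hypothetical name:

```python
def as_dbtable(query, alias="t1"):
    # Spark's JDBC reader accepts either a bare table name or a
    # parenthesized subquery with an alias in the "dbtable" option.
    # This hypothetical helper wraps a raw query accordingly.
    return "({}) as {}".format(query.strip().rstrip(";"), alias)

dbtable = as_dbtable("select * from table1 where 1=1")
print(dbtable)  # (select * from table1 where 1=1) as t1
```

The resulting string is what goes into `.option("dbtable", ...)` in the example above.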

Answer from jbgorski on Stack Overflow
The question, "AWS glueContext read doesn't allow a sql query" (Stack Overflow): I want to read filtered data from a MySQL instance using an AWS Glue job. Since a Glue JDBC connection doesn't allow me to push down a predicate, I am trying to explicitly create a JDBC connection in my …

Another answer from the same thread (2 of 4, score 2):

For anyone still searching for further answers/examples, I can confirm that this approach works with other JDBC data sources as well. Here's how I read from SQL Server (in Python).

```python
df = (glueContext.read.format("jdbc")
    .option("url", "jdbc:sqlserver://server-ip:port;databaseName=db;")
    .option("user", "username")
    .option("password", "password")
    .option("dbtable", "(select t1.*, t2.name from dbo.table1 t1 join dbo.table2 t2 on t1.id = t2.id) as users")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load())
```

This also works but NOT as I expected. The predicate is not pushed down to the data source.

```python
df = glueContext.create_dynamic_frame.from_catalog(
    database="db",
    table_name="db_dbo_table1",
    push_down_predicate="(id >= 2850700 AND statusCode = 'ACT')",
)
```

The documentation on `pushDownPredicate` states: "The option to enable or disable predicate push-down into the JDBC data source. The default value is true, in which case Spark will push down filters to the JDBC data source as much as possible."
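With S3-backed catalog tables, `push_down_predicate` filters on partition columns. The predicate is just a SQL-like string; building one can be sketched as below. `partition_predicate` is a hypothetical helper, not a Glue API:

```python
def partition_predicate(**partitions):
    # Hypothetical helper: builds a push_down_predicate string that
    # filters on partition columns of an S3-backed catalog table.
    # Sorted so the output is deterministic.
    return " AND ".join(
        "{} = '{}'".format(col, val) for col, val in sorted(partitions.items())
    )

pred = partition_predicate(year="2022", month="07")
print(pred)  # month = '07' AND year = '2022'
```

The resulting string would be passed as the `push_down_predicate` argument of `create_dynamic_frame.from_catalog`.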

Answers from a related thread, about a PostgreSQL connection error on Glue 5.0 (1 of 2, score 1):
I don't know if you found the answer, but I was having the same error yesterday. I downgraded my Glue version from 5.0 to 4.0 in Job Details, and it worked as expected. AWS needs to fix this bug; it has been open for at least 8 months. It's impossible to make the PostgreSQL connection work in 5.0.
(2 of 2, score -1):
Yes, AWS Glue does support reading from Data Catalog tables that are backed by PostgreSQL databases. However, the error you're encountering suggests there might be an issue with how the connection is configured or used. When working with PostgreSQL in AWS Glue, make sure the connection is properly set up and that the necessary information is passed to the Glue job. Here are a few things to check:

1. Verify that the connection type for your PostgreSQL RDS instance is correctly set to "JDBC" in the Glue connection settings.
2. Make sure the JDBC URL, username, and password are correctly configured in the connection properties.
3. Check that the VPC, subnet, and security group settings in the Glue connection match those of your RDS instance.
4. Ensure that the IAM role associated with your Glue job has the necessary permissions to access both the Glue Data Catalog and the RDS instance.
5. When creating your dynamic frame, you may need to explicitly specify the connection. Try modifying your code to include the `connection_type` parameter:

```python
node_name = glueContext.create_dynamic_frame.from_catalog(
    database="default",
    table_name="my_table_name",
    transformation_ctx="ctx_name",
    connection_type="postgresql"
)
```

6. If you're using a custom JDBC driver for PostgreSQL, make sure it's properly uploaded and configured in your Glue environment.
7. Check for version compatibility issues between your PostgreSQL database and the JDBC driver being used by Glue.

If you've verified all these points and are still encountering issues, it may be worth testing the connection directly using the AWS Glue connection test feature or the AWSSupport-TroubleshootGlueConnection runbook in AWS Systems Manager. These tools can help identify network, IAM, or credential issues that might be preventing the connection from working properly.

Remember that AWS Glue Studio creates unified connections for PostgreSQL data sources, which may require additional steps for accessing Secrets Manager and VPC resources. Make sure you've completed any necessary additional configuration for these unified connections. If the problem persists after checking these items, you may need to contact AWS Support, as there could be an underlying issue with the Glue service or with how it interacts with your specific PostgreSQL setup.

**Sources**
- AWS Glue connection properties - AWS Glue
- Troubleshoot AWS Glue connection to a JDBC data source | AWS re:Post
- Using custom connectors and connections with AWS Glue Studio - AWS Glue
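As a concrete starting point for item 5 above, the connection options can be assembled as a plain dictionary. The host, database, table, and credentials below are placeholders, and the commented-out call is only a sketch of how such a dict is typically passed to Glue:

```python
# Placeholder values; substitute your own endpoint, table, and credentials.
connection_options = {
    "url": "jdbc:postgresql://my-host:5432/mydb",
    "dbtable": "public.my_table",
    "user": "username",
    "password": "password",
}

# In a Glue job this dict would be passed along the lines of:
# df = glueContext.create_dynamic_frame.from_options(
#     connection_type="postgresql",
#     connection_options=connection_options,
# )

print(connection_options["url"])  # jdbc:postgresql://my-host:5432/mydb
```

Keeping the options in one dict makes it easy to swap in values fetched from Secrets Manager instead of hard-coded credentials.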