I think the problem occurred because you didn't wrap the query in parentheses and provide an alias. In my opinion it should look like the following example:
val t = glueContext.read.format("jdbc")
  .option("url", "jdbc:mysql://serverIP:port/database")
  .option("user", "username")
  .option("password", "password")
  .option("dbtable", "(select * from table1 where 1=1) as t1")
  .option("driver", "com.mysql.jdbc.Driver")
  .load()
More information about parameters in SQL data sources:
https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
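The reason for the parentheses and alias is that Spark substitutes the "dbtable" value into its own generated statement (roughly SELECT * FROM <dbtable>), so a bare query would produce invalid SQL. As a minimal illustration (the helper name as_derived_table is my own, not part of any API), wrapping a query into a derived table looks like this:

```python
def as_derived_table(query: str, alias: str = "t1") -> str:
    """Wrap a raw SQL query for use as Spark's 'dbtable' option.

    Spark generates SQL of the form 'SELECT * FROM <dbtable> ...',
    so a bare query must become a parenthesized derived table with
    an alias to keep the generated statement valid.
    """
    return f"({query.strip().rstrip(';')}) as {alias}"

# Example: produces "(select * from table1 where 1=1) as t1"
dbtable_value = as_derived_table("select * from table1 where 1=1")
```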
As for Glue and the framework it provides, there is also the option "push_down_predicate", but I have only used it with S3-based data sources. I think it only works with S3 sources and partitioned data, not with other source types or unpartitioned data.
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-partitions.html
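With push_down_predicate, the predicate string references partition columns so Glue can prune S3 partitions before reading any data. A small sketch of building such a predicate (the year/month/day partition columns and the table name are hypothetical; the from_catalog call is shown commented because it needs a running Glue job):

```python
from datetime import date

def partition_predicate(d: date) -> str:
    # Builds a predicate over hypothetical year/month/day partition
    # columns, zero-padded to match typical S3 partition key values.
    return f"(year == '{d.year}' and month == '{d.month:02d}' and day == '{d.day:02d}')"

# Inside a Glue job this would prune partitions before the read:
# df = glueContext.create_dynamic_frame.from_catalog(
#     database="db",
#     table_name="my_partitioned_table",           # hypothetical
#     push_down_predicate=partition_predicate(date(2020, 1, 5)),
# )
```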
Answer from jbgorski on Stack Overflow to "AWS glueContext read doesn't allow a sql query"
For anyone who is still searching for further answers/examples, I can confirm that this approach works with JDBC data sources. Here's how I read from SQL Server (in Python):
df = (glueContext.read.format("jdbc")
    .option("url", "jdbc:sqlserver://server-ip:port;databaseName=db;")
    .option("user", "username")
    .option("password", "password")
    .option("dbtable", "(select t1.*, t2.name from dbo.table1 t1 join dbo.table2 t2 on t1.id = t2.id) as users")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load())
The following also works, but NOT as I expected: the predicate is not pushed down to the data source.
df = glueContext.create_dynamic_frame.from_catalog(database = "db", table_name = "db_dbo_table1", push_down_predicate = "(id >= 2850700 AND statusCode = 'ACT')")
The documentation on pushDownPredicate states: "The option to enable or disable predicate push-down into the JDBC data source. The default value is true, in which case Spark will push down filters to the JDBC data source as much as possible."
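Given that default, one alternative is to read the table directly and apply .filter() after load(); with pushDownPredicate left at true, Spark should translate the filter into a WHERE clause on the source side. A sketch under those assumptions (untested against a live database; the connection values are placeholders, and the spark.read call is shown commented because it needs a running session):

```python
# Placeholder JDBC options for a direct table read; with
# pushDownPredicate at its default (true), a .filter() applied
# after load() can be pushed into the source as a WHERE clause.
jdbc_options = {
    "url": "jdbc:sqlserver://server-ip:port;databaseName=db;",  # placeholder
    "user": "username",                                         # placeholder
    "password": "password",                                     # placeholder
    "dbtable": "dbo.table1",
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    # "pushDownPredicate": "true",  # the default; set "false" to disable
}

# df = (spark.read.format("jdbc").options(**jdbc_options).load()
#         .filter("id >= 2850700 AND statusCode = 'ACT'"))
```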