I think the problem occurred because you didn't wrap the query in parentheses and provide an alias. In my opinion it should look like the following example:
val t = glueContext.read.format("jdbc")
  .option("url", "jdbc:mysql://serverIP:port/database")
  .option("user", "username")
  .option("password", "password")
  .option("dbtable", "(select * from table1 where 1=1) as t1")
  .option("driver", "com.mysql.jdbc.Driver")
  .load()
More information about parameters in SQL data sources:
https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
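The reason for the parentheses and alias is that Spark substitutes the "dbtable" value into its own generated statement (roughly SELECT * FROM <dbtable>), so a bare query would produce invalid SQL. As a minimal illustration (the helper name as_derived_table is my own, not part of any API), wrapping a query into a derived table looks like this:

```python
def as_derived_table(query: str, alias: str = "t1") -> str:
    """Wrap a raw SQL query for use as Spark's 'dbtable' option.

    Spark generates SQL of the form 'SELECT * FROM <dbtable> ...',
    so a bare query must become a parenthesized derived table with
    an alias to keep the generated statement valid.
    """
    return f"({query.strip().rstrip(';')}) as {alias}"

# Example: produces "(select * from table1 where 1=1) as t1"
dbtable_value = as_derived_table("select * from table1 where 1=1")
```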
As for Glue and the framework it provides, there is also the option "push_down_predicate", but I have only used it with S3-based data sources. I think it only works with S3 sources and partitioned data, not with other source types or unpartitioned data.
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-partitions.html
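With push_down_predicate, the predicate string references partition columns so Glue can prune S3 partitions before reading any data. A small sketch of building such a predicate (the year/month/day partition columns and the table name are hypothetical; the from_catalog call is shown commented because it needs a running Glue job):

```python
from datetime import date

def partition_predicate(d: date) -> str:
    # Builds a predicate over hypothetical year/month/day partition
    # columns, zero-padded to match typical S3 partition key values.
    return f"(year == '{d.year}' and month == '{d.month:02d}' and day == '{d.day:02d}')"

# Inside a Glue job this would prune partitions before the read:
# df = glueContext.create_dynamic_frame.from_catalog(
#     database="db",
#     table_name="my_partitioned_table",           # hypothetical
#     push_down_predicate=partition_predicate(date(2020, 1, 5)),
# )
```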
Answer from jbgorski on Stack Overflow to "AWS glueContext read doesn't allow a sql query"
For anyone who is still searching for further answers/examples, I can confirm that this approach works with JDBC data sources. Here's how I read from SQL Server (in Python):
df = (glueContext.read.format("jdbc")
    .option("url", "jdbc:sqlserver://server-ip:port;databaseName=db;")
    .option("user", "username")
    .option("password", "password")
    .option("dbtable", "(select t1.*, t2.name from dbo.table1 t1 join dbo.table2 t2 on t1.id = t2.id) as users")
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load())
The following also works, but NOT as I expected: the predicate is not pushed down to the data source.
df = glueContext.create_dynamic_frame.from_catalog(database = "db", table_name = "db_dbo_table1", push_down_predicate = "(id >= 2850700 AND statusCode = 'ACT')")
The documentation on pushDownPredicate states: "The option to enable or disable predicate push-down into the JDBC data source. The default value is true, in which case Spark will push down filters to the JDBC data source as much as possible."
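Given that default, one alternative is to read the table directly and apply .filter() after load(); with pushDownPredicate left at true, Spark should translate the filter into a WHERE clause on the source side. A sketch under those assumptions (untested against a live database; the connection values are placeholders, and the spark.read call is shown commented because it needs a running session):

```python
# Placeholder JDBC options for a direct table read; with
# pushDownPredicate at its default (true), a .filter() applied
# after load() can be pushed into the source as a WHERE clause.
jdbc_options = {
    "url": "jdbc:sqlserver://server-ip:port;databaseName=db;",  # placeholder
    "user": "username",                                         # placeholder
    "password": "password",                                     # placeholder
    "dbtable": "dbo.table1",
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    # "pushDownPredicate": "true",  # the default; set "false" to disable
}

# df = (spark.read.format("jdbc").options(**jdbc_options).load()
#         .filter("id >= 2850700 AND statusCode = 'ACT'"))
```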