Found a solution for this problem. It turns out the connection options dictionary accepts more parameters; the one I needed was "recurse", which makes Glue read files in all subfolders. You can also exclude objects matching certain patterns with "exclusions".

Source https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect.html#aws-glue-programming-etl-connect-s3

dyf = glueContext.create_dynamic_frame.from_options(
    "s3",
    {
        "paths": ["s3://bucket/2017/"],
        "recurse": True
    },
    "json",
    transformation_ctx="dyf")

Answer from Joshua on Stack Overflow
DynamicFrame class - AWS Glue (docs.aws.amazon.com)
To access the dataset that is used in this example, see "Code example: Data preparation using ResolveChoice, Lambda, and ApplyMapping" and follow the instructions in "Step 1: Crawl the data in the Amazon S3 bucket".

# Example: Use filter to create a new DynamicFrame
# with a filtered selection of records
from pyspark.context import SparkContext
from awsglue.context import GlueContext

# Create GlueContext
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)

# Create DynamicFrame from Glue Data Catalog
medicare = glueContext.create_dynamic_frame.from_options(
    "s3",
    {
        "paths": [
            "s3://awsglu
GlueContext class - AWS Glue (docs.aws.amazon.com)
Method index: __init__; creating: getSource, create_dynamic_frame_from_rdd, create_dynamic_frame_from_catalog, create_dynamic_frame_from_options, create_sample_dynamic_frame_from_catalog, create_sample_dynamic_frame_from_options, add_ingestion_time_columns, create_data_frame_from_catalog, create_data_frame_from_options, forEachBatch; Amazon S3 datasets: purge_table, purge_s3_path, transition_table, transition_s3_path; extracting: extract_jdbc_conf; transactions: start_transaction, commit_transaction, cancel_transaction; writing: getSink, write_dynamic_frame_from_options, write_from_options, write_dynamic_frame_from_catalog, write_data_frame_from_catalog, write_dynamic_frame_from_jdbc_conf, write_from_jdbc_conf
Discussions

python - glue etl jobs - get s3 subfolders using create_dynamic_frame.from_options - Stack Overflow (stackoverflow.com)
I am creating an AWS Glue ETL job, but I'm running into some roadblocks with file retrieval. It seems that the following code only gets the files at the root folder 2017 and not any further. Is th...
json - Create dynamic frame from S3 bucket AWS Glue - Stack Overflow (stackoverflow.com)
Summary: I've got an S3 bucket which contains a list of JSON files. The bucket contains child folders which are created by date. All the files share a similar structure, and files get added on a daily basis.
Support for format options to be pushed down when using `parquet` as the format (github.com, July 25, 2021)
The create_dynamic_frame_from_options method signature accepts a format_options object. aws-glue-libs/awsglue/context.py, lines 143 to 144 at 28805fe: def create_dynamic_frame_from_options(self, c...
amazon web services - How to create dynamic data frame from S3 files in Glue Job in Scala? - Stack Overflow (stackoverflow.com, October 12, 2019)
I'm having problems converting a Python Glue job to a Scala Glue job, namely the create_dynamic_data_frame_options method. In Python the syntax is: dyf = glueContext.create_dynamic_frame_from_options...
DynamicFrameWriter class - AWS Glue (docs.aws.amazon.com)
This example writes the output locally using a connection_type of "s3" with a POSIX path argument in connection_options, which allows writing to local storage.

glueContext.write_dynamic_frame.from_options(
    frame = dyf_splitFields,
    connection_options = {'path': '/home/glue/GlueLocalOutput/'},
    connection_type = 's3',
    format = 'json')
aws-glue-samples/examples/resolve_choice.py at master · aws-samples/aws-glue-samples (github.com)

glueContext.write_dynamic_frame.from_options(frame = medicare_res_cast, connection_type = "s3", connection_options = {"path": medicare_cast}, format = "json")
glueContext.write_dynamic_frame.from_options(frame = medicare_res_project, connection_type = "s3", connection_options = {"path": medicare_project}, format = "json")
glueContext.write_dynamic_frame.from_options(frame = medicare_res_make_cols, connection_type = "s3", connection_options = {"path": medicare_cols}, format = "json")
glueContext.write_dynamic_frame.from_options(frame = medicare_res_make_struct, connection_type = "s3", connection_options = {"path": medicare_struct}, format = "json")
glueContext.write_dynamic_frame.from_options(frame = medicare_sql_dyf, connection_type = "s3", connection_options = {"path": medicare_sql}, format = "json")
json - Create dynamic frame from S3 bucket AWS Glue - Stack Overflow (stackoverflow.com/questions/74734233)
Question: I am trying to create a dynamic frame from options where the source is S3 and the type is JSON. I'm using the following code; however, it is not returning any value. Where am I going wrong?

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from functools import reduce
from awsglue.dynamicframe import DynamicFrame

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

df = glueContext.create_dynamic_frame.from_options(
    connection_type = 's3',
    connection_options={'paths': ['Location for S3 folder']},
    format='json',
    # formatOptions=$..*
)
print('Total Count:')
df.count()
AWS Glue create dynamic frame - SQL & Hadoop (sqlandhadoop.com)

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

glueContext = GlueContext(SparkContext.getOrCreate())

# creating dynamic frame from S3 data
dyn_frame_s3 = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options = {"paths": ["s3://<bucket name>/data/sales/"], "inferSchema": "true"},
    format = "csv",
    format_options={"separator": "\t"},
    transformation_ctx="")
print(dyn_frame_s3.count())

# creating dynamic frame from Glue catalog table
dyn_frame_catalog = glueContext.create_dynamic_frame_from_catalog(
    database = "db_readfile",
    table_name = "sales",
    transformation_ctx = "")
print(dyn_frame_catalog.count())
AWS_Glue_3: Glue(DynamicFrame). GlueContext is the entry point for… | by Kundan Singh | Medium (medium.com, February 12, 2025)

# create DynamicFrame from S3 parquet files
datasource0 = glueContext.create_dynamic_frame_from_options(
    connection_type="s3",
    connection_options = {"paths": [S3_location]},
    format="parquet",
    transformation_ctx="datasource0")

# create DynamicFrame from Glue catalog
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database = "demo",
    table_name = "testtable",
    transformation_ctx = "datasource0")

# convert to Spark DataFrame and back to a Glue DynamicFrame
df1 = datasource0.toDF()
df2 = DynamicFrame.fromDF(df1, glueContext, "df2")
df = dynamic_frame.toDF()
df.show()
print("Dataframe converted")
Support for format options to be pushed down when using `parquet` as the format · Issue #90 · awslabs/aws-glue-libs (github.com, July 25, 2021)
Author: mnoumanshahzad

def create_dynamic_frame_from_options(self, connection_type, connection_options={}, format=None, format_options={}, transformation_ctx = "", push_down_predicate= "", **kwargs):

At first sight, this seems to expose options that can be configured for a given format. With parquet as the format, something like a basePath is very useful when reading partitioned data from S3.

from awsglue.context import GlueContext
from awsglue import DynamicFrame

# initialize SparkContext and SparkSession
spark_context = SparkContext.getOrCreate()
glue_context = GlueContext(spark_context)
spark = glue_context.spark_sessio
AWS Dojo - Workshop - Building AWS Glue Job using PySpark - Part 2 (of 2) (aws-dojo.com)

glueContext.write_dynamic_frame.from_options(
    productlineDF,
    connection_type = "s3",
    connection_options = {"path": "s3://dojo-data-lake/data/productline"},
    format = "json")
aws-glue-samples/examples/join_and_relationalize.py at master · aws-samples/aws-glue-samples (github.com)

glueContext.write_dynamic_frame.from_options(
    frame = l_history,
    connection_type = "s3",
    connection_options = {"path": output_history_dir},
    format = "parquet")

# Write out a single file to directory "legislator_single"
s_history = l_history.toDF().repartition(1)
print("Writing to /legislator_single ...")
s_history.write.parquet(output_lg_single_dir)
AWS Glue DynamicFrame transformations with example code and output | by Swapnil Bhoite | Medium (swapnil-bhoite.medium.com, April 28, 2022)
This is also a good opportunity to showcase how to load a dataset directly from S3:

dyF = glueContext.create_dynamic_frame.from_options(
    's3',
    {'paths': ['s3://awsglue-datasets/examples/medicare/Medicare_Hospital_Provider.csv']},
    'csv',
    {'withHeader': True})
dyF.printSchema()

root
 |-- DRG Definition: string
 |-- Provider Id: string
 |-- Provider Name: string
 |-- Provider Street Address: string
 |-- Provider City: string
 |-- Provider State: string
 |-- Provider Zip Code: string
 |-- Hospital Referral Region Description: string
 |-- Total Discharges: string
 |-- Average Covered Charges: string
 |-- Average Total Payments: string
 |-- Average Medicare Payments: string
Using the JSON format in AWS Glue - AWS Glue (docs.aws.amazon.com)

// Example: Read JSON from S3
// For show, we handle a nested JSON file that we can limit with the JsonPath parameter
// For show, we also handle a JSON where a single entry spans multiple lines
// Consider whether optimizePerformance is right for your workflow.
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.{DynamicFrame, GlueContext}
import org.apache.spark.SparkContext

object GlueApp {
  def main(sysArgs: Array[String]): Unit = {
    val spark: SparkContext = new SparkContext()
    val glueContext: GlueContext = new GlueContext(spark)
    val dynamicFrame = glueContext.getSourceWithFormat(
      formatOptions=JsonOptions("""{"jsonPath": "$.id", "multiline": true, "optimizePerformance": false}"""),
      connectionType="s3",
      format="json",
      options=JsonOptions("""{"paths": ["s3://s3path"], "recurse": true}""")
    ).getDynamicFrame()
  }
}
Reading input files in larger groups - AWS Glue (docs.aws.amazon.com)
If you are reading from Amazon S3 directly using the create_dynamic_frame.from_options method, add these connection options. For example, the following attempts to group files into 1 MB groups.

df = glueContext.create_dynamic_frame.from_options(
    "s3",
    {'paths': ["s3://s3path/"],
     'recurse': True,
     'groupFiles': 'inPartition',
     'groupSize': '1048576'},
    format="json")
Managing partitions for ETL output in AWS Glue - AWS Glue (docs.aws.amazon.com)
For example, the following Python code writes out a dataset to Amazon S3 in the Parquet format, into directories partitioned by the type field. From there, you can process these partitions using other systems, such as Amazon Athena.

glue_context.write_dynamic_frame.from_options(
    frame = projectedEvents,
    connection_type = "s3",
    connection_options = {"path": "$outpath", "partitionKeys": ["type"]},
    format = "parquet")
apache spark - AWS Glue Job fails at create_dynamic_frame_from_options when reading from s3 bucket with lot of files - Stack Overflow (stackoverflow.com, April 9, 2020)
The data inside my S3 bucket looks like this: s3://bucketName/prefix/userId/XYZ.gz. There are around 20 million users, and within each user's subfolder there will be 1-10 files. My Glue job starts like this:

datasource0 = glueContext.create_dynamic_frame_from_options(
    "s3",
    {'paths': ["s3://bucketname/prefix/"],
     'useS3ListImplementation': True,
     'recurse': True,
     'groupFiles': 'inPartition',
     'groupSize': 100 * 1024 * 1024},
    format="json",
    transformation_ctx = "datasource0")

There are a bunch of optimizations like groupFiles, groupSize, and useS3ListImplementation I have attempted, as shown above.
Using the Parquet format in AWS Glue - AWS Glue (docs.aws.amazon.com)

// Example: Read Parquet from S3
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.{DynamicFrame, GlueContext}
import org.apache.spark.SparkContext

object GlueApp {
  def main(sysArgs: Array[String]): Unit = {
    val spark: SparkContext = new SparkContext()
    val glueContext: GlueContext = new GlueContext(spark)
    val dynamicFrame = glueContext.getSourceWithFormat(
      connectionType="s3",
      format="parquet",
      options=JsonOptions("""{"paths": ["s3://s3path"]}""")
    ).getDynamicFrame()
  }
}