You can use something like this to create data frames dynamically:

import pandas as pd

data_frames = {}
for i, data in enumerate(data_list):
    df_name = f'df{i + 1}'
    data_frames[df_name] = pd.DataFrame(data)

or you can use a dictionary comprehension:

data_frames = {name: pd.DataFrame(data) for name, data in data_frames_data.items()}

Answer from Bitwise_Gamgee on reddit.com
r/learnpython on Reddit: How can I create multiple dataframes at the same time using a single script?
September 25, 2023

I'm currently working on simulated data. For each simulation, I have to create a dataframe with around 1500-4500 rows. Every row depends on the data in the previous rows, so I must iterate over the dataframe to create each new row. I want to repeat this process 500 times, but the simulations are completely independent of each other.

I have scripts to generate all this data, but it takes too much time because my scripts can only run one simulation at a time. Is it possible for a single script to run the simulations in parallel so I can merge them into a single .csv file once all calculations are done?
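Because the 500 simulations are independent, one option is to run them in separate worker processes and concatenate the results at the end. The following is a minimal sketch under assumptions: `simulate` is a hypothetical stand-in for your own simulation (here a random walk where each value depends on the previous one), `run_all` is a hypothetical helper, and the process pool comes from the standard library's `concurrent.futures`, not from any code in the thread.

```python
import numpy as np
import pandas as pd
from concurrent.futures import ProcessPoolExecutor


def simulate(sim_id: int, n_rows: int = 1500) -> pd.DataFrame:
    """Hypothetical simulation: each value depends on the previous one (a random walk)."""
    rng = np.random.default_rng(sim_id)  # seed with the simulation id for reproducibility
    steps = rng.normal(size=n_rows)
    values = np.zeros(n_rows)
    for t in range(1, n_rows):
        # Fill a NumPy array step by step; building the DataFrame once at the end
        # is much cheaper than appending rows to a DataFrame inside the loop
        values[t] = values[t - 1] + steps[t]
    return pd.DataFrame({"sim_id": sim_id, "step": np.arange(n_rows), "value": values})


def run_all(n_sims: int, n_rows: int, max_workers=None) -> pd.DataFrame:
    # Simulations share no state, so each one can run in its own worker process
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        frames = list(pool.map(simulate, range(n_sims), [n_rows] * n_sims))
    # Stack all simulations into one DataFrame
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    # The __main__ guard is required on platforms that start workers with "spawn"
    combined = run_all(n_sims=500, n_rows=1500)
    combined.to_csv("all_simulations.csv", index=False)
```

Each worker returns its own DataFrame, so nothing has to be shared between processes, and `max_workers=None` lets the executor default to the number of available CPUs. If the per-row loop itself is the bottleneck, vectorizing it (for a plain random walk, `np.cumsum` over the steps) may help more than parallelism.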

Multiple Dataframes in a Loop Using Python - AskPython
March 31, 2023 - These dataframes are created using the ‘.DataFrame()’ function, and the ‘.merge()’ function is used in the code to combine them into a single dataframe.
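A minimal sketch of that pattern, assuming every dataframe shares a key column to merge on (the hypothetical `id` column and the `sources` data below are illustrations, not taken from the article): the dataframes are built in a loop, then folded into one with `functools.reduce` and `DataFrame.merge`.

```python
from functools import reduce

import pandas as pd

# Hypothetical input: one dict of columns per dataframe, all sharing an "id" key column
sources = {
    "a": {"id": [1, 2, 3], "a": [10, 20, 30]},
    "b": {"id": [1, 2, 3], "b": [0.1, 0.2, 0.3]},
    "c": {"id": [1, 2, 3], "c": ["x", "y", "z"]},
}

# Create the dataframes in a loop and keep them in a list
frames = []
for name, data in sources.items():
    frames.append(pd.DataFrame(data))

# Merge them pairwise on the shared key column (inner join by default)
combined = reduce(lambda left, right: left.merge(right, on="id"), frames)
print(combined)
```

If the dataframes have the same columns and should be stacked rather than joined, `pd.concat(frames, ignore_index=True)` is usually the better tool than `.merge()`.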
r/learnpython on Reddit: Efficient pandas code when creating multiple dataframes from two initial dataframes
June 25, 2018

I have two dataframes from which I want to create multiple new dataframes. My code currently looks like this:

import pandas as pd

df_h = pd.read_csv('filename1.csv', skiprows=6)
df_c = pd.read_csv('filename2.csv', skiprows=6)

merged_tables, sheet_titles = [], []

# .copy() avoids a SettingWithCopyWarning when the 'Total' row is added to a filtered slice
c1 = df_c[(df_c['Document'].str.startswith("AB")) & (df_c['Symbol '] == "ARD")].copy()
h1 = df_h[df_h["Code "] == 7].copy()
h1.at['Total', 'Amount '] = h1['Amount '].sum()
c1.at['Total', 'Amount '] = c1['Amount '].sum()
h1.reset_index(drop=True, inplace=True)
c1.reset_index(drop=True, inplace=True)
merged_table1 = pd.concat([h1, c1], axis=1)
merged_tables.append(merged_table1)
sheet_titles.append(7)

So what I'm doing is basically checking two conditions in the first dataframe and one condition in the second, and assigning the results to new dataframes. Then I add a new row that sums one column, reset the index of both dataframes, merge them and append the merged dataframe to a list, which I later use to create an Excel file.

But I want to create more new dataframes like this:

c10 = df_c[(df_c['Document'].str.startswith("CD")) & (df_c['Symbol '] == "ARD")].copy()
h10 = df_h[df_h["Code "] == 23].copy()
h10.at['Total', 'Amount '] = h10['Amount '].sum()
c10.at['Total', 'Amount '] = c10['Amount '].sum()
h10.reset_index(drop=True, inplace=True)
c10.reset_index(drop=True, inplace=True)
merged_table10 = pd.concat([h10, c10], axis=1)
merged_tables.append(merged_table10)
sheet_titles.append(23)

c19 = df_c[(df_c['Document'].str.startswith("EF")) & (df_c['Symbol '] == "ARD")].copy()
h19 = df_h[df_h["Code "] == 30].copy()
h19.at['Total', 'Amount '] = h19['Amount '].sum()
c19.at['Total', 'Amount '] = c19['Amount '].sum()
h19.reset_index(drop=True, inplace=True)
c19.reset_index(drop=True, inplace=True)
merged_table19 = pd.concat([h19, c19], axis=1)
merged_tables.append(merged_table19)
sheet_titles.append(30)

Currently I'm just repeating the same code for every new dataframe I want to create, changing only the conditions and the variable names, because I can't wrap my head around writing a for loop that would reduce the amount of code.

Basically, the only things that change for each new dataframe are the starting characters in the first condition, the code number in the second, and the sheet title that's appended to a list. All the other operations (summing a column, resetting the index, merging the selected tables and appending the result to a list) always stay the same.

If it were a csv file or data stored in separate lists, I would just write a for loop with many elifs, but with a pandas dataframe you usually access a whole column instead of individual elements, so I don't know how to write this efficiently. I do know that declaring that many variables and repeating so much code isn't very efficient.

I know that I have to declare those changing conditions anyway, but wrapping them in a concise for loop or function would definitely make the code more efficient and scalable.
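One way to remove the repetition, sketched under assumptions: the filters, column names (including their trailing spaces) and the (prefix, code) pairs are taken from the post, while the helper names `add_total`, `build_merged_table` and `build_all` are hypothetical. Only the values that change between blocks go into a list of tuples; everything else lives in one function that a plain for loop calls.

```python
import pandas as pd


def add_total(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'Total' row summing 'Amount ', then reset the index."""
    df = df.copy()  # avoid modifying (or warning about) a filtered slice
    df.loc["Total", "Amount "] = df["Amount "].sum()
    return df.reset_index(drop=True)


def build_merged_table(df_h, df_c, prefix, code, symbol="ARD"):
    """Apply the filters from the post and put the two results side by side."""
    c = df_c[df_c["Document"].str.startswith(prefix) & (df_c["Symbol "] == symbol)]
    h = df_h[df_h["Code "] == code]
    # Same column order as pd.concat([h1, c1], axis=1) in the post
    return pd.concat([add_total(h), add_total(c)], axis=1)


def build_all(df_h, df_c, conditions):
    merged_tables, sheet_titles = [], []
    for prefix, code in conditions:
        merged_tables.append(build_merged_table(df_h, df_c, prefix, code))
        sheet_titles.append(code)  # the post uses the code number as the sheet title
    return merged_tables, sheet_titles


# Only these values differ between the repeated blocks in the post
conditions = [("AB", 7), ("CD", 23), ("EF", 30)]
```

With the real files you would call `merged_tables, sheet_titles = build_all(df_h, df_c, conditions)`, and adding a new table becomes a one-line change to `conditions`. Each table could then be written to its own sheet with `pd.ExcelWriter` and `DataFrame.to_excel(writer, sheet_name=str(title))`, which needs an Excel engine such as openpyxl installed.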

Create multiple dataframes in loop in Python
October 3, 2022 - DataFrames are 2-dimensional data structures in pandas, consisting of rows, columns, and data. Python loops: a loop runs a block of code n times, where n can be defined by the user, so we are going to use a for loop to create the DataFrames.
Top answer

I am not sure whether you can create dataframe names dynamically in PySpark. Even in plain Python, dynamically creating variable names is not a clean option (it is technically possible through globals() or exec, but discouraged), and the same applies to dataframes.

One way is to create a dictionary of dataframes, where each key is a date and each value is the corresponding dataframe.

For plain Python, refer to this link, where someone asked a similar question about creating names dynamically.
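For plain pandas, the same dictionary-of-dataframes idea can be sketched with `groupby`, which yields one (key, sub-DataFrame) pair per distinct date. This is an illustration using the sample data from the PySpark example, not code from the linked question.

```python
import pandas as pd

# Same sample data as the PySpark example, as a pandas DataFrame
df = pd.DataFrame(
    [("2018-01-01", "M", 100), ("2018-02-01", "F", 100), ("2018-03-01", "M", 100)],
    columns=["date", "gender", "balance"],
)

# Key: each distinct date; value: the rows for that date
dictionary_df = {date: group for date, group in df.groupby("date")}

for date, frame in dictionary_df.items():
    print("DF: " + date)
    print(frame)
```

Compared with filtering once per date, `groupby` splits the frame in a single pass and does not need a hard-coded list of dates.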

Here is a small PySpark implementation:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
values = [('2018-01-01','M',100),('2018-02-01','F',100),('2018-03-01','M',100)]
df = spark.createDataFrame(values, ['date', 'gender', 'balance'])
df.show()
+----------+------+-------+
|      date|gender|balance|
+----------+------+-------+
|2018-01-01|     M|    100|
|2018-02-01|     F|    100|
|2018-03-01|     M|    100|
+----------+------+-------+

# Creating a dictionary to store the dataframes.
# Key: It contains the date from my_list.
# Value: Contains the corresponding dataframe.
dictionary_df = {}  

my_list = ['2018-01-01', '2018-02-01', '2018-03-01']
for i in my_list:
    dictionary_df[i] = df.filter(col('date')==i)

for i in my_list:
    print('DF: '+i)
    dictionary_df[i].show() 

DF: 2018-01-01
+----------+------+-------+
|      date|gender|balance|
+----------+------+-------+
|2018-01-01|     M|    100|
+----------+------+-------+

DF: 2018-02-01
+----------+------+-------+
|      date|gender|balance|
+----------+------+-------+
|2018-02-01|     F|    100|
+----------+------+-------+

DF: 2018-03-01
+----------+------+-------+
|      date|gender|balance|
+----------+------+-------+
|2018-03-01|     M|    100|
+----------+------+-------+

print(dictionary_df)
{'2018-01-01': DataFrame[date: string, gender: string, balance: bigint], '2018-02-01': DataFrame[date: string, gender: string, balance: bigint], '2018-03-01': DataFrame[date: string, gender: string, balance: bigint]}
How to Create Multiple DataFrames from Other DataFrames in Pandas | Saturn Cloud Blog
August 25, 2023 - In this blog post, we explored how to create multiple DataFrames from other DataFrames in Pandas. We first looked at how to create multiple DataFrames from a single DataFrame by filtering based on a specific condition.
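As a small illustration of splitting one DataFrame into several by a condition (the `name`/`score` data and the threshold of 60 are hypothetical, not taken from the blog post), a boolean mask and its negation produce two new DataFrames:

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [55, 82, 91, 40]})

# A boolean mask built from one condition
passed = df["score"] >= 60

# Two new DataFrames from the same source; .copy() makes them independent of df
df_pass = df[passed].copy()
df_fail = df[~passed].copy()
```

For more than two groups, a `groupby` on the condition column (or a dictionary keyed by condition) scales better than one variable per subset.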