Just to underline my comment to @maxymoo's answer, it's almost invariably a bad idea ("code smell") to add names dynamically to a Python namespace. There are a number of reasons, the most salient being:

  1. Created names might easily conflict with variables already used by your logic.

  2. Since the names are dynamically created, you typically also end up using dynamic techniques to retrieve the data.

This is why dicts were included in the language. The correct way to proceed is:

d = {}
for name in companies:
    d[name] = pd.DataFrame()

Nowadays you can write a single dict comprehension expression to do the same thing, but some people find it less readable:

d = {name: pd.DataFrame() for name in companies}

Once d is created the DataFrame for company x can be retrieved as d[x], so you can look up a specific company quite easily. To operate on all companies you would typically use a loop like:

for name, df in d.items():
    # operate on DataFrame 'df' for company 'name'

In Python 2 you were better off writing

for name, df in d.iteritems():

because this avoids instantiating the list of (name, df) tuples that .items() creates in the older version. That's now largely of historical interest, though there will of course be Python 2 applications still extant and requiring (hopefully occasional) maintenance.
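
Putting the pieces together, a minimal runnable sketch of the dict-of-DataFrames pattern (the company names here are invented for illustration):

```python
import pandas as pd

companies = ["acme", "globex", "initech"]  # hypothetical names

# One empty DataFrame per company, keyed by name
d = {name: pd.DataFrame() for name in companies}

# Look up a specific company's frame by key
acme_df = d["acme"]

# Or operate on all of them in one loop
for name, df in d.items():
    print(name, df.shape)
```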

Answer from holdenweb on Stack Overflow
🌐
Posit Community
forum.posit.co › general
Create a for loop to make multiple data frames? - General - Posit Community
June 3, 2021 - I'm having trouble making a loop that will iterate through my data and create multiple data frames. Here's some dummy data: mydf <- data.frame("color"=c("blue","yellow","red","green","pink","orange","cyan"), …
🌐
Databricks Community
community.databricks.com › t5 › data-engineering › python-generate-new-dfs-from-a-list-of-dataframes-using-for-loop › td-p › 21650
Python: Generate new dfs from a list of dataframes using for loop
December 2, 2022 - df_list=[df2_b2b_fast, df2_b2c_fast] # Subset of dfs for x in df_list['b2b_b2c_prod']: locals()['corrs_' + x ] = df_list[(df_list['b2b_b2c_prod'] == x ) ] # Create new 2 dfs from main df 'b2b_b2c_prod' x= x.groupby(['bus_nm','id']).corr(method='spearman').unstack().iloc[:,1] # calculate corr between pkg_yld and ADV # stuck here...lines needed to create dataframes corrs_b2b_fast, corrs_b2c_fast??
🌐
AskPython
askpython.com › home › multiple dataframes in a loop using python
Multiple Dataframes in a Loop Using Python - AskPython
March 31, 2023 - So, after printing the dictionary, we can see that empty dataframes are created for each element of the list. Here, we have not entered any data for each column so it’ll be printed as empty data columns. ... This way, we can create multiple data frames using a loop in Python language.
🌐
IncludeHelp
includehelp.com › python › create-multiple-dataframes-in-loop.aspx
Create multiple dataframes in loop in Python
October 3, 2022 - Write a Python program to create multiple dataframes in loop · To create multiple dataframes in loop, you can create a list that contains the name of different fruits, and then loop over this list, and on each traversal of the element.
🌐
Reddit
reddit.com › r/learnpython › how can i create multiple dataframes at the same time using a single script?
r/learnpython on Reddit: How can I create multiple dataframes at the same time using a single script?
September 25, 2023 -

I'm currently working on simulated data. For each simulation, I have to create a dataframe with around 1500-4500 rows. All those rows depend on the data of previous rows, so I must iterate over the dataframe to create each new row. I want to repeat this process 500 times, but each of these instances is completely independent from the others.

I have scripts to generate all this data, but it takes too much time since they can only run one simulation at a time. Is it possible for a single script to run the simulations in parallel so I can merge them into a single .csv file at the end of all calculations?

🌐
Reddit
reddit.com › r/learnpython › python looping multiple dataframes
r/learnpython on Reddit: Python Looping multiple dataframes
November 3, 2021 -

I am learning Python and am having trouble accessing data from multiple dataframes.

I want to make multiple bar plots with different dataframes. All of the dataframes have the same columns. So I thought, instead of writing the code one by one, maybe I could somehow iterate through the dataframes, but I haven't found the right way to do it. Could anyone advise me? I am curious if it can be done in one go instead of writing it for every dataframe.

Top answer
1 of 3
3

I think you think your code is doing something that it is not actually doing.

Specifically, this line: df = pd.read_csv(file)

You might think that in each iteration through the for loop this line is being executed and modified with df being replaced with a string in dfs and file being replaced with a filename in files. While the latter is true, the former is not.

Each iteration of the for loop reads a csv file and stores it in the variable df, effectively overwriting the dataframe that was read in during the previous iteration. In other words, df in your for loop is not being replaced by the variable names you defined in dfs.

The key takeaway here is that strings (e.g., 'df1', 'df2', etc.) cannot be substituted and used as variable names when executing code.
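
To make the distinction concrete, here is a tiny sketch (values invented) showing that a loop variable is simply rebound each iteration, while a dict keeps every value under its own key:

```python
# Rebinding the same name: only the last value survives the loop
for df in ["contents of file 1", "contents of file 2"]:
    pass
print(df)  # the earlier value is gone

# A dict keeps each value retrievable under its own key
results = {}
for name, contents in [("df1", "contents of file 1"),
                       ("df2", "contents of file 2")]:
    results[name] = contents
print(results["df1"])
```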

One way to achieve the result you want is to store each csv file read by pd.read_csv() in a dictionary, where the key is the name of the dataframe (e.g., 'df1', 'df2', etc.) and the value is the dataframe returned by pd.read_csv().

dfs_dict = {}  # a dict mapping dataframe name -> DataFrame
for name, file in zip(dfs, files):  # dfs: list of names, files: list of paths
    dfs_dict[name] = pd.read_csv(file)
    print(dfs_dict[name].shape)
    print(dfs_dict[name].dtypes)
    print(list(dfs_dict[name]))  # column names

You can then reference each of your dataframes like this:

print(dfs_dict['df1'])
print(dfs_dict['df2'])

You can learn more about dictionaries here:

https://docs.python.org/3.6/tutorial/datastructures.html#dictionaries

2 of 3
3

Use a dictionary to store your DataFrames and access them by name

files = ('data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv')
dfs_names = ('df1', 'df2', 'df3', 'df4', 'df5', 'df6')
dfs ={}
for dfn,file in zip(dfs_names, files):
    dfs[dfn] = pd.read_csv(file)
    print(dfs[dfn].shape)
    print(dfs[dfn].dtypes)
print(dfs['df3'])

Use a list to store your DataFrames and access them by index

files = ('data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv')
dfs = []
for file in files:
    dfs.append(pd.read_csv(file))
    print(dfs[-1].shape)
    print(dfs[-1].dtypes)
print(dfs[2])

Or don't store the intermediate DataFrames at all: just process each one and add it to the resulting DataFrame.

files = ('data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv')
df = pd.DataFrame()
for file in files:
    df_n = pd.read_csv(file)
    print(df_n.shape)
    print(df_n.dtypes)
    # do what you want to do with df_n
    df = pd.concat([df, df_n])  # DataFrame.append was removed in pandas 2.0
print(df)
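
Growing a DataFrame inside the loop copies the accumulated data on every iteration, which gets quadratically slow with many files. A common alternative is to collect the pieces in a list and concatenate once at the end; a minimal sketch, with small in-memory frames standing in for `pd.read_csv(file)`:

```python
import pandas as pd

# Stand-ins for pd.read_csv(file) -- real code would read the files
sources = [
    pd.DataFrame({"a": [1, 2]}),
    pd.DataFrame({"a": [3, 4]}),
    pd.DataFrame({"a": [5]}),
]

pieces = []
for df_n in sources:
    print(df_n.shape)
    pieces.append(df_n)

# One concatenation at the end instead of growing df each iteration
df = pd.concat(pieces, ignore_index=True)
print(df)
```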

If you will process them differently, then you don't need a general structure to store them; just handle each one independently.

df = pd.DataFrame()
def do_general_stuff(d):  # here we do common things with a DataFrame
    print(d.shape, d.dtypes)

df1 = pd.read_csv("data1.csv")
# do what you want with df1

do_general_stuff(df1)
df = pd.concat([df, df1])  # DataFrame.append was removed in pandas 2.0
del df1

df2 = pd.read_csv("data2.csv")
# do what you want with df2

do_general_stuff(df2)
df = pd.concat([df, df2])
del df2

df3 = pd.read_csv("data3.csv")
# do what you want with df3

do_general_stuff(df3)
df = pd.concat([df, df3])
del df3

# ... and so on

And one geeky way, but don't ask how it works:)

from collections import namedtuple
files = ['data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv']

df = namedtuple('Cdfs',
                ['df1', 'df2', 'df3', 'df4', 'df5', 'df6']
               )(*[pd.read_csv(file) for file in files])

for df_n in df._fields:
    print(getattr(df, df_n).shape, getattr(df, df_n).dtypes)

print(df.df3)
Find elsewhere
🌐
Reddit
reddit.com › r/learnpython › efficient pandas code when creating multiple dataframes from two initial dataframes
r/learnpython on Reddit: Efficient pandas code when creating multiple dataframes from two initial dataframes
June 25, 2018 -

I have two dataframes from which i want to create multiple new dataframes. My code currently looks like this:

import pandas as pd
df_h = pd.read_csv('filename1.csv',skiprows=6)
df_c = pd.read_csv('filename2.csv', skiprows=6)

merged_tables, sheet_titles = ( [] for i in range(2))

c1 = df_c[(df_c['Document'].str.startswith("AB")) & (df_c['Symbol '] == "ARD")]
h1 = df_h[df_h["Code "] == 7]
h1.at['Total', 'Amount '] = h1['Amount '].sum()
c1.at['Total', 'Amount '] = c1['Amount '].sum()
h1.reset_index(drop=True, inplace=True)
c1.reset_index(drop=True, inplace=True)
merged_table1 = pd.concat([h1,c1],axis=1)
merged_tables.append(merged_table1)
sheet_titles.append(7)

So what I'm doing is basically checking two conditions in the first dataframe and one condition in the second, and assigning the results as new dataframes. Then I'm adding a new row to sum one column, resetting the index in both dataframes, merging them, and appending the new dataframe to a list, which I later use to create an Excel file.

But I want to create more new dataframes like this:

c10 = df_c[(df_c['Document'].str.startswith("CD")) & (df_c['Symbol '] == "ARD")]
h10 = df_h[df_h["Code "] == 23]
h10.at['Total', 'Amount '] = h10['Amount '].sum()
c10.at['Total', 'Amount '] = c10['Amount '].sum()
h10.reset_index(drop=True, inplace=True)
c10.reset_index(drop=True, inplace=True)
merged_table10 = pd.concat([h10,c10],axis=1)
merged_tables.append(merged_table10)
sheet_titles.append(23)

c19 = df_c[(df_c['Document'].str.startswith("EF")) & (df_c['Symbol '] == "ARD")]
h19 = df_h[df_h["Code "] == 30]
h19.at['Total', 'Amount '] = h19['Amount '].sum()
c19.at['Total', 'Amount '] = c19['Amount '].sum()
h19.reset_index(drop=True, inplace=True)
c19.reset_index(drop=True, inplace=True)
merged_table19 = pd.concat([h19,c19],axis=1)
merged_tables.append(merged_table19)
sheet_titles.append(30)

Currently I'm just explicitly repeating the same code for every new dataframe I want to create, only changing the conditions and variable names, as I don't know how to wrap my head around writing a for loop for it and reducing the amount of code.

Basically, what always changes for each new dataframe are the starting characters in the first condition, the code number in the second, and the sheet title that's appended to the list. All the other operations (summing a column, resetting the index, merging the selected tables and appending to the list) always remain the same.

If it were a csv file or data stored in different lists, I would just make a for loop with many elifs, but as it's a pandas dataframe, where you usually access a whole column instead of individual elements, I don't know how to write it efficiently; I know that defining that many variables and repeating that much code isn't very efficient.

I know I have to declare those changing conditions anyway, but wrapping them up in a concise for loop or function would definitely make the code more maintainable and scalable.
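
Since the only things that change between blocks are the document prefix, the code number, and the sheet title, one way is to put those triples in a list and loop over it. A sketch under assumptions: it reuses the question's column names (`Document`, `Symbol `, `Code `, `Amount `, trailing spaces included), with tiny made-up frames standing in for the two CSVs:

```python
import pandas as pd

# Made-up stand-ins for the question's df_c and df_h
df_c = pd.DataFrame({
    "Document": ["AB123", "CD456", "EF789"],
    "Symbol ": ["ARD", "ARD", "ARD"],
    "Amount ": [10.0, 20.0, 30.0],
})
df_h = pd.DataFrame({
    "Code ": [7, 23, 30],
    "Amount ": [1.0, 2.0, 3.0],
})

# (document prefix, code number, sheet title) -- the only parts that vary
specs = [("AB", 7, 7), ("CD", 23, 23), ("EF", 30, 30)]

merged_tables, sheet_titles = [], []
for prefix, code, title in specs:
    c = df_c[df_c["Document"].str.startswith(prefix) & (df_c["Symbol "] == "ARD")].copy()
    h = df_h[df_h["Code "] == code].copy()
    # add a 'Total' row summing the Amount column, then tidy the index
    h.at["Total", "Amount "] = h["Amount "].sum()
    c.at["Total", "Amount "] = c["Amount "].sum()
    h.reset_index(drop=True, inplace=True)
    c.reset_index(drop=True, inplace=True)
    merged_tables.append(pd.concat([h, c], axis=1))
    sheet_titles.append(title)
```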

Top answer
1 of 1
3

First: I think you want the product functionality, not zip, since you are checking every df against every ref. With zip, you would only check df_a with ref_1 and df_b with ref_2.

Second: You can look at the equation $(1+2+3+4)-(5+5+5+5)$ as $(1-5) + (2-5) + \dots$, which is simply subtracting the data frames and summing over columns.

With these two considerations in mind, assuming you have defined your objects as follows:

df_a = {
    'name': 'df_a',
    'value': pd.DataFrame([[1, 2, 3, 4], [2, 4, 6, 8]])
}
df_b = {
    'name': 'df_b',
    'value': pd.DataFrame([[10, 5, 2, 1], [4, 4, 6, 2]])
}

ref_1 = {
    'name': 'ref_1',
    'value': pd.DataFrame([[5, 5, 5, 5], [5, 5, 5, 5]])
}
ref_2 = {
    'name': 'ref_b',
    'value': pd.DataFrame([[3, 3, 3, 3], [3, 3, 3, 3]])
}

I did this because I want to use the names when creating the column names of your final df. Then your code would be:

from itertools import product

final_result = pd.DataFrame(
    {
        '{}_{}'.format(df['name'], ref['name']): (df['value']-ref['value']).sum(axis=1)
        for (df, ref) in product([df_a, df_b], [ref_1, ref_2])
    }
)
  • I have used a dictionary comprehension to skip the ugly loop/append solution.
  • The product function from itertools does the iteration: product on (ab, cd) gives you ac, ad, bc, bd.
  • As for keys, the df names are joined with _; as for values, I have subtracted the two dfs and summed over columns (axis=1).

The result would then be as you expect:

   df_a_ref_1  df_a_ref_b  df_b_ref_1  df_b_ref_b
0         -10          -2          -2           6
1           0           8          -4           4

Still, if you want to expand the dictionary comprehension, or do not want to define the dictionaries of names/values, you can of course write simple for loops with the same logic:

for (df, ref) in product([df_a, df_b], [ref_1, ref_2]):
    # your desired column
    col = (df['value'] - ref['value']).sum(axis=1)
🌐
Kaggle
kaggle.com › questions-and-answers › 93779
Looping multiple dataframes?
🌐
Stack Overflow
stackoverflow.com › questions › 71979137 › create-multiple-data-frame-by-for-loop-with-new-name-base-on-another-data-frame
python - create multiple Data frame by for loop with new name, base on another data frame - Stack Overflow
... Aside - Do not flood your global environment with many similarly structured data frames. Instead, create single list or dict of many data frame elements. Also avoid growing a data frame in a loop.
🌐
Stack Overflow
stackoverflow.com › questions › 68579646 › multiple-dataframes-with-different-names-using-for-loop
python - multiple dataframes, with different names, using for loop - Stack Overflow
I'm trying to create 4 dataframes with a unique id, for example df1, df2, ..., df4, and each one should contain data from a main dataframe. For example, the main df looks like this id ...
🌐
Stack Overflow
stackoverflow.com › questions › 58872310 › rename-and-create-a-dataframe-inside-a-for-loop
python - Rename and create a dataframe inside a For loop? - Stack Overflow
To do this, you could use a dictionary: import pandas as pd df_dict = {} name = ['jan','feb'] for month in name: df_dict[month] = pd.DataFrame([month]) for month in name: print("key: ", month) print("dataframe:") print(df_dict[month], end='\n\n')
🌐
Stack Overflow
stackoverflow.com › questions › 70069743 › create-multiple-dataframes-with-a-for-loop
python - Create multiple DataFrames with a for loop - Stack Overflow
@KrishnaGupta "I'm trying to create code that will output a single-row DataFrame with a column showing the percentage difference of the current price, to the median value for lag (in days) in the 'cutoffs' array." (expected output table directly below that)