Just to underline my comment to @maxymoo's answer, it's almost invariably a bad idea ("code smell") to add names dynamically to a Python namespace. There are a number of reasons, the most salient being:

  1. Created names might easily conflict with variables already used by your logic.

  2. Since the names are dynamically created, you typically also end up using dynamic techniques to retrieve the data.

This is why dicts were included in the language. The correct way to proceed is:

d = {}
for name in companies:
    d[name] = pd.DataFrame()

Nowadays you can write a single dict comprehension expression to do the same thing, but some people find it less readable:

d = {name: pd.DataFrame() for name in companies}

Once d is created the DataFrame for company x can be retrieved as d[x], so you can look up a specific company quite easily. To operate on all companies you would typically use a loop like:

for name, df in d.items():
    # operate on DataFrame 'df' for company 'name'
    ...
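
As a minimal sketch of the pattern above, assuming a hypothetical `companies` list and made-up placeholder rows (neither comes from the original question):

```python
import pandas as pd

# Hypothetical company names, used only for illustration.
companies = ["AA", "AAPL", "BA"]

# One DataFrame per company, keyed by name (placeholder data).
d = {name: pd.DataFrame({"company": [name], "price": [0.0]}) for name in companies}

# Look up a single company by its key.
aapl = d["AAPL"]

# Operate on every company in one loop.
row_counts = {name: len(df) for name, df in d.items()}
print(row_counts)
```

The keys are ordinary strings, so they cannot collide with variable names elsewhere in the program.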

In Python 2 you were better off writing

for name, df in d.iteritems():

because this avoids instantiating the list of (name, df) tuples that .items() creates in the older version. That's now largely of historical interest, though there will of course be Python 2 applications still extant and requiring (hopefully occasional) maintenance.

Answer from holdenweb on Stack Overflow
🌐
AskPython
askpython.com › home › multiple dataframes in a loop using python
Multiple Dataframes in a Loop Using Python - AskPython
March 31, 2023 - So, after printing the dictionary, we can see that empty dataframes are created for each element of the list. Here, we have not entered any data for each column so it’ll be printed as empty data columns. ... This way, we can create multiple data frames using a loop in Python language.
🌐
IncludeHelp
includehelp.com › python › create-multiple-dataframes-in-loop.aspx
Create multiple dataframes in loop in Python
October 3, 2022 - Python Loops: Loop is functionality ... a for loop to create DataFrames. Python Dictionaries: Dictionaries are used to store heterogeneous data. The data is stored in key:value pair. A dictionary is a collection that is mutable and ordered in nature and does not allow duplicates which mean there are unique keys in a dictionary. A dictionary key can have any type of data as its value, for example, a list, tuple, string, or dictionary itself. Write a Python program to create multiple dataframes ...
Top answer
1 of 3
3

I think you think your code is doing something that it is not actually doing.

Specifically, this line: df = pd.read_csv(file)

You might think that on each iteration of the for loop this line is rewritten, with df replaced by a string from dfs and file replaced by a filename from files. The latter is true; the former is not.

Each iteration of the for loop reads a csv file and stores it in the variable df, overwriting the DataFrame that was read during the previous iteration. In other words, df in your for loop is not being replaced with the variable names you defined in dfs.

The key takeaway here is that strings (e.g., 'df1', 'df2', etc.) cannot be substituted and used as variable names when executing code.

One way to achieve the result you want is to store each csv file read by pd.read_csv() in a dictionary, where the key is the name of the dataframe (e.g., 'df1', 'df2', etc.) and the value is the dataframe returned by pd.read_csv().

list_of_dfs = {}
for df, file in zip(dfs, files):
    list_of_dfs[df] = pd.read_csv(file)
    print(list_of_dfs[df].shape)
    print(list_of_dfs[df].dtypes)
    print(list(list_of_dfs[df]))

You can then reference each of your dataframes like this:

print(list_of_dfs['df1'])
print(list_of_dfs['df2'])

You can learn more about dictionaries here:

https://docs.python.org/3.6/tutorial/datastructures.html#dictionaries
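
For a version that runs without any files on disk, here is a small sketch of the same idea. The `fake_files` mapping and its contents are made up for illustration; in real code the values would be file paths.

```python
import io
import pandas as pd

# Stand-ins for CSV files on disk; names and contents are hypothetical.
fake_files = {
    "df1": io.StringIO("a,b\n1,2\n3,4\n"),
    "df2": io.StringIO("a,b\n5,6\n"),
}

frames = {}
for name, handle in fake_files.items():
    # pd.read_csv accepts a path or a file-like object.
    frames[name] = pd.read_csv(handle)

print(frames["df1"].shape)
print(frames["df2"].shape)
```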

2 of 3
3

Use a dictionary to store your DataFrames and access them by name

files = ('data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv')
dfs_names = ('df1', 'df2', 'df3', 'df4', 'df5', 'df6')
dfs ={}
for dfn,file in zip(dfs_names, files):
    dfs[dfn] = pd.read_csv(file)
    print(dfs[dfn].shape)
    print(dfs[dfn].dtypes)
print(dfs['df3'])

Use a list to store your DataFrames and access them by index

files = ('data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv')
dfs = []
for file in files:
    dfs.append(pd.read_csv(file))
    print(dfs[-1].shape)
    print(dfs[-1].dtypes)
print(dfs[2])

Do not store the intermediate DataFrames; just process each one and add it to the resulting DataFrame.

files = ('data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv')
df = pd.DataFrame()
for file in files:
    df_n = pd.read_csv(file)
    print(df_n.shape)
    print(df_n.dtypes)
    # do what you want to do
    df = pd.concat([df, df_n])  # DataFrame.append was removed in pandas 2.0
print(df)
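
A common variation is to collect the pieces in a list and concatenate once at the end, which avoids copying the growing result on every iteration. The sketch below uses in-memory stand-ins for the CSV files; their contents are made up.

```python
import io
import pandas as pd

# In-memory stand-ins for the CSV files (hypothetical contents).
sources = [
    io.StringIO("x,y\n1,10\n2,20\n"),
    io.StringIO("x,y\n3,30\n"),
]

parts = []
for src in sources:
    part = pd.read_csv(src)
    # per-file processing would go here
    parts.append(part)

# One concatenation at the end; ignore_index renumbers the rows from 0.
combined = pd.concat(parts, ignore_index=True)
print(combined)
```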

If you will process them differently, you do not need a general structure to store them. Simply handle each one independently.

df = pd.DataFrame()
def do_general_stuff(d): # common operations on a DataFrame go here
    print(d.shape, d.dtypes)

df1 = pd.read_csv("data1.csv")
# do what you want with df1

do_general_stuff(df1)
df = pd.concat([df, df1])  # DataFrame.append was removed in pandas 2.0
del df1

df2 = pd.read_csv("data2.csv")
# do what you want with df2

do_general_stuff(df2)
df = pd.concat([df, df2])
del df2

df3 = pd.read_csv("data3.csv")
# do what you want with df3

do_general_stuff(df3)
df = pd.concat([df, df3])
del df3

# ... and so on

And one geeky way, but don't ask how it works:)

from collections import namedtuple
files = ['data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv']

df = namedtuple('Cdfs',
                ['df1', 'df2', 'df3', 'df4', 'df5', 'df6']
               )(*[pd.read_csv(file) for file in files])

for df_n in df._fields:
    print(getattr(df, df_n).shape,getattr(df, df_n).dtypes)

print(df.df3)
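
For readers who do want to know how the namedtuple version works, here is a smaller sketch of the same mechanism. The `Frames` name and the tiny DataFrames are made up and stand in for the `pd.read_csv` results.

```python
from collections import namedtuple
import pandas as pd

# namedtuple(...) builds a new tuple subclass with one named field per entry.
Frames = namedtuple("Frames", ["df1", "df2"])

# Calling the class fills the fields positionally, just like the
# (*[...]) call in the answer above.
frames = Frames(pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [3]}))

# Fields are reachable by attribute, by position, or via _fields + getattr.
shapes = {name: getattr(frames, name).shape for name in frames._fields}
print(shapes)
print(frames.df2 is frames[1])
```

Unlike a dict, the result is immutable: fields cannot be added or rebound after creation.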
🌐
Posit Community
forum.posit.co › general
Create a for loop to make multiple data frames? - General - Posit Community
June 3, 2021 - I'm having trouble making a loop that will iterate through my data and create multiple data frames. Here's some dummy data: mydf <- data.frame("color"=c("blue","yellow","red","green","pink","orange","cyan"), …
🌐
Databricks Community
community.databricks.com › t5 › data-engineering › python-generate-new-dfs-from-a-list-of-dataframes-using-for-loop › td-p › 21650
Python: Generate new dfs from a list of dataframes using for loop
December 2, 2022 - I have a list of dataframes (for this example 2) and want to apply a for-loop to the list of frames to generate 2 new dataframes. To start, here is my starting dataframe called df_final: First, I create 2 dataframes: df2_b2c_fast, df2_b2b_fast: for x in df_final['b2b_b2c_prod'].unique(): local...
🌐
Cnasolution
cnasolution.com › questions › 196553 › create-multiple-dataframes-in-loop
cna solution | Create multiple dataframes in loop
February 20, 2020 - To operate on all companies you would typically use a loop like: for name, df in d.items(): # operate on DataFrame 'df' for company 'name' In Python 2 you are better writing for name, df in d.iteritems(): because this avoids instantiating a list of (name, df) tuples. Adding to the above great answers. The above will work flawless if you need to create empty data frames but if you need to create multiple dataframe based on some filtering: Suppose the list you got is a column of some dataframe and you want to make multiple data frames for each unique companies fro the bigger data frame:- First t
🌐
Kaggle
kaggle.com › questions-and-answers › 93779
Looping multiple dataframes?
🌐
Reddit
reddit.com › r/learnpython › python looping multiple dataframes
r/learnpython on Reddit: Python Looping multiple dataframes
November 3, 2021 -

I am learning Python and am having trouble accessing data from multiple dataframes.

I want to make multiple bar plots from different dataframes. All of the dataframes have the same columns. So I thought, instead of writing the code for each one, maybe I could somehow iterate through the dataframes. But I haven't found the right way to do it. Could anyone advise me? I am curious whether it can be done in one go instead of writing it out for every dataframe.

Top answer
1 of 3
2

This should do it:

# note: writing to locals() only takes effect at module level, not inside a function
for i in province_id:
    for j in year:
        locals()['sub_data_{}_{}'.format(i,j)] = data[(data.provid==i) & (data.wave==j)]

I initially suggested using exec, which is not usually considered best practice for safety reasons. Having said so, if your code is not exposed to anyone with malicious intentions, it should be OK, and I'll leave it here for the sake of completeness:

for i in province_id:
    for j in year:
        exec("sub_data_{}_{} = data[(data.provid==i) & (data.wave==j)]".format(i,j))  # exec is a function in Python 3

Nevertheless, for most use cases, it's probably better to use a collection of some sort, e.g. a dictionary, because it will be cumbersome to reference dynamically generated variable names in subsequent parts of your code. It's also a one-liner:

data_dict = {key:g for key,g in data.groupby(['provid','wave'])}
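
To make the one-liner concrete, here is a small sketch on made-up data with the same two grouping columns. Note that grouping on two columns produces tuple keys.

```python
import pandas as pd

# Hypothetical data with the two columns the answer groups on.
data = pd.DataFrame({
    "provid": ["a", "a", "b", "b"],
    "wave": [2004, 2005, 2004, 2004],
    "value": [1, 2, 3, 4],
})

# Each key is a (provid, wave) tuple; each value is the matching sub-frame.
data_dict = {key: g for key, g in data.groupby(["provid", "wave"])}

print(sorted(data_dict))
print(data_dict[("b", 2004)])
```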
2 of 3
2

I think the best approach is to create a dictionary of DataFrames with groupby, filtering first with boolean indexing:

df = pd.DataFrame({'A':list('abcdef'),
                   'wave':[2004,2005,2004,2005,2005,2004],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'provid':list('aaabbb')})

print (df)
   A  C  D  E provid  wave
0  a  7  1  5      a  2004
1  b  8  3  3      a  2005
2  c  9  5  6      a  2004
3  d  4  7  9      b  2005
4  e  2  1  2      b  2005
5  f  3  0  4      b  2004


province_id = ['a','b']
year = [2004]
df = df[(df.provid.isin(province_id)) &(df.wave.isin(year))]
print (df)
   A  C  D  E provid  wave
0  a  7  1  5      a  2004
2  c  9  5  6      a  2004
5  f  3  0  4      b  2004

dfs = {'{0[0]}_{0[1]}'.format(i) : x for i, x in df.groupby(['provid','wave'])}

Another solution:

dfs = dict(tuple(df.groupby(df['provid'] + '_' + df['wave'].astype(str))))

print (dfs)
{'a_2004':    A  C  D  E provid  wave
0  a  7  1  5      a  2004
2  c  9  5  6      a  2004, 'b_2004':    A  C  D  E provid  wave
5  f  3  0  4      b  2004}

Finally, you can select each DataFrame:

print (dfs['b_2004'])
   A  C  D  E provid  wave
5  f  3  0  4      b  2004

Your code should be changed to:

sub_data = {}
province_id = ['a','b']
year = [2004]
for i in province_id:
    for j in year:
         sub_data[i + '_' + str(j)] = df[(df.provid==i) &(df.wave==j)]

print (sub_data)
{'a_2004':    A  C  D  E provid  wave
0  a  7  1  5      a  2004
2  c  9  5  6      a  2004, 'b_2004':    A  C  D  E provid  wave
5  f  3  0  4      b  2004}
🌐
Medium
medium.com › @zeebrockeraa › create-list-of-dataframes-for-loop-python-b64acb9369e2
Create List of dataframes For Loop Python | by Zeeshan Ali | Medium
July 15, 2023 - Create List of dataframes For Loop Python To create a list of DataFrames in Python, you can use the pandas library. Here’s an example of how you can create a list of DataFrames and then loop …
Top answer
1 of 1
1

IIUC, I was able to achieve what you wanted.

import pandas as pd
import numpy as np

# source data for the dataframe
data = {
"ID":["x","y","z","x","y","z","x","y","a","b","x"],
"Date":["May 01","May 02","May 04","May 01","May 01","May 02","May 01","May 05","May 06","May 08","May 10"],
"Amount":[10,20,30,40,50,60,70,80,90,100,110]
}

df = pd.DataFrame(data)

# convert the Date column to datetime and still maintain the format like "May 01"
df['Date'] = pd.to_datetime(df['Date'], format='%b %d').dt.strftime('%b %d')

# sort the values on ID and Date
df.sort_values(by=['ID', 'Date'], inplace=True)
df.reset_index(inplace=True, drop=True)

print(df)

Original Dataframe:

    Amount    Date ID
0       90  May 06  a
1      100  May 08  b
2       10  May 01  x
3       40  May 01  x
4       70  May 01  x
5      110  May 10  x
6       50  May 01  y
7       20  May 02  y
8       80  May 05  y
9       60  May 02  z
10      30  May 04  z


# create a list of unique ids
list_id = sorted(list(set(df['ID'])))

# create an empty list that would contain dataframes
df_list = []

# count of iterations that must be separated out
# for example if we want to record 3 entries for 
# each id, the iter would be 3. This will create
# three new dataframes that will hold transactions
# respectively. 
iter = 3
for i in range(iter):
    df_list.append(pd.DataFrame())


for val in list_id:
    tmp_df = df.loc[df['ID'] == val].reset_index(drop=True)

    # consider only the top iter(=3) values to be distributed
    counter = np.minimum(tmp_df.shape[0], iter)
    for idx in range(counter):
        # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
        df_list[idx] = pd.concat([df_list[idx], tmp_df.loc[tmp_df.index == idx]])

for df in df_list:
    df.reset_index(drop=True, inplace=True)
    print(df)

Transaction #1:

   Amount    Date ID
0      90  May 06  a
1     100  May 08  b
2      10  May 01  x
3      50  May 01  y
4      60  May 02  z

Transaction #2:

   Amount    Date ID
0      40  May 01  x
1      20  May 02  y
2      30  May 04  z

Transaction #3:

   Amount    Date ID
0      70  May 01  x
1      80  May 05  y

Note that in your data there are four transactions for 'x'. If, say, you wanted to track the fourth transaction as well, all you need to do is change the value of 'iter' to 4, and you will get a fourth dataframe with the following value:

   Amount    Date ID
0     110  May 10  x
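
A different technique that reaches a similar split without the nested loops is `groupby(...).cumcount()`. This is a sketch, not the method used above: it numbers rows in their original order instead of sorting by ID and Date first, so the contents of each piece can differ from the output shown. The `by_occurrence` name is made up.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["x", "y", "z", "x", "y", "z", "x", "y", "a", "b", "x"],
    "Amount": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110],
})

# cumcount numbers each row within its ID group: 0 for the first
# occurrence of an ID, 1 for the second, and so on.
occurrence = df.groupby("ID").cumcount()

# One DataFrame per occurrence number, keyed 1, 2, 3, ...
by_occurrence = {
    int(n) + 1: part.reset_index(drop=True)
    for n, part in df.groupby(occurrence)
}

print(by_occurrence[4])
```

The number of pieces follows from the data, so there is no counter to adjust when an ID has more transactions.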
🌐
Stack Overflow
stackoverflow.com › questions › 61490942 › python-import-multiple-dataframes-using-for-loop
pandas - Python: Import multiple dataframes using for loop - Stack Overflow
I have the following code which works to import a dataframe. #read tblA tbl = 'a' cols = 'imp_a' usecols = dfDD[dfDD[cols].notnull()][cols].values.tolist() dfa = getdf(tbl, dfRT, sfsession) dfa =...
🌐
Stack Overflow
stackoverflow.com › questions › 70069743 › create-multiple-dataframes-with-a-for-loop
python - Create multiple DataFrames with a for loop - Stack Overflow
I'm sorry to say this is where I'm up to. Could use of dicts help here? If there's any other info I can provide please let me know :) python · pandas · dataframe · for-loop · Share · Improve this question · Follow · edited Nov 23, 2021 at 8:22 · asked Nov 22, 2021 at 17:08 ·
Top answer
1 of 1
3

First: I think you want the product functionality, not zip, since you are checking every df with every ref. In zip, you would check df_a with ref_1 and df_b with ref_2 only.

Second: You can look at the equation $(1+2+3+4)-(5+5+5+5)$ as $(1-5) + (2-5) + ...$, which is simply subtracting the data frames and summing over columns.

With these two considerations, assuming you have defined your objects as follows:

df_a = {
    'name': 'df_a',
    'value': pd.DataFrame([[1, 2, 3, 4], [2, 4, 6, 8]])
}
df_b = {
    'name': 'df_b',
    'value': pd.DataFrame([[10, 5, 2, 1], [4, 4, 6, 2]])
}

ref_1 = {
    'name': 'ref_1',
    'value': pd.DataFrame([[5, 5, 5, 5], [5, 5, 5, 5]])
}
ref_2 = {
    'name': 'ref_b',
    'value': pd.DataFrame([[3, 3, 3, 3], [3, 3, 3, 3]])
}

I did this because I want to use the names in creating the name of the columns of your final df. Then your code would be:

from itertools import product

final_result = pd.DataFrame(
    {
        '{}_{}'.format(df['name'], ref['name']): (df['value']-ref['value']).sum(axis=1)
        for (df, ref) in product([df_a, df_b], [ref_1, ref_2])
    }
)
  • I have used a dictionary comprehension to skip the ugly loop/append solution.
  • The product function from itertools does your iteration. product on (ab, cd) gives you ac, ad, bc, bd.
  • As for keys, the df names are joined together with _, and as for values, I have subtracted the two dfs and summed over columns (axis=1).

The result would then be as you expect:

   df_a_ref_1  df_a_ref_b  df_b_ref_1  df_b_ref_b
0         -10          -2          -2           6
1           0           8          -4           4

Still, if you want to expand the dictionary comprehension, or do not want to define dictionaries of names and values, you can write simple for loops with the same logic:

# here df_a, df_b, ref_1 and ref_2 are plain DataFrames, not the name/value dicts above
for (df, ref) in product([df_a, df_b], [ref_1, ref_2]):
    # your desired columns
    col = (df - ref).sum(axis=1)
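
A complete, runnable form of that loop might look like the sketch below. It assumes the frames are kept in two ordinary dicts, `dfs` and `refs` (names chosen here for illustration), with the same values as above.

```python
from itertools import product
import pandas as pd

# Plain DataFrames keyed by name; the dict names are hypothetical.
dfs = {
    "df_a": pd.DataFrame([[1, 2, 3, 4], [2, 4, 6, 8]]),
    "df_b": pd.DataFrame([[10, 5, 2, 1], [4, 4, 6, 2]]),
}
refs = {
    "ref_1": pd.DataFrame([[5, 5, 5, 5], [5, 5, 5, 5]]),
    "ref_2": pd.DataFrame([[3, 3, 3, 3], [3, 3, 3, 3]]),
}

columns = {}
# product pairs every (name, frame) in dfs with every (name, frame) in refs.
for (df_name, df), (ref_name, ref) in product(dfs.items(), refs.items()):
    columns[f"{df_name}_{ref_name}"] = (df - ref).sum(axis=1)

final_result = pd.DataFrame(columns)
print(final_result)
```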