I think the best approach is to create a dict of objects; see "How do I create a variable number of variables?"
You can build a dict of DataFrames by converting the groupby object to a dict:
d = dict(tuple(df.groupby('month')))
print(d)
{1:    month dest
0      1    a
1      1   bb, 2:    month dest
2      2   cc
3      2   dd, 3:    month dest
4      3   ee, 4:    month dest
5      4   bb}
print(d[1])
   month dest
0      1    a
1      1   bb
Another solution:
for i, x in df.groupby('month'):
    globals()['dataframe' + str(i)] = x
print(dataframe1)
   month dest
0      1    a
1      1   bb
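For anyone who wants to run the groupby approach end to end, here is a minimal, self-contained sketch. The sample frame is an assumption reconstructed from the printed output above (columns month and dest), and the name frames_by_month is only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the printed output above (an assumption).
df = pd.DataFrame({
    "month": [1, 1, 2, 2, 3, 4],
    "dest": ["a", "bb", "cc", "dd", "ee", "bb"],
})

# One DataFrame per month, keyed by the month value.
frames_by_month = {month: group for month, group in df.groupby("month")}

print(sorted(frames_by_month))  # the month keys
print(frames_by_month[1])       # only the rows where month == 1
```

Looking a frame up by key (frames_by_month[1]) replaces the need for a variable such as dataframe1.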
Answer from jezrael on Stack Overflow
I'm going to pseudo-code this out; perhaps somebody has encountered this sort of issue before. I have not had luck reading through Stack Overflow posts.
I have a list of months and a df for each month with data that includes delivery volume and a time. These are named like 'df_1701_unfiltered'.
I previously hardcoded my query logic, but I'm on mobile at the moment, so please disregard the pseudo-code aspect; that's not what I'm worried about.
I want to create a new, separate dataframe for each month that is a filtered version of the original. Here is my thought process.
months = ['1701', '1702', '1703']
for month in months:
    "df_" + month + "_filtered" = "df_" + month + "_unfiltered".query("time > start and time < end")
I'm able to do something similar within a single dataframe using .apply to create dynamic columns. For entire dataframes, the loop above throws a "cannot assign to operator" error each time.
Any idea how I can do this for entire dataframes?
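One way to approach this, sketched under assumptions: keep the monthly frames in a dict keyed by the month string instead of in separately named variables, then build a second dict of filtered frames. The sample rows, the numeric time column, and the start and end bounds are all hypothetical stand-ins for the real data, and the boolean mask is meant to express the same condition as the query string in the question.

```python
import pandas as pd

months = ["1701", "1702", "1703"]

# Hypothetical stand-ins for df_1701_unfiltered, df_1702_unfiltered, ...
unfiltered = {
    month: pd.DataFrame({"time": [1, 5, 9], "volume": [10, 20, 30]})
    for month in months
}

start, end = 2, 8  # assumed bounds; replace with the real window

# Keep only the rows with start < time < end, one filtered frame per month.
filtered = {
    month: frame[(frame["time"] > start) & (frame["time"] < end)]
    for month, frame in unfiltered.items()
}

print(filtered["1701"])
```

The filtered frame for a month is then filtered["1701"] rather than a variable called df_1701_filtered.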
You can use a list of dataframes:
dataframe = []
dataframe.append(None)  # placeholder at index 0, so dataframe[1] holds the first group
group = org_dataframe.groupby('month')
for n, g in group:
    dataframe.append(g)
dataframe[1]
Output:
   month dest
0      1    a
1      1   bb
dataframe[2]
Output:
   month dest
2      2   cc
3      2   dd
Just to underline my comment to @maxymoo's answer, it's almost invariably a bad idea ("code smell") to add names dynamically to a Python namespace. There are a number of reasons, the most salient being:
- Created names might easily conflict with variables already used by your logic.
- Since the names are dynamically created, you typically also end up using dynamic techniques to retrieve the data.
This is why dicts were included in the language. The correct way to proceed is:
d = {}
for name in companies:
    d[name] = pd.DataFrame()
Nowadays you can write a single dict comprehension expression to do the same thing, but some people find it less readable:
d = {name: pd.DataFrame() for name in companies}
Once d is created the DataFrame for company x can be retrieved as d[x], so you can look up a specific company quite easily. To operate on all companies you would typically use a loop like:
for name, df in d.items():
    # operate on DataFrame 'df' for company 'name'
    pass
In Python 2 you were better off writing
for name, df in d.iteritems():
because this avoids instantiating the list of (name, df) tuples that .items() creates in the older version. That's now largely of historical interest, though there will of course be Python 2 applications still extant and requiring (hopefully occasional) maintenance.
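As a concrete illustration of the dict-based pattern described above, here is a small sketch. The company names and the revenue rows are made up; only the pattern itself comes from the answer.

```python
import pandas as pd

companies = ["acme", "globex"]  # hypothetical company names

# One (initially empty) DataFrame per company, keyed by name.
d = {name: pd.DataFrame() for name in companies}

# Replace one entry with actual rows (made-up data).
d["acme"] = pd.DataFrame({"revenue": [100, 120]})

# Operate on every company's frame in a single loop.
row_counts = {name: len(df) for name, df in d.items()}
print(row_counts)
```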
You can do this (although obviously use exec with extreme caution if this is going to be public-facing code):
for c in companies:
    exec('{} = pd.DataFrame()'.format(c))
Creating variables with dynamic names is typically a bad practice.
I think the best solution for your problem is to store your dataframes in a dictionary and dynamically generate the key used to access each dataframe.
import copy
dict_of_df = {}
for ym in [201511, 201612, 201710]:
    key_name = 'df_new_' + str(ym)
    dict_of_df[key_name] = copy.deepcopy(df)
    to_change = df['YearMonth'] < ym
    dict_of_df[key_name].loc[to_change, 'new_col'] = ym
dict_of_df.keys()
Out[36]: ['df_new_201710', 'df_new_201612', 'df_new_201511']
dict_of_df
Out[37]:
{'df_new_201511':     A    B  ID                        t  YearMonth  new_col
0  -a    a   1  2016-12-05 07:53:35.943     201612   201612
1   1  NaN   2  2016-12-05 07:53:35.943     201612   201612
2   a    c   2  2016-12-05 07:53:35.943     201612   201612,
 'df_new_201612':     A    B  ID                        t  YearMonth  new_col
0  -a    a   1  2016-12-05 07:53:35.943     201612   201612
1   1  NaN   2  2016-12-05 07:53:35.943     201612   201612
2   a    c   2  2016-12-05 07:53:35.943     201612   201612,
 'df_new_201710':     A    B  ID                        t  YearMonth  new_col
0  -a    a   1  2016-12-05 07:53:35.943     201612   201710
1   1  NaN   2  2016-12-05 07:53:35.943     201612   201710
2   a    c   2  2016-12-05 07:53:35.943     201612   201710}
# Extract a single dataframe
df_2015 = dict_of_df['df_new_201511']
There is an easier way to accomplish this using exec. The following steps create a dataframe at runtime.
1. Create the source dataframe with some sample values.
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': ['-a', 1, 'a'],
                   'B': ['a', np.nan, 'c'],
                   'ID': [1, 2, 2]})
2. Assign a variable that holds the new dataframe name. You can even pass this value as a parameter or set it in a loop.
new_df_name = 'df_201612'
3. Use exec to copy the source dataframe into a new dataframe with that name, then assign a value to a new column on the next line.
exec(f'{new_df_name} = df.copy()')
exec(f'{new_df_name}["new_col"] = 123')
4. The dataframe df_201612 is now available in memory, and you can verify this by printing it via eval.
print(eval(new_df_name))
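For comparison, the same result can be reached without exec or eval by using the generated name as a dictionary key. This is a sketch of that alternative rather than part of the answer above; the dict name frames is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["-a", 1, "a"],
                   "B": ["a", np.nan, "c"],
                   "ID": [1, 2, 2]})

new_df_name = "df_201612"

frames = {}                           # maps a name to its DataFrame
frames[new_df_name] = df.copy()       # copy, so the source frame stays untouched
frames[new_df_name]["new_col"] = 123  # the new column is added to the copy only

print(frames[new_df_name])
```

Because the name is just a string key, no code is built or executed from text, which sidesteps the caution that exec calls for.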
You can use a dictionary (here with dummy values):
names = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth']
pvalues = {}
for i in range(len(names)):
    pvalues["pvalues_" + names[i]] = i + 1
print(pvalues)
Output:
{'pvalues_first': 1, 'pvalues_second': 2, 'pvalues_third': 3, 'pvalues_fourth': 4, 'pvalues_fifth': 5, 'pvalues_sixth': 6}
To access pvalues_third and, for example, assign it a new value:
pvalues["pvalues_third"] = 20
print(pvalues)
Output:
{'pvalues_first': 1, 'pvalues_second': 2, 'pvalues_third': 20, 'pvalues_fourth': 4, 'pvalues_fifth': 5, 'pvalues_sixth': 6}
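The same pattern should carry over when the stored values are DataFrames instead of integers. The sketch below assumes a hypothetical helper, placeholder_pvalues, that stands in for the p-values of a fitted model; its numbers are placeholders so that the block runs without any modelling library.

```python
import pandas as pd

names = ["first", "second", "third"]

def placeholder_pvalues(i):
    # Hypothetical stand-in for the p-values of a fitted model;
    # the numbers are placeholders, not real results.
    return {"Intercept": [0.01 * (i + 1)], "cap_type": [0.02 * (i + 1)]}

pvalue_frames = {}
for i, name in enumerate(names):
    # The dynamic part of the name becomes the dictionary key.
    pvalue_frames["pvalues_" + name] = pd.DataFrame(placeholder_pvalues(i))

print(list(pvalue_frames))
print(pvalue_frames["pvalues_first"])
```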
count = 0
dataframe = []
# loop through the three datasets (in reality I have many more than three)
names = ["first", "second", "third"]
for feature in feature_cols:
    # define the model and fit it
    mod = smf.ols(formula='Q(feature)'+'~material', data=dataset)
    res = mod.fit()
    # create a dataframe of the pvalues
    # I would like to be able to dynamically name pvalues so that when looping through
    # the chemicals of the first dataframe it is called 'pvalues_first' and so on.
    name_str = "pvalues"+str(names[count])
    pvalues = {'Intercept':[res.pvalues[0]], 'cap_type':[res.pvalues[1]]}
    name_str = pd.DataFrame(pvalues)
    count += 1
# define list of fields to run match for
fieldlist = ['MATTER NUMBER','MATTER NAME','CLAIM NUMBER LISTING']
# loop through each field in fieldlist
for field in fieldlist:
    # define dfname as the field with spaces replaced with underscores
    dfname = '{}'.format(field.replace(' ','_'))
    # create df with dfname
    '{}'.format(dfname) = checkdf['{}'.format(field)].dropna()
The error is on the last line. I also tried:
'{}'.format(dfname) = checkdf['{}'.format(field)].dropna()
The best way is to create a dict with the dynamic names as keys:
chunks = {f'{sub}{i}':chunk for i, chunk in enumerate(np.array_split(df, 10))}
If you absolutely insist on creating the frames as individual variables, then you could assign them to the globals() dictionary, but this method is NOT advised:
for i, chunk in enumerate(np.array_split(df, 10)):
    globals()['{}{}'.format(sub, i)] = chunk
Why would you want to create variables in a loop?
- They are unnecessary: You can store everything in lists or any other type of collection
- They are hard to create and reuse: You have to use exec or globals()
Using a list is much easier:
subs = []
for chunk in np.array_split(df, 10):
    print(chunk.head(2))  # just to check
    print(chunk.tail(1))  # just to check
    subs.append(chunk.copy())
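Here is a runnable sketch of the dict-of-chunks idea from the answers above. The data and the prefix sub are made up (the original does not show how sub is defined). As a design choice of this sketch, it splits the row positions and slices with iloc rather than passing the frame itself to np.array_split.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": range(25)})  # made-up data
sub = "sub"                              # assumed prefix for the keys
n_chunks = 10

# Split the row positions into n_chunks groups, then slice the frame with iloc.
position_groups = np.array_split(np.arange(len(df)), n_chunks)
chunks = {f"{sub}{i}": df.iloc[pos] for i, pos in enumerate(position_groups)}

print(list(chunks)[:3])
print(len(chunks["sub0"]))
```

Each chunk is then available as chunks["sub0"], chunks["sub1"], and so on, without creating any variables dynamically.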
I think it is easy to handle the dataframes in a dictionary. Try the code below:
review_categories = ["beauty", "pet"]
reviews = {}
for review in review_categories:
    df_name = review + '_reviews'  # the name for the dataframe
    filename = "D:\\Library\\reviews_{}.json".format(review)
    reviews[df_name] = pd.read_json(path_or_buf=filename, lines=True)
In reviews, you will have a key with the respective dataframe to store the data. If you want to retrieve the data, just call:
reviews["beauty_reviews"]
Hope it helps.
You can first pack the files into a list
reviews = []
review_categories = ["beauty", "pet"]
for i in review_categories:
    filename = "D:\\Library\\reviews_{}.json".format(i)
    reviews.append(pd.read_json(path_or_buf=filename, lines=True))
and then unpack your results into the variable names you wanted:
beauty_reviews, pet_reviews = reviews
This isn't a Pythonic thing to do. Have you thought about creating a list of dataframes instead?
df = pd.DataFrame.copy(mef_list)
form = ['','_M3','_M6','_M9','_M12','_LN','_C']
list_of_df = list()
for i in range(0, len(form)):
    df = pd.DataFrame.copy(mef_list)
    df['Variable_new'] = df['Variable'] + str(form[i])
    list_of_df.append(df)
Then you can access 'df0' as list_of_df[0].
You also don't need to iterate through a range; you can just loop through the form list itself:
form = ['','_M3','_M6','_M9','_M12','_LN','_C']
list_of_df = list()
for i in form:
    df = pd.DataFrame.copy(mef_list)
    df['Variable_new'] = df['Variable'] + str(i)  # you can remove str() if everything in form is already a string
    list_of_df.append(df)
mef_list = ["UR", "CPI", "CEI", "Farm", "PCI", "durable", "C_CVM"]
form = ['', '_M3', '_M6', '_M9', '_M12', '_LN', '_C']
Variable_new = []
foo = 0
for variable in form:
    Variable_new.append(mef_list[foo] + variable)
    foo += 1
print(Variable_new)
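The counter-based loop above can also be written with zip, which pairs each name with its suffix directly. This is only an alternative sketch using the same two lists; the lowercase name variable_new is my own choice.

```python
mef_list = ["UR", "CPI", "CEI", "Farm", "PCI", "durable", "C_CVM"]
form = ["", "_M3", "_M6", "_M9", "_M12", "_LN", "_C"]

# zip pairs the i-th name with the i-th suffix and stops at the shorter list.
variable_new = [name + suffix for name, suffix in zip(mef_list, form)]
print(variable_new)
# ['UR', 'CPI_M3', 'CEI_M6', 'Farm_M9', 'PCI_M12', 'durable_LN', 'C_CVM_C']
```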