This isn't a Pythonic thing to do. Have you thought about creating a list of DataFrames instead?
form = ['', '_M3', '_M6', '_M9', '_M12', '_LN', '_C']
list_of_df = list()
for i in range(0, len(form)):
    df = pd.DataFrame.copy(mef_list)
    df['Variable_new'] = df['Variable'] + str(form[i])
    list_of_df.append(df)
Then you can access 'df0' as list_of_df[0].
You also don't need to iterate through a range; you can just loop over the form list itself:
form = ['', '_M3', '_M6', '_M9', '_M12', '_LN', '_C']
list_of_df = list()
for i in form:
    df = pd.DataFrame.copy(mef_list)
    df['Variable_new'] = df['Variable'] + str(i)  # you can remove str() if everything in form is already a string
    list_of_df.append(df)
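To make the idea concrete, here is a minimal runnable sketch of the list-of-DataFrames approach, assuming `mef_list` is a DataFrame with a `'Variable'` column (the values below are made up):

```python
import pandas as pd

# Toy stand-in for mef_list, assumed to be a DataFrame with a 'Variable' column
mef_list = pd.DataFrame({'Variable': ['UR', 'CPI']})

form = ['', '_M3', '_M6']
list_of_df = []
for suffix in form:
    df = mef_list.copy()  # independent copy per iteration
    df['Variable_new'] = df['Variable'] + suffix
    list_of_df.append(df)

print(list_of_df[1]['Variable_new'].tolist())  # ['UR_M3', 'CPI_M3']
```

Each element of `list_of_df` is a fresh copy, so modifying one does not affect the others.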
(Answer from TheHCA on Stack Overflow.)
mef_list = ["UR", "CPI", "CEI", "Farm", "PCI", "durable", "C_CVM"]
form = ['', '_M3', '_M6', '_M9', '_M12', '_LN', '_C']
Variable_new = []
foo = 0
for variable in form:
    Variable_new.append(mef_list[foo] + variable)
    foo += 1
print(Variable_new)
I have a large data frame from which I am creating smaller data frames. Basically, there is one giant data frame with different departments' information, and I am splitting it into smaller data frames: all the rows labeled purchasing go into one smaller data frame, engineering into another, accounting into another, and so on.
I created a while loop to go through the length of the large data frame, and I want to assign each smaller data frame its own name. The code below might explain it better.
n = 0
while n < len(df_dept_list.index):
    dept = df_dept_list.iloc[n]
    df_dept = df_all_data[df_all_data['Department'] == dept].dropna()
    n = n + 1
I would like the data frame name to change each iteration, like df_dept0, then df_dept1: something that changes with the value of n.
Any ideas how to?
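For context, the usual way to get this effect without numbered variable names is a dict keyed by department, built with `groupby`. A minimal sketch with made-up data (the column and department names are assumptions):

```python
import pandas as pd

# Made-up stand-in for df_all_data
df_all_data = pd.DataFrame({
    'Department': ['purchasing', 'engineering', 'purchasing'],
    'value': [1, 2, 3],
})

# One DataFrame per department, keyed by name, instead of df_dept0, df_dept1, ...
dept_frames = {dept: grp.dropna() for dept, grp in df_all_data.groupby('Department')}

print(dept_frames['purchasing']['value'].tolist())  # [1, 3]
```

`dept_frames['engineering']`, `dept_frames['purchasing']`, etc. then play the role of the individually named frames.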
Going to pseudo-code this out; perhaps somebody has encountered this sort of issue before. I have not had luck reading through Stack Overflow posts.
I have a list of months and a df for each month with data that includes delivery volume and a time. These named like 'df_1701_unfiltered'.
I previously hardcoded my query logic, but on mobile now. That's not what I'm worried about so please disregard the pseudo aspect (I'm on mobile atm).
I want to create a new, separate dataframe for each month that is a filtered version of the original. Here is my thought process.
months = ['1701', '1702', '1703']
for month in months:
    "df_" + month + "filtered" = "df" + month + "_unfiltered".query("time > start and time < end")
I'm able to do something similar within a single dataframe using .apply to create dynamic columns. This throws a "cannot assign to operator" error each time.
Any idea how I can do this for entire dataframes?
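One way to express the pseudocode above without assigning to dynamic variable names is a dict of frames, one per month. A sketch under assumed data and bounds (`time`, `volume`, `start`, `end` are placeholders):

```python
import pandas as pd

# Hypothetical unfiltered frames keyed by month string (stand-ins for df_1701_unfiltered, ...)
unfiltered = {
    '1701': pd.DataFrame({'time': [1, 5, 9], 'volume': [10, 20, 30]}),
    '1702': pd.DataFrame({'time': [2, 8], 'volume': [40, 50]}),
}

start, end = 1, 8  # placeholder bounds

# One filtered frame per month; @name references a local variable inside query()
filtered = {m: df.query('time > @start and time < @end') for m, df in unfiltered.items()}

print(filtered['1701']['volume'].tolist())  # [20]
```

`filtered['1701']` then stands in for the `df_1701_filtered` name the pseudocode was trying to create.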
Hello fellow strangers! I am trying to name pandas dataframe columns based on different years, from 2015 to 2025. I could do this manually like this:
import pandas as pd
a = pd.DataFrame
a['2015'] = "2015"
a['2016'] = "2016"
a['2017'] = "2017"
But I thought I could make it work with string formatting and a for loop:
import pandas as pd
years = 10
starting_year = 2015
for each in range(years):
    a = pd.DataFrame
    a['%s' % (starting_year + each + 1)] = starting_year + each + 1
But this throws me the error: TypeError: 'type' object does not support item assignment
I remember once reading that dictionaries can be used dynamically and something tells me I should use them for this specific problem but no insights yet... Anyone help??
I actually don't know if the issue is directly related to the years being ints. I think it's related to a being assigned to the Python 'type' itself rather than an actual empty data frame. To validate, you can just print(a) with your current code.
On mobile so I can’t test completely but try this assignment instead.
a = pd.DataFrame()
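With that fix in place, the loop from the question can be sketched like this (the placeholder column contents are an assumption; substitute real data as needed):

```python
import pandas as pd

years = 10
starting_year = 2015

a = pd.DataFrame()  # note the parentheses: an instance, not the class itself
for each in range(years):
    col = str(starting_year + each)   # '2015' through '2024'
    a[col] = pd.Series(dtype=object)  # empty placeholder column

print(list(a.columns))
```

Item assignment works once `a` is an actual DataFrame instance.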
Are you sure column names can be integers? I am fairly certain that column names need to be string datatypes.
I think it is easy to handle the dataframes in a dictionary. Try the codes below:
review_categories = ["beauty", "pet"]
reviews = {}
for review in review_categories:
    df_name = review + '_reviews'  # the name for the dataframe
    filename = "D:\\Library\\reviews_{}.json".format(review)
    reviews[df_name] = pd.read_json(path_or_buf=filename, lines=True)
In reviews, you will have a key with the respective dataframe to store the data. If you want to retrieve the data, just call:
reviews["beauty_reviews"]
Hope it helps.
You can first pack the files into a list
reviews = []
review_categories = ["beauty", "pet"]
for i in review_categories:
    filename = "D:\\Library\\reviews_{}.json".format(i)
    reviews.append(pd.read_json(path_or_buf=filename, lines=True))
and then unpack your results into the variable names you wanted:
beauty_reviews, pet_reviews = reviews
I guess you can achieve this with something simpler, like this:
df_list=[df1, df2, df3]
for i, df in enumerate(df_list, 1):
    df.columns = [col_name + '_df{}'.format(i) for col_name in df.columns]
If your DataFrames have prettier names you can try:
df_names=('Home', 'Work', 'Park')
for df_name in df_names:
    df = globals()[df_name]
    df.columns = [col_name + '_{}'.format(df_name) for col_name in df.columns]
Or you can fetch the name of each variable by looking up into globals() (or locals()) :
df_list = [Home, Work, Park]
for df in df_list:
    name = [k for k, v in globals().items() if id(v) == id(df) and k[0] != '_'][0]
    df.columns = [col_name + '_{}'.format(name) for col_name in df.columns]
My preferred rather simple way of doing this, especially when you want to apply some logic to all column names is:
for col in df.columns:
    df.rename(columns={col: col.upper().replace(" ", "_")}, inplace=True)
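The same effect can be had in a single assignment with a list comprehension, which avoids repeated `rename` calls. A minimal sketch (the column names are made up):

```python
import pandas as pd

df = pd.DataFrame(columns=['order id', 'unit price'])

# Same effect as the rename loop, in one assignment
df.columns = [col.upper().replace(' ', '_') for col in df.columns]

print(list(df.columns))  # ['ORDER_ID', 'UNIT_PRICE']
```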
Just to underline my comment to @maxymoo's answer, it's almost invariably a bad idea ("code smell") to add names dynamically to a Python namespace. There are a number of reasons, the most salient being:
Created names might easily conflict with variables already used by your logic.
Since the names are dynamically created, you typically also end up using dynamic techniques to retrieve the data.
This is why dicts were included in the language. The correct way to proceed is:
d = {}
for name in companies:
    d[name] = pd.DataFrame()
Nowadays you can write a single dict comprehension expression to do the same thing, but some people find it less readable:
d = {name: pd.DataFrame() for name in companies}
Once d is created the DataFrame for company x can be retrieved as d[x], so you can look up a specific company quite easily. To operate on all companies you would typically use a loop like:
for name, df in d.items():
    # operate on DataFrame 'df' for company 'name'
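Putting the pieces together, a minimal runnable sketch (the company names and `sales` column are made up for illustration):

```python
import pandas as pd

companies = ['acme', 'globex']

# One DataFrame per company, keyed by name
d = {name: pd.DataFrame({'sales': pd.Series(dtype=float)}) for name in companies}

# Look up a single company directly ...
d['acme'] = pd.DataFrame({'sales': [10, 20]})

# ... or operate on all of them in a loop
totals = {name: df['sales'].sum() for name, df in d.items()}
print(totals['acme'])  # 30
```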
In Python 2 you were better off writing
for name, df in d.iteritems():
because this avoids instantiating the list of (name, df) tuples
that .items() creates in the older version.
That's now largely of historical interest, though there will of
course be Python 2 applications still extant and requiring
(hopefully occasional) maintenance.
You can do this (although obviously use exec with extreme caution if this is going to be public-facing code)
for c in companies:
    exec('{} = pd.DataFrame()'.format(c))
Use a dictionary for organizing your dataframes, and groupby to split them. You can iterate through your groupby object with a dict comprehension.
Example:
>>> data
Sport random_data
0 soccer 0
1 soccer 3
2 football 1
3 football 1
4 soccer 4
frames = {i:dat for i, dat in data.groupby('Sport')}
You can then access your frames as you would any other dictionary value:
>>> frames['soccer']
Sport random_data
0 soccer 0
1 soccer 3
4 soccer 4
>>> frames['football']
Sport random_data
2 football 1
3 football 1
You can do this by modifying globals(), but that's not really advisable.
for S in Sports:
    globals()[str(S)] = data.loc[data['Sport'] == S]
Below is a self-contained example:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'sport':['football', 'football', 'tennis'],
'value':[1, 2, 3]})
In [3]: df
Out[3]:
sport value
0 football 1
1 football 2
2 tennis 3
In [4]: for name in df.sport.unique():
...: globals()[name] = df.loc[df.sport == name]
...:
In [5]: football
Out[5]:
sport value
0 football 1
1 football 2
While this is a direct answer to your question, I would recommend sacul's answer: dictionaries are meant for this (i.e. storing keys and values), and variable names inserted via globals() are usually not a good idea to begin with.
Imagine someone else, or your future self, reading your code: all of a sudden you are using football as a pd.DataFrame that was never explicitly defined. How are you supposed to know what is going on?
I think the best approach is to create a dict of objects; see How do I create a variable number of variables?
You can use dict of DataFrames by converting groupby object to dict:
d = dict(tuple(df.groupby('month')))
print (d)
{1: month dest
0 1 a
1 1 bb, 2: month dest
2 2 cc
3 2 dd, 3: month dest
4 3 ee, 4: month dest
5 4 bb}
print (d[1])
month dest
0 1 a
1 1 bb
Another solution:
for i, x in df.groupby('month'):
    globals()['dataframe' + str(i)] = x
print (dataframe1)
month dest
0 1 a
1 1 bb
You can use a list of dataframes:
dataframe = []
dataframe.append(None)  # placeholder so indexing starts at 1
group = org_dataframe.groupby('month')
for n, g in group:
    dataframe.append(g)
dataframe[1]
Output:
month dest
0 1 a
1 1 bb
dataframe[2]
Output:
month dest
2 2 cc
3 2 dd