I think the best approach is to create a dictionary of DataFrames:
d = {}
for i in range(12, 0, -1):
    d['t' + str(i)] = df.shift(i).add_suffix('_t' + str(i))
If you need to specify the columns first:
d = {}
cols = ['column1','column2']
for i in range(12, 0, -1):
    d['t' + str(i)] = df[cols].shift(i).add_suffix('_t' + str(i))
A dict comprehension solution:
d = {'t' + str(i): df.shift(i).add_suffix('_t' + str(i)) for i in range(12,0,-1)}
print (d['t10'])
column1_t10 column2_t10
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
5 NaN NaN
6 NaN NaN
7 NaN NaN
8 NaN NaN
9 NaN NaN
10 0.0 19.0
11 1.0 18.0
12 2.0 17.0
13 3.0 16.0
14 4.0 15.0
15 5.0 14.0
16 6.0 13.0
17 7.0 12.0
18 8.0 11.0
19 9.0 10.0
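A self-contained sketch of the dictionary approach; the sample `df` below is an assumption chosen to reproduce the output shown above:

```python
import pandas as pd

# Sample frame matching the printed output above (an assumption)
df = pd.DataFrame({'column1': range(20), 'column2': range(19, -1, -1)})

# One shifted copy per lag, keyed 't1' .. 't12'
d = {'t' + str(i): df.shift(i).add_suffix('_t' + str(i))
     for i in range(12, 0, -1)}

print(d['t10'])
```

If you later need all the lags side by side, `pd.concat(d.values(), axis=1)` joins them into one wide frame.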
EDIT: It is possible with globals, but a dictionary is much better:
cols = ['column1','column2']
for i in range(12, 0, -1):
    globals()['df' + str(i)] = df[cols].shift(i).add_suffix('_t' + str(i))
print (df10)
column1_t10 column2_t10
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
5 NaN NaN
6 NaN NaN
7 NaN NaN
8 NaN NaN
9 NaN NaN
10 0.0 19.0
11 1.0 18.0
12 2.0 17.0
13 3.0 16.0
14 4.0 15.0
15 5.0 14.0
16 6.0 13.0
17 7.0 12.0
18 8.0 11.0
19 9.0 10.0
(The answer above is from jezrael on Stack Overflow.)
for i in range(1, 16):
    text = f"Version{i} = pd.DataFrame()"
    exec(text)
A combination of exec and an f-string will let you do that.
If you need to iterate, or need numbered versions of the same variable, the statement above will help.
I hope this helps, as far as I understood the question.
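The loop above can be made self-contained by giving exec an explicit namespace dict, which also makes the created names easy to inspect (a sketch; a plain dict of DataFrames avoids exec entirely):

```python
import pandas as pd

# explicit namespace for exec, so the created names are easy to find
ns = {'pd': pd}
for i in range(1, 16):
    exec(f"Version{i} = pd.DataFrame()", ns)

# the names Version1 .. Version15 now live in ns
print(sorted(k for k in ns if k.startswith('Version'))[:3])
```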
gbl = globals()
lst = ['SymbolA', 'SymbolB', 'SymbolC' .... 'SymbolN']
for i, sym in enumerate(lst):
    data = SomeFunction(sym)
    gbl[sym + str(i)] = pd.DataFrame(data)
This will create the DataFrames dynamically. To access one of those DataFrames you need to run code like this:
gbl[sym + str(i)]
Try this. Your input has to be like below:
lst = {'data': ['SymbolA', 'SymbolB', 'SymbolC', 'SymbolN']}
print(pd.DataFrame(lst))
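A runnable version of the globals() pattern from this answer; `some_function` here is a hypothetical stand-in for the real data source:

```python
import pandas as pd

def some_function(symbol):
    # hypothetical stand-in for the real data source
    return {'symbol': [symbol], 'price': [0.0]}

gbl = globals()
lst = ['SymbolA', 'SymbolB', 'SymbolC']
for i, sym in enumerate(lst):
    data = some_function(sym)
    gbl[sym + str(i)] = pd.DataFrame(data)

# the frames are now reachable by name: SymbolA0, SymbolB1, SymbolC2
print(gbl['SymbolA0'])
```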
Creating variables with dynamic names is typically a bad practice.
I think the best solution for your problem is to store your dataframes into a dictionary and dynamically generate the name of the key to access each dataframe.
import copy

dict_of_df = {}
for ym in [201511, 201612, 201710]:
    key_name = 'df_new_' + str(ym)
    dict_of_df[key_name] = copy.deepcopy(df)
    to_change = df['YearMonth'] < ym
    dict_of_df[key_name].loc[to_change, 'new_col'] = ym
dict_of_df.keys()
Out[36]: ['df_new_201710', 'df_new_201612', 'df_new_201511']
dict_of_df
Out[37]:
{'df_new_201511': A B ID t YearMonth new_col
0 -a a 1 2016-12-05 07:53:35.943 201612 201612
1 1 NaN 2 2016-12-05 07:53:35.943 201612 201612
2 a c 2 2016-12-05 07:53:35.943 201612 201612,
'df_new_201612': A B ID t YearMonth new_col
0 -a a 1 2016-12-05 07:53:35.943 201612 201612
1 1 NaN 2 2016-12-05 07:53:35.943 201612 201612
2 a c 2 2016-12-05 07:53:35.943 201612 201612,
'df_new_201710': A B ID t YearMonth new_col
0 -a a 1 2016-12-05 07:53:35.943 201612 201710
1 1 NaN 2 2016-12-05 07:53:35.943 201612 201710
2 a c 2 2016-12-05 07:53:35.943 201612 201710}
# Extract a single dataframe
df_2015 = dict_of_df['df_new_201511']
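A self-contained version of this dictionary pattern; the minimal `df` below is an assumption (the original frame has more columns, but only `YearMonth` matters for the logic):

```python
import copy
import pandas as pd

# minimal stand-in for the original df (an assumption)
df = pd.DataFrame({'ID': [1, 2, 2],
                   'YearMonth': [201511, 201612, 201612]})

dict_of_df = {}
for ym in [201511, 201612, 201710]:
    key_name = 'df_new_' + str(ym)
    dict_of_df[key_name] = copy.deepcopy(df)   # independent copy per key
    to_change = df['YearMonth'] < ym           # rows older than the cutoff
    dict_of_df[key_name].loc[to_change, 'new_col'] = ym

# extract a single dataframe by its generated key
df_2015 = dict_of_df['df_new_201511']
print(sorted(dict_of_df.keys()))
```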
There is an easier way to accomplish this using the exec method. The following steps can be done to create a dataframe at runtime.
1. Create the source dataframe with some random values.
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['-a', 1, 'a'],
                   'B': ['a', np.nan, 'c'],
                   'ID': [1, 2, 2]})
2. Assign a variable that holds the new dataframe name. You can pass this value in as a parameter or generate it dynamically in a loop.
new_df_name = 'df_201612'
3. Create the dataframe dynamically, using exec to copy data from the source dataframe into the new one, and in the next line assign a value to a new column.
exec(f'{new_df_name} = df.copy()')
exec(f'{new_df_name}["new_col"] = 123')
4. Now the dataframe df_201612 is available in memory, and you can run a print statement together with eval to verify this.
print(eval(new_df_name))
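Putting the four steps together as one runnable sketch; an explicit namespace dict is used here so the exec/eval pair works in any scope:

```python
import numpy as np
import pandas as pd

# step 1: source dataframe
df = pd.DataFrame({'A': ['-a', 1, 'a'],
                   'B': ['a', np.nan, 'c'],
                   'ID': [1, 2, 2]})

# step 2: the dynamic name
new_df_name = 'df_201612'

# steps 3 and 4: create the frame via exec, verify it via eval,
# using an explicit namespace so the names are easy to track
ns = {'df': df}
exec(f'{new_df_name} = df.copy()', ns)
exec(f'{new_df_name}["new_col"] = 123', ns)
print(eval(new_df_name, ns))
```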
Going to pseudo-code this out; perhaps somebody has encountered this sort of issue before. I have not had luck reading through Stack Overflow posts.
I have a list of months, and for each month a df with data that includes delivery volume and a time, named like 'df_1701_unfiltered'.
I previously hardcoded my query logic, but I'm on mobile now. That's not what I'm worried about, so please disregard the pseudo aspect.
I want to create a new, separate dataframe for each month that is a filtered version of the original. Here is my thought process:
months = ['1701', '1702', '1703']
for month in months:
    "df_" + month + "_filtered" = ("df_" + month + "_unfiltered").query("time > start and time < end")
I'm able to do something similar within a single dataframe, using .apply to create dynamic columns. The above throws a "cannot assign to operator" error each time.
Any idea how I can do this for entire dataframes?
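One way to sketch what this question is after without assigning to dynamic names is a dict keyed by month; the monthly frames and time bounds below are invented stand-ins:

```python
import pandas as pd

start, end = 2, 8  # placeholder time bounds (assumption)
months = ['1701', '1702', '1703']

# invented stand-ins for the existing df_XXXX_unfiltered frames
unfiltered = {m: pd.DataFrame({'time': range(10), 'volume': range(10)})
              for m in months}

# one filtered frame per month, keyed by month instead of a dynamic name
filtered = {m: d[(d['time'] > start) & (d['time'] < end)]
            for m, d in unfiltered.items()}

print(filtered['1701'])
```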
You could give it a try with a config file like below:
import json

files = json.loads('''{
    "fileA": {
        "header": "true",
        "inputFileType": "csv",
        "sourceFilePath": "path_to_fileA"
    },
    "fileB": {
        "header": "true",
        "inputFileType": "parquet",
        "sourceFilePath": "path_to_fileB"
    }
}''')

df_dict = {}
for name, conf in files.items():
    df_dict[name] = (spark.read
                     .option('header', conf['header'])
                     .format(conf['inputFileType'])
                     .load(conf['sourceFilePath']))
Then you get a dictionary of dataframes with different formats and file paths.
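The same config-driven idea can be sketched with pandas instead of Spark, so it runs stand-alone; the file and path below are placeholders written to a temp directory:

```python
import os
import tempfile
import pandas as pd

# write a small sample file so the example is self-contained
tmp = tempfile.mkdtemp()
path_a = os.path.join(tmp, 'fileA.csv')
pd.DataFrame({'a': [1, 2]}).to_csv(path_a, index=False)

# in practice this dict would come from json.load on a config file
config = {'fileA': {'inputFileType': 'csv', 'sourceFilePath': path_a}}

# dispatch on the configured type instead of building code strings
readers = {'csv': pd.read_csv, 'parquet': pd.read_parquet}
df_dict = {name: readers[conf['inputFileType']](conf['sourceFilePath'])
           for name, conf in config.items()}

print(df_dict['fileA'])
```

The reader-dispatch dict sidesteps eval entirely, which is usually the safer design.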
Hey, got the answer...
def fileReader(inputFileType, sourceFilePath):
    value = 'true'
    header = 'header'
    a = ("spark.read.option('" + header + "','" + value + "')."
         + inputFileType + "('" + sourceFilePath + "')")
    print(a)
    print(type(a))
    ds = eval(a)
    return ds
The motive behind creating this function is to dynamically create data frames using the different file formats supported by PySpark. Now, using this function, I can create a data frame from any file format PySpark supports just by passing the location and the format of the files.
I appreciate all the help.
# define list of fields to run match for
fieldlist = ['MATTER NUMBER', 'MATTER NAME', 'CLAIM NUMBER LISTING']
# loop through each field in fieldlist
for field in fieldlist:
    # define dfname as the field with spaces replaced with underscores
    dfname = '{}'.format(field.replace(' ', '_'))
    # create df with dfname
    '{}'.format(dfname) = checkdf['{}'.format(field)].dropna()
The error is on the last line. I also tried:
'{}'.format(dfname) = checkdf['{}'.format(field)].dropna()
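Assigning to a string literal, as in the last line above, is a SyntaxError ("cannot assign to function call/operator"). A dictionary keyed by the cleaned field name works instead; the `checkdf` below is an invented sample:

```python
import numpy as np
import pandas as pd

# invented sample standing in for checkdf (assumption)
checkdf = pd.DataFrame({'MATTER NUMBER': [1, np.nan, 3],
                        'MATTER NAME': ['a', 'b', None],
                        'CLAIM NUMBER LISTING': [np.nan, 'x', 'y']})

fieldlist = ['MATTER NUMBER', 'MATTER NAME', 'CLAIM NUMBER LISTING']

# one series per field, keyed by the field name with spaces replaced
frames = {field.replace(' ', '_'): checkdf[field].dropna()
          for field in fieldlist}

print(frames['MATTER_NUMBER'].tolist())
```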