As the documentation says, applymap applies a function to a whole DataFrame, not to a Series:
"Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame."
To apply a function to a Series, use map. In your case, astype(float) could also work (the np.float alias used in older code was removed in NumPy 1.24).
If you want to cast the column to float, assign the result back, because astype returns a new Series:
import numpy as np
self.file['Value'] = self.file['Value'].astype(np.float32)
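A minimal sketch of the difference, assuming a hypothetical frame whose Value column was read in as strings. Series.map works on one column, astype has to be assigned back, and elementwise work on a whole frame uses DataFrame.map on pandas 2.1+ or applymap on older versions. The hasattr check is just one way to support both versions, not something the answer prescribes.

```python
import pandas as pd

# Hypothetical frame: 'Value' was read in as strings
df = pd.DataFrame({"Value": ["1.5", "2.0", "3.25"], "Label": ["a", "b", "c"]})

# Series.map applies a function to every element of ONE column
doubled = df["Value"].map(lambda text: float(text) * 2)

# astype returns a new Series, so assign it back to change the column
df["Value"] = df["Value"].astype(float)

# Elementwise over a whole DataFrame: DataFrame.map from pandas 2.1,
# applymap (deprecated since 2.1) on older versions
labels = df[["Label"]]
elementwise = labels.map if hasattr(pd.DataFrame, "map") else labels.applymap
upper = elementwise(str.upper)

print(doubled.tolist())         # [3.0, 4.0, 6.5]
print(df["Value"].dtype)        # float64
print(upper["Label"].tolist())  # ['A', 'B', 'C']
```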
(Answer from Espoir Murhabazi on Stack Overflow, question "AttributeError: 'DataFrame' object has no attribute 'map'")
Note: since pandas 2.1, applymap itself is deprecated; calling it emits "FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead."
Question (Stack Overflow): 'DataFrame' object has no attribute 'map' when adding a new column
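map is a method that you can call on a pandas.Series object. In pandas versions before 2.1 it doesn't exist on pandas.DataFrame objects at all; since 2.1, DataFrame.map exists, but it is the elementwise replacement for applymap, not the Series-style lookup used here.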
df['new'] = df['old'].map(d)
In your code above, df['old'] is returning a pandas.DataFrame object for some reason:
- As @jezrael points out, this could be due to having more than one old column in the DataFrame.
- Or perhaps your code isn't quite the same as the example you have given.
Either way, the error occurs because you are calling map() on a pandas.DataFrame object.
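A quick way to check which case you are in, sketched on a hypothetical frame with a repeated label: look at the type of the selection, list the repeated labels with Index.duplicated, and check whether the columns are a MultiIndex.

```python
import pandas as pd

# Hypothetical frame with a repeated 'old' label
df = pd.DataFrame([[1, 3, 8], [4, 5, 3]], columns=["old", "old", "col"])

# With a repeated label, df['old'] is a DataFrame rather than a Series
selected = df["old"]
print(type(selected).__name__)  # DataFrame

# Labels that occur more than once
dupes = df.columns[df.columns.duplicated()].unique().tolist()
print(dupes)  # ['old']

# MultiIndex columns are the other common cause
is_multi = isinstance(df.columns, pd.MultiIndex)
print(is_multi)  # False
```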
The main problem is that selecting the old column returns a DataFrame instead of a Series, so map, which is implemented only for Series, fails.
Most likely the old column is duplicated, so selecting it returns all the old columns as a DataFrame:
import pandas as pd

df = pd.DataFrame([[1,3,8],[4,5,3]], columns=['old','old','col'])
print (df)
   old  old  col
0    1    3    8
1    4    5    3
print (df['old'])
   old  old
0    1    3
1    4    5
# don't name the mapping dict: that would shadow the built-in dict type
d = {1: 'A', 4: 'D'}  # hypothetical mapping; its definition was not shown
df['new'] = df['old'].map(d)
# pandas < 2.1 raises:
AttributeError: 'DataFrame' object has no attribute 'map'
A possible solution is to deduplicate these columns:
s = df.columns.to_series()
new = s.groupby(s).cumcount().astype(str).radd('_').replace('_0','')
df.columns += new
print (df)
   old  old_1  col
0    1      3    8
1    4      5    3
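A self-contained sketch showing that map works once the labels are unique. It uses the same cumcount suffix idea (grouping on the index level instead of on the Series itself), and the mapping d is hypothetical, chosen to match the A/D values shown further down. Dropping the repeated columns with Index.duplicated is a different technique, included only for the case where the extra columns are redundant.

```python
import pandas as pd

df = pd.DataFrame([[1, 3, 8], [4, 5, 3]], columns=["old", "old", "col"])

# Suffix repeated labels: old, old_1, col (cumcount numbers each repeat)
s = df.columns.to_series()
suffix = s.groupby(level=0).cumcount().astype(str).radd("_").replace("_0", "")
df.columns = [label + extra for label, extra in zip(df.columns, suffix)]

# Hypothetical mapping (1 -> A, 4 -> D)
d = {1: "A", 4: "D"}

# df['old'] is now a single Series, so Series.map works
df["new"] = df["old"].map(d)
print(df.columns.tolist())  # ['old', 'old_1', 'col', 'new']
print(df["new"].tolist())   # ['A', 'D']

# Different technique: drop the repeated columns, keeping the first one
raw = pd.DataFrame([[1, 3, 8], [4, 5, 3]], columns=["old", "old", "col"])
first_only = raw.loc[:, ~raw.columns.duplicated()]
print(first_only.columns.tolist())  # ['old', 'col']
```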
Another possible cause is a MultiIndex in the columns; test it like this:
mux = pd.MultiIndex.from_arrays([['old','old','col'],['a','b','c']])
df = pd.DataFrame([[1,3,8],[4,5,3]], columns=mux)
print (df)
  old    col
    a  b   c
0   1  3   8
1   4  5   3
print (df.columns)
MultiIndex(levels=[['col', 'old'], ['a', 'b', 'c']],
codes=[[1, 1, 0], [0, 1, 2]])
The solution is to flatten the MultiIndex:
# Python 3.6+
df.columns = [f'{a}_{b}' for a, b in df.columns]
# older Python versions
#df.columns = ['{}_{}'.format(a,b) for a, b in df.columns]
print (df)
   old_a  old_b  col_c
0      1      3      8
1      4      5      3
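The f-string version above assumes exactly two column levels. A slightly more general sketch (an alternative, not part of the original answer) joins each tuple from MultiIndex.to_flat_index with '_', which works for any number of levels and for non-string labels; d is again a hypothetical mapping.

```python
import pandas as pd

mux = pd.MultiIndex.from_arrays([["old", "old", "col"], ["a", "b", "c"]])
df = pd.DataFrame([[1, 3, 8], [4, 5, 3]], columns=mux)

# to_flat_index turns the MultiIndex into an Index of tuples;
# str() each level so numeric labels can be joined too
df.columns = ["_".join(map(str, levels)) for levels in df.columns.to_flat_index()]

d = {1: "A", 4: "D"}  # hypothetical mapping
df["new"] = df["old_a"].map(d)  # a flat label selects one Series
print(df.columns.tolist())  # ['old_a', 'old_b', 'col_c', 'new']
```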
Another solution is to select the column by its full MultiIndex tuple, map it, and assign the result to a new tuple column:
df[('new', 'd')] = df[('old', 'a')].map(d)
print (df)
  old    col new
    a  b   c   d
0   1  3   8   A
1   4  5   3   D
print (df.columns)
MultiIndex(levels=[['col', 'old', 'new'], ['a', 'b', 'c', 'd']],
codes=[[1, 1, 0, 2], [0, 1, 2, 3]])
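A self-contained sketch of why the tuple key matters, with a hypothetical mapping d: a partial key such as 'old' selects every column under that first level and returns a DataFrame, while the full tuple selects exactly one column and returns a Series that has map.

```python
import pandas as pd

mux = pd.MultiIndex.from_arrays([["old", "old", "col"], ["a", "b", "c"]])
df = pd.DataFrame([[1, 3, 8], [4, 5, 3]], columns=mux)

# Partial key: every column under 'old', so a DataFrame
partial = df["old"]
# Full tuple key: exactly one column, so a Series
single = df[("old", "a")]
print(type(partial).__name__, type(single).__name__)  # DataFrame Series

d = {1: "A", 4: "D"}  # hypothetical mapping
df[("new", "d")] = single.map(d)
print(df[("new", "d")].tolist())  # ['A', 'D']
```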
For Spark DataFrames: you can't map a DataFrame, but you can convert it to an RDD and map that with spark_df.rdd.map(). Prior to Spark 2.0, spark_df.map was an alias for spark_df.rdd.map(); since Spark 2.0, you must explicitly call .rdd first.
You can use df.rdd.map(), as DataFrame does not have map or flatMap, but be aware of the implications of using df.rdd:
Converting to an RDD breaks the DataFrame lineage: there is no predicate pushdown, no column pruning and no SQL plan, and PySpark transformations are less efficient.
What should you do instead?
Keep in mind that the high-level DataFrame API is equipped with many alternatives. First, you can use select or selectExpr.
Another example is using explode instead of flatMap (which exists on RDDs):
df.select($"name",explode($"knownLanguages"))
.show(false)
Result:
+-------+------+
|name |col |
+-------+------+
|James |Java |
|James |Scala |
|Michael|Spark |
|Michael|Java |
|Michael|null |
|Robert |CSharp|
|Robert | |
+-------+------+
You can also use withColumn or a UDF, depending on the use case, or another option in the DataFrame API.
Mapping from one column to another, as below, works fine. However, the requirements have changed: I now need to map two columns to the summary table, and I am getting the error 'DataFrame' object has no attribute 'map'. I'm sure it is something simple like a bracket or parenthesis out of place, but right now I'm not quite sure.
score['#_%_to_Total'] = (score['Total_#_Genuine'] / score['mop_'].map(summary.set_index(['mop_'])['count_Not Fraud']))*100

#Below is the line of code giving the AttributeError
score['#_%_to_Total'] = (score['Total_#_Genuine'] / score[['merchant_merchantid_','mop_']].map(summary.set_index(['merchant_merchantid_','mop_'])['count_Not Fraud']))*100

AttributeError: 'DataFrame' object has no attribute 'map'
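One way to do the two-column lookup, sketched on hypothetical stand-ins for score and summary: keep set_index on both key columns, build a MultiIndex from the same two columns of score, and reindex the lookup Series with it. That is the two-key analogue of the single-column map. It assumes each (merchant, mop) pair appears only once in summary; pairs missing from summary come back as NaN, as they would with map. A left merge on both columns is another common route.

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames
score = pd.DataFrame({
    "merchant_merchantid_": [10, 10, 20],
    "mop_": ["card", "cash", "card"],
    "Total_#_Genuine": [5, 2, 8],
})
summary = pd.DataFrame({
    "merchant_merchantid_": [10, 10, 20],
    "mop_": ["card", "cash", "card"],
    "count_Not Fraud": [10, 8, 16],
})

keys = ["merchant_merchantid_", "mop_"]
# Lookup Series indexed by (merchant, mop); assumes each pair is unique in summary
lookup = summary.set_index(keys)["count_Not Fraud"]

# One (merchant, mop) key per row of score, in score's row order
row_keys = pd.MultiIndex.from_frame(score[keys])
counts = lookup.reindex(row_keys).to_numpy()

score["#_%_to_Total"] = score["Total_#_Genuine"] / counts * 100
print(score["#_%_to_Total"].tolist())  # [50.0, 25.0, 50.0]
```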
map is a pandas Series method, not a DataFrame method. If you have a Series and need to replace some of its values, pass a dictionary to map (note that values missing from the dictionary become NaN; Series.replace keeps them). If you need to change values in a DataFrame regardless of which column they belong to, use DataFrame.replace(to_replace, value).
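A short sketch on hypothetical data comparing Series.map with a dict, Series.replace with the same dict, and DataFrame.replace across every column:

```python
import pandas as pd

s = pd.Series(["cat", "dog", "cow"])
swap = {"cat": "kitten", "dog": "puppy"}

# Series.map with a dict: 'cow' has no entry, so it becomes NaN
mapped = s.map(swap)
# Series.replace with the same dict: 'cow' is left as it is
replaced = s.replace(swap)

# DataFrame.replace works across every column at once
df = pd.DataFrame({"a": [0, 1], "b": [1, 0]})
df_replaced = df.replace(0, -1)

print(mapped.tolist())              # ['kitten', 'puppy', nan]
print(replaced.tolist())            # ['kitten', 'puppy', 'cow']
print(df_replaced.values.tolist())  # [[-1, 1], [1, -1]]
```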
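Just use applymap: you probably have a different pandas version from your teacher's. DataFrame.map was only added in pandas 2.1 (as the new name for applymap), so on older versions applymap is the equivalent.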