When you use df.apply() with axis=1, each row of your DataFrame is passed to your lambda function as a pandas Series. The frame's columns become the index of that Series, so you can access values with series[label].
So this should work:
df['D'] = (df.apply(lambda x: myfunc(x[colNames[0]], x[colNames[1]]), axis=1))
Answer from foglerit on Stack Overflow
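As a minimal sketch of the point above, the snippet below checks what df.apply(..., axis=1) hands to the lambda. The DataFrame contents and the body of myfunc are assumptions made up for illustration; only the names myfunc, colNames and the column D come from the answer.

```python
import pandas as pd

# Hypothetical data: the original question does not show the frame's contents.
df = pd.DataFrame({"A": [1, 2], "B": [10, 20], "C": [100, 200]})
colNames = ["A", "B"]


def myfunc(a, b):
    # Hypothetical helper: any function of two scalar values would do here.
    return a + b


# With axis=1 every row is passed as a Series...
row_types = df.apply(lambda x: type(x).__name__, axis=1)

# ...whose index holds the column labels of the frame.
row_labels = df.apply(lambda x: list(x.index), axis=1)

# Bracket indexing with a label picks one value out of the row.
df["D"] = df.apply(lambda x: myfunc(x[colNames[0]], x[colNames[1]]), axis=1)
```

Because the row is a Series, x[colNames[0]] first evaluates colNames[0] to "A" and then looks up that label in the row.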
In general, this error occurs when you try to access an attribute that doesn't exist on an object. For a pandas Series (or DataFrame), it usually means you tried to look up a label with attribute access (.) and nothing of that name exists on the object.
In the case in the OP, they used x.colNames[0] to access the value at colNames[0] in row x, but the row Series x has no attribute named colNames, so the error occurred. [1]
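The failure can be reproduced in a few lines. This is only a sketch with made-up data; the variable colNames mirrors the question, everything else is assumed.

```python
import pandas as pd

# Hypothetical frame standing in for the one in the question.
df = pd.DataFrame({"A": [1, 2], "B": [10, 20]})
colNames = ["A", "B"]

try:
    # x.colNames asks the row Series for an attribute literally named "colNames".
    df.apply(lambda x: x.colNames[0], axis=1)
    message = None
except AttributeError as err:
    message = str(err)

# Bracket indexing evaluates colNames[0] first, then looks up that label.
values = df.apply(lambda x: x[colNames[0]], axis=1)
```

Here message holds the text of the AttributeError, while values contains column A as expected.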
Another case where this error can occur is when an index label contains whitespace that you didn't know about. For example, the following reproduces the error:
import pandas as pd
s = pd.Series([1, 2], index=[' a', 'b'])
s.a  # AttributeError: the label is ' a' (with a leading space), not 'a'
In this case, strip the whitespace from the labels:
s.index = [x.strip() for x in s.index]
# or, to drop every space (including spaces inside a label)
s.index = [x.replace(' ', '') for x in s.index]
Finally, it's always safe to use [] to index a Series (or a DataFrame).
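One reason bracket indexing is the safer habit: a label that matches a built-in attribute name is shadowed under dot access. The example below is a small illustration with made-up labels, not something from the original answer.

```python
import pandas as pd

# The label "size" collides with the built-in Series.size attribute.
s = pd.Series([10, 20], index=["size", "b"])

attr_result = s.size     # built-in attribute: the number of elements
item_result = s["size"]  # the value stored under the label "size"
```

Dot access silently returns the element count here rather than raising an error, which can be harder to spot than an AttributeError.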
[1] Series objects have attributes such as axes, dtypes, empty, index, ndim, size, shape, T and values. DataFrames have all of these plus columns. When you use df.apply(..., axis=1), it iterates over the rows, and each row is a Series whose index labels are the column names of df.
full code here: https://github.com/josevqzmdz/proyecto_final_6/blob/main/main.py
For some reason, Python does not detect, so to speak, that my class, BTC_predict, does have a train_test_split method. It is right there and has no mistakes of its own, yet Python refuses to call it. The full error is this one:
AttributeError: 'Series' object has no attribute 'train_test_split'
But why? I'm sure I'm missing something super obvious, but I can't see it. The piece of code that runs until it hits this wall is this:
btcc = BTC_predict()
history_price = btcc.history_price()
price_matrix = btcc.price_matrix_creator(history_price)
price_matrix = btcc.normarlize_windows(price_matrix)
row, X_train, y_train, X_test, y_test = history_price.train_test_split(price_matrix)
It beats me; I see nothing wrong with this implementation. The original code from the article I'm learning from looks like this:
ser = hist_price_dl()  # Not passing any argument since they are set by default
price_matrix = price_matrix_creator(ser)  # Creating a matrix using the dataframe
price_matrix = normalize_windows(price_matrix)  # Normalizing its values to fit to RNN
row, X_train, y_train, X_test, y_test = train_test_split_(price_matrix)  # Applying train-test splitting, also returning the splitting-point
The only difference is that the article runs everything in one file, while I took the liberty of making a class with all of it inside so it's not as cluttered.
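One possible reading of the error, going only by the snippet in the post: train_test_split is called on history_price, the value returned by btcc.history_price(), and not on the btcc instance. The sketch below assumes that history_price() returns a pandas Series (which the error message suggests); the method bodies are invented stand-ins, not the code from the linked repository.

```python
import pandas as pd


class BTC_predict:
    # Hypothetical stand-ins: the real method bodies are not shown in the post.
    def history_price(self):
        return pd.Series([100.0, 101.5, 99.8, 102.3])

    def train_test_split(self, prices):
        split = int(len(prices) * 0.75)
        return split, prices.iloc[:split], prices.iloc[split:]


btcc = BTC_predict()
history_price = btcc.history_price()  # a pandas Series, not a BTC_predict

try:
    # The Series is asked for the method, so pandas raises AttributeError.
    history_price.train_test_split(history_price)
    error = None
except AttributeError as err:
    error = str(err)

# Calling the method on the class instance finds it.
row, train, test = btcc.train_test_split(history_price)
```

Under that assumption, replacing history_price.train_test_split(...) with btcc.train_test_split(...) would match how the other methods are called in the snippet.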
Check the version of your pandas library:
import pandas
print(pandas.__version__)
Series.to_numpy() was added in pandas 0.24.0. If your version is older than that, upgrade:
pip install --upgrade pandas
If you need your code to work with all versions of pandas, here's a simple way to convert a Series into a NumPy array:
import pandas as pd
import numpy as np
s = pd.Series([1.1, 2.3])
a = np.array(s)
print(a) # [1.1 2.3]
On an advanced note, if your Series has missing values (as NaN values), these can be converted to a masked array:
s = pd.Series([1.1, np.nan])
a = np.ma.masked_invalid(s)
print(a) # [1.1 --]
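If the same code has to run on both old and new pandas versions, one option is to check for the method at runtime. The helper name series_to_array is hypothetical; this is a sketch, not an API provided by pandas.

```python
import numpy as np
import pandas as pd


def series_to_array(s):
    # Hypothetical helper: prefer Series.to_numpy() when this pandas has it,
    # otherwise fall back to np.asarray, which works on older versions too.
    if hasattr(s, "to_numpy"):
        return s.to_numpy()
    return np.asarray(s)


a = series_to_array(pd.Series([1.1, 2.3]))
```

Either branch returns a NumPy array with the same values, so calling code does not need to know which pandas version is installed.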
You need to remove .values:
phone_numbers = merged_df.loc[(merged_df['Facility Code'] == facility_number) & (merged_df['group'] == group) & (merged_df['Optedout'] == optout)]['phone']
@Serge Ballesta's comment is the most likely cause.
There are typos in the code that you have shared. Check whether you called value instead of values.
The following code works as expected:
import pandas as pd
data = {'phone': [25470000000, 25470000000, 25470000010, 25470000020, 25470000000], 'group': ['MAMA', 'MAMA', 'MAMA', 'MAMA', 'MAMA'], 'County': ['Orange', 'Orange', 'Orange', 'Orange', 'Orange'], 'PNC/ANC': ['PNC', 'PNC', 'PNC', 'PNC', 'PNC'], 'Facility Name': ['Main Centre', 'Main Centre', 'Centre', 'Centre', 'Main Centre'], 'Optedout': ['FALSE', 'FALSE', 'FALSE', 'FALSE', 'FALSE'], 'Facility Code': [112, 112, 108, 108, 112]}
merged_df = pd.DataFrame.from_dict(data)
facility_number = 108
group = 'MAMA'
optout = 'FALSE'
phone_numbers = merged_df.loc[(merged_df['Facility Code'] == facility_number) & (merged_df['group'] == group) & (merged_df['Optedout'] == optout)]['phone'].values
print(phone_numbers)
Output:
[25470000010 25470000020]
By removing .values, the output is a Series that keeps the original row labels:
2    25470000010
3    25470000020
Name: phone, dtype: int64
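The difference between the Series and the array can be checked directly. The sketch below uses a trimmed-down copy of the sample data (an assumption for brevity) and selects the column inside a single .loc call, which is one common way to write the same filter.

```python
import pandas as pd

# Trimmed, hypothetical version of the sample data.
merged_df = pd.DataFrame({
    "phone": [25470000000, 25470000010, 25470000020],
    "group": ["MAMA", "MAMA", "MAMA"],
    "Optedout": ["FALSE", "FALSE", "FALSE"],
    "Facility Code": [112, 108, 108],
})

# Build the boolean mask once so the selection stays readable.
mask = (
    (merged_df["Facility Code"] == 108)
    & (merged_df["group"] == "MAMA")
    & (merged_df["Optedout"] == "FALSE")
)

phones = merged_df.loc[mask, "phone"]  # Series: keeps the row labels
phone_array = phones.values            # NumPy array: labels dropped
phone_list = phones.tolist()           # plain Python list
```

Which form to keep depends on what the next step expects: a Series if the row labels still matter, an array or list if only the numbers are needed.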
