Check the Pandas documentation, but I think
X_train = df_train.drop(['ID','TARGET'], axis=1).values
.values returns a numpy array, not a Pandas dataframe. An array does not have a columns attribute.
remove_features_identical - if you pass this an array, make sure you are only using array, not dataframe, features. Otherwise, make sure you pass it a dataframe. And don't use variable names like DataFrame.
Check the Pandas documentation, but I think
X_train = df_train.drop(['ID','TARGET'], axis=1).values
.values returns a numpy array, not a Pandas dataframe. An array does not have a columns attribute.
remove_features_identical - if you pass this an array, make sure you are only using array, not dataframe, features. Otherwise, make sure you pass it a dataframe. And don't use variable names like DataFrame.
Maybe this solution solve such problem, try this:
X_train = pd.DataFrame(X_train, columns = X.columns)
X_test = pd.DataFrame(X_test, columns=X.columns)
Python Coding help- keep recieving error message"AttributeError: 'numpy.ndarray' object has no attribute 'MESSAGE_A'"
AttributeError: 'numpy.ndarray' object has no attribute 'numpy'
python - Multivariate Regression Error “AttributeError: 'numpy.ndarray' object has no attribute 'columns'” - Data Science Stack Exchange
Error creating numpy v-stack, 'AttributeError: 'numpy.ndarray' object has no attribute 'np'
Videos
The problem is that train_test_split(X, y, ...) returns numpy arrays and not pandas dataframes. Numpy arrays have no attribute named columns
If you want to see what features SelectFromModel kept, you need to substitute X_train (which is a numpy.array) with X which is a pandas.DataFrame.
selected_feat= X.columns[(sel.get_support())]
This will return a list of the columns kept by the feature selector.
If you wanted to see how many features were kept you can just run this:
sel.get_support().sum() # by default this will count 'True' as 1 and 'False' as 0
because this :
X = df.iloc[:,:24481].values
y = df.iloc[:, -1].values
you should remove .values or make extra X_col, y_col like that
X_col = df.iloc[:,:24481]
y_col = df.iloc[:, -1]
https://imgur.com/gallery/yAdAjdx
Hello,
Linked is the screenshot of the two error messages I keep recieving as well as the code leading up to it. I'm trying to run an uplift on a classification tree. Any help is appreciated, I am still fairly new to python. Thank you!!
Using .values on a pandas dataframe gives you a numpy array. This will not contain column names and such. You do this when setting X like this:
X = dataset[['Read?', 'x1', .. ,'x47']].values
But then you try to get the column names from X (which it does not have) by writing X.columns here:
coeff_df = pd.DataFrame(regressor.coef_, X.columns, columns=['Coefficient'])
So store your column names in a variable or input them again, like this:
coeff_df = pd.DataFrame(regressor.coef_, ['Read?', 'x1', .. ,'x47'], columns=['Coefficient'])
hi remove values method
X = dataset[['Read?', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6' , 'x7','x8','x9','x10','x11','x12','x13','x14','x15','x16','x17','x18','x19','x20','x21','x22','x23','x24','x25','x26','x27','x28','x29','x30','x31','x32','x33','x34','x35','x36','x37','x38','x39','x40','x41','x42','x43','x44','x45','x46','x47']]
coeff_df = pd.DataFrame(regressor.coef_, X.columns, columns=['Coefficient'])
Hi,
I'm trying to create a numpy v-stack and creating 3 np.array's for it, by filling them with a loop:
I get the error: 'AttributeError: 'numpy.ndarray' object has no attribute 'np' . I think I'm using the wrong notation to append to the empty arrays:
neighbor_id = [id_ for id_ in range(1, n_obs) if id_ != user_id]
neighbor_id_arr = np.array(neighbor_id)
similarity = np.array([])
num_interactions = np.array([])
# get similarity and num_interactions
for id_ in neighbor_id:
similarity.np.append(np.dot(user_item.loc[user_id],user_item.loc[id_])) #The issue is here, I think
num_interactions.np.append(user_interactions.loc[id_])
c = numpy.vstack((neighbor_id_arr, similarity,num_interactions))
Thanks!
James
The solution:
The given dataset was already an array, so I didn’t need to call .values.
The problem lies in the following line:
df = StandardScaler().fit_transform(df)
It returns a NumPy array (see the documentation), which does not have a drop function. You would have to convert it into a pd.DataFrame first!
new_df = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns, index=df.index)
I couldn't find the minute in the tutorial, but may be it's just a consequence of copy-paste.
In the function scale_dataset you make data a numpy array and then you assign that value to train variable. But when you come again with scale_dataset for valid data set you want to use this `train' data set as a pandas dataframe but in that moment it's a numpy array.
My common sense tells me you want to use valid data set instead of train and so on like this:
train, X_train, y_train = scale_dataset(train, oversample = True)
valid, X_train, y_train = scale_dataset(valid, oversample = False)
test, X_train, y_train = scale_dataset(test, oversample = False)
I followed the same video and got the same result. I found at the end it's because she is using the same variable name.
train, X_train, y_train = scale_dataset(train, oversample = True)
Initially, the train variable is a dataframe but after running the function scale_dataset, it becomes a numpy array. Hence, when you call the function again, it errors out with 'numpy.ndarray' object has no attribute 'columns'
To fix, I just changed the variable name so the dataframe train will still be the same.
d_train, X_train, y_train = scale_dataset(train, oversample = True)