Check the pandas documentation, but I think the problem is this line:
X_train = df_train.drop(['ID','TARGET'], axis=1).values
.values returns a NumPy array, not a pandas DataFrame. An array does not have a columns attribute.
remove_features_identical - if you pass this an array, make sure the function only uses array features, not DataFrame features. Otherwise, make sure you pass it a DataFrame. And don't use variable names like DataFrame, which shadow the pandas class name.
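A minimal sketch of the difference, using a small made-up frame. Only the ID and TARGET column names come from the line above; var_a, var_b and the values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_train; only ID and TARGET appear in the answer above.
df_train = pd.DataFrame({
    "ID": [1, 2, 3],
    "var_a": [0.1, 0.2, 0.3],
    "var_b": [10, 20, 30],
    "TARGET": [0, 1, 0],
})

features = df_train.drop(["ID", "TARGET"], axis=1)  # still a DataFrame
X_train = features.values                            # now a NumPy array

has_columns_df = hasattr(features, "columns")   # the DataFrame has .columns
has_columns_arr = hasattr(X_train, "columns")   # the array does not

# Save the names before converting if they are needed later.
feature_names = list(features.columns)
```

Keeping feature_names around means the array can still be used for fitting while the labels stay available for reporting.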
This may solve the problem. Try converting the arrays back into DataFrames, reusing the column names from X:
X_train = pd.DataFrame(X_train, columns = X.columns)
X_test = pd.DataFrame(X_test, columns=X.columns)
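For context, here is a hedged end-to-end sketch of that fix. The data, the column names a and b, and the split parameters are made up; the only assumption carried over from the answer is that X is a DataFrame whose columns match the arrays.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data; X stays a DataFrame so its column names remain available.
X = pd.DataFrame({"a": range(10), "b": range(10, 20)})
y = pd.Series([0, 1] * 5)

# Passing arrays in gives arrays back, so the column names are lost here.
X_train, X_test, y_train, y_test = train_test_split(
    X.values, y.values, test_size=0.2, random_state=0
)

# Re-attach the column names by wrapping the arrays in DataFrames.
X_train = pd.DataFrame(X_train, columns=X.columns)
X_test = pd.DataFrame(X_test, columns=X.columns)
```

Note that the rebuilt frames get a fresh 0..n-1 row index, so the original row labels are not recovered this way.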
python - 'numpy.ndarray' object has no attribute 'columns' - Stack Overflow
python - Multivariate Regression Error “AttributeError: 'numpy.ndarray' object has no attribute 'columns'” - Data Science Stack Exchange
python - 'numpy.ndarray' object has no attribute 'columns' how can ı solve it? - Stack Overflow
Python Coding help- keep recieving error message"AttributeError: 'numpy.ndarray' object has no attribute 'MESSAGE_A'"
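The problem is that X_train here is a NumPy array and not a pandas DataFrame: train_test_split(X, y, ...) returns arrays whenever it is given arrays (for example, when X was built with .values). NumPy arrays have no attribute named columns.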
If you want to see what features SelectFromModel kept, you need to substitute X_train (which is a numpy.array) with X which is a pandas.DataFrame.
selected_feat= X.columns[(sel.get_support())]
This returns a pandas Index holding the names of the columns kept by the feature selector.
If you wanted to see how many features were kept you can just run this:
sel.get_support().sum() # by default this will count 'True' as 1 and 'False' as 0
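A small self-contained sketch of this pattern. The synthetic data, the column names f0 to f3 and the choice of RandomForestClassifier as the estimator are assumptions; the original question's model is not shown here.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data: the target depends only on column f0.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=["f0", "f1", "f2", "f3"])
y = (X["f0"] > 0).astype(int)

# Fit the selector on plain arrays, as in the question.
sel = SelectFromModel(RandomForestClassifier(n_estimators=50, random_state=0))
sel.fit(X.values, y.values)

mask = sel.get_support()          # boolean array, one entry per input column
selected_feat = X.columns[mask]   # index the DataFrame's columns, not the array
n_selected = int(mask.sum())      # True counts as 1, False as 0
```

The mask lines up with the original column order, which is why indexing X.columns with it recovers the right names.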
This happens because of these lines:
X = df.iloc[:,:24481].values
y = df.iloc[:, -1].values
You should remove .values, or create extra variables X_col and y_col that keep the pandas objects, like this:
X_col = df.iloc[:,:24481]
y_col = df.iloc[:, -1]
Use this:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# importing dataset
dataset=pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:,3:12].values
Y = dataset.iloc[:,13].values
# splitting dataset into test set and train set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.20)
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=20, random_state=0)
regressor.fit(X_train, y_train)
#feature importance
# index with the names of the feature columns only (3:12), so the lengths match
feature_importances = pd.DataFrame(regressor.feature_importances_, index=dataset.columns[3:12], columns=['importance']).sort_values('importance', ascending=False)
The X_train that comes out of train_test_split is a NumPy array here, and a NumPy array never has a columns attribute.
Secondly, you call .values when you build X from dataset, which returns a numpy.ndarray and not a DataFrame.
You need to change your line
feature_importances = pd.DataFrame(rf.feature_importances_,index = X_train.columns,columns=['importance']).sort_values('importance',ascending=False)
to
columns_ = dataset.iloc[:1, 3:12].columns
feature_importances = pd.DataFrame(rf.feature_importances_,index = columns_,columns=['importance']).sort_values('importance',ascending=False)
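The same idea as a runnable sketch on synthetic data. The column names, the 1:4 slice and the random values are placeholders, not the Churn_Modelling.csv layout; the point is only that the names come from the DataFrame while the model is fitted on the array.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the dataset; the real file is not reproduced here.
rng = np.random.default_rng(0)
dataset = pd.DataFrame(
    rng.normal(size=(100, 5)), columns=["id", "x1", "x2", "x3", "target"]
)

# Take the feature names from the DataFrame before dropping to arrays.
feature_cols = dataset.columns[1:4]
X = dataset[feature_cols].values
y = dataset["target"].values

rf = RandomForestRegressor(n_estimators=20, random_state=0)
rf.fit(X, y)

# One importance per feature, labelled with the saved column names.
feature_importances = pd.DataFrame(
    rf.feature_importances_, index=feature_cols, columns=["importance"]
).sort_values("importance", ascending=False)
```

The index passed to pd.DataFrame must have exactly as many entries as there are features, otherwise pandas raises a shape error.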
Using .values on a pandas dataframe gives you a numpy array. This will not contain column names and such. You do this when setting X like this:
X = dataset[['Read?', 'x1', .. ,'x47']].values
But then you try to get the column names from X (which it does not have) by writing X.columns here:
coeff_df = pd.DataFrame(regressor.coef_, X.columns, columns=['Coefficient'])
So store your column names in a variable or input them again, like this:
coeff_df = pd.DataFrame(regressor.coef_, ['Read?', 'x1', .. ,'x47'], columns=['Coefficient'])
Hi, remove the .values call so that X stays a DataFrame:
X = dataset[['Read?', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6' , 'x7','x8','x9','x10','x11','x12','x13','x14','x15','x16','x17','x18','x19','x20','x21','x22','x23','x24','x25','x26','x27','x28','x29','x30','x31','x32','x33','x34','x35','x36','x37','x38','x39','x40','x41','x42','x43','x44','x45','x46','x47']]
coeff_df = pd.DataFrame(regressor.coef_, X.columns, columns=['Coefficient'])
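A short sketch of this approach with a two-column toy dataset. The names x1, x2, y and the values are made up; LinearRegression is assumed because the answer reads coef_ from a regressor.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy data with a known linear relationship: y = 2*x1 + 3*x2 + 1.
dataset = pd.DataFrame({"x1": [1.0, 2.0, 3.0, 4.0], "x2": [0.0, 1.0, 0.0, 1.0]})
dataset["y"] = 2.0 * dataset["x1"] + 3.0 * dataset["x2"] + 1.0

# No .values here, so X is still a DataFrame and X.columns exists.
X = dataset[["x1", "x2"]]
y = dataset["y"]

regressor = LinearRegression().fit(X, y)

# Label each coefficient with the column it belongs to.
coeff_df = pd.DataFrame(regressor.coef_, index=X.columns, columns=["Coefficient"])
```

scikit-learn estimators accept DataFrames directly, so dropping .values does not change the fit.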
https://imgur.com/gallery/yAdAjdx
Hello,
Linked is a screenshot of the two error messages I keep receiving, as well as the code leading up to them. I'm trying to run an uplift model on a classification tree. Any help is appreciated; I am still fairly new to Python. Thank you!!