The error is reproducible if the array is of dtype=object:
import numpy as np
label0 = np.random.random((50, 3)).astype(object)
np.cov(label0, rowvar=False)
AttributeError: 'float' object has no attribute 'shape'
If possible you should convert it to a numeric type. For example:
np.cov(label0.astype(float), rowvar=False) # works
Note: object arrays are rarely useful (they are slow and not all NumPy functions deal gracefully with these - like in this case), so it could make sense to check where it came from and also fix that.
The error is reproducible if the array is of dtype=object:
import numpy as np
label0 = np.random.random((50, 3)).astype(object)
np.cov(label0, rowvar=False)
AttributeError: 'float' object has no attribute 'shape'
If possible you should convert it to a numeric type. For example:
np.cov(label0.astype(float), rowvar=False) # works
Note: object arrays are rarely useful (they are slow and not all NumPy functions deal gracefully with these - like in this case), so it could make sense to check where it came from and also fix that.
try
label0.astype(float32)
and then calculate your cov.
It might because your dtype is object.
Check the Pandas documentation, but I think
X_train = df_train.drop(['ID','TARGET'], axis=1).values
.values returns a numpy array, not a Pandas dataframe. An array does not have a columns attribute.
remove_features_identical - if you pass this an array, make sure you are only using array, not dataframe, features. Otherwise, make sure you pass it a dataframe. And don't use variable names like DataFrame.
Maybe this solution solve such problem, try this:
X_train = pd.DataFrame(X_train, columns = X.columns)
X_test = pd.DataFrame(X_test, columns=X.columns)
Using .values on a pandas dataframe gives you a numpy array. This will not contain column names and such. You do this when setting X like this:
X = dataset[['Read?', 'x1', .. ,'x47']].values
But then you try to get the column names from X (which it does not have) by writing X.columns here:
coeff_df = pd.DataFrame(regressor.coef_, X.columns, columns=['Coefficient'])
So store your column names in a variable or input them again, like this:
coeff_df = pd.DataFrame(regressor.coef_, ['Read?', 'x1', .. ,'x47'], columns=['Coefficient'])
hi remove values method
X = dataset[['Read?', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6' , 'x7','x8','x9','x10','x11','x12','x13','x14','x15','x16','x17','x18','x19','x20','x21','x22','x23','x24','x25','x26','x27','x28','x29','x30','x31','x32','x33','x34','x35','x36','x37','x38','x39','x40','x41','x42','x43','x44','x45','x46','x47']]
coeff_df = pd.DataFrame(regressor.coef_, X.columns, columns=['Coefficient'])