Although numpy.ndarray has mean, max, std and similar methods, it does not have a median method. For a list of all methods available for an ndarray, see the numpy documentation for ndarray.
The median is instead available as a function that takes the array as an argument:
>>> import numpy as np
>>> a = np.array([1,2,3,4,5,6,7,8,9,10])
>>> np.median(a)
5.5
As you will see in the documentation for ndarray.mean, ndarray.mean and np.mean are "equivalent functions," so this is just a matter of semantics.
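A quick way to confirm the difference is to check the attributes directly. This is only a minimal sketch; the array values are arbitrary:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# mean exists both as a method and as a module-level function
assert a.mean() == np.mean(a)

# median exists only as a module-level function
print(hasattr(a, "median"))   # False
print(np.median(a))           # 5.5
```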
The error is reproducible if the array is of dtype=object:
import numpy as np
label0 = np.random.random((50, 3)).astype(object)
np.cov(label0, rowvar=False)
AttributeError: 'float' object has no attribute 'shape'
If possible you should convert it to a numeric type. For example:
np.cov(label0.astype(float), rowvar=False) # works
Note: object arrays are rarely useful (they are slow, and not all NumPy functions handle them gracefully, as in this case), so it could make sense to check where the object dtype came from and fix that as well.
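To check whether this is what is happening, it can help to look at the dtype before and after the conversion. A minimal sketch, reusing the label0 array from above (the name numeric is just illustrative):

```python
import numpy as np

label0 = np.random.random((50, 3)).astype(object)
print(label0.dtype)               # object

# astype returns a new array; it raises if an element cannot be converted
numeric = label0.astype(float)
print(numeric.dtype)              # float64

cov = np.cov(numeric, rowvar=False)
print(cov.shape)                  # (3, 3)
```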
Try
label0 = label0.astype(np.float32)
and then calculate your cov.
The error might be because your dtype is object.
Using .values on a pandas dataframe gives you a numpy array. This will not contain column names and such. You do this when setting X like this:
X = dataset[['Read?', 'x1', .. ,'x47']].values
But then you try to get the column names from X (which it does not have) by writing X.columns here:
coeff_df = pd.DataFrame(regressor.coef_, X.columns, columns=['Coefficient'])
So store your column names in a variable or input them again, like this:
coeff_df = pd.DataFrame(regressor.coef_, ['Read?', 'x1', .. ,'x47'], columns=['Coefficient'])
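As a rough sketch of the "store your column names in a variable" option: the data, the feature_names list, and the coef array below are hypothetical stand-ins (coef takes the place of regressor.coef_), so only the pattern carries over.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the column names are illustrative only
dataset = pd.DataFrame({"x1": [1.0, 2.0, 3.0], "x2": [4.0, 5.0, 6.0]})

feature_names = ["x1", "x2"]
X = dataset[feature_names].values      # plain ndarray, column labels are gone

print(hasattr(X, "columns"))           # False

# Hypothetical coefficients standing in for regressor.coef_
coef = np.array([0.5, -1.2])
coeff_df = pd.DataFrame(coef, index=feature_names, columns=["Coefficient"])
print(coeff_df)
```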
Remove the .values call so that X stays a DataFrame:
X = dataset[['Read?', 'x1', 'x2', 'x3', 'x4', 'x5', 'x6' , 'x7','x8','x9','x10','x11','x12','x13','x14','x15','x16','x17','x18','x19','x20','x21','x22','x23','x24','x25','x26','x27','x28','x29','x30','x31','x32','x33','x34','x35','x36','x37','x38','x39','x40','x41','x42','x43','x44','x45','x46','x47']]
coeff_df = pd.DataFrame(regressor.coef_, X.columns, columns=['Coefficient'])
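The problem is that X_train is a numpy array and not a pandas dataframe, and numpy arrays have no attribute named columns. (train_test_split(X, y, ...) returns the same kind of container it is given, so X_train ends up as an array when the data was converted beforehand, for example with .values or by a transformer.)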
If you want to see what features SelectFromModel kept, you need to substitute X_train (which is a numpy.array) with X which is a pandas.DataFrame.
selected_feat = X.columns[sel.get_support()]
This will return the columns kept by the feature selector (as a pandas Index; wrap it in list(...) if you need a plain list).
If you wanted to see how many features were kept you can just run this:
sel.get_support().sum() # by default this will count 'True' as 1 and 'False' as 0
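The indexing itself can be tried without fitting a model. In the sketch below, support is a hypothetical boolean mask standing in for what sel.get_support() returns, and the DataFrame is made up:

```python
import numpy as np
import pandas as pd

X = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Hypothetical mask standing in for sel.get_support()
support = np.array([True, False, True])

selected_feat = X.columns[support]     # boolean indexing on the column Index
print(list(selected_feat))             # ['a', 'c']
print(support.sum())                   # 2
```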
This happens because of these lines:
X = df.iloc[:,:24481].values
y = df.iloc[:, -1].values
You should remove .values, or keep the DataFrame versions in extra variables X_col and y_col, like this:
X_col = df.iloc[:,:24481]
y_col = df.iloc[:, -1]