I had the same problem happening on some code that was working perfectly fine after migrating to the latest PyCharm version.
I assume you are using the latest PyCharm version (2019.2). I don't have an explanation for why this causes the issue, but installing the older PyCharm 2019.1.4 fixed the problem for me.
Agreed. Running this on PyCharm 2019.2 normally works with no problem, but put a breakpoint somewhere, debug, and the error will happen:
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
pdf = pd.DataFrame({'A': ['asdf', 'fdsa'], 'B': [1, 2]})
sdf = spark.createDataFrame(pdf)
print(pdf)
sdf.show()
pandas.DataFrame doesn't have a built-in reshape method, but you can use .values to access the underlying NumPy array and call reshape on it:
start = 0
for i in range(len(df.index)):
    if (i + 1) % 10 == 0:
        result = df.iloc[start:i + 1].values.reshape(2, 5)
        start = i + 1
        print(result)
# [[ 52.1  32.2  44.6  99.1  12.3]
#  [ 43.2  79.4  45.5  56.3  15.4]]
# [[ 35.7  23.7  66.7  33.8  12.9]
#  [ 34.8  21.6  43.7  44.2  55.8]]
Simplest Answer
df = pd.read_csv("test.csv", header=None, usecols=[1])
df.values.reshape(-1, 5)
array([[52.1, 32.2, 44.6, 99.1, 12.3],
[43.2, 79.4, 45.5, 56.3, 15.4],
[35.7, 23.7, 66.7, 33.8, 12.9],
[34.8, 21.6, 43.7, 44.2, 55.8]])
You can get the number of columns directly:
len(df.columns) # this is fast
You can also call len on the dataframe itself, though beware that this will trigger a computation.
len(df) # this requires a full scan of the data
Dask.dataframe doesn't know how many records are in your data without first reading through all of it.
With shape you can do the following:
a = df.shape
a[0].compute(), a[1]
This will show the shape just as it is shown with pandas.
Check your DataFrame with data.columns.
It should print something like this:
Index([u'regiment', u'company', u'name', u'postTestScore'], dtype='object')
Check for hidden white spaces. Then you can rename with:
data = data.rename(columns={'Number ': 'Number'})
I think the column name that contains "Number" is actually something like " Number" or "Number ", i.e. it has a residual space. Please run print("<{}>".format(data.columns[1])) and see what you get. If it's something like < Number>, it can be fixed with:
data.columns = data.columns.str.strip()
See pandas.Series.str.strip
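As a quick end-to-end illustration (using a made-up frame, not the asker's data), a trailing space reproduces the error and strip fixes it:

```python
import pandas as pd

# A column name with a trailing space, as read_csv can produce
data = pd.DataFrame({'Number ': [1, 2, 3]})

# data.Number would raise AttributeError here, because the real
# name is 'Number ' with a trailing space
print('Number' in data.columns)   # False

# Strip whitespace from every column name at once
data.columns = data.columns.str.strip()
print(data.Number.tolist())       # [1, 2, 3]
```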
In general, AttributeError: 'DataFrame' object has no attribute '...', where ... is some column name, is caused because . notation has been used to reference a nonexistent column name or pandas method.
pandas methods are accessed with a .. pandas columns can also be accessed with a . (e.g. data.col) or with brackets (e.g. data['col'] or data[['col1', 'col2']]).
data.columns = data.columns.str.strip() is a fast way to quickly remove leading and trailing spaces from all column names. Otherwise verify the column or attribute is correctly spelled.
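For example (with hypothetical column names, not the asker's data):

```python
import pandas as pd

data = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# Dot notation reaches both methods and columns
print(data.col1.sum())               # 3  (attribute-style column access)
print(data['col1'].sum())            # 3  (bracket access, always safe)
print(data[['col1', 'col2']].shape)  # (2, 2)  double brackets -> DataFrame

# A misspelled name fails exactly like a missing method
try:
    data.coll
except AttributeError as e:
    print(e)
```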
The function pd.read_csv() already returns a DataFrame, and that kind of object does not support calling .to_dataframe().
You can check the type of your variable ds using print(type(ds)), you will see that it is a pandas DataFrame type.
From what I understand, you are loading loanapp_c.csv into ds with this code:
ds = pd.read_csv('desktop/python ML/loanapp_c.csv')
ds here is a DataFrame object. What you are doing is calling to_dataframe on an object which is already a DataFrame.
Removing dataset = ds.to_dataframe() from your code should solve the error.
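You can see this without the original file by reading a CSV from a string (hypothetical data standing in for loanapp_c.csv):

```python
import io
import pandas as pd

csv_text = "id,amount\n1,100\n2,200\n"
ds = pd.read_csv(io.StringIO(csv_text))

# read_csv already hands back a DataFrame
print(type(ds))                      # <class 'pandas.core.frame.DataFrame'>

# so there is no .to_dataframe to call on it
print(hasattr(ds, 'to_dataframe'))   # False
```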