The regular Python int function only works for scalars. You should use a NumPy function to round the data, either
s = np.round((s - s % bucket_size) / bucket_size)  # to round properly; or
s = np.fix((s - s % bucket_size) / bucket_size)  # to round towards 0
and if you actually want to convert to an integer type, use
s = s.astype(int)
to cast your array.
Answer from Andras Deak -- Слава Україні on Stack Overflow
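The rounding advice in this answer can be tried on a small array. This is only a sketch: the sample values and bucket_size = 5 are assumptions, not data from the question.

```python
import numpy as np

# Hypothetical sample data; bucket_size is an assumed value.
s = np.array([-7.5, -2.0, 3.0, 12.5, 19.9])
bucket_size = 5

# int() on a multi-element array raises TypeError.
try:
    int(s)
    int_failed = False
except TypeError:
    int_failed = True

# Bucket index for every element, computed without a Python loop,
# then cast to an integer dtype.
buckets = np.round((s - s % bucket_size) / bucket_size).astype(int)

# np.round goes to the nearest integer (ties go to the even neighbour);
# np.fix truncates towards 0.
x = np.array([-1.5, -0.5, 0.5, 1.5, 2.7])
rounded = np.round(x)
truncated = np.fix(x)
```

Note that s % bucket_size follows the sign of the divisor in NumPy, so negative values land in the bucket below them rather than being pulled towards zero.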
N.B. This answer is less efficient than a pure NumPy approach, since pandas is built on top of NumPy. Consider NumPy if efficiency matters.
That said, a significant amount of analysis work is done with pandas DataFrames, and converting to NumPy means writing extra code. If you are doing an analysis in, say, a Jupyter notebook, it is reasonable to let the library do a bit of work under the hood.
A big thank you to @Chris for noticing this.
pandas version (theoretically less efficient than numpy)
Create a list of float values:
y = [0.1234, 0.6789, 0.5678]
Convert the list of float values to a pandas Series:
s = pd.Series(data=y)
Round the values to three decimal places:
print(s.round(3))
returns
0    0.123
1    0.679
2    0.568
dtype: float64
Convert to integer:
print(s.astype(int))
returns
0    0
1    0
2    0
dtype: int64
Chain it all together:
pd.Series(data=y).round(3)
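One point the steps above leave implicit is that astype(int) drops the fractional part instead of rounding. A minimal sketch using the same list, with the variable names truncated and rounded chosen here for illustration:

```python
import pandas as pd

y = [0.1234, 0.6789, 0.5678]
s = pd.Series(data=y)

# astype(int) alone drops the fractional part.
truncated = s.astype(int)

# Rounding to zero decimals first gives the nearest integer.
rounded = s.round().astype(int)
```

If nearest-integer values are what you want, round before casting.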
You can parse "-" as a NaN value while reading the file. That may also help with future tasks.
table = pd.read_table('team_rankings.dat', na_values="-")
See https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
Use pd.to_numeric with errors='coerce', i.e.
df.loc[pd.to_numeric(df['highest_rank'], errors='coerce') < 2]
Output:
   rank       team  rating highest_rank  highest_rating
0     1    Germany    2097            1            2205
1     2     Brazil    2086            1            2161
2     3      Spain    2011            1            2147
4     5  Argentina    1967            1            2128
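Both suggestions can be compared on an in-memory table. This sketch assumes a comma-separated layout and a reduced set of columns; the first three rows reuse values from the output above, while the "Example FC" row is made up to show the "-" marker.

```python
import io
import pandas as pd

# First three rows come from the output above; "Example FC" is a made-up row
# that uses "-" as a missing-value marker.
raw = """rank,team,rating,highest_rank
1,Germany,2097,1
2,Brazil,2086,1
3,Spain,2011,1
4,Example FC,1900,-
"""

# Option 1: read as-is, then coerce the column; "-" becomes NaN.
df = pd.read_csv(io.StringIO(raw))
as_number = pd.to_numeric(df["highest_rank"], errors="coerce")
top = df.loc[as_number < 2]

# Option 2: declare "-" as a missing value while reading.
parsed = pd.read_csv(io.StringIO(raw), na_values="-")
```

Comparisons against NaN are False, so rows with a missing rank simply drop out of the filter.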
Try converting with astype:
new_re_df = [s.iloc[np.where(ts.astype(int) == int(i))] for i in ts]
Edit
At the suggestion of @Rutger Kassies, a nicer way is to cast the series and then group by it:
rise_p['ts'] = (rise_p.time / 100).astype('int')
ts_grouped = rise_p.groupby('ts')
...
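A small runnable version of the cast-then-groupby idea might look like this. The contents of rise_p are hypothetical stand-ins, since the original frame is not shown, and the mean is just one example aggregation.

```python
import pandas as pd

# Hypothetical stand-in for rise_p.
rise_p = pd.DataFrame({
    "time": [100.5, 150.2, 210.0, 299.9, 305.1],
    "magnitude": [1.0, 3.0, 5.0, 7.0, 9.0],
})

# Integer bucket per row: time / 100 with the fractional part dropped.
rise_p["ts"] = (rise_p.time / 100).astype("int")

# One group per bucket, then any aggregation per group.
ts_grouped = rise_p.groupby("ts")
means = ts_grouped["magnitude"].mean()
```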
Here's a different way to solve your problem
In [3]: df
Out[3]:
time magnitude
0 1379945444 156.627598
1 1379945447 1474.648726
2 1379945448 1477.448999
3 1379945449 1474.886202
4 1379945699 1371.454224
In [4]: df.dtypes
Out[4]:
time int64
magnitude float64
dtype: object
Convert your epoch timestamps (given in seconds) to datetimes
In [7]: df['time'] = pd.to_datetime(df['time'],unit='s')
Set the index
In [8]: df.set_index('time',inplace=True)
In [9]: df
Out[9]:
magnitude
time
2013-09-23 14:10:44 156.627598
2013-09-23 14:10:47 1474.648726
2013-09-23 14:10:48 1477.448999
2013-09-23 14:10:49 1474.886202
2013-09-23 14:14:59 1371.454224
Resample to 1-minute bins and take the mean (current pandas no longer accepts how=; call the aggregation on the resampler, or pass an arbitrary function to .agg())
In [10]: df.resample('1min').mean()
Out[10]:
magnitude
time
2013-09-23 14:10:00 1145.902881
2013-09-23 14:11:00 NaN
2013-09-23 14:12:00 NaN
2013-09-23 14:13:00 NaN
2013-09-23 14:14:00 1371.454224
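The session above can be condensed into one script. This sketch uses the same five rows and the method-chaining form of resample; the name per_minute is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "time": [1379945444, 1379945447, 1379945448, 1379945449, 1379945699],
    "magnitude": [156.627598, 1474.648726, 1477.448999, 1474.886202, 1371.454224],
})

# Epoch seconds -> datetimes, then use them as the index.
df["time"] = pd.to_datetime(df["time"], unit="s")
df = df.set_index("time")

# One row per minute; minutes without samples come out as NaN.
per_minute = df.resample("1min").mean()
```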
The built-in round won't work because it is being called on a pandas Series, which is array-like rather than a scalar value. Use the method pd.Series.round to operate on the whole Series, after which you can change the dtype using astype:
In [43]:
df = pd.DataFrame({'a':np.random.randn(5)})
df['a'] = df['a'] * 100
df
Out[43]:
a
0 -4.489462
1 -133.556951
2 -136.397189
3 -106.993288
4 -89.820355
In [45]:
df['a'] = df['a'].round(0).astype(int)
df
Out[45]:
a
0 -4
1 -134
2 -136
3 -107
4 -90
Also, it's unnecessary to iterate over the rows when vectorised methods are available.
Also, this:
for obj in df['a']:
    obj = int(round(obj))
does not mutate the individual cell in the Series; it operates on a copy of the value, which is why the df is not mutated.
The code in your loop:
obj = int(round(obj))
only changes which object the name obj refers to. It does not modify the data stored in the Series. If you want to do this, you need to know where in the Series the data is stored and update it there.
E.g.
col = df.columns.get_loc('a')  # positional index of column 'a'
for i, num in enumerate(df['a']):
    df.iloc[i, col] = int(round(num))  # write through the DataFrame itself
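The difference between rebinding the loop variable and actually updating the column can be checked directly. A minimal sketch, reusing the values from the output shown earlier; before and unchanged are names introduced here.

```python
import pandas as pd

df = pd.DataFrame({"a": [-4.489462, -133.556951, -136.397189,
                         -106.993288, -89.820355]})
before = df["a"].copy()

# Rebinding the loop variable leaves the column untouched.
for obj in df["a"]:
    obj = int(round(obj))
unchanged = df["a"].equals(before)

# The vectorised form replaces the whole column in one step.
df["a"] = df["a"].round(0).astype(int)
```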
To modify the float output, do this:
df = pd.DataFrame(range(5), columns=['a'])
df.a = df.a.astype(float)
df
Out[33]:
a
0 0.0000000
1 1.0000000
2 2.0000000
3 3.0000000
4 4.0000000
pd.options.display.float_format = '{:,.0f}'.format
df
Out[35]:
a
0 0
1 1
2 2
3 3
4 4
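Setting pd.options.display.float_format changes the display globally and leaves the stored values as floats. If that is too broad, one possible alternative (an assumption about what you need, not part of the answer above) is to scope the option with pd.option_context:

```python
import pandas as pd

df = pd.DataFrame(range(5), columns=["a"]).astype(float)

# The format applies only inside the with-block.
with pd.option_context("display.float_format", "{:,.0f}".format):
    inside = df.to_string()

# Outside the block the default float display is back.
outside = df.to_string()
```

The column dtype stays float64 throughout; only the rendered text differs.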
Use the pandas.DataFrame.astype(<type>) method to change column dtypes.
>>> df = pd.DataFrame(np.random.rand(3,4), columns=list("ABCD"))
>>> df
A B C D
0 0.542447 0.949988 0.669239 0.879887
1 0.068542 0.757775 0.891903 0.384542
2 0.021274 0.587504 0.180426 0.574300
>>> df[list("ABCD")] = df[list("ABCD")].astype(int)
>>> df
A B C D
0 0 0 0 0
1 0 0 0 0
2 0 0 0 0
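astype also accepts a mapping from column name to dtype, which helps when only some columns should change. A short sketch; the seeded generator and the scaling by 10 are assumptions made so the result is not all zeros.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
df = pd.DataFrame(rng.random((3, 4)) * 10, columns=list("ABCD"))

# Convert only A and B; C and D keep their float dtype.
converted = df.astype({"A": int, "B": "int32"})
```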
EDIT:
To handle missing values:
>>> df
A B C D
0 0.475103 0.355453 0.66 0.869336
1 0.260395 0.200287 NaN 0.617024
2 0.517692 0.735613 0.18 0.657106
>>> df[list("ABCD")] = df[list("ABCD")].fillna(0.0).astype(int)
>>> df
A B C D
0 0 0 0 0
1 0 0 0 0
2 0 0 0 0
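Filling with 0.0 hides which values were missing. If that matters, one alternative (not part of the answer above) is the nullable "Int64" dtype, which keeps missing entries as <NA>. A sketch on column C from the example; the values must be whole numbers before the cast, hence the round().

```python
import numpy as np
import pandas as pd

c = pd.Series([0.66, np.nan, 0.18], name="C")

# A plain integer cast cannot represent NaN and raises.
try:
    c.astype(int)
    nan_cast_failed = False
except ValueError:
    nan_cast_failed = True

# Approach from the answer: replace NaN first, then cast (truncates).
filled = c.fillna(0.0).astype(int)

# Alternative: round, then cast to the nullable integer dtype.
nullable = c.round().astype("Int64")
```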