One way to do this would be to use Series.dt.seconds and Series.dt.days and multiply by a factor for the desired unit:
(Series.dt.seconds/3600) + (Series.dt.days*24) # for values with [hours]
Answer from Soerendip on Stack Exchange
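Series.dt.seconds only holds the seconds part below one day (0 to 86399), which is why whole days have to be added back. A small check on two made-up durations, comparing this with dt.total_seconds(), which also keeps sub-second precision:

```python
import pandas as pd

# Two example durations: 1 day 6 hours, and 30 minutes
s = pd.Series(pd.to_timedelta(['1 days 06:00:00', '0 days 00:30:00']))

# Component-based: .dt.seconds excludes whole days, so days are added separately
hours_components = s.dt.seconds / 3600 + s.dt.days * 24

# Equivalent for these values; total_seconds() also keeps sub-second parts
hours_total = s.dt.total_seconds() / 3600

print(hours_components.tolist())  # [30.0, 0.5]
print(hours_total.tolist())       # [30.0, 0.5]
```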
You can divide by the timedelta you want to use as the unit:
total_days = my_timedelta / pd.Timedelta('1D')
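A minimal sketch of the division approach, assuming my_timedelta is a pandas Timedelta (the value here is made up); the same division works element-wise on a timedelta64 Series:

```python
import pandas as pd

my_timedelta = pd.Timedelta('1 days 12:00:00')  # example value

# Dividing by a one-unit timedelta gives a plain float in that unit
total_days = my_timedelta / pd.Timedelta('1D')
total_hours = my_timedelta / pd.Timedelta(hours=1)

# Element-wise on a Series of timedeltas
s = pd.Series([my_timedelta, pd.Timedelta('6h')])
days_series = s / pd.Timedelta('1D')

print(total_days, total_hours)  # 1.5 36.0
print(days_series.tolist())     # [1.5, 0.25]
```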
You can use time deltas to do this more directly:
In [11]: s = pd.Series(["00:10:30"])
In [12]: s = pd.to_timedelta(s)
In [13]: s
Out[13]:
0 00:10:30
dtype: timedelta64[ns]
In [14]: s / pd.offsets.Minute(1)
Out[14]:
0 10.5
dtype: float64
I would convert the string to a datetime and then use the dt accessor to access the components of the time and generate your minutes column:
In [16]:
df = pd.DataFrame({'time':['00:10:30']})
df['time'] = pd.to_datetime(df['time'])
df['minutes'] = df['time'].dt.hour * 60 + df['time'].dt.minute + df['time'].dt.second/60
df
Out[16]:
time minutes
0 2015-02-05 00:10:30 10.5
For this kind of regression, I usually convert the dates or timestamps to an integer number of days since the start of the data.
This does the trick nicely:
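import numpy as np
import pandas as pd
import statsmodels.formula.api as sm  # formula API that provides ols()
df = pd.read_csv('test.csv')
df['date'] = pd.to_datetime(df['date'])
# Days since the earliest date in the data, as floats
df['date_delta'] = (df['date'] - df['date'].min()) / np.timedelta64(1,'D')
city_data = df[df['city'] == 'London']
result = sm.ols(formula = 'sales ~ date_delta', data = city_data).fit()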
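A self-contained sketch of the date_delta idea on made-up data. To avoid the statsmodels dependency it swaps in numpy.polyfit for the linear fit (a substitute, not the method used above), so the slope comes out directly in sales per day:

```python
import numpy as np
import pandas as pd

# Hypothetical data: sales rising by exactly 2 per day from 100
df = pd.DataFrame({
    'date': pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-05', '2015-01-11']),
})
df['date_delta'] = (df['date'] - df['date'].min()) / np.timedelta64(1, 'D')
df['sales'] = 100 + 2 * df['date_delta']

# Stand-in for sm.ols: a degree-1 least-squares fit of sales on days
slope, intercept = np.polyfit(df['date_delta'], df['sales'], 1)

print(df['date_delta'].tolist())  # [0.0, 1.0, 4.0, 10.0]
print(slope, intercept)           # close to 2.0 and 100.0
```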
The advantage of this method is that you're sure of the units involved in the regression (days), whereas an automatic conversion may implicitly use other units, creating confusing coefficients in your linear model. It also allows you to combine data from multiple sales campaigns that started at different times into your regression (say you're interested in effectiveness of a campaign as a function of days into the campaign). You could also pick Jan 1st as your 0 if you're interested in measuring the day of year trend. Picking your own 0 date puts you in control of all that.
There's also evidence that statsmodels supports timeseries from pandas. You may be able to apply this to linear models as well: http://statsmodels.sourceforge.net/stable/examples/generated/ex_dates.html
Also, a quick note: You should be able to read column names directly out of the csv automatically as in the sample code I posted. In your example I see there are spaces between the commas in the first line of the csv file, resulting in column names like ' date'. Remove the spaces and automatic csv header reading should just work.
get date as floating point year
I prefer a date format that can be understood without context, hence the floating-point year representation.
The nice thing here is that the solution works at the NumPy level, so it should be fast.
import numpy as np
import pandas as pd

def dt64_to_float(dt64):
    """Converts numpy.datetime64 to year as float.

    Truncated to whole days

    Parameters
    ----------
    dt64 : np.datetime64 or np.ndarray(dtype='datetime64[X]')
        date data

    Returns
    -------
    float or np.ndarray(dtype=float)
        Year in floating point representation
    """
    year = dt64.astype('M8[Y]')
    # whole days elapsed since 1 January of that year
    days = (dt64 - year).astype('timedelta64[D]')
    year_next = year + np.timedelta64(1, 'Y')
    # 365 or 366, depending on the year
    days_of_year = (year_next.astype('M8[D]') - year.astype('M8[D]')
                    ).astype('timedelta64[D]')
    # M8[Y] counts years since 1970
    dt_float = 1970 + year.astype(float) + days / (days_of_year)
    return dt_float

if __name__ == "__main__":
    dates = np.array([
        '1970-01-01', '2014-01-01', '2020-12-31', '2019-12-31', '2010-04-28'],
        dtype='datetime64[D]')
    df = pd.DataFrame({
        'date': dates,
        'number': np.arange(5)
    })
    df['date_float'] = dt64_to_float(df['date'].to_numpy())
    print('df:', df, sep='\n')
    print()
    dt64 = np.datetime64("2011-11-11")
    print('dt64:', dt64_to_float(dt64))
output
df:
date number date_float
0 1970-01-01 0 1970.000000
1 2014-01-01 1 2014.000000
2 2020-12-31 2 2020.997268
3 2019-12-31 3 2019.997260
4 2010-04-28 4 2010.320548
dt64: 2011.8602739726027
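As a cross-check of the output above, a pandas-only sketch of the same day-resolution calculation, using the dt accessor instead of NumPy unit casts (same example dates; times of day are assumed to be midnight):

```python
import numpy as np
import pandas as pd

dates = pd.Series(pd.to_datetime(
    ['1970-01-01', '2014-01-01', '2020-12-31', '2019-12-31', '2010-04-28']))

# 0-based day of the year divided by the length of that year
days_in_year = np.where(dates.dt.is_leap_year, 366, 365)
year_float = dates.dt.year + (dates.dt.dayofyear - 1) / days_in_year

print(year_float.round(6).tolist())
```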
When you read the Excel file, specify the dtype of the itime column as str:
df = pd.read_excel("test.xlsx", dtype={'itime':str})
Then you will have a time column of strings like this:
df = pd.DataFrame({'itime':['2300', '0100', '0500', '1000']})
Then specify the format and convert to time:
df['Time'] = pd.to_datetime(df['itime'], format='%H%M').dt.time
itime Time
0 2300 23:00:00
1 0100 01:00:00
2 0500 05:00:00
3 1000 10:00:00
Just an addition to Chris's answer: if the conversion fails because a value has no leading zero, apply the following to the dataframe first:
df['itime'] = df['itime'].apply(lambda x: x.zfill(4))
This is needed because some of the original values do not have four digits, e.g. 945 instead of 0945.
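Putting both answers together, a minimal sketch with hypothetical itime values as they might look after reading with dtype={'itime': str}, some of them missing the leading zero:

```python
import pandas as pd

# Hypothetical values; '100' and '945' lost their leading zero
df = pd.DataFrame({'itime': ['2300', '100', '945', '1000']})

# Left-pad to four digits so that '%H%M' can parse every value
df['itime'] = df['itime'].str.zfill(4)
df['Time'] = pd.to_datetime(df['itime'], format='%H%M').dt.time

print(df)
```

The vectorized .str.zfill(4) does the same as the apply/lambda version above.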
Update:
In 0.15.0, Timedeltas became a full-fledged dtype,
so this becomes possible (as well as the methods below):
In [45]: s = Series(pd.timedelta_range('1 day',freq='1S',periods=5))
In [46]: s.dt.components
Out[46]:
days hours minutes seconds milliseconds microseconds nanoseconds
0 1 0 0 0 0 0 0
1 1 0 0 1 0 0 0
2 1 0 0 2 0 0 0
3 1 0 0 3 0 0 0
4 1 0 0 4 0 0 0
In [47]: s.astype('timedelta64[s]')
Out[47]:
0 86400
1 86401
2 86402
3 86403
4 86404
dtype: float64
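As far as I know, in pandas 2.x astype('timedelta64[s]') returns a timedelta64[s] Series rather than float seconds, so the output above is version-dependent. A sketch of two ways to get float seconds that do not rely on that behavior:

```python
import pandas as pd

s = pd.Series(pd.timedelta_range('1 day', freq='1s', periods=5))

# Float seconds via the dt accessor
secs = s.dt.total_seconds()

# Alternative: divide by a one-second Timedelta
secs_div = s / pd.Timedelta(seconds=1)

print(secs.tolist())  # [86400.0, 86401.0, 86402.0, 86403.0, 86404.0]
```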
Original Answer:
I see that you are on master (and 0.13 is coming out very shortly), so I'll assume you have numpy >= 1.7. Do the following; see the frequency conversion section of the pandas docs for details.
In [5]: df = DataFrame(dict(date = date_range('20130101',periods=10)))
In [6]: df
Out[6]:
date
0 2013-01-01 00:00:00
1 2013-01-02 00:00:00
2 2013-01-03 00:00:00
3 2013-01-04 00:00:00
4 2013-01-05 00:00:00
5 2013-01-06 00:00:00
6 2013-01-07 00:00:00
7 2013-01-08 00:00:00
8 2013-01-09 00:00:00
9 2013-01-10 00:00:00
In [7]: df['date']+timedelta(hours=2)-datetime.datetime(1970,1,1)
Out[7]:
0 15706 days, 02:00:00
1 15707 days, 02:00:00
2 15708 days, 02:00:00
3 15709 days, 02:00:00
4 15710 days, 02:00:00
5 15711 days, 02:00:00
6 15712 days, 02:00:00
7 15713 days, 02:00:00
8 15714 days, 02:00:00
9 15715 days, 02:00:00
Name: date, dtype: timedelta64[ns]
In [9]: (df['date']+timedelta(hours=2)-datetime.datetime(1970,1,1)) / np.timedelta64(1,'s')
Out[9]:
0 1357005600
1 1357092000
2 1357178400
3 1357264800
4 1357351200
5 1357437600
6 1357524000
7 1357610400
8 1357696800
9 1357783200
Name: date, dtype: float64
The contained values are np.timedelta64[ns] objects; they don't have the same methods as timedelta objects, so there is no total_seconds().
In [10]: s = (df['date']+timedelta(hours=2)-datetime.datetime(1970,1,1))
In [11]: s[0]
Out[11]: numpy.timedelta64(1357005600000000000,'ns')
You can astype them to int, which gives you back the value in nanoseconds.
In [12]: s[0].astype(int)
Out[12]: 1357005600000000000
You can also convert to another unit this way (but only on an individual element):
In [18]: s[0].astype('timedelta64[s]')
Out[18]: numpy.timedelta64(1357005600,'s')
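For a single element, dividing by a one-unit timedelta64 gives a float directly, and wrapping the value in pd.Timedelta gives access to total_seconds(). A sketch using the nanosecond value from In [11]:

```python
import numpy as np
import pandas as pd

td = np.timedelta64(1357005600000000000, 'ns')  # value from In [11]

# Dividing by a one-unit timedelta64 yields a float in that unit
seconds = td / np.timedelta64(1, 's')

# pandas' Timedelta wrapper does provide total_seconds()
seconds_pd = pd.Timedelta(td).total_seconds()

print(seconds, seconds_pd)  # 1357005600.0 1357005600.0
```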
In recent versions of pandas, you can do:
import pandas as pd
# create a dataframe from 2023-05-26 to 2023-06-04
df = pd.DataFrame({'date': pd.date_range('2023-05-26', periods=10, freq='D')})
df['timestamp'] = (df['date'].add(pd.DateOffset(hours=2)) # add hour offset
.sub(pd.Timestamp(0)) # subtract 1970-1-1
.dt.total_seconds() # extract total of seconds
.astype(int)) # downcast float64 to int64
Output:
>>> df
date timestamp
0 2023-05-26 1685066400
1 2023-05-27 1685152800
2 2023-05-28 1685239200
3 2023-05-29 1685325600
4 2023-05-30 1685412000
5 2023-05-31 1685498400
6 2023-06-01 1685584800
7 2023-06-02 1685671200
8 2023-06-03 1685757600
9 2023-06-04 1685844000
The key is to subtract the origin (pd.Timestamp(0), i.e. 1970-01-01) from each date (a datetime64 Series), then use the dt accessor to extract the number of seconds from the result (a timedelta64 Series). You can also downcast the numeric result (float64 to int64).
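If integer seconds are the goal, floor division by a one-second Timedelta should return int64 directly, with no float64 step. A sketch without the 2-hour offset used in the example:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2023-05-26', periods=3, freq='D')})

# timedelta64 Series // Timedelta -> whole seconds as integers
df['timestamp'] = (df['date'] - pd.Timestamp(0)) // pd.Timedelta(seconds=1)

print(df['timestamp'].tolist())  # [1685059200, 1685145600, 1685232000]
```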
You could use the total_seconds method:
time_d_float = time_d.total_seconds()
In Python 3.2 or higher, you can divide two timedeltas to give a float. This is useful if you need the value to be in units other than seconds.
time_d_min = time_d / datetime.timedelta(minutes=1)
time_d_ms = time_d / datetime.timedelta(milliseconds=1)
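A self-contained version of the two snippets above, assuming time_d is a datetime.timedelta (the value is made up):

```python
import datetime

time_d = datetime.timedelta(hours=1, minutes=30)  # example value

time_d_float = time_d.total_seconds()                    # 5400.0
time_d_min = time_d / datetime.timedelta(minutes=1)       # 90.0
time_d_ms = time_d / datetime.timedelta(milliseconds=1)   # 5400000.0
```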
Python 2:
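import datetime

def datetime_to_float(d):
    # Note: naive d is treated as UTC here, while float_to_datetime below
    # returns local time; use utcfromtimestamp there for an exact UTC round trip
    epoch = datetime.datetime.utcfromtimestamp(0)
    total_seconds = (d - epoch).total_seconds()
    # total_seconds will be in decimals (microsecond precision)
    return total_seconds

def float_to_datetime(fl):
    return datetime.datetime.fromtimestamp(fl)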
Python 3:
def datetime_to_float(d):
    # a naive d is interpreted as local time, matching fromtimestamp()
    return d.timestamp()
The Python 3 version of float_to_datetime is the same as the Python 2 version above.
In Python 3 you can use timestamp() (and fromtimestamp() for the inverse).
Example:
>>> from datetime import datetime
>>> now = datetime.now()
>>> now.timestamp()
1455188621.063099
>>> ts = now.timestamp()
>>> datetime.fromtimestamp(ts)
datetime.datetime(2016, 2, 11, 11, 3, 41, 63098)
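Since timestamp() and fromtimestamp() treat naive datetimes as local time, the result depends on the machine's time zone. A sketch with a timezone-aware UTC datetime, which gives the same float everywhere:

```python
from datetime import datetime, timezone

# An aware datetime in UTC, so the result does not depend on the local time zone
d = datetime(2016, 2, 11, 11, 3, 41, tzinfo=timezone.utc)

ts = d.timestamp()
back = datetime.fromtimestamp(ts, tz=timezone.utc)

print(ts, back == d)  # 1455188621.0 True
```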
Here is the DF (only the column needs to change from float seconds to minutes and seconds):
Time
360.00
245.00
111.00
How can I change the column to:
Time
6:00
4:05
1:51
Thanks.
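A minimal sketch, assuming the Time values are seconds stored as floats and that m:ss strings are the desired result; seconds_to_mmss is a hypothetical helper name:

```python
import pandas as pd

df = pd.DataFrame({'Time': [360.00, 245.00, 111.00]})

def seconds_to_mmss(seconds: float) -> str:
    # Split whole seconds into minutes and the remaining seconds
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d}"

df['Time'] = df['Time'].apply(seconds_to_mmss)
print(df)
```

This produces strings for display; if you need to keep doing arithmetic, pd.to_timedelta(df['Time'], unit='s') keeps the values as durations instead.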
You can use pd.to_timedelta or np.timedelta64 to define a duration and divide by this:
# set up as per @EdChum
df['total_days_td'] = df['time_delta'] / pd.to_timedelta(1, unit='D')
df['total_days_td'] = df['time_delta'] / np.timedelta64(1, 'D')
You can use dt.total_seconds and divide the result by the number of seconds in a day, for example:
In [25]:
df = pd.DataFrame({'dates':pd.date_range(dt.datetime(2016,1,1, 12,15,3), periods=10)})
df
Out[25]:
dates
0 2016-01-01 12:15:03
1 2016-01-02 12:15:03
2 2016-01-03 12:15:03
3 2016-01-04 12:15:03
4 2016-01-05 12:15:03
5 2016-01-06 12:15:03
6 2016-01-07 12:15:03
7 2016-01-08 12:15:03
8 2016-01-09 12:15:03
9 2016-01-10 12:15:03
In [26]:
df['time_delta'] = df['dates'] - pd.Timestamp(2015, 11, 6, 8, 10)
df
Out[26]:
dates time_delta
0 2016-01-01 12:15:03 56 days 04:05:03
1 2016-01-02 12:15:03 57 days 04:05:03
2 2016-01-03 12:15:03 58 days 04:05:03
3 2016-01-04 12:15:03 59 days 04:05:03
4 2016-01-05 12:15:03 60 days 04:05:03
5 2016-01-06 12:15:03 61 days 04:05:03
6 2016-01-07 12:15:03 62 days 04:05:03
7 2016-01-08 12:15:03 63 days 04:05:03
8 2016-01-09 12:15:03 64 days 04:05:03
9 2016-01-10 12:15:03 65 days 04:05:03
In [27]:
df['total_days_td'] = df['time_delta'].dt.total_seconds() / (24 * 60 * 60)
df
Out[27]:
dates time_delta total_days_td
0 2016-01-01 12:15:03 56 days 04:05:03 56.170174
1 2016-01-02 12:15:03 57 days 04:05:03 57.170174
2 2016-01-03 12:15:03 58 days 04:05:03 58.170174
3 2016-01-04 12:15:03 59 days 04:05:03 59.170174
4 2016-01-05 12:15:03 60 days 04:05:03 60.170174
5 2016-01-06 12:15:03 61 days 04:05:03 61.170174
6 2016-01-07 12:15:03 62 days 04:05:03 62.170174
7 2016-01-08 12:15:03 63 days 04:05:03 63.170174
8 2016-01-09 12:15:03 64 days 04:05:03 64.170174
9 2016-01-10 12:15:03 65 days 04:05:03 65.170174
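The approaches in this thread should agree. A quick sketch comparing them on the first duration from the table above:

```python
import numpy as np
import pandas as pd

# One example duration: 56 days 04:05:03, as in the first row above
df = pd.DataFrame({'time_delta': pd.to_timedelta(['56 days 04:05:03'])})

a = df['time_delta'] / pd.to_timedelta(1, unit='D')
b = df['time_delta'] / np.timedelta64(1, 'D')
c = df['time_delta'].dt.total_seconds() / (24 * 60 * 60)

print(a.round(6).tolist(), b.round(6).tolist(), c.round(6).tolist())
# [56.170174] [56.170174] [56.170174]
```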
I am plotting time series data from a pandas dataframe using matplotlib. When I plot the data and open the figure options window from the matplotlib figure toolbar to adjust the axis scales, the x-axis (datetime) is given as floats, sometimes with quite a few decimal places.
https://imgur.com/a/eq0PDyo
https://imgur.com/a/JzIbbpv
I want to be able to set my x-scale from this "Figure options" window. How do I figure out the float that corresponds to my desired datetime?
If more info is needed... I am reading a csv file and converting a "Time" column of strings to datetime using `df["Time"] = pd.to_datetime(df["Time"])`
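One way to get that number, as far as I know, is matplotlib.dates.date2num, which converts datetimes (including pandas Timestamps) to the floats matplotlib uses on date axes; num2date goes the other way. The exact value depends on rcParams['date.epoch'] (1970-01-01 by default since matplotlib 3.3; older versions counted days with 0001-01-01 as day 1), so it is worth printing it in your own environment. A sketch with an example date:

```python
import datetime
import matplotlib.dates as mdates

# The datetime you want as an axis limit (example value)
target = datetime.datetime(2020, 1, 1, 12, 0)

# Float in days since the configured epoch, as used on the date axis
x = mdates.date2num(target)
print(x)

# Inverse conversion; returns a timezone-aware (UTC) datetime
print(mdates.num2date(x))
```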
The digits after the decimal point are fractional seconds, which strptime parses separately from the seconds with the %f (microseconds) directive:
>>> import datetime
>>> a = "2016-03-22 12:33:45.7565"
>>> datetime.datetime.strptime(a, "%Y-%m-%d %H:%M:%S.%f")
datetime.datetime(2016, 3, 22, 12, 33, 45, 756500)
An alternative approach is to let the dateutil parser do the job:
>>> from dateutil.parser import parse
>>> a = "2016-03-22 12:33:45.7565"
>>> parse(a)
datetime.datetime(2016, 3, 22, 12, 33, 45, 756500)
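Since the rest of this thread works with pandas, pd.to_datetime should parse the same string, fractional seconds included. A short sketch:

```python
import pandas as pd

a = "2016-03-22 12:33:45.7565"

# 0.7565 s is stored as 756500 microseconds
ts = pd.to_datetime(a)
print(ts)  # 2016-03-22 12:33:45.756500
```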