This has been answered in the comments where it was noted that the following works:
df.astype({'date': 'datetime64[ns]'})
In addition, you can set the dtype when reading in the data:
pd.read_csv('path/to/file.csv', parse_dates=['date'])
datetime
Since you can't pass datetime format to astype(), it's a little primitive and it's better to use pd.to_datetime() instead.
df['date'] = pd.to_datetime(df['date'])
For example, if the dates in the data are of the format %d/%m/%Y such as 01/04/2020, astype() would incorrectly parse it as 2020-01-04 whereas with pd.to_datetime(), you can pass the correct format.
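As a minimal sketch of that point, assuming a small made-up Series of day-first strings, passing the format explicitly gives April dates rather than January ones:

```python
import pandas as pd

# Hypothetical sample data in %d/%m/%Y order (1 April and 2 April 2020).
s = pd.Series(['01/04/2020', '02/04/2020'])

# An explicit format removes any ambiguity between day and month.
parsed = pd.to_datetime(s, format='%d/%m/%Y')

print(parsed.dt.strftime('%Y-%m-%d').tolist())
# ['2020-04-01', '2020-04-02']
```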
If you need to convert multiple columns into datetime64 (which is often the reason astype() is used), then you can apply pd.to_datetime().
df = pd.DataFrame({'date1': ['01/04/2020'], 'date2': ['02/04/2020']})
df = df.apply(pd.to_datetime, format='%d/%m/%Y')
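If the frame also holds non-date columns, applying pd.to_datetime to everything is probably not what you want. A sketch that restricts the conversion to a chosen subset (the `id` column and the `date_cols` list are made up for illustration):

```python
import pandas as pd

# Hypothetical frame with one non-date column and two date columns.
df = pd.DataFrame({
    'id': [1],
    'date1': ['01/04/2020'],
    'date2': ['02/04/2020'],
})

# Convert only the listed columns; 'id' is left untouched.
date_cols = ['date1', 'date2']
df[date_cols] = df[date_cols].apply(pd.to_datetime, format='%d/%m/%Y')

print(df.dtypes)
```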
Even with read_csv, you have some control over the format, e.g.
df = pd.read_csv('file.csv', parse_dates=['date'], dayfirst=True)
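To try this without a file on disk, here is a small sketch that feeds made-up CSV text through io.StringIO; the column names and values are assumptions for illustration only:

```python
import io
import pandas as pd

# Hypothetical CSV content with day-first dates.
csv_text = "date,value\n01/04/2020,10\n15/04/2020,20\n"

# dayfirst=True tells the parser to prefer DD/MM over MM/DD.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'], dayfirst=True)

print(df['date'].dt.strftime('%Y-%m-%d').tolist())
```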
date
If you want to cast into date, then you can first cast to datetime64[ns] and then use dt.date to get a column of datetime.date objects:
df['date'] = pd.to_datetime(df['date']).dt.date
The column dtype will become object though (you can still perform vectorized operations on it, such as adding days or comparing dates), so if you plan to work on it a lot in pandas, it's more performant to use datetime64 instead. For example, adding a day is extremely fast on datetime64 columns, but much slower on date columns:
s_dt = pd.Series(pd.date_range('1700', None, 10000, 'D'))
s_d = s_dt.dt.date
%timeit x = s_dt + pd.Timedelta(days=1)
# 344 µs ± 17.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit y = s_d + pd.Timedelta(days=1)
# 56.1 ms ± 11.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
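The dtype difference behind those timings can be checked directly. A short sketch with made-up dates:

```python
import datetime
import pandas as pd

s_dt = pd.Series(pd.to_datetime(['2020-04-01', '2020-04-02']))
s_d = s_dt.dt.date

print(s_dt.dtype)          # a datetime64 dtype
print(s_d.dtype)           # object
print(type(s_d.iloc[0]))   # <class 'datetime.date'>
```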
That said, if you dump it into a database (such as sqlite), an object dtype column of datetime.date objects is stored as a DATE type (whereas datetime64[ns] is stored as TIMESTAMP).
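One way to see the declared column types is to write a tiny frame to an in-memory SQLite database and inspect the schema. This is only a sketch: the table name `demo` and the column names are made up, and it assumes a plain sqlite3 connection rather than SQLAlchemy:

```python
import sqlite3
import pandas as pd

# Hypothetical frame: one datetime64 column and one column of datetime.date objects.
df = pd.DataFrame({'ts': pd.to_datetime(['2020-04-01 12:30:00'])})
df['d'] = df['ts'].dt.date

con = sqlite3.connect(':memory:')
df.to_sql('demo', con, index=False)

# PRAGMA table_info returns (cid, name, type, ...) for each column.
declared = {row[1]: row[2] for row in con.execute('PRAGMA table_info(demo)')}
print(declared)
con.close()
```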
The pandas datetime dtype is built on NumPy's datetime64, so on pandas<2.0 the following also work (since pandas 2.0, unitless datetime64 is no longer supported, so a unit such as datetime64[ns] has to be given). There's no date dtype, although you can still perform vectorized operations on a column that holds datetime.date values.
df = df.astype({'date': np.datetime64})
# or (on a little endian system)
df = df.astype({'date': '<M8'})
# (on a big endian system)
df = df.astype({'date': '>M8'})
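On pandas 2.0 and later, a version with an explicit unit should behave the same way. A minimal sketch with made-up ISO-formatted strings:

```python
import pandas as pd

# Hypothetical frame of ISO-formatted date strings.
df = pd.DataFrame({'date': ['2020-01-02', '2020-01-03']})

# Spell out the unit; 'M8[ns]' is the NumPy shorthand for the same dtype.
converted = df.astype({'date': 'datetime64[ns]'})
same = df.astype({'date': 'M8[ns]'})

print(converted.dtypes)
```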
how to convert pandas columns of dates formatted like "2020-01-02 23:59:56.078191" to datetime object, then to ms epoch?
BUG: astype not working correctly for similar datetime format
python - Convert DataFrame column type from string to datetime - Stack Overflow
Converting string to datetime - when the year is only three digits.
I have a pandas dataframe with a column of dates formatted like "2020-01-02 23:59:56.078191".
How would I convert that column to a column of datetime objects, then to ms epoch (float)?
Attempt:
df['UT'] = pd.to_datetime(df['UT'], format = '%Y-%m-%d %H:%M:%S.%f')
which works to get the datetime, then
df['UT'] = df['UT'].timestamp('ms')
which gives the error:
AttributeError: 'Series' object has no attribute 'timestamp'
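The error arises because timestamp() is a method of an individual Timestamp, not of a Series. One possible approach, sketched here on the assumption that the values are naive timestamps meant as UTC, is to subtract the Unix epoch and divide by a one-millisecond Timedelta (the `UT_ms` column name is made up):

```python
import pandas as pd

df = pd.DataFrame({'UT': ['2020-01-02 23:59:56.078191']})
df['UT'] = pd.to_datetime(df['UT'], format='%Y-%m-%d %H:%M:%S.%f')

# Elapsed time since the epoch, expressed as a float number of milliseconds.
# Dividing by a Timedelta avoids depending on the column's internal unit.
epoch = pd.Timestamp('1970-01-01')
df['UT_ms'] = (df['UT'] - epoch) / pd.Timedelta(milliseconds=1)

print(df['UT_ms'].iloc[0])
```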
The easiest way is to use to_datetime:
df['col'] = pd.to_datetime(df['col'])
It also offers a dayfirst argument for European times (but beware this isn't strict).
Here it is in action:
In [11]: pd.to_datetime(pd.Series(['05/23/2005']))
Out[11]:
0 2005-05-23 00:00:00
dtype: datetime64[ns]
You can pass a specific format:
In [12]: pd.to_datetime(pd.Series(['05/23/2005']), format="%m/%d/%Y")
Out[12]:
0 2005-05-23
dtype: datetime64[ns]
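To illustrate the difference between the lenient dayfirst argument and an explicit format, here is a sketch using a made-up string whose second field (13) cannot be a month. With dayfirst=True pandas falls back to month-first parsing (possibly with a warning), whereas an explicit format raises:

```python
import warnings
import pandas as pd

with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas may warn about the mismatch
    lenient = pd.to_datetime('01/13/2020', dayfirst=True)

print(lenient)  # parsed as 13 January 2020 despite dayfirst=True

# An explicit day-first format is strict and rejects the same string.
try:
    pd.to_datetime('01/13/2020', format='%d/%m/%Y')
    strict_failed = False
except ValueError:
    strict_failed = True

print(strict_failed)
```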
If your date column is a string of the format '2017-01-01' you can use pandas astype to convert it to datetime.
df['date'] = df['date'].astype('datetime64[ns]')
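or use datetime64[D] if you want day precision instead of nanoseconds (pandas does not store day-resolution datetimes itself, so depending on the version the unit may be coerced to a supported one or rejected)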
print(type(df['date'].iloc[0]))
yields
<class 'pandas._libs.tslib.Timestamp'>
the same as when you use pandas.to_datetime
You can try it with formats other than '%Y-%m-%d', but at least this one works.
Hi, so I'm working on a genealogy project. I have ancestors back in the 400s onward.
I've imported them as a csv using pandas and need to convert just the years to datetime.
import pandas as pd
df = pd.read_csv('final_ancestry.csv')
df.Year1=df.Year1.astype(str)
df.Year2=df.Year2.astype(str)
pd.to_datetime(df.Year1, format='%Y')
I keep getting "ValueError: time data '406' does not match format '%Y' (match)". I know the format is YYYY, but in the csv, I actually put it as 0406, etc. My next thought is to save it as a txt file and import the file that way, but is there something I'm missing?
I also don't know if I really need to convert to string, but it wasn't working as the INT64 that it imported as, so I thought I'd try string.
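Two things appear to be going on here. read_csv reads 0406 as the integer 406, so the leading zero is lost (reading the column as a string, or zero-padding afterwards, would keep it), and the nanosecond-resolution datetime64[ns] dtype only covers roughly the years 1677 to 2262, so a year like 406 cannot be stored in it anyway. One possible workaround, sketched here with made-up years, is to keep the values as yearly Period objects rather than timestamps:

```python
import pandas as pd

# Hypothetical years as they would arrive from read_csv (int64, zero lost).
years = pd.Series([406, 1066, 1850], name='Year1')

# Zero-padding restores the four-digit form if a string is needed.
padded = years.astype(str).str.zfill(4)

# Yearly periods are not limited to the datetime64[ns] range.
periods = pd.Series([pd.Period(year=int(y), freq='Y') for y in years])

print(padded.tolist())
print(periods.dt.year.tolist())
```

Whether periods are the right fit depends on what you do next; if only arithmetic on years is needed, leaving the column as plain integers may be simpler.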