Since pandas 0.15.0 this can be done easily by using the .dt accessor to get just the date component:
df['just_date'] = df['dates'].dt.date
The above returns datetime.date objects, so the column has object dtype. If you want to keep the dtype as datetime64, you can normalize instead:
df['normalised_date'] = df['dates'].dt.normalize()
This sets the time component to midnight, i.e. 00:00:00, but the display shows just the date value.
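To make the dtype difference concrete, here is a minimal sketch. The sample values are made up for illustration; only the column name 'dates' comes from the snippets above.

```python
import pandas as pd

# Hypothetical sample data; the column name 'dates' mirrors the snippets above.
df = pd.DataFrame(
    {"dates": pd.to_datetime(["2022-12-21 14:30:00", "2022-12-22 08:15:00"])}
)

# .dt.date yields Python datetime.date objects, so the column becomes object dtype.
df["just_date"] = df["dates"].dt.date

# .dt.normalize() keeps the datetime64 dtype and resets the time to midnight.
df["normalised_date"] = df["dates"].dt.normalize()

print(df.dtypes)
```

The normalized column still supports the rest of the .dt accessor and datetime comparisons, while the object column holds plain Python dates.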
pandas.Series.dt
Simple Solution:
df['date_only'] = df['date_time_column'].dt.date
Howdy!
I'm having some difficulty working with datetimes in a pandas dataframe, specifically a dataframe that's being imported from csv, excel and/or sql, which already has dates written in it.
I think the issue is that pandas somehow defaults to the American format of mm/dd/yyyy, while I mostly use dd/mm/yyyy or yyyy-mm-dd (with time when needed), and that I'm mixing the datetime class and the string class.
I've had cases where the excel file had a dd/mm/yyyy date format (can't be sure if string or number) but pandas (vscode with jupyter) insisted on showing it as yyyy-mm-dd, as a datetime column, even though each value was apparently a string. It was weird because a similar column that should have been formatted the same way had its data shown in some other format. I remember that I had applied a formula to transform it to datetime, but it wouldn't reset back to the original format even after I restarted the kernel, as if the formula I applied was a permanent change or something.
So I have some questions:
Can I have a datetime variable (that works with any datetime comparison/formulas) that is expressed/formatted in any format possible (like dd/mm/yyyy or yyyy-mm-dd)? This is important if I have to apply a filter to a dataframe, to decide what format I should use and whether I can use a string or a datetime (df[df['date']=='21/12/2022'] or something like df[df['date']==datetime(2022,12,21)]).
I always have to export the data to a file (csv or xlsx) as an intermediate step. Should I export as a datetime or should I convert the datetime to a string? I'd rather have the data exported in a way that is visually and systematically understood as a date (so a dd/mm/yyyy format, but one that excel/sql/other knows is a date).
I'm really lost, and I spent the whole day yesterday juggling variables and date formats just to compare the values between two different columns. I'm almost requesting an ELI5 because I'm that lost. For example, I don't quite understand the difference between datetime.strftime and datetime.strptime, or when to use each.
Should I have some standard steps when working with dates within a dataframe, like always converting from string/datetime (the default from pd.read) to datetime and then always converting to a specific format when exporting the dataframe (pd.to_)? What is the norm?
Can anyone give me some pointers to understand these things?
Cheers!
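One possible workflow for the questions above, as a rough sketch rather than the norm: parse text dates into datetime64 once with an explicit format, compare against datetime values, and apply a display format only when exporting. The CSV content and the column names date and value are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV content with day-first dates stored as text.
raw = io.StringIO("date,value\n21/12/2022,10\n05/01/2023,20\n")
df = pd.read_csv(raw)

# Parse once with an explicit format so 05/01/2023 is read as 5 January, not 1 May.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

# Compare against a Timestamp; the display format plays no role in the comparison.
matches = df[df["date"] == pd.Timestamp(2022, 12, 21)]

# Format only at export time; date_format controls how datetimes are written.
out = io.StringIO()
df.to_csv(out, index=False, date_format="%d/%m/%Y")
csv_text = out.getvalue()
print(csv_text)
```

A datetime64 column has no format of its own; yyyy-mm-dd is only how pandas displays it. On the strftime/strptime question: strptime parses a string into a datetime, and strftime formats a datetime back into a string.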