The format you are passing is invalid. The dash between the % and the I is not supposed to be there.
df['TIME'] = pd.to_datetime(df['TIME'], format="%m/%d/%Y %I:%M:%S %p")
This will convert your TIME column to a datetime.
Alternatively, you can adjust your read_csv call to do this:
df = pd.read_csv('testresult.csv', parse_dates=['TIME'],
                 date_parser=lambda x: pd.to_datetime(x, format='%m/%d/%Y %I:%M:%S %p'))
Again, this uses the correct format without the extra -, and it passes the format explicitly through the date_parser parameter instead of having pandas guess it via infer_datetime_format. Note that in pandas 2.0 and later both date_parser and infer_datetime_format are deprecated; read_csv accepts the format string directly through its date_format parameter.
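As a minimal sketch of the to_datetime fix, assuming TIME values shaped like the made-up strings below (12-hour clock with AM/PM; the sample rows are illustrative, not taken from the asker's file):

```python
import pandas as pd

# Hypothetical sample rows; the real data lives in testresult.csv.
df = pd.DataFrame({
    "TIME": ["3/24/2016 12:27:11 AM", "3/24/2016 1:05:00 PM"],
    "RESULT": [2, 76],
})

# %I is the 12-hour clock hour and %p the AM/PM marker; no dash after the %.
df["TIME"] = pd.to_datetime(df["TIME"], format="%m/%d/%Y %I:%M:%S %p")

print(df.dtypes)
print(df)
```

After the conversion the column holds real datetime values, so 12:27:11 AM is stored as 00:27:11 and 1:05:00 PM as 13:05:00.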
You can try this:
In [69]: df = pd.read_csv(fn, parse_dates=[0],
                          date_parser=lambda x: pd.to_datetime(x, format='%m/%d/%Y %I:%M:%S %p'))
In [70]: df
Out[70]:
                 TIME  RESULT
0 2016-03-24 00:27:11       2
1 2016-03-24 00:28:41      76
2 2016-03-24 00:37:23      19
3 2016-03-24 00:38:44      68
4 2016-03-24 00:42:02      44
You can use dt.strftime if you need to render the datetimes in another format, but note that the resulting column holds strings (dtype object), not datetimes:
import pandas as pd
df = pd.DataFrame({'DOB': {0: '26/1/2016', 1: '26/1/2016'}})
print(df)
         DOB
0  26/1/2016
1  26/1/2016
df['DOB'] = pd.to_datetime(df['DOB'], dayfirst=True)  # the strings are day-first (dd/mm/yyyy)
print(df)
         DOB
0 2016-01-26
1 2016-01-26
df['DOB1'] = df['DOB'].dt.strftime('%m/%d/%Y')
print(df)
         DOB        DOB1
0 2016-01-26  01/26/2016
1 2016-01-26  01/26/2016
There is a difference between
- the content of a dataframe cell (a binary value) and
- its presentation (displaying it) for us, humans.
So the question is: How to reach the appropriate presentation of my data without changing the data / data types themselves?
Here is the answer:
- If you use the Jupyter notebook for displaying your dataframe, or
- if you want a presentation in the form of an HTML file (which comes with many prepared id and class attributes for further CSS styling; you may or may not use them),
use styling. Styling doesn't change the data or the data types of the columns of your dataframe.
First I show how to do it in the Jupyter notebook; for a presentation in the form of an HTML file, see the note near the end of this answer.
I will suppose that your column DOB already has the datetime64 type (you have shown that you know how to get it). I prepared a simple dataframe (with only one column) to show you some basic styling:
Not styled:
df
         DOB
0 2019-07-03
1 2019-08-03
2 2019-09-03
3 2019-10-03
Styling it as mm/dd/yyyy:
df.style.format({"DOB": lambda t: t.strftime("%m/%d/%Y")})
          DOB
0  07/03/2019
1  08/03/2019
2  09/03/2019
3  10/03/2019
Styling it as dd-mm-yyyy:
df.style.format({"DOB": lambda t: t.strftime("%d-%m-%Y")})
          DOB
0  03-07-2019
1  03-08-2019
2  03-09-2019
3  03-10-2019
Be careful!
The returned object is NOT a dataframe; it is an object of the class Styler, so don't assign it back to df:
Don't do this:
df = df.style.format({"DOB": lambda t: t.strftime("%m/%d/%Y")}) # Don't do this!
(Every dataframe has its Styler object accessible by its .style property, and we changed this df.style object, not the dataframe itself.)
Questions and Answers:
Q: Why does a Styler object (or an expression returning it), used as the last command in a Jupyter notebook cell, display your (styled) table and not the Styler object itself?
A: Because every Styler object has a method ._repr_html_() which returns HTML code for rendering your dataframe (as a nice HTML table). Jupyter Notebook calls this method automatically to render objects which have it.
Note:
You don't need the Jupyter notebook for styling (i.e., for nicely outputting a dataframe without changing its data / data types).
A Styler object also has a method for obtaining a string with the HTML code (e.g., for publishing your formatted dataframe on the Web, or simply presenting your table in HTML format). In recent pandas versions this method is to_html(); the older render() was deprecated and then removed in pandas 2.0:
df_styler = df.style.format({"DOB": lambda t: t.strftime("%m/%d/%Y")})
HTML_string = df_styler.to_html()  # on old pandas versions: df_styler.render()
Howdy!
I'm having some difficulty working with datetimes within a pandas dataframe, specifically a dataframe that's being imported from csv, excel and/or sql, which already has dates written in it.
I think the issue is due to pandas somehow having an American default format of mm/dd/yyyy while I mostly use dd/mm/yyyy or yyyy-mm-dd (with time when needed), and to working with both the datetime class and the string class.
I've had cases where the excel file had a dd/mm/yyyy date format (can't be sure if string or number) but pandas (vscode with jupyter) insisted on showing it as yyyy-mm-dd, as a datetime column, even though each value was apparently a string. It was weird because a similar column that should have been formatted the same way had its data shown in some other format. I remember that I had applied a formula to transform it to datetime, but it wouldn't reset back to the original format even after I restarted the kernel, as if the formula I applied was a permanent change or something.
So I have some questions:
- Can I have a datetime variable (that works with any datetime comparison/formula) that is expressed/formatted in any format I want (like dd/mm/yyyy or yyyy-mm-dd)? This is important if I have to apply a filter to a dataframe, to decide what format I should use and whether I can use a string or a datetime (df[df['date']=="21/12/2022"] or something like df[df['date']==datetime(2022,12,21)])
- I always have to export the data to a file (csv or xlsx) as an intermediate step. Should I export as a datetime or should I convert the datetime to a string? I'd rather have the data exported in a way that is visually and systematically understood as a date (so a dd/mm/yyyy format, but one that excel/sql/other knows is a date)
I'm really lost and I spent the whole day yesterday juggling variables and date formats just to compare the values between two different columns. I'm almost requesting an ELI5 because I'm that lost. For example, I don't quite understand the difference between datetime.strftime and datetime.strptime, or when to use each.
Should I have some standard steps when working with dates within a dataframe, like always converting from string/datetime (the default from pd.read) to datetime and then always converting to a specific format when exporting the dataframe (pd.to_)? What is the norm?
Can anyone give me some pointers to understand these things?
Cheers!
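One common workflow for the questions above can be sketched as follows: parse strings into datetimes once on input, compare and filter on the datetime values, and apply a display format only on output. The column name and sample values below are made up for illustration.

```python
import pandas as pd

# Hypothetical data as it might arrive from a file: dd/mm/yyyy strings.
df = pd.DataFrame({"date": ["21/12/2022", "03/01/2023"]})

# Parsing (the strptime direction): string -> datetime, with an explicit format.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

# Filter against a real datetime value rather than a formatted string.
matches = df[df["date"] == pd.Timestamp(2022, 12, 21)]

# Formatting (the strftime direction): datetime -> string, only for presentation.
as_text = df["date"].dt.strftime("%d/%m/%Y")

# On export, the column stays datetime in memory; date_format controls the text written.
csv_text = df.to_csv(index=False, date_format="%d/%m/%Y")
print(csv_text)
```

The datetime value itself has no display format; yyyy-mm-dd is only how pandas prints it. strptime-style parsing reads text into a datetime, and strftime-style formatting writes a datetime out as text.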
I work for a European company which uses US format dates. There are some dates that go into the database as a future date and shouldn't. These are obviously dd/mm/yyyy dates being read as mm/dd/yyyy.
I've set up a small script to see which files are using dd/mm/yyyy dates, it's not working as I expected. As a test:
print(pd.to_datetime('29/1/21'))
print(pd.to_datetime('1/29/21'))
produces:
2021-01-29 00:00:00
2021-01-29 00:00:00
________
print(pd.to_datetime('29/1/2021', dayfirst=True))
print(pd.to_datetime('1/29/2021', dayfirst=True))
also produces the same result, which is probably down to this warning in https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html
Warning: dayfirst=True is not strict, but will prefer to parse with day first (this is a known bug, based on dateutil behavior).
What is the best way of setting the region to US (note I am in Europe), strictly using mm/dd/yyyy and forcing an error if there is a European date like 13/1/2021?
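One possible approach, sketched here under the assumption that every valid value is mm/dd/yyyy with a four-digit year, is to skip locale settings and dayfirst entirely and pass an explicit format, which makes parsing strict. The helper name parse_us_date and the constant US_FORMAT are hypothetical, introduced only for this example.

```python
import pandas as pd

US_FORMAT = "%m/%d/%Y"  # hypothetical constant: month/day/four-digit year


def parse_us_date(text):
    """Parse text strictly as mm/dd/yyyy; raise ValueError if it does not match."""
    return pd.to_datetime(text, format=US_FORMAT, errors="raise")


print(parse_us_date("1/29/2021"))

try:
    parse_us_date("13/1/2021")  # 13 cannot be a month, so this is rejected
except ValueError as exc:
    print("rejected:", exc)

# For a whole column, errors="coerce" turns non-matching values into NaT,
# which makes it easy to list the offending rows instead of stopping at the first.
dates = pd.Series(["1/29/2021", "13/1/2021"])
parsed = pd.to_datetime(dates, format=US_FORMAT, errors="coerce")
bad_rows = dates[parsed.isna()]
print(bad_rows)
```

This only catches values that cannot be valid mm/dd/yyyy dates. A string such as 1/2/2021 is valid in both conventions, so no parser can tell which one the writer intended.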