If you let pandas parse the dates on its own, some days and months may end up swapped:
df['accepted_date'] = pd.to_datetime(df['accepted_date'])
So it is better to use to_datetime with an explicit format and the parameter errors='coerce', which returns datetimes only for the values that match and NaT for those that do not. Finally, use combine_first to join all the Series: NaT values are replaced by the values from the other Series:
df = pd.DataFrame({'accepted_date':['2017-01-02','07-08-2017','20-03-2017','2017-01-04']})
d1 = pd.to_datetime(df['accepted_date'], format='%d-%m-%Y', errors='coerce')
d2 = pd.to_datetime(df['accepted_date'], format='%Y-%m-%d', errors='coerce')
df['accepted_date1'] = d1.combine_first(d2)
df['accepted_date2'] = pd.to_datetime(df['accepted_date'])
print (df)
accepted_date accepted_date1 accepted_date2
0 2017-01-02 2017-01-02 2017-01-02
1 07-08-2017 2017-08-07 2017-07-08 <-swapped dd-mm
2 20-03-2017 2017-03-20 2017-03-20
3 2017-01-04 2017-01-04 2017-01-04
Detail:
print (d1)
0 NaT
1 2017-08-07
2 2017-03-20
3 NaT
Name: accepted_date, dtype: datetime64[ns]
print (d2)
0 2017-01-02
1 NaT
2 NaT
3 2017-01-04
Name: accepted_date, dtype: datetime64[ns]
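The same combine_first idea extends to any number of candidate formats. Below is a minimal sketch, assuming the formats are tried in priority order; parse_multi_format is a hypothetical helper name, not a pandas function, and the "garbage" row is made-up input that shows what happens when no format matches.

```python
from functools import reduce

import pandas as pd


def parse_multi_format(values: pd.Series, formats: list[str]) -> pd.Series:
    """Hypothetical helper: try each strptime format in order, keep the first match.

    Each pass uses errors='coerce', so strings that do not match a format become NaT;
    combine_first then fills those gaps from the next format's result.
    """
    parsed = [pd.to_datetime(values, format=fmt, errors="coerce") for fmt in formats]
    return reduce(lambda left, right: left.combine_first(right), parsed)


s = pd.Series(["2017-01-02", "07-08-2017", "20-03-2017", "2017-01-04", "garbage"])
result = parse_multi_format(s, ["%d-%m-%Y", "%Y-%m-%d"])
print(result)  # the last row stays NaT because neither format matches it
```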
EDIT:
Another solution is to use the parameter dayfirst=True:
df['accepted_date3'] = pd.to_datetime(df['accepted_date'], dayfirst=True)
print (df)
accepted_date accepted_date3
0 2017-01-02 2017-01-02
1 07-08-2017 2017-08-07
2 20-03-2017 2017-03-20
3 2017-01-04 2017-01-04
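One caveat, as far as I know: since pandas 2.0, to_datetime infers a single format from the first non-null value, so calling it without format= on a column that mixes layouts like this one may raise a ValueError instead of producing the output above. pandas 2.0 added format='mixed' to infer the format element by element. A minimal sketch, assuming pandas 2.x (the version check only keeps it runnable on 1.x):

```python
import pandas as pd

df = pd.DataFrame({"accepted_date": ["2017-01-02", "07-08-2017", "20-03-2017", "2017-01-04"]})

# format="mixed" only exists in pandas >= 2.0; older versions parse each element separately anyway.
PANDAS_2 = int(pd.__version__.split(".")[0]) >= 2
extra = {"format": "mixed"} if PANDAS_2 else {}

# dayfirst=True resolves ambiguous strings such as "07-08-2017" as dd-mm-yyyy.
df["accepted_date3"] = pd.to_datetime(df["accepted_date"], dayfirst=True, **extra)
print(df)
```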
Answer from jezrael on Stack Overflow
You can use the parse_dates and dayfirst arguments of pd.read_csv (see the docs for read_csv()):
df = pd.read_csv('myfile.csv', parse_dates=['Date'], dayfirst=True)
This will read the Date column as datetime values, correctly taking the first part of the date input as the day. Note that in general you will want your dates to be stored as datetime objects.
Then, if you need to output the dates as strings, you can call dt.strftime():
df['Date'].dt.strftime('%d/%m/%Y')
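A self-contained sketch of the read_csv approach, assuming a small in-memory CSV (made-up data) in place of myfile.csv:

```python
import io

import pandas as pd

# Stand-in for 'myfile.csv' (hypothetical contents): the day comes first in each date.
csv_text = "Date,Value\n01/02/2017,1\n20/03/2017,2\n"

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], dayfirst=True)
print(df.dtypes)  # Date is a datetime64 column, so .dt accessors and date arithmetic work

# Format only when producing output; the column itself stays datetime.
print(df["Date"].dt.strftime("%d/%m/%Y").tolist())  # ['01/02/2017', '20/03/2017']
```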
When I use this again:
df['Date'] = pd.to_datetime(df['Date'])
it goes back to the previous format.
No, you cannot simultaneously have the string format of your choice and keep your series of type datetime. As remarked here:
datetime series are stored internally as integers. Any human-readable date representation is just that, a representation, not the underlying integer. To access your custom formatting, you can use methods available in Pandas. You can even store such a text representation in a pd.Series variable:
formatted_dates = df['datetime'].dt.strftime('%m/%d/%Y')
The dtype of formatted_dates will be object, which indicates that the elements of your series point to arbitrary Python types. In this case, those arbitrary types happen to be all strings.
Lastly, I strongly recommend you do not convert a datetime series to strings until the very last step in your workflow. This is because as soon as you do so, you will no longer be able to use efficient, vectorised operations on such a series.
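To make the distinction concrete, here is a minimal sketch (made-up dates) that keeps the datetime column for computation and creates a separate string copy only for display:

```python
import pandas as pd

df = pd.DataFrame({"datetime": pd.to_datetime(["1997-01-31", "1997-03-31"])})

# Human-readable copy for display or export: its values are plain strings.
formatted_dates = df["datetime"].dt.strftime("%m/%d/%Y")

print(df["datetime"].dtype)   # still a datetime64 dtype: the original column is untouched
print(formatted_dates.dtype)  # object (or a string dtype in newer pandas versions)

# Date logic still works on the datetime column...
print((df["datetime"] + pd.Timedelta(days=1)).dt.day.tolist())  # [1, 1]
# ...but the formatted copy no longer has a .dt accessor.
print(hasattr(formatted_dates, "dt"))  # False
```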
One way is to convert the column to datetime and then use strftime. Note that you lose the datetime functionality of the column, because the result is a string.
df = pd.DataFrame({'date':['1997-01-31' ]})
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.strftime('%Y%m')
date
0 199701
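If the aim is a monthly key rather than text, two alternatives keep more usable types. A sketch, assuming month granularity is what YYYYMM is meant to express: to_period('M') keeps a date-like Period dtype, and year * 100 + month gives an integer without a string round trip.

```python
import pandas as pd

df = pd.DataFrame({"date": ["1997-01-31", "1997-03-15", "1997-12-18"]})
df["date"] = pd.to_datetime(df["date"])

# Option 1: keep a date-like type by converting to a monthly Period.
df["month"] = df["date"].dt.to_period("M")

# Option 2: an integer YYYYMM key built arithmetically (no string round trip).
df["yyyymm"] = df["date"].dt.year * 100 + df["date"].dt.month

# Strings only if you really need them; Period columns also support strftime.
df["yyyymm_str"] = df["month"].dt.strftime("%Y%m")
print(df)
```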
You might not need to go through the datetime conversion if the data are sufficiently clean (no incorrect strings like 'foo' or '001231'):
df = pd.DataFrame({'date':['1997-01-31', '1997-03-31', '1997-12-18']})
df['date'] = [''.join(x.split('-')[0:2]) for x in df.date]
# date
#0 199701
#1 199703
#2 199712
Or if you have null values:
df['date'] = df.date.str.replace('-', '').str[0:6]
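A minimal sketch of the null-safe string version with made-up data; regex=False is passed so '-' is treated literally regardless of the pandas default:

```python
import pandas as pd

df = pd.DataFrame({"date": ["1997-01-31", None, "1997-12-18"]})

# .str methods skip missing values, so None stays missing instead of raising
# (the list-comprehension version above would fail on None.split).
df["date"] = df["date"].str.replace("-", "", regex=False).str[0:6]
print(df)
```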
If your column is a string, you will first need to use pd.to_datetime:
df['Date'] = pd.to_datetime(df['Date'])
Then, use .dt datetime accessor with strftime:
df = pd.DataFrame({'Date':pd.date_range('2017-01-01', periods = 60, freq='D')})
df.Date.dt.strftime('%Y%m%d').astype(int)
Or use lambda function:
df.Date.apply(lambda x: x.strftime('%Y%m%d')).astype(int)
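A string-free variant, sketched under the assumption that an integer YYYYMMDD key is the goal: compose the number from .dt.year, .dt.month and .dt.day. The explicit int64 cast is a deliberate choice; astype(int) uses NumPy's default integer, which was 32-bit on Windows before NumPy 2.0 and likely explains the int32 shown in the output.

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.date_range("2017-01-01", periods=60, freq="D")})

# Build the YYYYMMDD integer from the date components directly,
# skipping the strftime -> string -> int round trip.
yyyymmdd = df["Date"].dt.year * 10_000 + df["Date"].dt.month * 100 + df["Date"].dt.day

# Cast explicitly so the dtype does not depend on the platform's default integer.
yyyymmdd = yyyymmdd.astype("int64")
print(yyyymmdd.head())
```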
Output:
0 20170101
1 20170102
2 20170103
3 20170104
4 20170105
5 20170106
6 20170107
7 20170108
8 20170109
9 20170110
10 20170111
11 20170112
12 20170113
13 20170114
14 20170115
15 20170116
16 20170117
17 20170118
18 20170119
19 20170120
20 20170121
21 20170122
22 20170123
23 20170124
24 20170125
25 20170126
26 20170127
27 20170128
28 20170129
29 20170130
30 20170131
31 20170201
32 20170202
33 20170203
34 20170204
35 20170205
36 20170206
37 20170207
38 20170208
39 20170209
40 20170210
41 20170211
42 20170212
43 20170213
44 20170214
45 20170215
46 20170216
47 20170217
48 20170218
49 20170219
50 20170220
51 20170221
52 20170222
53 20170223
54 20170224
55 20170225
56 20170226
57 20170227
58 20170228
59 20170301
Name: Date, dtype: int32
The error in the OP occurred because datetime.datetime.strftime was called in apply() without a datetime/date argument. The format should instead be passed to apply() as a separate keyword argument (format=...), which apply() forwards to strftime().
from datetime import datetime
x = dates.apply(datetime.strftime, format='%Y%m%d').astype(int)
If the dates were strings (instead of datetime/date objects), then str.replace() should do the job.
x = dates.str.replace('-', '').astype(int)
# using apply
x = dates.apply(lambda x: x.replace('-', '')).astype(int)
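A self-contained version of both snippets, with a made-up three-day Series standing in for the OP's dates:

```python
from datetime import datetime

import pandas as pd

dates = pd.Series(pd.date_range("2020-01-01", periods=3, freq="D"))

# apply() forwards the keyword argument, so each element d is called as
# datetime.strftime(d, format='%Y%m%d'); pandas Timestamps are datetime subclasses.
x = dates.apply(datetime.strftime, format="%Y%m%d").astype(int)
print(x.tolist())  # [20200101, 20200102, 20200103]

# Same idea when the dates are already 'YYYY-MM-DD' strings.
str_dates = dates.dt.strftime("%Y-%m-%d")
y = str_dates.str.replace("-", "", regex=False).astype(int)
print(y.tolist())  # [20200101, 20200102, 20200103]
```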
A mildly interesting(?) thing to note is that both pandas' .dt.strftime and .str.replace are not optimized, so calling Python's strftime and str.replace via apply() is actually faster than their pandas counterparts in the timings below (in the case of strftime, it is much faster).
dates = pd.Series(pd.date_range('2020','2200', freq='d'))
%timeit dates.dt.strftime('%Y%m%d')
# 719 ms ± 41.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit dates.apply(datetime.strftime, format='%Y%m%d')
# 472 ms ± 34.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
dates = dates.astype(str)
%timeit dates.str.replace('-', '')
# 30.9 ms ± 2.46 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit dates.apply(lambda x: x.replace('-', ''))
# 26 ms ± 183 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
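Outside IPython, %timeit is not available; a rough equivalent with the standard timeit module might look like the sketch below. It uses a shorter date range than above so it runs quickly, and the timings depend on your pandas version and machine, so any ranking is something to re-measure rather than assume.

```python
import timeit
from datetime import datetime

import pandas as pd

# A smaller range than the benchmark above so the script finishes quickly.
dates = pd.Series(pd.date_range("2020", "2040", freq="D"))

candidates = {
    "dt.strftime": lambda: dates.dt.strftime("%Y%m%d"),
    "apply(datetime.strftime)": lambda: dates.apply(datetime.strftime, format="%Y%m%d"),
}

# Both approaches must produce the same strings before their speed is worth comparing.
assert candidates["dt.strftime"]().tolist() == candidates["apply(datetime.strftime)"]().tolist()

for name, fn in candidates.items():
    # Best of 3 repeats, 3 calls each, reported per call.
    seconds = min(timeit.repeat(fn, number=3, repeat=3)) / 3
    print(f"{name}: {seconds * 1000:.1f} ms per call")
```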
You can use the datetime methods strptime and strftime:
from datetime import datetime
a = '20160228'
date = datetime.strptime(a, '%Y%m%d').strftime('%m/%d/%Y')
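A sketch of the same conversion for a single value and, assuming the values live in a pandas column of YYYYMMDD integers (the sample data is made up), for a whole Series at once:

```python
from datetime import datetime

import pandas as pd

a = "20160228"
# strptime parses the YYYYMMDD string; strftime renders it as MM/DD/YYYY.
single = datetime.strptime(a, "%Y%m%d").strftime("%m/%d/%Y")
print(single)  # 02/28/2016

# Vectorised pandas equivalent for a whole column.
s = pd.Series([20160228, 20161231, 20151124])
formatted = pd.to_datetime(s.astype(str), format="%Y%m%d").dt.strftime("%m/%d/%Y")
print(formatted.tolist())  # ['02/28/2016', '12/31/2016', '11/24/2015']
```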
Build a new column with applymap:
import pandas as pd
dates = [
20160228,
20161231,
20160618,
20170123,
20151124,
]
df = pd.DataFrame(data=list(enumerate(dates, start=1)), columns=['id','int_date'])
df[['str_date']] = df[['int_date']].applymap(str).applymap(lambda s: "{}/{}/{}".format(s[4:6],s[6:], s[0:4]))
print(df)
Emits:
$ python test.py
id int_date str_date
0 1 20160228 02/28/2016
1 2 20161231 12/31/2016
2 3 20160618 06/18/2016
3 4 20170123 01/23/2017
4 5 20151124 11/24/2015
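Note that DataFrame.applymap has been deprecated since pandas 2.1 in favor of DataFrame.map. A sketch of a vectorised alternative for a single column, using the same sample data, slices the string form of each integer instead:

```python
import pandas as pd

dates = [20160228, 20161231, 20160618, 20170123, 20151124]
df = pd.DataFrame(data=list(enumerate(dates, start=1)), columns=["id", "int_date"])

# Slice YYYY, MM and DD out of the string form and reassemble them as MM/DD/YYYY.
s = df["int_date"].astype(str)
df["str_date"] = s.str[4:6] + "/" + s.str[6:] + "/" + s.str[0:4]
print(df)
```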
to_datetime accepts a format string:
In [92]:
t = 20070530
pd.to_datetime(str(t), format='%Y%m%d')
Out[92]:
Timestamp('2007-05-30 00:00:00')
Example:
In [94]:
t = 20070530
df = pd.DataFrame({'date':[t]*10})
df
Out[94]:
date
0 20070530
1 20070530
2 20070530
3 20070530
4 20070530
5 20070530
6 20070530
7 20070530
8 20070530
9 20070530
In [98]:
df['DateTime'] = df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))
df
Out[98]:
date DateTime
0 20070530 2007-05-30
1 20070530 2007-05-30
2 20070530 2007-05-30
3 20070530 2007-05-30
4 20070530 2007-05-30
5 20070530 2007-05-30
6 20070530 2007-05-30
7 20070530 2007-05-30
8 20070530 2007-05-30
9 20070530 2007-05-30
In [99]:
df.dtypes
Out[99]:
date int64
DateTime datetime64[ns]
dtype: object
EDIT
Actually it's quicker to convert the type to string and then convert the entire series to a datetime rather than calling apply on every value:
In [102]:
df['DateTime'] = pd.to_datetime(df['date'].astype(str), format='%Y%m%d')
df
Out[102]:
date DateTime
0 20070530 2007-05-30
1 20070530 2007-05-30
2 20070530 2007-05-30
3 20070530 2007-05-30
4 20070530 2007-05-30
5 20070530 2007-05-30
6 20070530 2007-05-30
7 20070530 2007-05-30
8 20070530 2007-05-30
9 20070530 2007-05-30
Timings:
In [104]:
%timeit df['date'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))
100 loops, best of 3: 2.55 ms per loop
In [105]:
%timeit pd.to_datetime(df['date'].astype(str), format='%Y%m%d')
1000 loops, best of 3: 396 µs per loop
You don't need to cast to strings: pd.to_datetime() can parse int, float, str, datetime, list, tuple, 1-d array, Series, and DataFrame/dict-like input, so calling it directly with the specific format= should work.
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')
One useful parameter is errors=. By setting it to 'coerce', you can get NaT values for "broken" dates instead of having an error raised.
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d', errors='coerce')
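A minimal sketch of errors='coerce', with made-up data in which 20071399 has an invalid month, passing the integers directly as described above:

```python
import pandas as pd

# Hypothetical data: the second value has month 13, so it is not a valid date.
df = pd.DataFrame({"date": [20070530, 20071399, 20080101]})

# errors='coerce' turns unparseable values into NaT instead of raising.
df["parsed"] = pd.to_datetime(df["date"], format="%Y%m%d", errors="coerce")
print(df)

# Rows that failed to parse can then be inspected or dropped.
print(df[df["parsed"].isna()])
```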
Hello all,
I have an Excel file with about 100 columns, 30 or so of which are dates. I would like to convert all the date formats
from:
YYYY-MM-DD
to
M/D/YYYY
I was able to change it to MM/DD/YYYY using the following code:
from pandas import isnull

def fmt(input_dt):
    if isnull(input_dt):
        return ""
    else:
        return input_dt.strftime("%m/%d/%Y")

for col in df.columns:
    if df[col].dtype == 'datetime64[ns]':
        df[col] = df[col].apply(fmt)
but that gives me
MM/DD/YYYY
I also need it to be datetime when exported back to Excel.
I looked into the documentation:
https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
but it does not have a code for M/D/YYYY. Any suggestions would be helpful. Thank you! Also, if there is a more Pythonic way to write this, please let me know.
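Two possible directions, sketched under stated assumptions. Python's format codes have no portable non-padded month or day: '%-m' and '%-d' work with the glibc and macOS strftime but not on Windows, so the first helper builds M/D/YYYY text from the date components. Since the dates also need to stay datetimes in Excel, the second option leaves the columns as datetime and asks pandas.ExcelWriter to apply an Excel number format through its date_format and datetime_format parameters; this needs an engine such as openpyxl or xlsxwriter installed. to_mdy_strings and write_excel_keeping_dates are hypothetical helper names.

```python
import pandas as pd


def to_mdy_strings(col: pd.Series) -> pd.Series:
    """M/D/YYYY text without zero padding; NaT becomes an empty string.

    Built from the date components because '%-m/%-d/%Y' is not portable to Windows.
    """
    return col.map(lambda d: f"{d.month}/{d.day}/{d.year}" if pd.notna(d) else "")


def write_excel_keeping_dates(df: pd.DataFrame, path: str) -> None:
    """Keep datetime columns as real dates and let Excel display them as M/D/YYYY.

    Requires an Excel engine such as openpyxl or xlsxwriter to be installed.
    """
    with pd.ExcelWriter(path, date_format="m/d/yyyy", datetime_format="m/d/yyyy") as writer:
        df.to_excel(writer, index=False)


df = pd.DataFrame({"start": pd.to_datetime(["2016-02-28", None, "2016-12-01"])})
print(to_mdy_strings(df["start"]).tolist())  # ['2/28/2016', '', '12/1/2016']
```

The Excel route is probably the better fit for the stated requirement, since the cells stay real dates and only their display format changes.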