In [16]: df = DataFrame(np.arange(10).reshape(5,2),columns=list('AB'))
In [17]: df
Out[17]:
A B
0 0 1
1 2 3
2 4 5
3 6 7
4 8 9
In [18]: df.dtypes
Out[18]:
A int64
B int64
dtype: object
Convert a series
In [19]: df['A'].apply(str)
Out[19]:
0 0
1 2
2 4
3 6
4 8
Name: A, dtype: object
In [20]: df['A'].apply(str)[0]
Out[20]: '0'
Don't forget to assign the result back:
df['A'] = df['A'].apply(str)
Convert the whole frame (applymap is deprecated since pandas 2.1 in favor of DataFrame.map, which takes the same function)
In [21]: df.applymap(str)
Out[21]:
A B
0 0 1
1 2 3
2 4 5
3 6 7
4 8 9
In [22]: df.applymap(str).iloc[0,0]
Out[22]: '0'
df = df.applymap(str)
Answer from Jeff on Stack Overflow
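As a cross-check of the two conversions above, here is a minimal sketch that compares apply(str) with astype(str) on one column and then converts the whole frame element-wise. The hasattr fallback is my own assumption, added to cover pandas versions where DataFrame.map has superseded applymap; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(5, 2), columns=list("AB"))

# Two ways to turn one column into strings
col_via_apply = df["A"].apply(str)
col_via_astype = df["A"].astype(str)

# Element-wise conversion of the whole frame: prefer DataFrame.map when
# this pandas version provides it, otherwise fall back to applymap
elementwise = df.map if hasattr(df, "map") else df.applymap
all_strings = elementwise(str)

print(col_via_apply.tolist())
print(type(all_strings.iloc[0, 0]))
```

Both column-level calls should yield the same string values here; astype(str) is usually the shorter spelling.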
Change data type of DataFrame column:
To int:
df.column_name = df.column_name.astype(np.int64)
To str:
df.column_name = df.column_name.astype(str)
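A column of dtype object in a pandas DataFrame usually holds strings; object is how pandas has traditionally stored Python strings.
To keep the values as strings when the file is opened in Excel, write it with df.to_excel() instead of df.to_csv(). A CSV file carries no type information, so Excel converts anything that looks like a number back into a number when it opens the file.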
df1 = pd.DataFrame({'GL': [2311000200.0, 2312000600.0, 2330800100.0]})
df1.GL = df1.GL.astype('int64').astype('string')
df1.to_excel('test.xlsx', index=False)
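The same loss of type information shows up when pandas itself reads the CSV back. The sketch below is an illustration under my own assumptions (an in-memory buffer instead of a real file, and the read_csv dtype argument, neither of which appears in the answer above); it shows that the round trip re-infers integers unless you ask for strings.

```python
import io

import pandas as pd

df1 = pd.DataFrame({"GL": [2311000200.0, 2312000600.0, 2330800100.0]})
# Drop the trailing ".0" by going through int64 first, then convert to str
df1["GL"] = df1["GL"].astype("int64").astype(str)

# Write the CSV into an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df1.to_csv(buffer, index=False)

# Default read: pandas infers a numeric dtype from the text
buffer.seek(0)
inferred = pd.read_csv(buffer)

# Explicit read: every value is kept as a string
buffer.seek(0)
as_text = pd.read_csv(buffer, dtype=str)

print(inferred["GL"].dtype)
print(as_text["GL"].iloc[0])
```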

You can force it to use the string dtype by using:
>>> df1.GL.astype("string")
df1.GL
0 2311000200.0
1 2312000600.0
2 2330800100.0
Name: GL, dtype: string
However, object dtypes are fine for most string operations. As per the docs:
For backwards-compatibility, object dtype remains the default type we infer a list of strings to
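To illustrate the point that object columns are fine for most string operations, here is a small sketch with made-up values. It builds the same data once as object and once as the dedicated string dtype and runs one .str method on both.

```python
import pandas as pd

# The same strings stored as object dtype and as the dedicated string dtype
as_object = pd.Series(["abc", "def"], dtype=object)
as_string = as_object.astype("string")

# The .str accessor works on both
upper_object = as_object.str.upper().tolist()
upper_string = as_string.str.upper().tolist()

print(as_object.dtype, upper_object)
print(as_string.dtype, upper_string)
```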
You can try df["Bare Nuclei"].astype(np.int64), but as far as I can see the problem lies elsewhere. Pandas reads all of the data first and infers the best dtype for each column before it builds the DataFrame. If the column did not come out as an integer type, some entries must be non-integer, for example values that contain letters or symbols. In that case the cast raises an error as well, so you need to remove or replace those entries before you can convert the column to integers.
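One way to locate and remove such entries is sketched below. The sample values are made up, and pd.to_numeric with errors="coerce" is my suggestion rather than something this answer names: it turns anything unparseable into NaN so the offending rows can be filtered out before the cast.

```python
import numpy as np
import pandas as pd

# Made-up sample: one entry is not a number
df = pd.DataFrame({"Bare Nuclei": ["1", "10", "?", "3"]})

# A direct cast fails because "?" cannot be parsed as an integer
cast_error = None
try:
    df["Bare Nuclei"].astype(np.int64)
except (ValueError, TypeError) as exc:
    cast_error = exc

# Coerce unparseable entries to NaN, keep only the rows that parsed
numeric = pd.to_numeric(df["Bare Nuclei"], errors="coerce")
valid = numeric.notna()
clean = df[valid].copy()
clean["Bare Nuclei"] = numeric[valid].astype(np.int64)

print(cast_error)
print(clean)
```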
I had the same problem with the same dataset.
There are a lot of "?" values in the 'bare_nuclei' column (16 of them) in the CSV itself, so you need error handling to drop the rows that have a "?" in that column. Also, as a heads up, don't name the 'class' column class: that's a reserved keyword in Python, and it's going to cause problems later.
You can fix this at import using:
import numpy as np
import pandas as pd

# Treat these markers, including "?", as missing values while parsing
missing_values = ["NA","N/a",np.nan,"?"]
l1 = pd.read_csv("../DataSets/Breast cancer dataset/breast-cancer-wisconsin.data",
                 header=None, na_values=missing_values,
                 names=['id','clump_thickness','uniformity_of_cell_size',
                        'uniformity_of_cell_shape', 'marginal_adhesion',
                        'single_epithelial_cell_size', 'bare_nuclei', 'bland_chromatin',
                        'normal_nucleoli', 'mitoses', 'diagnosis'])
# Drop every row that contains a missing value
l1 = l1.dropna()
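If you want to try the import-time fix without the dataset on disk, the following sketch runs the same idea on a tiny in-memory sample. The three rows and the shortened column list are invented for illustration and are not taken from the real file.

```python
import io

import pandas as pd

# Invented sample rows; the second one has "?" in the bare_nuclei position
sample = io.StringIO("1,5,1,2\n2,4,?,2\n3,8,10,4\n")

frame = pd.read_csv(
    sample,
    header=None,
    na_values=["?"],  # parse "?" as a missing value
    names=["id", "clump_thickness", "bare_nuclei", "diagnosis"],
)

# Count the missing entries, then drop the affected rows
missing_count = int(frame["bare_nuclei"].isna().sum())
frame = frame.dropna()

# With the gaps removed, the column can be cast to integers
frame["bare_nuclei"] = frame["bare_nuclei"].astype("int64")

print(missing_count)
print(frame)
```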
One way to convert to string is to use astype:
total_rows['ColumnID'] = total_rows['ColumnID'].astype(str)
However, perhaps you are looking for the to_json function, which will convert keys to valid json (and therefore your keys to strings):
In [11]: df = pd.DataFrame([['A', 2], ['A', 4], ['B', 6]])
In [12]: df.to_json()
Out[12]: '{"0":{"0":"A","1":"A","2":"B"},"1":{"0":2,"1":4,"2":6}}'
In [13]: df[0].to_json()
Out[13]: '{"0":"A","1":"A","2":"B"}'
Note: you can pass in a buffer/file to save this to, along with some other options...
If you need to convert ALL columns to strings, you can simply use:
df = df.astype(str)
This is useful if you need everything except a few columns to be strings/objects, then go back and convert the other ones to whatever you need (integer in this case):
df[["D", "E"]] = df[["D", "E"]].astype(int)
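A complete version of this two-step pattern might look like the sketch below. The frame and its column C are made up for the example; only D and E come from the snippet above.

```python
import pandas as pd

# Made-up frame: C should end up as strings, D and E as integers
df = pd.DataFrame({"C": [1.5, 2.5], "D": [1, 2], "E": [3, 4]})

# Step 1: convert every column to strings
df = df.astype(str)

# Step 2: convert the few exceptions back to integers
df[["D", "E"]] = df[["D", "E"]].astype(int)

print(df["C"].tolist())
print(df["D"].tolist(), df["E"].tolist())
```

When the set of exceptions is known up front, passing a dict such as df.astype({"D": int, "E": int}) converts just those columns in one call.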
I solved my problem as follows.
import pandas as pd
from datetime import datetime
Op['dt'] = pd.to_datetime(Op['dt'], format='%Y%m%d')
Op['dt'] = pd.to_datetime(Op['dt'], unit = 's')
Op['dt'] = Op['dt'].fillna(datetime(2200,12,31))
Op['dt'] = Op['dt'].apply(lambda x: x.strftime('%d/%m/%Y'))
Try pd.to_datetime(df["dt"]).dt.strftime('%d/%m/%Y')
Ex:
import pandas as pd
df = pd.DataFrame({"dt" : [ "19670619"]})
df["dt"] = pd.to_datetime(df["dt"]).dt.strftime('%d/%m/%Y')
print(df)
Output:
dt
0 19/06/1967
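A slightly more defensive variant, offered as a sketch: it passes the input format explicitly and uses errors="coerce" so that a malformed entry becomes a missing value instead of raising. The extra rows and the "formatted" column name are invented for the example.

```python
import pandas as pd

# Invented sample: two valid YYYYMMDD strings and one malformed entry
df = pd.DataFrame({"dt": ["19670619", "20011231", "not a date"]})

# Parse with an explicit format; unparseable values become NaT
parsed = pd.to_datetime(df["dt"], format="%Y%m%d", errors="coerce")

# Format the parsed dates as DD/MM/YYYY strings; NaT stays missing
df["formatted"] = parsed.dt.strftime("%d/%m/%Y")

print(df)
```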