Since strings have variable length, pandas stores them with the object dtype by default. If you want to store them as a fixed-width NumPy string (bytes) type instead, you can do something like this:
df['column'] = df['column'].astype('|S80')  # maximum length set to 80 bytes
or alternatively
df['column'] = df['column'].astype('|S')  # length defaults to the longest value encountered
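To make the effect of the `'|S'` codes concrete, here is a minimal sketch at the NumPy level, where these dtype codes come from. The sample values are hypothetical. It suggests that the result holds bytes rather than `str` objects, and that a fixed width truncates longer values.

```python
import numpy as np

# Hypothetical sample data; not taken from the original answer.
words = np.array(["ab", "cde"])

auto_width = words.astype("S")      # width inferred from the longest element
fixed_width = words.astype("S80")   # width fixed at 80 bytes per element
truncated = np.array(["abcdef"]).astype("S2")  # longer values are cut off

print(auto_width.dtype, auto_width.tolist())
print(fixed_width.dtype.itemsize)
print(truncated.tolist())
```

How pandas wraps such an array can differ between versions, so it is worth checking `df['column'].dtype` after the conversion.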
Answer from Siraj S. on Stack Overflow
Did you try assigning it back to the column?
df['column'] = df['column'].astype('str')
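The point about assigning back can be illustrated with a short sketch, using a hypothetical one-column frame: `astype` returns a new Series and leaves the original untouched unless the result is stored.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"column": [1, 2, 3]})

df["column"].astype(str)             # result is discarded; df is unchanged
unchanged = df["column"].tolist()

df["column"] = df["column"].astype(str)  # assign the converted Series back
converted = df["column"].tolist()

print(unchanged)
print(converted)
```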
Referring to this question, the pandas DataFrame stores pointers to the strings, which is why the column has dtype 'object'. As per the docs, you could try:
df['column_new'] = df['column'].str.split(',')
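As a small sketch of what that line produces, assuming a hypothetical column of comma-separated text: the `.str` accessor works on the column directly, and each cell of the new column becomes a list.

```python
import pandas as pd

# Hypothetical comma-separated values.
df = pd.DataFrame({"column": ["a,b", "c"]})

# String methods are available through .str, whatever dtype label the column shows.
df["column_new"] = df["column"].str.split(",")
parts = df["column_new"].tolist()

print(parts)
```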
To convert multiple columns to strings, pass a list of columns to the command above:
df[['one', 'two', 'three']] = df[['one', 'two', 'three']].astype(str)
# add as many column names as you like.
That means that one way to convert all columns is to construct the list of columns like this:
all_columns = list(df) # Creates list of all column headers
df[all_columns] = df[all_columns].astype(str)
Note that the latter can also be done directly (see comments).
I know this is an old question, but I was looking for a way to turn all columns with an object dtype to strings as a workaround for a bug I discovered in rpy2. I'm working with large dataframes, so didn't want to list each column explicitly. This seemed to work well for me so I thought I'd share in case it helps someone else.
stringcols = df.select_dtypes(include='object').columns
df[stringcols] = df[stringcols].fillna('').astype(str)
The "fillna('')" prevents NaN entries from getting converted to the string 'nan' by replacing with an empty string instead.
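Here is a minimal, self-contained sketch of the same two lines on a hypothetical frame. The `name` column is built with an explicit object dtype so that `select_dtypes(include='object')` picks it up regardless of how a given pandas version infers string columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one object column with a missing value, one numeric column.
df = pd.DataFrame({
    "name": pd.Series(["x", np.nan], dtype=object),
    "n": [1, 2],
})

stringcols = df.select_dtypes(include="object").columns
# Replace missing values with '' first, then convert what is left to str.
df[stringcols] = df[stringcols].fillna("").astype(str)

names = df["name"].tolist()
print(list(stringcols), names)
```

The numeric column `n` is not selected, so it is left as it was.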
One way to convert to string is to use astype:
total_rows['ColumnID'] = total_rows['ColumnID'].astype(str)
However, perhaps you are looking for the to_json function, which will convert keys to valid json (and therefore your keys to strings):
In [11]: df = pd.DataFrame([['A', 2], ['A', 4], ['B', 6]])
In [12]: df.to_json()
Out[12]: '{"0":{"0":"A","1":"A","2":"B"},"1":{"0":2,"1":4,"2":6}}'
In [13]: df[0].to_json()
Out[13]: '{"0":"A","1":"A","2":"B"}'
Note: you can pass in a buffer/file to save this to, along with some other options...
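As a sketch of the buffer option, the example below writes the same frame to an in-memory `io.StringIO` (a stand-in chosen here for illustration; a file path should work similarly) and parses it back to show that the keys come out as strings.

```python
import io
import json

import pandas as pd

df = pd.DataFrame([["A", 2], ["A", 4], ["B", 6]])

buffer = io.StringIO()
df.to_json(buffer)  # writes into the buffer instead of returning a string

parsed = json.loads(buffer.getvalue())
print(parsed)
```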
If you need to convert ALL columns to strings, you can simply use:
df = df.astype(str)
This is useful if you need everything except a few columns to be strings/objects, then go back and convert the other ones to whatever you need (integer in this case):
df[["D", "E"]] = df[["D", "E"]].astype(int)
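A compact variant of the same round trip, sketched on a hypothetical frame: `astype` also accepts a dict mapping column names to dtypes, which avoids selecting the columns twice.

```python
import pandas as pd

# Hypothetical frame; column names D and E follow the answer above.
df = pd.DataFrame({"A": [1, 2], "D": [3, 4], "E": [5, 6]})

df = df.astype(str)                   # everything becomes a string
df = df.astype({"D": int, "E": int})  # convert selected columns back to integers

print(df["A"].tolist(), df["D"].tolist(), df["E"].tolist())
```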
This gives you the list of column names
lst = list(df)
This converts all the columns to string type
df[lst] = df[lst].astype(str)
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [23.0, 51.0, np.nan, 24.0],
    'b': ["a42", "3", np.nan, "a1"],
    'c': [142.0, 12.0, np.nan, np.nan]})

# Keep NaN as NaN, leave strings alone, and turn other numbers into integer strings.
for col in df:
    df[col] = [np.nan if (not isinstance(val, str) and np.isnan(val)) else
               (val if isinstance(val, str) else str(int(val)))
               for val in df[col].tolist()]
>>> df
     a    b    c
0   23  a42  142
1   51    3   12
2  NaN  NaN  NaN
3   24   a1  NaN
>>> df.values
array([['23', 'a42', '142'],
       ['51', '3', '12'],
       [nan, nan, nan],
       ['24', 'a1', nan]], dtype=object)