You need astype:
df['zipcode'] = df.zipcode.astype(str)
#df.zipcode = df.zipcode.astype(str)
For converting to categorical:
df['zipcode'] = df.zipcode.astype('category')
#df.zipcode = df.zipcode.astype('category')
Another solution is Categorical:
df['zipcode'] = pd.Categorical(df.zipcode)
Sample with data:
import pandas as pd
df = pd.DataFrame({'zipcode': {17384: 98125, 2680: 98107, 722: 98005, 18754: 98109, 14554: 98155}, 'bathrooms': {17384: 1.5, 2680: 0.75, 722: 3.25, 18754: 1.0, 14554: 2.5}, 'sqft_lot': {17384: 1650, 2680: 3700, 722: 51836, 18754: 2640, 14554: 9603}, 'bedrooms': {17384: 2, 2680: 2, 722: 4, 18754: 2, 14554: 4}, 'sqft_living': {17384: 1430, 2680: 1440, 722: 4670, 18754: 1130, 14554: 3180}, 'floors': {17384: 3.0, 2680: 1.0, 722: 2.0, 18754: 1.0, 14554: 2.0}})
print(df)
       bathrooms  bedrooms  floors  sqft_living  sqft_lot  zipcode
722         3.25         4     2.0         4670     51836    98005
2680        0.75         2     1.0         1440      3700    98107
14554       2.50         4     2.0         3180      9603    98155
17384       1.50         2     3.0         1430      1650    98125
18754       1.00         2     1.0         1130      2640    98109
print(df.dtypes)
bathrooms      float64
bedrooms         int64
floors         float64
sqft_living      int64
sqft_lot         int64
zipcode          int64
dtype: object
df['zipcode'] = df.zipcode.astype('category')
print(df)
       bathrooms  bedrooms  floors  sqft_living  sqft_lot zipcode
722         3.25         4     2.0         4670     51836   98005
2680        0.75         2     1.0         1440      3700   98107
14554       2.50         4     2.0         3180      9603   98155
17384       1.50         2     3.0         1430      1650   98125
18754       1.00         2     1.0         1130      2640   98109
print(df.dtypes)
bathrooms       float64
bedrooms          int64
floors          float64
sqft_living       int64
sqft_lot          int64
zipcode        category
dtype: object
Answer from jezrael on Stack Overflow
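As a quick check of the two categorical routes mentioned in the answer, the sketch below compares `astype('category')` with `pd.Categorical` on a small hypothetical frame (three of the sample zipcodes; the variable names are illustrative and not from the original answer). It is only meant to show that both produce the same dtype, and what `astype(str)` returns for comparison.

```python
import pandas as pd

# Hypothetical three-row frame; the zipcodes come from the sample above.
df = pd.DataFrame({"zipcode": [98125, 98107, 98005]})

# Route 1: astype('category') on the column.
via_astype = df["zipcode"].astype("category")

# Route 2: build a Categorical and wrap it in a Series with the same index.
via_constructor = pd.Series(pd.Categorical(df["zipcode"]), index=df.index)

# Both routes infer the same (sorted) categories.
print(via_astype.dtype == via_constructor.dtype)
print(list(via_astype.cat.categories))

# For comparison: astype(str) turns every value into a Python string.
as_text = df["zipcode"].astype(str)
print(as_text.tolist())
```

The categories stay integers here; if string categories are wanted, one option is to convert to `str` first and to `category` afterwards.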
With pandas >= 1.0 there is now a dedicated string datatype:
1) You can convert your column to this pandas string datatype using .astype('string'):
df['zipcode'] = df['zipcode'].astype('string')
2) This is different from using str which sets the pandas object datatype:
df['zipcode'] = df['zipcode'].astype(str)
3) For changing into categorical datatype use:
df['zipcode'] = df['zipcode'].astype('category')
You can see this difference in datatypes when you look at the info of the dataframe:
df = pd.DataFrame({
    'zipcode_str': [90210, 90211],
    'zipcode_string': [90210, 90211],
    'zipcode_category': [90210, 90211],
})
df['zipcode_str'] = df['zipcode_str'].astype(str)
df['zipcode_string'] = df['zipcode_string'].astype('string')
df['zipcode_category'] = df['zipcode_category'].astype('category')
df.info()
# you can see that the first column has dtype object
# while the second column has the new dtype string
# the third column has dtype category
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   zipcode_str       2 non-null      object
 1   zipcode_string    2 non-null      string
 2   zipcode_category  2 non-null      category
dtypes: category(1), object(1), string(1)
From the docs:
The 'string' extension type solves several issues with object-dtype NumPy arrays:
You can accidentally store a mixture of strings and non-strings in an object dtype array. A StringArray can only store strings.
object dtype breaks dtype-specific operations like DataFrame.select_dtypes(). There isn’t a clear way to select just text while excluding non-text, but still object-dtype columns.
When reading code, the contents of an object dtype array is less clear than string.
More info on working with the new string datatype can be found here: https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html
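To make the `select_dtypes()` point from the docs concrete, here is a minimal sketch with three hypothetical columns (the column names and values are assumptions for illustration). The object column is built explicitly with `dtype=object` so that it can hold a mix of strings and integers.

```python
import pandas as pd

df = pd.DataFrame({
    # object dtype: a stray integer sits next to a string without complaint
    "as_object": pd.Series(["90210", 90211], dtype=object),
    # dedicated string dtype
    "as_string": pd.Series(["90210", "90211"], dtype="string"),
    # categorical dtype
    "as_category": pd.Series(["90210", "90211"], dtype="category"),
})

# Selecting by the string dtype picks up only the column that is really text.
text_columns = df.select_dtypes(include="string").columns.tolist()
print(text_columns)
```

With object dtype alone there would be no way to ask for "text columns only", since any Python object can live in such a column.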
Basically, I know I can use
df = df.astype(str)
to convert every entry in every column to a string, but the issue is that it also converts NaN type entries into a string. Is there a way to replicate the above code without touching NaN entries?
Edit: found one potential solution, though might be a bit on the slower side.
df = df.where(df.isna(), df.astype(str))
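A small sketch of the `where` approach on a hypothetical two-column frame, next to `astype("string")` as a possible alternative (that alternative is not part of the original post; it keeps missing values as `<NA>` rather than converting them to text).

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column.
df = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", None]})

# Keep the original value where it is missing, use the string version elsewhere.
kept = df.where(df.isna(), df.astype(str))

# Alternative: the nullable string dtype leaves missing entries as <NA>.
nullable = df.astype("string")

print(kept)
print(nullable)
```

In both results the missing entries are still reported by `isna()`, while the non-missing entries are strings such as `"1.0"` and `"x"`.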
Since strings have variable length, pandas stores them as object dtype by default. If you want to store them as a fixed-length (NumPy byte string) type instead, you can do something like this:
df['column'] = df['column'].astype('|S80') # where the max length is set at 80 bytes
or alternatively
df['column'] = df['column'].astype('|S') # which will by default set the length to the max len it encounters
Did you try assigning it back to the column?
df['column'] = df['column'].astype('str')
Referring to this question: the pandas DataFrame stores pointers to the strings, and hence the column is of type 'object'. As per the docs, you could try:
df['column_new'] = df['column'].str.split(',')
A new answer to reflect the most current practices: as of now (v1.2.4), neither astype('str') nor astype(str) gives you the dedicated string dtype; both still return an object-dtype column.
As per the documentation, a Series can be converted to the string datatype in the following ways:
df['id'] = df['id'].astype("string")
df['id'] = pd.Series(df['id'], dtype="string")
df['id'] = pd.Series(df['id'], dtype=pd.StringDtype())
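The three spellings can be compared side by side. This is a minimal sketch on a hypothetical `ids` Series (the values are made up for illustration), with `pd.StringDtype()` written as an instance.

```python
import pandas as pd

# Hypothetical object-dtype Series with one missing entry.
ids = pd.Series(["123", "zhub1", None], dtype=object)

a = ids.astype("string")
b = pd.Series(ids, dtype="string")
c = pd.Series(ids, dtype=pd.StringDtype())

# All three end up with the pandas string extension dtype,
# and the missing entry stays missing.
for converted in (a, b, c):
    print(converted.dtype, converted.isna().tolist())
```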
End to end example:
import pandas as pd
# Create a sample DataFrame
data = {
'Name': ['John', 'Alice', 'Bob', 'John', 'Alice'],
'Age': [25, 30, 35, 25, 30],
'City': ['New York', 'London', 'Paris', 'New York', 'London'],
'Salary': [50000, 60000, 70000, 50000, 60000],
'Category': ['A', 'B', 'C', 'A', 'B']
}
df = pd.DataFrame(data)
# Print the DataFrame
print("Original DataFrame:")
print(df)
print("\nData types:")
print(df.dtypes)
cat_cols_ = None
# Apply the code to change data types
if not cat_cols_:
    # Get the columns with object data type
    object_columns = df.select_dtypes(include=['object']).columns.tolist()
    if len(object_columns) > 0:
        print(f"\nObject columns found, converting to string: {object_columns}")
        # Convert object columns to string type
        df[object_columns] = df[object_columns].astype('string')
    # Get the categorical columns (including string and category data types)
    cat_cols_ = df.select_dtypes(include=['category', 'string']).columns.tolist()
# Print the updated DataFrame and data types
print("\nUpdated DataFrame:")
print(df)
print("\nUpdated data types:")
print(df.dtypes)
print(f"\nCategorical columns (cat_cols_): {cat_cols_}")
Original DataFrame:
    Name  Age      City  Salary Category
0   John   25  New York   50000        A
1  Alice   30    London   60000        B
2    Bob   35     Paris   70000        C
3   John   25  New York   50000        A
4  Alice   30    London   60000        B

Data types:
Name        object
Age          int64
City        object
Salary       int64
Category    object
dtype: object

Object columns found, converting to string: ['Name', 'City', 'Category']

Updated DataFrame:
    Name  Age      City  Salary Category
0   John   25  New York   50000        A
1  Alice   30    London   60000        B
2    Bob   35     Paris   70000        C
3   John   25  New York   50000        A
4  Alice   30    London   60000        B

Updated data types:
Name        string[python]
Age                  int64
City        string[python]
Salary               int64
Category    string[python]
dtype: object

Categorical columns (cat_cols_): ['Name', 'City', 'Category']
You can convert all elements of id to str using apply:
df.id.apply(str)
0 123
1 512
2 zhub1
3 12354.3
4 129
5 753
6 295
7 610
Edit by OP:
I think the issue was related to the Python version (2.7); this worked (note that basestring exists only in Python 2, so on Python 3 use astype(str) instead):
df['id'].astype(basestring)
0 123
1 512
2 zhub1
3 12354.3
4 129
5 753
6 295
7 610
Name: id, dtype: object
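For Python 3, a minimal sketch of the same conversion on a hypothetical mixed-type `id` column (values chosen to resemble the output above) shows that `apply(str)` and `astype(str)` yield the same strings.

```python
import pandas as pd

# Hypothetical column mixing integers, a string and a float.
df = pd.DataFrame({"id": [123, 512, "zhub1", 12354.3]})

via_apply = df["id"].apply(str)
via_astype = df["id"].astype(str)

print(via_apply.tolist())
print(via_astype.tolist())
```

Neither call modifies `df` in place; assign the result back (`df["id"] = ...`) to keep it.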
One way to convert to string is to use astype:
total_rows['ColumnID'] = total_rows['ColumnID'].astype(str)
However, perhaps you are looking for the to_json function, which will convert keys to valid json (and therefore your keys to strings):
In [11]: df = pd.DataFrame([['A', 2], ['A', 4], ['B', 6]])
In [12]: df.to_json()
Out[12]: '{"0":{"0":"A","1":"A","2":"B"},"1":{"0":2,"1":4,"2":6}}'
In [13]: df[0].to_json()
Out[13]: '{"0":"A","1":"A","2":"B"}'
Note: you can pass in a buffer/file to save this to, along with some other options...
If you need to convert ALL columns to strings, you can simply use:
df = df.astype(str)
This is useful if you need everything except a few columns to be strings/objects, then go back and convert the other ones to whatever you need (integer in this case):
df[["D", "E"]] = df[["D", "E"]].astype(int)
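The round trip can be sketched on a small hypothetical frame. Here the second step uses the dict form of `astype`, which is a swapped-in alternative to the column-subset assignment above, not the original poster's code; the column names `A`, `D`, `E` and the values are assumptions.

```python
import pandas as pd

# Hypothetical frame: one text column and two integer columns.
df = pd.DataFrame({"A": ["x", "y"], "D": [1, 2], "E": [3, 4]})

# Step 1: everything becomes a string.
df = df.astype(str)

# Step 2: convert selected columns back, using a column -> dtype mapping.
df = df.astype({"D": int, "E": int})

print(df.dtypes)
```

The dict form returns a new DataFrame and leaves the columns not listed in the mapping untouched.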