pandas convert categorical to string

pandas dataframe convert column type to string or categorical

stackoverflow.com › questions › 39092067 › pandas-dataframe-convert-column-type-to-string-or-categorical

You need astype:

df['zipcode'] = df.zipcode.astype(str)
#df.zipcode = df.zipcode.astype(str)

For converting to categorical:

df['zipcode'] = df.zipcode.astype('category')
#df.zipcode = df.zipcode.astype('category')

Another solution is Categorical:

df['zipcode'] = pd.Categorical(df.zipcode)
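As a quick check that the two categorical conversions agree, here is a minimal sketch on made-up zip codes. The column names `via_astype` and `via_constructor` are illustrative and do not come from the answer.

```python
import pandas as pd

# Made-up sample values; any integer column would do.
df = pd.DataFrame({"zipcode": [98125, 98107, 98005, 98107]})

# Two ways to build the same categorical column.
df["via_astype"] = df["zipcode"].astype("category")
df["via_constructor"] = pd.Categorical(df["zipcode"])

print(df.dtypes)
print(df["via_astype"].cat.categories.tolist())
```

Both columns should report the category dtype with the same inferred categories.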

Sample with data:

import pandas as pd

df = pd.DataFrame({'zipcode': {17384: 98125, 2680: 98107, 722: 98005, 18754: 98109, 14554: 98155}, 'bathrooms': {17384: 1.5, 2680: 0.75, 722: 3.25, 18754: 1.0, 14554: 2.5}, 'sqft_lot': {17384: 1650, 2680: 3700, 722: 51836, 18754: 2640, 14554: 9603}, 'bedrooms': {17384: 2, 2680: 2, 722: 4, 18754: 2, 14554: 4}, 'sqft_living': {17384: 1430, 2680: 1440, 722: 4670, 18754: 1130, 14554: 3180}, 'floors': {17384: 3.0, 2680: 1.0, 722: 2.0, 18754: 1.0, 14554: 2.0}})

print (df)
       bathrooms  bedrooms  floors  sqft_living  sqft_lot  zipcode
722         3.25         4     2.0         4670     51836    98005
2680        0.75         2     1.0         1440      3700    98107
14554       2.50         4     2.0         3180      9603    98155
17384       1.50         2     3.0         1430      1650    98125
18754       1.00         2     1.0         1130      2640    98109

print (df.dtypes)
bathrooms      float64
bedrooms         int64
floors         float64
sqft_living      int64
sqft_lot         int64
zipcode          int64
dtype: object

df['zipcode'] = df.zipcode.astype('category')

print (df)
       bathrooms  bedrooms  floors  sqft_living  sqft_lot zipcode
722         3.25         4     2.0         4670     51836   98005
2680        0.75         2     1.0         1440      3700   98107
14554       2.50         4     2.0         3180      9603   98155
17384       1.50         2     3.0         1430      1650   98125
18754       1.00         2     1.0         1130      2640   98109

print (df.dtypes)
bathrooms       float64
bedrooms          int64
floors          float64
sqft_living       int64
sqft_lot          int64
zipcode        category
dtype: object

Answer from jezrael on Stack Overflow

Stack Overflow

stackoverflow.com › questions › 39092067 › pandas-dataframe-convert-column-type-to-string-or-categorical

pandas dataframe convert column type to string or categorical - Stack Overflow

Top answer

1 of 5

150


2 of 5

With pandas >= 1.0 there is now a dedicated string datatype:

1) You can convert your column to this pandas string datatype using .astype('string'):

df['zipcode'] = df['zipcode'].astype('string')

2) This is different from using str, which in pandas versions before 3.0 produces the generic object dtype:

df['zipcode'] = df['zipcode'].astype(str)

3) For changing into categorical datatype use:

df['zipcode'] = df['zipcode'].astype('category')

You can see this difference in datatypes when you look at the info of the dataframe:

df = pd.DataFrame({
    'zipcode_str': [90210, 90211] ,
    'zipcode_string': [90210, 90211],
    'zipcode_category': [90210, 90211],
})

df['zipcode_str'] = df['zipcode_str'].astype(str)
df['zipcode_string'] = df['zipcode_string'].astype('string')
df['zipcode_category'] = df['zipcode_category'].astype('category')

df.info()

# you can see that the first column has dtype object
# while the second column has the new dtype string
# the third column has dtype category
 #   Column            Non-Null Count  Dtype   
---  ------            --------------  -----   
 0   zipcode_str       2 non-null      object  
 1   zipcode_string    2 non-null      string  
 2   zipcode_category  2 non-null      category
dtypes: category(1), object(1), string(1)

From the docs:

The 'string' extension type solves several issues with object-dtype NumPy arrays:

You can accidentally store a mixture of strings and non-strings in an object dtype array. A StringArray can only store strings.

object dtype breaks dtype-specific operations like DataFrame.select_dtypes(). There isn’t a clear way to select just text while excluding non-text, but still object-dtype columns.

When reading code, the contents of an object dtype array is less clear than string.

More info on working with the new string datatype can be found here: https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html
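To make the string extension type a little more concrete, the sketch below converts a small made-up Series that contains a missing entry. It assumes a reasonably recent pandas release; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical data with one missing entry.
raw = pd.Series(["90210", None, "90211"])

as_string = raw.astype("string")

print(as_string.dtype)            # the dedicated string extension dtype
print(as_string.isna().tolist())  # which entries are missing
print(as_string.str.len())        # the missing entry stays missing
```

The missing entry remains a missing value after the conversion instead of being turned into text.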

Pandas

pandas.pydata.org › docs › user_guide › categorical.html

Categorical data — pandas 3.0.3 documentation

In contrast to R’s factor function, categorical data is not converting input values to strings; categories will end up the same data type as the original values.
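A small sketch of what that sentence means in practice, using made-up zip codes: the column becomes categorical, while the categories themselves stay integers.

```python
import pandas as pd

zipcodes = pd.Series([98125, 98107, 98005]).astype("category")

# The column dtype is category, but the categories are still integers.
print(zipcodes.dtype)
print(zipcodes.cat.categories.dtype)
```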

Data Science Parichay

datascienceparichay.com › home › blog › pandas – convert category type column to string

Pandas - Convert Category Type Column to String - Data Science Parichay

March 31, 2022 - You can use the Pandas astype() function to change the data type of a column. To convert a category type column to string type, apply the astype() function on the column and pass 'str' as the argument.
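A minimal sketch of that conversion on made-up data follows. The name of the resulting dtype depends on the pandas version (object in older releases), so the example only prints it.

```python
import pandas as pd

colors = pd.Series(["red", "blue", "red"], dtype="category")

# Convert back to plain strings; the result is no longer categorical.
as_text = colors.astype(str)

print(as_text.dtype)
print(as_text.tolist())
```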

Pandas

pandas.pydata.org › pandas-docs › version › 0.25.3 › user_guide › categorical.html

Categorical data — pandas 0.25.3 documentation

In contrast to R’s factor function, categorical data is not converting input values to strings; categories will end up the same data type as the original values.

Turing

turing.com › kb › convert-categorical-data-in-pandas-and-scikit-learn

How to Convert Categorical Data in Pandas and Scikit-learn

These categorical variables also contain valuable pieces of information about the data. In this article, we will learn how to encode categorical variables to numeric with Pandas and Scikit-learn. Categorical variables are generally addressed as ‘strings’ or ‘categories’ and are finite ...

APXML

apxml.com › courses › intro-data-cleaning-preprocessing › chapter-4-correcting-data-types › converting-to-categorical-string

Converting to Categorical or String Types

While you can store these as strings, converting them to a specific categorical data type offers several advantages, particularly in tools like pandas:
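The snippet's list of advantages is cut off. One commonly cited advantage is lower memory use when a column has few distinct values, which the sketch below measures on a made-up column. The exact byte counts vary by pandas version and platform.

```python
import pandas as pd

# Hypothetical low-cardinality column: three labels repeated many times.
labels = pd.Series(["low", "medium", "high"] * 10_000)
as_category = labels.astype("category")

text_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(text_bytes, category_bytes)
```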

Stack Overflow

stackoverflow.com › questions › 32011359 › convert-categorical-data-in-pandas-dataframe › 46186626

python - Convert categorical data in pandas dataframe - Stack Overflow

Top answer

1 of 1

Use the labels parameter to generate strings instead of pd.Interval objects:

import numpy as np
import pandas as pd
breaks = [-np.inf, .2, .4, np.inf]
test_cut = pd.cut(test, breaks, labels=pd.IntervalIndex.from_breaks(breaks).astype(str))

Try timings with this code.
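For a self-contained version of this approach, the sketch below substitutes a small made-up Series for the `test` data from the question and passes the labels as a plain list.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the `test` Series from the question.
test = pd.Series([0.1, 0.3, 0.9])

breaks = [-np.inf, 0.2, 0.4, np.inf]
# Build one string label per bin from the interval boundaries.
labels = pd.IntervalIndex.from_breaks(breaks).astype(str).tolist()

test_cut = pd.cut(test, breaks, labels=labels)

print(test_cut.cat.categories.tolist())
print(test_cut.cat.codes.tolist())
```

Each value lands in its own bin here, and the categories are strings rather than interval objects.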

GitHub

github.com › pandas-dev › pandas › issues › 51074

BUG: Setting `CategoricalDtype` categories as string objects nulls out data · Issue #51074 · pandas-dev/pandas

January 30, 2023 -

import pandas as pd
double_cats = pd.Series([1.2, 2.3, 3.9, 4.1, 5.5], dtype="category")
explicit_dtype = pd.CategoricalDtype(
    categories=double_cats.dtype.categories.astype("string").astype("object"),
)
all_nans = double_cats.astype(explicit_dtype)
assert all(all_nans.isna())

I need to convert a category dtype's float categories to be string categories with the object dtype.

Author pandas-dev
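One possible way to turn float categories into string categories without losing values is to relabel them with `rename_categories`. This is only a sketch and not necessarily the resolution reached in the issue. The dtype of the resulting categories index may also differ between pandas versions.

```python
import pandas as pd

double_cats = pd.Series([1.2, 2.3, 3.9, 4.1, 5.5], dtype="category")

# Relabel each category with its string form; the codes are untouched,
# so no value turns into NaN.
string_cats = double_cats.cat.rename_categories(str)

print(string_cats.cat.categories.tolist())
print(string_cats.isna().sum())
```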

Skytowner

skytowner.com › explore › converting_all_object_typed_columns_to_categorical_type_in_pandas_dataframe

Converting all object-typed columns to categorical type in Pandas DataFrame
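A minimal sketch of the idea in the title, on a made-up frame. The text columns are created with an explicit object dtype, since the default dtype for text differs across pandas versions. This is not necessarily the code used on the linked page.

```python
import pandas as pd

# Hypothetical frame; the text columns are created with an explicit object dtype.
df = pd.DataFrame({
    "city": pd.Series(["Seattle", "Bellevue", "Seattle"], dtype=object),
    "grade": pd.Series(["A", "B", "A"], dtype=object),
    "price": [450000, 720000, 510000],
})

# Find the object-typed columns, then convert all of them in one call.
object_cols = df.select_dtypes(include="object").columns
df = df.astype({col: "category" for col in object_cols})

print(df.dtypes)
```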


Pandas

pandas.pydata.org › pandas-docs › version › 0.15.2 › categorical.html

Categorical Data — pandas 0.15.2 documentation

In contrast to R’s factor function, categorical data is not converting input values to strings and categories will end up the same data type as the original values.

Hyperskill

hyperskill.org › learn › step › 33221

Converting a string column to a categorical type


ProjectPro

projectpro.io › recipes › convert-string-categorical-variables-into-numerical-variables-in-python

How to convert string to categorical pandas? - Projectpro

December 20, 2022 - Machine learning models cannot work on categorical variables in the form of strings, so we need to change them into numerical form. This can be done by making new features according to the categories and assigning them values. This Python source code does the following: 1. Creates a data dictionary and converts it into a pandas dataframe 2.
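The recipe's remaining steps are cut off in the snippet. As one possible illustration of "making new features according to the categories", the sketch below uses `pd.get_dummies` on made-up data; the recipe itself may use a different technique.

```python
import pandas as pd

# Step 1: a small data dictionary turned into a DataFrame (made-up values).
data = {"color": ["red", "blue", "red"], "size": [1, 2, 3]}
df = pd.DataFrame(data)

# Step 2: one indicator column per category of "color".
encoded = pd.get_dummies(df, columns=["color"])

print(encoded)
```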

Pandas

pandas.pydata.org › pandas-docs › version › 0.15 › categorical.html

Categorical Data — pandas 0.15.2 documentation

In contrast to R’s factor function, categorical data is not converting input values to strings and categories will end up the same data type as the original values.

Towards Data Science

towardsdatascience.com › home › latest › using pandas categories properly is tricky, here’s why…

Using pandas categories properly is tricky, here's why... | Towards Data Science

January 24, 2025 - I am aware that ordinarily, humans do the operations on cats, but reality is even stranger than fiction: we have pandas doing operations on cats! It’s probable that at some point you’re going to want to do something with your categorical columns, and one of those things might be a transformation. This is the first place that we’re going to have to show some diligence…

Since categorical columns are often text-based columns, let’s look at an example using string manipulations. We can do these manipulations on categorical columns in the same way that we ordinarily do for text-based object columns: by using the .str accessor.
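A short sketch of the .str accessor on a categorical column, with made-up values. The point needing diligence here is that the result of a string method is not categorical, so the category dtype has to be reapplied if it is still wanted.

```python
import pandas as pd

cats = pd.Series(["tabby", "siamese", "tabby"], dtype="category")

# String methods work through .str, but the result is not categorical.
upper = cats.str.upper()

print(upper.tolist())
print(upper.dtype)

# Convert again if the category dtype is still wanted.
upper_cat = upper.astype("category")
print(upper_cat.dtype)
```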

Medium

medium.com › @whyamit404 › steps-to-pandas-convert-column-to-string-6154c98e3ae5

Steps to Pandas Convert Column to String | by whyamit404 | Medium

April 12, 2025 - Type checking: After converting a column, you may want to check if the operation worked successfully. Use the dtypes attribute for this: ... You’ll see that the type for ‘Scores’ is now object, which is the generic type for strings in Pandas.

GeeksforGeeks

geeksforgeeks.org › how-to-convert-categorical-variable-to-numeric-in-pandas

How to Convert Categorical Variable to Numeric in Pandas? - GeeksforGeeks

May 31, 2025 - The datasets have both numerical and categorical features. Categorical features refer to string data types and can be easily understood by human beings. However, machines cannot interpret the categorical data directly. Therefore, the categorical data must be converted into numerical data for further

Practical Business Python

pbpython.com › categorical-encoding.html

Guide to Encoding Categorical Values in Python - Practical Business Python

The nice aspect of this approach is that you get the benefits of pandas categories (compact data size, ability to order, plotting support), and the column can still easily be converted to numeric values for further analysis.
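A minimal sketch of converting a categorical column to numeric values through its integer codes, on made-up sizes with an explicit category order. The variable names are illustrative.

```python
import pandas as pd

sizes = pd.Series(["small", "large", "medium", "small"]).astype(
    pd.CategoricalDtype(categories=["small", "medium", "large"], ordered=True)
)

# Each value is replaced by the integer position of its category.
codes = sizes.cat.codes

print(codes.tolist())
```

Because the categories are given in an explicit order, the codes follow that order and not alphabetical order.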