Sometimes, you just have to use a for-loop:

for col in ['parks', 'playgrounds', 'sports', 'roading']:
    public[col] = public[col].astype('category')
Answer from unutbu on Stack Overflow
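If you would rather avoid the loop, one possible alternative is to pass a column-to-dtype mapping to DataFrame.astype. This is only a minimal sketch: the `public` frame below is made-up sample data, and only the four column names come from the answer above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
public = pd.DataFrame({
    'parks': ['yes', 'no', 'yes'],
    'playgrounds': ['no', 'no', 'yes'],
    'sports': ['yes', 'yes', 'no'],
    'roading': ['good', 'poor', 'good'],
    'population': [1200, 450, 3100],
})

cols = ['parks', 'playgrounds', 'sports', 'roading']

# astype accepts a {column: dtype} mapping and returns a new DataFrame;
# columns not listed in the mapping keep their original dtype.
public = public.astype({col: 'category' for col in cols})

print(public.dtypes)
```

The for-loop and the mapping should give the same result; the mapping form is mostly a matter of taste when the column list is already known.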
Pandas
pandas.pydata.org/docs/user_guide/categorical.html
Categorical data - pandas 3.0.3 documentation - PyData
... In contrast to R's factor function, there is currently no way to assign/change labels at creation time. Use categories to change the categories after creation time. ... This information can be stored in a CategoricalDtype. The categories argument is optional, which implies that the actual ...
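As a rough illustration of changing labels after creation, the sketch below uses the `.cat.rename_categories` accessor method with a dict mapping. The series contents and the new label names are invented for the example.

```python
import pandas as pd

# Hypothetical data: a small categorical series with three categories.
s = pd.Series(['a', 'b', 'a', 'c'], dtype='category')

# rename_categories returns a new Series; the original is left unchanged.
renamed = s.cat.rename_categories({'a': 'alpha', 'b': 'beta', 'c': 'gamma'})

print(list(renamed.cat.categories))
print(renamed.tolist())
```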
Discussions

How to Transform Categorical Data to Numerical Data Using Pandas
pd.get_dummies(data)
r/learnmachinelearning, November 29, 2021
How do you guys handle pandas and its sh*tty data type inference
I feel like y'all need to learn how to read docs: you can (and should) specify your schema beforehand, which you can do by setting the dtype param on read_csv to a dictionary in the form of "column_name": pandas_type.
๐ŸŒ r/Python
105
55
April 14, 2023
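The schema-up-front advice in the r/Python comment could look something like the sketch below. The CSV text and the column names are hypothetical stand-ins for a real file; the `dtype` parameter of `read_csv` is the part taken from the comment.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """zipcode,bathrooms,bedrooms
98125,1.5,2
98107,0.75,2
98005,3.25,4
"""

# Declare the dtype of each column instead of relying on inference.
schema = {'zipcode': 'category', 'bathrooms': 'float64', 'bedrooms': 'int64'}

df = pd.read_csv(io.StringIO(csv_text), dtype=schema)
print(df.dtypes)
```

Declaring the column as category at read time avoids a separate astype call afterwards.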
Bobby Hadz
bobbyhadz.com/blog/pandas-change-column-type-to-categorical
Pandas: Changing the column type to Categorical | bobbyhadz
import pandas as pd df = ... salary float64 # dtype: object print(df.dtypes) ... The lambda function gets called with each column name and sets its type to category....
Skytowner
skytowner.com/explore/changing_column_type_to_categorical_in_pandas
Changing column type to categorical in Pandas
To change column type to categorical in Pandas, use the DataFrame's astype("category") method.
Pandas
pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html
pandas.DataFrame.astype - pandas 3.0.3 documentation
>>> from pandas.api.types import CategoricalDtype
>>> cat_dtype = CategoricalDtype(categories=[2, 1], ordered=True)
>>> ser.astype(cat_dtype)
0    1
1    2
dtype: category
Categories (2, int64): [2 < 1]
DataScientYst
datascientyst.com/convert-column-to-categorical-pandas-dataframe-examples
How to Convert Column to Categorical in Pandas DataFrame with Examples
November 20, 2023 - In this article, we'll explore how to convert columns to categorical in a Pandas DataFrame with practical examples. In data analysis, efficient memory usage and improved performance are crucial considerations. Converting a column to categorical is as simple as: df['col'].astype('category') Let's dive into more details.
Data Science Parichay
datascienceparichay.com
Pandas - Change Column Type to Category - Data Science Parichay
March 22, 2022 - Pass "category" as an argument to the pandas astype function to convert the column type to category. By default categories are unordered.
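To make the "unordered by default" point concrete, here is a small sketch contrasting a plain `astype('category')` with an ordered `CategoricalDtype`. The size labels are invented for illustration.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical data: size labels with a natural low < medium < high order.
sizes = pd.Series(['medium', 'low', 'high', 'low'])

# Plain 'category' gives unordered categories.
unordered = sizes.astype('category')
print(unordered.cat.ordered)

# An explicit CategoricalDtype lets you fix both the categories and their order.
size_dtype = CategoricalDtype(categories=['low', 'medium', 'high'], ordered=True)
ordered = sizes.astype(size_dtype)
print(ordered.cat.ordered)

# With an order defined, min/max and comparisons follow the category order.
print(ordered.max())
```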
Top answer
1 of 5
150

You need astype:

df['zipcode'] = df.zipcode.astype(str)
#df.zipcode = df.zipcode.astype(str)

For converting to categorical:

df['zipcode'] = df.zipcode.astype('category')
#df.zipcode = df.zipcode.astype('category')

Another solution is Categorical:

df['zipcode'] = pd.Categorical(df.zipcode)

Sample with data:

import pandas as pd

df = pd.DataFrame({'zipcode': {17384: 98125, 2680: 98107, 722: 98005, 18754: 98109, 14554: 98155}, 'bathrooms': {17384: 1.5, 2680: 0.75, 722: 3.25, 18754: 1.0, 14554: 2.5}, 'sqft_lot': {17384: 1650, 2680: 3700, 722: 51836, 18754: 2640, 14554: 9603}, 'bedrooms': {17384: 2, 2680: 2, 722: 4, 18754: 2, 14554: 4}, 'sqft_living': {17384: 1430, 2680: 1440, 722: 4670, 18754: 1130, 14554: 3180}, 'floors': {17384: 3.0, 2680: 1.0, 722: 2.0, 18754: 1.0, 14554: 2.0}})
print (df)
       bathrooms  bedrooms  floors  sqft_living  sqft_lot  zipcode
722         3.25         4     2.0         4670     51836    98005
2680        0.75         2     1.0         1440      3700    98107
14554       2.50         4     2.0         3180      9603    98155
17384       1.50         2     3.0         1430      1650    98125
18754       1.00         2     1.0         1130      2640    98109

print (df.dtypes)
bathrooms      float64
bedrooms         int64
floors         float64
sqft_living      int64
sqft_lot         int64
zipcode          int64
dtype: object

df['zipcode'] = df.zipcode.astype('category')

print (df)
       bathrooms  bedrooms  floors  sqft_living  sqft_lot zipcode
722         3.25         4     2.0         4670     51836   98005
2680        0.75         2     1.0         1440      3700   98107
14554       2.50         4     2.0         3180      9603   98155
17384       1.50         2     3.0         1430      1650   98125
18754       1.00         2     1.0         1130      2640   98109

print (df.dtypes)
bathrooms       float64
bedrooms          int64
floors          float64
sqft_living       int64
sqft_lot          int64
zipcode        category
dtype: object
2 of 5
35

With pandas >= 1.0 there is now a dedicated string datatype:

1) You can convert your column to this pandas string datatype using .astype('string'):

df['zipcode'] = df['zipcode'].astype('string')

2) This is different from using str which sets the pandas object datatype:

df['zipcode'] = df['zipcode'].astype(str)

3) For changing into categorical datatype use:

df['zipcode'] = df['zipcode'].astype('category')

You can see this difference in datatypes when you look at the info of the dataframe:

df = pd.DataFrame({
    'zipcode_str': [90210, 90211] ,
    'zipcode_string': [90210, 90211],
    'zipcode_category': [90210, 90211],
})

df['zipcode_str'] = df['zipcode_str'].astype(str)
df['zipcode_string'] = df['zipcode_string'].astype('string')
df['zipcode_category'] = df['zipcode_category'].astype('category')

df.info()

# you can see that the first column has dtype object
# while the second column has the new dtype string
# the third column has dtype category
 #   Column            Non-Null Count  Dtype   
---  ------            --------------  -----   
 0   zipcode_str       2 non-null      object  
 1   zipcode_string    2 non-null      string  
 2   zipcode_category  2 non-null      category
dtypes: category(1), object(1), string(1)

From the docs:

The 'string' extension type solves several issues with object-dtype NumPy arrays:

  1. You can accidentally store a mixture of strings and non-strings in an object dtype array. A StringArray can only store strings.

  2. object dtype breaks dtype-specific operations like DataFrame.select_dtypes(). There isn't a clear way to select just text while excluding non-text, but still object-dtype columns.

  3. When reading code, the contents of an object dtype array is less clear than string.

More info on working with the new string datatype can be found here: https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html
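As a quick check of the dedicated string dtype described in this answer, the sketch below converts an integer column and confirms the result is a pandas StringDtype whose values support the `.str` accessor. The two zip codes are the sample values from the answer; the `startswith` check is only an illustration.

```python
import pandas as pd

df = pd.DataFrame({'zipcode': [90210, 90211]})

# Convert the integer column to the dedicated pandas string dtype.
df['zipcode'] = df['zipcode'].astype('string')

print(df['zipcode'].dtype)

# The values are now text, so string methods are available via .str.
starts_with_902 = df['zipcode'].str.startswith('902')
print(starts_with_902.tolist())
```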

DataScience Made Simple
datasciencemadesimple.com
Convert column to categorical in pandas python - DataScience Made Simple
January 29, 2023 - The Categorical function is used to convert (typecast) an integer or character column to categorical in pandas. Typecast a numeric column to categorical using the Categorical() function.
Practical Business Python
pbpython.com/pandas_dtypes_cat.html
Using The Pandas Category Data Type - Practical Business Python
Care must be taken to understand the data set and the necessary analysis before converting columns to categorical data types. One of the main use cases for categorical data types is more efficient memory usage. In order to demonstrate, we will use a large data set from the US Centers for Medicare and Medicaid Services. This data set includes a 500MB+ csv file that has information about research payments to doctors and hospitals in fiscal year 2017. ...
import pandas as pd
from pandas.api.types import CategoricalDtype
df_raw = pd.read_csv('OP_DTL_RSRCH_PGYR2017_P06292018.csv', low_memory=False)
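The memory argument can be checked on a small scale without the 500MB file. The sketch below uses an invented column with many rows but only three distinct values and compares `memory_usage(deep=True)` before and after conversion; the exact byte counts depend on the pandas version and string backend, so none are shown.

```python
import pandas as pd

# Hypothetical column: 30,000 rows but only three distinct values.
values = pd.Series(['cardiology', 'oncology', 'radiology'] * 10_000)
as_category = values.astype('category')

# deep=True counts the memory of the underlying string data as well.
before = values.memory_usage(deep=True)
after = as_category.memory_usage(deep=True)

print(f'original: {before} bytes, category: {after} bytes')
```

The saving comes from storing each distinct value once and keeping small integer codes per row, so it shrinks as the number of distinct values approaches the number of rows.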
Pandas
pandas.pydata.org/pandas-docs/version/0.25.3/user_guide/categorical.html
Categorical data - pandas 0.25.3 documentation
In contrast to R's factor function, categorical data is not converting input values to strings; categories will end up the same data type as the original values. ... In contrast to R's factor function, there is currently no way to assign/change labels at creation time. Use categories to change the categories after creation time. Changed in version 0.21.0. ... This information can be stored in a CategoricalDtype. The categories argument is optional, which implies that the actual categories should be inferred from whatever is present in the data when the pandas.Categorical is created.
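A short sketch of the point that categories keep the data type of the original values, and that they are inferred from the data when no categories are given. The zip codes are sample values chosen for illustration.

```python
import pandas as pd

# Hypothetical integer zip codes, with one repeated value.
zipcodes = pd.Series([98125, 98107, 98005, 98107])

# No categories are passed, so they are inferred from the data.
as_category = zipcodes.astype('category')

# The categories stay integers; they are not converted to strings.
print(as_category.cat.categories.dtype)
print(as_category.cat.categories.tolist())
```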
Tutorial Reference
tutorialreference.com/python/examples/faq/python-pandas-how-to-change-column-type-to-categorical-astype
Python Pandas: How to Change DataFrame Column Type to Categorical | Tutorial Reference
Enables Specific Operations: Some ... The most common and recommended way to change a column's data type is by selecting the column (which returns a Series) and then calling the .astype() method on it....
GeeksforGeeks
geeksforgeeks.org/python-pandas-dataframe-astype
Python | Pandas DataFrame.astype() - GeeksforGeeks
December 3, 2023 - The DataFrame.astype() method is used to cast a pandas object to a specified dtype. The astype() function can also convert any suitable existing column to a categorical type.
Dask
docs.dask.org/en/latest/dataframe-categoricals.html
Categoricals - Dask documentation
>>> dtype = {'col': pd.api.types.CategoricalDtype(['a', 'b', 'c'])}
>>> ddf = dd.read_csv(..., dtype=dtype)

If you write and read to parquet, Dask will forget known categories. This happens because, due to performance concerns, all the categories are saved in every partition rather than in the parquet metadata. It is possible to manually load the categories:

>>> import dask.dataframe as dd
>>> import pandas as pd
>>> df = pd.DataFrame(data=list('abcaabbcc'), columns=['col'])
>>> df.col = df.col.astype('category')
>>> ddf = dd.from_pandas(df, npartitions=1)
>>> ddf.col.cat.known
True
>>> ddf.to_parquet('tmp')
>>> ddf2 = dd.read_parquet('tmp')
>>> ddf2.col.cat.known
False
>>> ddf2.col = ddf2.col.cat.set_categories(ddf2.col.head(1).cat.categories)
>>> ddf2.col.cat.known
True
Vultr Docs
docs.vultr.com/python/third-party/pandas/DataFrame/astype
Python Pandas DataFrame astype() - Change Data Type | Vultr Docs
December 24, 2024 - This code snippet ensures that the NaN values in col1 are filled with 0 before conversion to prevent any type errors. The col2 is converted from string to integer directly. Manage categorical data effectively. Convert a column to categorical type to optimize memory usage.