Sometimes, you just have to use a for-loop:

for col in ['parks', 'playgrounds', 'sports', 'roading']:
    public[col] = public[col].astype('category')
Answer from unutbu on Stack Overflow
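For comparison, here is a minimal sketch of two loop-free alternatives. The contents of `public` and the extra `rating` column are made up for illustration; only the four column names come from the answer above. It relies on `DataFrame.astype` accepting either a single dtype or a column-to-dtype mapping.

```python
import pandas as pd

# Hypothetical data standing in for the `public` DataFrame from the answer.
public = pd.DataFrame({
    'parks': ['yes', 'no', 'yes'],
    'playgrounds': ['no', 'no', 'yes'],
    'sports': ['yes', 'yes', 'no'],
    'roading': ['no', 'yes', 'no'],
    'rating': [3, 5, 4],
})
cols = ['parks', 'playgrounds', 'sports', 'roading']

# Option 1: pass a {column: dtype} mapping; returns a new DataFrame.
converted = public.astype({col: 'category' for col in cols})

# Option 2: convert the column subset and assign it back.
public[cols] = public[cols].astype('category')

print(converted.dtypes)
```

Both options leave columns outside `cols` untouched. The explicit for-loop is still a perfectly reasonable choice.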
DataScience Made Simple
Convert column to categorical in pandas python - DataScience Made Simple
January 29, 2023 - Typecast column to categorical in pandas python using categorical() function · Convert column to categorical in pandas using astype() function
Discussions

python - Converting multiple columns to categories in Pandas. apply? - Stack Overflow
Consider a Dataframe. I want to convert a set of columns to_convert to categories. I can certainly do the following: for col in to_convert: df[col] = df[col].astype('category') but I was surpr...
stackoverflow.com
April 2, 2019
scikit learn - Mass convert categorical columns in Pandas (not one-hot encoding) - Data Science Stack Exchange
I have a pandas DataFrame with tons of categorical columns, which I am planning to use in a decision tree with scikit-learn. I need to convert them to numerical values (not one-hot vectors). I can do i...
datascience.stackexchange.com
September 18, 2016
pandas - Python: Converting multiple columns to a single column with categorical data - Stack Overflow
If I have this table:

City     State  Person 1  Person 2
Atlanta  GA     Bob       Fred

But, I want to convert it to:

City     State  Person#  Person Name
Atlanta  GA     1        Bob
Atlanta  GA     2        Fred

What is the most efficient way...
stackoverflow.com
July 27, 2021
How to Transform Categorical Data to Numerical Data Using Pandas
pd.get_dummies(data)
r/learnmachinelearning
November 29, 2021
Reddit
r/learnpython on Reddit: In Pandas, how do I transform a categorical column with a related numeric column into several numeric columns with the categories as headers?
July 7, 2022

I'm working with sentiment data. I'm trying to find the most pythonic way to do a transformation like so:

Original Dataframe

Index SENTIMENT CONFIDENCE
0 Positive .99
1 Negative .98
2 Positive .9
3 Neutral .8

Converted to New Dataframe

Index Positive Negative Neutral
0 .99 NaN NaN
1 NaN .98 NaN
2 .9 NaN NaN
3 NaN NaN .8

I've been doing this with nested loops forever, and I just know that there's some one-line or two-line solution.

I appreciate the help.
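One possible short answer, sketched under the assumption that the frame has exactly the two columns shown (`SENTIMENT` and `CONFIDENCE`) and a default integer index: `DataFrame.pivot` can spread the confidence values across one column per sentiment.

```python
import pandas as pd

# Data copied from the question above.
df = pd.DataFrame({
    'SENTIMENT': ['Positive', 'Negative', 'Positive', 'Neutral'],
    'CONFIDENCE': [0.99, 0.98, 0.9, 0.8],
})

# One column per SENTIMENT value; the existing row index is kept,
# and cells without a value become NaN.
wide = df.pivot(columns='SENTIMENT', values='CONFIDENCE')

# Reorder the columns to match the layout in the question.
wide = wide[['Positive', 'Negative', 'Neutral']]
print(wide)
```

Because each row has a unique index label, `pivot` has no duplicate entries to resolve. With repeated index values, `pivot_table` with an aggregation function would be needed instead.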

IQCode
pandas convert multiple columns to categorical Code Example
October 14, 2021 - #Two ways to do this df[['parks', 'playgrounds', 'sports']].apply(lambda x: x.astype('category')) cols = ['parks', 'playgrounds', 'sports',...
CodeSignal
Handling Categorical Data | CodeSignal Learn
Now let's convert the sex and class columns and reprint the DataFrame information. Notice how sex and class changed from object to category. This confirms the conversion was successful. This way, Pandas now treats these columns as categorical data, optimizing memory and performance.
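The memory claim can be checked with a rough sketch like the one below. The `sex` Series here is made-up data, not the lesson's dataset, and the exact byte counts depend on the pandas version and string storage, so none are shown.

```python
import pandas as pd

# Hypothetical column: two distinct values repeated many times.
sex = pd.Series(['male', 'female'] * 5000)
as_category = sex.astype('category')

# deep=True also counts the memory held by the string values themselves.
before = sex.memory_usage(deep=True)
after = as_category.memory_usage(deep=True)

print(sex.dtype, before)
print(as_category.dtype, after)
```

The saving comes from storing each distinct label once and keeping only small integer codes per row, so it shrinks as the number of distinct values approaches the number of rows.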
DataScientYst
How to Convert Column to Categorical in Pandas DataFrame with Examples
November 20, 2023 - In this article, we'll explore how to convert columns to categorical in a Pandas DataFrame with practical examples. In data analysis, efficient memory usage and improved performance are crucial considerations. Converting a column to categorical is as simple as: df['col'].astype('category') Let's ...
Pandas
Categorical data — pandas 3.0.3 documentation - PyData
This has some performance implication if you have a Series of type string, where lots of elements are repeated (i.e. the number of unique elements in the Series is a lot smaller than the length of the Series). In this case it can be faster to convert the original Series to one of type category and use .str.<method> or .dt.<property> on that. Setting values in a categorical column (or Series) works as long as the value is included in the categories:
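A small sketch of the last point, using a made-up Series: assigning a value that is already a category works, while an unknown value is rejected until it is added to the categories. The exception type has differed between pandas versions, so the sketch catches both candidates.

```python
import pandas as pd

s = pd.Series(['low', 'high', 'low'], dtype='category')

# 'high' is already one of the categories, so this assignment works.
s.iloc[0] = 'high'

# 'medium' is not a category yet, so pandas refuses the assignment.
try:
    s.iloc[1] = 'medium'
    rejected = False
except (TypeError, ValueError):  # exact type depends on the pandas version
    rejected = True

# Register the new category first, then the assignment succeeds.
s = s.cat.add_categories(['medium'])
s.iloc[1] = 'medium'
print(rejected, s.tolist())
```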
Pandas
Categorical data — pandas 0.25.3 documentation
Writing to a CSV file will convert the data, effectively removing any information about the categorical (categories and ordering). So if you read back the CSV file you have to convert the relevant columns back to category and assign the right categories and categories ordering.
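The round trip can be sketched as follows. The `size` column and its categories are invented for illustration, and an in-memory buffer stands in for a real file.

```python
import io

import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical ordered categorical column.
size_dtype = CategoricalDtype(categories=['small', 'medium', 'large'], ordered=True)
df = pd.DataFrame({'size': pd.Series(['small', 'large', 'medium'], dtype=size_dtype)})

# Write to CSV and read it back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# The categorical information is gone after the round trip.
dtype_after_read = restored['size'].dtype
print(dtype_after_read)

# Convert back, supplying the categories and their order explicitly.
restored['size'] = restored['size'].astype(size_dtype)
print(restored['size'].cat.categories.tolist(), restored['size'].cat.ordered)
```

Keeping the `CategoricalDtype` object around, as `size_dtype` does here, is one way to make sure the same categories and ordering are reapplied after loading.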
Top answer
1 of 3
12

If your categorical columns are currently character/object you can use something like this to do each one:

char_cols = df.dtypes.pipe(lambda x: x[x == 'object']).index

for c in char_cols:
    df[c] = pd.factorize(df[c])[0]

If you need to be able to get back to the categories I'd create a dictionary to save the encoding; something like:

char_cols = df.dtypes.pipe(lambda x: x[x == 'object']).index
label_mapping = {}

for c in char_cols:
    df[c], label_mapping[c] = pd.factorize(df[c])

Using Julien's mcve will output:

In [3]: print(df)
Out[3]: 
    a   b   c   d
0   0   0   0   0.155463
1   1   1   1   0.496427
2   0   0   2   0.168625
3   2   0   1   0.209681
4   0   2   1   0.661857

In [4]: print(label_mapping)
Out[4]:
{'a': Index(['Var2', 'Var3', 'Var1'], dtype='object'),
 'b': Index(['Var2', 'Var1', 'Var3'], dtype='object'),
 'c': Index(['Var3', 'Var2', 'Var1'], dtype='object')}
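To illustrate getting back to the categories, here is a minimal sketch that decodes the integer codes using the saved mapping. The two-column frame is made-up data, not the output shown above.

```python
import pandas as pd

# Hypothetical stand-in for the string columns in the answer.
df = pd.DataFrame({
    'a': ['Var2', 'Var3', 'Var2', 'Var1'],
    'b': ['Var1', 'Var1', 'Var3', 'Var2'],
})
original = df.copy()

# Encode each column and remember its unique labels.
label_mapping = {}
for c in df.columns:
    df[c], label_mapping[c] = pd.factorize(df[c])

# Decode: index the saved labels with the integer codes.
# Caveat: factorize encodes missing values as -1, which this
# lookup would silently map to the last label.
decoded = pd.DataFrame(
    {c: label_mapping[c][df[c].to_numpy()] for c in df.columns}
)
print(decoded)
```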
2 of 3
7

First, let's create an MCVE (minimal, complete, verifiable example) to play with:

import pandas as pd
import numpy as np

In [1]: categorical_array = np.random.choice(['Var1','Var2','Var3'],
                                             size=(5,3), p=[0.25,0.5,0.25])
        df = pd.DataFrame(categorical_array,
               columns=map(lambda x:chr(97+x), range(categorical_array.shape[1])))
        # Add another column that isn't categorical but float
        df['d'] = np.random.rand(len(df))
        print(df)

Out[1]:
      a     b     c         d
0  Var3  Var3  Var3  0.953153
1  Var1  Var2  Var1  0.924896
2  Var2  Var2  Var2  0.273205
3  Var2  Var1  Var3  0.459676
4  Var2  Var1  Var1  0.114358

Now we can use pd.get_dummies to encode the first three columns.

Note that I'm using the drop_first parameter because N-1 dummies are sufficient to fully describe N possibilities (e.g., if a_Var2 and a_Var3 are both 0, then the value is a_Var1). I'm also specifying the columns explicitly, but I don't have to: by default, get_dummies encodes all columns with dtype object or category (more below).

In [2]: df_encoded = pd.get_dummies(df, columns=['a','b', 'c'], drop_first=True)
        print(df_encoded)
Out[2]:
          d  a_Var2  a_Var3  b_Var2  b_Var3  c_Var2  c_Var3
0  0.953153       0       1       0       1       0       1
1  0.924896       0       0       1       0       0       0
2  0.273205       1       0       1       0       1       0
3  0.459676       1       0       0       0       0       1
4  0.114358       1       0       0       0       0       0

In your specific application, you'll have to provide a list of the columns that are categorical, or you'll have to infer which columns are categorical.

Best case scenario your dataframe already has these columns with a dtype=category and you can pass columns=df.columns[df.dtypes == 'category'] to get_dummies.

Otherwise I suggest setting the dtype of all other columns as appropriate (hint: pd.to_numeric, pd.to_datetime, etc) and you'll be left with columns that have an object dtype and these should be your categorical columns.

The pd.get_dummies parameter columns defaults as follows:

columns : list-like, default None
    Column names in the DataFrame to be encoded.
    If `columns` is None then all the columns with
    `object` or `category` dtype will be converted.
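As a sketch of the best-case scenario described above, the example below marks the categorical columns explicitly and then selects them by dtype. The frame is made-up data, and `select_dtypes` is used here as an alternative to the boolean mask `df.dtypes == 'category'`.

```python
import pandas as pd

# Hypothetical frame: two label columns and one float column.
df = pd.DataFrame({
    'a': ['x', 'y', 'x'],
    'b': ['u', 'u', 'v'],
    'd': [0.1, 0.2, 0.3],
})

# Mark the categorical columns explicitly.
df = df.astype({'a': 'category', 'b': 'category'})

# Pick up every column whose dtype is category.
cat_cols = df.select_dtypes(include='category').columns.tolist()

# Encode only those columns, dropping the first level of each.
encoded = pd.get_dummies(df, columns=cat_cols, drop_first=True)
print(encoded.columns.tolist())
```

The dummy columns hold 0/1 or True/False depending on the pandas version, so downstream code should not rely on one representation.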
Bobby Hadz
Pandas: Changing the column type to Categorical | bobbyhadz
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'experience': [1, 5, 3, 8],
    'salary': [189.1, 180.2, 190.3, 205.4],
})

columns = ['name', 'experience']

for column in columns:
    df[column] = df[column].astype('category')

# name          category
# experience    category
# salary         float64
# dtype: object
print(df.dtypes)

...

However, in recent Pandas versions, iterating over the columns collection is not necessary. You might also see examples online that use a lambda function.

...

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'experience': [1, 5, 3, 8],
    'salary': [189.1, 180.2, 190.3, 205.4],
})

columns = ['name', 'experience']

df[columns] = df[columns].apply(lambda x: x.astype('category'))

# name          category
# experience    category
# salary         float64
# dtype: object
print(df.dtypes)
Skytowner
Changing column type to categorical in Pandas
To change column type to categorical in Pandas, use the DataFrame's astype("category") method. ... Converts the data type of the columns of a DataFrame to the specified type.
Dask
Categoricals — Dask documentation
>>> col_known = ddf.col.cat.as_known()  # use for single column
>>> col_known.cat.known
True
>>> ddf_known = ddf.categorize()  # use for multiple columns
>>> ddf_known.col.cat.known
True

To convert a known categorical to an unknown categorical, there is also the .cat.as_unknown() method.
Practical Business Python
Guide to Encoding Categorical Values in Python - Practical Business Python
Despite the different names, the basic strategy is to convert each category value into a new column and assign a 1 or 0 (True/False) value to the column. This has the benefit of not weighting a value improperly but does have the downside of adding more columns to the data set. Pandas supports this feature using get_dummies.
Medium
Convert Multiple Categorical Data Columns to Numerical Data Columns using Dummy Variables | by Urvashi Jaitley | Medium
February 6, 2019 - One of the methods to create dummy variables involves the following steps: 1) create dummy variables for each of the columns, 2) concatenate the new columns to the main data frame, 3) drop the corresponding categorical columns.
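The three steps might look like the sketch below for a single column. The `color` and `price` columns are invented for illustration and are not taken from the article.

```python
import pandas as pd

# Hypothetical frame with one categorical column.
df = pd.DataFrame({'color': ['red', 'blue', 'red'], 'price': [1.0, 2.5, 3.0]})

# 1) Create dummy variables for the categorical column.
dummies = pd.get_dummies(df['color'], prefix='color')

# 2) Concatenate the new columns to the main data frame.
combined = pd.concat([df, dummies], axis=1)

# 3) Drop the original categorical column.
result = combined.drop(columns=['color'])
print(result.columns.tolist())
```

Calling `pd.get_dummies(df, columns=['color'])` performs all three steps in one call, which is usually the shorter route when several columns are involved.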
Medium
Handling categories with pandas. While dealing with pandas DataFrames… | by Kacper Łukawski | Analytics Vidhya | Medium
May 31, 2021 - Internally, categorical values ... to handle. Pandas allows converting selected columns to categories easily, by using .astype method ......