convert multiple columns to categorical pandas

Python Pandas - Changing some column types to categories

stackoverflow.com › questions › 28910851 › python-pandas-changing-some-column-types-to-categories

Answer from unutbu on Stack Overflow

numpy - Python Pandas - Changing some column types to categories - Stack Overflow

Top answer

1 of 8

163

Sometimes, you just have to use a for-loop:

for col in ['parks', 'playgrounds', 'sports', 'roading']:
    public[col] = public[col].astype('category')

2 of 8

83

No loop is needed: pandas can now do this directly. Pass the list of columns you want to convert, and pandas converts them all at once.

cols = ['parks', 'playgrounds', 'sports', 'roading']
public[cols] = public[cols].astype('category')

import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'c'], 'b': ['c', 'd', 'e']})
df
>>     a  b
>>  0  a  c
>>  1  b  d
>>  2  c  e

df.dtypes
>> a    object
>> b    object
>> dtype: object

# Select every column, convert, and assign the result back.
df[df.columns] = df[df.columns].astype('category')
df.dtypes
>> a    category
>> b    category
>> dtype: object
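
When only some columns should change, a related option is to pass `DataFrame.astype` a dict that maps column names to dtypes. The sketch below assumes a made-up frame with an extra numeric column `n`; the names are illustrative and not taken from the answer above.

```python
import pandas as pd

# Hypothetical frame: two text columns and one numeric column.
df = pd.DataFrame({
    'a': ['a', 'b', 'c'],
    'b': ['c', 'd', 'e'],
    'n': [1, 2, 3],
})

# Map only the columns to convert; 'n' keeps its integer dtype.
# astype returns a new frame, so the original df is left unchanged.
converted = df.astype({'a': 'category', 'b': 'category'})

print(converted.dtypes)
```

Because the result is a new frame, this form fits method chains where assigning back into `df` would be awkward.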

DataScience Made Simple

datasciencemadesimple.com › home › convert column to categorical in pandas python

Convert column to categorical in pandas python - DataScience Made Simple

January 29, 2023 - Typecast column to categorical in pandas python using categorical() function · Convert column to categorical in pandas using astype() function

Discussions

python - Converting multiple columns to categories in Pandas. apply? - Stack Overflow

Consider a Dataframe. I want to convert a set of columns to_convert to categories. I can certainly do the following: for col in to_convert: df[col] = df[col].astype('category') but I was surpr... More on stackoverflow.com

stackoverflow.com

April 2, 2019

scikit learn - Mass convert categorical columns in Pandas (not one-hot encoding) - Data Science Stack Exchange

I have pandas dataframe with tons of categorical columns, which I am planning to use in decision tree with scikit-learn. I need to convert them to numerical values (not one hot vectors). I can do i... More on datascience.stackexchange.com

datascience.stackexchange.com

September 18, 2016

pandas - Python: Converting multiple columns to a single column with categorical data - Stack Overflow

If I have this table: City State Person 1 Person 2 Atlanta GA Bob Fred But, I want to convert it to: City State Person# Person Name Atlanta GA 1 Bob Atlanta GA 2 Fred What is the most efficient way... More on stackoverflow.com

stackoverflow.com

July 27, 2021

How to Transform Categorical Data to Numerical Data Using Pandas

pd.get_dummies(data) More on reddit.com

r/learnmachinelearning

November 29, 2021

reddit.com › r/learnpython › in pandas, how do i transform a categorical column with a related numeric column into several numeric columns with the categories as headers?

r/learnpython on Reddit: In Pandas, how do I transform a categorical column with a related numeric column into several numeric columns with the categories as headers?

July 7, 2022

I'm working with sentiment data. I'm trying to find the most pythonic way to do a transformation like so:

Original Dataframe

Index	SENTIMENT	CONFIDENCE
0	Positive	.99
1	Negative	.98
2	Positive	.9
3	Neutral	.8

Converted to New Dataframe

Index	Positive	Negative	Neutral
0	.99	NaN	NaN
1	NaN	.98	NaN
2	.9	NaN	NaN
3	NaN	NaN	.8

I've been doing this with nested loops forever, and I just know that there's some one-line or two-line solution.

I appreciate the help.

Top answer

1 of 3

3

df.pivot(columns='SENTIMENT', values='CONFIDENCE')
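
As a runnable sketch of that one-liner, the frame below is rebuilt from the table in the question. The final column reordering is an optional extra, added only to match the layout the question asks for.

```python
import pandas as pd

# Data copied from the "Original Dataframe" table in the question.
df = pd.DataFrame({
    'SENTIMENT': ['Positive', 'Negative', 'Positive', 'Neutral'],
    'CONFIDENCE': [0.99, 0.98, 0.90, 0.80],
})

# With no index argument, pivot keeps the existing row index,
# so each row holds one value and the remaining cells are NaN.
wide = df.pivot(columns='SENTIMENT', values='CONFIDENCE')

# Optional: reorder the columns to match the layout in the question.
wide = wide[['Positive', 'Negative', 'Neutral']]

print(wide)
```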

2 of 3

2

You want dummy encoding (aka one-hot encoding). Use pandas.get_dummies:

pd.get_dummies(df, columns=["SENTIMENT"])
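
One caveat worth checking: `get_dummies` on its own produces indicator columns, not the confidence values the question asks for. A possible way to bridge that gap, offered as a sketch rather than as what the answer intended, is to use the indicators as a mask over the `CONFIDENCE` column. The helper names `mask` and `confidence` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'SENTIMENT': ['Positive', 'Negative', 'Positive', 'Neutral'],
    'CONFIDENCE': [0.99, 0.98, 0.90, 0.80],
})

# One indicator column per sentiment label.
dummies = pd.get_dummies(df['SENTIMENT'])

# Normalize to booleans so the code works whether the indicators
# come back as bool or as 0/1 integers.
mask = dummies.astype(bool)

# Repeat CONFIDENCE under every label, then keep it only where the
# indicator is set; every other cell becomes NaN.
confidence = pd.DataFrame({col: df['CONFIDENCE'] for col in mask.columns})
wide = confidence.where(mask)

print(wide)
```

For this particular reshaping, the `pivot` answer gets the same values in a single call.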

IQCode

iqcode.com › code › python › pandas-convert-multiple-columns-to-categorical

pandas convert multiple columns to categorical Code Example

October 14, 2021 - #Two ways to do this df[['parks', 'playgrounds', 'sports']].apply(lambda x: x.astype('category')) cols = ['parks', 'playgrounds', 'sports',...

CodeSignal

codesignal.com › learn › courses › data-transformation-techniques-in-pandas › lessons › handling-categorical-data

Handling Categorical Data | CodeSignal Learn

Now let's convert the sex and class columns and reprint the DataFrame information. Notice how sex and class changed from object to category. This confirms the conversion was successful. This way, Pandas now treats these columns as categorical data, optimizing memory and performance.
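
The memory claim can be checked directly. The sketch below uses a hypothetical stand-in for the lesson's dataset (only the column names `sex` and `class` come from the snippet) and compares `memory_usage(deep=True)` before and after the conversion.

```python
import pandas as pd

# Hypothetical stand-in data: a few labels repeated many times.
df = pd.DataFrame({
    'sex': ['male', 'female'] * 5000,
    'class': ['First', 'Second', 'Third', 'Second'] * 2500,
})

# deep=True counts the string contents, not just the pointers.
before = df.memory_usage(deep=True).sum()

cols = ['sex', 'class']
df[cols] = df[cols].astype('category')

after = df.memory_usage(deep=True).sum()

print(df.dtypes)
print(f'{before} bytes before, {after} bytes after')
```

The saving depends on how often values repeat; a column of mostly unique strings gains little or nothing from the conversion.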

DataScientYst

datascientyst.com › convert-column-to-categorical-pandas-dataframe-examples

How to Convert Column to Categorical in Pandas DataFrame with Examples

November 20, 2023 - In this article, we'll explore how to convert columns to categorical in a Pandas DataFrame with practical examples. In data analysis, efficient memory usage and improved performance are crucial considerations. Converting a column to categorical is as simple as: df['col'].astype('category') Let's ...

Stack Overflow

stackoverflow.com › questions › 30991532 › converting-multiple-columns-to-categories-in-pandas-apply › 30991750

python - Converting multiple columns to categories in Pandas. apply? - Stack Overflow

Top answer

1 of 2

10

This was just fixed in master and so will be in 0.17.0; see the issue here

In [7]: df = DataFrame({'A' : list('aabbcd'), 'B' : list('ffghhe')})

In [8]: df
Out[8]: 
   A  B
0  a  f
1  a  f
2  b  g
3  b  h
4  c  h
5  d  e

In [9]: df.dtypes
Out[9]: 
A    object
B    object
dtype: object

In [10]: df.apply(lambda x: x.astype('category'))       
Out[10]: 
   A  B
0  a  f
1  a  f
2  b  g
3  b  h
4  c  h
5  d  e

In [11]: df.apply(lambda x: x.astype('category')).dtypes
Out[11]: 
A    category
B    category
dtype: object

2 of 2

4

Note that since pandas 0.23.0 you no longer need apply to convert multiple columns to categorical data types. Now you can simply do df[to_convert].astype('category') instead (where to_convert is a set of columns as defined in the question).

Pandas

pandas.pydata.org › docs › user_guide › categorical.html

Categorical data — pandas 3.0.3 documentation - PyData

This has some performance implication if you have a Series of type string, where lots of elements are repeated (i.e. the number of unique elements in the Series is a lot smaller than the length of the Series). In this case it can be faster to convert the original Series to one of type category and use .str.<method> or .dt.<property> on that. Setting values in a categorical column (or Series) works as long as the value is included in the categories:
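
A small sketch of the first point, using made-up data: the `.str` accessor works on a categorical Series whose categories are strings, and it returns the same values as the plain string Series. Whether it is actually faster depends on the data, so no timing is claimed here.

```python
import pandas as pd

# Hypothetical data: three distinct strings repeated many times.
s = pd.Series(['red', 'green', 'blue'] * 1000)
s_cat = s.astype('category')

# The same string method, called on each representation.
upper_from_str = s.str.upper()
upper_from_cat = s_cat.str.upper()

# Both routes give the same values.
print((upper_from_str == upper_from_cat).all())
print(upper_from_cat.head(3).tolist())
```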

Pandas

pandas.pydata.org › pandas-docs › version › 0.25.3 › user_guide › categorical.html

Categorical data — pandas 0.25.3 documentation

Writing to a CSV file will convert the data, effectively removing any information about the categorical (categories and ordering). So if you read back the CSV file you have to convert the relevant columns back to category and assign the right categories and categories ordering.
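
The round trip can be sketched with an in-memory buffer. The column name `size` and its categories are assumptions made for this example; the idea is to keep the `CategoricalDtype` around and hand it back to `read_csv` through its `dtype` argument.

```python
import io

import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical ordered categorical column.
size_type = CategoricalDtype(categories=['small', 'medium', 'large'], ordered=True)
df = pd.DataFrame({'size': pd.Series(['small', 'large', 'medium'], dtype=size_type)})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# A plain read loses the categorical dtype.
buffer.seek(0)
plain = pd.read_csv(buffer)

# Passing the dtype back restores the categories and their order.
buffer.seek(0)
restored = pd.read_csv(buffer, dtype={'size': size_type})

print(plain.dtypes)
print(restored['size'].cat.categories.tolist(), restored['size'].cat.ordered)
```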

Stack Exchange

datascience.stackexchange.com › questions › 14069 › mass-convert-categorical-columns-in-pandas-not-one-hot-encoding

scikit learn - Mass convert categorical columns in Pandas (not one-hot encoding) - Data Science Stack Exchange

Top answer

1 of 3

12

If your categorical columns are currently character/object you can use something like this to do each one:

char_cols = df.dtypes.pipe(lambda x: x[x == 'object']).index

for c in char_cols:
    df[c] = pd.factorize(df[c])[0]

If you need to be able to get back to the categories I'd create a dictionary to save the encoding; something like:

char_cols = df.dtypes.pipe(lambda x: x[x == 'object']).index
label_mapping = {}

for c in char_cols:
    df[c], label_mapping[c] = pd.factorize(df[c])

Run against Julien's MCVE, this outputs:

In [3]: print(df)
Out[3]: 
    a   b   c   d
0   0   0   0   0.155463
1   1   1   1   0.496427
2   0   0   2   0.168625
3   2   0   1   0.209681
4   0   2   1   0.661857

In [4]: print(label_mapping)
Out[4]:
{'a': Index(['Var2', 'Var3', 'Var1'], dtype='object'),
 'b': Index(['Var2', 'Var1', 'Var3'], dtype='object'),
 'c': Index(['Var3', 'Var2', 'Var1'], dtype='object')}
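
To show how the saved mapping gets you back to the original labels, here is a minimal sketch on a single made-up column. It assumes the column has no missing values: `factorize` encodes missing values as -1, and indexing the labels with -1 would silently pick the last label.

```python
import pandas as pd

# Hypothetical column with repeated labels and no missing values.
df = pd.DataFrame({'a': ['Var2', 'Var3', 'Var2', 'Var1', 'Var2']})

# factorize returns integer codes plus the unique labels,
# listed in order of first appearance.
codes, uniques = pd.factorize(df['a'])

# Indexing the labels with the codes recovers the original values.
decoded = uniques[codes]

print(list(codes))
print(list(uniques))
print(list(decoded))
```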

2 of 3

7

First, let's create an MCVE to play with:

import pandas as pd
import numpy as np

In [1]: categorical_array = np.random.choice(['Var1','Var2','Var3'],
                                             size=(5,3), p=[0.25,0.5,0.25])
        df = pd.DataFrame(categorical_array,
               columns=map(lambda x:chr(97+x), range(categorical_array.shape[1])))
        # Add another column that isn't categorical but float
        df['d'] = np.random.rand(len(df))
        print(df)

Out[1]:
      a     b     c         d
0  Var3  Var3  Var3  0.953153
1  Var1  Var2  Var1  0.924896
2  Var2  Var2  Var2  0.273205
3  Var2  Var1  Var3  0.459676
4  Var2  Var1  Var1  0.114358

Now we can use pd.get_dummies to encode the first three columns.

Note that I'm using the drop_first parameter because N-1 dummies are sufficient to fully describe N possibilities (e.g., if a_Var2 and a_Var3 are both 0, then the value is a_Var1). I'm also specifying the columns explicitly, although I don't have to: by default, get_dummies encodes every column with dtype object or category (more below).

In [2]: df_encoded = pd.get_dummies(df, columns=['a','b', 'c'], drop_first=True)
        print(df_encoded)
Out[2]:
          d  a_Var2  a_Var3  b_Var2  b_Var3  c_Var2  c_Var3
0  0.953153       0       1       0       1       0       1
1  0.924896       0       0       1       0       0       0
2  0.273205       1       0       1       0       1       0
3  0.459676       1       0       0       0       0       1
4  0.114358       1       0       0       0       0       0

In your specific application, you'll have to provide a list of the columns that are categorical, or infer which columns are categorical.

In the best case, your dataframe already has these columns with dtype=category, and you can pass columns=df.columns[df.dtypes == 'category'] to get_dummies.

Otherwise, I suggest setting the dtype of all other columns as appropriate (hint: pd.to_numeric, pd.to_datetime, etc.); the columns left with an object dtype should be your categorical columns.

The pd.get_dummies parameter columns defaults as follows:

columns : list-like, default None
    Column names in the DataFrame to be encoded.
    If `columns` is None then all the columns with
    `object` or `category` dtype will be converted.
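
A quick way to see that default in action, sketched on a made-up frame whose categorical columns already have the category dtype: selecting those columns by dtype and passing them explicitly gives the same result as leaving `columns` unset.

```python
import pandas as pd

# Hypothetical frame: two category columns and one float column.
df = pd.DataFrame({
    'a': pd.Series(['Var1', 'Var2', 'Var1'], dtype='category'),
    'b': pd.Series(['Var3', 'Var3', 'Var2'], dtype='category'),
    'd': [0.1, 0.2, 0.3],
})

# Pick out the categorical columns by dtype rather than by name.
cat_cols = list(df.select_dtypes(include='category').columns)

# Explicit column list.
explicit = pd.get_dummies(df, columns=cat_cols)

# columns=None (the default) selects the same columns automatically.
implicit = pd.get_dummies(df)

print(cat_cols)
print(list(explicit.columns))
```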

Bobby Hadz

bobbyhadz.com › blog › pandas-change-column-type-to-categorical

Pandas: Changing the column type to Categorical | bobbyhadz

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'experience': [1, 5, 3, 8],
    'salary': [189.1, 180.2, 190.3, 205.4],
})

columns = ['name', 'experience']

for column in columns:
    df[column] = df[column].astype('category')

# name          category
# experience    category
# salary         float64
# dtype: object
print(df.dtypes)

...

However, in recent Pandas versions, iterating over the columns collection is not necessary. You might also see examples online that use a lambda function.

...

import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bobby', 'Carl', 'Dan'],
    'experience': [1, 5, 3, 8],
    'salary': [189.1, 180.2, 190.3, 205.4],
})

columns = ['name', 'experience']

df[columns] = df[columns].apply(lambda x: x.astype('category'))

# name          category
# experience    category
# salary         float64
# dtype: object
print(df.dtypes)

Skytowner

skytowner.com › explore › changing_column_type_to_categorical_in_pandas

Changing column type to categorical in Pandas

To change column type to categorical in Pandas, use the DataFrame's astype("category") method. ... Converts the data type of the columns of a DataFrame to the specified type.

Stack Overflow

stackoverflow.com › questions › 68547480 › python-converting-multiple-columns-to-a-single-column-with-categorical-data

pandas - Python: Converting multiple columns to a single column with categorical data - Stack Overflow

Top answer

1 of 1

4

Use melt:

out = df.melt(['City', 'State'], var_name='Person#', value_name='Person Name')
out['Person#'] = out['Person#'].str.extract(r'(\d+)')

>>> out
       City State Person# Person Name
0  Atlanta    GA        1        Bob 
1  Atlanta    GA        2        Fred

Dask

docs.dask.org › en › stable › dataframe-categoricals.html

Categoricals — Dask documentation

>>> col_known = ddf.col.cat.as_known()  # use for single column
>>> col_known.cat.known
True
>>> ddf_known = ddf.categorize()  # use for multiple columns
>>> ddf_known.col.cat.known
True

To convert a known categorical to an unknown categorical, there is also the .cat.as_unknown() method.

Practical Business Python

pbpython.com › categorical-encoding.html

Guide to Encoding Categorical Values in Python - Practical Business Python

Despite the different names, the basic strategy is to convert each category value into a new column and assign a 1 or 0 (True/False) value to the column. This has the benefit of not weighting a value improperly but does have the downside of adding more columns to the data set. Pandas supports this feature using get_dummies.

Medium

medium.com › @urvashilluniya › convert-multiple-categorical-columns-into-numeric-columns-in-single-line-of-code-577bab825635

Convert Multiple Categorical Data Columns to Numerical Data Columns using Dummy Variables | by Urvashi Jaitley | Medium

February 6, 2019 - One of the methods to create dummy variables involves the following steps: 1) creating dummy variables for each of the columns, 2) concatenating the new columns to the main data frame, 3) dropping the corresponding categorical columns.
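
Those three steps might look like the sketch below. The frame and its column names (`color`, `size`, `price`) are invented for illustration and do not come from the article.

```python
import pandas as pd

# Hypothetical frame: two categorical columns and one numeric column.
df = pd.DataFrame({
    'color': ['red', 'blue', 'red'],
    'size': ['S', 'M', 'S'],
    'price': [10.0, 12.5, 9.0],
})
cat_cols = ['color', 'size']

# 1) Create dummy variables for each categorical column.
dummies = pd.get_dummies(df[cat_cols])

# 2) Concatenate the new columns to the main data frame.
combined = pd.concat([df, dummies], axis=1)

# 3) Drop the original categorical columns.
result = combined.drop(columns=cat_cols)

print(list(result.columns))
```

Calling `pd.get_dummies(df, columns=cat_cols)` performs the same three steps in a single call, which is usually the shorter route.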

Medium

medium.com › analytics-vidhya › handling-categories-with-pandas-bfe7d28b2f91

Handling categories with pandas. While dealing with pandas DataFrames… | by Kacper Łukawski | Analytics Vidhya | Medium

May 31, 2021 - Internally, categorical values ... to handle. Pandas allows converting selected columns to categories easily, by using .astype method ......
