Answer from Freestyle076 on Stack Overflow:

personal favorite way:
df.column_name.value_counts() / len(df)
Gives a series with the column's values as the index and the proportions as the values.
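As a quick sanity check, the one-liner above is equivalent to passing normalize=True to value_counts. A minimal sketch (the column name "col" and the data are made up for illustration):

```python
import pandas as pd

# Toy frame; the column name "col" is illustrative only.
df = pd.DataFrame({"col": ["a", "a", "b", "c"]})

# Proportion of each value in the column.
props = df["col"].value_counts() / len(df)
print(props)

# Equivalent built-in form (same values, possibly a different series name):
assert (props.values == df["col"].value_counts(normalize=True).values).all()
```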
This is a generalized solution which doesn't alter the table or perform any kind of filtering or transformation before using groupby.
> s = df_test.groupby(['A'])['B'].value_counts(normalize=True)
> print(s)
A B
a Y 0.666667
N 0.333333
b N 0.500000
Y 0.500000
Name: B, dtype: float64
The variable s above is a multi-index Series, and you can access any of its rows using .loc
> s.loc[:,'Y']
A
a 0.666667
b 0.500000
Name: B, dtype: float64
Similarly, you can access the details about 'N' using the same series.
> s.loc[:,'N']
A
a 0.333333
b 0.500000
Name: B, dtype: float64
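The answer doesn't show how df_test was built; here is a minimal frame that reproduces the proportions above (the column contents are an assumption reverse-engineered from the printed output):

```python
import pandas as pd

# Assumed contents of df_test, chosen to match the output shown above.
df_test = pd.DataFrame({
    "A": ["a", "a", "a", "b", "b"],
    "B": ["Y", "Y", "N", "Y", "N"],
})

# Per-group proportions of B within each A.
s = df_test.groupby(["A"])["B"].value_counts(normalize=True)
print(s)

# Slice the multi-index Series on its second level:
print(s.loc[:, "Y"])  # proportion of 'Y' in each A group
```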
PS: If you want to understand groupby better, try to decode this code, which is exactly like the above but swaps the column roles and gives a different result.
> r = df_test.groupby(['B'])['A'].value_counts(normalize=True)
> print(r)
B A
N a 0.500000
b 0.500000
Y a 0.666667
b 0.333333
Name: A, dtype: float64
and
> r.loc['Y',:]
B A
Y a 0.666667
b 0.333333
Name: A, dtype: float64
You can use GroupBy.value_counts with normalize=True, and reshaping:
(df
.groupby(['A', 'B'])['C']
.value_counts(normalize=True)
.unstack('C', fill_value=0)
.reset_index()
)
output:
C A B 0 1 2 3
0 x i 0.5 0.5 0.0 0.0
1 x j 0.0 1.0 0.0 0.0
2 y j 0.5 0.0 0.5 0.0
3 y k 0.0 0.0 0.0 1.0
4 z k 0.5 0.0 0.0 0.5
You can use pd.crosstab, a convenience wrapper around the same grouping logic:
pd.crosstab([df['A'], df['B']], df['C'], normalize='index')
Output:
C 0 1 2 3
A B
x i 0.5 0.5 0.0 0.0
j 0.0 1.0 0.0 0.0
y j 0.5 0.0 0.5 0.0
k 0.0 0.0 0.0 1.0
z k 0.5 0.0 0.0 0.5
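The sample frame isn't shown in this answer either; the following input reproduces both tables above (the data is an assumption inferred from the outputs), and confirms that the unstack route and the crosstab route agree:

```python
import pandas as pd

# Assumed sample data, inferred from the output tables above.
df = pd.DataFrame({
    "A": ["x", "x", "x", "y", "y", "y", "z", "z"],
    "B": ["i", "i", "j", "j", "j", "k", "k", "k"],
    "C": [0, 1, 1, 0, 2, 3, 0, 3],
})

# Route 1: groupby + value_counts + unstack.
wide = (
    df.groupby(["A", "B"])["C"]
    .value_counts(normalize=True)
    .unstack("C", fill_value=0)
)

# Route 2: crosstab with row-wise normalization.
ct = pd.crosstab([df["A"], df["B"]], df["C"], normalize="index")

print(wide)
```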
Check the code below:
import pandas as pd
df = pd.DataFrame({'col1':[1,1,1,2,3,3],'col2':['b','a','a','a','a','b']})
df['perc'] = df.groupby(['col1','col2'])['col2'].transform('count')/df.groupby('col1')['col2'].transform('count')
df.round(2).drop_duplicates()
Output (rounded to 2 decimals, duplicate rows dropped):
   col1 col2  perc
0     1    b  0.33
1     1    a  0.67
3     2    a  1.00
4     3    a  0.50
5     3    b  0.50
You can also do something like this:
res = df.groupby('id').value_counts(normalize=True).reset_index(name='perc')
print(res)
id value_type perc
0 1 a 0.666667
1 1 b 0.333333
2 2 a 1.000000
3 3 a 0.500000
4 3 b 0.500000
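This answer's df isn't shown; a minimal frame matching the printed result (the columns id and value_type are inferred from the output), assuming pandas >= 1.4 where DataFrameGroupBy.value_counts is available:

```python
import pandas as pd

# Assumed input, inferred from the printed result above.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 3, 3],
    "value_type": ["a", "a", "b", "a", "a", "b"],
})

# Per-id proportions of value_type (requires pandas >= 1.4).
res = df.groupby("id").value_counts(normalize=True).reset_index(name="perc")
print(res)
```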
Use groupby and transform to get the mean; since col1 is boolean, the mean is the fraction of True values:
df['num_true']=df.groupby('id').col1.transform('mean')
id col1 num_true
0 1 True 0.75
1 1 True 0.75
2 1 False 0.75
3 1 True 0.75
4 2 False 0.00
5 2 False 0.00
Here is the requested code:
import pandas as pd
df = pd.DataFrame({"col1": [True,True,False,True,False,False]}, index = [1,1,1,1,2,2])
grouped_df = df.groupby(df.index)
df["num_true"] = grouped_df.sum() / grouped_df.count()
What I did here is to group the dataframe by the index. After that, I sum the number of True values and divide it by the total number of values; the one-row-per-group result is broadcast back to every row through index alignment.
Result:
col1 num_true
1 True 0.75
1 True 0.75
1 False 0.75
1 True 0.75
2 False 0.00
2 False 0.00
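A sketch of the same computation that sidesteps the index-alignment subtlety: transform returns a result already aligned to every original row (same toy data as above):

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [True, True, False, True, False, False]},
    index=[1, 1, 1, 1, 2, 2],
)

# The mean of a boolean column is the fraction of True values;
# transform broadcasts each group's value back to its rows.
df["num_true"] = df.groupby(df.index)["col1"].transform("mean")
print(df)
```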