Answer from Jedi on Stack Overflow: here's a non-UDF way involving a single pivot (hence, just a single column scan to identify all the unique dates).
dff = mydf.groupBy('id').pivot('day').agg(F.first('price').alias('price'),F.first('units').alias('unit'))
Here's the result (apologies for the non-matching ordering and naming):
+---+-------+------+-------+------+-------+------+-------+------+
| id|1_price|1_unit|2_price|2_unit|3_price|3_unit|4_price|4_unit|
+---+-------+------+-------+------+-------+------+-------+------+
|100| 23| 10| 45| 11| 67| 12| 78| 13|
|101| 23| 10| 45| 13| 67| 14| 78| 15|
|102| 23| 10| 45| 11| 67| 16| 78| 18|
+---+-------+------+-------+------+-------+------+-------+------+
We just aggregate both the price and the units columns after pivoting on the day.
If you need the column naming as in the question:
dff.select([F.col(c).name('_'.join(x for x in c.split('_')[::-1])) for c in dff.columns]).show()
+---+-------+------+-------+------+-------+------+-------+------+
| id|price_1|unit_1|price_2|unit_2|price_3|unit_3|price_4|unit_4|
+---+-------+------+-------+------+-------+------+-------+------+
|100| 23| 10| 45| 11| 67| 12| 78| 13|
|101| 23| 10| 45| 13| 67| 14| 78| 15|
|102| 23| 10| 45| 11| 67| 16| 78| 18|
+---+-------+------+-------+------+-------+------+-------+------+
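For completeness, here is a self-contained sketch of the same approach; the SparkSession setup and sample rows below are assumptions made only to produce a runnable snippet, not the question's exact input.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical long-format input: one row per (id, day) with a price and units.
mydf = spark.createDataFrame(
    [(100, 1, 23, 10), (100, 2, 45, 11), (100, 3, 67, 12), (100, 4, 78, 13),
     (101, 1, 23, 10), (101, 2, 45, 13), (101, 3, 67, 14), (101, 4, 78, 15)],
    ['id', 'day', 'price', 'units'])

# Single pivot on day, keeping both price and units for each day.
dff = mydf.groupBy('id').pivot('day').agg(F.first('price').alias('price'),
                                          F.first('units').alias('unit'))

# Rename e.g. 1_price -> price_1 to match the question's naming.
dff.select([F.col(c).alias('_'.join(c.split('_')[::-1])) for c in dff.columns]).show()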
The solution in the question is the best I could get. The only improvement would be to cache the input dataset to avoid the double scan, i.e.
mydf.cache()
pivot_udf(mydf,'price','units').show()
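For context, the question's pivot_udf is not reproduced in this section, so the following is only a hypothetical reconstruction of such a helper (pivot each value column separately on a combined key, then join on id), shown with cache() applied so both pivots reuse the same scan:
from pyspark.sql import functions as F

def pivot_udf(df, *cols):
    # Hypothetical helper: pivot on a 'colname_day' key once per value column,
    # then join the per-column results back together on id.
    out = df.select('id').distinct()
    for c in cols:
        pivoted = (df.withColumn('combcol', F.concat(F.lit(c + '_'), F.col('day').cast('string')))
                     .groupBy('id').pivot('combcol').agg(F.first(c)))
        out = out.join(pivoted, 'id')
    return out

mydf.cache()  # keep the input in memory so each pivot reuses the same scan
pivot_udf(mydf, 'price', 'units').show()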
Pivot is an expensive shuffle operation and should be avoided if possible. Instead, you can collapse the columns dynamically with arrays_zip and explode and then group by and aggregate.
from pyspark.sql import functions as F
df.withColumn("cols", F.explode(F.arrays_zip(F.array([F.array(F.col(x),F.lit(x))\
for x in df.columns if x!='id']))))\
.withColumn("name", F.col("cols.0")[1]).withColumn("val", F.col("cols.0")[0]).drop("cols")\
.groupBy("name").agg(F.count(F.when(F.col("val")=='diff',1)).alias("diff"),\
F.count(F.when(F.col("val")=='same',1)).alias("same")).orderBy("name").show()
#+----+----+----+
#|name|diff|same|
#+----+----+----+
#| c1| 2| 2|
#| c2| 0| 4|
#| c3| 1| 3|
#+----+----+----+
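Both the arrays_zip example above and the map-based one below assume a wide DataFrame with an id column and string-valued columns; the rows here are a hypothetical example consistent with the output shown:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical input matching the counts above (c1: 2/2, c2: 0/4, c3: 1/3).
df = spark.createDataFrame(
    [(1, 'diff', 'same', 'diff'),
     (2, 'same', 'same', 'same'),
     (3, 'diff', 'same', 'same'),
     (4, 'same', 'same', 'same')],
    ['id', 'c1', 'c2', 'c3'])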
You can also do this by dynamically building a map of column name to value with create_map and exploding that MapType column.
from pyspark.sql import functions as F
from itertools import chain
df.withColumn("cols", F.create_map(*(chain(*[(F.lit(name), F.col(name))\
for name in df.columns if name!='id']))))\
.select(F.explode("cols").alias("name","val"))\
.groupBy("name").agg(F.count(F.when(F.col("val")=='diff',1)).alias("diff"),\
F.count(F.when(F.col("val")=='same',1)).alias("same")).orderBy("name").show()
#+----+----+----+
#|name|diff|same|
#+----+----+----+
#| c1| 2| 2|
#| c2| 0| 4|
#| c3| 1| 3|
#+----+----+----+
from pyspark.sql.functions import *

# Sample data: an id column and three category columns holding 'diff'/'same'.
df = spark.createDataFrame([(1,'diff','same','diff'), (2,'same','same','same'),
                            (3,'diff','same','same'), (4,'same','same','same')],
                           ['idcol','C1','C2','C3'])
df.createOrReplaceTempView("MyTable")
# Unpivot: one row per (idcol, column name, value) via UNION ALL.
x1 = spark.sql("select idcol, 'C1' as col, C1 as val from MyTable "
               "union all select idcol, 'C2' as col, C2 as val from MyTable "
               "union all select idcol, 'C3' as col, C3 as val from MyTable")
# display(x1)  # optional: inspect the unpivoted rows
# Pivot on the value and count occurrences per original column name.
x2 = x1.groupBy('col').pivot('val').agg(count('val')).orderBy('col')
display(x2)
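As a side note, the manual UNION ALL unpivot can also be expressed with Spark SQL's built-in stack function; a brief sketch against the same MyTable view (not part of the original answer):
# stack(n, label1, col1, ...) emits n (col, val) rows per input row.
x1 = spark.sql("""
    select idcol, stack(3, 'C1', C1, 'C2', C2, 'C3', C3) as (col, val)
    from MyTable
""")
x1.groupBy('col').pivot('val').agg(count('val')).orderBy('col').show()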