Here is one way: essentially melt the DataFrame from wide to long, then pivot it back.

# First combine price and units into a single map column
# (create_map pairs literal keys with column values; all map values must share a common type)
mydf = mydf.withColumn("price_units", F.create_map(F.lit("price"), "price", F.lit("units"), "units"))

# Now explode the map to get a melted (long-format) dataframe;
# exploding a map yields one row per entry, split into key and value columns
mydf = mydf.select("id", "day", F.explode("price_units").alias("name", "value"))

+---+---+-----+-----+
| id|day| name|value|
+---+---+-----+-----+
|100|  1|price|   23|
|100|  1|units|   10|
|100|  2|price|   45|
|100|  2|units|   11|
|100|  3|price|   67|
etc

# Then pivot day out into columns; since there is exactly one row per
# (id, name, day), F.mean just passes each value through (as a double, hence the .0s)
mydf.groupBy("id", "name").pivot("day").agg(F.mean("value")).show()

+---+-----+----+----+----+----+
| id| name|   1|   2|   3|   4|
+---+-----+----+----+----+----+
|100|price|23.0|45.0|67.0|78.0|
|101|price|23.0|45.0|67.0|78.0|
|102|units|10.0|11.0|16.0|18.0|
|100|units|10.0|11.0|12.0|13.0|
|101|units|10.0|13.0|14.0|15.0|
|102|price|23.0|45.0|67.0|78.0|
+---+-----+----+----+----+----+
Answer from ags29 on Stack Overflow