Here is one way (basically melt the DataFrame, then pivot):

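For a self-contained run, here is a minimal setup sketch; the sample values for id 100 are taken from the output shown below (ids 101 and 102 omitted for brevity).

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Sample input matching the output tables below (only id 100 shown)
mydf = spark.createDataFrame(
    [(100, 1, 23, 10), (100, 2, 45, 11), (100, 3, 67, 12), (100, 4, 78, 13)],
    ["id", "day", "price", "units"],
)
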
# First combine price and units into a map column
mydf = mydf.withColumn("price_units", F.create_map(F.lit("price"), "price", F.lit("units"), "units"))

# Now explode to get a melted dataframe
mydf = mydf.select("id", "day", F.explode("price_units").alias("name", "value"))

+---+---+-----+-----+
| id|day| name|value|
+---+---+-----+-----+
|100|  1|price|   23|
|100|  1|units|   10|
|100|  2|price|   45|
|100|  2|units|   11|
|100|  3|price|   67|
etc
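
(Side note: on Spark 3.4+ you can get the same melted shape without the create_map/explode trick, using the built-in unpivot, also aliased as melt, applied to the original mydf:)

# Spark 3.4+ built-in melt, equivalent to the map + explode above
melted = mydf.unpivot(["id", "day"], ["price", "units"], "name", "value")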

# Then pivot
mydf.groupby("id", "name").pivot("day").agg(F.mean("value")).show()

+---+-----+----+----+----+----+
| id| name|   1|   2|   3|   4|
+---+-----+----+----+----+----+
|100|price|23.0|45.0|67.0|78.0|
|101|price|23.0|45.0|67.0|78.0|
|102|units|10.0|11.0|16.0|18.0|
|100|units|10.0|11.0|12.0|13.0|
|101|units|10.0|13.0|14.0|15.0|
|102|price|23.0|45.0|67.0|78.0|
+---+-----+----+----+----+----+
Answer from ags29 on Stack Overflow
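
If the distinct day values are known up front, you can pass them to pivot explicitly; this skips the extra pass Spark otherwise runs to discover the distinct values:

# Explicit pivot values avoid a scan for distinct days
mydf.groupby("id", "name").pivot("day", [1, 2, 3, 4]).agg(F.mean("value")).show()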