Here is one way: melt the DataFrame into long (id, day, name, value) format, then pivot on day.
from pyspark.sql import functions as F
# First combine price and units into a single map column
mydf = mydf.withColumn("price_units", F.create_map(F.lit("price"), F.col("price"), F.lit("units"), F.col("units")))
# Now explode the map to get a melted (long-format) dataframe: one row per (name, value) pair
mydf = mydf.select("id", "day", F.explode("price_units").alias("name", "value"))
+---+---+-----+-----+
| id|day| name|value|
+---+---+-----+-----+
|100|  1|price|   23|
|100|  1|units|   10|
|100|  2|price|   45|
|100|  2|units|   11|
|100|  3|price|   67|
etc
# Then pivot on day; each (id, name, day) cell holds a single value, so F.mean simply returns it
mydf.groupBy("id", "name").pivot("day").agg(F.mean("value")).show()
+---+-----+----+----+----+----+
| id| name| 1| 2| 3| 4|
+---+-----+----+----+----+----+
|100|price|23.0|45.0|67.0|78.0|
|101|price|23.0|45.0|67.0|78.0|
|102|units|10.0|11.0|16.0|18.0|
|100|units|10.0|11.0|12.0|13.0|
|101|units|10.0|13.0|14.0|15.0|
|102|price|23.0|45.0|67.0|78.0|
+---+-----+----+----+----+----+
Answer from ags29 on Stack Overflow