Here is one way: essentially melt the DataFrame from wide to long, then pivot it back.

# First combine price and units into a single map column
# (create_map pairs literal keys with column values; all map values must share a common type)
mydf = mydf.withColumn("price_units", F.create_map(F.lit("price"), "price", F.lit("units"), "units"))

# Now explode the map to get a melted (long-format) dataframe;
# exploding a map yields one row per entry, split into key and value columns
mydf = mydf.select("id", "day", F.explode("price_units").alias("name", "value"))

+---+---+-----+-----+
| id|day| name|value|
+---+---+-----+-----+
|100|  1|price|   23|
|100|  1|units|   10|
|100|  2|price|   45|
|100|  2|units|   11|
|100|  3|price|   67|
etc

# Then pivot day out into columns; since there is exactly one row per
# (id, name, day), F.mean just passes each value through (as a double, hence the .0s)
mydf.groupBy("id", "name").pivot("day").agg(F.mean("value")).show()

+---+-----+----+----+----+----+
| id| name|   1|   2|   3|   4|
+---+-----+----+----+----+----+
|100|price|23.0|45.0|67.0|78.0|
|101|price|23.0|45.0|67.0|78.0|
|102|units|10.0|11.0|16.0|18.0|
|100|units|10.0|11.0|12.0|13.0|
|101|units|10.0|13.0|14.0|15.0|
|102|price|23.0|45.0|67.0|78.0|
+---+-----+----+----+----+----+
Answer from ags29 on Stack Overflow