Here's a non-UDF way involving a single pivot (hence, just a single column scan to identify all the unique dates).

from pyspark.sql import functions as F

dff = mydf.groupBy('id').pivot('day').agg(F.first('price').alias('price'), F.first('units').alias('unit'))

Here's the result (apologies for the non-matching ordering and naming):

+---+-------+------+-------+------+-------+------+-------+------+               
| id|1_price|1_unit|2_price|2_unit|3_price|3_unit|4_price|4_unit|
+---+-------+------+-------+------+-------+------+-------+------+
|100|     23|    10|     45|    11|     67|    12|     78|    13|
|101|     23|    10|     45|    13|     67|    14|     78|    15|
|102|     23|    10|     45|    11|     67|    16|     78|    18|
+---+-------+------+-------+------+-------+------+-------+------+

We just aggregate on both the price and the units columns after pivoting on the day.
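
For reference, a minimal input that reproduces the tables above looks like this (a sketch only: the long-format schema of mydf, with columns id, day, price and units, is assumed from the question rather than confirmed):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# assumed long format: one row per (id, day) carrying that day's price and units
mydf = spark.createDataFrame(
    [(100, 1, 23, 10), (100, 2, 45, 11), (100, 3, 67, 12), (100, 4, 78, 13),
     (101, 1, 23, 10), (101, 2, 45, 13), (101, 3, 67, 14), (101, 4, 78, 15),
     (102, 1, 23, 10), (102, 2, 45, 11), (102, 3, 67, 16), (102, 4, 78, 18)],
    ['id', 'day', 'price', 'units'])

Running the groupBy/pivot/agg line above on this mydf produces the 1_price ... 4_unit layout shown.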

If you need the naming as in the question, just reverse the pieces of each generated column name:

dff.select([F.col(c).alias('_'.join(c.split('_')[::-1])) for c in dff.columns]).show()

+---+-------+------+-------+------+-------+------+-------+------+
| id|price_1|unit_1|price_2|unit_2|price_3|unit_3|price_4|unit_4|
+---+-------+------+-------+------+-------+------+-------+------+
|100|     23|    10|     45|    11|     67|    12|     78|    13|
|101|     23|    10|     45|    13|     67|    14|     78|    15|
|102|     23|    10|     45|    11|     67|    16|     78|    18|
+---+-------+------+-------+------+-------+------+-------+------+
Answer from Jedi on Stack Overflow
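
As a side note, if the days are known up front they can be passed explicitly to pivot(), which skips the extra pass Spark otherwise makes over the data to collect the distinct pivot values (the list [1, 2, 3, 4] here is just illustrative):

dff = mydf.groupBy('id').pivot('day', [1, 2, 3, 4]).agg(F.first('price').alias('price'),
                                                        F.first('units').alias('unit'))
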
Top answer (1 of 2, score 2)

Pivot is an expensive shuffle operation and should be avoided if possible. Try this logic instead: use arrays_zip and explode to dynamically collapse the columns into rows, then groupBy and aggregate.
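
Both snippets below assume an input df shaped roughly like this (a minimal sketch; the id column name and the c1, c2, c3 labels are inferred from the expected output, not given explicitly):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# minimal example input: an id column plus string flag columns holding 'diff' / 'same'
df = spark.createDataFrame(
    [(1, 'diff', 'same', 'diff'),
     (2, 'same', 'same', 'same'),
     (3, 'diff', 'same', 'same'),
     (4, 'same', 'same', 'same')],
    ['id', 'c1', 'c2', 'c3'])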

from pyspark.sql import functions as F   

df.withColumn("cols", F.explode(F.arrays_zip(F.array([F.array(F.col(x),F.lit(x))\
                                                    for x in df.columns if x!='id']))))\
  .withColumn("name", F.col("cols.0")[1]).withColumn("val", F.col("cols.0")[0]).drop("cols")\
  .groupBy("name").agg(F.count(F.when(F.col("val")=='diff',1)).alias("diff"),\
                       F.count(F.when(F.col("val")=='same',1)).alias("same")).orderBy("name").show()

#+----+----+----+
#|name|diff|same|
#+----+----+----+
#|  c1|   2|   2|
#|  c2|   0|   4|
#|  c3|   1|   3|
#+----+----+----+

You can also do this by building a map dynamically with create_map and exploding the resulting map type.

from pyspark.sql import functions as F
from itertools import chain

df.withColumn("cols", F.create_map(*(chain(*[(F.lit(name), F.col(name))\
                                  for name in df.columns if name!='id']))))\
  .select(F.explode("cols").alias("name","val"))\
  .groupBy("name").agg(F.count(F.when(F.col("val")=='diff',1)).alias("diff"),\
                       F.count(F.when(F.col("val")=='same',1)).alias("same")).orderBy("name").show()

#+----+----+----+
#|name|diff|same|
#+----+----+----+
#|  c1|   2|   2|
#|  c2|   0|   4|
#|  c3|   1|   3|
#+----+----+----+
Answer 2 of 2 (score -1)

from pyspark.sql.functions import *

df = spark.createDataFrame([(1,'diff','same','diff'),(2,'same','same','same'),(3,'diff','same','same'),(4,'same','same','same')],['idcol','C1','C2','C3'])
df.createOrReplaceTempView("MyTable")
#spark.sql("select * from MyTable").collect()

# Unpivot by unioning one SELECT per column, labelling each row with its source column name
x1 = spark.sql("select idcol, 'C1' AS col, C1 from MyTable union all select idcol, 'C2' AS col, C2 from MyTable union all select idcol, 'C3' AS col, C3 from MyTable")
#display(x1)

# Pivot on the value column (still named C1 after the union) and count occurrences per column
x2 = x1.groupBy('col').pivot('C1').agg(count('C1')).orderBy('col')
display(x2)  # display() is Databricks-specific; use x2.show() elsewhere