Most probably your DataFrame is a Pandas DataFrame object, not a Spark DataFrame object.
try:
spark.createDataFrame(df).write.saveAsTable("dashboardco.AccountList")
Answer from Alex Ott on Stack Overflow: It's related to the Databricks Runtime (DBR) version used - the Spark versions up to DBR 12.2 rely on the .iteritems function to construct a Spark DataFrame from a Pandas DataFrame. This issue was fixed in Spark 3.4, which is available as DBR 13.x.
If you can't upgrade to DBR 13.x, then you need to downgrade Pandas to the latest 1.x version (1.5.3 right now) by using the %pip install -U pandas==1.5.3 command in your notebook. Although it's better to just use the Pandas version shipped with your DBR - it was tested for compatibility with the other packages in the DBR.
I couldn't change package versions, but it looks like this was a name change only.
So I did
df.iteritems = df.items
and spark.createDataFrame(df) works now.
Sure, it's ugly, and it will break my notebook when I move to a cluster with a newer DBR, but it works for now.
EDIT: AyoubH's answer is better because you only have to do it once. With the code above, you have to modify every data frame you display.
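Doing it once presumably means applying the same alias at the class level rather than per object, so every DataFrame picks it up. A minimal sketch of that idea (the hasattr guard is an assumption, added so the patch is a no-op on pandas 1.x where iteritems still exists):

```python
import pandas as pd

# Compatibility shim: on pandas >= 2.0, where DataFrame.iteritems was
# removed, point it back at DataFrame.items at the class level. Both
# yield (column_name, Series) pairs, so this is a pure rename.
if not hasattr(pd.DataFrame, "iteritems"):
    pd.DataFrame.iteritems = pd.DataFrame.items
```

Run once at the top of the notebook, before any spark.createDataFrame(df) call, and every Pandas DataFrame created afterwards exposes iteritems again.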
I've seen this error when the driver & executors had different versions of Pandas installed. In my case the driver had Pandas 1.1.0 (via databricks-connect), while the executors were on Databricks Runtime 7.3 with Pandas 1.0.1. Pandas 1.1.0 made big changes to its internals, so the code sent by the driver to the executors breaks. You need to check that your executors and driver have the same Pandas version (you can find the Pandas version used by each Databricks Runtime in its release notes). You can use the following script to compare the versions of the Python libraries on the executors & driver.
I came across the same question.
I think it is due to a Pandas version difference.
I solved this bug by updating my Pandas version from 1.0.1 to 1.0.5.
As a workaround, downgrade to pandas v1.5
%pip install --upgrade pandas==1.5
The answers provided so far worked prior to 3rd April 2023.
As of pandas 2.0.0 (released 3rd April 2023), you are no longer able to convert a Pandas DataFrame to a Spark DataFrame using the command:
spark.createDataFrame(df)
Using the above command leads to the error mentioned in the question:
AttributeError: 'DataFrame' object has no attribute 'iteritems'
The iteritems function was removed in pandas 2.0.0. From the pandas 2.0.0 changelog:
Removed deprecated Series.iteritems(), DataFrame.iteritems(), use obj.items instead
Meanwhile, the code in Spark that converts a Pandas DataFrame to a Spark DataFrame still uses iteritems:
/databricks/spark/python/pyspark/sql/pandas/conversion.py in createDataFrame(self, data, schema, samplingRatio, verifySchema)
308 warnings.warn(msg)
309 raise
--> 310 data = self._convert_from_pandas(data, schema, timezone)
311 return self._create_dataframe(data, schema, samplingRatio, verifySchema)
312
/databricks/spark/python/pyspark/sql/pandas/conversion.py in _convert_from_pandas(self, pdf, schema, timezone)
340 pdf[field.name] = s
341 else:
--> 342 for column, series in pdf.iteritems():
343 s = _check_series_convert_timestamps_tz_local(series, timezone)
344 if s is not series:
Looks like we will have to wait for a fix to use Pandas 2.0.0.
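In your own code, at least, the changelog's suggested replacement is a straight swap; a quick sketch of iterating columns with items, which exists in both pandas 1.x and 2.x:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})

# DataFrame.items is the drop-in replacement for the removed
# iteritems: both yield (column_name, Series) pairs.
dtypes = {column: str(series.dtype) for column, series in df.items()}
```

The Spark-internal call in conversion.py, however, can only be fixed by a Spark release (or a shim like the one in the earlier answer).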
You just need to use the display function, passing the Pandas DataFrame as the argument - not try to call it as a member of the Pandas DataFrame class.
display(pdf)

Or you can simply put the variable name of the Pandas DataFrame object on its own line - then it will be printed using Pandas' built-in representation:
import pyspark.sql.functions as F
pdf = spark.range(10).withColumn("rnd", F.rand()).toPandas()
