This is a current issue with pyspark 2.4.0 installed via conda. You'll want to downgrade to pyspark 2.3.0 via the conda prompt or a Linux terminal:
conda install pyspark=2.3.0
Answer from N.Yasarturk on Stack Overflow
You may not have the right permissions.
I had the same problem when using the Docker image jupyter/pyspark-notebook to run a PySpark example, and it was solved by running as root inside the container.
Anyone else using that image can find some tips here.
Before running the above code, you can manually set the environment variables like this:
import os
import sys

# Point both the workers and the driver at the interpreter
# running this notebook, so their versions always match.
os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable
This worked in a Jupyter notebook for me.
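One detail worth stressing: the variables must be set before any SparkSession is created, or the workers may already have picked up a different interpreter. A minimal sketch of the full sequence (the SparkSession lines are left as comments, since the builder options depend on your setup):

```python
import os
import sys

# Set these first, before any SparkSession exists, so that driver
# and workers both use this notebook's own interpreter.
os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable

# Then build the session as usual, e.g.:
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.master("local[*]").getOrCreate()
```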
The key is in this part of the error message:
RuntimeError: Python in worker has different version 3.9 than that in driver 3.10, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
You need exactly the same Python minor version on the driver and the worker nodes.
A quick solution would be to downgrade your Python version to 3.9 (assuming the driver is running on the client you're using); alternatively, upgrade the workers to 3.10 so both sides match.
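For illustration, here is a small sketch of the rule PySpark enforces (the function name is made up for this example): only the major.minor pair is compared, so 3.10.4 vs 3.10.12 is fine, but 3.10 vs 3.9 is not.

```python
def check_minor_versions(driver, worker):
    """Raise if driver and worker differ in major.minor version.

    Mimics PySpark's worker-startup check: patch-level differences
    are allowed, minor-version differences are not.
    """
    if driver[:2] != worker[:2]:
        raise RuntimeError(
            "Python in worker has different version "
            f"{worker[0]}.{worker[1]} than that in driver "
            f"{driver[0]}.{driver[1]}, PySpark cannot run with "
            "different minor versions."
        )

check_minor_versions((3, 10, 4), (3, 10, 12))  # OK: same minor version

try:
    check_minor_versions((3, 10, 4), (3, 9, 7))
except RuntimeError as e:
    print(e)  # same kind of message as in the traceback above
```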