In my environment (using Docker and the image sequenceiq/spark:1.1.0-ubuntu), I ran into this. If you look at the pyspark shell script, you'll see that you need a few things added to your PYTHONPATH:
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH
That worked in ipython for me.
Update: as noted in the comments, the name of the py4j zip file changes with each Spark release, so check $SPARK_HOME/python/lib for the right name.
Answer from nealmcb on Stack Overflow
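Since the py4j zip name depends on the Spark release, one way to avoid hard-coding it is to look it up with a wildcard from Python. This is only a minimal sketch, assuming SPARK_HOME is set and the zip lives under python/lib as in the exports above; the helper names `pyspark_paths` and `add_pyspark_to_sys_path` are made up for illustration and are not part of Spark.

```python
import glob
import os
import sys


def pyspark_paths(spark_home):
    """Return the PySpark python dir plus any py4j source zip found under it."""
    python_dir = os.path.join(spark_home, "python")
    # The zip name embeds the py4j version, so match it with a wildcard.
    zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + zips


def add_pyspark_to_sys_path(spark_home):
    """Prepend the PySpark paths to sys.path, skipping ones already present."""
    for path in reversed(pyspark_paths(spark_home)):
        if path not in sys.path:
            sys.path.insert(0, path)


if __name__ == "__main__":
    home = os.environ.get("SPARK_HOME")
    if home:
        add_pyspark_to_sys_path(home)
        print("\n".join(pyspark_paths(home)))
    else:
        print("SPARK_HOME is not set")
```

Running this at the top of an ipython session should have roughly the same effect as the two export lines, without needing to know the py4j version in advance.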
I solved this problem by adding some paths to .bashrc:
export SPARK_HOME=/home/a141890/apps/spark
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH
After this, it never raised ImportError: No module named py4j.java_gateway again.
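To confirm that the path changes took effect, a quick check from Python can help. The following is a small sketch, not something from the answers above; `can_import` is a hypothetical helper name, and it only uses the standard `importlib` module.

```python
import importlib


def can_import(module_name):
    """Return True if module_name imports cleanly, False on ImportError."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # These are the modules the ImportError above complains about.
    for name in ("py4j.java_gateway", "pyspark"):
        print(name, "OK" if can_import(name) else "NOT FOUND")
```

If either module shows up as NOT FOUND, PYTHONPATH is probably not being picked up by the shell that launched Python (for example, because .bashrc was not re-sourced).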
pip install py4j
Using findspark is expected to solve the problem:
Install findspark
$ pip install findspark
In your code, use:
import findspark
findspark.init()
Optionally, you can pass the Spark path to the init method above: findspark.init("/path/to/spark")
As outlined in "pyspark error does not exist in the jvm error when initializing SparkContext", adding a PYTHONPATH environment variable with the value
%SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-<version>-src.zip;%PYTHONPATH%
helped resolve this issue. Just check which py4j version you have in your spark/python/lib folder. Note that Windows separates path entries with a semicolon, not a colon.
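The path separator differs by platform (`;` on Windows, `:` on Linux and macOS), which is easy to get wrong when copying a value between systems. As a rough sketch, the value can be assembled with `os.pathsep` so the right separator is used automatically; `build_pythonpath` is a hypothetical helper, and the zip name passed in is whatever is found in the spark/python/lib folder.

```python
import os


def build_pythonpath(spark_home, py4j_zip_name, existing="", sep=os.pathsep):
    """Join the two PySpark entries (and any existing value) into one string."""
    entries = [
        os.path.join(spark_home, "python"),
        os.path.join(spark_home, "python", "lib", py4j_zip_name),
    ]
    # Keep whatever was already on PYTHONPATH at the end of the list.
    if existing:
        entries.append(existing)
    return sep.join(entries)


if __name__ == "__main__":
    home = os.environ.get("SPARK_HOME", "")
    current = os.environ.get("PYTHONPATH", "")
    # The zip name here is the one from the examples above; adjust as needed.
    print(build_pythonpath(home, "py4j-0.8.2.1-src.zip", existing=current))
```

The printed string can then be pasted into the environment variable dialog on Windows or into an export line on Linux.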