There are a number of issues with your question:
To start with, PySpark is not an add-on package but an essential component of Spark itself; in other words, when you install Spark you also get PySpark by default (you cannot avoid it, even if you wanted to). So, step 2 should be enough (and even before that, PySpark should already be available on your machine, since you have been using Spark already).
Step 1 is unnecessary: PySpark from PyPI (i.e. installed with pip or conda) does not contain the full PySpark functionality; it is only intended for use with a Spark installation in an already existing cluster. From the docs:
The Python packaging for Spark is not intended to replace all of the other use cases. This Python packaged version of Spark is suitable for interacting with an existing cluster (be it Spark standalone, YARN, or Mesos) - but does not contain the tools required to setup your own standalone Spark cluster. You can download the full version of Spark from the Apache Spark downloads page.
NOTE: If you are using this with a Spark standalone cluster you must ensure that the version (including minor version) matches or you may experience odd errors
Based on the fact that, as you say, you have already been using Spark (via Scala), your issue seems rather to be about upgrading. Now, if you use the pre-built Spark distributions, there is actually nothing to install - you just download, unzip, and set the relevant environment variables (SPARK_HOME etc.) - see my answer on "upgrading" Spark, which is also applicable to first-time "installations".
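For example, a minimal manual "installation" along those lines might look like this (the version number and archive URL are examples based on the Apache archive layout; check the downloads page for current ones):

```shell
# Download and unpack a pre-built distribution (uncomment and adjust the
# version; the URL pattern is an assumption -- verify it on the Apache
# downloads page):
#   curl -O https://archive.apache.org/dist/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz
#   tar xzf spark-3.5.1-bin-hadoop3.tgz -C "$HOME"

# Point the environment at the unpacked directory -- this is the whole "install":
export SPARK_HOME="$HOME/spark-3.5.1-bin-hadoop3"
export PATH="$SPARK_HOME/bin:$PATH"
```

After that, both spark-shell and pyspark are on your PATH with no package manager involved.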
Step 1: If you don't have brew, first install it with the following command in the terminal (this is Homebrew's current install script; the older ruby-based one no longer works):
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Step 2: Once you have brew, run the command below to install Java on your Mac (note that `brew cask install` has been replaced by `brew install --cask` in current Homebrew, and AdoptOpenJDK has since been succeeded by Eclipse Temurin):
brew install --cask homebrew/cask-versions/adoptopenjdk8
Step 3: Once Java is installed, run the command below to install Spark on your Mac:
brew install apache-spark
Step 4: Verify the installation by typing pyspark --version
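One extra step you may need after step 2: Spark expects JAVA_HOME to be set, and brew does not always do that for you. A sketch for your ~/.zshrc (the `java_home` helper and the fallback path are macOS-specific assumptions; verify the actual install location on your machine):

```shell
# macOS ships /usr/libexec/java_home to locate installed JDKs; fall back to
# the usual cask install location if the helper is missing (both paths are
# assumptions -- check them on your machine):
if [ -x /usr/libexec/java_home ]; then
  export JAVA_HOME="$(/usr/libexec/java_home -v 1.8)"
else
  export JAVA_HOME="/Library/Java/JavaVirtualMachines/adoptopenjdk-8.jdk/Contents/Home"
fi
```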
I wanted to learn Spark with both Scala and Python. I already had Python 3, so I started off with the Scala installation first:
brew install coursier/formulas/coursier && cs setup
this installed a bunch of stuff: scala, scala-cli, etc.
Then I ran
brew install apache-spark
which installed a whole bunch of stuff as well. The apache-spark docs said PySpark comes along with it, and that you can run it by just running pyspark, which I did, and got a Python-like REPL.
Whenever I imported it in a Jupyter notebook, however, it didn't work, so I had to do a pip3 install.
Then I ran

from pyspark import SparkContext
sc = SparkContext()
n = sc.parallelize([4, 10, 9, 7])
n.take(3)
and got this error:

raise RuntimeError(("Python in worker has different version %s than that in " +
RuntimeError: Python in worker has different version 3.8 than that in driver 3.7, PySpark cannot run with different minor versions. Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

I got the path using which python3 and tried setting the above two variables in .zshrc and .zprofile, to no effect. I'm a bit of a noob with this stuff and now I'm lost.
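The exports I tried in ~/.zshrc looked roughly like this (the command substitution just pins both settings to whatever which python3 returns on my machine, so the resolved path will differ on yours):

```shell
# Pin driver and workers to the same interpreter; the path comes from
# `which python3`, so it varies from machine to machine:
export PYSPARK_PYTHON="$(which python3)"
export PYSPARK_DRIVER_PYTHON="$(which python3)"
```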
Can someone help me out here? TIA!