From: https://nerdsrule.co/2016/06/15/ipython-notebook-and-spark-setup-for-windows-10/
Installing the Java JDK
You can download and install it from Oracle here. Once it was installed, I created a new folder called Java in Program Files and moved the JDK folder into it. Copy that path and set a JAVA_HOME variable within your system environment variables, then add %JAVA_HOME%\bin to the Path variable. To get there, click Start, then Settings, and search for environment variables. This should bring up ‘Edit your system environment variables’, where you can make both changes.
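Once JAVA_HOME is set, it is worth sanity-checking it before going further. A minimal sketch in Python (the helper name is mine, not part of any setup tool):

```python
import ntpath
import os

def java_path_entry(env):
    r"""Return the directory a %JAVA_HOME%\bin Path entry resolves to,
    or None if JAVA_HOME is not set in the given environment mapping."""
    java_home = env.get("JAVA_HOME")
    if java_home is None:
        return None
    # ntpath joins with a backslash, matching Windows path conventions.
    return ntpath.join(java_home, "bin")

# Check the live environment; open a fresh console first so the
# newly created variable is visible to Python.
print(java_path_entry(os.environ))
```

If this prints None, close and reopen your console so the new variable is picked up.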
Installing and setting up Python and IPython
For simplicity I downloaded and installed Anaconda with the Python 2.7 version from Continuum Analytics (free) using the built-in install wizard. Once installed, you need to do the same thing you did with Java: name the variable PYTHONPATH (in my case the path was C:\Users\cconnell\Anaconda2). You will also need to add %PYTHONPATH% to the Path variable.
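If you have more than one Python on the machine, you can confirm that the interpreter on your Path is the Anaconda 2.7 build. A small sketch (the check function is my own; the example version string mimics what an Anaconda2 install typically reports):

```python
import sys

def looks_like_anaconda2(version_info, version_string):
    """True when the interpreter is Python 2.7 and identifies as Anaconda."""
    return tuple(version_info[:2]) == (2, 7) and "Anaconda" in version_string

# For the interpreter you are currently running:
print(looks_like_anaconda2(sys.version_info, sys.version))
```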
Installing Spark
I am using 1.6.0 (I like to stay at least one version back) from Apache Spark here. Once it is unzipped, do the same thing as before: set a SPARK_HOME variable to the location of your Spark folder, then add %SPARK_HOME%\bin to the Path variable.
Installing Hadoop binaries
Download and unzip the Hadoop common binaries, and set a HADOOP_HOME variable to that location.
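At this point four variables should exist. The steps so far can be double-checked with a short Python snippet (variable names as used in this post; run it from a console opened after the variables were created):

```python
import os

# The four variables this walkthrough has created so far.
REQUIRED_VARS = ("JAVA_HOME", "PYTHONPATH", "SPARK_HOME", "HADOOP_HOME")

def missing_vars(env):
    """Return the required variables that are unset or empty in the mapping."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# An empty list means everything is in place.
print(missing_vars(os.environ))
```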
Getting everything to work together
- Go into your spark/conf folder and rename log4j.properties.template to log4j.properties
- Open log4j.properties in a text editor and change log4j.rootCategory from INFO to WARN
- Add two new environment variables like before: set PYSPARK_DRIVER_PYTHON to jupyter and PYSPARK_DRIVER_PYTHON_OPTS to notebook
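These two variables work together: the pyspark launch script invokes whatever PYSPARK_DRIVER_PYTHON names, passing PYSPARK_DRIVER_PYTHON_OPTS as its arguments. A simplified sketch of that behavior (the real launcher script does quite a bit more):

```python
def driver_command(env):
    """Simplified model of how pyspark assembles the driver command
    from the two environment variables (falls back to plain python)."""
    driver = env.get("PYSPARK_DRIVER_PYTHON", "python")
    opts = env.get("PYSPARK_DRIVER_PYTHON_OPTS", "")
    return (driver + " " + opts).strip()

print(driver_command({"PYSPARK_DRIVER_PYTHON": "jupyter",
                      "PYSPARK_DRIVER_PYTHON_OPTS": "notebook"}))
# -> jupyter notebook
```

So with the values above, launching pyspark effectively runs `jupyter notebook` as the driver.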
Now, to launch your PySpark notebook, just type pyspark at the console and Jupyter will open automatically in your browser.
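Once the notebook is up, the launcher has already created a SparkContext for you, bound to the name sc. A quick smoke test for the first cell (this must run inside the PySpark notebook, not a plain Python shell, since sc only exists there):

```python
# `sc` is provided by the pyspark launcher.
rdd = sc.parallelize(range(100))
print(rdd.sum())  # 4950
```

If this prints 4950 without a wall of INFO logging, the log4j change and the whole setup are working.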