Run Apache Spark on Windows (yeah, I know!)

Download a suitable distribution from the Apache Spark website (select the version and the package type you want).

Download winutils.exe from a Hadoop winutils distribution that matches your Hadoop version.

Place it in a directory (say, C:/Hadoop/bin/winutils.exe), make sure the directory C:\tmp\hive exists, then change to the directory containing winutils.exe and run the following command:

winutils.exe chmod 777 /tmp/hive

You need to set the HADOOP_HOME environment variable before you proceed further.

Set HADOOP_HOME = C:/Hadoop (note: winutils.exe is looked up at %HADOOP_HOME%/bin/winutils.exe, so point HADOOP_HOME at the root directory, not at bin).
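The setup above can be sketched as a few Command Prompt lines (assuming the C:/Hadoop location used here; adjust the paths if you placed winutils.exe elsewhere):

```shell
:: Assumes winutils.exe was placed at C:\Hadoop\bin\winutils.exe
:: setx takes effect in NEW Command Prompt windows, so re-open the prompt after running it
setx HADOOP_HOME C:\Hadoop

:: The chmod below needs the directory to exist first
mkdir C:\tmp\hive
C:\Hadoop\bin\winutils.exe chmod 777 /tmp/hive
```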




Run <your-spark-directory>/bin/spark-shell.cmd

Once the shell is up, open the Spark web UI (http://localhost:4040 by default) in your browser!


For testing Spark, create a file called test.json and add the following data to it.

    "glossary": {
        "title": "example glossary",
		"GlossDiv": {
            "title": "S",
			"GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
					"SortAs": "SGML",
					"GlossTerm": "Standard Generalized Markup Language",
					"Acronym": "SGML",
					"Abbrev": "ISO 8879:1986",
					"GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
						"GlossSeeAlso": ["GML", "XML"]
					"GlossSee": "markup"

Run the following commands in the Spark shell in sequence and test it (note: Spark's JSON reader expects one JSON object per line by default, so either flatten test.json to a single line or, on Spark 2.2+, pass the multiLine option):

// In spark-shell, an existing SparkContext is already available as sc
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

val df = sqlContext.read.json("path/test.json")

// Displays the content of the DataFrame to stdout
df.show()


