Apache Spark Multi-node setup

In this article, I'll show how to set up a two-node Spark cluster (i.e., a master and a slave/worker node).

Download a suitable distribution from the Apache Spark downloads page (select the version and package type you want) on both nodes.
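A minimal sketch of the download and install, assuming Spark 2.1.0 prebuilt for Hadoop 2.7 (substitute the version and package you picked, and your own install path):

wget https://archive.apache.org/dist/spark/spark-2.1.0/spark-2.1.0-bin-hadoop2.7.tgz
tar -xzf spark-2.1.0-bin-hadoop2.7.tgz
sudo mv spark-2.1.0-bin-hadoop2.7 /ebs/apps/spark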

Then point SPARK_HOME at the install directory on both nodes:

export SPARK_HOME=/ebs/apps/spark

Enable SSH connectivity between the master and the slaves:

sudo apt-get install openssh-server
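Passwordless SSH from the master to the slaves is only required if you use the cluster launch scripts (sbin/start-all.sh), but it is convenient either way. A minimal sketch, where the user and hostname are placeholders for your own:

ssh-keygen -t rsa
ssh-copy-id <user>@<slave-hostname-ip>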
Configure Spark through command-line options. The following options are available for the Spark master:
 -i HOST, --ip HOST Hostname to listen on (deprecated, please use --host or -h)
 -h HOST, --host HOST Hostname to listen on
 -p PORT, --port PORT Port to listen on (default: 7077)
 --webui-port PORT Port for web UI (default: 8080)
 --properties-file FILE Path to a custom Spark properties file.
 Default is conf/spark-defaults.conf.

More details are in the Spark standalone mode documentation: https://spark.apache.org/docs/latest/spark-standalone.html

Set the master's host/bind address to 0.0.0.0.

Binding to 0.0.0.0 makes the master listen on all network interfaces, so slaves and clients can connect from any address.
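If you'd rather not pass the host on every start, the same setting can live in conf/spark-env.sh; a sketch, assuming Spark 2.x (older releases used SPARK_MASTER_IP instead):

# $SPARK_HOME/conf/spark-env.sh
SPARK_MASTER_HOST=0.0.0.0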

Master:

./sbin/start-master.sh -h 0.0.0.0
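Once started, the master reports its spark://... URL in its log and on the web UI. A quick sanity check from the master node, assuming the default web UI port of 8080:

curl -s http://localhost:8080 | grep -o 'spark://[^<]*'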

Slave:

./sbin/start-slave.sh spark://<master-hostname-ip>:7077
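If everything is wired up, the worker appears as ALIVE on the master's web UI. As an end-to-end smoke test, submit the bundled SparkPi example (the jar path below assumes a Spark 2.x directory layout):

$SPARK_HOME/bin/spark-submit \
  --master spark://<master-hostname-ip>:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_*.jar 10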