Heroku Button deploy of Apache Zeppelin notebooks for Apache Spark and Postgres
Requires: a private space with dns-discovery enabled, and either a running spark-in-space cluster or an attached Heroku Postgres database.
Note: The rest of this readme assumes you have set your app name as $app and your spark cluster app name as $spark in your shell.
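A minimal sketch of setting those two shell variables. The app names here are hypothetical placeholders; substitute the names of your own apps.

```shell
# Hypothetical app names, for illustration only.
app=my-zeppelin-app      # the zeppelin-in-space app created by the button deploy
spark=my-spark-cluster   # the spark-in-space cluster app

echo "zeppelin app: $app, spark cluster app: $spark"
```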
When performing the button app creation you should set the
Once your button deploy completes, you can tail the logs and see the zeppelin and web processes come online.
heroku logs -t -a $app
You can open the zeppelin web ui by running
heroku open -a $app. The default basic auth creds are
If you set different credentials during app creation, use those credentials.
Once you see the zeppelin ui, click
Create new note, and name the note. You will be taken to the UI for the note.
Enter the following text, and then hit Shift+Enter to run the notebook.
sc.parallelize(1 to 10000).reduce(_+_)
The first run will take several seconds to spin up a spark driver and connect to your spark cluster.
Edit the text to read as follows, and then hit Shift+Enter to run the notebook again. You should see the result very rapidly.
sc.parallelize(1 to 12345).reduce(_+_)
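As a sanity check on the notebook output: the sum of the integers 1 to n has the closed form n(n+1)/2, so the two paragraphs above should return 50005000 and 76205685 respectively. A minimal shell sketch of that arithmetic follows; the sum_to helper is hypothetical and is not part of this app.

```shell
# sum_to N prints 1 + 2 + ... + N using the closed form N(N+1)/2.
sum_to() {
  echo $(( $1 * ($1 + 1) / 2 ))
}

sum_to 10000   # expected result of the first run
sum_to 12345   # expected result of the second run
```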
You can also look at the spark cluster ui to see what is running. See the spark-in-space docs on how to look at the ui.
To add more spark-in-space clusters to the zeppelin interpreter list, set
SPARK_INTERPRETER_APPS to a comma-separated list of cluster app names.
heroku config:set SPARK_INTERPRETER_APPS=my-spark-cluster,my-other-spark-cluster -a $app
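If the cluster names come from a script rather than being typed by hand, the comma-separated value can be assembled first. This is only a sketch: join_csv is a hypothetical helper and the cluster names are placeholders. The heroku command is left as a comment so the snippet can be run without touching any app.

```shell
# join_csv joins its arguments with commas.
# The body runs in a subshell, so the IFS change does not leak out.
join_csv() (
  IFS=,
  echo "$*"
)

# Hypothetical cluster app names; substitute your own.
interpreter_apps=$(join_csv my-spark-cluster my-other-spark-cluster)
echo "$interpreter_apps"

# Then apply the value to the zeppelin app:
# heroku config:set SPARK_INTERPRETER_APPS="$interpreter_apps" -a "$app"
```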
You can set the binding for your previously created notebook by clicking the black settings icon in the top right corner.
If you attach existing Heroku Postgres databases to your zeppelin-in-space app, you can create notebooks for Postgres.
Note: The following assumes you have set your postgres app name as
heroku addons -a $pg
# find the database name that you want to attach and set it as a shell var
database=some-heroku-haiku-12345
heroku addons:attach $database -a $app --as FIRST_DB
heroku config:set DATABASE_INTERPRETER_URLS=FIRST_DB_DATABASE_URL -a $app
Once the zeppelin app restarts, create a new note, enter the following and execute it.
%psql select * from pg_stat_activity;
You should see the query result.
To add more Postgres databases, follow the same procedure, and append them, comma-separated, to
heroku config:set DATABASE_INTERPRETER_URLS=FIRST_DB_DATABASE_URL,SOMEOTHER_DATABASE_URL -a $app
By default the button deploy uses S3 via the bucketeer addon, created with the
--as ZEPPELIN_S3 option.
Zeppelin itself and the spark configuration generated for this app both substitute in the bucket and credentials provided by that addon, so reads and writes to S3 by zeppelin and by any spark jobs will use those credentials.
If you want to bring your own S3 bucket, simply remove the bucketeer addon and manually set the configuration.
heroku addons:remove bucketeer -a $app
heroku config:set ZEPPELIN_S3_BUCKET_NAME=<the bucket you want to store notebooks in> -a $app
heroku config:set ZEPPELIN_S3_AWS_ACCESS_KEY_ID=<creds that can read and write your buckets> -a $app
heroku config:set ZEPPELIN_S3_AWS_SECRET_KEY=<creds that can read and write your buckets> -a $app
Any spark code you run from this app will have the spark context configured with those credentials, so you should be able to access