Installation¶
Getting started¶
The core of Bolt is pure Python and its only dependency is numpy, so installation is straightforward. Obtain Python 2.7+ or 3.4+ (we strongly recommend using Anaconda), and then install with pip:
$ pip install bolt-python
To use Bolt with one of its backends, follow the instructions below.
If you just want to play around with Bolt, try the live notebooks, which use Docker and tmpnb to generate temporary interactive notebook environments with all dependencies loaded. The same notebooks are available in this repo.
Backends¶
Local¶
The local backend just uses numpy, so nothing special is required:
>>> from bolt import ones
>>> a = ones((2, 3, 4))
>>> a.shape
(2, 3, 4)
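Because the local backend is a thin wrapper over numpy, local Bolt arrays behave like ordinary ndarrays. As an illustration, here are the plain-numpy equivalents of the call above, along with the kinds of ndarray operations you can expect to carry over (this sketch uses numpy directly so it runs without Bolt installed):

```python
import numpy as np

# Plain-numpy equivalent of bolt's ones((2, 3, 4)) on the local backend
a = np.ones((2, 3, 4))

print(a.shape)                     # (2, 3, 4)
print(a.sum())                     # 24.0 -- every element is 1
print(a.transpose(2, 0, 1).shape)  # (4, 2, 3) -- axes reordered
```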
Spark¶
Bolt offers easy integration with Spark. Rather than making Spark a hard dependency or adding complex custom executables, Bolt only requires that a valid SparkContext has already been defined, either within an interactive notebook or inside an application. We cover the basics of a local installation here, but consult the official documentation for more information, especially for cluster deployment.
For local testing, the easiest way to get Spark is to download a prepackaged version here (get version 1.4+, compiled for any version of Hadoop), or if you are on Mac OS X, you can install it using Homebrew:
$ brew install apache-spark
With Spark installed and deployed, launch an interactive session by finding and running the pyspark executable in the bin folder of your Spark installation; a SparkContext will already be defined as sc. To use it with Bolt, pass sc as a constructor input:
>>> from bolt import ones
>>> a = ones((100, 20), sc)
>>> a.shape
(100, 20)
If you write a Spark application in Python and submit it with spark-submit, define a SparkContext within your application and then use it with Bolt in the same way:
>>> from pyspark import SparkContext
>>> sc = SparkContext(appName='test', master='local')
>>> from bolt import ones
>>> a = ones((100, 20), sc)
>>> a.shape
(100, 20)
If you are using Spark on a cluster, you just need to run
$ pip install bolt-python
on all of the cluster nodes.
Docker image¶
We provide a Docker image with Bolt and its backends installed and configured, alongside an example Jupyter notebook. This is a great way to try out Bolt. The Dockerfile is on GitHub and the image is hosted on Docker Hub. To run the image on OS X, follow these instructions:
- Download and install boot2docker (if you don’t have it already)
- Launch the boot2docker application from your Applications folder
- Type docker run -i -t -p 8888:8888 freemanlab/bolt
- Point a web browser to http://192.168.59.103:8888/