API

The Spark bolt array implements the core API, as well as the extra methods highlighted here. Some are used to implement the core methods (e.g. swap) and may be of interest to developers. Others are Spark-specific convenience functions (e.g. cache). Full documentation can be found below.

stack([size]) Aggregates records of a distributed array.
chunk([size, axis]) Chunks records of a distributed array.
swap(kaxes, vaxes[, size]) Swap axes from keys to values.
cache() Cache the underlying RDD in memory.
unpersist() Remove the underlying RDD from memory.
toarray() Returns the contents as a local array.
tordd() Return the underlying RDD of the bolt array.
split Axis at which the array is split into keys/values.

Detailed API

class bolt.spark.array.BoltArraySpark(rdd, shape=None, split=None, dtype=None)[source]
T[source]

Transpose by reversing the order of the axes.

astype(dtype, casting='unsafe')[source]

Cast the array to a specified type.

Parameters:

dtype : str or dtype

Typecode or data-type to cast the array to (see numpy)

cache()[source]

Cache the underlying RDD in memory.

chunk(size='150', axis=None)[source]

Chunks records of a distributed array.

Chunking breaks arrays into subarrays, using a specified chunk size along each value dimension. Alternatively, an average chunk size in megabytes can be given as a string, and the size of chunks (as ints) will be computed automatically.

Parameters:

size : tuple, int, or str, optional, default = “150”

A string giving the size in megabytes, or a tuple with the size of chunks along each dimension.

axis : int or tuple, optional, default = None

One or more axes to chunk the array along. If None, all axes are used.

Returns:

ChunkedArray
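
To make the idea of a per-dimension chunk size concrete, the sketch below lists the subarray slices that cover a value shape. It is plain Python with a hypothetical helper, chunk_slices, and is not bolt's implementation; how bolt turns a megabyte size into per-dimension sizes is not shown.

```python
from itertools import product


def chunk_slices(shape, plan):
    """Hypothetical helper: list the slices covering `shape` when
    dimension i is cut into pieces of at most plan[i] elements."""
    per_dim = []
    for n, step in zip(shape, plan):
        # The last chunk along a dimension may be smaller than `step`.
        per_dim.append([slice(start, min(start + step, n))
                        for start in range(0, n, step)])
    # One tuple of slices per subarray.
    return list(product(*per_dim))


# A (4, 5) block of values cut into chunks of at most (2, 3) elements.
slices = chunk_slices((4, 5), (2, 3))
```

Each tuple of slices selects one subarray, and the subarrays together cover the original values without overlap.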

concatenate(arry, axis=0)[source]

Join this array with another array.

Returns:

BoltArraySpark

display()[source]

Show a pretty-printed representation of this BoltArraySpark.

dtype[source]

Data-type of array.

filter(func, axis=(0, ))[source]

Filter array along an axis.

Applies a function which should evaluate to boolean, along a single axis or multiple axes. Array will be aligned so that the desired set of axes are in the keys, which may incur a swap.

Parameters:

func : function

Function to apply, should return boolean

axis : tuple or int, optional, default=(0,)

Axis or multiple axes to filter along.

Returns:

BoltArraySpark
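
A minimal model of filtering along the key axis, treating the array as a list of (key, value) records. The helper filter_records is hypothetical, and renumbering the surviving keys so they stay contiguous is an assumption about the behaviour, not something stated above.

```python
def filter_records(records, func):
    """Hypothetical model: keep records whose value passes `func`,
    then renumber the keys so they remain contiguous (assumed)."""
    kept = [value for _, value in records if func(value)]
    return [((i,), value) for i, value in enumerate(kept)]


records = [((0,), [0, 1]), ((1,), [5, 6]), ((2,), [3, 4])]
# Keep only records whose values sum to more than 2.
result = filter_records(records, lambda v: sum(v) > 2)
```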

first()[source]

Return the first element of the array.

keys[source]

Return a restricted view of the keys.

map(func, axis=(0, ), value_shape=None)[source]

Apply a function across an axis.

Array will be aligned so that the desired set of axes are in the keys, which may incur a swap.

Parameters:

func : function

Function of a single array to apply

axis : tuple or int, optional, default=(0,)

Axis or multiple axes to apply function along.

value_shape : tuple, optional, default=None

Known shape of values resulting from operation

Returns:

BoltArraySpark
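
The alignment step can be sketched as follows: given the requested axes and the current split, work out which key axes must move to the values and which value axes must move to the keys. The helper alignment_swap is hypothetical, and it assumes that value axes are indexed relative to the first value axis.

```python
def alignment_swap(axis, split):
    """Hypothetical helper: return (kaxes, vaxes) for the swap that
    makes exactly `axis` the key axes. Both are empty if the array
    is already aligned. Assumes vaxes are relative to the values."""
    axis = (axis,) if isinstance(axis, int) else tuple(axis)
    # Requested axes that currently live in the values.
    vaxes = tuple(a - split for a in axis if a >= split)
    # Key axes that were not requested and must move to the values.
    kaxes = tuple(a for a in range(split) if a not in axis)
    return kaxes, vaxes


already_aligned = alignment_swap((0,), split=1)
needs_swap = alignment_swap((1,), split=1)
```

With split=1, mapping over axis 0 needs no swap, while mapping over axis 1 exchanges key axis 0 with value axis 0.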

max(axis=None, keepdims=False)[source]

Return the maximum of the array over the given axis.

Parameters:

axis : tuple or int, optional, default=None

Axis to compute statistic over, if None will compute over all axes

keepdims : boolean, optional, default=False

Keep axis remaining after operation with size 1.
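
The effect of axis and keepdims on the result shape can be sketched in plain Python. The helper reduced_shape is hypothetical and follows NumPy's conventions, which the statistics here are assumed to share.

```python
def reduced_shape(shape, axis=None, keepdims=False):
    """Hypothetical helper: shape of the result after reducing
    `shape` over `axis`, following NumPy's conventions."""
    if axis is None:
        axis = tuple(range(len(shape)))
    elif isinstance(axis, int):
        axis = (axis,)
    if keepdims:
        # Reduced axes are kept with size 1.
        return tuple(1 if i in axis else n for i, n in enumerate(shape))
    # Reduced axes are dropped.
    return tuple(n for i, n in enumerate(shape) if i not in axis)


dropped = reduced_shape((2, 3, 4), axis=1)
kept = reduced_shape((2, 3, 4), axis=1, keepdims=True)
```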

mean(axis=None, keepdims=False)[source]

Return the mean of the array over the given axis.

Parameters:

axis : tuple or int, optional, default=None

Axis to compute statistic over, if None will compute over all axes

keepdims : boolean, optional, default=False

Keep axis remaining after operation with size 1.

min(axis=None, keepdims=False)[source]

Return the minimum of the array over the given axis.

Parameters:

axis : tuple or int, optional, default=None

Axis to compute statistic over, if None will compute over all axes

keepdims : boolean, optional, default=False

Keep axis remaining after operation with size 1.

ndim[source]

Number of dimensions.

reduce(func, axis=(0, ), keepdims=False)[source]

Reduce an array along an axis.

Applies a commutative/associative function of two arguments cumulatively to all arrays along an axis. Array will be aligned so that the desired set of axes are in the keys, which may incur a swap.

Parameters:

func : function

Function of two arrays that returns a single array

axis : tuple or int, optional, default=(0,)

Axis or multiple axes to reduce along.

Returns:

BoltArraySpark
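
The reason func should be commutative and associative is that records may be combined in a different grouping on each partition. A small sketch with plain lists standing in for arrays (add is a hypothetical element-wise function):

```python
from functools import reduce


def add(a, b):
    """Element-wise sum of two equal-length lists."""
    return [x + y for x, y in zip(a, b)]


values = [[1, 2], [3, 4], [5, 6]]

# Left-to-right reduction, as on a single partition.
total = reduce(add, values)

# A different grouping, as might arise when partial results from
# several partitions are combined.
regrouped = add(values[0], add(values[1], values[2]))
```

Both groupings give the same result only because element-wise addition is associative and commutative.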

reshape(*shape)[source]

Return an array with the same data but a new shape.

Currently only supports reshaping that independently reshapes the keys, or the values, or both.

Parameters:

shape : tuple of ints, or n ints

New shape
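
One way to picture the restriction: a new shape is acceptable when it can be cut at some point so that the leading part has as many elements as the old keys and the trailing part as many as the old values. The helper below is a hypothetical sketch of that check, not bolt's implementation.

```python
from math import prod  # Python 3.8+


def independent_reshape_split(old_shape, split, new_shape):
    """Hypothetical check: return the split position in `new_shape`
    if keys and values can be reshaped independently, else None."""
    if prod(new_shape) != prod(old_shape):
        return None
    nkeys = prod(old_shape[:split])
    for i in range(len(new_shape) + 1):
        # The first i new axes must hold exactly the old key elements.
        if prod(new_shape[:i]) == nkeys:
            return i
    return None


ok = independent_reshape_split((2, 3, 4), 1, (2, 12))
mixed = independent_reshape_split((2, 3, 4), 1, (6, 4))
```

Here (2, 12) only reshapes the values, whereas (6, 4) would mix key and value elements.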

shape[source]

Size of each dimension.

size[source]

Total number of elements.

split[source]

Axis at which the array is split into keys/values.

squeeze(axis=None)[source]

Remove one or more single-dimensional axes from the array.

Parameters:

axis : tuple or int

One or more singleton axes to remove.

stack(size=None)[source]

Aggregates records of a distributed array.

Stacking should improve the performance of vectorized operations, but the resulting StackedArray object only exposes a restricted set of operations (e.g. map, reduce). The unstack method can be used to restore the full bolt array.

Parameters:

size : int, optional, default=None

The maximum size of each stack, in number of original records. Groups of records within each partition are aggregated up to this size. If None, all records on each partition are aggregated.

Returns:

StackedArray
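
The grouping described by size can be sketched for a single partition as follows. The helper stack_partition is hypothetical, and the records are placeholders rather than real key/value pairs.

```python
def stack_partition(records, size=None):
    """Hypothetical model: group one partition's records into
    stacks of at most `size` records (all records if size is None)."""
    if size is None:
        size = len(records) or 1
    return [records[i:i + size] for i in range(0, len(records), size)]


partition = ["r0", "r1", "r2", "r3", "r4"]
pairs = stack_partition(partition, size=2)
everything = stack_partition(partition)
```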

std(axis=None, keepdims=False)[source]

Return the standard deviation of the array over the given axis.

Parameters:

axis : tuple or int, optional, default=None

Axis to compute statistic over, if None will compute over all axes

keepdims : boolean, optional, default=False

Keep axis remaining after operation with size 1.

sum(axis=None, keepdims=False)[source]

Return the sum of the array over the given axis.

Parameters:

axis : tuple or int, optional, default=None

Axis to compute statistic over, if None will compute over all axes

keepdims : boolean, optional, default=False

Keep axis remaining after operation with size 1.

swap(kaxes, vaxes, size='150')[source]

Swap axes from keys to values.

This is the core operation underlying shape manipulation on the Spark bolt array. It exchanges an arbitrary set of axes between the keys and the values. If either kaxes or vaxes is None, axes are moved in one direction only (from keys to values, or from values to keys). Keys moved to values are placed immediately after the split; values moved to keys are placed immediately before the split.

Parameters:

kaxes : tuple

Axes from keys to move to values

vaxes : tuple

Axes from values to move to keys

size : tuple, int, or str, optional, default = “150”

Can either provide a string giving the size in megabytes, or a tuple with the number of chunks along each value dimension being moved

Returns:

BoltArraySpark
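
The placement rule for swapped axes can be sketched as a shape calculation. The helper swapped_shape is hypothetical, and it assumes that kaxes index into the keys and vaxes index into the values, each starting from 0.

```python
def swapped_shape(shape, split, kaxes, vaxes):
    """Hypothetical helper: return (new_shape, new_split) after moving
    key axes `kaxes` to the values and value axes `vaxes` to the keys.
    Assumes both are indexed relative to their own side of the split."""
    keys, values = shape[:split], shape[split:]
    kaxes = tuple(kaxes or ())
    vaxes = tuple(vaxes or ())
    stay_keys = [n for i, n in enumerate(keys) if i not in kaxes]
    stay_values = [n for i, n in enumerate(values) if i not in vaxes]
    # Values moved to keys land immediately before the split.
    new_keys = stay_keys + [values[i] for i in vaxes]
    # Keys moved to values land immediately after the split.
    new_values = [keys[i] for i in kaxes] + stay_values
    return tuple(new_keys + new_values), len(new_keys)


both = swapped_shape((2, 3, 4, 5), 2, kaxes=(0,), vaxes=(1,))
one_way = swapped_shape((2, 3, 4, 5), 2, kaxes=(0,), vaxes=None)
```

In the first call the key axis of size 2 and the value axis of size 5 trade sides; in the second only the key axis moves, so the split shifts from 2 to 1.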

swapaxes(axis1, axis2)[source]

Return the array with two axes interchanged.

Parameters:

axis1 : int

The first axis to swap

axis2 : int

The second axis to swap

toarray()[source]

Returns the contents as a local array.

Will likely cause memory problems for large objects.

tolocal()[source]

Returns a local bolt array by first collecting as an array.

tordd()[source]

Return the underlying RDD of the bolt array.

transpose(*axes)[source]

Return an array with the axes transposed.

This operation will incur a swap unless the desired permutation can be obtained by transposing only the keys or only the values.

Parameters:

axes : None, tuple of ints, or n ints

If None, will reverse axis order.
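
Whether a given permutation stays within the keys and the values can be checked from the permutation and the split alone. The helper below is a hypothetical sketch of that condition.

```python
def transpose_needs_swap(axes, split):
    """Hypothetical check: True if the permutation `axes` moves any
    axis across the split, i.e. the new keys are not a permutation
    of the old key axes 0..split-1."""
    return sorted(axes[:split]) != list(range(split))


within = transpose_needs_swap((1, 0, 3, 2), split=2)
across = transpose_needs_swap((0, 2, 1, 3), split=2)
```

The first permutation reorders keys and values separately, so no swap is needed; the second exchanges a key axis with a value axis.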

unpersist()[source]

Remove the underlying RDD from memory.

var(axis=None, keepdims=False)[source]

Return the variance of the array over the given axis.

Parameters:

axis : tuple or int, optional, default=None

Axis to compute statistic over, if None will compute over all axes

keepdims : boolean, optional, default=False

Keep axis remaining after operation with size 1.