Table Of Contents

Previous topic

Installing

Next topic

Troubleshooting

This Page

Tutorial

This tutorial is intended as an introduction to working with Cassandra and phpcassa.

Prerequisites

Before we start, make sure that you have phpcassa installed.

This tutorial also assumes that a Cassandra instance is running on the default host and port. Read the instructions for getting started with Cassandra. You can start Cassandra like so:

$ pwd
~/cassandra
$ bin/cassandra -f

Creating a Keyspace and Column Families

We need to create a keyspace and some column families to work with.

The cassandra-cli utility is included with Cassandra; it allows you to create and modify the schema, explore or modify data, and examine a few things about your cluster. Here’s how to create the keyspace and column family we need for this tutorial:

user@~ $ cassandra-cli
Welcome to cassandra CLI.

Type 'help;' or '?' for help. Type 'quit;' or 'exit;' to quit.
[default@unknown] connect localhost/9160;
Connected to: "Test Cluster" on localhost/9160

[default@unknown] create keyspace Keyspace1
    with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
    and strategy_options = {replication_factor:1};
4f9e42c4-645e-11e0-ad9e-e700f669bcfc
Waiting for schema agreement...
... schemas agree across the cluster

[default@unknown] use Keyspace1;
Authenticated to keyspace: Keyspace1

[default@Keyspace1] create column family ColumnFamily1;
632cf985-645e-11e0-ad9e-e700f669bcfc
Waiting for schema agreement...
... schemas agree across the cluster
[default@Keyspace1] quit;
user@~ $

This connects to a local instance of Cassandra and creates a keyspace named ‘Keyspace1’ with a column family named ‘ColumnFamily1’.

Autoload and Imports

You can bootstrap the PHP autoloader to easily import phpcassa classes by requiring or including lib/phpcassa/autoload.php.

To start the tutorial, require the bootstrap file in one way or another and import a couple of classes:

<?php
    require('lib/phpcassa/autoload.php');

    use phpcassa\ColumnFamily;
    use phpcassa\ColumnSlice;
    use phpcassa\Connection\ConnectionPool;

    // tutorial code starts here
?>

Making a Connection

The first step when working with phpcassa is to create a ConnectionPool which will connect to the Cassandra instance(s):

$pool = new ConnectionPool("Keyspace");

The above code will connect on the default host and port. We can also specify a list of ‘host:port’ combinations like this:

$servers = array("192.168.2.1:9160", "192.168.2.2:9160");
$pool = new ConnectionPool("Keyspace1", $servers);

If omitted, the port defaults to 9160.

Getting a ColumnFamily

A column family is a collection of rows and columns in Cassandra, and can be thought of as roughly the equivalent of a table in a relational database. We’ll use one of the column families that were already included in the schema file:

$column_family = new ColumnFamily($pool, 'ColumnFamily1');

Inserting Data

To insert a row into a column family we can use the ColumnFamily::insert() method:

$column_family->insert('row_key', array('col_name' => 'col_val'));

We can also insert more than one column at a time:

$column_family->insert('row_key', array('name1' => 'val1', 'name2' => 'val2'));

And we can insert more than one row at a time:

$row1 = array('name1' => 'val1', 'name2' => 'val2');
$row2 = array('foo' => 'bar');
$column_family->batch_insert(array('row1' => $row1, 'row2' => $row2);

Getting Data

There are many more ways to get data out of Cassandra than there are to insert data.

The simplest way to get data is to use ColumnFamily::get()

$column_family->get('row_key');
// returns: array('colname' => 'col_val')

Without any other arguments, ColumnFamily::get() returns every column in the row (up to $column_count, which defaults to 100). If you only want a few of the columns and you know them by name, you can specify them using a $column_names argument:

$column_names = array('name1', 'name2');
$column_family->get('row_key', $column_slice=null, $column_names=$column_names);
// returns: array('name1' => 'foo', 'name2' => 'bar')

We may also get a slice (or subrange) or the columns in a row. To do this, we need to make a ColumnSlice object.

The first two parameters are $start and $finish. One or both of these may be left as empty strings to allow the slice to extend to one or both ends of the row.

Assuming we’ve inserted several columns with names ‘1’ through ‘9’, we can do the following:

$slice = new ColumnSlice('5', '7');
$column_family->get('row_key', $slice);
// returns: array('5' => 'foo', '6' => 'bar', '7' => 'baz')

There are also two ways to get multiple rows at the same time. The first is to specify them by name using ColumnFamily::multiget()

$column_family->multiget(['row_key1', 'row_key2']);
// returns: array('row_key1' => array('name' => 'val'), 'row_key2' => array('name' => 'val'))

The other way is to get a range of keys at once by using ColumnFamily::get_range() The parameter $key_finish is also inclusive here, too. Assuming we’ve inserted some rows with keys ‘row_key1’ through ‘row_key9’, we can do this:

$rows = $column_family->get_range($key_start='row_key5', $key_finish='row_key7');
// returns an Iterator over:
// array('row_key5' => array('name' => 'val'),
//       'row_key6' => array('name' => 'val'),
//       'row_key7' => array('name' => 'val'))

foreach($rows as $key => $columns) {
    // Do stuff with $key or $columns
    Print_r($columns);
}

Note

Cassandra must be using an OrderPreservingPartitioner for you to be able to get a meaningful range of rows; the default, RandomPartitioner, stores rows in the order of the MD5 hash of their keys. See http://www.datastax.com/docs/1.0/cluster_architecture/partitioning.

It’s also possible to specify a set of columns or a slice for multiget() and get_range() just like we did for get().

Removing Data

You may remove data from a column family with ColumnFamily::remove().

You can remove an entire row at once:

$column_family->remove('key');

Or a specific set of columns from a row:

$column_family->remove('key', array("col1", "col2");

You cannot remove a slice of columns from a row.

Counting

If you just want to know how many columns are in a row, you can use ColumnFamily::get_count():

$column_family->get_count('row_key');
// returns: 3

If you only want to get a count of the number of columns that are inside of a slice or have particular names, you can do that as well:

$column_names=array('foo', 'bar');
$column_family->get_count('row_key', $column_slice=null, $column_names=$column_names);
// returns: 2

$slice = new ColumnSlice('foo');
$column_family->get_count('row_key', $slice);
// returns: 3

You can also do this in parallel for multiple rows using ColumnFamily::multiget_count():

$column_family->multiget_count(array('fib0', 'fib1', 'fib2', 'fib3', 'fib4'));
// returns: array('fib0' => 1, 'fib1' => 1, 'fib2' => 2, 'fib3' => 3, 'fib4' => 5)
$names = array('col1', 'col2', 'col3');
$column_family->multiget_count(array('fib0', 'fib1', 'fib2', 'fib3', 'fib4'),
                               $column_slice=null, $column_names=$names);
// returns: array('fib0' => 1, 'fib1' => 1, 'fib2' => 2, 'fib3' => 3, 'fib4' => 3)
$slice = new ColumnSlice('col1', 'col3');
$column_family->multiget_count(array('fib0', 'fib1', 'fib2', 'fib3', 'fib4'), $slice);
// returns: array('fib0' => 1, 'fib1' => 1, 'fib2' => 2, 'fib3' => 3, 'fib4' => 3)

Super Columns

Cassandra allows you to group columns in “super columns” when using super column families. You can create a super column family using cassandra-cli like this:

[default@Keyspace1] create column family Super1 with column_type=Super;
632cf985-645e-11e0-ad9e-e700f669bcfc

To use a super columns in phpcassa, you need to create an instance of phpcassa\SuperColumnFamily:

use phpcassa\SuperColumnFamily;


$column_family = new SuperColumnFamily($conn, 'Super1');
$column_family->insert('row_key', array('supercol_name' => array('col_name' => 'col_val')));
$column_family->get('row_key');
// returns: array('supercol_name' => ('col_name' => 'col_val'))
$column_family->remove_super_column('row_key', 'supercolumn_name');

Typed Column Names and Values

In Cassandra, you can specify a comparator type for column names and a validator type for row keys and column values.

There are several types supported by default.

The column name comparator types affect how columns are sorted within a row. You can use these with standard column families as well as with super column families; with super column families, the subcolumns may even have a different comparator type.

Cassandra requires clients to pack these types into a binary format it can understand. When phpcassa sees that a column family uses these types, it knows to pack and unpack these data types automatically. So, if we want to write to the StandardInt column family, we can do the following:

$column_family = new ColumnFamily($conn, 'StandardInt');
$column_family->insert('row_key', array(42 => 'some_val'));
$column_family->get('row_key')
// returns: array(42 => 'some_val')

Notice that 42 is an integer here, not a string.

As mentioned above, Cassandra also offers validators on column values with the same set of types. Validators can be set for an entire column family, for individual columns, or both:

$column_family = new ColumnFamily($pool, 'Users')
$column_family->insert($user_id, array('name' => 'joe', 'age' => 23));
$column_family->get('row_key');
// returns: array('name' => 'joe', 'age' => 23)

Of course, if phpcassa’s automatic behavior isn’t working for you, you can turn it off with ColumnFamily::$autopack_names and the other $autopack attributes.

$column_family = new ColumnFamily($conn, 'ColumnFamily1');
$column_family->$autopack_names=False;
$column_family->$autopack_values=False;

Indexes

Cassandra supports built-in secondary indexes, which allow you to efficiently get only rows which match a certain expression.

In order to use ColumnFamily::get_indexed_slices() , we need to create an IndexClause which contains a list of IndexExpression objects.

Suppose we are only interested in rows where ‘birthdate’ is 1984. We might do the following:

use phpcassa\SystemManager;
use phpcassa\Index\IndexExpression;
use phpcassa\Index\IndexClause;
use phpcassa\Schema\DataType\LongType;

$sys = new SystemManager();
$sys->create_column_family('Keyspace', 'Users');
$sys->create_index('Keyspace', 'Users', 'birthdate', LongType);

$column_family = new ColumnFamily($conn, 'Users');
$column_family->insert('winston', array('birthdate' => 1984));

$index_exp = new IndexExpression('birthdate', 1984);
$index_clause = new IndexClause(array($index_exp));
$rows = $column_family->get_indexed_slices($index_clause);
// returns an Iterator over:
//    array('winston smith' => array('birthdate' => 1984))

foreach($rows as $key => $columns) {
    // Do stuff with $key and $columns
    Print_r($columns);
}

Although at least one IndexExpression in every clause must use an equality match against an indexed column, you may also have other expressions which are apply to non-indexed columns or use comparators other than “EQ”.