Rackspace Contributes Cassandra CQL Driver For Node.js

Filed in Product & Development by Gary Dusbabek | April 25, 2012 9:00 am

One of the great technology enablers of the last decade has been open-source software. It encourages commercial software developers to create better products by fostering processes that cut across companies, utilizing the best talent from each. Additionally, competition from open-source projects creates an ecosystem that gives consumers more choice in the software they use to power their enterprises.

Rackspace supports open-source software development in several ways, the most obvious being OpenStack[1]. Rackspace believes that software should be developed in the open when it makes sense. This post describes some of the work we have been doing within the Node.js[2] and Cassandra[3] communities.

Rackspace has a long history of working within the Cassandra community. Rackspace engineers took part in crucial stages of its development: they helped move Cassandra into the Apache Incubator and were instrumental in growing it into a top-level Apache project. Cassandra is used today by dozens of companies and is one of the leaders in the “NoSQL” space.

The release of Cassandra Query Language (CQL), initially developed by a Rackspace engineer, coincided with the release of Cassandra 0.8. Rackspace subsequently supported substantial development for the Java, Python and Node.js CQL drivers.

CQL aims to be an easier way to read from and write to Cassandra databases. It was designed to closely resemble SQL, easing the burden on developers who are being exposed to Cassandra for the first time.

When Rackspace started developing Cloud Monitoring[4], we decided to use Node.js for our API service endpoints. These services would need to communicate with a Cassandra storage cluster. We made the decision to use CQL even though it was still under active development and no Node.js driver existed for it at the time. We figured it would be easier to implement the driver than to use the existing Cassandra Thrift client. Also, we knew that we would be able to contribute the driver back to the project so that other Node.js users would benefit.

The main driver repository is on GitHub[5]. A mirror is also maintained at Apache Extras[6]. Currently CQL 2 is supported, but we plan to support CQL 3 in the future.

node-cassandra-client Examples

The rest of this post is intended as a node-cassandra-client primer of sorts. It contains examples of how to connect to, update and select results from your Cassandra database in JavaScript using node-cassandra-client.

First we need to create a keyspace. I’m going to use cqlsh, which ships with Cassandra, to do that. I’ve got Cassandra set up to listen on port 19170.

$ cqlsh localhost 19170
cqlsh>

After that, you can paste the following CQL into cqlsh to set up keyspaces and column families.

CREATE KEYSPACE EXAMPLE_KS WITH strategy_class = SimpleStrategy AND \
strategy_options:replication_factor=1;
USE EXAMPLE_KS;

This column family is a simple column family where keys and column names are strings, but column values are expected to be integers:

CREATE COLUMNFAMILY simple_cf (
    KEY ascii PRIMARY KEY
) WITH comparator=text AND default_validation=varint;

Cassandra supports more complex types too. This column family has string keys, UUID column names and binary column values:

CREATE COLUMNFAMILY complex_cf (
   KEY ascii PRIMARY KEY
) WITH comparator=uuid AND default_validation=blob;

Of course, you are free to specify every column as well. This column family specifies string keys with three explicitly typed columns:

CREATE COLUMNFAMILY very_complex_cf (
    KEY ascii PRIMARY KEY,
    string_col text,
    uuid_col uuid,
    int_col varint
);

The remaining examples all use the very_complex_cf column family.

It is easy to connect to a Cassandra cluster using node-cassandra-client. You simply create an options hash, pass it to the Connection constructor and call connect().

var Connection = require('cassandra-client').Connection; 
function doSimpleConnect(callback) {
 // these are the connection parameters you need to connect to Cassandra.
 var connectionOptions = {
   host: '127.0.0.1',
   port: 19170,
   keyspace: 'EXAMPLE_KS',
   use_bigints: false
 };

 var con = new Connection(connectionOptions);
 con.connect(function(err) {
   // if err != null, something bad happened. 
   // else, assume all is good.  your connection is ready to use.
   if (!err) {
     // close the connection and return to caller.
     con.close(callback);
   } else {
     // no need to close, just return to caller.
     callback(err);
   }
 });
}

NOTE: use_bigints is a flag that tells the driver to wrap numerical results in wrapper objects. This enables you to get around numerical limitations in JavaScript[7].
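
To make the limitation concrete, here is a small sketch in plain JavaScript that does not touch the driver at all. It shows where integer precision runs out for ordinary numbers; the helper name fitsInDouble is hypothetical and exists only for this illustration.

```javascript
// JavaScript numbers are IEEE 754 doubles, so integers are exact only below 2^53.
var limit = Math.pow(2, 53);           // 9007199254740992
var collides = (limit + 1 === limit);  // 2^53 + 1 cannot be represented exactly

// Hypothetical helper (not part of node-cassandra-client): reports whether an
// integer value can be held in a plain JavaScript number without losing precision.
function fitsInDouble(n) {
  return typeof n === 'number' && isFinite(n) &&
         Math.floor(n) === n && Math.abs(n) < limit;
}

console.log(collides);                 // true
console.log(fitsInDouble(42));         // true
console.log(fitsInDouble(limit + 2));  // false
```

A varint column can hold values well beyond this range, which is the situation where setting use_bigints to true is worth considering.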

node-cassandra-client also supports connection pools. After establishing a connection pool, the query API is identical to regular connections, so you will not have to worry about whether you are using one or the other. 

var PooledConnection = require('cassandra-client').PooledConnection; 
function doPoolConnect(callback) {
 var conOptions = { hosts: ['127.0.0.1:19170'],
                    keyspace: 'EXAMPLE_KS',
                    use_bigints: false };
 var con = new PooledConnection(conOptions);
 con.shutdown(function() {
   // no error object by design.
   callback();
 });
}

While the query API between Connection and PooledConnection is identical, there are a few differences in construction and tear-down that should be noted:

1) The constructor hash for Connection accepts a single host string and a separate port, while PooledConnection expects a list of host:port strings under hosts.
2) Terminating a Connection is done via Connection.close(), while terminating a PooledConnection is done via PooledConnection.shutdown(). Each method accepts a callback.
3) PooledConnections have no connect() method. You are free to start executing queries as soon as it is constructed.
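
The first difference is easy to trip over when switching between the two. As a rough sketch, a helper like the one below could translate one options hash into the other. The function toPoolOptions is hypothetical and not part of node-cassandra-client; it only rearranges the fields already shown in the examples above.

```javascript
// Hypothetical helper: convert Connection-style options (host + port)
// into PooledConnection-style options (a list of 'host:port' strings).
function toPoolOptions(connectionOptions) {
  return {
    hosts: [connectionOptions.host + ':' + connectionOptions.port],
    keyspace: connectionOptions.keyspace,
    use_bigints: connectionOptions.use_bigints
  };
}

var poolOptions = toPoolOptions({
  host: '127.0.0.1',
  port: 19170,
  keyspace: 'EXAMPLE_KS',
  use_bigints: false
});

console.log(poolOptions.hosts); // [ '127.0.0.1:19170' ]
```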

Everything else is done using the execute() method of either a Connection or a PooledConnection instance. It takes three parameters:

1) a parameterized CQL statement,
2) an array of the parameters that are appropriately typed,
3) a callback accepting (err) for UPDATEs, or (err, rows) for SELECTs.
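
Because the statement and its parameters travel separately, it is easy to pass an array whose length does not match the number of ? placeholders. The sketch below is one way to catch that before calling execute(). Both helpers are hypothetical, and the placeholder count is deliberately naive: it assumes no literal question marks appear inside quoted strings in the CQL.

```javascript
// Hypothetical helper: count the '?' placeholders in a CQL string.
// Naive on purpose: a '?' inside a quoted literal would be counted too.
function countPlaceholders(cql) {
  var matches = cql.match(/\?/g);
  return matches ? matches.length : 0;
}

// Hypothetical helper: return an Error if params does not line up with
// the placeholders, or null if the counts agree.
function checkParams(cql, params) {
  var expected = countPlaceholders(cql);
  if (expected !== params.length) {
    return new Error('expected ' + expected + ' parameters, got ' + params.length);
  }
  return null;
}

var cql = 'UPDATE very_complex_cf SET ?=?, ?=?, ?=? where KEY=?';
var goodParams = ['string_col', 'string_value',
                  'uuid_col', '6f8483b0-65e0-11e0-0000-fe8ebeead9fe',
                  'int_col', 42,
                  'complex_insert_row'];

console.log(countPlaceholders(cql));                // 7
console.log(checkParams(cql, goodParams));          // null
console.log(checkParams(cql, ['string_col']) instanceof Error); // true
```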

Let’s insert some data into very_complex_cf. Some of these examples use the node-async[8] library to avoid deeply nested callbacks.

function doComplexInsert(callback) {
 var conOptions = { hosts: ['127.0.0.1:19170'],
                    keyspace: 'EXAMPLE_KS',
                    use_bigints: false };
 var con = new PooledConnection(conOptions);
 var cql = 'UPDATE very_complex_cf SET ?=?, ?=?, ?=? where KEY=?';
 var params = ['string_col', 'string_value',
               'uuid_col', '6f8483b0-65e0-11e0-0000-fe8ebeead9fe',
               'int_col', 42,
               'complex_insert_row'];
 con.execute(cql, params, function(err) {
   // demonstrates use of a callback.  A simplification would have been:
   // con.execute(cql, params, callback);
   if (err) {
     console.log(err);
   }
   con.shutdown(callback);
 });
}

You can even batch multiple updates in a single statement:

function doComplexBatchInsert(callback) {
 var conOptions = { hosts: ['127.0.0.1:19170'],
                    keyspace: 'EXAMPLE_KS',
                    use_bigints: false }; 
 var con = new PooledConnection(conOptions);
 con.execute('BEGIN BATCH USING CONSISTENCY ONE \
              UPDATE very_complex_cf SET ?=?, ?=?, ?=? where KEY=?; \
              UPDATE very_complex_cf SET ?=?, ?=?, ?=? where KEY=?; \
              UPDATE very_complex_cf SET ?=?, ?=?, ?=? where KEY=?; \
              APPLY BATCH;',
              [ // first row
                'string_col', 'value for string col in row_1',
                'uuid_col', '6f8483b0-65e0-11e0-0000-fe8ebeead9ff',
                'int_col', 25,
                'complex_batch_row_1',
                // second row
                'string_col', 'value for string col in row_2',
                'uuid_col', '6f8483b0-65e0-11e0-0000-fe8ebeeada00',
                'int_col', 26,
                'complex_batch_row_2',
                // third row
                'string_col', 'value for string col in row_3',
                'uuid_col', '6f8483b0-65e0-11e0-0000-fe8ebeeada02',
                'int_col', 27,
                'complex_batch_row_3'
              ],
              function(err) {
                if (err) {
                  console.log(err);
                }
                con.shutdown(callback);
              }
 );
}
}
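
Writing the batch statement and its flat parameter array by hand gets error-prone as the number of rows grows. As a sketch only, the pieces could be generated from a list of row descriptions. The helper buildBatchUpdate and the { key, columns } row shape are hypothetical; the CQL it emits simply follows the same pattern as the hand-written batch above.

```javascript
// Hypothetical helper: build one BATCH statement and the matching flat
// parameter array from a list of { key: ..., columns: { name: value } } rows.
function buildBatchUpdate(columnFamily, rows) {
  var statements = [];
  var params = [];
  rows.forEach(function(row) {
    var names = Object.keys(row.columns);
    // one '?=?' pair per column, as in the hand-written example.
    var assignments = names.map(function() { return '?=?'; }).join(', ');
    statements.push('UPDATE ' + columnFamily + ' SET ' + assignments + ' where KEY=?;');
    names.forEach(function(name) {
      params.push(name, row.columns[name]);
    });
    params.push(row.key);
  });
  var cql = 'BEGIN BATCH USING CONSISTENCY ONE ' + statements.join(' ') + ' APPLY BATCH;';
  return { cql: cql, params: params };
}

var batch = buildBatchUpdate('very_complex_cf', [
  { key: 'row_a', columns: { string_col: 'a', int_col: 1 } },
  { key: 'row_b', columns: { string_col: 'b', int_col: 2 } }
]);

console.log(batch.cql);
console.log(batch.params.length); // 10
```

The resulting batch.cql and batch.params could then be passed to execute() in place of the literal string and array.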

A simple SELECT statement works like this:

function doSelectAll(callback) {
 var conOptions = { hosts: ['127.0.0.1:19170'],
                    keyspace: 'EXAMPLE_KS',
                    use_bigints: false };
 var con = new PooledConnection(conOptions);
 con.execute('SELECT * from very_complex_cf', [], function(err, rows) {
   if (!err) {
     console.log(rows.length); // should be 4 at this point.
     rows.forEach(function(row) {
       console.log(row.key);

       // access column names and values by index.
       for (var i = 0; i < row.colCount(); i++) {
         console.log(row.cols[i].name);
         console.log(row.cols[i].value);
       }

       // access column values by hash.
       console.log(row.colHash['string_col']); // it's a string        
       // an instance of require('cassandra-client').UUID
       console.log(row.colHash['uuid_col']);
       console.log(row.colHash['int_col']); // it's a number.
     });
   }
   con.shutdown(callback);
 });
}

The CQL could have been altered to return a single row or individual columns:

function doSelectiveSelect(callback) {
 var conOptions = { hosts: ['127.0.0.1:19170'],
                    keyspace: 'EXAMPLE_KS',
                    use_bigints: false };
 var con = new PooledConnection(conOptions);
 con.execute(
   'SELECT uuid_col FROM very_complex_cf WHERE KEY=?',
   ['complex_batch_row_1'],
   function(err, rows) {
     if (!err) {
       // rows.length should == 1.
       var row = rows[0];
       console.log(row.key);
       console.log(row.colCount()); // should be 1 also.
       // getting the values out is an exercise for the reader.
     }
     con.shutdown(callback);
 });
}

node-cassandra-client also has support for UUID and binary column types. Using the complex_cf column family:

var async = require('async');
function doComplexColumnNames(callback) {
 var conOptions = { hosts: ['127.0.0.1:19170'],
                    keyspace: 'EXAMPLE_KS',
                    use_bigints: false };
 var con = new PooledConnection(conOptions);
 var updateArgs = ['6f8483b0-65e0-11e0-0000-fe8ebeead9ff',
                   new Buffer([1,2,3,4,5]),
                   'uuid_row_1'];
 async.series([
   function insert(callback) {
     con.execute('UPDATE complex_cf set ?=? where KEY=?', updateArgs, callback);
   },
   function select(callback) {
     con.execute('SELECT * from complex_cf WHERE KEY=?', ['uuid_row_1'], 
     function(err, rows) {
       if (!err) {
         // rows.length should == 1.

         // first access the col by index (when you know it).
         // col name should be an instance of UUID.
         console.log(rows[0].cols[0].name);
         // col value should be <Buffer 01 02 03 04 05>
         console.log(rows[0].cols[0].value);

         // if you don't know the col index, you need to use the col hash.
         // of course, this assumes you know the column name.
         // in this case, it should point to the same buffer already referenced.
         console.log(rows[0].colHash['6f8483b0-65e0-11e0-0000-fe8ebeead9ff']);
       }
       callback(err);
     });
   }
 ],
 function(err) {
   if (err) {
     console.log(err);
   }
   con.shutdown(callback);
 });
}

As demonstrated, when working with results from a SELECT query, you can access columns by index or by column name. If you access columns by name, you need to use the string version of the column name.
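
The sketch below illustrates the two access paths against a hand-built stand-in for a result row, so it runs without a Cassandra cluster. FakeUUID, the mock row and valueByName are all hypothetical; they only mimic the cols and colHash shape used in the examples above and are not the driver's own classes.

```javascript
// Stand-in for a non-string column name type (NOT the driver's UUID class).
function FakeUUID(str) { this.str = str; }
FakeUUID.prototype.toString = function() { return this.str; };

// Hand-built mock of a row with one column, shaped like the rows above.
var colName = new FakeUUID('6f8483b0-65e0-11e0-0000-fe8ebeead9ff');
var row = {
  key: 'uuid_row_1',
  cols: [{ name: colName, value: 'payload' }],
  colHash: {}
};
row.colHash[colName.toString()] = row.cols[0].value;

// Hypothetical helper: look a column up by name, converting the name
// to its string form first.
function valueByName(row, columnName) {
  return row.colHash[String(columnName)];
}

var byIndex = row.cols[0].value;
var byName = valueByName(row, colName);
var byLiteral = valueByName(row, '6f8483b0-65e0-11e0-0000-fe8ebeead9ff');

console.log(byIndex === byName);    // true
console.log(byIndex === byLiteral); // true
```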

NOTE: All the code used in these examples can be found in this gist[9].

Conclusion

Rackspace understands open-source development. Development on node-cassandra-client is ongoing and patches are always welcome. In fact, we regularly push new releases to the npm registry[10].

If you think this kind of work would be challenging and fun, and enjoy contributing to open-source projects, we are always looking for talented engineers[11].

Endnotes:
  1. OpenStack: http://www.openstack.org/
  2. Node.js: http://nodejs.org/
  3. Cassandra: http://cassandra.apache.org/
  4. Cloud Monitoring: http://../../cloud/cloud_hosting_products/monitoring/
  5. Github: https://github.com/racker/node-cassandra-client
  6. Apache Extras: http://code.google.com/a/apache-extras.org/p/cassandra-node/
  7. numerical limitations in Javascript: http://www.jwz.org/blog/2010/10/every-day-i-learn-something-new-and-stupid/
  8. node-async: https://github.com/caolan/async
  9. this gist: https://gist.github.com/2369391
  10. npm registry: http://search.npmjs.org/#/cassandra-client
  11. talented engineers: http://jobs.rackspace.com/search?q=developer

Source URL: http://www.rackspace.com/blog/rackspace-contributes-cassandra-cql-driver-for-node-js/