CQL with Cassandra Driver for NodeJS

I’ve recently been trying to carve out some time to learn JavaScript and Node.  My background is firmly in the classroom of Java and the Enterprise, with recent years focused on NoSQL technology.  Having finally decided to go and see what all the noise in the playground of JavaScript is all about I decided to share some sample code using the DataStax Cassandra driver for NodeJS.  Here is what I learnt.

Here are the CQL statements you should be able to run directly in cqlsh for this exercise:

DROP KEYSPACE IF EXISTS nodespace;

CREATE KEYSPACE IF NOT EXISTS nodespace WITH replication = {'class' : 'SimpleStrategy', 'replication_factor' : 1};

CREATE TABLE nodespace.sample (
    primaryRowKey text,
    secondaryRowKey text,
    primaryColumnKey timeuuid,
    secondaryColumnKey text,
    atime timestamp,
    aboolean boolean,
    abigint bigint,
    auuid uuid,
    adouble double,
    adecimal decimal,
    aset set,
    amap map <text,text>,
    alist list,
    ablob blob,
    PRIMARY KEY ((primaryRowKey, secondaryRowKey), primaryColumnKey, secondaryColumnKey)
);

INSERT INTO nodespace.sample (
    primaryRowKey, secondaryRowKey,
    primaryColumnKey, secondaryColumnKey,
    atime, aboolean, abigint,
    auuid, adouble, adecimal,
    aset, amap, alist
) VALUES (
    'row1','part1',
    now(),'group1',
    dateof(now()),true,876543210987654321,
    uuid(),1.5,1.1234567890123456789,
    {'abc', 'def', 'ghi'}, {'key1': 'value1', 'key2': 'value2'}, ['xxx', 'jjj', 'ttt']
);

SELECT * FROM nodespace.sample;

The aim of the exercise was to convert the above into a NodeJS process where values for all the data come from JavaScript objects in a prepared statement.

Step 1) Connect through the driver to our localhost instance.

Getting a connection, creating a keyspace, creating a column family, were all straight forward enough.  Following the documentation I was able to embed this sequence into the code such that I could evolve the scope in a repeatable way.  Here is the connection call for brevity (assuming you’ve already run the CQL above to create the keyspace and column family this should be enough):

var cassandra = require('cassandra-driver');
var client = new cassandra.Client({contactPoints: ['localhost']});
 
client.connect(function (err, result) {
    ....
});

Step 2) Create a prepared statement to insert our row of data:

    var cqlInsertSampleData =
        "INSERT INTO nodespace.sample (" +
        "   primaryRowKey, secondaryRowKey, " +
        "   primaryColumnKey, secondaryColumnKey," +
        "   atime, aboolean, abigint," +
        "   auuid, adouble, adecimal," +
        "   aset, amap, alist" +
        ") VALUES (?,?, ?,?, ?,?,?, ?,?,?, ?,?,?)";
 
    var aset = ["abc", "def", "ghi"];
    var alist = ["xxx", "jjj", "ttt"];
    var amap = {"key1": "value1", "key2": "value2"};
 
    client.execute(
        cqlInsertSampleData,
        [   'row1', 'part1', cassandra.types.timeuuid(), 'columnGroup',
            new Date(), true, new cassandra.types.Long.fromString("876543210987654321"),
            cassandra.types.uuid(), 1.5, 1.1234567890123456789,
            aset, amap, alist
        ],
        {hints: [
            null, null, null, null,
            null, null, null,
            null, null, null,
            null, 'map', null
        ]},
        function (err, result) {
            if (err) {
                console.log('Insert Sample Data Failed: ' + JSON.stringify(err));
                // needed to add this to get the error message as sometimes stringify didn't include it
                console.log('Insert Sample Data Failed: ' + err.message);
            } else {
                console.log('Inserted Sample Data: ' + JSON.stringify(result));
                ...
            }
        }
    );

The optional ‘hints‘ are only necessary to tell the driver where the object is type ‘map‘. Without this hint (or indeed the entire ‘options’ argument) the driver blows with an error like this: Target data type could not be guessed, you must specify a hint for value…. It took a while to find this out because the JSON.stringify() didn’t output the ‘result.message’ properly (for some reason I’m not yet aware of), and because the sample code for NodeJS driver usage doesn’t go into handling of maps.  Some legacy code suggests you can inline this hint but that doesn’t work either.

Step 3)  Finally we can select the data back and convert the incoming results back into JavaScript objects

    var cqlSelectSampleData =
        "SELECT * FROM nodespace.sample " +
        "WHERE primaryRowKey = ? and secondaryRowKey = ?";
 
    client.execute(
        cqlSelectSampleData, ['row1','part1'],
        function (err, result) {
            if (err) {
                console.log('Select Sample Data Failed: ' + JSON.stringify(err));
                console.log('Select Sample Data Failed: ' + err.message);
            } else {
                console.log('Selected Sample Data: ' + JSON.stringify(result.rows[0]));
                // notice the names are all lowercase
                console.log('primaryRowKey      : ' + result.rows[0]['primaryrowkey']);
                console.log('secondaryRowKey    : ' + result.rows[0]['secondaryrowkey']);
                console.log('primaryColumnKey   : ' + result.rows[0]['primarycolumnkey']);
                console.log('secondaryColumnKey : ' + result.rows[0]['secondarycolumnkey']);
 
                console.log('atime              : ' + new Date(result.rows[0]['atime']));
                console.log('aboolean           : ' + result.rows[0]['aboolean']);
                console.log('abigint            : ' + result.rows[0]['abigint']);
 
                console.log('auuid              : ' + result.rows[0]['auuid']);
                console.log('adecimal           : ' + result.rows[0]['adecimal'].readDoubleBE(0));
                console.log('adouble            : ' + result.rows[0]['adouble']);
 
                console.log('aset               : ' + result.rows[0]['aset']);
                console.log('amap               : ' + JSON.stringify(result.rows[0]['amap']));
                console.log('alist              : ' + result.rows[0]['alist']);
                process.exit(0);
            }
        }
    );

One last trick…

Cassandra needs a ‘design for querying’ data model. It is therefore important to understand enough about how your data is partitioned and stored to get the best from Cassandra.

Here is the result of a call to list the data we have inserted above using the cassandra-cli tool. This tool will be deprecated in Cassandra v3 but I’ve found it very handy for showing the structure under the hood while migrating from our old Thrift code to CQL.

~/.../cassandra/bin > ./cassandra-cli
Connected to: "Test Cluster" on 127.0.0.1/9160
Welcome to Cassandra CLI version 2.0.7
...
[default@unknown] use nodespace;
Authenticated to keyspace: nodespace
[default@nodespace] list sample;
Using default limit of 100
Using default cell limit of 100
-------------------
RowKey: row1:part1
=> (name=d3e07fe0-71a7-11e4-b8d7-e5dfff278e51:columnGroup:, value=, ...)
=> (name=d3e07fe0-71a7-11e4-b8d7-e5dfff278e51:columnGroup:abigint, value=0c2a1b6801330cb1, ...)
=> (name=d3e07fe0-71a7-11e4-b8d7-e5dfff278e51:columnGroup:aboolean, value=01, ...)
=> (name=d3e07fe0-71a7-11e4-b8d7-e5dfff278e51:columnGroup:adecimal, value=3ff1f9add3746f66, ...)
=> (name=d3e07fe0-71a7-11e4-b8d7-e5dfff278e51:columnGroup:adouble, value=3ff8000000000000, ...)
=> (name=d3e07fe0-71a7-11e4-b8d7-e5dfff278e51:columnGroup:alist:d3e11c2071a711e4814d25b14fe8537a, value=787878, ...)
=> (name=d3e07fe0-71a7-11e4-b8d7-e5dfff278e51:columnGroup:alist:d3e11c2171a711e4814d25b14fe8537a, value=6a6a6a, ...)
=> (name=d3e07fe0-71a7-11e4-b8d7-e5dfff278e51:columnGroup:alist:d3e11c2271a711e4814d25b14fe8537a, value=747474, ...)
=> (name=d3e07fe0-71a7-11e4-b8d7-e5dfff278e51:columnGroup:amap:6b657931, value=76616c756531, ...)
=> (name=d3e07fe0-71a7-11e4-b8d7-e5dfff278e51:columnGroup:amap:6b657932, value=76616c756532, ...)
=> (name=d3e07fe0-71a7-11e4-b8d7-e5dfff278e51:columnGroup:aset:616263, value=, ...)
=> (name=d3e07fe0-71a7-11e4-b8d7-e5dfff278e51:columnGroup:aset:646566, value=, ...)
=> (name=d3e07fe0-71a7-11e4-b8d7-e5dfff278e51:columnGroup:aset:676869, value=, ...)
=> (name=d3e07fe0-71a7-11e4-b8d7-e5dfff278e51:columnGroup:atime, value=00000149d38036df, ...)
=> (name=d3e07fe0-71a7-11e4-b8d7-e5dfff278e51:columnGroup:auuid, value=d439e4257f95494a84f727b9a93abb3a, ...)

I deliberately used purposeful naming to help illustrate how your PRIMARY KEY alters the storage and should be optimised for query performance.

My primaryRowKey and secondaryRowKey are concatenated together (see RowKey: row1:part1). This means all data associated with that rowKey will be stored together on the same Cassandra cluster nodes. All columns meanwhile, start with the concatenation of my primaryColumnKey (a timeuuid value) and the secondaryColumnKey (columnGroup). This allows you to query the data by time based range querying. Lastly, notice how the other columns are storing data, especially the collections that get broken out into columns.

There are better and more comprehensive materials on this subject but I thought it might be helpful to provide this little trick for viewing your data model under the hood.

The full code for this exercise is available from my GitHub node-sandpit repository:
GitHub Code: CQL with Cassandra Driver for NodeJS

It's only fair to share...Tweet about this on TwitterShare on LinkedInEmail this to someone

Leave a Reply

Your email address will not be published. Required fields are marked *