!!! Cassandra

Free-form notes from a small study / experiment with NoSQL.

[{TableOfContents }]


!! Resources
* [C* 2012: Building a Cassandra Based Application from Scratch (Patrick McFadin, Hobsons)|http://www.youtube.com/watch?v=myka6Elo-dM]
* [Introduction to NoSQL by Martin Fowler|http://www.youtube.com/watch?v=qI_g07C_Q5I]
* [http://cassandra.apache.org/]
* [DataStax Documentation|http://www.datastax.com/docs/1.2/index]
!! Install/config

! lxc

* /etc/default/lxc => change subnet from 10.0.3 to 10.0.4 (10.0.3 is already in use somewhere else)
* add static route in wireless router (10.0.4.0/24 => via 10.0.0.164)
* adjust {{ /etc/network/interfaces }} of container to:
{{{
auto eth0
#iface eth0 inet dhcp
iface eth0 inet static
        address 10.0.4.10
        netmask 255.255.255.0
        network 10.0.4.0
        broadcast 10.0.4.255
        gateway 10.0.4.1
        post-up route add default gw 10.0.4.1 dev eth0
        # dns-* options are implemented by the resolvconf package, if installed
        dns-nameservers 213.197.28.3 213.197.30.28
        dns-search computerhok.nl
}}}
* lxc-host ubuntu1 => 10.0.4.11 (user=ubuntu, password=ubuntu)
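
A static stanza like the one above is easy to get subtly wrong (address outside the subnet, wrong broadcast). The following is a minimal sketch that cross-checks the values with Python's standard {{ipaddress}} module; the function name {{check_static_config}} is made up for this page.

```python
import ipaddress


def check_static_config(address, netmask, network, broadcast, gateway):
    """Return a list of problems found; an empty list means the values agree."""
    # Raises ValueError if 'network' has host bits set for this netmask.
    net = ipaddress.ip_network(f"{network}/{netmask}")
    problems = []
    if ipaddress.ip_address(address) not in net:
        problems.append(f"address {address} is outside {net}")
    if ipaddress.ip_address(gateway) not in net:
        problems.append(f"gateway {gateway} is outside {net}")
    if ipaddress.ip_address(broadcast) != net.broadcast_address:
        problems.append(f"broadcast should be {net.broadcast_address}")
    return problems


if __name__ == "__main__":
    # Values copied from the interfaces stanza of the container.
    result = check_static_config(
        address="10.0.4.10",
        netmask="255.255.255.0",
        network="10.0.4.0",
        broadcast="10.0.4.255",
        gateway="10.0.4.1",
    )
    print(result or "configuration is consistent")
```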


! cassandra

* create the user: {{useradd cssndra && mkdir /home/cssndra && chown -R cssndra /home/cssndra}}, then change the login shell of cssndra in {{/etc/passwd}} from sh to bash
* wget 'http://mirrors.supportex.net/apache/cassandra/1.2.4/apache-cassandra-1.2.4-bin.tar.gz'
* apt-get install openjdk-7-jdk
* sudo mkdir /opt/apache-cassandra-1.2.4 && sudo chown cssndra /opt/apache-cassandra-1.2.4 && sudo ln -s /opt/apache-cassandra-1.2.4 /opt/cassandra
* cssndra@ubuntu1:~$ cd /opt && tar -xzf ~/apache-cassandra-1.2.4-bin.tar.gz
* ls -l
{{{
cssndra@ubuntu1:/opt/cassandra$ ls -l
total 248
-rw-r--r-- 1 cssndra cssndra 152928 Apr  8 19:21 CHANGES.txt
-rw-r--r-- 1 cssndra cssndra  11609 Apr  8 19:21 LICENSE.txt
-rw-r--r-- 1 cssndra cssndra  47580 Apr  8 19:21 NEWS.txt
-rw-r--r-- 1 cssndra cssndra   1820 Apr  8 19:21 NOTICE.txt
-rw-r--r-- 1 cssndra cssndra   3569 Apr  8 19:21 README.txt
drwxr-xr-x 2 cssndra cssndra   4096 May 15 21:43 bin
drwxr-xr-x 2 cssndra cssndra   4096 May 15 21:43 conf
drwxr-xr-x 2 cssndra cssndra   4096 May 15 21:43 interface
drwxr-xr-x 4 cssndra cssndra   4096 May 15 21:43 javadoc
drwxr-xr-x 3 cssndra cssndra   4096 May 15 21:43 lib
drwxr-xr-x 3 cssndra cssndra   4096 May 15 21:43 pylib
drwxr-xr-x 4 cssndra cssndra   4096 Apr  8 19:21 tools
}}}
* sudo mkdir -p /var/lib/cassandra/data /var/lib/cassandra/commitlog && sudo chown -R cssndra /var/lib/cassandra
* sudo mkdir /var/log/cassandra/ && sudo chown -R cssndra /var/log/cassandra    
* limit the heap size by editing {{./conf/cassandra-env.sh}}: set ''MAX_HEAP_SIZE="512M"'' and ''HEAP_NEWSIZE="100M"''
* edit {{./conf/cassandra.yaml}}: 
** change the ''listen_address'' to 10.0.4.11 , necessary for multinode cluster communication (as OS hostname/ip is not properly configured)
** change the ''rpc_address'' to 10.0.4.11 , necessary to make it reachable from non-localhost (as OS hostname/ip is not properly configured)
** change the seeds parameter from 127.0.0.1 to 10.0.4.11 (this first node becomes the seed node for all nodes)
** configure ''endpoint_snitch: GossipingPropertyFileSnitch''
* edit {{./conf/log4j-server.properties}} : remove logging to stdout
* edit {{./conf/cassandra-rackdc.properties}} : see [Cassandra#Cluster config]
* edit {{./conf/cassandra-env.sh}} (at the bottom of the file): uncomment and fill in ''JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=10.0.4.12"'' %%small (this makes it possible to run nodetool against remote hosts; the value is presumably meant to be each node's own address, so 10.0.4.11 on ubuntu1) %%
* ==> now first clone the VM : ''lxc-clone -o ubuntu1 -n ubuntu2'' , ubuntu2 has address 10.0.4.12
* continue with the first node and start Cassandra there in the foreground with: ''./bin/cassandra -f''
* add keyspace:
%%small {{{
cssndra@ubuntu1:/opt/cassandra$ ./bin/cassandra-cli
Connected to: "Test Cluster" on 127.0.0.1/9160
Welcome to Cassandra CLI version 1.2.4

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown] create keyspace DEMO with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = {replication_factor:1}; 
f243abc2-57dc-32b4-9390-beac4e988c5b
[default@unknown] use DEMO;
Authenticated to keyspace: DEMO
[default@DEMO] create column family Users with key_validation_class = 'UTF8Type' and comparator = 'UTF8Type' and default_validation_class = 'UTF8Type';
bb07d315-a824-3bbb-9b48-d759288e57b4
[default@DEMO] set Users[1234][name] = scott;
Value inserted.
Elapsed time: 63 msec(s).
[default@DEMO] set Users[1234][password] = tiger;
Value inserted.
Elapsed time: 3.96 msec(s).
[default@DEMO] get Users[1234];
=> (column=name, value=scott, timestamp=1368679313136000)
=> (column=password, value=tiger, timestamp=1368679322745000)
Returned 2 results.
Elapsed time: 58 msec(s).
[default@DEMO] 
}}} %%
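
Because ubuntu2 is a clone of ubuntu1, the address-dependent settings from the list above (''listen_address'', ''rpc_address'', the JMX hostname) presumably have to be adjusted on the clone, while the seed list stays the same. A minimal sketch of that per-node set of values; the function name and the dictionary keys are made up for this page, and nothing here edits any file:

```python
def node_overrides(node_ip, seed_ips):
    """Return the settings that depend on the node's own address.

    Assumption: the JMX hostname should be the node's own IP, just like
    listen_address and rpc_address. The seed list is identical on all nodes.
    """
    return {
        "listen_address": node_ip,
        "rpc_address": node_ip,
        "seeds": ",".join(seed_ips),
        "jvm_opt": f"-Djava.rmi.server.hostname={node_ip}",
    }


if __name__ == "__main__":
    seeds = ["10.0.4.11"]  # the first node acts as seed
    for ip in ("10.0.4.11", "10.0.4.12"):
        print(ip, node_overrides(ip, seeds))
```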

Now on the second node (ubuntu2), start Cassandra with ''/opt/cassandra/bin/cassandra'' and start building the cluster.
First calculate the tokens for a 4-node cluster with the following Python script:
%%prettify
{{{
# Number of nodes in the cluster
num_node = 4

for n in range(num_node):
    # Integer division keeps the result exact; print() works on Python 2 and 3
    print(2**127 // num_node * n)
}}}
%%
And execute it :
{{{
cssndra@ubuntu1:~$ python calcToken.py 
0
42535295865117307932921825928971026432
85070591730234615865843651857942052864
127605887595351923798765477786913079296
}}}
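
The tokens above span 0 to 2**127, which is the RandomPartitioner range. As far as I know, the {{cassandra.yaml}} shipped with 1.2 defaults to Murmur3Partitioner, whose tokens run from -2**63 to 2**63-1, so check the ''partitioner'' setting before using either list. A sketch for both ranges, with made-up function names:

```python
def random_partitioner_tokens(num_nodes):
    """Evenly spaced tokens in [0, 2**127), as in calcToken.py."""
    return [2**127 // num_nodes * n for n in range(num_nodes)]


def murmur3_tokens(num_nodes):
    """Evenly spaced tokens in [-2**63, 2**63).

    Assumption: the cluster uses Murmur3Partitioner.
    """
    return [2**64 // num_nodes * n - 2**63 for n in range(num_nodes)]


if __name__ == "__main__":
    for token in murmur3_tokens(4):
        print(token)
```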

Then start Cassandra on all 4 nodes by running the following from the LXC host, once per node address: ''ssh ubuntu@10.0.4.11 'sudo su - cssndra /opt/cassandra/bin/cassandra' ''

!! Cluster config

||IP||DC||RACK||seeder
|10.0.4.11|DC1|RAC1|Y
|10.0.4.12|DC1|RAC2|N
|10.0.4.13|DC2|RAC1|Y
|10.0.4.14|DC2|RAC2|N
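
With GossipingPropertyFileSnitch each node states its own data center and rack in {{cassandra-rackdc.properties}} using the keys ''dc'' and ''rack''. The sketch below derives that file content and the shared seed list from the table above; the {{NODES}} structure and the function names are made up for this page.

```python
# (ip, data center, rack, is_seed) copied from the table above.
NODES = [
    ("10.0.4.11", "DC1", "RAC1", True),
    ("10.0.4.12", "DC1", "RAC2", False),
    ("10.0.4.13", "DC2", "RAC1", True),
    ("10.0.4.14", "DC2", "RAC2", False),
]


def rackdc_properties(dc, rack):
    """Content of conf/cassandra-rackdc.properties for one node."""
    return f"dc={dc}\nrack={rack}\n"


def seed_list(nodes):
    """Comma-separated seeds value, identical on every node."""
    return ",".join(ip for ip, _dc, _rack, is_seed in nodes if is_seed)


if __name__ == "__main__":
    print("seeds:", seed_list(NODES))
    for ip, dc, rack, _is_seed in NODES:
        print(f"--- {ip} ---")
        print(rackdc_properties(dc, rack), end="")
```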

!! Creating keyspace, tables, inserting, updating , querying

! Create keyspace

First create a keyspace. You can do that both with ''cassandra-cli'' and ''cqlsh'', but they have different syntaxes :-) .\\
Here's a cqlsh example:
{{{
cssndra@ubuntu1:~$ cqlsh 
Connected to Test Cluster at localhost:9160.
[cqlsh 2.3.0 | Cassandra 1.2.4 | CQL spec 3.0.0 | Thrift protocol 19.35.0]
Use HELP for help.
cqlsh> CREATE KEYSPACE demo_keyspace WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 2, 'DC2' : 2};
cqlsh> select * from system.schema_keyspaces;

 keyspace_name | durable_writes | strategy_class                                       | strategy_options
---------------+----------------+------------------------------------------------------+----------------------------
   system_auth |           True |          org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"1"}
 demo_keyspace |           True | org.apache.cassandra.locator.NetworkTopologyStrategy |      {"DC2":"2","DC1":"2"}
        system |           True |           org.apache.cassandra.locator.LocalStrategy |                         {}
 system_traces |           True |          org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"1"}

cqlsh> 
}}}
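
The replica counts per data center in the statement above should not exceed the number of nodes in that data center (2 per DC in this setup). A minimal sketch that builds the same statement from a dictionary and flags oversized counts; both function names are made up for this page, and the statement is only printed, not executed.

```python
def create_keyspace_cql(name, replicas_per_dc):
    """Build a CQL 3 CREATE KEYSPACE statement for NetworkTopologyStrategy."""
    options = ", ".join(f"'{dc}' : {rf}" for dc, rf in replicas_per_dc.items())
    return (
        f"CREATE KEYSPACE {name} WITH REPLICATION = "
        f"{{ 'class' : 'NetworkTopologyStrategy', {options} }};"
    )


def oversized_dcs(replicas_per_dc, nodes_per_dc):
    """Return the data centers whose replica count exceeds their node count."""
    return [dc for dc, rf in replicas_per_dc.items() if rf > nodes_per_dc.get(dc, 0)]


if __name__ == "__main__":
    replicas = {"DC1": 2, "DC2": 2}
    nodes = {"DC1": 2, "DC2": 2}  # from the cluster config table
    print(create_keyspace_cql("demo_keyspace", replicas))
    print("oversized:", oversized_dcs(replicas, nodes))
```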

! Create a column family

Create a column family with the ''cassandra-cli'' utility:
{{{
cssndra@ubuntu1:~$ cassandra-cli
Connected to: "Test Cluster" on 127.0.0.1/9160
Welcome to Cassandra CLI version 1.2.4

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown] use demo_keyspace;
Authenticated to keyspace: demo_keyspace
[default@demo_keyspace] create column family users with key_validation_class = 'UTF8Type' and comparator = 'UTF8Type' and default_validation_class = 'UTF8Type';
0b6b0010-fc89-35a1-ad05-77d53e5a4443
}}}

! Insert data

Again with the ''cassandra-cli'' utility, insert some data into the {{users}} column family:
{{{
cssndra@ubuntu1:~$ cassandra-cli
Connected to: "Test Cluster" on 127.0.0.1/9160
Welcome to Cassandra CLI version 1.2.4

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown] use demo_keyspace;
Authenticated to keyspace: demo_keyspace
[default@demo_keyspace] set users[1234][name] = scott;
Value inserted.
Elapsed time: 52 msec(s).
[default@demo_keyspace] set users[1234][password] = scott-secret;
Value inserted.
Elapsed time: 14 msec(s).
[default@demo_keyspace] set users[1234][length] = 185;
Value inserted.
Elapsed time: 8.76 msec(s).
[default@demo_keyspace] set users[1235][name] = harry;
Value inserted.
Elapsed time: 6.71 msec(s).
[default@demo_keyspace] set users[1235][length] = 181;
Value inserted.
Elapsed time: 13 msec(s).
[default@demo_keyspace] set users[1235][whatevercolumn] = skfkjdkfjdklsjfsjflkjldk181;
Value inserted.
Elapsed time: 6.2 msec(s).
[default@demo_keyspace] list users;
Using default limit of 100
Using default column limit of 100
-------------------
RowKey: 1234
=> (column=length, value=185, timestamp=1368883341707000)
=> (column=name, value=scott, timestamp=1368883316118000)
=> (column=password, value=scott-secret, timestamp=1368883330142000)
-------------------
RowKey: 1235
=> (column=length, value=181, timestamp=1368883368497000)
=> (column=name, value=harry, timestamp=1368883358461000)
=> (column=whatevercolumn, value=skfkjdkfjdklsjfsjflkjldk181, timestamp=1368883385475000)

2 Rows Returned.
Elapsed time: 42 msec(s).
}}}



And insert with ''cqlsh -2'' (the ''-2'' flag selects CQL 2 mode, as the banner shows):
{{{
cssndra@ubuntu1:~$ cqlsh -2
Connected to Test Cluster at localhost:9160.
[cqlsh 2.3.0 | Cassandra 0.0.0 | CQL spec 2.0.0 | Thrift protocol 19.35.0]
Use HELP for help.
cqlsh> USE demo_keyspace ;
cqlsh:demo_keyspace> INSERT INTO users ( key, name, password, whatevercolumn) VALUES ( '12345' , 'harry' , 'wachtwoordje' , 'blablabla fjkdsjf l4j 2k43u');
cqlsh:demo_keyspace> 
}}}


And a quick-and-dirty shell script to insert bulk data:
%%prettify
{{{
#!/bin/bash
#
# Generates NUM_ROWS INSERT statements and feeds them to cqlsh in CQL 2 mode.
num=${1:?usage: $0 NUM_ROWS}
n=0
TMPFILE=/tmp/$RANDOM.cql
echo "use demo_keyspace;" > "$TMPFILE"
while [ "$n" -lt "$num" ]
do
 # one row per iteration, the key is 99 followed by the counter
 CQL="INSERT INTO users ( key, name, password, whatevercolumn) VALUES ( '99${n}' , 'harry${n}' , 'wachtwoordje${n}' , 'blablabla fjkdsjf ${n} ${n} ${n}2k43u');"
 echo "$CQL" >> "$TMPFILE"
 n=$((n+1))
done

echo "echoing inserts to cqlsh..."
cqlsh -2 -f "$TMPFILE"

echo "listing users..."
cat <<EOF | cassandra-cli
 use demo_keyspace;
 list users;
EOF
rm "$TMPFILE"
}}}
%%
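
The same statements can also be generated from Python, which avoids shell quoting issues. A minimal sketch under the same assumptions as the shell script (keyspace ''demo_keyspace'', column family ''users''); the function name is made up, and the output is meant to be saved to a file and passed to ''cqlsh -2 -f'' by hand.

```python
import sys


def insert_statements(num):
    """Yield the USE statement followed by num INSERT statements."""
    yield "use demo_keyspace;"
    for n in range(num):
        yield (
            "INSERT INTO users ( key, name, password, whatevercolumn) "
            f"VALUES ( '99{n}' , 'harry{n}' , 'wachtwoordje{n}' , "
            f"'blablabla fjkdsjf {n} {n} {n}2k43u');"
        )


if __name__ == "__main__":
    # Number of rows comes from the first argument, defaulting to 10.
    count = int(sys.argv[1]) if len(sys.argv) > 1 else 10
    for statement in insert_statements(count):
        print(statement)
```
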
!! Cassandra notes/questions

! Questions

* Can I share a Cassandra cluster between multiple applications while still having some form of (security) separation? (Like having multiple databases in MySQL and arranging access to them with grants.)
* Security in general: how is gossip protected, and how do you prevent unauthorized ("illegal") nodes from joining the cluster?
* Security: how is access control arranged, and at what level?
* How to change replica settings?
** You set the number of replicas when you create a keyspace using the replica placement strategy.
** To change it afterwards, run through cqlsh: ''ALTER KEYSPACE "Excalibur" WITH REPLICATION =  { 'class' : 'SimpleStrategy', 'replication_factor' : 3 }; ''
** On each affected node, run nodetool repair. Wait until repair completes on a node before moving to the next node.
** Also see [http://www.datastax.com/docs/1.2/cql_cli/using/keyspace]
* How to take down (phase out) a node in a controlled way?
* Which snitch to use?
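
The "repair one node at a time" step under the replica settings question can be sketched as follows. This assumes ''nodetool'' is on the PATH and that JMX is reachable on each host; the function names are made up for this page.

```python
import subprocess


def repair_command(host):
    """Argument list for running nodetool repair against one host."""
    return ["nodetool", "-h", host, "repair"]


def repair_all(hosts, run=subprocess.run):
    """Repair the hosts strictly one after another.

    subprocess.run blocks until the command exits, and check=True raises
    on a non-zero exit code, so a failed repair stops the loop.
    """
    for host in hosts:
        run(repair_command(host), check=True)


if __name__ == "__main__":
    repair_all(["10.0.4.11", "10.0.4.12", "10.0.4.13", "10.0.4.14"])
```
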
! Notes

* Every node should have the same list of seeds. In multiple data-center clusters, the seed list should include a node from each data center.
* Use NetworkTopologyStrategy when you have (or plan to have) your cluster deployed __across multiple data centers__
* Use vnodes, see ''num_tokens'' in cassandra.yaml and [http://wiki.apache.org/cassandra/Operations]