Cassandra#
Free-form notes from a small study / experiment with NoSQL.
Resources#
- C* 2012: Building a Cassandra Based Application from Scratch (Patrick McFadin, Hobsons)
- Introduction to NoSQL by Martin Fowler
- http://cassandra.apache.org/
- Datastax Documentation
Install/config#
lxc#
- /etc/default/lxc => change subnet from 10.0.3 to 10.0.4 (10.0.3 is already in use somewhere else)
- add a static route in the wireless router (10.0.4.0/24 => via 10.0.0.164)
- adjust /etc/network/interfaces of container to:
auto eth0
#iface eth0 inet dhcp
iface eth0 inet static
address 10.0.4.10
netmask 255.255.255.0
network 10.0.4.0
broadcast 10.0.4.255
gateway 10.0.4.1
post-up route add default gw 10.0.4.1 dev eth0
# dns-* options are implemented by the resolvconf package, if installed
dns-nameservers 213.197.28.3 213.197.30.28
dns-search computerhok.nl
- lxc-host ubuntu1 => 10.0.4.11 (user=ubuntu, password=ubuntu)
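As a sanity check on the static addressing above, the network and broadcast values can be derived from the address and netmask with Python's standard ipaddress module. This is only a minimal sketch; the function name describe_interface is made up for illustration and is not part of any lxc tooling.

```python
import ipaddress


def describe_interface(address, netmask):
    """Derive network, prefix length and broadcast from an address/netmask pair."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    net = iface.network
    return {
        "network": str(net.network_address),
        "prefix": net.prefixlen,
        "broadcast": str(net.broadcast_address),
    }


if __name__ == "__main__":
    # Values taken from the container's /etc/network/interfaces above
    info = describe_interface("10.0.4.10", "255.255.255.0")
    print(info)
    # The gateway and the ubuntu1 container should fall inside the same network
    net = ipaddress.ip_network(f"{info['network']}/{info['prefix']}")
    for ip in ("10.0.4.1", "10.0.4.11"):
        print(ip, ipaddress.ip_address(ip) in net)
```

The derived prefix length is also the one a static route for this subnet would normally use.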
cassandra#
- useradd cssndra && mkdir /home/cssndra && chown -R cssndra /home/cssndra , then change the login shell of cssndra in /etc/passwd from sh to bash
- wget 'http://mirrors.supportex.net/apache/cassandra/1.2.4/apache-cassandra-1.2.4-bin.tar.gz'
- apt-get install openjdk-7-jdk
- sudo mkdir /opt/apache-cassandra-1.2.4 && sudo chown cssndra /opt/apache-cassandra-1.2.4 && sudo ln -s /opt/apache-cassandra-1.2.4 /opt/cassandra
- cssndra@ubuntu1:~$ cd /opt && tar -xf ~/apache-cassandra-1.2.4-bin.tar.gz
- ls -l
cssndra@ubuntu1:/opt/cassandra$ ls -l
total 248
-rw-r--r-- 1 cssndra cssndra 152928 Apr 8 19:21 CHANGES.txt
-rw-r--r-- 1 cssndra cssndra 11609 Apr 8 19:21 LICENSE.txt
-rw-r--r-- 1 cssndra cssndra 47580 Apr 8 19:21 NEWS.txt
-rw-r--r-- 1 cssndra cssndra 1820 Apr 8 19:21 NOTICE.txt
-rw-r--r-- 1 cssndra cssndra 3569 Apr 8 19:21 README.txt
drwxr-xr-x 2 cssndra cssndra 4096 May 15 21:43 bin
drwxr-xr-x 2 cssndra cssndra 4096 May 15 21:43 conf
drwxr-xr-x 2 cssndra cssndra 4096 May 15 21:43 interface
drwxr-xr-x 4 cssndra cssndra 4096 May 15 21:43 javadoc
drwxr-xr-x 3 cssndra cssndra 4096 May 15 21:43 lib
drwxr-xr-x 3 cssndra cssndra 4096 May 15 21:43 pylib
drwxr-xr-x 4 cssndra cssndra 4096 Apr 8 19:21 tools
- sudo mkdir -p /var/lib/cassandra/data /var/lib/cassandra/commitlog && sudo chown -R cssndra /var/lib/cassandra
- sudo mkdir /var/log/cassandra/ && sudo chown -R cssndra /var/log/cassandra
- limit the heap size usage by editing conf/cassandra-env.sh : MAX_HEAP_SIZE="512M" HEAP_NEWSIZE="100M"
- edit ./conf/cassandra.yaml:
  - change the listen_address to 10.0.4.11, necessary for multinode cluster communication (as the OS hostname/IP is not properly configured)
  - change the rpc_address to 10.0.4.11, necessary to make the node reachable from non-localhost clients (as the OS hostname/IP is not properly configured)
  - change the seeds parameter from 127.0.0.1 to 10.0.4.11 (this first node becomes the seed node for all nodes)
  - configure endpoint_snitch: GossipingPropertyFileSnitch
- edit ./conf/log4j-server.properties : remove logging to stdout
- edit ./conf/cassandra-rackdc.properties : see Cassandra#Cluster config
- ==> now first clone the container: lxc-clone -o ubuntu1 -n ubuntu2 (ubuntu2 gets address 10.0.4.12)
- continue on the first node and start Cassandra there in the foreground with: ./bin/cassandra -f
- add a keyspace (see Create keyspace)
Now start the second node (ubuntu2) with /opt/cassandra/bin/cassandra and start building the cluster. First calculate the tokens for a 4-node cluster with the following Python script (calcToken.py):
# Number of nodes in the cluster
num_node = 4
# Evenly spaced tokens in the range 0 .. 2**127; integer division keeps them exact
for n in range(num_node):
    print(2**127 // num_node * n)
cssndra@ubuntu1:~$ python calcToken.py
0
42535295865117307932921825928971026432
85070591730234615865843651857942052864
127605887595351923798765477786913079296
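The calculated tokens still have to end up in each node's cassandra.yaml as initial_token. The sketch below pairs them with the four node addresses; the order in which tokens are handed out (ascending IP) is an assumption, as are the function names. Note that the 0 .. 2**127 range matches RandomPartitioner; a cluster running Murmur3Partitioner would need a different token range.

```python
def ring_tokens(num_nodes):
    """Evenly spaced tokens in the range 0 .. 2**127 (RandomPartitioner range)."""
    # Integer division keeps the values exact
    return [2**127 // num_nodes * n for n in range(num_nodes)]


def initial_token_lines(nodes):
    """Pair each node with its token as a cassandra.yaml 'initial_token' line."""
    tokens = ring_tokens(len(nodes))
    return {node: f"initial_token: {tok}" for node, tok in zip(nodes, tokens)}


if __name__ == "__main__":
    # Assumed order: tokens handed out in ascending IP order
    nodes = ["10.0.4.11", "10.0.4.12", "10.0.4.13", "10.0.4.14"]
    for node, line in initial_token_lines(nodes).items():
        print(node, "->", line)
```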
Then start Cassandra on all 4 nodes by executing the following from the host, once per node address: ssh ubuntu@10.0.4.11 'sudo su - cssndra /opt/cassandra/bin/cassandra'
Cluster config#
| IP | DC | RACK | seeder |
|---|---|---|---|
| 10.0.4.11 | DC1 | RAC1 | Y |
| 10.0.4.12 | DC1 | RAC2 | N |
| 10.0.4.13 | DC2 | RAC1 | Y |
| 10.0.4.14 | DC2 | RAC2 | N |
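With GossipingPropertyFileSnitch, each node reads its own data center and rack from conf/cassandra-rackdc.properties. A minimal sketch of how the table above could be turned into per-node file contents and a shared seed list follows; the CLUSTER structure and the function names are assumptions made for illustration.

```python
# Cluster layout copied from the table above: (ip, dc, rack, is_seed)
CLUSTER = [
    ("10.0.4.11", "DC1", "RAC1", True),
    ("10.0.4.12", "DC1", "RAC2", False),
    ("10.0.4.13", "DC2", "RAC1", True),
    ("10.0.4.14", "DC2", "RAC2", False),
]


def rackdc_properties(dc, rack):
    """Contents of cassandra-rackdc.properties for one node."""
    return f"dc={dc}\nrack={rack}\n"


def seed_list(cluster):
    """Comma-separated seed addresses; should be identical on every node."""
    return ",".join(ip for ip, _dc, _rack, is_seed in cluster if is_seed)


def dcs_without_seed(cluster):
    """Data centers that have no seed node (should be empty)."""
    all_dcs = {dc for _ip, dc, _rack, _seed in cluster}
    seeded = {dc for _ip, dc, _rack, is_seed in cluster if is_seed}
    return sorted(all_dcs - seeded)


if __name__ == "__main__":
    for ip, dc, rack, _seed in CLUSTER:
        print(f"# {ip}: conf/cassandra-rackdc.properties")
        print(rackdc_properties(dc, rack))
    print("seeds:", seed_list(CLUSTER))
    print("data centers without a seed:", dcs_without_seed(CLUSTER))
```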
Creating keyspace, tables, inserting, updating , querying#
Create keyspace#
First create a keyspace. You can do that both with cassandra-cli and cqlsh, but they have different syntaxes :-) .
Here's a cqlsh example:
cssndra@ubuntu1:~$ cqlsh
Connected to Test Cluster at localhost:9160.
[cqlsh 2.3.0 | Cassandra 1.2.4 | CQL spec 3.0.0 | Thrift protocol 19.35.0]
Use HELP for help.
cqlsh> CREATE KEYSPACE demo_keyspace WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 2, 'DC2' : 2};
cqlsh> select * from system.schema_keyspaces;
keyspace_name | durable_writes | strategy_class | strategy_options
---------------+----------------+------------------------------------------------------+----------------------------
system_auth | True | org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"1"}
demo_keyspace | True | org.apache.cassandra.locator.NetworkTopologyStrategy | {"DC2":"2","DC1":"2"}
system | True | org.apache.cassandra.locator.LocalStrategy | {}
system_traces | True | org.apache.cassandra.locator.SimpleStrategy | {"replication_factor":"1"}
cqlsh>
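The CREATE KEYSPACE statement above follows directly from the replica count per data center, so it can be generated when the layout changes. The helper below is a hypothetical sketch (create_keyspace_cql is not part of any Cassandra tool); it only builds the statement text, which would then be pasted into cqlsh.

```python
def create_keyspace_cql(name, replicas_per_dc):
    """Build a CQL 3 CREATE KEYSPACE statement using NetworkTopologyStrategy."""
    for dc, rf in replicas_per_dc.items():
        if not isinstance(rf, int) or rf < 1:
            raise ValueError(f"replica count for {dc} must be a positive integer")
    options = ["'class' : 'NetworkTopologyStrategy'"]
    # Sort the data centers so the generated statement is deterministic
    options += [f"'{dc}' : {rf}" for dc, rf in sorted(replicas_per_dc.items())]
    return "CREATE KEYSPACE " + name + " WITH REPLICATION = { " + ", ".join(options) + " };"


if __name__ == "__main__":
    # Two replicas in each data center, as in the cqlsh session above
    print(create_keyspace_cql("demo_keyspace", {"DC1": 2, "DC2": 2}))
```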
Create a column family#
Create a columnfamily with the cassandra-cli utility:
cssndra@ubuntu1:~$ cassandra-cli
Connected to: "Test Cluster" on 127.0.0.1/9160
Welcome to Cassandra CLI version 1.2.4
Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.
[default@unknown] use demo_keyspace;
Authenticated to keyspace: demo_keyspace
[default@demo_keyspace] create column family users with key_validation_class = 'UTF8Type' and comparator = 'UTF8Type' and default_validation_class = 'UTF8Type';
0b6b0010-fc89-35a1-ad05-77d53e5a4443
Insert data#
Again with the cassandra-cli utility insert some data in the users columnfamily:
cssndra@ubuntu1:~$ cassandra-cli
Connected to: "Test Cluster" on 127.0.0.1/9160
Welcome to Cassandra CLI version 1.2.4
Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.
[default@unknown] use demo_keyspace;
Authenticated to keyspace: demo_keyspace
[default@demo_keyspace] set users[1234][name] = scott;
Value inserted.
Elapsed time: 52 msec(s).
[default@demo_keyspace] set users[1234][password] = scott-secret;
Value inserted.
Elapsed time: 14 msec(s).
[default@demo_keyspace] set users[1234][length] = 185;
Value inserted.
Elapsed time: 8.76 msec(s).
[default@demo_keyspace] set users[1235][name] = harry;
Value inserted.
Elapsed time: 6.71 msec(s).
[default@demo_keyspace] set users[1235][length] = 181;
Value inserted.
Elapsed time: 13 msec(s).
[default@demo_keyspace] set users[1235][whatevercolumn] = skfkjdkfjdklsjfsjflkjldk181;
Value inserted.
Elapsed time: 6.2 msec(s).
[default@demo_keyspace] list users;
Using default limit of 100
Using default column limit of 100
-------------------
RowKey: 1234
=> (column=length, value=185, timestamp=1368883341707000)
=> (column=name, value=scott, timestamp=1368883316118000)
=> (column=password, value=scott-secret, timestamp=1368883330142000)
-------------------
RowKey: 1235
=> (column=length, value=181, timestamp=1368883368497000)
=> (column=name, value=harry, timestamp=1368883358461000)
=> (column=whatevercolumn, value=skfkjdkfjdklsjfsjflkjldk181, timestamp=1368883385475000)
2 Rows Returned.
Elapsed time: 42 msec(s).
And insert with the cqlsh -2 utility:
cssndra@ubuntu1:~$ cqlsh -2
Connected to Test Cluster at localhost:9160.
[cqlsh 2.3.0 | Cassandra 0.0.0 | CQL spec 2.0.0 | Thrift protocol 19.35.0]
Use HELP for help.
cqlsh> USE demo_keyspace ;
cqlsh:demo_keyspace> INSERT INTO users ( key, name, password, whatevercolumn) VALUES ( '12345' , 'harry' , 'wachtwoordje' , 'blablabla fjkdsjf l4j 2k43u');
cqlsh:demo_keyspace>
And a stupid shell script to insert bulk data :
#!/bin/bash
#
# Usage: <script> <number-of-rows>
num=${1:?usage: $0 <number-of-rows>}
let n=0
TMPFILE=/tmp/$RANDOM.cql
echo "use demo_keyspace;" > "$TMPFILE"
while [ "$n" -lt "$num" ]
do
    # echo $n
    CQL="INSERT INTO users ( key, name, password, whatevercolumn) VALUES ( '99${n}' , 'harry${n}' , 'wachtwoordje${n}' , 'blablabla fjkdsjf ${n} ${n} ${n}2k43u');"
    # Quote the variable so the statement is written exactly as built
    echo "$CQL" >> "$TMPFILE"
    let n=n+1
done
echo "echoing inserts to cqlsh..."
cqlsh -2 -f "$TMPFILE"
echo "listing users..."
cat <<EOF | cassandra-cli
use demo_keyspace;
list users;
EOF
rm "$TMPFILE"
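The statement-generating part of the script above can also be sketched in Python, which fits next to calcToken.py. This is a rough equivalent rather than a drop-in replacement: it only writes the CQL to stdout, and the function name bulk_insert_cql and the file name gen_inserts.py mentioned below are made up.

```python
import sys


def bulk_insert_cql(num):
    """Return the same CQL the shell script generates: a USE line plus num INSERTs."""
    lines = ["use demo_keyspace;"]
    for n in range(num):
        lines.append(
            "INSERT INTO users ( key, name, password, whatevercolumn) "
            f"VALUES ( '99{n}' , 'harry{n}' , 'wachtwoordje{n}' , "
            f"'blablabla fjkdsjf {n} {n} {n}2k43u');"
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Number of rows from the first argument, defaulting to 3 for a quick look
    count = int(sys.argv[1]) if len(sys.argv) > 1 else 3
    sys.stdout.write(bulk_insert_cql(count))
```

Saved as, say, gen_inserts.py, the output could be redirected to a file and fed to cqlsh -2 -f in the same way the shell script does.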
Cassandra notes/questions#
Questions#
- Can I share a cassandra cluster between multiple applications while still having some form of (security) separation ? (like having multiple databases in MySQL, and arranging access to them with grants).
- Security in general, how is the gossip protected, how to prevent "illegal nodes" from entering the cluster ?
- Security, how is access control arranged, and on what level ?
- How to change replica settings ?
  - You set the number of replicas when you create a keyspace using the replica placement strategy.
  - To change it afterwards, run through cqlsh: ALTER KEYSPACE "Excalibur" WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
  - On each affected node, run nodetool repair. Wait until repair completes on a node before moving to the next node.
  - Also see http://www.datastax.com/docs/1.2/cql_cli/using/keyspace
- How to take down (phase out) a node in a controlled way ?
- Which snitch to use ?
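The answer to the replica-settings question above is an ordered procedure: one ALTER KEYSPACE, then a repair on each node, one node at a time. A small sketch that only lists those steps, without executing anything, is shown here. The function name and the node list are assumptions, and passing the keyspace name to nodetool repair (to limit the repair to that keyspace) is a choice made for this example.

```python
def replication_change_plan(keyspace, replication_factor, nodes):
    """Return ordered (where, command) steps for changing a replication factor."""
    alter = (
        f'ALTER KEYSPACE "{keyspace}" WITH REPLICATION = '
        f"{{ 'class' : 'SimpleStrategy', 'replication_factor' : {replication_factor} }};"
    )
    plan = [("cqlsh (any node)", alter)]
    # Repairs are listed per node; each must finish before the next one starts
    plan += [(node, f"nodetool repair {keyspace}") for node in nodes]
    return plan


if __name__ == "__main__":
    # Assumed node list: the four containers used in these notes
    nodes = ["10.0.4.11", "10.0.4.12", "10.0.4.13", "10.0.4.14"]
    for step, (where, command) in enumerate(replication_change_plan("Excalibur", 3, nodes), 1):
        print(f"{step}. [{where}] {command}")
```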
Notes#
- Every node should have the same list of seeds. In multiple data-center clusters, the seed list should include a node from each data center.
- Use NetworkTopologyStrategy when you have (or plan to have) your cluster deployed across multiple data centers
- Use vnodes, see num_tokens in cassandra.yaml and http://wiki.apache.org/cassandra/Operations
