Apache Cassandra is a NoSQL database with linear scalability.
Install the AUR package.
Logging to journald
The package logs to /var/log/cassandra/system.log by default. To log to journald instead, copy the systemd unit to /etc/systemd/system/ so that the change persists across package upgrades.
$ sudo cp /usr/lib/systemd/system/cassandra.service /etc/systemd/system/
Edit the unit:
$ sudo vim /etc/systemd/system/cassandra.service
Set the service to run in the foreground by adding -f to the ExecStart line, and set Type to simple, as the process will no longer fork:
[Service]
Type=simple
ExecStart=/usr/bin/cassandra -p /run/cassandra/cassandra.pid -f
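As a quick sanity check before restarting, the edited unit can be inspected programmatically. The following is only a sketch: the function name check_foreground_unit is hypothetical, and it assumes the unit is simple enough to be read by Python's configparser (true for the [Service] fragment shown here, not for every systemd unit).

```python
import configparser
import shlex


def check_foreground_unit(unit_text: str) -> list[str]:
    """Return a list of problems found in the [Service] section (empty if none)."""
    # strict=False tolerates repeated keys; interpolation=None leaves
    # systemd specifiers such as %i untouched.
    parser = configparser.ConfigParser(
        strict=False, interpolation=None, delimiters=("=",)
    )
    parser.optionxform = str  # systemd key names are case-sensitive
    parser.read_string(unit_text)

    service = parser["Service"] if parser.has_section("Service") else {}
    problems = []
    if service.get("Type") != "simple":
        problems.append("Type is not set to simple")
    if "-f" not in shlex.split(service.get("ExecStart") or ""):
        problems.append("ExecStart does not pass -f")
    return problems
```

To use it, pass the file contents, for example check_foreground_unit(open("/etc/systemd/system/cassandra.service").read()); an empty list means both changes described above are present.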
Reload systemd so that it picks up the edited unit. If Cassandra was already running, also drain the node and restart it:
$ sudo systemctl daemon-reload
$ nodetool drain; sudo systemctl restart cassandra
The default cassandra.yaml contains copious documentation. When installed via the AUR package, it is located in
Basic config items to change
Set the name of the cluster. It must be identical on every node that you intend to have in this cluster.
cluster_name: 'Test Cluster'
Set the directory that Cassandra will write to. The value below is the default, which is used if the option is unset. If possible, point this at a disk used only for storing Cassandra data.
data_file_directories:
    - /var/lib/cassandra/data
For the first node (the seed node), make sure to include its own IP address in the seeds, along with at least one other node. For all other nodes, try to list a broad range of nodes in the cluster. If a node cannot connect to any of the seeds listed in this configuration at startup, it will fail to start.
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "192.168.1.53, 192.168.1.52"
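Since an unreachable seed list prevents startup, it can help to test connectivity to the seeds beforehand. The sketch below is an illustration only: the function name unreachable_seeds is hypothetical, and it assumes the seeds use Cassandra's default inter-node storage_port of 7000. Adjust the port if your cassandra.yaml sets a different one.

```python
import socket


def unreachable_seeds(seeds: str, port: int = 7000, timeout: float = 2.0) -> list[str]:
    """Return the seed hosts that do not accept a TCP connection on `port`.

    `seeds` is the comma-separated string from the seeds parameter.
    Port 7000 is assumed (the default storage_port).
    """
    failed = []
    for host in (s.strip() for s in seeds.split(",")):
        if not host:
            continue  # skip empty entries such as a trailing comma
        try:
            # Open and immediately close a TCP connection to the seed.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            failed.append(host)
    return failed
```

For example, unreachable_seeds("192.168.1.53, 192.168.1.52") returns the subset of those addresses that could not be reached. A successful TCP connection only shows that something is listening on the port, not that the seed is a healthy cluster member.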
Set this based on the type of disk Cassandra uses to store its data.
This is the address on which Cassandra will listen for client connections.
This is the address this node will advertise itself as. Ensure that both your clients and the other nodes can reach this node on this address.
This is the address used for Thrift connections. When set to 0.0.0.0, Cassandra listens on all interfaces, which is fine as long as the port is firewalled for security.
Recommended settings for Linux specifically
hsha stands for "half synchronous, half asynchronous". All Thrift clients are handled asynchronously using a small number of threads that does not vary with the number of Thrift clients (and thus scales well to many clients). This is not recommended on Windows machines, where hsha is about 30% slower.
Because we are using hsha, rpc_max_threads must be set, or Cassandra will refuse to start. rpc_max_threads represents the maximum number of client requests this server may execute concurrently.
cqlsh
CQL Shell (cqlsh) is a command-line client for connecting to a Cassandra cluster.
$ sudo pip install cqlsh
To use the Python API, install the proper Cassandra driver.
$ pip install cassandra-driver
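A minimal connection sketch with the driver might look like the following. It assumes the node's native transport is enabled on the default port 9042 and reuses the example seed addresses from above as contact points; the helper names contact_points_from_seeds and print_release_version are hypothetical. The driver is imported inside the function so that the helper can be used without it installed.

```python
def contact_points_from_seeds(seeds: str) -> list[str]:
    """Split a comma-separated seeds string into a list of host addresses."""
    return [s.strip() for s in seeds.split(",") if s.strip()]


def print_release_version(contact_points: list[str], port: int = 9042) -> None:
    """Connect to the cluster and print the Cassandra version of the node reached."""
    # Imported here so the module loads even where cassandra-driver is absent.
    from cassandra.cluster import Cluster

    cluster = Cluster(contact_points=contact_points, port=port)
    try:
        session = cluster.connect()
        # system.local always holds exactly one row describing the local node.
        row = session.execute("SELECT release_version FROM system.local").one()
        print(row.release_version)
    finally:
        cluster.shutdown()
```

Calling print_release_version(contact_points_from_seeds("192.168.1.53, 192.168.1.52")) should print the version string of whichever node answers, provided the addresses are reachable from the client.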
For more information on how to use the Python API for Cassandra, see the API documentation.