The purpose of this post is not to say anything new but to restate how simple it is to add a node to a running Cassandra cluster. I will not pretend that this is everything that needs to be done once the node has been added. For example, the cluster will start streaming data to the newly added node, which is not always cheap and which has to be dealt with once the node has joined the cluster. Nevertheless, the procedure itself is pretty straightforward, and the following is the minimum one must do to add a node to a running Cassandra cluster.
Everything below runs on my laptop using VirtualBox VMs. Here is a cluster of two nodes running on CentOS 7.9:
[centos@centos7 cassandra]$ nodetool status
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving/Stopped
--  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.1.107  292.99 KiB  1       100.0%            c9504a89-653b-4e9d-a289-3444fc608a3b  rack1
UN  192.168.1.108  218.35 KiB  1       100.0%            de35416a-256a-445b-ab8d-c6a03da29eee  rack1
The cluster is up and running. What we want to do now is add a third node. Assuming the machine has been prepared, the steps are:
1. Modify the cassandra.yaml file.
2. Start the instance.
That’s it! As simple as that. The minimum that must be changed in the cassandra.yaml file is the seeds and listen_address settings (this assumes cluster_name already matches the existing cluster). A seed node is not a special kind of node: it can be any node (or nodes) already in the cluster that the joining node contacts to learn cluster information. listen_address is the address that nodes use to communicate with each other.
The new node’s IP address is 192.168.1.109, and 192.168.1.108 is specified as its seed. Here is what the seeds section looks like after the change:
seed_provider:
    # Addresses of hosts that are deemed contact points.
    # Database nodes use this list of hosts to find each other and learn
    # the topology of the ring. You _must_ change this if you are running
    # multiple nodes!
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # seeds is actually a comma-delimited list of addresses.
          # Ex: "<ip1>,<ip2>,<ip3>"
          - seeds: "192.168.1.108"
And here is the listen_address section after the change:
# Address or interface to bind to and tell other nodes to connect to.
# You _must_ change this address or interface to enable multiple nodes to communicate!
#
# Set listen_address OR listen_interface, not both.
#
# When not set (blank), InetAddress.getLocalHost() is used. This
# will always do the Right Thing _if_ the node is properly configured
# (hostname, name resolution, etc), and the Right Thing is to use the
# address associated with the hostname (it might not be).
#
# Setting listen_address to 0.0.0.0 is always wrong.
#
listen_address: 192.168.1.109
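The two edits above can also be scripted. What follows is only a minimal sketch in Python, not part of the original procedure: the function name set_seeds_and_listen_address is made up for illustration, and it assumes the file contains exactly one uncommented "- seeds:" entry and one uncommented "listen_address:" line, as in the excerpts shown here.

```python
import re


def set_seeds_and_listen_address(yaml_text: str, seeds: str, listen_address: str) -> str:
    """Return cassandra.yaml text with the seeds and listen_address values replaced.

    Hypothetical helper: it edits the text line by line with regular
    expressions, so comments and indentation are left untouched.
    """
    # Replace the value of the uncommented "- seeds:" entry, keeping its indentation.
    yaml_text, n_seeds = re.subn(
        r'^([ \t]*-[ \t]*seeds:[ \t]*).*$',
        lambda m: f'{m.group(1)}"{seeds}"',
        yaml_text,
        flags=re.MULTILINE,
    )
    # Replace the value of the uncommented top-level "listen_address:" line.
    yaml_text, n_listen = re.subn(
        r'^(listen_address:[ \t]*).*$',
        lambda m: f'{m.group(1)}{listen_address}',
        yaml_text,
        flags=re.MULTILINE,
    )
    # Refuse to guess if the file does not look the way this sketch assumes.
    if n_seeds != 1 or n_listen != 1:
        raise ValueError(
            f"expected one seeds entry and one listen_address line, "
            f"found {n_seeds} and {n_listen}"
        )
    return yaml_text
```

To use it, read the node's cassandra.yaml into a string, call set_seeds_and_listen_address(text, "192.168.1.108", "192.168.1.109"), and write the result back. The location of the file depends on how Cassandra was installed, so it is deliberately left out here.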
Now the Cassandra instance can be started, after which nodetool status shows that the cluster consists of three nodes:
[centos@centos7 ~]$ nodetool status
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving/Stopped
--  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.1.107  306.67 KiB  1       35.7%             c9504a89-653b-4e9d-a289-3444fc608a3b  rack1
UN  192.168.1.108  281.72 KiB  1       85.4%             de35416a-256a-445b-ab8d-c6a03da29eee  rack1
UN  192.168.1.109  326 KiB     1       78.9%             88965b8c-1ba0-4d70-a034-616001795044  rack1
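Checking that every node reports UN (Up/Normal) can be automated as well. The following is a rough sketch that parses text shaped like the nodetool status output above. The helper names node_states and all_up_normal are hypothetical, and the parsing relies only on what is visible in this post: node rows begin with a two-letter code followed by the address.

```python
from typing import Dict, Iterable


def node_states(status_output: str) -> Dict[str, str]:
    """Map each node address to its two-letter status/state code, e.g. 'UN'."""
    states: Dict[str, str] = {}
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        code = fields[0]
        # Per the legend in the output: first letter is Up/Down, second is
        # Normal/Leaving/Joining/Moving/Stopped. Header and prompt lines
        # do not start with such a code and are skipped.
        if len(code) == 2 and code[0] in "UD" and code[1] in "NLJMS":
            states[fields[1]] = code
    return states


def all_up_normal(status_output: str, expected_addresses: Iterable[str]) -> bool:
    """True if every expected address is listed with the code 'UN'."""
    states = node_states(status_output)
    return all(states.get(address) == "UN" for address in expected_addresses)
```

Passing the captured output of nodetool status together with the three addresses used in this post should tell you whether the new node has finished joining. A node that is still bootstrapping shows up as UJ, so the check returns False until the state changes to UN.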
Node addition is complete.