Nebula
Nebula is a user-space mesh virtual private network (VPN) daemon that uses tunneling and encryption to create a secure private mesh network between participating hosts.
Installation
Basic concepts and terminology
Nebula is a mesh VPN technology, inspired by tinc. In a mesh VPN, individual nodes form direct tunnels between each other. This allows high-speed direct communication between nodes, without the need to go through a central node. Nodes are authenticated using certificates signed by a certificate authority.
This is in contrast to WireGuard, which is a peer-to-peer VPN technology (although there exist mesh network managers for WireGuard, e.g. innernet and wesher (AUR)).
This is also different from OpenVPN, which uses a star topology (also called hub and spoke).
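To put the mesh vs. star difference in numbers, the sketch below counts the direct links each topology implies. In a full mesh every pair of nodes may have its own tunnel, while a star needs only one link per spoke but routes all traffic through the hub. Nebula establishes tunnels on demand, so the mesh figure is an upper bound rather than what a real network necessarily holds open. The function names are illustrative only.

```python
def mesh_tunnels(n: int) -> int:
    """Upper bound on direct tunnels in a full mesh: one per pair of nodes."""
    return n * (n - 1) // 2


def star_links(n: int) -> int:
    """Links in a star (hub and spoke) topology: one per spoke."""
    return max(n - 1, 0)


for n in (3, 10, 100):
    print(f"{n} nodes: mesh up to {mesh_tunnels(n)} tunnels, star {star_links(n)} links")
```

The mesh pays for its lack of a central bottleneck with more potential tunnels, which is why each node only builds the ones it actually uses.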
- Certificate authority: creates host certificates by signing them.
- Lighthouse: in a Nebula network, there is typically at least one lighthouse node that serves as an information hub for other nodes. Lighthouse nodes help other nodes find each other and form a network mesh.
- Node: a host participating in the Nebula network; lighthouses are nodes too.
- Nebula IP: the IP address of a node within the Nebula network. Also known as VPN IP.
- Routable IP: the "normal" or "native" IP address of a node. This can be a public IP address or a private IP address, depending on where the node is located and how its network is configured. A node can have multiple routable IP addresses.
Example: Simple mesh VPN
Network setup
In this example, we have 3 nodes:
- lighthouse
  - Nebula IP: 192.168.100.1
  - Routable IP: 12.34.56.78
- hostA
  - Nebula IP: 192.168.100.101
  - Routable IP: 10.0.0.22
- hostB
  - Nebula IP: 192.168.100.102
  - Routable IP: 23.45.67.89
The lighthouse has a public static IP address and is reachable by hostA and hostB. hostA lives behind a NAT. hostB has a public IP address.
We will use the /24 subnet 192.168.100.0/24 for the VPN network and call this network "My Nebula Network".
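Before generating any certificates, it can help to sanity-check the address plan. The sketch below, using only Python's standard `ipaddress` module, confirms that every Nebula IP lies inside the /24 VPN subnet without duplicates. It also classifies each routable IP as private or public, which matches the description above: hostA (10.0.0.22) is the node behind NAT. The `check_plan` helper is hypothetical, not part of Nebula.

```python
import ipaddress

# Address plan from the example above: name -> (Nebula IP, routable IP).
VPN_NET = ipaddress.ip_network("192.168.100.0/24")
NODES = {
    "lighthouse": ("192.168.100.1", "12.34.56.78"),
    "hostA": ("192.168.100.101", "10.0.0.22"),
    "hostB": ("192.168.100.102", "23.45.67.89"),
}


def check_plan(nodes, vpn_net):
    """Return a list of problems; an empty list means the plan looks consistent."""
    problems = []
    seen = set()
    for name, (nebula_ip, _routable_ip) in nodes.items():
        ip = ipaddress.ip_address(nebula_ip)
        if ip not in vpn_net:
            problems.append(f"{name}: {ip} is outside {vpn_net}")
        if ip in seen:
            problems.append(f"{name}: duplicate Nebula IP {ip}")
        seen.add(ip)
    return problems


for name, (_nebula_ip, routable_ip) in NODES.items():
    # is_private is True for RFC 1918 ranges such as 10.0.0.0/8.
    kind = "private" if ipaddress.ip_address(routable_ip).is_private else "public"
    print(f"{name}: routable IP {routable_ip} is {kind}")
print(check_plan(NODES, VPN_NET))
```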
Certificate and key generation
First, generate the CA certificate and private key:
$ nebula-cert ca -name "My Nebula Network"
This will create two files:
- ca.crt: the CA certificate
- ca.key: the CA private key
Next, generate the certificate and private key files for the nodes in the network:
$ nebula-cert sign -name lighthouse -ip 192.168.100.1/24
$ nebula-cert sign -name hostA -ip 192.168.100.101/24
$ nebula-cert sign -name hostB -ip 192.168.100.102/24
Notice that we did not specify ca.crt and ca.key. By default, nebula-cert looks for those files in the current directory.
After this step, we will have these files:
- lighthouse.crt, lighthouse.key
- hostA.crt, hostA.key
- hostB.crt, hostB.key
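With more than a handful of nodes, typing each `nebula-cert sign` command by hand becomes error-prone. The sketch below derives the same commands as above from a single host table and lists the files they are expected to produce (`<name>.crt` and `<name>.key`, as shown above). `HOSTS`, `sign_commands` and `expected_files` are hypothetical names. The code only builds and prints the commands; it does not run them.

```python
import shlex

# Host table from the example network: certificate name -> Nebula IP.
HOSTS = {
    "lighthouse": "192.168.100.1",
    "hostA": "192.168.100.101",
    "hostB": "192.168.100.102",
}
PREFIX_LEN = 24  # the /24 VPN subnet


def sign_commands(hosts, prefix_len):
    """Build one `nebula-cert sign` argument list per host."""
    return [
        ["nebula-cert", "sign", "-name", name, "-ip", f"{ip}/{prefix_len}"]
        for name, ip in hosts.items()
    ]


def expected_files(hosts):
    """Files nebula-cert sign writes by default: <name>.crt and <name>.key."""
    return [f"{name}.{ext}" for name in hosts for ext in ("crt", "key")]


for argv in sign_commands(HOSTS, PREFIX_LEN):
    print(shlex.join(argv))  # shell-quoted, ready to paste into a terminal
print(expected_files(HOSTS))
```

To actually execute the commands from the directory containing ca.crt and ca.key, each argument list could be passed to `subprocess.run(argv, check=True)`.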
Configuration
Create this configuration file on the lighthouse node:
/etc/nebula/config.yml
pki:
  ca: /etc/nebula/ca.crt
  cert: /etc/nebula/lighthouse.crt
  key: /etc/nebula/lighthouse.key
lighthouse:
  am_lighthouse: true
listen:
  port: 4242
firewall:
  outbound:
    - port: any
      proto: any
      host: any
  inbound:
    - port: any
      proto: any
      host: any
Create this configuration file on hostA:
/etc/nebula/config.yml
pki:
  ca: /etc/nebula/ca.crt
  cert: /etc/nebula/hostA.crt
  key: /etc/nebula/hostA.key
static_host_map:
  "192.168.100.1": ["12.34.56.78:4242"]
lighthouse:
  hosts:
    - "192.168.100.1"
punchy:
  punch: true
firewall:
  outbound:
    - port: any
      proto: any
      host: any
  inbound:
    - port: any
      proto: any
      host: any
Finally, use this configuration file for hostB:
/etc/nebula/config.yml
pki:
  ca: /etc/nebula/ca.crt
  cert: /etc/nebula/hostB.crt
  key: /etc/nebula/hostB.key
static_host_map:
  "192.168.100.1": ["12.34.56.78:4242"]
lighthouse:
  hosts:
    - "192.168.100.1"
firewall:
  outbound:
    - port: any
      proto: any
      host: any
  inbound:
    - port: any
      proto: any
      host: any
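The three configurations above follow a pattern: the lighthouse sets `am_lighthouse: true` and lists no lighthouses itself, while every other node lists the lighthouse's Nebula IP under `lighthouse.hosts` and maps that IP to a routable address in `static_host_map`. The sketch below checks that pattern on hand-transcribed Python dicts. It is a hypothetical consistency check, not Nebula's own validation, and it assumes the configs have already been loaded into dicts (the pki and firewall sections are omitted).

```python
# The configurations above, transcribed by hand (pki and firewall omitted).
CONFIGS = {
    "lighthouse": {
        "lighthouse": {"am_lighthouse": True},
        "listen": {"port": 4242},
    },
    "hostA": {
        "static_host_map": {"192.168.100.1": ["12.34.56.78:4242"]},
        "lighthouse": {"hosts": ["192.168.100.1"]},
        "punchy": {"punch": True},
    },
    "hostB": {
        "static_host_map": {"192.168.100.1": ["12.34.56.78:4242"]},
        "lighthouse": {"hosts": ["192.168.100.1"]},
    },
}


def check_config(name, cfg):
    """Return a list of inconsistencies found in one node's config dict."""
    problems = []
    lh = cfg.get("lighthouse", {})
    hosts = lh.get("hosts", [])
    if lh.get("am_lighthouse"):
        # A lighthouse is itself the information hub, so it lists no lighthouses.
        if hosts:
            problems.append(f"{name}: lighthouse.hosts should be empty on a lighthouse")
        return problems
    if not hosts:
        problems.append(f"{name}: no lighthouse.hosts configured")
    static_map = cfg.get("static_host_map", {})
    for vpn_ip in hosts:
        # Without a routable address the node has no way to reach the lighthouse.
        if vpn_ip not in static_map:
            problems.append(f"{name}: lighthouse {vpn_ip} missing from static_host_map")
    return problems


for name, cfg in CONFIGS.items():
    print(name, check_config(name, cfg) or "ok")
```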
Distribute certificates and private keys
Because all certificates and private keys were generated in one place, on the machine holding the certificate authority's files, they need to be distributed to each node. SCP and SFTP are suitable for this purpose.
Specifically, copy them into /etc/nebula/, the directory referenced by the configuration files:
- ca.crt to all 3 nodes: lighthouse, hostA, and hostB
- lighthouse.crt and lighthouse.key to the lighthouse node
- hostA.crt and hostA.key to hostA
- hostB.crt and hostB.key to hostB
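The distribution rule above can be expressed as a small plan: every node gets ca.crt plus its own certificate and key, and ca.key goes nowhere. The sketch below builds that plan and turns it into `scp` argument lists. It assumes, hypothetically, that each node is reachable over SSH under its own name (e.g. `hostA`) and that the destination is /etc/nebula/, the directory used in the configuration files. Adjust user names and host names to your setup.

```python
NODES = ["lighthouse", "hostA", "hostB"]


def distribution_plan(nodes):
    """Map each node to the files it needs; ca.key is deliberately never included."""
    return {node: ["ca.crt", f"{node}.crt", f"{node}.key"] for node in nodes}


def scp_commands(plan, dest_dir="/etc/nebula/"):
    """One scp invocation per node; assumes SSH host names equal node names (hypothetical)."""
    return [["scp", *files, f"{node}:{dest_dir}"] for node, files in plan.items()]


plan = distribution_plan(NODES)
for argv in scp_commands(plan):
    print(" ".join(argv))
```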
The ca.key file does not have to be copied to any node. Keep it safe (do not lose it) and secure (do not leak it).
Start the nebula daemon
On each node, start nebula.service. Optionally, enable it so that it is started at boot.
Note that the order in which the nodes start the nebula daemon does not matter; the lighthouse node can even be started last. Each node keeps trying to reach the lighthouse nodes it knows about, so the network recovers quickly after an interruption.
Test for mesh functionality
In a mesh network, every node can connect directly to every other node. So even if the links between the lighthouse and both hostA and hostB are slow, traffic between hostA and hostB can be fast, as long as there is a direct path between those two hosts.
This can be demonstrated by a simple ping test on hostA:
$ ping -c 5 12.34.56.78
PING 12.34.56.78 (12.34.56.78) 56(84) bytes of data.
64 bytes from 12.34.56.78: icmp_seq=1 ttl=56 time=457 ms
64 bytes from 12.34.56.78: icmp_seq=2 ttl=56 time=480 ms
64 bytes from 12.34.56.78: icmp_seq=3 ttl=56 time=262 ms
64 bytes from 12.34.56.78: icmp_seq=4 ttl=56 time=199 ms
64 bytes from 12.34.56.78: icmp_seq=5 ttl=56 time=344 ms
--- 12.34.56.78 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4004ms
rtt min/avg/max/mdev = 199.141/348.555/480.349/108.654 ms
$ ping -c 5 192.168.100.1
PING 192.168.100.1 (192.168.100.1) 56(84) bytes of data.
64 bytes from 192.168.100.1: icmp_seq=1 ttl=64 time=218 ms
64 bytes from 192.168.100.1: icmp_seq=2 ttl=64 time=241 ms
64 bytes from 192.168.100.1: icmp_seq=3 ttl=64 time=264 ms
64 bytes from 192.168.100.1: icmp_seq=4 ttl=64 time=288 ms
64 bytes from 192.168.100.1: icmp_seq=5 ttl=64 time=163 ms
--- 192.168.100.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4004ms
rtt min/avg/max/mdev = 162.776/234.874/288.073/42.902 ms
$ ping -c 5 192.168.100.102
PING 192.168.100.102 (192.168.100.102) 56(84) bytes of data.
64 bytes from 192.168.100.102: icmp_seq=1 ttl=64 time=106 ms
64 bytes from 192.168.100.102: icmp_seq=2 ttl=64 time=2.14 ms
64 bytes from 192.168.100.102: icmp_seq=3 ttl=64 time=4.53 ms
64 bytes from 192.168.100.102: icmp_seq=4 ttl=64 time=4.29 ms
64 bytes from 192.168.100.102: icmp_seq=5 ttl=64 time=5.39 ms
--- 192.168.100.102 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 2.136/24.535/106.344/40.918 ms
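Notice that the connection between hostA and the lighthouse is slow (about 235 ms on average over the Nebula IP), but the connection between hostA and hostB is very fast. Also notice that the first packet between hostA and hostB is delayed (106 ms), while subsequent packets take only a few milliseconds: the direct tunnel between the two hosts is established when the first packet is sent, and traffic flows directly from then on.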
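To compare such runs without eyeballing them, the summary line printed by ping can be parsed. The sketch below extracts min/avg/max/mdev from the three summaries above with a regular expression. It assumes the iputils `rtt min/avg/max/mdev = a/b/c/d ms` format shown here; other ping implementations word this line differently. It also recomputes the hostB average without the first packet, using the per-packet times copied from the output above.

```python
import re

# Summary lines copied from the ping output above.
SUMMARIES = {
    "lighthouse (routable IP)": "rtt min/avg/max/mdev = 199.141/348.555/480.349/108.654 ms",
    "lighthouse (Nebula IP)": "rtt min/avg/max/mdev = 162.776/234.874/288.073/42.902 ms",
    "hostB (Nebula IP)": "rtt min/avg/max/mdev = 2.136/24.535/106.344/40.918 ms",
}

RTT_RE = re.compile(r"=\s*([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)\s*ms")


def parse_rtt(line):
    """Return (min, avg, max, mdev) in milliseconds from an iputils ping summary line."""
    m = RTT_RE.search(line)
    if m is None:
        raise ValueError(f"not a ping rtt summary: {line!r}")
    return tuple(float(g) for g in m.groups())


for target, line in SUMMARIES.items():
    mn, avg, mx, _mdev = parse_rtt(line)
    print(f"{target}: min {mn} ms, avg {avg} ms, max {mx} ms")

HOSTB_TIMES = [106, 2.14, 4.53, 4.29, 5.39]  # per-packet times to hostB, in ms
steady = HOSTB_TIMES[1:]  # drop the first packet, which waited for the tunnel
print(f"hostB avg without first packet: {sum(steady) / len(steady):.2f} ms")
```

Dropping the first sample shows that the one-off tunnel setup, not the steady-state path, dominates the reported hostB average.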
Configuration options
listen.port
- The UDP port the nebula daemon listens on; the default is 4242. On a lighthouse node, or any node with a static IP address, consider changing it to a non-default port to reduce the chance of unwanted service discovery and DDoS attacks on that port. Then update static_host_map on the other nodes to reflect the change.
- On a node with a dynamic IP address, it is recommended to set this to 0, so that the nebula daemon uses a random port for communication.
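When the lighthouse's `listen.port` changes, every `static_host_map` entry pointing at it has to change too, or the other nodes will keep knocking on the old port. The sketch below rewrites the port for one Nebula IP in a `static_host_map` represented as a Python dict, leaving the original untouched. The helper name `with_port` and the new port 14242 are hypothetical examples. Port 0 is rejected because other nodes need a fixed, known port to reach a lighthouse.

```python
def with_port(static_host_map, vpn_ip, new_port):
    """Return a copy of static_host_map with every address of vpn_ip moved to new_port."""
    # A node that others reach via static_host_map needs a fixed port, so 0 is rejected.
    if not 1 <= new_port <= 65535:
        raise ValueError(f"invalid fixed port: {new_port}")
    updated = {k: list(v) for k, v in static_host_map.items()}
    # rsplit keeps everything before the last ':' (the host part) and swaps the port.
    updated[vpn_ip] = [f"{addr.rsplit(':', 1)[0]}:{new_port}" for addr in updated[vpn_ip]]
    return updated


current = {"192.168.100.1": ["12.34.56.78:4242"]}
print(with_port(current, "192.168.100.1", 14242))
```

The same port must of course also be set as `listen.port` in the lighthouse's own configuration.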
logging.level
- By default, the nebula daemon logs messages at the info level, which includes every handshake and can produce a lot of log output. Set it to warning to reduce the number of logged messages.
relay
- This option can be used when a node cannot be reached directly from another node. Relay nodes then forward the traffic between such nodes.
firewall
- This option can be used to allow only certain traffic to and from a node.
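To illustrate how such rules restrict traffic, the sketch below models a rule list as default deny: a packet passes only if at least one rule matches its port, protocol and peer. This is a simplified model, not Nebula's implementation. It covers only the `port` (any, a single port, or a range such as 200-901), `proto` (any, tcp, udp, icmp) and `host` (any, or the name in the peer's certificate) fields, while Nebula's rules support further matchers such as groups. The example rules are hypothetical. The configurations in this article use a single rule with all three fields set to any, which allows everything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    port: str = "any"   # "any", a single port such as "22", or a range such as "200-901"
    proto: str = "any"  # "any", "tcp", "udp" or "icmp"
    host: str = "any"   # "any" or the name in the peer's certificate


def port_matches(spec, port):
    if spec == "any":
        return True
    if "-" in spec:
        lo, hi = (int(p) for p in spec.split("-", 1))
        return lo <= port <= hi
    return int(spec) == port


def allowed(rules, port, proto, peer_name):
    """Default deny: traffic passes only if at least one rule matches."""
    return any(
        port_matches(r.port, port)
        and r.proto in ("any", proto)
        and r.host in ("any", peer_name)
        for r in rules
    )


# Hypothetical inbound rules: ICMP from anyone, SSH only from hostA.
inbound = [Rule(proto="icmp"), Rule(port="22", proto="tcp", host="hostA")]
print(allowed(inbound, 0, "icmp", "hostB"))  # ping from hostB
print(allowed(inbound, 22, "tcp", "hostA"))  # SSH from hostA
print(allowed(inbound, 22, "tcp", "hostB"))  # SSH from hostB
```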
Troubleshooting
My lighthouse node takes forever to handshake
If your lighthouse node takes a long time to complete handshakes, and prints several handshake messages at once when they finally complete, its system may not support recvmmsg(). To work around this issue, add this configuration option:
/etc/nebula/config.yml
listen:
  batch: 1
This problem usually happens if your Linux kernel is too old (<2.6.34). The proper solution is to upgrade it.