Multiple hosts with multiple servers, shared database, and VLANs

So I’m trying to set up a Pritunl cluster in our new environment that has VLAN segments for production and dev, and I’m having trouble figuring out how to set everything up to work. Hopefully my explanation makes sense.

This is the simplified version of our config (we have an Enterprise license):

  • VLAN 10 - Production (10.10.10.0/24)
  • VLAN 20 - Admin VPN (10.10.20.0/24)
  • VLAN 30 - Dev (10.10.30.0/24)
  • VLAN 40 - Dev VPN (10.10.40.0/24)

We have two VPN servers: one uses a subnet on VLAN 20 (10.10.20.128/25), and the other a subnet on VLAN 40 (10.10.40.128/25). We then have two host machines running Pritunl, and we would like both VPN servers to be available on both hosts for resiliency and balancing. They share the same MongoDB server. The hosts themselves currently sit in VLAN 20 and VLAN 40 (Admin and Dev, respectively), each assigned an IP on that subnet. The Admin VLAN has access to all other VLANs; the Dev VLAN only has access to the Dev VLANs. For simplicity, each VPN server routes all VLAN subnets over the connection (since servers have to be restarted if the routes change), and the switches and firewall handle access to the other VLANs.

        Firewall
   ____/        \___
   |               |
 vpnhost01      vpnhost02
(10.10.20.5)   (10.10.40.5)
   |               |
   ----\       /----
        MongoDB

In order to allow each VPN server to run on each host, it seems like I need to assign each host an interface on each VPN VLAN (20 and 40) so the VPN subnets are routable to the rest of the network. However, that ends up giving Dev VPN connections access to the production VLANs, because traffic can traverse the VPN host and leave through its default gateway (which on vpnhost01 is on the Admin VLAN).
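Attaching those interfaces would look something like the following iproute2 sketch for vpnhost01, where the NIC name and the 10.10.40.11 address are assumptions, not values from this setup:

```shell
# Sketch only: give vpnhost01 a tagged subinterface on the Dev VPN VLAN
# (VLAN 40) so that VLAN's VPN subnet is routable from this host.
# eth0 and 10.10.40.11 are assumptions.
ip link add link eth0 name eth0.40 type vlan id 40
ip addr add 10.10.40.11/24 dev eth0.40
ip link set eth0.40 up
```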

For example: a Dev VPN client tries to access 10.10.10.50 (VLAN 10, which should be blocked):

Client → Dev VPN → vpnhost01 → default gateway (10.10.20.1) → 10.10.10.50 (allowed, because VLAN 20 is trusted).
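That escape path can be confirmed from the host itself: `ip route get` shows which gateway the kernel would pick for the destination (the destination address here is illustrative):

```shell
# Run on vpnhost01. With only a default route via 10.10.20.1, the reply
# names that gateway, showing Dev VPN traffic exits via the Admin VLAN.
ip route get 10.10.10.50
```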

Is the only way to handle this to route only the specific subnets that each VPN server should access? Hopefully that’s not the case, since the Dev VLAN will need access to some parts of some VLANs, but not the entire subnets. Obviously I could get crazy with firewall rules on the VPN hosts, but that seems messy.
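For completeness, the “firewall rules on the VPN host” approach would be something along these lines, where tun1 carrying the Dev VPN and the exact subnets are assumptions:

```shell
# Per-server forwarding policy on the host (illustrative only):
iptables -A FORWARD -i tun1 -d 10.10.30.0/24 -j ACCEPT  # Dev VLAN
iptables -A FORWARD -i tun1 -d 10.10.40.0/24 -j ACCEPT  # Dev VPN VLAN
iptables -A FORWARD -i tun1 -j DROP                     # everything else
```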

Also, if I have a dual VPN host setup like this running replicated servers, does that mean I can’t use non-NAT routes? Since non-NAT traffic requires a static route on the firewall to send the VPN subnet back to the correct VPN host, I can only point it at one specific host. If someone connects to vpnhost01, the firewall’s static route for the VPN subnet needs to point to vpnhost01 (10.10.20.5), but then it won’t work for clients connected to vpnhost02.

The only option for controlling what a client can access from Pritunl is the set of routes configured on each server. Multiple servers can be created to handle different sets of routes for each group of users. If the VPN virtual networks are routed (non-NAT), those subnets can be used in external firewall rules.

Replicated servers can use non-NAT routes, and any Pritunl host can be used as the route next-hop. This will cause problems if that host goes offline, so failover will not work. On AWS and Oracle Cloud this is solved with the route advertisement option, which automatically updates the cloud routing tables when a host goes offline; that option is only available on those two platforms.

Thanks for the info.

So after thinking about this a little more, and taking the information you provided into account, it sounds like it’s not possible to run a Pritunl host machine with multiple VPN servers that span VLANs. Since I have to attach an interface for each VLAN to the host, and Pritunl doesn’t have a way to route a specific VPN server over the specific gateway that corresponds to its VLAN, everything goes out the default gateway.

Ultimately, it looks like I’ll have to set things up so that each VPN server only runs on hosts attached solely to the VLAN that corresponds to it (so for two VLANs, I’d need two hosts just on VLAN 20 and two just on VLAN 40). I would then set the Availability Group for each host to either “admin” or “dev”. However, how do I specify that the “dev” VPN server only runs on the “dev” availability group hosts, or is that not how this feature is used? The docs seem to indicate I don’t have ultimate control over which host a VPN server uses in a cluster.

Regarding NAT vs. non-NAT, that seems to indicate that outside of AWS or Oracle Cloud, the only way to have a truly redundant setup that tolerates a single host failure is to use NAT, since non-NAT requires a static route to a single machine that could be down.

Regarding the question about only running a particular VPN server on a certain set of hosts, I’m an idiot. I just assign the given hosts only to that VPN server to control that.

The second question still stands, but assuming what I said is correct, there’s nothing that can be done about it.

A follow-up to the static route config for a non-NAT setup with multiple hosts.

I can’t seem to get the “any host can be the next-hop” working with my setup.

Here’s the rundown now (I’m just using the Admin VLAN with a replicated server to simplify things):

vpnhost01 - 10.10.20.5
vpnhost02 - 10.10.20.6
VPN server subnet: 10.10.20.128/25

The client connects to vpnhost01 and is assigned 10.10.20.130/25. The client then connects to another server (server01 - 10.10.20.10) on the same subnet as vpnhost01. I have to create a static route on server01 to route 10.10.20.128/25 back to vpnhost01 (10.10.20.5). This works, and I can pass traffic between the VPN client and server01.
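For reference, the static route on server01 is just the following (iproute2; making it persistent across reboots depends on the distro):

```shell
# Route return traffic for the VPN subnet to vpnhost01:
ip route add 10.10.20.128/25 via 10.10.20.5
```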

I then change the static route on server01 to use vpnhost02 (10.10.20.6) as the next-hop instead. Now I cannot pass traffic from the VPN client to server01. A tcpdump shows this traffic for the SYN and SYN/ACK packets:

client → vpnhost01 (tun0) → vpnhost01 (eth0) → server01 (eth0) → vpnhost02 (eth0) → vpnhost02 (tun0) → nothing

So the SYN packet gets all the way from the client to server01, then the SYN/ACK packet from server01 gets back to vpnhost02, first on the public eth0 interface, then to the tun0 interface used by the VPN server, and then stops. Apparently vpnhost02 doesn’t know how to actually get the SYN/ACK packet from server01 back to vpnhost01 (where the client is actually connected). Obviously I can’t put a static route on vpnhost02 for 10.10.20.128/25 to vpnhost01, because that would break clients that are actually connected to vpnhost02.
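For anyone reproducing this, the path above was traced with captures along these lines on each hop (interface names as in the path above):

```shell
# Filter on the VPN subnet to follow the SYN and SYN/ACK:
tcpdump -ni eth0 net 10.10.20.128/25
tcpdump -ni tun0 net 10.10.20.128/25
```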

What am I missing? How does the “secondary” host know how to get traffic back to the “primary” host where the client is connected? I assume there has to be some sort of communication between the two VPN hosts. Maybe this is where the VXLAN stuff comes into play (VXLAN is completely new to me). The hosts CAN communicate with each other, as they are on the same subnet with no firewall restrictions between them. Is there something else I have to configure for VXLAN?

(sigh) I believe at least part of the problem is that I had Inter-Client Routing disabled (I misinterpreted it as a form of client isolation that keeps VPN clients from talking to each other, which I do want). If I enable it, I can see the response traffic on vpnhost02 go from eth0 to pxlan225, which makes more sense, but it still doesn’t work its way back to vpnhost01.

Does the VXLAN between the hosts require me to configure anything special? Is there a way to test that the communication is working?

The VXLAN routing should be enabled unless a layer 2 network is available between hosts.
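Generic iproute2 commands can at least confirm the VXLAN interface exists and has peer entries; the interface name will be whatever Pritunl created (pxlan225 here matches the interface mentioned above):

```shell
# List VXLAN interfaces with their VNI, local address, and UDP port:
ip -d link show type vxlan
# Show learned peer (VTEP) entries for a given VXLAN interface:
bridge fdb show dev pxlan225
```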

So that means in my simplified setup it shouldn’t need VXLAN routing (since all of the servers are on the same subnet, with no routing needed). That said, my first attempt (without VXLAN routing enabled) still isn’t routing properly. The packets travel from the client through vpnhost01 to server01, but when they are routed back via vpnhost02, that host apparently doesn’t know what to do with them and drops them.

This means I can only get a redundant host setup working with NAT enabled, since NAT bypasses all of the routing issues. That doesn’t seem like it should be the case based on what I’ve seen and read in the docs, so there must be something else missing that I can’t see.

You’ve mentioned configuring static routes on servers; it is the routing table on the router that needs to be modified. The server routing tables should not be modified, as this will likely break the modifications already made by the Pritunl server.

For the simplified test, I added the static route on server01, not on the VPN servers. There is direct communication between the three servers because they are all on the same subnet, so no external router is actually involved, right? Traffic just passes through the switch, and the tcpdump results show the packet making it all the way back to vpnhost02 and then just stopping. The local static route on server01 basically takes the place of a static route in a router. And even if that weren’t the case, I do still have a static route in the switch for traffic that it DOES route (like inter-VLAN traffic).