I’m not sure where to start troubleshooting this. I segregated my network into a few different VLANs (servers, workstations, wifi, etc.). I have VMs and LXC containers running in Proxmox, routing is handled by OPNsense, and I have a couple of TP-Link managed switches. All of this is working fine except for one problem.

I have a couple of systems (a VM and an LXC container) that have interfaces on multiple VLANs. If I SSH to one of these systems on the IP that’s on the same VLAN as the client, it works fine. If I SSH to one of the other IPs, it initially connects and works, but within a minute or so the connection hangs and times out.

I tried running ssh in verbose mode and got this, which seems fairly generic:

debug3: recv - from CB ERROR:10060, io:00000210BBFC6810
debug3: send packet: type 1
debug3: send - WSASend() ERROR:10054, io:00000210BBFC6810
client_loop: send disconnect: Connection reset
debug3: Successfully set console output code page from 65001 to 65001
debug3: Successfully set console input code page from 65001 to 65001 

I realize the simple solution is to just use the IP on the same subnet, but my current DNS setup doesn’t allow me to provide responses based on client subnet. I’d also like to better understand (and potentially solve) this problem.

Thanks

  • towerful@programming.dev
    9 months ago

    You have to NAT through opnsense, or set up different routing tables on the VM.

    Client is 192.168.1.4.
    Server is 192.168.1.5 and 192.168.2.5.
    Opnsense is dealing with vlan 1 and vlan 2 (for simplicity’s sake) according to 192.168.VLAN.0/24, and will happily forward packets between the 2 subnets.

    As the VM has 2 network devices, 1 on VLAN1 and 1 on VLAN2, it always has a direct connection to the client via VLAN1.

    So, if your client connects to 192.168.2.5, that address isn’t on its own subnet, so it has no direct route to it.
    It sends the packet to its gateway (opnsense), which then forwards it to vlan2.
    The VM then receives the packet, and replies to the client’s address (192.168.1.4; opnsense doesn’t alter the sender’s address).
    The way Linux works is it picks the outgoing network device from its routing table based on the destination, so it will use the device that is in the same subnet as the client, as opposed to replying on the same device the packet arrived on.
    So the VM sends the packet out VLAN1 directly back to the client.
    And this works. Packets from client to server go via opnsense, packets from server to client go directly.
    For a while.
    Then opnsense sees that there is an ongoing connection between vlan1 and vlan2… except it only sees the client’s half of the conversation, so it’s not seeing all the proper acks/syn/wait/fin packets. So it thinks it’s a timed out connection, or something dodgy going on… and it closes the connection.
    And now your client can’t talk to the server through vlan2, and it has to reconnect.
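
To make the asymmetric path concrete, here is a minimal Python sketch of the route lookup described above. It is only a model, not how the kernel is implemented: the interface names `eth0`/`eth1` and the choice of `eth0` as the default-route device are assumptions, while the addresses come from the example in this comment.

```python
import ipaddress

# Hypothetical interface names; the addresses are the ones from the example.
SERVER_INTERFACES = {
    "eth0": ipaddress.ip_interface("192.168.1.5/24"),  # VLAN 1
    "eth1": ipaddress.ip_interface("192.168.2.5/24"),  # VLAN 2
}
DEFAULT_ROUTE_DEV = "eth0"  # assumption: the default gateway is reached via eth0


def egress_device(destination: str) -> str:
    """Choose the outgoing device by destination only:
    a directly connected subnet wins, otherwise fall back to the default route."""
    dest = ipaddress.ip_address(destination)
    for dev, iface in SERVER_INTERFACES.items():
        if dest in iface.network:
            return dev
    return DEFAULT_ROUTE_DEV


client = "192.168.1.4"
request_arrived_on = "eth1"  # the client connected to 192.168.2.5 via the router
reply_leaves_on = egress_device(client)

print(f"request in: {request_arrived_on}, reply out: {reply_leaves_on}")
print("asymmetric:", request_arrived_on != reply_leaves_on)
```

The lookup never considers which device the request arrived on, which is why the reply bypasses the router.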

    I pulled my hair out over this.
    I ended up just having a single NIC per VM.
    Here’s a SE question that might help you.
    https://unix.stackexchange.com/questions/4420/reply-on-same-interface-as-incoming
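
For the "different routing tables on the VM" route, a rough sketch of source-based policy routing follows. The Python below only builds the `ip route` / `ip rule` command strings so they can be reviewed before running anything. The interface names, gateway addresses (`192.168.1.1`, `192.168.2.1`) and table numbers (101, 102) are assumptions for illustration, not values from this thread.

```python
def source_routing_commands(dev, address, subnet, gateway, table):
    """Build iproute2 commands so traffic sourced from `address`
    is routed using a dedicated table that only knows about `dev`."""
    return [
        # Connected subnet, reachable directly on this device.
        f"ip route add {subnet} dev {dev} src {address} table {table}",
        # Everything else goes back through this VLAN's gateway.
        f"ip route add default via {gateway} dev {dev} table {table}",
        # Select the table whenever the packet's source is this address.
        f"ip rule add from {address}/32 table {table}",
    ]


# Hypothetical values: device names, gateways and table IDs are assumptions.
INTERFACES = [
    ("eth0", "192.168.1.5", "192.168.1.0/24", "192.168.1.1", 101),
    ("eth1", "192.168.2.5", "192.168.2.0/24", "192.168.2.1", 102),
]

for dev, address, subnet, gateway, table in INTERFACES:
    for command in source_routing_commands(dev, address, subnet, gateway, table):
        print(command)
```

With rules like these, a reply sourced from 192.168.2.5 to 192.168.1.4 would match only the default route in table 102 and leave via the VLAN2 gateway, so the firewall sees both directions. Commands entered this way do not survive a reboot; making them persistent depends on the distribution's network configuration.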

    • 𝓢𝓮𝓮𝓙𝓪𝔂𝓔𝓶𝓶OP
      9 months ago

      Yeah, I did some packet captures this afternoon and realized that’s exactly what’s happening.

      I want the VM to have multiple interfaces. I was just being lazy about connecting to it (wanted to use dns). The way I see it I have 3 options.

      1. Connect via IP to the interface on the same subnet.
      2. Separate A records for each IP. Feels like #1 with extra steps.
      3. Overcomplicate things with bind views on my internal zone so it returns the best IP for the client.
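
The decision that option 3 would have to make per query can be sketched in a few lines of Python. This is not BIND configuration, just the selection logic; the addresses are borrowed from the example earlier in the thread, and falling back to the first address is an assumption.

```python
import ipaddress

# Addresses borrowed from the earlier example, not the real ones.
SERVER_ADDRESSES = [
    ipaddress.ip_interface("192.168.1.5/24"),
    ipaddress.ip_interface("192.168.2.5/24"),
]


def best_address(client_ip: str) -> str:
    """Return the server address on the client's own subnet, if there is one."""
    client = ipaddress.ip_address(client_ip)
    for iface in SERVER_ADDRESSES:
        if client in iface.network:
            return str(iface.ip)
    # Assumption: clients on any other subnet get the first address.
    return str(SERVER_ADDRESSES[0].ip)


print(best_address("192.168.1.4"))
print(best_address("192.168.2.40"))
```

A view-based DNS setup would encode the same mapping as one zone variant per client subnet.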

      I did also find something online about policy-based routing on the VM. But all of this reeks of me overcomplicating things when I could just use the IP the couple of times a month I SSH to these boxes.

      • towerful@programming.dev
        9 months ago

        We have gone through the exact same process!
        Multiple NICs, fancy DNS, Linux not replying on the same interface.

        I ended up being super lazy about it and using somewhat sensible IP addresses.
        And only using 1 NIC - which also massively simplified firewall rules.
        Everything turned into zone-based rules (i.e. mgmt has access to dmz, vms and wan; vms has access to wan; dmz has access to nothing; anything else is a specific rule).
        I’m even thinking about swapping to a more zone oriented firewall solution.
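
The zone rules listed above boil down to a small allow-list. Here is a hedged Python sketch of that policy as data; the zone names come from the comment, while the default-deny behavior for unknown zones is an assumption. It is not tied to any particular firewall's rule format.

```python
# Which destination zones each source zone may reach.
ZONE_POLICY = {
    "mgmt": {"dmz", "vms", "wan"},
    "vms": {"wan"},
    "dmz": set(),
}


def is_allowed(src_zone: str, dst_zone: str) -> bool:
    """Default deny: traffic passes only if the pair is listed in ZONE_POLICY."""
    return dst_zone in ZONE_POLICY.get(src_zone, set())


print(is_allowed("mgmt", "dmz"))
print(is_allowed("dmz", "wan"))
```

Anything not covered by the table would still need its own specific rule, as the comment says.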

        However, if I were to do it again, I’d ditch the multiple vlans (well, almost. I’d have a proxmox/hardware vlan, and a VM vlan). I’d manage VM firewalls in proxmox, and network firewalls on opnsense.
        Then I can be precise about who talks to who.