Mangling packet headers using NFTables affects data received via pcap interface

To solve the problem with bad (martian) source IPs in multicast packets, I used something called Stateless NAT: I change the source IP in the packets to something valid (e.g. from 0.0.0.0 to 192.16.0.1).
I found two solutions for this:

  • using nftables;
  • using tc (Traffic Control).

Both work well, but the nftables variant also changes the source IP that I see via tcpdump, which is strange behaviour to me.

Here are my assumptions about the packet flow in the Linux network subsystem, based on the iptables and nftables diagrams:

  • tcpdump clones incoming packets at the AF_PACKET point (this is what pcap uses under the hood);
  • tc operates at the ingress (qdisc) point;
  • the nftables prerouting hook I use is located somewhere around the iptables raw/mangle/nat PREROUTING points, in any case after the AF_PACKET and ingress (qdisc) points.
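One way to compare what the AF_PACKET tap reports under each variant is to capture both the original and the rewritten source address with a single filter, once with only the tc rule loaded and once with only the nftables rule. This is a minimal sketch, assuming eth0 and the addresses from the notes; the `run` wrapper and the `DRY_RUN` variable are hypothetical helpers that only print the command instead of executing it.

```shell
#!/bin/sh
# Sketch: capture UDP packets and print the source address that the
# AF_PACKET tap reports. Defaults mirror the notes (eth0, 0.0.0.0 -> 192.16.0.1).
_iface=${_iface:-eth0}
SRC_IP_FROM=${SRC_IP_FROM:-0.0.0.0}
SRC_IP_TO=${SRC_IP_TO:-192.16.0.1}

# run: execute a command, or only print it when DRY_RUN is set
# (hypothetical helper for reviewing commands without root).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# capture_src: show up to N (default 10) packets whose source is either
# the original or the rewritten address; -n disables name resolution.
capture_src() {
    run tcpdump -n -i "$_iface" -c "${1:-10}" \
        "udp and (src host $SRC_IP_FROM or src host $SRC_IP_TO)"
}
```

`DRY_RUN=1 capture_src 5` only prints the tcpdump command; `capture_src 5` (as root) runs it.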

Questions:

  1. Why do I see the changed source IP in the tcpdump output when using the nftables Stateless NAT?
    • is there something wrong in my assumptions above?
    • or does this change/mangling go directly to the packet data buffer, which is not actually cloned at the AF_PACKET point?
      If this is true, could there be race conditions, e.g. when using multiple mangling rules?
  2. Is this intentional behaviour?
    I didn’t find any information about this, e.g. in the nftables wiki.

Notes:

  • I used Linux 5.15;
  • Snippets that create the Stateless NAT:
    • Common:
      _iface=eth0
      SRC_IP_FROM=0.0.0.0
      SRC_IP_TO=192.16.0.1
      
    • nftables solution (based on this):
      nft add table tmp_1
      nft add chain tmp_1 prerouting "{type filter hook prerouting priority raw;}"
      nft add rule tmp_1 prerouting " \
        iifname $_iface ip saddr $SRC_IP_FROM ip protocol udp \
        ip saddr set $SRC_IP_TO"
      
    • tc solution (based on this):
      tc qdisc add dev $_iface ingress
      tc filter add dev $_iface parent ffff: protocol ip prio 1 \
        u32 match ip src $SRC_IP_FROM match ip protocol 17 0xff \
        action nat egress $SRC_IP_FROM $SRC_IP_TO
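To inspect and later remove either variant, something like the following could be used. It is a sketch, assuming the table name tmp_1, eth0 and the addresses from the snippets above; the nftables part prepends a `meta nftrace set 1` rule (via `nft insert rule`, so it matches before `ip saddr` is rewritten) and watches the trace with `nft monitor trace`. The `run` wrapper and `DRY_RUN` variable are hypothetical helpers that only print the commands.

```shell
#!/bin/sh
# Sketch: inspect and remove the two Stateless NAT variants from the notes.
# Assumes the same table name and values as the snippets; needs root for real use.
_iface=${_iface:-eth0}
SRC_IP_FROM=${SRC_IP_FROM:-0.0.0.0}

# run: execute a command, or only print it when DRY_RUN is set
# (hypothetical helper for reviewing commands without root).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# nftables: prepend a trace rule so it matches before saddr is rewritten,
# then watch the per-rule trace events; finally drop the whole table.
nft_trace_on()    { run nft insert rule tmp_1 prerouting iifname "$_iface" ip saddr "$SRC_IP_FROM" meta nftrace set 1; }
nft_trace_watch() { run nft monitor trace; }
nft_cleanup()     { run nft delete table tmp_1; }

# tc: show per-filter statistics, then delete the ingress qdisc
# (this also removes the filters attached to it).
tc_show()    { run tc -s filter show dev "$_iface" parent ffff:; }
tc_cleanup() { run tc qdisc del dev "$_iface" ingress; }
```

Running `nft_trace_watch` in one terminal and `tcpdump` in another lets the source address reported by the trace be compared with the one in the capture.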