To solve the problem of bad (martian) source IPs in multicast packets, I used something called Stateless NAT: I rewrite the src ip in the packets to something valid (e.g. 0.0.0.0 ➜ 192.16.0.1).
I found two solutions for this:
- using nftables;
- using tc (Traffic Control).
Both work well, but the nftables variant also changes the src ip that I see via tcpdump, which is strange behaviour to me.
Here are my assumptions about the packet flow in the Linux network subsystem, based on the iptables and nftables schemes:
- tcpdump clones incoming packets at the AF_PACKET point (this is used under the hood by pcap);
- tc operates at the ingress (qdisc) point;
- the prerouting hook used in nftables is located somewhere around iptables' raw/mangle/nat prerouting points, and in any case after the AF_PACKET and ingress (qdisc) points.
Questions:
- Why do I see a changed src ip in the tcpdump output when using nftables' Stateless NAT?
  - Is there something wrong in my assumptions above?
  - Or does this change/mangling go directly to the packet data buffer, which is not actually cloned at the AF_PACKET point? If this is true, could there be race conditions, e.g. when using multiple mangling rules?
- Is this intentional behaviour? I didn't find any information about this, e.g. in the nftables wiki.
Notes:
- I used Linux 5.15;
- Snippets that create the Stateless NAT:
  - Common:

        _iface=eth0
        SRC_IP_FROM=0.0.0.0
        SRC_IP_TO=192.16.0.1

  - nftables solution (based on this):

        nft add table tmp_1
        nft add chain tmp_1 prerouting "{ type filter hook prerouting priority raw; }"
        nft add rule tmp_1 prerouting "iifname $_iface ip saddr $SRC_IP_FROM ip protocol udp ip saddr set $SRC_IP_TO"

  - tc solution (based on this):

        tc qdisc add dev $_iface ingress
        tc filter add dev $_iface parent ffff: protocol ip prio 1 u32 \
            match ip src $SRC_IP_FROM match ip protocol 17 0xff \
            action nat egress $SRC_IP_FROM $SRC_IP_TO
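To compare the two variants side by side, something like the following could be used to watch the source address in tcpdump and to remove the rules between runs. This is only a sketch: the helper names `run`, `watch_src` and `teardown` and the `DRY_RUN` switch are made up for illustration and are not part of nftables, tc or tcpdump. It assumes the same variables and the `tmp_1` table name as in the snippets above, and that the real commands are run as root.

```shell
#!/bin/sh
# Sketch only: run, watch_src, teardown and DRY_RUN are hypothetical helpers.
# Defaults mirror the "Common" variables from the snippets above.
_iface=${_iface:-eth0}
SRC_IP_FROM=${SRC_IP_FROM:-0.0.0.0}
SRC_IP_TO=${SRC_IP_TO:-192.16.0.1}

# Print the command when DRY_RUN=1, otherwise execute it (needs root).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show UDP packets that carry either the original or the rewritten src ip,
# so it is visible which of the two tcpdump actually reports.
watch_src() {
    run tcpdump -n -i "$_iface" "udp and (src host $SRC_IP_FROM or src host $SRC_IP_TO)"
}

# Remove both variants; errors from the one that is not installed are ignored.
teardown() {
    run nft delete table tmp_1 2>/dev/null
    run tc qdisc del dev "$_iface" ingress 2>/dev/null
    return 0
}
```

Calling `DRY_RUN=1 watch_src` only prints the tcpdump command line. Without `DRY_RUN`, the idea is to install one variant, run `watch_src`, then run `teardown` and repeat with the other variant.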