I’m testing a RAID card that connects to my server over a PCIe link.
The PCIe protocol offers two ways to bring a link down: link disable and hot reset. I couldn’t find a reasonable way to both perform the reset and notify the kernel, so I chose to use the setpci tool instead. For example, to trigger a hot reset:
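# Save the current Bridge Control register of the bridge port above the card
bc=$(setpci -s "$port" BRIDGE_CONTROL)
echo "Bridge control: $bc"
# Set bit 6 (0x40, Secondary Bus Reset) to start the hot reset
setpci -s "$port" BRIDGE_CONTROL=$(printf "%04x" $((0x$bc | 0x40)))
sleep 0.01
# Restore the original value to release the reset
setpci -s "$port" BRIDGE_CONTROL="$bc"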
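To make the register arithmetic in the snippet above explicit, here is a minimal sketch of just the bit manipulation. The helper name set_sbr_bit is hypothetical (it is not part of setpci or pciutils); it only computes the value that gets written and touches no hardware.

```shell
# set_sbr_bit is a hypothetical helper, not part of pciutils.
# It takes the Bridge Control value as 4 hex digits (as printed by setpci)
# and prints the same value with bit 6 (0x40, Secondary Bus Reset) set.
set_sbr_bit() {
    printf '%04x\n' $(( 0x$1 | 0x40 ))
}

set_sbr_bit 0002   # prints 0042
```

Writing that value asserts the reset on the secondary bus, and writing the saved original value back releases it.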
However, the Linux kernel does not detect this link-down event, so the card driver does not realize that the device has become unavailable.
When I then try to re-probe the card through the driver, the software has to wait for the I/O queue timeout before it triggers a host reset. The queue timeout is 60 seconds, so the re-probe takes at least 60 seconds to complete.
Is there a reasonable way in Linux to both perform the reset and notify the kernel, similar to an FLR through /sys/bus/pci/devices/$dev/reset, so that the card driver sees the link-down status and can re-probe the card faster?
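For reference, the sysfs reset I have in mind looks roughly like the sketch below. The function name pci_sysfs_reset and the example address 0000:03:00.0 are hypothetical. The optional second argument only exists so the function can be tried against a scratch directory. As far as I understand, the kernel chooses the reset method behind this file, so it is not necessarily an FLR, and the file is only present when the device supports some reset method.

```shell
# pci_sysfs_reset DEV [SYSFS_ROOT]
# Hypothetical helper: asks the kernel to reset DEV by writing 1 to its
# sysfs "reset" attribute. SYSFS_ROOT defaults to the real sysfs location.
pci_sysfs_reset() {
    dev=$1
    root=${2:-/sys/bus/pci/devices}
    reset_file="$root/$dev/reset"

    # The attribute is missing when no reset method is available
    if [ ! -e "$reset_file" ]; then
        echo "no reset attribute for $dev" >&2
        return 1
    fi

    # Needs root on a real system
    echo 1 > "$reset_file"
}

# Example call (hypothetical address, run as root):
#   pci_sysfs_reset 0000:03:00.0
```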