kernel-boot: Add rdma_topo tool #1644
Open
+775
−0
Modern multi-NIC servers have had very complex topologies for some time now, often with NICs, GPUs and NVMe devices that are topologically co-located. These systems tend to come with specialized ACS requirements for PCI peer-to-peer (P2P) traffic, for instance disabling ACS or setting it up specially for translated traffic.
NVIDIA's latest systems have a novel PCI multipath system that requires special asymmetric ACS.
Introduce a tool to help users configure ACS on such systems. The tool parses the PCI topology, identifies the topological features, and then generates the required ACS settings.
Modern kernels support the config_acs kernel command line parameter, which allows fine-grained settings, so the correct ACS configuration for the topology can be fed into GRUB and onto the kernel command line to be applied at boot.
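As a rough illustration of what such a parameter could look like, the sketch below joins per-device flag strings into a single command line argument. It assumes the `pci=config_acs=<flags>@<device>[;...]` syntax described in the kernel's command line parameter documentation, where each flag character is `0`, `1` or `x` (leave unchanged). The helper name `config_acs_param`, the device addresses and the flag values are all hypothetical and are not output from this tool.

```python
def config_acs_param(settings: dict[str, str]) -> str:
    """Join a {device: flags} mapping into one pci=config_acs= argument.

    Assumed syntax: <flags>@<device>, entries separated by ';'.
    """
    for dev, flags in settings.items():
        # Each flag character must be '0' (disable), '1' (enable) or 'x' (keep).
        if not flags or set(flags) - set("01x"):
            raise ValueError(f"bad ACS flags {flags!r} for {dev}")
    body = ";".join(f"{flags}@{dev}" for dev, flags in sorted(settings.items()))
    return f"pci=config_acs={body}"


# Hypothetical devices and flag values, for illustration only.
print(config_acs_param({"0000:08:00.0": "xx111xx", "0000:09:00.0": "xx000xx"}))
```

Because the entries are separated by semicolons, the resulting string may need quoting when it is placed in a bootloader configuration file.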
The tool has four functions:
topo - Print out the topology from the RDMA perspective. Indicate which
devices are P2P connected to the NIC.
write-grub-acs - Emit the config_acs kernel command line parameter for
the required ACS configuration.
setpci-acs - Use setpci after booting to set the required ACS
configuration. This is not recommended, but is provided to help
legacy systems without config_acs.
check - Read the live ACS settings and compare them to the required
configuration.
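The comparison that check performs can be pictured with a minimal sketch like the one below. This is not the tool's implementation. It assumes the required setting is given as a config_acs-style flag string whose rightmost character corresponds to bit 0 of the ACS Control register, and that the live register value has already been read (for example with setpci). The names `ACS_BITS` and `acs_mismatches` are hypothetical.

```python
# Bit positions in the PCIe ACS Control register.
ACS_BITS = {
    0: "Source Validation",
    1: "Translation Blocking",
    2: "P2P Request Redirect",
    3: "P2P Completion Redirect",
    4: "Upstream Forwarding",
    5: "P2P Egress Control",
    6: "Direct Translated P2P",
}


def acs_mismatches(live_ctrl: int, required: str) -> list[str]:
    """Return the names of ACS bits whose live value differs from `required`.

    `required` uses '0', '1' or 'x' (don't care); the rightmost character
    is assumed to be bit 0.
    """
    wrong = []
    for bit, want in enumerate(reversed(required)):
        if want == "x":
            continue  # this bit is not constrained
        if (live_ctrl >> bit) & 1 != int(want):
            wrong.append(ACS_BITS.get(bit, f"bit {bit}"))
    return wrong


# Hypothetical values: require Request Redirect, Completion Redirect and
# Upstream Forwarding to be enabled, and ignore every other bit.
print(acs_mismatches(0b0001101, "xx111xx"))
```

An empty list means the live register satisfies the requirement; any names returned identify the bits that would need to change.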
This initial version supports two NVIDIA platforms. It is expected to grow to support more common topologies as well.