After a recent OS upgrade, you need to reinstall NVIDIA GPU and DOCA drivers to support both AI training and accelerated networking. What best practice ensures successful installation and full hardware capability?
Correct answer: A
The best practice is to install only the GPU and DOCA driver versions that are validated for the current operating system, kernel, hardware platform, and NVIDIA software release. Compatibility is critical in NVIDIA AI infrastructure because the GPU driver, CUDA libraries, DCGM, Fabric Manager, DOCA-OFED, network adapters, DPUs, and kernel modules must all align. Installing the newest driver without checking compatibility can cause driver/library mismatches, nvidia-smi failures, broken RDMA, or unsupported DOCA behavior. NVIDIA DGX documentation notes that DGX OS updates deliver coordinated OS, kernel, GPU driver, CUDA toolkit, and DCGM updates, and NVIDIA AI Enterprise publishes support matrices for confirming supported platform combinations. DOCA-OFED is likewise tied to the operating system and kernel, so its version must be chosen deliberately. Legacy drivers may lack required fixes or hardware support, and the default distribution drivers are usually not sufficient for validated DGX or AI Enterprise deployments. After installation, administrators should reboot if required, confirm that the GPUs are visible, check the networking driver status, and run health checks before returning the node to production.
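The two halves of that practice (check the planned versions against a support matrix, then validate the node after installation) can be sketched in Python roughly as follows. This is an illustrative sketch, not an NVIDIA tool: the VALIDATED table and its "example-..." strings are hypothetical placeholders that would have to be filled in from the support matrix for the actual platform, and the function names are made up for this example. The commands themselves (nvidia-smi, ibv_devinfo, ofed_info -s, dcgmi diag -r 1) are assumed to be installed by the GPU, DOCA-OFED, and DCGM packages; any that are missing are reported as skipped rather than failed.

```python
import shutil
import subprocess

# HYPOTHETICAL support matrix: (os_release, kernel) -> validated versions.
# The strings are placeholders; real values must be taken from the published
# support matrix for your platform and software release.
VALIDATED = {
    ("example-os-1", "example-kernel-1"): {
        "gpu_driver": "example-gpu-1",
        "doca_ofed": "example-doca-1",
    },
}


def planned_versions_ok(os_release, kernel, gpu_driver, doca_ofed, matrix=VALIDATED):
    """Return True only if both versions are listed for this OS/kernel pair."""
    entry = matrix.get((os_release, kernel))
    if entry is None:
        return False  # OS/kernel combination is not in the matrix at all
    return entry["gpu_driver"] == gpu_driver and entry["doca_ofed"] == doca_ofed


# Post-install checks: (label, command). Adjust the list to the tools that
# are actually present on the node.
POST_INSTALL_CHECKS = [
    ("GPU visibility",
     ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]),
    ("RDMA devices", ["ibv_devinfo"]),
    ("OFED version", ["ofed_info", "-s"]),
    ("DCGM quick diagnostic", ["dcgmi", "diag", "-r", "1"]),
]


def run_check(label, cmd, timeout=120):
    """Run one command and return (label, status, output).

    status is "ok" (exit code 0), "failed" (non-zero exit, timeout, or OS
    error), or "skipped" (the executable is not on PATH).
    """
    if shutil.which(cmd[0]) is None:
        return (label, "skipped", f"{cmd[0]} not found on PATH")
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (subprocess.TimeoutExpired, OSError) as exc:
        return (label, "failed", str(exc))
    status = "ok" if proc.returncode == 0 else "failed"
    return (label, status, (proc.stdout or proc.stderr).strip())


def validate_node(checks=POST_INSTALL_CHECKS):
    """Run every check in order and return the list of result tuples."""
    return [run_check(label, cmd) for label, cmd in checks]
```

A node would be returned to production only if planned_versions_ok(...) was true before installation and every entry from validate_node() reports "ok"; a "skipped" entry usually means a package is missing and deserves a look before the node goes back into service.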