NCP-AII Exam Question 34: You're deploying a multi-GPU training job on a cluster using Slurm. You need to ensure that the GPUs


Question 34/131

You're deploying a multi-GPU training job on a cluster using Slurm. You need to ensure that the GPUs allocated to the job are healthy and functioning correctly before the training starts. What's the MOST effective approach to pre-validate the GPU hardware?

A. Run a simple CUDA vector addition program on each GPU and check for errors.
B. Check the output of 'nvidia-smi' to ensure all GPUs are listed and have the expected memory.
C. Execute the NVIDIA Data Center GPU Manager (DCGM) diagnostic suite on the allocated GPUs.
D. Monitor the GPU temperature using 'nvidia-smi' during the first few minutes of the training job.
E. Allocate all available GPUs to the job and assume they are healthy.
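
To make option C concrete, here is a minimal sketch of running a DCGM diagnostic on a set of GPUs before training starts. It assumes the `dcgmi` CLI is installed on the node and that its `diag` subcommand accepts `-r` (run level) and `-i` (comma-separated GPU IDs). The function names, the default run level, and the rule that a non-zero exit status means failure are illustrative assumptions, not part of the question.

```python
import subprocess


def build_diag_command(gpu_ids, level=1):
    """Build a `dcgmi diag` command line (hypothetical helper).

    level: DCGM diagnostic run level; higher levels run longer tests.
    gpu_ids: GPU indices to test; an empty list leaves the choice to dcgmi.
    """
    cmd = ["dcgmi", "diag", "-r", str(level)]
    if gpu_ids:
        cmd += ["-i", ",".join(str(g) for g in gpu_ids)]
    return cmd


def gpus_pass_diag(gpu_ids, level=1, runner=subprocess.run):
    """Return True when the diagnostic command exits with status 0.

    Assumption: a non-zero exit status signals a failed or aborted
    diagnostic. A production check should also inspect the report text.
    `runner` is injectable so the logic can be exercised without DCGM.
    """
    result = runner(
        build_diag_command(gpu_ids, level),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

A Slurm prolog or the first step of a batch script could call `gpus_pass_diag([0, 1])` and exit early if it returns `False`, so that training never starts on a node that fails the check.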
