NCP-AII Exam Question 73: Consider a scenario where you are setting up a high-performance computing cluster with several GPU-accelerated


Question 73/131

Consider a scenario where you are setting up a high-performance computing cluster with several GPU-accelerated nodes using Slurm as the resource manager. You want to ensure that jobs requesting GPUs are only scheduled on nodes with the appropriate NVIDIA drivers and CUDA toolkit installed. How can you achieve this within Slurm?

A. Use Slurm's 'GresTypes' configuration option in 'slurm.conf' to define a generic resource type called 'gpu', and then configure each node to advertise its available GPUs. Slurm will automatically ensure that jobs requesting GPUs are only scheduled on nodes with the 'gpu' resource.

B. Create a custom Slurm script that checks for the presence of the NVIDIA driver and CUDA toolkit before submitting a job to a node. If the requirements are not met, the job is rejected.

C. Use Slurm's node features to tag nodes with the 'Feature=' keyword in 'slurm.conf'. For example, tag nodes with GPUs as 'Feature=gpu'. Jobs can then request nodes with the 'gpu' feature using the option.

D. Install the NVIDIA Data Center GPU Manager (DCGM) on each node and configure Slurm to query DCGM for GPU availability and health. Slurm will then only schedule jobs on healthy and available GPUs.

E. Utilize Slurm's Prolog and Epilog scripts to dynamically install the necessary NVIDIA drivers and CUDA toolkit on each node before and after a job runs. This ensures that the required software is always available.
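
For readers unfamiliar with the mechanisms named in options A and C, the following is a minimal Python sketch, not an answer key, of what a GRES-plus-feature setup might look like. It only builds strings: a node definition line of the kind found in 'slurm.conf', and an 'sbatch' argument list that requests GPUs with '--gres' and node features with '--constraint' (where '&' means "all of these features"). The node name 'gpu01', the feature name 'cuda', the script name 'train.sh', and both helper functions are hypothetical and do not come from the question.

```python
import shlex


def node_line(name, gpu_count, features):
    """Render a slurm.conf-style node definition (hypothetical helper).

    Assumes 'GresTypes=gpu' is set elsewhere in slurm.conf and that the
    GPU devices themselves are described in gres.conf.
    """
    return f"NodeName={name} Gres=gpu:{gpu_count} Feature={','.join(features)}"


def sbatch_command(script, gpu_count, features):
    """Build an sbatch argument list requesting GPUs and node features."""
    cmd = ["sbatch", f"--gres=gpu:{gpu_count}"]
    if features:
        # '&' asks for nodes that carry every listed feature.
        cmd.append("--constraint=" + "&".join(features))
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    # Illustrative values only; adjust to the actual cluster.
    print(node_line("gpu01", 4, ["gpu", "cuda"]))
    print(shlex.join(sbatch_command("train.sh", 1, ["gpu", "cuda"])))
```

Tagging a node with a feature such as 'cuda' is a statement the administrator makes about the node; Slurm does not verify that the driver or toolkit is actually installed, so the tag is only as reliable as the process that maintains it.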


