NCCL
使用 Volcano 运行 nccl-test
· ☕ 2 分钟
1. 制作 nccl-test 镜像 查看 CUDA 版本 1 2 3 nvidia-smi | grep "CUDA Version" | awk '{print $9}' 12.2 编写 Dockerfile 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 cat > Dockerfile << EOF FROM hubimage/nvidia-cuda:12.1.0-cudnn8-devel-ubuntu22.04 ENV DEBIAN_FRONTEND=noninteractive ARG CONDA_VERSION WORKDIR /workspace ENV DEBIAN_FRONTEND=noninteractive RUN apt-get update && apt install -y openmpi-bin libopenmpi-dev ssh openssh-server net-tools vim git iputils-ping nfs-common RUN git clone https://github.com/NVIDIA/nccl-tests.git && \ cd nccl-tests && \ make MPI=1 MPI_HOME=/usr/lib/x86_64-linux-gnu/openmpi EOF 编译 nccl-test 镜像 1 docker build -t hubimage/nccl-test:12.1.0-ubuntu22.04 -f Dockerfile . 推送 nccl-test 镜像 1 docker push hubimage/nccl-test:12.1.0-ubuntu22.04 2. 运行 Volcano Job 给测试节点打