NFS Over RDMA

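1. Prerequisites

  • The storage server and the nodes that use it are connected over an RDMA network

  • The RDMA devices have IP addresses configured

  • If you use Mellanox NICs, pass the --with-nfsrdma option when installing the driver
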
./mlnxofedinstall --with-nfsrdma

2. Server setup

  • Install dependencies
apt install nfs-kernel-server rdma-core -y
  • Load the kernel module
modprobe svcrdma
  • Add the RDMA port to the nfsd portlist
echo 'rdma 20049' | tee /proc/fs/nfsd/portlist
  • Create the directory and export it
mkdir /data1/nfs
vim /etc/exports

/data1/nfs  *(rw,sync,no_root_squash,no_all_squash)
  • Start the NFS service
systemctl start nfs-server.service
  • Verify the export
exportfs -v

/data1/nfs      <world>(rw,wdelay,no_root_squash,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash
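
As an extra check that the server is listening on the RDMA transport, the portlist file can be read back. The sketch below is not part of the original setup: has_rdma_listener is a hypothetical helper name, and it assumes the file lists one "transport port" pair per line (for example "rdma 20049").

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): succeeds if the given
# portlist file contains an "rdma <port>" entry.
has_rdma_listener() {
    portlist_file="$1"   # normally /proc/fs/nfsd/portlist
    port="${2:-20049}"   # 20049 is the port used in this post
    grep -Eq "^rdma[[:space:]]+${port}\$" "$portlist_file"
}

# Example call on the server:
#   has_rdma_listener /proc/fs/nfsd/portlist && echo "RDMA listener is up"
```

The file path is passed in as an argument so the same function can be tried against a saved copy of the file.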

3. Client mount

  • Install dependencies
apt install nfs-common rdma-core -y
  • Load the kernel module
modprobe rpcrdma
  • Create the mount point
mkdir /data1/nfs
  • Regular NFS mount
mount -t nfs 10.8.x.x:/data1/nfs /data1/nfs
  • NFS over RDMA mount
mount -o proto=rdma,port=20049,vers=4 10.113.x.x:/data1/nfs /data1/nfs

Note that a single host cannot have the regular NFS mount and the NFS over RDMA mount active at the same time.
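
To confirm which transport a mount ended up using, the proto= option can be read from the mount table. A minimal sketch, assuming the usual /proc/mounts layout (device, mount point, type, options); mount_proto is a hypothetical helper name, not something from the original post.

```shell
#!/bin/sh
# Hypothetical helper: print the proto= option of an NFS mount.
#   $1 = mount table file (normally /proc/mounts)
#   $2 = mount point to look up
mount_proto() {
    awk -v mp="$2" '
        # Field 2 is the mount point, field 3 the type (nfs or nfs4),
        # field 4 the comma-separated option list.
        $2 == mp && $3 ~ /^nfs/ {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) {
                if (opts[i] ~ /^proto=/) {
                    sub(/^proto=/, "", opts[i])
                    print opts[i]
                    exit
                }
            }
        }' "$1"
}

# Example call on the client:
#   mount_proto /proc/mounts /data1/nfs
```

For the RDMA mount above the expected answer would be rdma, and tcp for the regular mount, which makes it easy to tell the two benchmark setups apart.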

4. Performance tests

4.1 Network performance

  • Regular NIC, measured with iperf3 (server first, then client)
iperf3 -s
iperf3 -c 10.8.x.x
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  27.2 GBytes  23.4 Gbits/sec  4104             sender
[  5]   0.00-10.05  sec  27.2 GBytes  23.3 Gbits/sec                  receiver
  • RDMA NIC, measured with ib_write_bw (server first, then client)
ib_write_bw -d mlx5_0 -x3
ib_write_bw -d mlx5_0 -x3 10.8.x.x --report_gbits
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
Conflicting CPU frequency values detected: 3100.000000 != 2101.000000. CPU Frequency is not max.
 65536      5000             389.06             389.00             0.741968

4.2 4K random read/write

  • Local disk, 4K random read/write
fio -direct=1 -iodepth=128 -rw=randwrite -ioengine=libaio -bs=4k -size=10g -numjobs=1 -runtime=1000 -group_reporting --allow_mounted_write=1 -name=Rand_Write_Testing -filename=/data1/fiofile
  WRITE: bw=1128MiB/s (1182MB/s), 1128MiB/s-1128MiB/s (1182MB/s-1182MB/s), io=10.0GiB (10.7GB), run=9081-9081msec
fio -direct=1 -iodepth=128 -rw=randread -ioengine=libaio -bs=4k -size=10g -numjobs=1 -runtime=1000 -group_reporting --allow_mounted_write=1 -name=Rand_Read_Testing -filename=/data1/fiofile
   READ: bw=1441MiB/s (1511MB/s), 1441MiB/s-1441MiB/s (1511MB/s-1511MB/s), io=10.0GiB (10.7GB), run=7107-7107msec
  • NFS over RDMA, 4K random read/write
fio -direct=1 -iodepth=128 -rw=randwrite -ioengine=libaio -bs=4k -size=10g -numjobs=1 -runtime=1000 -group_reporting --allow_mounted_write=1 -name=Rand_Write_Testing -filename=/data1/nfs/fiofile
  WRITE: bw=525MiB/s (551MB/s), 525MiB/s-525MiB/s (551MB/s-551MB/s), io=10.0GiB (10.7GB), run=19488-19488msec
fio -direct=1 -iodepth=128 -rw=randread -ioengine=libaio -bs=4k -size=10g -numjobs=1 -runtime=1000 -group_reporting --allow_mounted_write=1 -name=Rand_Read_Testing -filename=/data1/nfs/fiofile
   READ: bw=674MiB/s (707MB/s), 674MiB/s-674MiB/s (707MB/s-707MB/s), io=10.0GiB (10.7GB), run=15191-15191msec
  • Regular NFS, 4K random read/write
fio -direct=1 -iodepth=128 -rw=randwrite -ioengine=libaio -bs=4k -size=10g -numjobs=1 -runtime=1000 -group_reporting --allow_mounted_write=1 -name=Rand_Write_Testing -filename=/data1/nfs/fiofile
  WRITE: bw=291MiB/s (305MB/s), 291MiB/s-291MiB/s (305MB/s-305MB/s), io=10.0GiB (10.7GB), run=35196-35196msec
fio -direct=1 -iodepth=128 -rw=randread -ioengine=libaio -bs=4k -size=10g -numjobs=1 -runtime=1000 -group_reporting --allow_mounted_write=1 -name=Rand_Read_Testing -filename=/data1/nfs/fiofile
   READ: bw=311MiB/s (326MB/s), 311MiB/s-311MiB/s (326MB/s-326MB/s), io=10.0GiB (10.7GB), run=32909-32909msec

4.3 128K sequential read/write

  • Local disk, 128K read/write
fio -numjobs=128 -fallocate=none -iodepth=2 -ioengine=libaio -direct=1 -rw=write -bs=128K --group_reporting -size=100m -time_based -runtime=30 -name=fio-test -directory=/data1/fiofiledir
  WRITE: bw=3842MiB/s (4029MB/s), 3842MiB/s-3842MiB/s (4029MB/s-4029MB/s), io=113GiB (121GB), run=30009-30009msec
fio -numjobs=128 -fallocate=none -iodepth=2 -ioengine=libaio -direct=1 -rw=read -bs=128K --group_reporting -size=100m -time_based -runtime=30 -name=fio-test -directory=/data1/fiofiledir
   READ: bw=6503MiB/s (6818MB/s), 6503MiB/s-6503MiB/s (6818MB/s-6818MB/s), io=191GiB (205GB), run=30006-30006msec
  • NFS over RDMA, 128K read/write
fio -numjobs=128 -fallocate=none -iodepth=2 -ioengine=libaio -direct=1 -rw=write -bs=128K --group_reporting -size=100m -time_based -runtime=30 -name=fio-test -directory=/data1/nfs
  WRITE: bw=2814MiB/s (2951MB/s), 2814MiB/s-2814MiB/s (2951MB/s-2951MB/s), io=82.5GiB (88.6GB), run=30012-30012msec
fio -numjobs=128 -fallocate=none -iodepth=2 -ioengine=libaio -direct=1 -rw=read -bs=128K --group_reporting -size=100m -time_based -runtime=30 -name=fio-test -directory=/data1/nfs
   READ: bw=16.5GiB/s (17.7GB/s), 16.5GiB/s-16.5GiB/s (17.7GB/s-17.7GB/s), io=496GiB (533GB), run=30003-30003msec
  • Regular NFS, 128K read/write
fio -numjobs=128 -fallocate=none -iodepth=2 -ioengine=libaio -direct=1 -rw=write -bs=128K --group_reporting -size=100m -time_based -runtime=30 -name=fio-test -directory=/data1/nfs
  WRITE: bw=2249MiB/s (2358MB/s), 2249MiB/s-2249MiB/s (2358MB/s-2358MB/s), io=65.9GiB (70.8GB), run=30018-30018msec
fio -numjobs=128 -fallocate=none -iodepth=2 -ioengine=libaio -direct=1 -rw=read -bs=128K --group_reporting -size=100m -time_based -runtime=30 -name=fio-test -directory=/data1/nfs
   READ: bw=2556MiB/s (2680MB/s), 2556MiB/s-2556MiB/s (2680MB/s-2680MB/s), io=74.9GiB (80.4GB), run=30012-30012msec

4.4 4M sequential read/write

  • Local disk, 4M read/write
fio -numjobs=128 -fallocate=none -iodepth=2 -ioengine=libaio -direct=1 -rw=write -bs=4M --group_reporting -size=100m -time_based -runtime=30 -name=fio-test -directory=/data1/fiofiledir
  WRITE: bw=3838MiB/s (4025MB/s), 3838MiB/s-3838MiB/s (4025MB/s-4025MB/s), io=113GiB (122GB), run=30242-30242msec
fio -numjobs=128 -fallocate=none -iodepth=2 -ioengine=libaio -direct=1 -rw=read -bs=4M --group_reporting -size=100m -time_based -runtime=30 -name=fio-test -directory=/data1/fiofiledir
   READ: bw=5510MiB/s (5778MB/s), 5510MiB/s-5510MiB/s (5778MB/s-5778MB/s), io=162GiB (174GB), run=30167-30167msec
  • NFS over RDMA, 4M read/write
fio -numjobs=128 -fallocate=none -iodepth=2 -ioengine=libaio -direct=1 -rw=write -bs=4M --group_reporting -size=100m -time_based -runtime=30 -name=fio-test -directory=/data1/nfs
  WRITE: bw=3840MiB/s (4027MB/s), 3840MiB/s-3840MiB/s (4027MB/s-4027MB/s), io=113GiB (121GB), run=30159-30159msec
fio -numjobs=128 -fallocate=none -iodepth=2 -ioengine=libaio -direct=1 -rw=read -bs=4M --group_reporting -size=100m -time_based -runtime=30 -name=fio-test -directory=/data1/nfs
   READ: bw=42.4GiB/s (45.6GB/s), 42.4GiB/s-42.4GiB/s (45.6GB/s-45.6GB/s), io=1274GiB (1368GB), run=30018-30018msec
  • Regular NFS, 4M read/write
fio -numjobs=128 -fallocate=none -iodepth=2 -ioengine=libaio -direct=1 -rw=write -bs=4M --group_reporting -size=100m -time_based -runtime=30 -name=fio-test -directory=/data1/nfs
  WRITE: bw=2520MiB/s (2643MB/s), 2520MiB/s-2520MiB/s (2643MB/s-2643MB/s), io=74.2GiB (79.7GB), run=30165-30165msec
fio -numjobs=128 -fallocate=none -iodepth=2 -ioengine=libaio -direct=1 -rw=read -bs=4M --group_reporting -size=100m -time_based -runtime=30 -name=fio-test -directory=/data1/nfs
   READ: bw=2754MiB/s (2888MB/s), 2754MiB/s-2754MiB/s (2888MB/s-2888MB/s), io=81.5GiB (87.5GB), run=30312-30312msec

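4.5 Summary

4K random read/write

Test               Local disk (MiB/s)   NFS over RDMA (MiB/s)   NFS (MiB/s)
Random write       1128                 525                     291
Random read        1441                 674                     311

128K read/write

Test               Local disk (MiB/s)   NFS over RDMA (MiB/s)   NFS (MiB/s)
Sequential write   3842                 2814                    2249
Sequential read    6503                 16.5 GiB/s              2556

4M read/write

Test               Local disk (MiB/s)   NFS over RDMA (MiB/s)   NFS (MiB/s)
Sequential write   3838                 3840                    2520
Sequential read    5510                 42.4 GiB/s              2754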
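
To make the comparison easier to read, the write figures from the tables above can be turned into RDMA-to-NFS ratios. This is only a small arithmetic sketch; ratio is a hypothetical helper name, and the inputs are the MiB/s values reported by fio.

```shell
#!/bin/sh
# Hypothetical helper: print a/b rounded to two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# NFS over RDMA vs. regular NFS, write throughput in MiB/s
ratio 525 291     # 4K random write       -> 1.80
ratio 2814 2249   # 128K sequential write -> 1.25
ratio 3840 2520   # 4M sequential write   -> 1.52
```

By this measure the RDMA write advantage ranges from about 1.25x to 1.8x, largest for 4K random writes.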

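5. Conclusion

This post covered how to configure NFS over RDMA and how it performs. The test results show:

  • At both small (4K) and large (128K, 4M) block sizes, NFS over RDMA outperforms regular NFS
  • The larger the block size, the closer NFS over RDMA write throughput gets to the local disk: about 47% at 4K random, 73% at 128K, and on par at 4M

For large-block reads, NFS over RDMA far exceeds the same test run directly on the host's local disk. Changing the test duration or the number of jobs gives the same result, which is hard to explain. Monitoring, however, shows RDMA transfer bandwidth that matches the fio numbers, which suggests the result is real.

Could this be caused by caching in NFS over RDMA?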
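
One way to sanity-check the caching idea is plain arithmetic on the numbers already shown. With numjobs=128 and size=100m, the fio job only touches a small working set, and the measured link rate (389.00 Gb/sec from ib_write_bw) puts an upper bound on what the client can read. The sketch below is an assumption-laden back-of-the-envelope check, not a verified explanation; both helper names are hypothetical.

```shell
#!/bin/sh
# Working set of the fio job in GiB: numjobs x per-job size (MiB).
working_set_gib() {
    awk -v jobs="$1" -v size_mib="$2" \
        'BEGIN { printf "%.1f\n", jobs * size_mib / 1024 }'
}

# Convert a link rate from Gbit/s to GiB/s.
gbit_to_gib() {
    awk -v g="$1" \
        'BEGIN { printf "%.1f\n", g * 1e9 / 8 / (1024 * 1024 * 1024) }'
}

working_set_gib 128 100   # -> 12.5
gbit_to_gib 389.00        # -> 45.3
```

The 4M read result of 42.4 GiB/s sits just under the roughly 45.3 GiB/s link ceiling, and the 12.5 GiB working set is re-read about a hundred times in 30 seconds (1274 GiB in total). That pattern would be consistent with the data being served from memory on the server rather than from its disk, since direct=1 on the client bypasses only the client-side page cache. Whether that is what happened here was not tested; dropping the server's page cache before the read run would be one way to check.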
