In a previous blog article, I covered how to get the InfiniBand fabric up on Ubuntu 10.10 (Maverick Meerkat). This time I’ll cover IPoIB on Ubuntu 11.10 (Oneiric Ocelot), and add information on how to get an SRP target configured and running. This guide does not require you to rebuild the Linux kernel, and works from the stock install of Ubuntu 11.10, so you should be up and running with an SRP target in about 15 minutes, all going well.
As before, I’m using an Ubuntu box for the disk subsystem, and a Windows 7 machine as the client (initiator).
Here’s a few acronyms:
SCSI – we should know what this one is (Small Computer Systems Interface)
iSCSI – SCSI protocol over a network. (Internet SCSI).
SCST – SCSI Target subsystem for linux with target drivers for iSCSI, Fibre Channel, SRP, SAS, FCoE, etc. We’re most interested in the SRP target.
RDMA – Remote Direct Memory Access – fast way of copying chunks of memory from one machine to another.
SRP – SCSI RDMA Protocol – wraps it all up by carrying the SCSI protocol over RDMA.
Stage 1 – Setting up IPoIB
This section shows how to get the basic fabric up and running with TCP/IP over Infiniband (IPoIB). Once you’ve done this, you’ll be able to send pings between the machines, ssh from one to the other across the fabric, etc.
For the Windows 7 machine, it’s a simple case of installing the OFED driver package from openfabrics.org. While installing the Windows drivers, make sure you select the SRP options, as later in this guide we’ll be setting up the RAID box as a drive on the Windows box using SRP (SCSI RDMA Protocol).
To enable improved throughput, enable connected mode on the link by updating the “Connected Mode” and “Connected Mode Payload Mtu size” in the properties of the interface. Open up “Device Manager” and find your Infiniband adapter:
Right-click on the adapter and bring up the properties dialog:
So, on to the linux box. I started with a fresh install of Ubuntu 11.10.
First, install Ubuntu. Then update all the packages to the latest versions using:
sudo apt-get update
sudo apt-get upgrade
Everything below is done as root, which avoids having to type ‘sudo’ before every command. I just run “sudo bash” first.
Note: No need to edit the udev rules any more as per the Ubuntu 10.10 HOWTO, as they are correctly set by the kernel 3.0.0 and newer.
Edit /etc/modules and add the following modules:
ib_sa
ib_cm
ib_umad
ib_addr
ib_uverbs
ib_ipoib
ib_ipath
ib_qib
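Once the machine has been rebooted, it can be handy to confirm that the modules listed above actually loaded. The following is only a minimal sketch and not part of the original procedure: missing_modules is a hypothetical helper name, and it assumes the usual lsmod layout (a header row, then one module per row with the name in the first column).

```shell
# Modules this guide adds to /etc/modules.
WANTED_MODULES="ib_sa ib_cm ib_umad ib_addr ib_uverbs ib_ipoib ib_ipath ib_qib"

# missing_modules (hypothetical helper): reads lsmod-style text on stdin and
# prints every wanted module that does not appear in it, one per line.
missing_modules() {
    # Column 1 of lsmod output is the module name; NR > 1 skips the header row.
    loaded=$(awk 'NR > 1 { print $1 }')
    for m in $WANTED_MODULES; do
        # -x matches the whole line, -F treats the name as a literal string.
        if ! printf '%s\n' "$loaded" | grep -qxF "$m"; then
            echo "$m"
        fi
    done
}
```

Run it as "lsmod | missing_modules". An empty result means every module in the list is loaded. ib_ipath and ib_qib are drivers for QLogic adapters, so on a machine with a different card their absence may not matter.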
Next,
apt-get install opensm
This will install the subnet manager and all the relevant dependencies, libibverbs, etc.
Then add the relevant entries for the interface into /etc/network/interfaces file:
auto ib0
iface ib0 inet static
    address 192.168.1.1
    netmask 255.255.255.0
Then reboot. This will create the relevant infiniband entries in /sys, load the ipoib modules, and bring up the infiniband port with an ip address. You should now have a functioning infiniband port on your Ubuntu machine, and you should be able to ping the remote machine.
The next thing is to enable connected mode for the InfiniBand connection. I found that this increased the TCP/IP netperf benchmarks from 3 Gbps to 7 Gbps.
root@raid:~# echo connected >`find /sys -name mode | grep ib0`
root@raid:~# echo 65520 >`find /sys -name mtu | grep ib0`
To make this happen when the ib0 interface is brought up, modify the /etc/network/interfaces file as follows:
auto ib0
iface ib0 inet static
    address 10.4.12.1
    netmask 255.255.255.0
    up echo connected >`find /sys -name mode | grep ib0`
    up echo 65520 >`find /sys -name mtu | grep ib0`
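To check that connected mode and the larger MTU really took effect after the interface comes up, something like the following could be used. This is a sketch under assumptions: ipoib_status is a hypothetical helper name, and it assumes the two files that the find commands above locate live at /sys/class/net/ib0/mode and /sys/class/net/ib0/mtu, which is where I would expect them but may differ on your kernel.

```shell
# ipoib_status (hypothetical helper): prints "<mode> <mtu>" for an IPoIB
# interface by reading its sysfs attributes.
# $1 = interface name (e.g. ib0)
# $2 = sysfs root, defaults to /sys (overridable so the function can be tested)
ipoib_status() {
    iface=$1
    root=${2:-/sys}
    dir="$root/class/net/$iface"
    # Assumed layout: <root>/class/net/<iface>/mode and .../mtu
    if [ ! -r "$dir/mode" ] || [ ! -r "$dir/mtu" ]; then
        echo "no IPoIB attributes found for $iface under $dir" >&2
        return 1
    fi
    echo "$(cat "$dir/mode") $(cat "$dir/mtu")"
}
```

Calling "ipoib_status ib0" after applying the settings above should then report the mode and MTU that were written. If it reports datagram mode or a smaller MTU, the up lines in /etc/network/interfaces probably did not run.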
Stage 2 – Setting up SRP Target
The next step, should you wish to take it, is to set up the SRP targets. This is a much better option than using samba shares, as it gives you a massive boost in throughput and uses much less CPU. For example, on my setup using IPoIB and samba shares, I was able to achieve no more than 125MB/sec. Using a ramdisk set up as an SRP target, I was able to achieve 900MB/sec between my two machines.
So, first get a few packages:
apt-get install libmthca1
apt-get install iscsitarget
apt-get install open-iscsi
apt-get install lsscsi
apt-get install scsitools
Change /etc/default/iscsitarget to:
ISCSITARGET_ENABLE=true
We now need to get scstadmin. I typically pull the latest version from subversion, so we first need to get svn:
apt-get install subversion
There’s a choice to be made now. According to the guide at http://iscsi-scst.sourceforge.net/iscsi-scst-howto.txt , you get a slight performance increase by patching and re-compiling the kernel. That’s a lot of work, so I skipped that step for this guide. I went for the easier option, which is just to run the few commands below. I’m not sure what the difference in performance is, but I can still pull 900MB/sec across the fabric from a ramdisk set up as an SRP target. And not having to rebuild the kernel makes this a 15 minute procedure rather than a 3 hour one.
cd ~
svn co https://scst.svn.sourceforge.net/svnroot/scst/trunk scst
cd scst
make scst scst_install
make iscsi iscsi_install
make scstadm scstadm_install
make srpt srpt_install
N.B.: You’ll need to rebuild these each time Ubuntu upgrades the kernel, as the modules get out of sync. Just re-run the make commands above.
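A quick way to notice that a rebuild is due is to look for the SCST modules under the module directory of the running kernel. The sketch below is an assumption-laden illustration rather than anything SCST provides: module_built_for and stale_modules are hypothetical helper names, and the check matches on file name only, so it cannot tell an ib_srpt built from the SCST tree apart from one that a kernel might ship itself.

```shell
# module_built_for (hypothetical helper): succeeds if a file named <module>.ko
# exists anywhere under <root>/lib/modules/<kernel version>.
# $1 = module name, $2 = kernel version, $3 = filesystem root (default: empty)
module_built_for() {
    moddir="${3:-}/lib/modules/$2"
    [ -d "$moddir" ] || return 1
    # grep -q . succeeds only if find printed at least one path.
    find "$moddir" -name "$1.ko" | grep -q .
}

# stale_modules (hypothetical helper): prints each module used in this guide
# that has no build for the given kernel version.
# $1 = kernel version, $2 = filesystem root (default: empty)
stale_modules() {
    for m in scst scst_vdisk ib_srpt; do
        module_built_for "$m" "$1" "${2:-}" || echo "$m"
    done
}
```

After a kernel upgrade and reboot, running stale_modules "$(uname -r)" and getting any names back would suggest it is time to re-run the make commands.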
Let’s now use scstadmin to create an SRP target:
First we modprobe a couple of kernel modules. That will make the relevant drivers and targets available to scstadmin. We should then be able to see these drivers/targets with a few scstadmin ‘list’ commands:
modprobe scst_vdisk
modprobe ib_srpt

root@raid:~# scstadmin -list_handler
Collecting current configuration: done.

Handler
-------------
vdisk_fileio
vdisk_blockio
vdisk_nullio
vcdrom

All done.

root@raid:~# scstadmin -list_driver
Collecting current configuration: done.

Driver
-------
ib_srpt

All done.

root@raid:~# scstadmin -list_device
Collecting current configuration: done.

Handler         Device
------------------------
vdisk_nullio    -
vdisk_fileio    -
vdisk_blockio   DISK01
vcdrom          -

All done.

So, if you can see those handlers, targets, and drivers, you’re good to go with the next step.
Summary:
The following command list is the quick list to use (and modify) if you’ve done this kind of thing before. The detail section below gives a bit of explanation on each of the lines.
scstadmin -disable_target ib_srpt_target_0 -driver ib_srpt
scstadmin -clear_config -force
scstadmin -open_dev DISK01 -handler vdisk_blockio -attributes filename=/dev/sda
scstadmin -set_dev_attr DISK01 -attributes t10_dev_id=0x2346,threads_num=4
scstadmin -add_group HOST01 -driver ib_srpt -target ib_srpt_target_0
scstadmin -add_lun 0 -driver ib_srpt -target ib_srpt_target_0 -group HOST01 -device DISK01 -attributes read_only=0
scstadmin -add_init 0x0002c9020021f9fc0002c902002200bc -driver ib_srpt -target ib_srpt_target_0 -group HOST01
scstadmin -enable_target ib_srpt_target_0 -driver ib_srpt
scstadmin -write_config /etc/scst.conf
Detail:
Clear the current config (only if that’s ok, and you don’t have any other config that you want to keep. I do this because I’m starting with a clean slate).
scstadmin -clear_config -force
Create DISK01, assigning it to a partition (/dev/sdg1, /dev/md0p1, etc., etc.). I’m using a disk partition in this example.
scstadmin -open_dev DISK01 -handler vdisk_blockio -attributes filename=/dev/sdg1
Now set the drive attributes.
scstadmin -set_dev_attr DISK01 -attributes t10_dev_id=0x2345
Now add a group
scstadmin -add_group HOST01 -driver ib_srpt -target ib_srpt_target_0
Add a LUN to the group, assigning it to DISK01
scstadmin -add_lun 0 -driver ib_srpt -target ib_srpt_target_0 -group HOST01 -device DISK01 -attributes read_only=0
Add an initiator to the group, allowing the initiator to connect to our new target. I got this from watching /var/log/messages while I was disabling and enabling the Infiniband SRP miniport in device manager on the Win7 box. This caused the SRP miniport to attempt to connect to the Ubuntu target, and that attempt is shown in the /var/log/messages file along with the initiator ID.
scstadmin -add_init 0x0002c9020021f9fc0002c902002200bc -driver ib_srpt -target ib_srpt_target_0 -group HOST01
Finally enable the target
scstadmin -enable_target ib_srpt_target_0 -driver ib_srpt
And write the config.
scstadmin -write_config /etc/scst.conf
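If you set up more than one disk or initiator, it may help to generate the command list rather than retype it. The following is a minimal sketch: srp_target_commands is a hypothetical helper name, it only prints the commands so you can review them before running anything, and it deliberately leaves out the -clear_config and -set_dev_attr steps because those depend on your existing config and chosen device ID. Only scstadmin options that appear in the walkthrough above are used.

```shell
# srp_target_commands (hypothetical helper): prints, without running them, the
# scstadmin commands from the walkthrough above for one disk and one initiator.
# $1 = SCST device name (e.g. DISK01)
# $2 = backing block device (e.g. /dev/sdg1)
# $3 = initiator ID as seen in the target's log
# $4 = group name  (optional, default HOST01)
# $5 = target name (optional, default ib_srpt_target_0)
srp_target_commands() {
    dev=$1
    path=$2
    init=$3
    group=${4:-HOST01}
    target=${5:-ib_srpt_target_0}
    cat <<EOF
scstadmin -open_dev $dev -handler vdisk_blockio -attributes filename=$path
scstadmin -add_group $group -driver ib_srpt -target $target
scstadmin -add_lun 0 -driver ib_srpt -target $target -group $group -device $dev -attributes read_only=0
scstadmin -add_init $init -driver ib_srpt -target $target -group $group
scstadmin -enable_target $target -driver ib_srpt
scstadmin -write_config /etc/scst.conf
EOF
}
```

For example, "srp_target_commands DISK01 /dev/sdg1 0x0002c9020021f9fc0002c902002200bc" prints the six commands for the setup described here. Once they look right, the output can be piped to sh as root.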
At this point you should see a new drive appear on the Win7 box, asking you to format it. If not, you could try disabling and enabling the SRP miniport driver again, and take a look at /var/log/messages to see what’s happening.
Once the new drive appears on the Windows host, you should be able to format it and start using it. If you don’t see it immediately, have a look in “Disk Management”.
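If no drive shows up and you need to find (or double-check) the initiator ID, it can be pulled out of the log rather than spotted by eye. This is a sketch only: find_initiator_ids is a hypothetical helper name, and since the exact wording of the ib_srpt log line is not reproduced in this guide, the function makes no assumption about it beyond the ID being a 0x-prefixed string of 32 hex digits, like the one used with -add_init above.

```shell
# find_initiator_ids (hypothetical helper): reads log text on stdin and prints
# each distinct 0x-prefixed, 32-hex-digit string found in it, one per line.
find_initiator_ids() {
    # -o prints only the matching part of each line; sort -u removes repeats.
    grep -oE '0x[0-9a-fA-F]{32}' | sort -u
}
```

Typical use would be "find_initiator_ids < /var/log/messages" right after disabling and re-enabling the SRP miniport on the Windows side. On some installs the kernel messages may land in /var/log/syslog instead, so try that file if the first one is empty or missing.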
I found that the /etc/init.d/scst script was not getting called at boot, so I added a softlink in /etc/rcS.d:
cd /etc/rcS.d
ln -s ../init.d/scst S26scst
This is so it will start after scsitools, opensm and open-iscsi. Using the /etc/init.d/scst script also resolved a problem I’ve had for a long time, in that I used to have some modules in /etc/modules, so the SRP miniport would attempt to connect too early in the linux boot sequence, and fail, so that I had to go into the device manager and disable/enable to get it working. Now it pops up perfectly every boot of the linux box, and I don’t have to go into device manager on the Win7 box any more.
I’d suggest you also run a benchmark on it, just to see what kind of speed you’re getting out of it in real-world usage. There’s a very handy (free) benchmarking tool available from attotech.com. Oh, and please do drop a comment below if you do get decent speeds. I’m always interested to hear what people are getting. Oh, and also, comment if you find this guide useful!
References:
SCST – http://scst.sourceforge.net/
Installing iSCSI-SCST – http://iscsi-scst.sourceforge.net/iscsi-scst-howto.txt
Performance Testing.
If you have plenty of memory in your linux machine, you might like to do a test of a ramdisk SRP target over the fabric.
There’s a ramdisk set up by default in Ubuntu at /dev/ram0, but it’s quite small, at 64K. I like to bump it up to 1Gig or 2Gig, as I have 4Gig in my linux box and it doesn’t need all of that for normal operation. Adding an extra parameter to the kernel line in /boot/grub/grub.cfg does the trick nicely.
ramdisk_size=1024000 for a 1Gig ramdisk at /dev/ram0
ramdisk_size=2048000 for a 2Gig ramdisk at /dev/ram0
so the full line would look like:
linux /boot/vmlinuz-3.0.0-12-generic root=UUID=39331d95-e9ba-48a5-9dd9-09978e0503c4 ro quiet splash vt.handoff=7 ramdisk_size=1024000
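The ramdisk_size value is counted in kilobytes (units of 1024 bytes), so the figures above are simply a size in megabytes multiplied by 1024. A tiny sketch of that arithmetic follows; ramdisk_size_kib is a hypothetical helper name introduced only for illustration.

```shell
# ramdisk_size_kib (hypothetical helper): converts a size in MiB into the
# value expected by the ramdisk_size= kernel parameter, which is in KiB.
# $1 = desired ramdisk size in MiB
ramdisk_size_kib() {
    echo $(( $1 * 1024 ))
}
```

So "ramdisk_size_kib 1000" gives the 1024000 used above, which is slightly under a full gigabyte; "ramdisk_size_kib 1024" gives 1048576 for exactly 1 GiB. Note also that /boot/grub/grub.cfg is regenerated by update-grub, so a hand edit there may not survive a kernel update; /etc/default/grub is the usual place for options that should persist.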
Then using scstadmin, we add another disk to the script above, but we use vdisk_fileio rather than vdisk_blockio.
scstadmin -open_dev DISK02 -handler vdisk_fileio -attributes filename=/dev/ram0
Then when you open disk manager in windows, you should see a disk requesting a new MBR, which you go ahead and add, then format and run a few benchmarks on it.

Nice how-to david – found your blog after going on a similar venture (http://forums.whirlpool.net.au/forum-replies.cfm?t=1836291)
As you can see, similar speeds (little bit less for me). I’ll probably be going further with SDP as the solution (after I test your SRP solution – burning Oneiric now) – as I’m after the same solution (10GbE Samba).
FWIW, for Ubuntu 11.10 and newer, you should not need any changes to udev rules. The required paths and modes are set by the kernel now — the changes are in kernel 3.0 and newer.
Thanks Roland, I’d been meaning to test that out. I did see the rule warnings in syslog, but never got around to fixing. I’ve updated the HOWTO above. Rules removed.
Since you were able to tweak the IPoIB throughput to 7Gbps, it’s pretty clear samba is a bad bottleneck, although as you pointed out, the disk overhead might have something to do with it. Did you try pointing the share to the ramdisk to see if that helped?
Slightly off topic: are you using just one port or both on the Mellanox cards? I’m thinking about daisy-chaining multiple machines and bridging the connections to provide extra storage (if possible, that is. Does mdadm support SRP targets?).
Hi David,
After reading your fantastic blogs, I made the jump and bought a dual port and single Mellanox ConnectX-2 (QDR) cards from ebay and couple QSFP (QDR) active optic cables.
The dual port card will be installed in the storage server (Ubuntu 11.10) with an LSI 9265-8i raid card. I am just a bit puzzled whether the SRP target setup is good for disk access from 2 clients (Windows 7/8 and Mac) after the disk array has been formatted by the Windows 7/8 client first. I am planning to have the Windows and Mac clients each plug into the dual port IB card. I wondered if the 2nd client would be able to access the NTFS formatted volume.
For Mac, unfortunately I need to go with 10GbE ethernet as OS X 10.7.3 does not provide native drivers for IB. To do this, I will need to install a 10GbE QSFP module in the 2nd port of the IB card, then find a way to install a 10GbE card on the Thunderbolt (PCIe) expansion bus.
Patrick,
As I understand it, you will not be able to access the one volume from multiple clients at the same time, unless the two clients are clustered. Normally you’d split the array into several volumes, then allocate the volumes to the clients on a 1:1 basis. If you want to access the same volume with multiple clients, you could go the samba route, but you’re losing a lot of bandwidth. Also, there’s the possibility of NFS in conjunction with RDMA, but I haven’t tested that as I don’t have a client machine capable of NFS with RDMA.
Rgds,
Dave.
Dave,
Thanks. That’s what I thought too. About SAMBA performance via IPoIB, I remember reading somewhere that the newer ConnectX, X2 and X3 handle IP much faster through TCP/UDP/IP stateless offload. I will first try SAMBA to verify this claim and report back.
As to file sharing for Windows clients, SMBoRDMA remedy is in sight according to these:
http://www.hpcwire.com/hpcwire/2011-10-03/mellanox_announces_collaboration_with_microsoft_on_multiple_technology_demonstrations_of_new_infiniband_rdma_support_in_windows_server_%E2%80%9C8%E2%80%9D.html
http://ubiqx.org/presentations/DTC/DTC-2011.pdf
It means all Windows 8 clients with Mellanox card (ConnectX series only?) in theory should be able to access the same volume via SMBoRDMA.
Just found NFS4.1 Windows client from this post:
http://arstechnica.com/civis/viewtopic.php?f=10&t=1140383
Direct download link:
http://citi.umich.edu/projects/nfsv4/windows/
The source of the driver does include SDP (WSD) support but no idea if this was compiled into binary.
I will test this NFS over SDP combination.
Hi Dave,
I’m a novice here trying to setup an Infiniband connection between two Ubuntu 11.10 boxes using the same Mellanox adapters that you used. I followed your instructions but seem to be stuck after rebooting the two machines. When I check the status of both cards, both seem to show one active port each. Checking the opensm logs for one machine shows that the subnet is up. The second machine shows that it is “Entering STANDBY state.” I cannot ping between the two machines. Any ideas about what I’m doing wrong? Your help (or anyone else’s) will be much appreciated.
Thanks.
With the default IPoIB mode (datagram, i.e. not connected) you could try enabling more TCP offload options, such as Large Receive Offload.
Try it with:
ethtool -K ib0 lro on
Check with:
ethtool -k ib0
Offload parameters for ib0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: off
large-receive-offload: on
ntuple-filters: off
receive-hashing: off
Hi, I followed your configuration on Ubuntu 11.10 amd64 but it doesn’t work. I recently built an InfiniBand network on HP SL390s blades, but Ubuntu cannot recognize the InfiniBand adapter, even after I modprobe ib_addr, ib_umad, ib_cm, ib_sa, ib_mad, ib_core, mlx4_core, etc. Adding an entry for ib0 in /etc/network/interfaces still doesn’t work. I even tried the OFED source code, but it cannot be compiled and installed.
My Adapter Card is InfiniBand: Mellanox Technologies MT26438 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE Virtualization+] (rev b0)
Please give me some advice! Thank you!
Your infiniband series has been really interesting. I’ve been setting up a Ganeti cluster at home, and wanted something faster than GigE for replicating disks, and Infiniband seems to be just the thing. I have 4 machines with cheap Mellanox IB cards, and I added a cheap ($290) Voltaire switch off of eBay.
I’m using Ubuntu 12.04, and everything is working *perfectly*.
Using connected mode with a big MTU, netperf reports between 7760 and 7810 Mbps, with no additional tuning at all. I’ve seen drbd copying at around 250 MB/sec; beyond that I’m disk-bound for now.
Thanks!
Hi Dave, I have been trying to set up the infiniband using the method you suggest however i am still unable to set it up. The upfront problem seems to be that i am unable to bring up the ib0 up, it says the device is not found . I have tested out the card in win7 and it works.
Hi Scott, how are you able to make them work in 12.04 version?
I added the modules listed to /etc/modules, rebooted, and ib0 and ib1 appeared. As mentioned, you’ll need to run a subnet manager if you don’t have a switch that provides one, but just loading the modules should be enough to get the devices to appear.
Then go ahead and give your interface(s) IP addresses and configure them for use, just like any other interface.
I have exactly the same hardware as ChangLiwei, and nothing works at all as described above. After adding the modules and the interfaces entry and installing opensm, all one sees in syslog is a line stating “ib_qib: Unable to register ipathfs”. No entries are created under /sys/class/infiniband.
For now am devastated to have to use centos 6.2 with official drivers (which is working fine however) instead of beloved Ubuntu!
Hi,
I have been breaking my brain on getting more than 2 targets online at the same time.
I have a config with three Infiniband cards in the storage system and I’d like to have LUNs assigned to them.
I can see LUNs on my Windows 2K8R2 server, but when I enable a third LUN in scst.conf it breaks and I get a warning in the Infiniband SRP Miniport telling me it cannot start.
Hope anyone has an idea how to fix this.
My SCST.CONF, looks like this
HANDLER vdisk_blockio {
    DEVICE QUORUM {
        filename /dev/sdc
        t10_dev_id 0x2345
    }
    DEVICE DISKGOLD {
        filename /dev/sdd
        t10_dev_id 0x2355
    }
    DEVICE DISKSILVER {
        filename /dev/sde
        t10_dev_id 0x2365
    }
    DEVICE DISKBRONZE {
        filename /dev/sdf
        t10_dev_id 0x2375
    }
}
TARGET_DRIVER ib_srpt {
TARGET ib_srpt_target_0 {
enabled 1
rel_tgt_id 1
LUN 0 QUORUM
LUN 1 DISKGOLD
# LUN 2 DISKSILVER
# LUN 3 DISKBRONZE
# }
#
# TARGET ib_srpt_target_1 {
# enabled 1
# rel_tgt_id 2
#
# LUN 2 DISKSILVER
# }
#
# TARGET ib_srpt_target_2 {
# enabled 1
# rel_tgt_id 3
#
# LUN 3 DISKBRONZE
# }
}
I’ve a pair of machines talking using a pair of Mellanox MT23108 cards. I have a few ‘issues’.
It only seems to work if I boot machine A first and then machine B, if I boot them the other way round then the machines can’t even ping each other ( I use Ubuntu 11.10 on B and Ubuntu 12.04 on A ).
Once started in the right order the networking connects and all seems fine. Except that if I use iperf from B to A I get a rate of around 550Mbit/s if I do A to B I get 1.2Gbit/s.
When I boot machine A I get some messages from the card and it seems to be looking for a switch, whereas machine B doesn’t display anything from the card at all – they are different types of machine ( A-HP DL585 G2, B – HP XW9400 ). Not sure if that’s significant.
Trying to compare 10GbE vs IB. We’re using 2 machines and a Topspin switch as a testbed for IB. We followed this article with Ubuntu 11.10 64bit server on one machine and Windows 7 Pro 64bit on the other, using Mellanox MHEA28-XTC cards.
We can get the miniport driver on but no disk to list in the disk management on the windows 7 machine.
here is the list from scstadmin -list_group HOST01
Driver: ib_srpt
Target: ib_srpt_target_0
Driver/target ‘ib_srpt/ib_srpt_target_0′ has no associated LUNs.
Group: HOST01
Assigned LUNs:
LUN  Device
-------------
0    DISK01
1    DISK02
Assigned Initiators:
Initiator
---------------------------------
0x00000000000000000002c9020022a07
Looking at line 4 we seem to be having an issue with linking the lun to the driver/target. We have tried removing and adding luns with the scstadmin -rem_lun and -add_lun commands but still the same results. Has anyone else had this issue or able to give a solution.
Thanks
Hi,
We have a problem that seems to be consistent and looks like it is coming from the Windows 2008 R2 servers.
We have tried 3 different Linux versions and created targets on them, as long as we have one or two targets, the windows machines will see the disks, as soon as we enable a third they all disappear and the “Infiniband SRP miniport” will give the message “THIS device cannot start”.
Hello David, I need your help.
OS is Ubuntu x64, kernel 3.0.0-12.
I edited /etc/modules as you wrote in the blog, then installed opensm.
I added the ib0 config to /etc/network/interfaces and then rebooted. In the boot log I have “ib_qib: Unable to register ipathfs”, and ib0 doesn’t get created.
However, lspci can see it – ‘InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)’
What do I need to do?
Best regards, Rustam!
p.s. i use two servers (supermicro) with mellanox IB card(qdr), and IB switch.
When i power up server, IB switch, and connect the IB wires – the leds on IB cards don’t light on, is it good? And on the IB switch i have top led (warning) is switched on.
By the way:
A tmpfs filesystem is faster than a ramdisk, however it has no device file like a ramdisk has. After using it, just unmount the folder and mount it again, and all the memory is released!
Hi David,
Thanks for putting this info together. It’s been a real help to me. I’ve followed along for several days now, starting with fresh Ubuntu builds a number of times. I’ve got the infiniband up and going between a Windows 7 box and an Ubuntu 12.04.2 installation. The bump in the road I seem to be having comes when I try to set up the SRP target on the Ubuntu box. In an earlier step when I try to list the drivers:
scstadmin -list_driver
the driver list is empty. No mention of ib_srpt.
Later, when I attempt to add a target to a group:
scstadmin -add_group HOST01 -driver ib_srpt -target ib_srpt_target_0
I get the following message:
“No such driver exists.”
This message appears to be originating from the SCST code. I’ve spent days trying to get to the bottom of this.
Admittedly, I’m not a Linux pro. I spend the majority of my time working on code and not operating systems.
If you or anyone else might be in a position to shed a little light on what might be at the heart of the issue, I would be tremendously grateful.