Posts Tagged ‘Storage’

SRP vs Samba on 10Gb Infiniband Fabric

Saturday, January 14th, 2012


Here are some more ramblings on 10G Networking on the Cheap.

Having gone to all the trouble of getting a 900MB/sec SRP target working, I was interested to go back and see how a samba mount would perform now that I’d sorted out all the cards into their optimum PCI-express slots.

So I used the same setup as the 900MB/sec SRP test, Ramdisk to Ramdisk. This time I set up a 2GB ramdisk as a samba share and copied large files to and from it from a 4GB Ramdisk on the Windows side. ATTO Disk Benchmark doesn’t test network drives, so I had to make do with copying large files and keeping an eye on the throughput with dstat.

I first measured the IPoIB throughput using netperf, which gave me about 6Gbit/sec:

daveh@raid:~$ netperf -H 10.4.12.2
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.4.12.2 (10.4.12.2) port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec
 8192  16384  16384    10.00    6022.50

Based on that number, I was not expecting anywhere close to the 900MB/sec throughput I was getting with SRP. Anyway, just to compare samba throughput with SRP, I pressed on.

While copying a 1GB video file to the ramdisk on the Linux box (write):
usr sys idl wai hiq siq| read  writ| recv  send
  3  57  31   0   0   9|   0     0 | 191M  565k
  3  61  28   0   0   8|   0     0 | 188M  556k
  4  56  31   0   0   9|4096B   40k| 192M  568k
  4  59  32   0   0   6|   0     0 | 163M  485k
  5  56  29   0   0   9|   0     0 | 186M 6419k
  3  61  29   0   0   8|   0     0 | 187M 6519k
  2  64  25   0   0   9|   0     0 | 183M  543k

While copying a 1GB video file from the ramdisk on the Linux box (read):

usr sys idl wai hiq siq| read  writ| recv  send
  3  40  54   0   0   4|   0  4096B| 863k  251M
  6  42  47   1   0   5|   0    52k| 910k  263M
  5  44  48   0   0   5|   0     0 | 816k  233M
  5  42  46   0   0   6|   0     0 | 812k  231M
  4  45  48   0   0   4|   0     0 | 826k  236M
  3  44  49   0   0   5|   0     0 | 819k  234M
  4  44  48   0   0   5|   0     0 | 832k  236M

I know it’s not very scientific, and I didn’t spend any time tuning samba. But the 900MB/sec throughput is practically out-of-the-box for SRP, so why not test samba at its out-of-the-box settings? :)

So, even with netperf showing 6Gbit/sec, I was only seeing around 190-235MB/sec over samba (at 40-60% CPU usage). So I think I’ll stick with my SRP target setup rather than going back to samba. With SRP, I know that whatever disk subsystem I put into the Linux box probably won’t be able to keep up with the 900MB/sec that the link is capable of. Unless, that is, I buy a very expensive RAID card and a bunch more disks. And with the way disk prices have gone over the last few months because of the floods in Thailand, it’ll be a while until I even think about doing that. :)
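As a rough cross-check on those numbers, the dstat rows above can be averaged with a few lines of Python. This is a minimal sketch, assuming the rows are pasted exactly as dstat printed them, with the network `recv` and `send` values as the last two fields after the final `|`, and treating dstat's B/k/M/G suffixes as powers of 1024. The helper names are made up for illustration.

```python
# Average the network columns of the dstat rows from the samba copy tests.
# Assumption: each row ends with "| <recv> <send>", and dstat's B/k/M/G
# suffixes are powers of 1024.

UNITS = {"B": 1, "k": 1024, "M": 1024**2, "G": 1024**3}


def to_bytes(field: str) -> float:
    """Convert a dstat value such as '191M' or '565k' to bytes."""
    if field[-1] in UNITS:
        return float(field[:-1]) * UNITS[field[-1]]
    return float(field)  # bare number, already in bytes


def average_net(rows: str) -> tuple[float, float]:
    """Return the mean (recv, send) rates in MiB/sec for the given rows."""
    recv, send = [], []
    for line in rows.strip().splitlines():
        net_recv, net_send = line.split("|")[-1].split()
        recv.append(to_bytes(net_recv))
        send.append(to_bytes(net_send))
    mib = 1024**2
    return sum(recv) / len(recv) / mib, sum(send) / len(send) / mib


WRITE_ROWS = """
  3  57  31   0   0   9|   0     0 | 191M  565k
  3  61  28   0   0   8|   0     0 | 188M  556k
  4  56  31   0   0   9|4096B   40k| 192M  568k
  4  59  32   0   0   6|   0     0 | 163M  485k
  5  56  29   0   0   9|   0     0 | 186M 6419k
  3  61  29   0   0   8|   0     0 | 187M 6519k
  2  64  25   0   0   9|   0     0 | 183M  543k
"""

READ_ROWS = """
  3  40  54   0   0   4|   0  4096B| 863k  251M
  6  42  47   1   0   5|   0    52k| 910k  263M
  5  44  48   0   0   5|   0     0 | 816k  233M
  5  42  46   0   0   6|   0     0 | 812k  231M
  4  45  48   0   0   4|   0     0 | 826k  236M
  3  44  49   0   0   5|   0     0 | 819k  234M
  4  44  48   0   0   5|   0     0 | 832k  236M
"""

w, _ = average_net(WRITE_ROWS)  # writes arrive as network "recv"
_, r = average_net(READ_ROWS)   # reads leave as network "send"
print(f"write (recv): {w:.0f} MiB/sec, read (send): {r:.0f} MiB/sec")
```

That works out at roughly 184MiB/sec for the write and 241MiB/sec for the read, the same ballpark as the figures quoted above and nowhere near the SRP numbers.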


Infiniband at Home (10Gb networking on the cheap)

Wednesday, February 9th, 2011

Would you like to have over 700MB/sec* throughput between your PCs at home for under €110? That’s like a full CD’s worth of data every second! If you do, then read on…

–edit–

I’ve found that the real-world throughput of Infiniband between a Windows machine and an Ubuntu machine depends very much on the protocol you use to communicate between the machines. For example, when using samba mounts over IPoIB, the maximum throughput is 135MB/sec at 100% CPU, just under twice my 1Gbps Ethernet (75MB/sec). However, later research showed that by using the SCSI RDMA Protocol (SRP), the IPoIB bottleneck is removed, and it’s possible to get the throughput up past 700MB/sec. Using a RAMDisk, I was able to pull over 900MB/sec from the SRP target to the Windows initiator. Care needs to be taken in the selection of PCI-express slots, though: you need to understand how many PCI-express lanes are assigned to each slot to get maximum throughput.

—end edit—

With the increasing amount of data that I have to manage on my computers at home, I started looking into a faster way of moving data around the place. I started with a RAID array in my PC, which gives me read/write speeds of 250MB/sec. Not being happy with that, I looked at creating a bigger external array, with more disks, for faster throughput. I happened to have a decent Linux box sitting there doing very little. It had a relatively recent motherboard and 8 SATA connectors. But no matter how fast I got the drives in that Linux box to go, I’d always be limited by the throughput of the 1Gb Ethernet network between the machines, which was giving me about 75MB/sec. So I researched several different ways of inter-PC communication that might break the 1Gbps barrier.

The first I looked at was USB 3.0 (5Gbit/s). While that’s very good for external hard drives, there didn’t seem to be a decent solution out there for allowing multiple drives to be added together to increase throughput. We are now starting to see RAID boxes appear with USB 3.0 interfaces, but they are still quite expensive. To connect my existing Linux box to my Windows desktop, I’d need a card with a USB 3.0 slave port so that the external array would look like one big drive and could max out the 5Gbps bandwidth of a USB 3.0 link. However, these do not seem to exist, so I moved on to the next option.

Then I moved on to 10G Ethernet (10Gbit/s). One look at the prices and I immediately ruled it out: several hundred Euro for a single adapter.

Fibre Channel (2-8Gbit/s). Again the pricing was prohibitive, especially for the higher throughput cards. Even the 2Gbps cards were expensive, and would not give me much of a boost over 1Gbps Ethernet.


Then came Infiniband (10-40Gbit/s). I came across this while looking through the List of Device Bit Rates page on Wikipedia. I had heard of it as an interconnect in cluster environments and high-end data centres, and I assumed that the price would be prohibitive. A 10G adapter would theoretically give up to a gigabyte per second of throughput between the machines. However, I wasn’t ruling it out until I had a look at a few prices on eBay. To my surprise, there was a whole host of adapters available, ranging from several hundred dollars down to about fifty dollars. $50 for a 10Gig adapter? Surely this couldn’t be right. I looked again, and I spotted some dual-port Mellanox MHEA28-XTC cards at $35.99. This worked out at about €27 per adapter, plus €25 shipping (and I’ve seen these same cards as low as $22.99 each on eBay). Incredible, if I could get it to work. I’d also read that it is possible to use a standard Infiniband cable to directly connect two machines together without a switch, saving me about €700 in switch costs. If I wanted to bring another machine into the Infiniband fabric, though, I’d have to bear that cost. For the moment, two directly connected machines were all I needed.
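The "up to a gigabyte per second" figure follows from how these links are encoded. The 10Gbit/sec of a 4x SDR Infiniband link is the signalling rate, and SDR, DDR and QDR all use 8b/10b encoding, so only 8 of every 10 bits on the wire carry data. Here is a minimal sketch of that arithmetic. The function name is made up, and it ignores protocol overheads above the physical layer.

```python
def ib_data_rate_mb_per_s(signal_gbps: float, encoded_bits: int = 10,
                          data_bits: int = 8) -> float:
    """Usable data rate in MB/sec (10^6 bytes) for an 8b/10b-encoded link."""
    data_gbps = signal_gbps * data_bits / encoded_bits  # strip encoding overhead
    return data_gbps * 1000 / 8                         # Gbit/sec -> MB/sec


for name, rate in [("SDR 4x", 10), ("DDR 4x", 20), ("QDR 4x", 40)]:
    print(f"{name}: {ib_data_rate_mb_per_s(rate):.0f} MB/sec")
```

For the 10Gbit cards used here, that gives 1000MB/sec as the theoretical ceiling before any protocol overhead.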

With a bit more research, I found that drivers for the card were available for Windows 7 and Linux from OpenFabrics.org, so I ordered 2 cards from the U.S. and a cable from Hong Kong.

About 10 days later the adapters arrived. I installed one adapter in the Windows 7 machine. Windows initially failed to find a driver, so I went to the OpenFabrics.org website and downloaded OFED_2-3_win7_x64.zip. After installation I had two new network connections available in Windows (the adapter was dual-port), ready for me to connect to the other machine.

Next I moved on to the Linux box. I won’t even start on the hassle I had installing the card in my Linux box. After days of research, driver installation, kernel re-compilation, driver re-compilation, etc., etc., I eventually tried swapping the slot that I had the card plugged into. Lo and behold, the damn thing worked. My motherboard has two PCI-E x16 slots, and the Infiniband adapter would work in one but not in the other. Who would have thought? All I had to do then was assign an IP address to it. –EDIT– Here’s a quick HOWTO on getting the fabric up on Ubuntu 10.10. About 10 minutes should get it working – http://davidhunt.ie/wp/?p=375 –EDIT–
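Once the driver has picked up the card, a quick way to confirm that Linux sees it is to look under `/sys/class/infiniband/<device>/ports/<port>/state`. That file holds a line such as `4: ACTIVE`, and the device name (for example `mthca0`) depends on the driver. Below is a minimal sketch that lists every port and its state, assuming that sysfs layout. With no cable attached, a port normally reports DOWN. Once cabled it typically sits at INIT until a subnet manager such as OpenSM has configured the link, and then goes to ACTIVE.

```python
from pathlib import Path


def ib_port_states(root: str = "/sys/class/infiniband") -> dict[str, str]:
    """Map '<device>/<port>' to the port's state name, e.g. 'ACTIVE'."""
    states = {}
    base = Path(root)
    if not base.is_dir():
        return states  # no Infiniband driver loaded (or not Linux)
    for state_file in sorted(base.glob("*/ports/*/state")):
        port = state_file.parent.name                # e.g. "1"
        device = state_file.parent.parent.parent.name  # e.g. "mthca0"
        # The file contains e.g. "4: ACTIVE"; keep only the name part.
        raw = state_file.read_text().strip()
        states[f"{device}/{port}"] = raw.split(":", 1)[-1].strip()
    return states


if __name__ == "__main__":
    for port, state in ib_port_states().items():
        print(port, state)
```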

Without a cable (it still hadn’t arrived from Hong Kong), all I could do was sit and wait for it before I could test the setup. Would the machines be able to feed the cards fast enough to get decent throughput? On some forums I’d seen throughput tests of 700MB/sec. Would I get anywhere close to that between a 3GHz dual-core Athlon and a 3GHz i7 950?

A few days later, the cable arrived. I plugged it into each machine and could immediately send pings between them, since I’d previously assigned static IP addresses to the Infiniband ports on each machine. I wasn’t able to run netperf, though, as it didn’t see the cards as something it could put traffic through. So I upgraded the firmware on the cards, which several forums said would improve throughput and compatibility. I was then able to run netperf, with the following results:

root@raid:~# netperf -H 10.4.12.1
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.4.12.1 (10.4.12.1) port 0 AF_INET : demo
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time     Throughput
bytes  bytes  bytes   secs.    10^6bits/sec
87380  16384  16384   10.00    7239.95

That’s over 7 gigabits/sec*, or over 700MB/sec* throughput between the two machines!
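netperf reports throughput in units of 10^6 bits/sec, so getting to megabytes per second is just a division by 8. Here is a tiny sketch (the helper name is made up) applied to the two netperf results on this page.

```python
def netperf_to_mb_per_s(mbits_per_s: float) -> float:
    """Convert netperf's 10^6 bits/sec figure to MB/sec (10^6 bytes/sec)."""
    return mbits_per_s / 8


for result in (7239.95, 6022.50):
    print(f"{result} x 10^6 bits/sec = {netperf_to_mb_per_s(result):.0f} MB/sec")
```

So the 7239.95 result above corresponds to roughly 905MB/sec of TCP payload, and the later 6022.50 run to roughly 753MB/sec.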

So, I now have an Infiniband fabric working at home, with over 7 gigabits/sec of throughput between PCs. The stuff of high-end data centres, in my back room. The main thing is that you don’t need a switch, so a PC-to-PC 10-gigabit link CAN be achieved for under €110! Here’s the breakdown:

2 x Mellanox MHEA28-XTC Infiniband HCAs @ $34.99 + shipping = $113 (€85)

1 x 3m Molex SFF-8470 infiniband cable incl shipping = $29 (€22)

Total: $142 (€107)

The next step is to set up a RAID array with several drives striped so they all work in parallel, and maybe build it in such a way that if one or two drives fail, the data is still recoverable (RAID 5/6). More to come on that soon.
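RAID 5 survives a failed drive by storing parity alongside the striped data. The parity block in each stripe is the XOR of that stripe's data blocks, so any single missing block can be rebuilt by XOR-ing everything that survives. Here is a minimal illustration of the idea, using toy byte strings in place of disk blocks. RAID 6 adds a second, differently computed parity block based on Galois-field arithmetic, which is not shown.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe across three data disks (toy 4-byte blocks).
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)  # written to the parity disk for this stripe

# Disk 1 fails: rebuild its block from the surviving data plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
print("rebuilt block:", rebuilt.hex())
```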

References:
http://hardforum.com/showthread.php?p=1036510049
http://www.zdnet.com/blog/storage/build-a-10-gbit-home-network-for-1100/284
http://www.gossamer-threads.com/lists/drbd/users/19594
http://www.mellanox.com
http://www.openfabrics.org

* It was a long time before I achieved 700MB/sec in real-world tests. 900MB/sec was eventually achieved using a RAMdisk as an SRP target, with carefully selected PCI-express slots for all the cards to make sure there were no bottlenecks on the motherboard. CPU usage on the Linux box during the transfers is around 50%.


Additional Notes:

Since this article was originally written, I’ve found that the real-world throughput of Infiniband between a Windows machine and an Ubuntu machine depends very much on several factors:

  1. The protocol you use to communicate between the machines. SRP is absolutely essential to maximise throughput, being about 8 times faster than samba over IPoIB.
  2. The availability and selection of PCI-express slots. Many motherboards assign fewer PCI-express lanes to a slot than its physical size suggests. For example, x16 slots are sometimes only assigned 1 lane, depending on BIOS settings.
  3. Disk subsystem speed. Hardware RAID solutions help a lot here.

For example, when using samba mounts over IPoIB, the maximum throughput is 135MB/sec, just under twice my 1Gbps Ethernet (75MB/sec), and the Linux box is at 95% CPU while doing transfers. However, later research showed that by using the SCSI RDMA Protocol (SRP), the IPoIB bottleneck is removed, and the link will then run as fast as my disk subsystem allows (currently 250MB/sec reads and 200MB/sec writes at 5% CPU usage on the Linux box).

I got a Dell PERC 5/i RAID card, and that gave a dramatic increase in disk subsystem throughput, but still not much more than 350MB/sec. More expensive RAID cards should help, but I’m happy with it for the moment.
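To put numbers on the PCI-express slot point in the list above: first- and second-generation PCI-express use 8b/10b encoding. A Gen1 lane signals at 2.5GT/sec and carries about 250MB/sec of data in each direction, and Gen2 doubles that. Here is a minimal sketch of the arithmetic. The generations and lane counts in the loop are illustrative assumptions, not measurements of any particular motherboard.

```python
# Raw signalling rate per lane, in gigatransfers/sec, for PCIe Gen1 and Gen2.
GT_PER_S = {1: 2.5, 2: 5.0}


def pcie_mb_per_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth (MB/sec) of a PCIe Gen1/Gen2 link."""
    data_gbps = GT_PER_S[gen] * 8 / 10  # strip 8b/10b overhead
    return data_gbps * 1000 / 8 * lanes


for gen, lanes in [(1, 1), (1, 4), (1, 8), (2, 8)]:
    print(f"Gen{gen} x{lanes}: {pcie_mb_per_s(gen, lanes):.0f} MB/sec")
```

A slot that is only wired as x1 on a Gen1 board would therefore cap an Infiniband card at roughly 250MB/sec per direction, well below the 1GB/sec or so that the link itself can carry.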


USB 3.0 is the Future

Friday, October 1st, 2010

From reading this blog, you might get the idea that I’m obsessed with disk speeds as of late (on the cheap, I might add). Well, you’re not wrong there. And to add to my obsession, I got my hands on a USB 3.0 PCI-Express card and a SATA-to-USB 3.0 adapter today. After installing the drivers for the card, I attached a SATA hard drive to the adapter and plugged it into the PC. It came up no problem, so the first thing I did was run the ATTO Disk Benchmark tool against the drive. Here are the results:

Sustained 80-85MB/sec read/write. It sure beats the hell out of USB 2.0, at about 25-30MB/sec. I don’t have a fast enough drive available to push it further, but theoretically USB 3.0 is good for >400MB/sec.

Next purchase will probably be a USB3.0 Compact Flash reader, but they’re very thin on the ground. Oh, and some good and fast Compact Flash cards to go along with it. :)

UPDATE:

I got a new motherboard, which has USB 3.0 built in, and this week Aldi were doing a special on USB 3.0 1TB drives for €79.99. I got one, and here are the ATTO benchmark results:

More RAID Wonderfulness…

Tuesday, September 21st, 2010

With the increase in the amount of data taken up by images on my PC, data throughput has become a real problem: dozens of seconds to load and save files, not to mention 25 seconds to open Photoshop. Each image I take with my camera is 25-30MB, and when I am working on a file, it can be 600-800MB for a few days until I’m happy with it and compress it down to a flattened image of about 100MB. Loading and saving these files can take up to a minute on my PC.

I started looking into increasing the performance of editing images on my PC. The main prompt was that the 1TB drive in my PC crashed (I had 3 backups, so nothing was lost). I took a look at what the main bottlenecks were, and disk speed seemed to be one of the main problems. Having discussed RAID systems in a previous post, I thought I’d investigate setting up a striped array as a second drive on my desktop. Striping allows two disks to be “merged” so they show up as one disk to Windows, while the underlying writes are spread across both disks, theoretically doubling read/write performance.
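The "spread across both disks" part works by cutting the array's address space into fixed-size chunks and dealing them out to the disks in turn. Here is a minimal sketch of that mapping for a plain RAID 0 layout. The 64KB chunk size is an assumed value, and real controllers choose their own chunk sizes and may add metadata offsets.

```python
CHUNK = 64 * 1024  # bytes per chunk (assumed value for illustration)


def locate(offset: int, n_disks: int, chunk: int = CHUNK) -> tuple[int, int]:
    """Map a byte offset on the striped volume to (disk index, offset on that disk)."""
    chunk_no, within = divmod(offset, chunk)
    disk = chunk_no % n_disks      # chunks rotate across the disks
    stripe = chunk_no // n_disks   # how many full rows of chunks precede this one
    return disk, stripe * chunk + within


# A 256KB sequential read touches both disks equally, so they can work in parallel.
for off in range(0, 256 * 1024, CHUNK):
    print(off, locate(off, n_disks=2))
```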

First of all, I discovered that Windows 7 now includes software RAID, and benchmarks on the web showed that its speeds were very similar to motherboard RAID solutions for both RAID 1 and RAID 0 (mirroring and striping), so I had a go at that. Well into the process of copying about 800GB of data around the place to free up two spare drives, I found that only Windows 7 Professional and Windows 7 Ultimate allow RAID configurations. I’ve got Windows 7 Home Premium, which DOES NOT have RAID functionality. Bummer.

Then I took a closer look at my motherboard, and it has a RAID controller built in. But there was a problem: I couldn’t enable the RAID controller in the BIOS without causing my current Windows installation to give me a BSOD (Blue Screen of Death) at boot. Even the solution suggested by Microsoft didn’t help, so I freed up another 500GB SATA drive and, after enabling the RAID controller on the motherboard, installed a fresh copy of Windows 7 on it. It was very particular about the BIOS settings needed to get Windows to recognise the SATA CD-ROM drive and the hard disk. Anyway, once I got the settings right, I installed Windows and the nVidia RAID software.

Then I installed my two 1TB drives (Seagate Barracuda 7200.12s, €80 each in Maplin) and kicked off the nVidia Storage Manager. This is quite a neat piece of software that allows easy configuration of arrays based on spare disks in the system. I ran a couple of self-tests on the disks, which checked out healthy. I chose not to migrate any existing data, so the new striped array was created in seconds in the nVidia Storage Manager. Then, after using Disk Management in Windows to create a couple of partitions on the array (which again took seconds), I started up the ATTO Disk Benchmark utility. Previously I had tested the Seagate disks individually, giving me a max speed of about 125MB/sec. With the new striped array, this increased to 247MB/sec! Success.

ATTO Benchmark for Striped 1TB Seagate Barracuda 7200.12s

Because it’s a striped disk, I need to be very careful to have a good backup policy, because if one of the disks in the array goes, I lose ALL the data. If you want data security, don’t use RAID 0 (striping).
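That risk grows with every disk you add to a RAID 0 set, because losing any one member loses the whole array. Suppose the disks fail independently and each has the same probability p of failing over some period. The chance of losing the array is then 1 - (1 - p)^n. The 3% used below is a hypothetical yearly failure rate chosen for illustration, not a Seagate specification.

```python
def raid0_loss_probability(p_disk: float, n_disks: int) -> float:
    """Probability that at least one of n independent disks fails."""
    return 1 - (1 - p_disk) ** n_disks


P_DISK = 0.03  # hypothetical per-disk failure probability for the period
for n in (1, 2, 4):
    print(f"{n} disk(s): {raid0_loss_probability(P_DISK, n):.1%} chance of losing the array")
```

With two disks, the array is almost twice as likely to be lost as a single drive.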

Now to install Photoshop CS4, then the CS5 upgrade, restore my photos onto the array, and see what kind of performance improvement I get in general use editing pictures. I don’t expect the actual processing to be much quicker, but organising images in Bridge, generating previews, and loading/saving large TIFFs should all be improved.

Once Photoshop was installed, I started it up; it took about 8 seconds. Closing and opening it again took about 4 seconds. This is an improvement on about 25 seconds from cold with my old 80MB/sec boot drive.

Adobe Bridge seems a lot snappier now, with previews loading much quicker. Also, I can now save a 1GB TIFF file in under 10 seconds. I reckon this will save me several days over the course of a year :)

Backing Up Photos

Friday, April 30th, 2010

I just embarked on a project to make backing up photos easier, and decided I wanted to get myself set up with a RAID array. This would allow me to copy my photos once to the array, and it would then automatically mirror each file across the disks, so I’d be backing up twice without even thinking about it.

I’ve known about RAID (Redundant Array of Independent Disks) for quite a while now, and also use it at work, but those systems start at $15,000 and go up from there. I’m on a much more modest budget, and even the €350-and-upwards models on the internet seemed a bit steep, so I decided to build my own. Initially, I thought I’d use a motherboard with built-in RAID, or get a RAID controller for the PC, then stick in a few disks. After researching a bit, I found that RAID controllers can be got for as little as $30 on eBay, but there are limitations on the size of disk you can use with the older ones, and the more recent ones cost a good bit more, from maybe $70 up to $600 for the top-of-the-range cards.

Being even stingier than that, I then looked at the motherboards I had lying around the house. My desktop was full, and the kids’ PC had RAID on the motherboard, but its reputation when it comes to RAID was awful. I then checked my media centre PC, and sure enough, there were 4 SATA connectors and 2 IDE connectors (each IDE connector takes two drives). Enough for 8 drives. But the motherboard didn’t seem to have any hardware RAID. So, if it wasn’t there, could I do it in software? Seeing as it’s already running Linux (Ubuntu), I looked up software RAID solutions for Linux, and there it was in the form of ‘mdadm’. This magical command lets you set up all types of RAID array with a few quick commands. To test it out, I salvaged a couple of old 80Gig SATA drives from my garage and stuck them in the box. An hour later, I had a new 80Gig volume mounted in the Linux box and accessible over the network via Samba. The two drives were set up as a mirror, so I lose half the capacity of the combined drives, but it’s redundancy I want for this setup, not speed.
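For reference, a two-disk mirror like this is created with `mdadm --create`, followed by the new array device, the RAID level and the number of member devices. The sketch below just assembles that command line in Python so the pieces are easy to see. `/dev/md0`, `/dev/sdb` and `/dev/sdc` are hypothetical device names, so check yours with `lsblk` first, because creating the array will overwrite data on the member disks.

```python
import shlex


def mdadm_create_cmd(array: str, level: int, members: list[str]) -> list[str]:
    """Build an mdadm --create argument list for a new software RAID array."""
    return [
        "mdadm", "--create", array,
        f"--level={level}",
        f"--raid-devices={len(members)}",
        *members,
    ]


cmd = mdadm_create_cmd("/dev/md0", 1, ["/dev/sdb", "/dev/sdc"])  # hypothetical devices
print(shlex.join(cmd))
# Once the device names are confirmed, run it as root with:
#   subprocess.run(cmd, check=True)
```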

The next step is to replace the 80Gig drives with a few 1-2TB drives. Who knows, I might even go the whole hog and use 3 or more drives in a RAID 4 or RAID 5 array. That should give me faster access, as well as the redundancy that is an essential part of this project.
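With n identical drives, the capacity cost of each option is easy to work out. RAID 0 keeps all the space but has no redundancy. A mirror keeps one drive's worth, however many drives are in it. RAID 4 and RAID 5 give up one drive's worth to parity, and RAID 6 gives up two. Here is a small sketch of that arithmetic, ignoring metadata overhead. The drive sizes are only examples.

```python
def usable_tb(level: int, n: int, size_tb: float) -> float:
    """Usable capacity (TB) of an array of n identical drives, ignoring metadata."""
    if level == 0:
        return n * size_tb        # striping: all the space, no redundancy
    if level == 1:
        return size_tb            # mirroring: every drive holds the same copy
    if level in (4, 5):
        return (n - 1) * size_tb  # one drive's worth of parity
    if level == 6:
        return (n - 2) * size_tb  # two drives' worth of parity
    raise ValueError(f"unsupported RAID level: {level}")


print(usable_tb(1, 2, 0.08))  # the two 80Gig test drives as a mirror
for level in (0, 5, 6):
    print(f"RAID {level}, 4 x 2TB: {usable_tb(level, 4, 2.0)} TB")
```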

Another thing is that I need to know when there’s a problem with the array. The array should still function when one of the drives fails, and will rebuild itself when I replace the failed drive. But how do I know when a drive has failed? I don’t want to have to check it every week (or day). Well, there’s a very handy feature of the Linux software RAID solution that monitors the disks in the array and can be set up to automatically send an email when there’s a problem. Nice.
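The emails come from mdadm's monitor mode (`mdadm --monitor`, with a `MAILADDR` line in `mdadm.conf`). To show what it is watching for, here's a minimal sketch that reads `/proc/mdstat`. In that file, each array's status line ends with something like `[2/2] [UU]`, and an underscore in the second bracket marks a missing or failed member. The sample text is illustrative rather than captured from a real system.

```python
import re

SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdc1[1](F) sdb1[0]
      78123456 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""


def degraded_arrays(mdstat: str) -> list[str]:
    """Return the names of md arrays whose member map contains '_' (missing disk)."""
    degraded = []
    current = None
    for line in mdstat.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)  # e.g. "md0"
            continue
        status = re.search(r"\[\d+/\d+\]\s*\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as f:
            text = f.read()
    except FileNotFoundError:
        text = SAMPLE  # no md driver here; fall back to the sample
    print(degraded_arrays(text) or "all arrays healthy")
```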

Once I actually get the drives, I’ll then have to upgrade the link between the desktop and the media centre PC to gigabit Ethernet so I have nice fast access to the array. I’ll keep ye posted…

That’s it for now.
Cheers,
Dave.

Refs:
RAID - http://en.wikipedia.org/wiki/RAID
mdadm - http://en.wikipedia.org/wiki/Mdadm