Horrible 3ware RAID performance

Forum: Linux | Total Replies: 29
Sander_Marechal

Jan 11, 2008
4:39 AM EDT
Hello,

Does anyone here have experience with hardware RAID cards? More specifically the 3ware 9550SX cards? They're very expensive and promise huge read and write speeds: 800 MB/s read and 380 MB/s write in RAID 5. RAID 1 should be faster because no parity needs to be computed. I bought one.

I have a 3ware 9550SX-4 card using two 500 Gb Western Digital SATA 3.0 drives in RAID 1. It should be able to do 300 MB/s write. It's plugged into a 64-bit, 66 MHz PCI-X slot which should do up to 533 MB/s transfer. I configured the RAID 1 as an LVM2 volume group and created a 200 Gb ext3 logical volume in it. The maximum write speed I've been able to obtain is a shockingly low 5.3 MB/s.
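
For reference, this is roughly how I set it up (device and volume names are just examples, typed from memory):

# the RAID 1 unit shows up as a single SCSI disk, e.g. /dev/sda
pvcreate /dev/sda
vgcreate vg0 /dev/sda
lvcreate -L 200G -n data vg0
mkfs.ext3 /dev/vg0/data
mount /dev/vg0/data /mnt/test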

I've googled around and seen more complaints about poor 3ware performance, but they usually consider 60-140 MB/s to be slow. I should be so lucky! I get a paltry 5 MB/s! I know that LVM2 and ext3 slow things down a bit, but they shouldn't cause a 5x-25x performance drop...

Some more info on my server: HP ProLiant G3, dual Xeon 3.2 GHz, 1 GB RAM, Debian Etch 4.0 with kernel 2.6.22-3 from backports.org. (I read somewhere that 2.6.18 had a bug causing the slow speed and that it was fixed for x86 in 2.6.21. Apparently not.)

Any help?
Sander_Marechal

Jan 11, 2008
4:55 AM EDT
I just tested it with XFS instead of ext3, but there's hardly any performance increase:

sylvester:/mnt/test/temp# time dd bs=1M count=1000 if=/dev/zero of=1000M.bin
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 209.261 seconds, 5.0 MB/s

real    3m29.275s
user    0m0.016s
sys     0m2.932s
NoDough

Jan 11, 2008
9:37 AM EDT
Have you verified that it is actually the RAID card causing the slowdown? Plug the drives into the motherboard controller and run the same performance test (with the same FS/LVM config). If the performance problem is gone, then it is definitely the controller.

If it is, contact the manufacturer and, if they are worth anything, they will make it right for you.
pat

Jan 11, 2008
10:28 AM EDT
Here is what I get on mine for your dd command:

1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 1.87005 s, 561 MB/s

Is the driver configured in your kernel?

Also, for read performance, look at

blockdev --help
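
For example, something like this should show and raise the readahead (the device name is just an example):

blockdev --getra /dev/sda          # current readahead, in 512-byte sectors
blockdev --setra 16384 /dev/sda    # bump it to 8 MB and re-test reads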
Sander_Marechal

Jan 11, 2008
10:30 AM EDT
I can't. There are no SATA connectors inside my server. That's why I got the hardware RAID card in the first place. If I could have gotten a $50 PCI-X card with a SATA 3.0 controller, I would have bought that instead of the $500 3ware card I have now. But alas, all the cheap cards are PCI, which is too slow at 133 MB/s.

There are two things I still need to try:

* Deleting the LVM2 volume groups and using the RAID 1 as a regular disk
* Resetting the RAID to JBOD and using mdadm software RAID instead of hardware RAID

The first one might lead to improvements but I really like LVM so it'll probably be the latter.

Quoting:contact the manufacturer and, if they are worth anything, they will make it right for you.


I read a couple of comments via Google that said 3ware had great support, even for Linux. But they were recently bought by AMCC who have little Linux love.
Sander_Marechal

Jan 13, 2008
9:40 PM EDT
I've just ditched the entire LVM setup on the 3ware RAID 1 and created one big 500 Gb XFS filesystem on it. Guess what... throughput is *still* 5.0 Mb/s. I'm getting really peeved at this now. Of course, the good news is that LVM doesn't seem to matter one bit. Which is nice, because I really like using LVM.
Sander_Marechal

Jan 14, 2008
4:18 AM EDT
Some more stuff done. I threw away the RAID 1 unit and instead created two units of type "single". This is almost (but not quite) identical to JBOD, except that the card still manages the units. Inside the units I created one big 500 Gb partition containing a 500 Gb XFS filesystem. Write speed is now about 75-80 MB/s. Not great, and a far cry from the advertised 380 MB/s, but it's a helluva lot better than 5 MB/s. At least it's usable now.
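
Roughly what I did per unit (device names are examples):

# each "single" unit shows up as its own disk, e.g. /dev/sda and /dev/sdb
fdisk /dev/sda        # create one primary partition spanning the whole disk
mkfs.xfs /dev/sda1
mount /dev/sda1 /mnt/test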

Next up: dumping the single units and doing a real JBOD setup.
Sander_Marechal

Jan 14, 2008
5:39 AM EDT
JBOD is about as fast as a single disk. Turning off write-caching seems to make the disks slightly faster (about 2-3 Mb/s faster), which is strange. Enabling the write cache is supposed to make the disks faster. Then again, I'm writing much more than will ever fit in the cache (or in RAM) when doing these tests.

I've also updated to the latest firmware, with no noticeable difference. I now get a consistent 75-85 Mb/s writing. I have also tried writing to both disks at the same time. There's only a very small drop in performance when I do that: about 5 Mb/s, to 70-80 Mb/s.

SATA 3.0 is supposed to go up to 300 Mb/s. My PCI-X slot can go up to 533 Mb/s. What could cause the slowdown to 80 Mb/s, especially considering that I can write at that speed to both disks at the same time?
Sander_Marechal

Jan 14, 2008
6:19 AM EDT
After some more reading on the subject, it seems 80 Mb/s isn't all that bad, considering that my SATA 3.0 drives are regular Western Digital 7200 rpm drives with 16 Mb cache (WD5000AAKS).

So I guess I'll be settling on a JBOD configuration, no write cache (I don't have a battery unit anyway) and software RAID 1 running LVM. I'll probably settle on XFS as the filesystem, but that's easy to benchmark once I have LVM up and running.
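
The plan, roughly (device and volume names are examples):

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
pvcreate /dev/md0
vgcreate vg0 /dev/md0
lvcreate -L 200G -n data vg0
mkfs.xfs /dev/vg0/data    # or ext3, whichever benchmarks better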
Sander_Marechal

Jan 14, 2008
7:24 AM EDT
Wow, pretty big difference between ext3 and XFS here. ext3 gives me 51-63 Mb/s write speed. XFS gives me 67-76 Mb/s write speed. I guess I'll be going with XFS :-)
NoDough

Jan 14, 2008
7:44 AM EDT
You've got more patience than I. I would've already contacted the manufacturer or reseller and requested my money back, or an exchange for a different unit.
pat

Jan 14, 2008
8:45 AM EDT
Sander, I noticed you said your PCI-X slot is 66 MHz. The card says it is built for a 133 MHz PCI-X host interface. I wonder if this is your issue.
Sander_Marechal

Jan 14, 2008
9:02 AM EDT
Not an issue AFAICT. The card's specs say it's 133 MHz but can also run at 100 MHz or 66 MHz. It's one of the reasons I went for this card. At 66 MHz the bus throughput should be 533 MB/s, so more than plenty for my use.

A question for people experienced with software RAID and LVM and all that: are there specific optimizations I should use when layering these on top of each other? I notice that when you create an XFS filesystem you can specify the stripe size of the underlying RAID device so it will align itself and increase performance, but I have no idea how this works with LVM sitting between the RAID and the XFS filesystem.
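
For reference, this is the kind of option I mean (the values are just an example for a hypothetical array with 64k chunks and 2 data disks):

mkfs.xfs -d su=64k,sw=2 /dev/vg0/data    # su = stripe unit (chunk size), sw = data disks per stripe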
Sander_Marechal

Jan 14, 2008
3:35 PM EDT
Quoting:I would've already contacted the manufacturer or reseller and requested my money back, or an exchange for a different unit.


I've contacted 3ware support and explained my awful performance issues and all the testing I've done. Let's see what I hear back from them. I mean, it's crazy that I get a 1600% speed boost from using mdadm software RAID 1 instead of hardware RAID 1 :-)

I've also done some more reading on RAID 1 + LVM + XFS. It seems that stripe size in the RAID doesn't affect me at all since I'm using RAID 1 which means all data needs to be written to both drives anyway. It's mostly an issue with RAID 5 and RAID 10. On top of that, the mkfs.xfs tool detects when you create an XFS filesystem on an mdadm software RAID and will automagically use the right stripe size and all that.

All I had left to benchmark was different blocksizes for XFS. It turns out that a blocksize of 4K was the best, with 82 Mb/s sustained write as opposed to 79.8 Mb/s for 1K blocks. With 1K blocks, however, creating and deleting many small files was much faster. I used bonnie++ and `time dd` with 1 Gb and 10 Gb files to benchmark this.
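
In case anyone wants to repeat this, the tests were along these lines (mount point, user and LV name are examples):

mkfs.xfs -b size=4096 /dev/vg0/data         # and again with -b size=1024
bonnie++ -d /mnt/test -s 2048 -u nobody     # file size at least 2x RAM
time dd if=/dev/zero of=/mnt/test/10G.bin bs=1M count=10240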
Sander_Marechal

Jan 16, 2008
7:32 AM EDT
3ware support finally got back to me. Apparently I needed to turn on the write cache (weird, because the 3ware driver warns me not to do that: I have no battery unit, so a power loss means the cache gets wiped and the data may be corrupted) *and* I had to set the StorSave profile (i.e. the policy on how it caches writes) to "perform" (fastest, least safe). Now it performs similar to the mdadm RAID. At 83 Mb/s it's a tiny bit faster than software RAID.
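
In case anyone else runs into this, the changes boil down to something like this with tw_cli (controller/unit numbers are examples; only do this if you can live with losing the cache on a power failure):

tw_cli /c0/u0 set cache=on
tw_cli /c0/u0 set storsave=perform
tw_cli /c0/u0 show    # verify the new settings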

But now I'm torn what to do: Hardware RAID or software RAID?

Both have advantages and disadvantages. Software RAID lets me easily move disks to other controllers or even to other machines. It's just a tad more flexible. But it comes at a cost: higher CPU usage and more traffic on the bus. With hardware RAID 1 all data is sent to the card once; with software RAID it's sent twice, once to each disk. Then again, my bus can handle 533 Mb/s and filling up the card's 4 slots with SATA drives only comes to 4 x 80 Mb/s = 320 Mb/s.

Any advice in this area?
tuxchick

Jan 16, 2008
8:11 AM EDT
Well sander, what costs less, a big fancy RAID card that apparently does everything the wrong way and induces hair loss, or some extra CPU cycles for the software RAID? I'm thinking for the price of a RAID card you could buy a high-end multi-core CPU and have money left over, plus major geekpoints. Software RAID gives you a lot more flexibility, since it is not limited to RAID-ing entire disks, but works at the block level, so you can set up individual partitions. Which means you can have a boot partition on one of your disks, among other good things.
NoDough

Jan 16, 2008
8:48 AM EDT
Sounds to me like you've already spent a good amount in pursuit of the best performance. If, in the end, the best performance is achieved via the least expensive solution, why wouldn't you go that way?
rijelkentaurus

Jan 16, 2008
11:07 AM EDT
Quoting: I have no battery unit so a powerloss means the cache gets wiped and the data may be corrupted)


That is a big red light to me. If the performance gains are going to be minimal, I would go software. TC is right: spend the CPU cycles. It's not like you're trying to max out a PII or something; today's CPUs spend 98% of their time sitting around goofing off, so you might as well make them work. Add in the flexibility and I think the choice is easy.
hkwint

Jan 16, 2008
12:57 PM EDT
Do I understand correctly that if you have that $500 hardware RAID card, you should have a UPS as well because of the cache problems when the power is cut off for some reason? Then LVM2 is really cheaper than I thought. Unlike in the past, I wouldn't recommend EVMS; I recently found out it's not maintained anymore, though it makes a great 'LVM2/mdadm GUI'.

Knowing Dutch electricity, I'd not take the risk of power loss. When there's more than 5 cm of snow there are big problems with Dutch infrastructure already, because of the ice clamping to the high-voltage power lines.
ColonelPanik

Jan 16, 2008
1:08 PM EDT
You don't have one of those big windmills?
Sander_Marechal

Jan 16, 2008
1:17 PM EDT
Thanks guys. I'll go with software RAID then. Hardware RAID is only 1 Mb/s faster so it's really negligible. I came up with some other advantages of hardware RAID but none of them seem to apply to me.

1) Less bus transfer. As I said, my bus is bigger than all the disks combined.
2) Hotswap. Well, I cannot open my server while it's running. The CPU cooling depends on the case being closed. They would explode :-)
3) Less CPU overhead. I got a dual Xeon 3.2 GHz (four cores total, 6800+ bogomips each) doing nothing but playing a few oggs and serving webpages at an average rate of 1 page per 15 minutes.
4) You can't partition software RAID. But I'm using LVM anyway.

BTW, I came across one really cool trick that software RAID can do that hardware RAID cannot: RAID different brand drives together.

When one of the drives in a RAID dies you need to replace it. But with regular hardware (not the expensive server stuff) it can be hard to find the exact same model drive again. Just try buying a 20 Gb hard drive these days. With hardware RAID you need the exact same model because block counts vary slightly from model to model and manufacturer to manufacturer. This usually means that you need to resize and rebuild your entire array when you insert a different brand disk.

With software RAID you can prevent that, because you RAID together partitions, not actual drives. Just create one partition that's slightly smaller than the actual drive size, e.g. 498 Gb on a 500 Gb drive. Note down the exact size in blocks. Now, when you insert a different model replacement drive, simply create a partition with the exact same size you noted down. Any difference in block count is hidden in that last bit of unused space.
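
A sketch of the replacement procedure (device names are examples; /dev/sdc is the new drive):

sfdisk -d /dev/sda > parts.txt    # dump the surviving member's partition table, sizes in sectors
sfdisk /dev/sdc < parts.txt       # recreate the exact same partition on the replacement
mdadm /dev/md0 --add /dev/sdc1    # add it to the array and let it rebuild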
sh0x

Jan 22, 2008
7:03 PM EDT
I have a 3ware 9650SE and have been pulling my hair out trying to fix this same problem. I just enabled the write cache and it went from 5 Mb/s to 175 Mb/s with RAID 5. I tried creating several 1G files with dd; sometimes it was only 35 Mb/s but usually 170+. I changed the StorSav profile to performance (the other two options were protected and balanced). The array has been verifying since I made these changes and while running these tests. I tried another 1G file with dd and now it gets 220 Mb/s, but sometimes 45 Mb/s. This is a base install.

Here is a link to the 3ware battery backup unit, for another $150. Well, I'm not getting that, so hopefully the write cache won't do me wrong.

http://store.3ware.com/?category=2&subcategory=6&productid=B...

Sander_Marechal

Jan 22, 2008
9:49 PM EDT
You only need the BBU for power outages. If you've got redundant power supplies, don't worry too much. One oddity I did find was that when I configured the disks as "single" or JBOD, performance went *up* by about 5 Mb/s when I disabled the write cache. You would expect performance to go down.

I do recommend that you thoroughly benchmark your setup though. I've found that the filesystem and blocksize you use (and for RAID 5 the chunk size) make a huge difference in performance. It can be up to 200%. I just finished benchmarking my RAID 1 setup using XFS, ReiserFS, Ext3 and JFS filesystems with 1k and 4k block sizes. I tested using different real-world loads instead of using `time dd ...`. I suggest you do the same. Note that you cannot change the chunk size of the RAID 5 array when you use 3ware's hardware RAID. It's always 64k. You can only change it on Linux's mdadm software RAID.
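
With mdadm the chunk size is just a flag at creation time; something like this (devices and sizes are examples):

mdadm --create /dev/md0 --level=5 --raid-devices=3 --chunk=128 /dev/sdb1 /dev/sdc1 /dev/sdd1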

I'm writing an article about my benchmark. I hope to have it finished this week. Keep an eye out for it.
natmaka

Feb 19, 2008
12:34 AM EDT
Having obtained bad performance on random I/O (140 IOPS on a 6-drive RAID 5), I'm trying to gather as much intelligence as possible about this. I built a system (9550, then 9650, with 12 disks) to assess the situation on those 3ware cards and also Linux 'md'. Please check http://www.makarevitch.org/rant/3ware/ and let me know what you think/suggest.
Sander_Marechal

Feb 19, 2008
2:41 AM EDT
Hi natmaka. I've read your article quite a few times when investigating my own problems. Truth be told, much of it is waaay over my head :-) The most salient point I found is that RAID 1 without write cache effectively renders the setup useless (5 Mb/s write). So 3ware recommends turning the cache *off* if you don't have a BBU, but neglects to tell you that without the cache your setup is even slower than a rewritable CD or DVD :-/

The other "interesting" fact I found was that turning off the write cache on JBOD or single disk mode actually made the write speed go up instead of down (from about 80 MB/s to 85 MB/s per disk).

I have a bunch of benchmark statistics for my array, but it's more a filesystem benchmark than a RAID benchmark. I tested the array with my 9550 configured in JBOD mode, using two 500 Gb Western Digital SATA 3.0 drives in mdadm software RAID 1. I tested four filesystems using different blocksizes on a variety of filesystem operations. In the end I settled for XFS with a 4K blocksize.

I'll write up an article about my benchmark, put it up and e-mail you. Hopefully it's of some use to you :-)
dasacc22

Feb 27, 2008
4:53 PM EDT
Hi, I have the same problem. Though I can't seem to set the storsave profile from the card's utility program when creating the unit on the controller, nor from tw_cli: with `//$> /c0/u0 set storsave=perform` I get "Failed. Error (CLI:108) invalid unit set storsave policy command."

A manual I was reading through somewhere said storsave is disabled if the cache is off when the unit is created. I wasn't the one who created this unit, but the cache was on when I checked. And fiddling with the cache on/off through tw_cli makes no difference when I call for storsave=perform on the unit. I'm currently about to play around with this some more. Any suggestions?
Sander_Marechal

Feb 27, 2008
10:06 PM EDT
In my experience the storsave setting doesn't make much of a difference. That said, I tested with RAID1, not RAID5 (I have only two disks).

Quoting:a manual i was reading through somewhere said storsave is disabled if cache is off when unit is created


Actually, only the storsave setting of "perform" is unavailable when you have no write cache. The other two settings should still be available. Can you switch between those other two storsave profiles?
natmaka

Feb 29, 2008
3:24 AM EDT
> I've read your article quite a few times when investigating my own problems. Truth be told, much of it is waaay over my head :-)

Don't hesitate to ask (drop a mail to me!). English is not my mother tongue and I may have garbled something.

Beware when using a unit: theoretically 'b' stands for 'bit' and 'B' for 'byte'. 5 Mb/s is not equivalent to 5 MB/s! Moreover, 'M' also has different values (base 2 or base 10? Beware: disk capacities are often expressed in SI units, base 10, where 'M' is 10^6).

Cache: read and write cache should be two different options, and the driver should enable/disable them according to the application's signals (posix_fadvise...). When doing random access, for example, only the write-back cache should be enabled.

To avoid testing the fs, one may test the (raw) block device, /dev/sd...
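
For example (WARNING: the write test destroys all data on the device; the device name is an example):

dd if=/dev/zero of=/dev/sdb bs=1M count=1000 oflag=direct   # raw write test
dd if=/dev/sdb of=/dev/null bs=1M count=1000 iflag=direct   # raw read test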

Among classic fs types, XFS is, as far as I can say, the fastest on most tests.
Sander_Marechal

Apr 21, 2008
2:31 PM EDT
Bit of a late response, but I've finally finished my benchmark article. I hope it is of some use!

http://www.jejik.com/articles/2008/04/benchmarking_linux_fil...
gr1sly

Jul 22, 2008
11:41 PM EDT
I had the same problem with a 3ware 9650SE-16ML card. I have a 10.5 TB RAID 5 with Debian 4.0 on top, and performance when copying files to it was awful.

I ordered a BBU (battery backup unit) for the controller and enabled the write cache through the "3ware 3DM2" web RAID controller management utility, and voila, the performance issues were gone and the write speed went from 39 MB/s to 250 MB/s.

Hope it helps you.
