Useless Microoptimizations Homepage Forum
Don't get confused, this is just my homepage, not really a message board. I implemented it as a forum for reasons you can find here.
 

My RAID disk subsystem benchmarks (Linux + FreeBSD)

 
Forum Index -> General hardware notes
Useless Microoptimizations
Site Admin


Joined: 09 Feb 2005
Posts: 114
Location: Boston, MA, USA

Posted: Fri Nov 04, 2005 6:13 pm

These are the throughput and seek results of my new disk array.


The lines in each set are:
--------------------------
- continuous write performance (8 KB blocks)
- CPU time breakdown during the write
- continuous read performance (8 KB blocks)
- CPU time breakdown during the read
- bonnie results as follows:
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU



File size used:
---------------
- 16 GB for read and write tests
- 2047 MB for bonnie (machine has 512 MB RAM)
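For reference, a classic bonnie invocation matching these settings might look like this (the mount point and machine label are placeholders, not taken from the post):

```shell
# bonnie with a 2047 MB test file (four times the machine's 512 MB RAM,
# and just under classic bonnie's 2 GB per-file limit), run against the
# filesystem under test.  Directory and label are placeholders.
bonnie -s 2047 -d /mnt/raid -m opteron939
```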


Hardware used:
--------------
- DFI Lanparty Nf4 SLI-DR
- Opteron socket 939 at 2.9 GHz
- 2x 256 MB Geil One TCCD at 290 MHz 3-4-4-8
- bootdisk 80 GB Maxtor P-ATA
- RAID on 3x Seagate 7200.8 400 GB SATA
- Disks connected to NVidia SATA ports
- Machine has overclocked CPU and RAM but no other BIOS fiddling or
  hardware modifications


Software used:
--------------

Linux:
- Fedora Core 4, amd64
- 2.6.13-1.1532_FC4
- mdadm as delivered
- no changes to anything, this is just plain FC4 after official updates
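The Linux arrays benchmarked below were set up with mdadm. A sketch of the kind of invocation involved, here for the 3-disk RAID-5 with 32 KB chunks (device names, filesystem, and mount point are placeholders, not from the post):

```shell
# Create a 3-disk RAID-5 with a 32 KB chunk size, put a filesystem on
# it, and mount it.  Run as root; device names are placeholders.
mdadm --create /dev/md0 --level=5 --chunk=32 --raid-devices=3 \
      /dev/sda1 /dev/sdb1 /dev/sdc1
mke2fs -j /dev/md0
mount /dev/md0 /mnt/raid
```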

FreeBSD:
- FreeBSD 7-current
- ccd driver


Special setups:
---------------

To measure CPU time during the read and write tests, I do *not* use
time(1), which relies on getrusage(2), as it often fails to properly
account for system CPU time spent on the process's behalf.

Instead, I have a frontend to top(1), which gives me the CPU idle and
wait time during the test.

In bonnie, I leave the default, which is using getrusage(2).
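The top(1) frontend itself is not shown in the post; as a rough sketch of the idea, assuming procps top's batch mode and its FC4-era "Cpu(s):" line format (the field layout varies between top versions):

```shell
#!/bin/sh
# Average CPU idle + iowait over a benchmark run - a stand-in for the
# top(1) frontend described above (the original tool is not shown).
# Assumes lines like "Cpu(s): 10.0% us, ... 80.0% id,  5.0% wa, ...".

parse_idle_wait() {
    awk '
        $1 == "Cpu(s):" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^id/ || $i ~ /^wa/) {    # tokens "id," and "wa,"
                    v = $(i - 1)                   # value precedes the tag
                    sub(/%$/, "", v)
                    sum += v
                }
            n++
        }
        END { if (n) printf "avg idle+wait: %.1f%%\n", sum / n }'
}

# Sample every 5 seconds for one minute while the benchmark runs:
# top -b -d 5 -n 12 | parse_idle_wait
```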


First, the speed for the disks solo:
------------------------------------

Single disk, partition at the beginning of the disk:
  49.23 MB/s (51622036 B/s)  (15.9% CPU, 3164.6 KB/sec/CPU)
  64.43 MB/s (67554626 B/s)  (6.4% CPU, 10372.8 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 51795 67.8 54641 13.5 25679  3.8 43896 60.0 66375  4.8 140.0  0.3


Single disk, partition at the end of the disk:
  29.70 MB/s (31144294 B/s)  (9.9% CPU, 3078.4 KB/sec/CPU)
  35.34 MB/s (37054162 B/s)  (3.9% CPU, 9374.5 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 30890 40.1 31337  7.2 15101  2.4 29054 39.0 33310  2.2 135.1  0.3


Raid 5:
-------

Doing level 5, chunk 32, at beginning of disk
  69.15 MB/s (72511624 B/s)  (21.8% CPU, 3243.8 KB/sec/CPU)
  124.02 MB/s (130049187 B/s)  (47.9% CPU, 2652.5 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 68781 81.9 71256  6.6 29416  5.6 50524 59.4 129267 14.7 260.2  0.9

Doing level 5, chunk 512, at beginning of disk
  80.32 MB/s (84222388 B/s)  (25.8% CPU, 3192.9 KB/sec/CPU)
  99.08 MB/s (103889937 B/s)  (22.9% CPU, 4430.4 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 69777 82.5 79481  7.3 27188  4.4 45360 64.8 107492 10.9 247.9  0.7

Doing level 5, chunk 32, at end of disk
  48.09 MB/s (50422836 B/s)  (15.3% CPU, 3212.1 KB/sec/CPU)
  71.56 MB/s (75033341 B/s)  (28.4% CPU, 2576.5 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 49180 58.4 47742  4.2 22504  4.0 50845 61.3 71869  8.9 217.4  0.6

Doing level 5, chunk 512, at end of disk
  52.63 MB/s (55182057 B/s)  (16.7% CPU, 3223.0 KB/sec/CPU)
  73.74 MB/s (77318426 B/s)  (16.6% CPU, 4540.4 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 51914 61.0 52464  4.6 20407  3.1 39195 54.7 71494  7.1 263.4  0.7


Raid 1 (three disks):
---------------------------------

(chunk size only seems to matter for seek test)

Doing level 1, chunk 1024, at beginning of disk
  65.09 MB/s (68250723 B/s)  (11.1% CPU, 5983.0 KB/sec/CPU)
  65.53 MB/s (68714278 B/s)  (6.7% CPU, 10045.5 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 70108 83.6 67393  6.8 28096  4.0 55067 76.8 67643  4.8 464.8  2.3

Doing level 1, chunk 1024, at end of disk
  35.90 MB/s (37642776 B/s)  (6.2% CPU, 5977.3 KB/sec/CPU)
  35.87 MB/s (37609209 B/s)  (3.8% CPU, 9614.6 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 33272 39.5 36640  3.7 15865  2.2 32614 43.9 34370  2.5 436.8  1.8


Raid 0:
-------

(chunk 256 is clearly best)

Doing level 0, chunk 256, at beginning of disk
  182.59 MB/s (191457784 B/s)  (43.8% CPU, 4267.8 KB/sec/CPU)
  159.23 MB/s (166967535 B/s)  (43.1% CPU, 3779.7 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 79491 93.4 204622 19.3 68458  9.7 60790 86.6 167182 10.5 289.3  1.0

Doing level 0, chunk 256, at end of disk
  102.10 MB/s (107061291 B/s)  (14.7% CPU, 7107.5 KB/sec/CPU)
  103.02 MB/s (108021414 B/s)  (12.6% CPU, 8392.2 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 81452 95.8 114525 11.7 44907  6.1 58842 81.0 104733  6.7 282.3  0.7



Last edited by Useless Microoptimizations on Wed Apr 12, 2006 10:18 am; edited 2 times in total
Useless Microoptimizations
Posted: Mon Nov 07, 2005 1:32 pm

Since the mainboard in question does not hang its Gigabit Ethernet interface off the PCI bus, I was curious what the throughput directly from network to disk (and vice versa) would be.

Using the fastest raid-0 setup from above I get:
- from network to disk: 17179863888 B 16.0 GB 186.21 s 92260914 B/s 87.99 MB/s
- from disk to network: 17179863888 B 16.0 GB 182.21 s 94287551 B/s 89.92 MB/s

Measured with cstream, with one end attached to a TCP socket and the other to a single file on the filesystem on the raid-0. 8 KB block size on both ends, no jumbo frames, just a plain setup without tuning.
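For a rough idea of the setup: the post used cstream, whose exact options are not shown, so this nc/dd sketch only approximates the same measurement (host, port, and path are placeholders, and netcat's listen syntax varies between versions):

```shell
# Network-to-disk: write everything arriving on TCP port 5001 to a file
# on the raid filesystem, 8 KB blocks.  (Some netcats want "nc -l -p 5001".)
nc -l 5001 | dd of=/raid/testfile bs=8k

# Disk-to-network, run on the sending side: push 16 GiB in 8 KB blocks.
# dd if=/raid/testfile bs=8k count=2097152 | nc receiver-host 5001
```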
Useless Microoptimizations
Posted: Mon Nov 07, 2005 2:35 pm

FreeBSD results:



- FreeBSD 7-current as of November 4, 2005
- ccd raid driver
- hardware as above
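The ccd arrays were configured with ccdconfig(8); the "doing ..." lines below echo its arguments (interleave, flags, component devices). A sketch of the mirror case, with filesystem and mount point as placeholder additions:

```shell
# Two-disk ccd mirror with interleave 128, matching
# "doing 128 CCDF_MIRROR /dev/ad4s1 /dev/ad6s1" below.  Run as root.
ccdconfig ccd0 128 CCDF_MIRROR /dev/ad4s1 /dev/ad6s1
newfs /dev/ccd0
mount /dev/ccd0 /mnt/raid
```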

Raid 1
------

ULE Scheduler:

doing 128 CCDF_MIRROR /dev/ad4s1 /dev/ad6s1
  59.31 MB/s (62192681 B/s)  (18.3% CPU, 3318.9 KB/sec/CPU)
  64.81 MB/s (67960489 B/s)  (11.3% CPU, 5878.4 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 54355 66.1 59448  7.8 19759  2.9 43416 47.9 64657  5.1 136.8  0.3

4BSD Scheduler:

doing 128 CCDF_MIRROR /dev/ad4s1 /dev/ad6s1
  59.46 MB/s (62349201 B/s)  (17.8% CPU, 3411.1 KB/sec/CPU)
  64.74 MB/s (67879984 B/s)  (11.0% CPU, 6031.8 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 54206 65.8 59504  7.5 18891  2.6 43609 48.1 64464  4.9 140.8  0.3


Raid 0
------

ULE Scheduler:

doing 96 none /dev/ad4s1 /dev/ad6s1 /dev/ad8s1
  159.03 MB/s (166758911 B/s)  (40.6% CPU, 4014.1 KB/sec/CPU)
  145.94 MB/s (153032694 B/s)  (28.0% CPU, 5341.2 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 74033 88.9 183872 23.5 32875  4.5 67646 74.3 150336 11.2 154.9  0.3

doing 96 CCDF_UNIFORM /dev/ad4s1 /dev/ad6s1 /dev/ad8s1
  159.34 MB/s (167075910 B/s)  (39.5% CPU, 4126.5 KB/sec/CPU)
  145.44 MB/s (152503572 B/s)  (27.4% CPU, 5427.5 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 73694 88.7 182827 23.0 30952  4.3 68579 73.8 149987 11.2 154.4  0.3

4BSD scheduler:
doing 96 none /dev/ad4s1 /dev/ad6s1 /dev/ad8s1
  158.72 MB/s (166434732 B/s)  (39.2% CPU, 4146.3 KB/sec/CPU)
  145.84 MB/s (152920699 B/s)  (27.7% CPU, 5387.3 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 73212 88.2 184105 23.3 30483  4.3 68494 73.7 149873 11.2 157.4  0.3


Simple partition on disk
------------------------

ULE Scheduler:

Doing beginning of disk
  59.94 MB/s (62849804 B/s)  (13.1% CPU, 4692.4 KB/sec/CPU)
  64.97 MB/s (68128976 B/s)  (9.4% CPU, 7062.9 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 56049 67.9 56535  8.0 18543  2.6 41537 45.8 61672  4.7 131.9  0.3

Doing end of disk
  35.19 MB/s (36895173 B/s)  (7.5% CPU, 4791.3 KB/sec/CPU)
  36.48 MB/s (38253871 B/s)  (5.3% CPU, 7088.7 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 33960 41.0 33023  4.6 12938  1.8 35378 39.1 35687  2.7 120.4  0.3

4BSD scheduler:

Doing beginning of disk
  59.76 MB/s (62665639 B/s)  (12.4% CPU, 4947.2 KB/sec/CPU)
  65.12 MB/s (68284659 B/s)  (9.3% CPU, 7155.0 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 55850 67.6 56298  7.7 18372  2.5 42064 45.3 61567  4.6 134.4  0.3

Doing end of disk
  35.14 MB/s (36850764 B/s)  (7.8% CPU, 4637.5 KB/sec/CPU)
  36.52 MB/s (38292235 B/s)  (5.4% CPU, 6899.4 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 34091 41.1 33100  4.6 13195  1.9 35433 38.3 35697  2.7 125.0  0.3



Last edited by Useless Microoptimizations on Tue Nov 08, 2005 6:44 pm; edited 4 times in total
Useless Microoptimizations
Posted: Mon Nov 07, 2005 3:24 pm

Findings when interpreting FreeBSD and Linux results:
-----------------------------------------------------

For FreeBSD it doesn't matter much whether you use the ULE or the 4BSD
scheduler. Only the seek tests improve a little with 4BSD.

On plain disks, FreeBSD writes faster; reads are equal. FreeBSD also
uses a lot less CPU time writing, but more CPU time reading from a
plain disk.

Linux is much faster in the seek tests on raid-0.

Maximum throughput on raid-0 is about 10% better in Linux, CPU load
per transfer is about the same.

FreeBSD blows more CPU time in raid-1.

FreeBSD (excuse me) sucks at seeking on raid-1; it basically doesn't
seem to use the disks independently at all - the average random seek
completion time is about the same as for a plain disk. Linux, on the
other hand, does an exceptionally good job seeking on raid-1, and it
reads from it with very low CPU time.

Linux manages to do raid-5 writes faster than raid-1, no doubt helped
by the massive CPU power and memory bandwidth that my test machine has
(Socket 939 Opteron at 2.9 GHz and dual-channel RAM at 290 MHz / DDR580).


Non-performance factors:
------------------------

In case you are not familiar with the two software RAID systems, I
should mention that FreeBSD's ccd implementation of RAID-1 is nowhere
near as powerful as Linux's. FreeBSD's ccd RAID-1 protects you against
disk loss all right, but it has no way to replace a disk once one
fails. When a disk fails you need to back up the filesystem, construct
a new array from scratch with a new disk, and restore the
backup. Linux RAID-1 can hook in disks at runtime and can even have
hot spares.


On the Linux side I should mention that it suffers from sub-optimal
userland integration. Luckily most distributions now use the mdadm
userland setup utility instead of the raidtools, which mixed up
quite a few things. By now mdadm is even reasonably well documented;
only its config file format still leaves something to be desired.

The worst feature of Linux software RAID is still automatism gone
wrong, in particular distributions that start up the raid system
read/write when you boot into single-user mode. To a FreeBSD
person that is mind-boggling mischief: single-user ==
read-only. The problem here is that Linux software RAID (as
opposed to FreeBSD's) uses raid superblocks that are written to when
the array is started, so if you fiddle with device order you might get
stabbed in the back by that.

I once almost lost my RAID-5 array when I rearranged controllers and
the device ID of one disk changed. In FreeBSD ccd, if your device IDs
change, you just start the array with the new device IDs and you are
done. Linux, however, detected that an old disk of the array had gone
away; in my case it downgraded the RAID-5 to a 2-disk setup and then
re-synced the third disk - which had the right data in the first
place. Had I lost one of the two remaining disks during the two hours
the re-sync took, my array would have been gone. This might have been
caused by raidtools and might not apply to mdadm, but again, the
real problem is that the OS fiddles with my disks in read/write
mode while in single-user mode.


Conclusion:
-----------

Although this test shows that FreeBSD still beats Linux in disk
performance when it comes to single disks, the old ccd RAID driver
that I test for FreeBSD here is not much competition for Linux
software RAID.

FreeBSD ccd is OK in RAID-0 with just 10% streaming performance lost
compared to Linux, but the seek times are not impressive. In RAID-1
the seek performance is outright disappointing.

Performance-wise the Linux software RAID is very impressive. That
particularly applies to the write performance in RAID-5 and to the
seek performance in RAID-1.

However, userland utilities and general distribution/startup issues
can threaten the integrity of Linux arrays. Based on my own experience
after a few years of Linux raid, I have to say that the only safe way
to operate an array for a long time is to take all raid startup
statements out of Linux's startup scripts, not use raidtools, and use
mdadm without a config file. The most upgrade-safe way to handle this
is to write a raid start script of your own that does nothing but
start mdadm from the command line with no config file. Personally I
wouldn't let single-user mode mount anything read/write, but with raid
arrays that's even more imperative.

Consequently, I always use a plain bootup partition on Linux and mount
the RAID arrays later.
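A minimal start script in that spirit might look like this (device names and mount point are placeholders; nothing here comes from a config file or distribution script):

```shell
#!/bin/sh
# Assemble the array by explicit device list: no mdadm.conf, no
# distribution automation.  Mount read-only first and remount rw by
# hand once the array looks sane.  Device names are placeholders.
mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 || exit 1
mount -o ro /dev/md0 /raid
```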



Hardware RAID controller?
-------------------------

Should you get a hardware RAID controller instead?

First of all, forget about all these cheap and/or onboard SATA "RAID"
controllers: they just do software RAID in the driver, and you will
lose your array if a disk fails while the OS is not up (read the
horror stories on the anandtech forums and elsewhere).

If you just want speed out of RAID-0, there is no question that both
FreeBSD ccd and Linux provide this in software, Linux a little better.

FreeBSD's ccd RAID-1 cannot be used on a large scale due to the
backup-on-failure procedure. And it doesn't give you the performance
advantage you'd expect. It is probably OK if all you want is
protection for a small part of your overall disk space.

Linux also does RAID-1 and RAID-5 very well, performance-wise.
However, the userland and startup issues make me recommend that you
only do that once you have really learned how it works. I also
recommend that you take control of your RAID arrays out of your
distribution's hands. You should simulate a disk failure and go
through the motions of dealing with it after you set up the array but
before you move important data onto it.
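Such a drill can be as simple as failing and re-adding one member on a scratch array (device names are placeholders):

```shell
# Simulate a disk failure on a scratch array, remove the "failed"
# member, re-add it, and watch the re-sync complete before trusting
# the array with real data.  Run as root.
mdadm /dev/md0 --fail /dev/sdc1
mdadm /dev/md0 --remove /dev/sdc1
mdadm /dev/md0 --add /dev/sdc1
cat /proc/mdstat    # shows recovery progress
```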

Having said that, once you have done all that, you'll have a great
array in Linux. And hardware RAID doesn't fare that well either if
you don't know what you are doing.
Useless Microoptimizations
Posted: Wed Nov 09, 2005 11:58 am

Here are results for FreeBSD's "gmirror" software RAID-1, using the two algorithms that came out best:


doing load
  59.69 MB/s (62594681 B/s)  (19.5% CPU, 3139.6 KB/sec/CPU)
  63.78 MB/s (66880884 B/s)  (9.8% CPU, 6644.3 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 57077 69.3 58972  7.5 18199  2.5 42691 47.1 63682  4.7 134.0  0.3

doing prefer
  59.60 MB/s (62497088 B/s)  (19.1% CPU, 3190.4 KB/sec/CPU)
  65.01 MB/s (68164137 B/s)  (9.7% CPU, 6862.5 KB/sec/CPU)
          -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
    2047 56003 68.1 58902  7.4 19403  2.7 42545 47.0 64650  4.8 133.1  0.3


So that's in the same ballpark as ccd, and hence slower than Linux.

However, gmirror on FreeBSD gives you a full-featured recovery-capable RAID-1 solution.
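A gmirror setup along the lines of what was benchmarked might look like this ("load" and "prefer" are the balance algorithms tested above; device names are placeholders):

```shell
# Create a two-way mirror with the "load" balance algorithm, then
# later switch the existing mirror to "prefer".  Run as root.
gmirror label -b load gm0 ad4s1 ad6s1
newfs /dev/mirror/gm0
gmirror configure -b prefer gm0
```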
All times are GMT - 5 Hours
Powered by phpBB © 2001, 2005 phpBB Group