Posted: Fri Nov 04, 2005 6:13 pm
These are the throughput and seek results of my new disk array.
The lines in each set are:
--------------------------
- continuous write performance (8 KB blocks)
- CPU time breakup during that
- continuous read performance (8 KB blocks)
- CPU time breakup during that
- bonnie results as follows:
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
File size used:
---------------
- 16 GB for read and write tests
- 2047 MB for bonnie (machine has 512 MB RAM)
Hardware used:
--------------
- DFI Lanparty Nf4 SLI-DR
- Opteron socket 939 at 2.9 GHz
- 2x 256 MB Geil One TCCD at 290 MHz 3-4-4-8
- boot disk: 80 GB Maxtor P-ATA
- RAID on 3x Seagate 7200.8 400 GB SATA
- Disks connected to NVidia SATA ports
- Machine has overclocked CPU and RAM but no other BIOS fiddling or
hardware modifications
Software used:
--------------
Linux:
- Fedora Core 4, amd64
- 2.6.13-1.1532_FC4
- mdadm as delivered
- no changes to anything, this is just plain FC4 after official updates
FreeBSD:
- FreeBSD 7-current
- ccd driver
Special setups:
---------------
To obtain the CPU time for the read and write tests, I do *not* use
time(1), which relies on getrusage(2), as it often fails to properly
account for system CPU time that is not charged to the test process.
Instead, I have a frontend to top(1), which gives me the CPU idle and
wait time during the test.
For bonnie I leave the default, which uses getrusage(2).
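For anyone who wants to reproduce the raw numbers, here is a rough
sketch of such a run. It is not my actual tool chain (the top(1)
frontend is my own code); it approximates it with dd for the 16 GB
streaming tests and vmstat for sampling the idle/wait time:

  # Approximation of the streaming write test: 16 GB in 8 KB blocks.
  # /raid/testfile is a placeholder path, not my actual mount point.
  dd if=/dev/zero of=/raid/testfile bs=8k count=2097152 &
  TESTPID=$!

  # Sample CPU idle/wait once per second while the test runs (a
  # stand-in for my top(1) frontend; on Linux these are vmstat's "id"
  # and "wa" columns, FreeBSD's vmstat output is laid out differently).
  vmstat 1 > cpu-samples.txt &
  VMSTATPID=$!

  wait $TESTPID
  kill $VMSTATPID

  # The streaming read test is the same file read back in 8 KB blocks:
  dd if=/raid/testfile of=/dev/null bs=8k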
First, the speed for the disks solo:
------------------------------------
Single disk, partition at the beginning of the disk:
49.23 MB/s (51622036 B/s) (15.9% CPU, 3164.6 KB/sec/CPU)
64.43 MB/s (67554626 B/s) (6.4% CPU, 10372.8 KB/sec/CPU)
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
2047 51795 67.8 54641 13.5 25679 3.8 43896 60.0 66375 4.8 140.0 0.3
Single disk, partition at the end of the disk:
29.70 MB/s (31144294 B/s) (9.9% CPU, 3078.4 KB/sec/CPU)
35.34 MB/s (37054162 B/s) (3.9% CPU, 9374.5 KB/sec/CPU)
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
2047 30890 40.1 31337 7.2 15101 2.4 29054 39.0 33310 2.2 135.1 0.3
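In case you want to reproduce the array setups: the raid-0 arrays
used below are put together roughly as follows. Device names,
interleave and filesystem choices are examples, not my exact commands:

  # Linux: a raid-0 across the three SATA disks, mdadm's default
  # chunk size.
  mdadm --create /dev/md0 --level=0 --raid-devices=3 \
      /dev/sda1 /dev/sdb1 /dev/sdc1
  mke2fs -j /dev/md0

  # FreeBSD: a ccd stripe across the same disks (interleave is in
  # sectors, "none" means plain striping; see ccdconfig(8)).
  ccdconfig ccd0 128 none /dev/ad4s1e /dev/ad6s1e /dev/ad8s1e
  newfs /dev/ccd0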
Posted: Mon Nov 07, 2005 1:32 pm
Since the mainboard in question does not have its Gigabit Ethernet interface on the PCI bus, I was curious what the throughput directly from network to disk and vice versa would be.
Using the fastest raid-0 setup from above I get:
- from network to disk: 17179863888 B 16.0 GB 186.21 s 92260914 B/s 87.99 MB/s
- from disk to network: 17179863888 B 16.0 GB 182.21 s 94287551 B/s 89.92 MB/s
Measured with cstream, with one end opened on a TCP socket and the other end on a file on the filesystem on the raid-0. 8 KB block size on both ends, no jumbo frames, just a plain setup without tuning.
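Roughly, the setup looks like the sketch below. cstream can open the
TCP socket itself (see its manpage for the syntax); here nc stands in
for the socket end, so take this as an approximation rather than my
exact command lines:

  # Receiving machine: network -> disk in 8 KB blocks. Port and path
  # are made-up examples (traditional netcat wants "nc -l -p 4711").
  nc -l 4711 | cstream -b 8192 -o /raid/testfile -v 1

  # Sending machine: disk -> network in 8 KB blocks.
  cstream -b 8192 -i /raid/testfile -v 1 | nc otherhost 4711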
Posted: Mon Nov 07, 2005 3:24 pm
Findings when interpreting FreeBSD and Linux results:
-----------------------------------------------------
For FreeBSD it doesn't matter much whether you use the ULE or the 4BSD
scheduler. Only the seek tests improve a little with 4BSD.
On the plain disks, FreeBSD writes faster, read is equal. FreeBSD
also uses a lot less CPU time writing, but more CPU time reading from
plain disk.
Linux is much faster in the seek tests on raid-0.
Maximum throughput on raid-0 is about 10% better in Linux, CPU load
per transfer is about the same.
FreeBSD blows more CPU time in raid-1.
FreeBSD (excuse me) sucks at seeking on raid-1; it basically doesn't
seem to use the disks independently at all, and the average random
seek completion time is about the same as for a plain disk. Linux, on
the other hand, does an exceptionally good job seeking on raid-1, and
it reads from it with very low CPU time.
Linux manages to do raid-5 writes faster than raid-1, no doubt helped
by the massive CPU power and memory bandwidth that my test machine has
(Socket 939 Opteron at 2.9 GHz and dual-channel RAM at 290 MHz / DDR580).
Non-performance factors:
------------------------
In case you are not familiar with the two software RAID systems, I
should mention that FreeBSD's ccd implementation of RAID-1 is nowhere
near as powerful as Linux's. FreeBSD's ccd RAID-1 protects you against
disk loss all right, but it has no way to replace a disk once one
fails. When a disk fails you need to back up the filesystem, construct
a new array with a new disk from scratch and restore the backup.
Linux RAID-1 can hook in disks at runtime and it can even have
hot-spares.
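For illustration, "hooking in disks at runtime" looks roughly like
this with mdadm (device names are made up):

  # Create a raid-1 with a hot-spare right away:
  mdadm --create /dev/md0 --level=1 --raid-devices=2 --spare-devices=1 \
      /dev/sda1 /dev/sdb1 /dev/sdc1

  # Or hook a replacement disk into a running (possibly degraded) array:
  mdadm --manage /dev/md0 --add /dev/sdd1

  # Watch the re-sync progress:
  cat /proc/mdstat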
In the Linux case I should mention that it suffers from sub-optimal
userland integration. Luckily most distributions now use the mdadm
userland setup utility instead of the raidtools, which mixed up quite
a few things. By now mdadm is even documented OK; only its config
file format still leaves something to be desired.
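For reference, a minimal /etc/mdadm.conf is only a couple of lines;
the UUID below is a made-up example:

  # /etc/mdadm.conf - minimal example
  DEVICE /dev/sd*[0-9]
  ARRAY /dev/md0 UUID=3aaa0122:29827cfa:5331ad66:ca767371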
The worst feature of Linux software RAID is still automation going
wrong, in particular distributions that start up the raid system
read/write when you boot into single-user mode. For a FreeBSD person
the latter is mind-boggling mischief. Single-user == read-only. The
problem here is that Linux software RAID (as opposed to FreeBSD's)
uses raid superblocks that are written to when the array is started,
so if you fiddle with the device order you might get stabbed in the
back by that.
I once almost lost my RAID-5 array when I rearranged controllers and
the device ID of one disk changed. In FreeBSD ccd, if your device IDs
change, you just start the array with the new device IDs and you are
done. In Linux, it detected that a disk of the array had gone away;
in my case it downgraded the RAID-5 to a 2-disk setup, and the third
disk - which had the right data in the first place - then had to be
manually re-synced. If I had lost one of the two disks in the two
hours that the
re-sync took, my array would have been gone. This might have been
caused by raidtools and might not apply with mdadm, but again, the
real problem here is that the OS fiddles with my disks in read/write
mode when in single-user mode.
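If device names have changed (new controller, reordered cables), it
is worth checking what the superblocks themselves say before letting
anything assemble, roughly like this:

  # Print the raid superblock of each candidate disk; the output shows
  # the array UUID and which slot the disk thinks it occupies.
  mdadm --examine /dev/sda1
  mdadm --examine /dev/sdb1
  mdadm --examine /dev/sdc1

  # Then assemble explicitly from those devices instead of trusting
  # any auto-detection:
  mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1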
Conclusion:
-----------
Although this test shows that FreeBSD still beats Linux in disk
performance when it comes to single disks, the old ccd RAID driver
that I test for FreeBSD here is not much competition for Linux
software RAID.
FreeBSD ccd is OK in RAID-0 with just 10% streaming performance lost
compared to Linux, but the seek times are not impressive. In RAID-1
the seek performance is outright disappointing.
Performance-wise the Linux software RAID is very impressive. That
particularly applies to the write performance in RAID-5 and to the
seek performance in RAID-1.
However, userland utilities and general distribution/startup issues
can threaten the integrity of Linux arrays. Based on my own experience
after a few years of Linux raid, I have to say that the only safe way
to operate an array for a long time is to take all raid startup
statements out of Linux' startup scripts, not use raidtools, and use
mdadm without a config file. The most upgrade-safe way to handle this
is to write a raid start script of your own that does nothing but
command-line startup of mdadm with no config files (a sketch follows
below). Personally I wouldn't let single-user mode mount anything
read/write, but with raid arrays that is even more imperative.
Consequently, I always use a plain bootup partition on Linux and mount
the RAID arrays later.
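Such a start script can be as small as the sketch below; device names
and the mount point are placeholders for your own setup:

  #!/bin/sh
  # Hand-rolled raid startup: explicit mdadm assembly on the command
  # line, no config file, no distribution magic.
  mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 || exit 1
  mount /dev/md0 /raid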
Should you get a hardware RAID controller instead?
First of all, forget about all these cheap and/or onboard SATA "RAID"
controllers: they just do software RAID in the driver, and you will
lose your array if a disk fails while the OS is not up (read the
horror stories on the anandtech forums and elsewhere).
If you just want speed out of RAID-0, there is no question that both
FreeBSD ccd and Linux provide this in software, Linux a little better.
FreeBSD's RAID-1 cannot be used on a large scale due to the
backup-on-fail procedure. And it doesn't give you the performance
advantage you'd expect. It is probably OK if all you want is
protection for a small sub-part of your overall disk space.
Linux also does RAID-1 and RAID-5 very well, performance-wise.
However, userland and startup issues make me recommend that you only
do this once you have really learned how it works. I also recommend
that you take control of your RAID arrays out of your distribution's
hands. You should simulate a disk failure and go through the act of
dealing with it after you set up the array but before you move
important data to it (see the sketch below).
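A dry run of a disk failure with mdadm might look like this (device
names are examples; with ccd you would have to go through the
backup/rebuild/restore cycle described above):

  # Mark one disk as failed and take it out of the array:
  mdadm --manage /dev/md0 --fail /dev/sdb1
  mdadm --manage /dev/md0 --remove /dev/sdb1

  # Check that the array runs degraded and the data is still readable:
  cat /proc/mdstat

  # "Replace" the disk and wait for the re-sync to finish before
  # trusting the array again:
  mdadm --manage /dev/md0 --add /dev/sdb1
  watch cat /proc/mdstat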
Having said that, once you have done that, you'll have a great array
in Linux. And it's not as if hardware RAID does all that well either
if you don't know what you are doing.