Don't get confused, this is just my homepage, not really a message board. I implemented it as a forum for reasons you can find here.
 

Why I use ECC memory

 
Martin Cracauer
Site Admin


Joined: 09 Feb 2005
Posts: 113
Location: Boston, MA, USA

Posted: Fri Sep 16, 2005 2:12 pm

[When you link to this page please use this URL: http://cracauer-forum.cons.org/forum/ecc.html ]

Here is my standard writeup on why ECC RAM can be important and how the lack of it can shred the contents of your hard drive.

First - what is ECC RAM?
  • ECC RAM is error-correcting RAM. If one bit in a word is accidentally changed during storage, the error will be corrected. If you are unlucky and two bits get changed, then you will get an error notification to your OS instead of silently continuing to work with corrupted data.
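
To make "correct one bit, detect two" concrete, here is a minimal Python sketch of a SECDED (single error correction, double error detection) code. It is a toy: it protects only 4 data bits with an extended Hamming code, whereas real ECC modules typically protect 64-bit words with 8 check bits, and the exact code depends on the memory controller. The function names `encode` and `decode` are made up for this illustration.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit SECDED codeword (a list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                       # index 0: overall parity, 1..7: Hamming(7,4)
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]    # parity over positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]    # parity over positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]    # parity over positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2              # makes the parity of all 8 bits even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos              # XOR of the positions of all set bits
    odd_parity = sum(code) % 2 == 1

    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        # One bit flipped; the syndrome is its position (0 = the parity bit itself).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome but even overall parity: two bits flipped.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)

one_flip = list(word)
one_flip[5] ^= 1
print(decode(one_flip))    # ('corrected', 11)

two_flips = list(one_flip)
two_flips[2] ^= 1
print(decode(two_flips))   # ('uncorrectable', None)
```

The "uncorrectable" result corresponds to the case where the hardware reports the error to the OS instead of handing back corrupted data.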


ECC memory is quite a bit more expensive than non-ECC memory, and a little slower. On the other hand, the error rate in any RAM is very low these days. So is it worth it?

It is true that the average error rate of RAM has dropped dramatically. Even though you may have 2 Gigabytes of RAM now, you have fewer errors per year than you had with 64 Megabytes ten years ago.

But the rate is not zero, and never will be. And a bigger problem than random bit changes from radiation or whatever is RAM physically going bad. You can easily have one of the chips malfunction, or you can get some dirt in the sockets or cause other electrical problems that don't kill your RAM but make it return nonsense.

Now, why is that so bad? Obviously, values and strings in running programs change at random, so you can get wrong results from calculations, or you can get segmentation faults as a result of pointers that have been "bent".

But that's not the real issue. The real issue is that it can shred the contents of your hard drive.


Why can memory corruption shred the contents of your hard drive?

Short version: because the filesystem buffer cache in the OS (the one in RAM, not the one on the hard drive) gets corrupted, and the OS then writes blocks to the wrong places on the disk because the information about where a block belongs has changed. For example, a data block for file "foo" gets written to the location where the allocation blocks for "bar" live. That assigns a bunch of random blocks to file "bar", loses its original blocks, and leaves file "foo" without its new content.

Long version:

Because the filesystem buffer cache can, and will, be affected. The operating system uses a certain amount of RAM as a cache for the filesystem, and that can be quite a lot. A home PC with 1024 MB of RAM running Windows and Office has plenty of RAM to spare, and operating systems (Windows and Linux alike) use all the memory not needed by applications to cache disk contents. Besides "harmless" caches of read-only pages, such as the application's program code itself, there is also a write cache of filesystem contents that have not yet been written out to the disk.

The cache works by keeping a copy of a disk block together with a value that records where that block belongs once it gets written to the disk.

Now, if you have bad memory, you might have one of these cached blocks modified and the wrong data is eventually written to the disk.

But that is only the harmless case.

The harmful case is that the index that says where that block belongs is damaged. If that happens, the right block gets written to the wrong place on the disk.
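
The two cases can be played through in a small Python simulation. This is only a sketch under loud assumptions: the "disk" is a dictionary, the cache holds a single dirty block, and names such as `CachedBlock`, `flush` and `run` are hypothetical. No real operating system's buffer cache looks like this; the point is only what a single flipped bit does in each place.

```python
from dataclasses import dataclass


@dataclass
class CachedBlock:
    block_no: int  # where this block belongs on the disk
    data: bytes    # contents waiting to be written out


def make_disk():
    # A toy "disk": block number -> block contents.
    return {
        0: b"superblock",
        1: b"directory: list.txt, thesis.txt",
        2: b"dissertation, chapter 1",
        6: b"old shopping list",
    }


def flush(disk, cache):
    # Write every dirty block to the location recorded next to it.
    for entry in cache:
        disk[entry.block_no] = entry.data


def run(corrupt):
    disk = make_disk()
    cache = [CachedBlock(block_no=6, data=b"new shopping list")]
    corrupt(cache[0])  # bad RAM strikes before the flush
    flush(disk, cache)
    return disk


def flip_data_bit(entry):
    damaged = bytearray(entry.data)
    damaged[0] ^= 0x01           # 'n' becomes 'o'
    entry.data = bytes(damaged)


def flip_index_bit(entry):
    entry.block_no ^= 0b100      # block 6 becomes block 2


harmless = run(flip_data_bit)
harmful = run(flip_index_bit)

print(harmless[6])  # b'oew shopping list'
print(harmless[2])  # b'dissertation, chapter 1'
print(harmful[6])   # b'old shopping list'
print(harmful[2])   # b'new shopping list'
```

In the first run only the file being written is slightly wrong. In the second run the shopping list never reaches its own block and lands on top of the dissertation, a block that was never touched by the user.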

So you can instantly ruin (corrupt with random data) any file on the filesystem. Even files you haven't looked at since the PC was booted. Any file, any partition, even unmounted ones. So you modify your shopping list and think it is not a big deal if it gets corrupted. But the disk blocks containing the shopping list data accidentally get written over your dissertation. Or over your password file. Now parts of your shopping list are where your password once was, and you will have a difficult time logging in again.

And it gets worse. If the new faulty location overwrites a directory entry, then you can lose any number of files in one snap. Even better, if it hits an allocation table, you can end up with totally wrong blocks in some files, none of which you ever touched. Or you can kill the superblock or other critical information, which leaves you unable to mount the filesystem at all. Instant total loss of all files.

That is why I have ECC RAM.

For me as a software developer, ECC RAM is also a question of necessary paranoia. If I get a segmentation fault in one of my own programs, or in the OS kernel I just messed with, I need to be absolutely sure that the segmentation fault is caused by my application or by my kernel changes. I need to be absolutely sure it is broken program code and not the hardware. It would truly suck to spend weeks debugging my software to track down an observed memory corruption when it actually was a hardware problem and there is no software bug.
Martin Cracauer
Posted: Tue Jan 03, 2006 5:52 pm

By the way, ECC memory will become much more important with DDR2 and DDR3 memory than with the DDR memory I use now.

The lower voltage on DDR2 means that memory cells will be more vulnerable to background radiation and other sources of bit errors.