Linux: Error Detection and Correction

Posted by tadelste on Oct 12, 2005 6:30 AM EDT
KernelTrap

Alan Cox [interview] submitted a pair of patches to add error detection and correction (EDAC) logic to the 2.6 kernel. He noted, "I don't think its yet merge ready but getting there so I'd appreciate other folks comments and views on what else needs fixing before generating a submission for Andrew." Alan has submitted a subset of the bluesmoke kernel module, which "is mainly concerned with reporting ECC, PCI, machine check, cache, hypertransport, thermal throttling and related events." This version of the patch is only for the 2.6 kernel, and was renamed from bluesmoke to EDAC.

Memory error checking used to be accomplished with a parity bit attached to each byte of memory. The parity bit was calculated when each byte of memory was written, and then verified when that byte was read. If the stored parity bit didn't match the calculated parity bit on a read, that byte of memory was known to have changed. Parity checking was a reasonably effective method for detecting a one-bit change in a byte of memory, although it could not tell which bit had changed. ECC expanded upon this idea with an error-correcting code that calculates check bits for multiple bytes of memory. These check bits can be used to detect when one or more bits have changed. On single-bit errors, they can also restore the memory to its intended state, actually correcting the error.
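
The difference between detecting and correcting an error can be sketched in a few lines of Python. This is only an illustration and is not taken from the EDAC patches: the function names are hypothetical, and the tiny Hamming(7,4) code stands in for the wider codes that real ECC memory typically applies to whole memory words. The parity functions show that a flipped bit is noticed but not located; the Hamming functions show how several overlapping parity bits point at the position of the flipped bit so it can be put back.

```python
def parity_bit(byte):
    """Even-parity bit for an 8-bit value: 1 if the number of set bits is odd."""
    return bin(byte & 0xFF).count("1") % 2


def parity_ok(byte, stored_parity):
    """True if the byte still matches the parity bit stored when it was written."""
    return parity_bit(byte) == stored_parity


def hamming74_encode(nibble):
    """Encode 4 data bits as a 7-bit Hamming codeword (bit i holds position i+1)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    # Codeword positions: 1=p1, 2=p2, 3=d0, 4=p4, 5=d1, 6=d2, 7=d3
    p1 = d[0] ^ d[1] ^ d[3]  # parity over positions 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # parity over positions 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # parity over positions 5, 6, 7
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(b << i for i, b in enumerate(bits))


def hamming74_decode(codeword):
    """Return (nibble, error_position); error_position 0 means no error was seen."""
    bits = [(codeword >> i) & 1 for i in range(7)]
    # XOR of the positions of all set bits is 0 for a valid codeword,
    # and equals the position of the flipped bit after a single-bit error.
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos - 1]:
            syndrome ^= pos
    if syndrome:
        bits[syndrome - 1] ^= 1  # correct the single-bit error
    nibble = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return nibble, syndrome


if __name__ == "__main__":
    stored = parity_bit(0b10110010)
    print(parity_ok(0b10110010, stored))  # unchanged byte
    print(parity_ok(0b10110011, stored))  # one bit flipped: detected, not located

    word = hamming74_encode(0b1011)
    damaged = word ^ (1 << 4)             # flip the bit at position 5
    print(hamming74_decode(damaged))      # data recovered, error position reported
```

Like parity, this small code is only reliable for single-bit errors; a two-bit flip would be miscorrected. Codes used in practice usually add an extra overall parity bit so that double-bit errors are at least detected.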
