Has anyone seen an HDD with bit error rate lower than 10^-15 ?
I've looked at the latest WD Red Pros and WD Ultrastars and they all have a bit error rate of 10^-15, even though they're like 10TB = 10^13 bytes * 8 =~ 10^14 bits big...
so like, 10% chance a RAID rebuild will fail? That's huge...
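Rough sanity check of that 10% figure, as a sketch only. It assumes the spec-sheet rate applies independently to every bit read, and that a degraded 2-disk rebuild has to read the whole surviving disk once. The function name and defaults are made up for illustration.

```python
import math

def p_ure_full_read(disk_bytes=10e12, ber=1e-15):
    """Chance of at least one URE when reading every bit of the disk once."""
    bits = disk_bytes * 8
    # 1 - (1 - ber)^bits, via log1p/expm1 so the tiny ber is not lost to rounding
    return -math.expm1(bits * math.log1p(-ber))

print(f"{p_ure_full_read():.1%}")
```

With exactly 8 x 10^13 bits this lands just under 8%, so "about 10%" is the right ballpark.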
@lanodan well those can only turn a silent data corruption into an unrecoverable read error.
I already have an unrecoverable read error (HDDs have built-in per-sector checksums).
If you have a fully operational RAID1, and you encounter a URE, you just read the value of that sector from the other disk and overwrite the faulty one, hoping the HDD will reallocate it.
But that requires (1) stumbling upon that URE while you still have a non-degraded array, and (2) the HDD correctly reallocating the sector.
@lanodan If the bad sector develops while you're not reading it, neither HDD's nor ZFS's checksums will tell you about it.
A patrol read could probably help, but idk.
Then at a later time, when you see a URE on something that you do read, and then you get a write error when trying to fix it, you set the drive to failed and replace it, right? But if the previous bad sector you didn't detect happens to be on the other drive, you will find it during the rebuild, when you only have 1 copy of the data.
@lanodan well, when rebuilding onto a 3rd drive from 2 "good" ones, a rebuild will only fail when the same sector number is bad on both "good" drives, which is quadratically less likely. So for 3x10TB RAID1 you'd get ~3x10^-13 chance for a failed rebuild. Which sounds pretty damn good to me. But then, you're only using 1/3 of your total capacity, which sucks.
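The ~3x10^-13 number can be reproduced with a few lines, under assumptions that are mine and not from any datasheet: 512-byte sectors, independent bit errors, and a sector counting as unreadable with probability (bits per sector) x (bit error rate). The function name is hypothetical.

```python
def p_same_sector_bad_on_both(disk_bytes=10e12, sector_bytes=512, ber=1e-15):
    """Leading-order chance that some sector number is unreadable on both disks."""
    p_sector = sector_bytes * 8 * ber      # one given sector is unreadable
    n_sectors = disk_bytes / sector_bytes  # sector numbers that could collide
    # Each sector number fails on both disks with p_sector^2; sum over all of them.
    return n_sectors * p_sector ** 2

print(f"{p_same_sector_bad_on_both():.2e}")
```

With 4096-byte sectors the same formula gives a value 8x higher, since the per-sector probability enters squared while the sector count only drops linearly.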
I wonder what the numbers are for 4-drive RAID6
@lanodan ok for 4-drive RAID6 with one drive dead it's just a 3x higher chance, because out of the 3 remaining sectors in a stripe, 2 need to be bad for a failed rebuild, and there are 3 ways to pick that pair.
So ~10^-12 chance for a failed rebuild, and you're using half of the raw capacity.
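A slightly more general sketch covering all the layouts in this thread. Same assumptions as before (512-byte sectors, independent errors, failures counted per sector rather than per RAID chunk), and it only keeps the leading-order term. `survivors` and `tolerable` are names I made up: the number of drives still being read, and how many of them may be bad at the same offset without losing data.

```python
import math

def p_rebuild_fail(survivors, tolerable, disk_bytes=10e12,
                   sector_bytes=512, ber=1e-15):
    """Leading-order chance that a rebuild hits an unrecoverable stripe."""
    p_sector = sector_bytes * 8 * ber
    n_sectors = disk_bytes / sector_bytes
    need_bad = tolerable + 1  # this many bad sectors at one offset lose data
    ways = math.comb(survivors, need_bad)
    return ways * n_sectors * p_sector ** need_bad

raid1_2way = p_rebuild_fail(survivors=1, tolerable=0)  # degraded 2-disk mirror
raid1_3way = p_rebuild_fail(survivors=2, tolerable=1)  # 3-way mirror, one dead
raid6_4 = p_rebuild_fail(survivors=3, tolerable=1)     # 4-drive RAID6, one dead
print(f"{raid1_2way:.2e} {raid1_3way:.2e} {raid6_4:.2e}")
```

The RAID6 value is exactly 3x the 3-way mirror value here, which is where the "3x higher" comes from.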