I lost a tiny bit of data, but the file is not corrupted anymore, great

I think this is the best outcome since I:

- couldn't dump the database
- couldn't copy the database raw files
- couldn't snapshot the LXD container
- couldn't export the LXD container
- couldn't migrate the LXD container to another host
- couldn't export the dataset
- ...

I have dumps, I have ZFS snapshots, I have snapshots of the VM, but restoring any of those meant I would lose waaaay more data than what I actually lost

I was stuck with my corrupted file, but only about 0.000003% of the SQL reqs were failing... not worth restoring a dump

Thanks to dd, I think about 130 KB of the PostgreSQL file was corrupted... that's roughly 0.0003% of the database size (~40 GB)
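
For anyone who wants to sanity-check that ratio, here is a tiny sketch. It assumes decimal units (1 KB = 1000 bytes, 1 GB = 1000^3 bytes) and takes the ~130 KB and ~40 GB estimates at face value; the variable names are only illustrative.

```python
# Rough estimates from the post; decimal units are an assumption.
corrupted_bytes = 130 * 1000       # ~130 KB
database_bytes = 40 * 1000**3      # ~40 GB

fraction = corrupted_bytes / database_bytes  # share of the database, as a plain ratio
percent = fraction * 100                     # same share, expressed as a percentage

print(f"fraction: {fraction:.2e}")
print(f"percent:  {percent:.6f}%")
```

As a plain ratio this comes out to a few millionths, which is a few ten-thousandths of a percent.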

That's also why, even though a database file was corrupted, mstdn.io was barely affected

The biggest issue was actually not being able to dump the db

The solution was to dd the file to another file, skipping the corrupted blocks, then replace the corrupted file with the dd'ed one... Seems to work so far since I don't have any I/O errors anymore
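
The post doesn't say which dd options were used, so the following is only a rough Python sketch of the same idea: read the file block by block and carry on past any block that raises an I/O error. It zero-fills unreadable blocks rather than dropping them, so every later byte keeps its original offset (similar in spirit to dd's `conv=noerror,sync`). That choice, the 4096-byte block size, and the function name `copy_skipping_bad_blocks` are all assumptions made for illustration.

```python
import os

def copy_skipping_bad_blocks(src_path, dst_path, block_size=4096):
    """Copy src to dst block by block, zero-filling blocks that fail to read.

    Returns the list of byte offsets that could not be read.
    """
    bad_offsets = []
    size = os.path.getsize(src_path)
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        with open(dst_path, "wb") as dst:
            for offset in range(0, size, block_size):
                length = min(block_size, size - offset)
                try:
                    # Positioned read: a failure on one block does not
                    # disturb the offsets used for the next ones.
                    data = os.pread(src_fd, length, offset)
                except OSError:
                    data = b""
                    bad_offsets.append(offset)
                # Pad failed or short reads so the output stays aligned.
                dst.write(data.ljust(length, b"\0"))
    finally:
        os.close(src_fd)
    return bad_offsets
```

Calling `copy_skipping_bad_blocks("damaged_file", "recovered_file")` (both paths hypothetical) would return the offsets that were lost, which gives an idea of how much data is gone before swapping the files.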

But I definitely lost 130 KB of data... it seemed to be some toots... rip in peace

I'll do a little blog post to keep track of this, might be useful

Still... I don't know what caused the issue. This is very concerning since it's a VM and the underlying storage is RAID. And most of all, ZFS should prevent this :blobderpy:

@angristan *whisper* all your db migrate with pg_bouncer

@angristan I think even a stronk FS can't prevent a db software fucking up some bit moving

@angristan that's why I stay with rusty strong ext4 with no nifty recovery features, but at least when it's written, it's written, everything is up to postgres (and overlayfs2 afaik)

@angristan Yes, there is only one disk. ZFS can autocorrect errors (that would be invisible with another FS), but you need at least 2 disks, and a ZIL on yet another disk is a good idea too.
What probably happened: your disk produced an error, the scrub (or simply reading the sector) detected it, and since ZFS had no way to correct it, it declared the file corrupt. That's how ZFS works.
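
To illustrate the difference between detecting and correcting, here is a toy sketch. It is not how ZFS is implemented: SHA-256 is just a stand-in checksum, the "disks" are in-memory bytearrays, and `checksum` and `read_block` are made-up helper names.

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def read_block(copies, expected):
    """Return the first copy matching the checksum and repair the others.

    `copies` is a list of bytearrays, one per (hypothetical) disk.
    Raises OSError if no copy matches: the error is detected but
    there is nothing to correct it from.
    """
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        raise OSError("checksum mismatch on every copy: block is unrecoverable")
    for c in copies:
        if bytes(c) != good:
            c[:] = good  # self-heal the bad copy from the good one
    return good

original = b"some toots"
expected = checksum(original)

# Two disks: one copy gets corrupted, the other one repairs it.
mirror = [bytearray(original), bytearray(original)]
mirror[0][0] ^= 0xFF
assert read_block(mirror, expected) == original
assert bytes(mirror[0]) == original

# One disk: the corruption is detected, but cannot be corrected.
single = [bytearray(original)]
single[0][0] ^= 0xFF
try:
    read_block(single, expected)
except OSError as err:
    print("detected but not correctable:", err)
```

With a single copy, the checksum can only tell you the block is bad, which matches the "file declared corrupt" behaviour described above.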

@angristan "zomething zeems to be wrong with that zpool" is funnier

@angristan So were you able to delete the corrupted thing? I didn't follow, sorry
