
Does anyone know how to set up kvm/libvirt to boot a VM directly from an LVM logical volume containing a filesystem?

I just want the host to load the latest kernel + initrd from the root filesystem of the VM without the need for a DOS partition table, an MBR to load GRUB, etc.


And if the solution starts with "create a GPT and a UEFI boot partition", then I'm happy to keep the MBR.

What I'm looking for is to simplify VM disks down to a plain filesystem hosted in an LVM logical volume. Right now I have this DOS/GPT disk nonsense, which can't be mounted on the host without special magic.

cc: @benoit @angristan

@codewiz
a) grub2 can "in some cases" write an MBR which can boot kernel/initrd from LVM. It's not stable enough for us to support it in RHEL, we closed the feature request "boot from LVM without boot partition".
b) you can host kernel/initrd completely externally, i.e. make them available via PXE
c) Try to have qemu-kvm load the kernel directly. You would then open the LV from the host, take the kernel and initrd from it, and use "qemu-kvm -kernel ... -initrd ... -append ...".
c) looks best, share how it goes.
@benoit @angristan
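A minimal sketch of option c), assuming the guest's /boot has already been copied off the read-only-mounted LV into a host directory, that it uses Fedora-style (initramfs-<ver>.img) or Debian-style (initrd.img-<ver>) names, and that GNU coreutils "sort -V" is available. The function names are made up for illustration, and qemu_cmd only prints the command line so it can be reviewed before running it:

```shell
#!/bin/sh
# Sketch of option c): let QEMU load the guest's kernel and initrd directly.
# Assumptions: BOOTDIR is a copy of the guest's /boot taken from the LV, with
# Fedora-style (initramfs-<ver>.img) or Debian-style (initrd.img-<ver>) names,
# and GNU sort (for -V). qemu_cmd only prints the command; it runs nothing.

# Print the newest kernel version in BOOTDIR, skipping rescue kernels.
latest_kver() {
    bootdir=$1
    for k in "$bootdir"/vmlinuz-*; do
        [ -e "$k" ] || continue
        echo "${k##*/vmlinuz-}"
    done | grep -v rescue | sort -V | tail -n 1
}

# Print the initrd that matches kernel version KVER, or fail if none exists.
initrd_for() {
    bootdir=$1 kver=$2
    for i in "$bootdir/initramfs-$kver.img" "$bootdir/initrd.img-$kver"; do
        if [ -e "$i" ]; then
            echo "$i"
            return 0
        fi
    done
    return 1
}

# Print a qemu command line that boots that kernel with the raw LV as disk.
# The LV holds a bare filesystem, so inside the guest it is simply /dev/vda.
qemu_cmd() {
    bootdir=$1 lv=$2
    kver=$(latest_kver "$bootdir")
    [ -n "$kver" ] || return 1
    initrd=$(initrd_for "$bootdir" "$kver") || return 1
    echo qemu-system-x86_64 -enable-kvm -m 2048 \
        -drive "file=$lv,format=raw,if=virtio" \
        -kernel "$bootdir/vmlinuz-$kver" -initrd "$initrd" \
        -append "root=/dev/vda ro console=ttyS0"
}
```

For example, "qemu_cmd /tmp/vm1-boot /dev/vg/vm1" (both paths hypothetical) prints the command for the newest kernel found. Unmount the LV on the host before starting the guest, since host and guest must never mount the same filesystem at the same time.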

@globalc @codewiz @benoit @angristan I wonder if this can also be done with a separate Grub disk image that boots the main one. (Not that @codewiz has said why a single LVM volume is needed!)

@penguin42
If I recall the issues correctly, that might work. IIRC, the issue for grub2/lvm/mbr here is that it needs to embed a bitmap of the places on the disk where kernel/initrd are stored, and there is not enough space for that bitmap in all cases.
@codewiz @benoit @angristan

@globalc @penguin42 @benoit @angristan @srevinsaju I think it was LILO having a bitmap to find the blocks where the kernel and initrd were stored on the disk.

GRUB1 used the MBR "DOS compatibility gap" to store its "stage 1.5", which was able to decode the ext filesystem well enough to load stage 2, which would then load the config and bring up the menu.

unix.stackexchange.com/questio

@globalc @penguin42 @benoit @angristan @srevinsaju I can't find where it's documented, but I think that GRUB2 installs a lot of stuff in the huge gap left by fdisk or gdisk to align the first partition to sector 2048 (1 MiB). This is how GRUB2 can boot from a RAID5, LUKS-encrypted LVM volume.
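To put numbers on that gap (a sketch, assuming the classic BIOS/MBR layout and 512-byte sectors): the MBR occupies sector 0, so every sector before the first partition is free for GRUB2's core.img. With the modern 1 MiB alignment that is almost 1 MiB, while the old DOS layout, which started the first partition at sector 63, left only 31 KiB, possibly too little for a core.img carrying RAID, LUKS and LVM modules. The function name is made up:

```shell
#!/bin/sh
# Size of the "post-MBR gap" that GRUB2 (i386-pc) can embed core.img into
# on an MBR-partitioned disk: sector 0 holds the MBR, and sectors
# 1 .. FIRST_SECTOR-1 are unused until the first partition starts.
mbr_gap_bytes() {
    first_sector=$1
    sector_size=${2:-512}   # logical sector size in bytes
    echo $(( (first_sector - 1) * sector_size ))
}
```

"mbr_gap_bytes 2048" gives 1048064 bytes for the 1 MiB-aligned layout, and "mbr_gap_bytes 63" gives 31744 bytes for the old DOS one.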

@globalc @penguin42 @benoit @angristan @srevinsaju On UEFI systems, GRUB2 places an EFI binary in the EFI system partition, which is just a FAT filesystem. If there's "secure" boot, there's a shim bootloader signed by Microsoft, which loads the GRUB EFI file after verifying its signature.

I don't know how you'd make the EFI system partition work with software RAID1. If you can't, this would make all UEFI servers vulnerable to a single disk failure, which is bad.

@globalc @penguin42 @benoit @angristan @srevinsaju As you might have guessed, I'm not a big fan of UEFI, GPT and secure boot. I think they introduce a ton of unnecessary complexity to support a chain of trust leading to Microsoft's "Trusted Computing" dream, where all PCs sold on the market will only boot a "genuine" operating system authorized by Microsoft.

en.wikipedia.org/wiki/Trusted_

@codewiz @globalc @benoit @angristan @srevinsaju It's worth separating your dislike into UEFI, GPT and Secure Boot individually.

@codewiz
Can't say that I'm happy with Microsoft being the ones who sign, but that is just used as "the standard" anyway, so OEMs know which CA cert they should ship.
One can enroll one's own keys in the firmware and untrust the Microsoft one. Some OEMs also ship their own CA cert by default, trusted next to the Microsoft one.
@penguin42 @benoit @angristan @srevinsaju

@globalc @codewiz @benoit @angristan @srevinsaju Eek, I think I'd worry more about the OEM certs; my main problem with a lot of the UEFI setups is the OEM-provided bugs, like the ones that lose the boot order, or always hunt for a Windows installation.

@penguin42 @codewiz
At least one can remove the certs the system is trusting.
Besides rolling out one's own certs, some firmwares also allow something like "trust code with this hash".
These things are easy to try out nowadays, since KVM can now handle secure boot.
I started writing an article on secure boot, emulation and using your own certs a few months ago, but got stuck.

@globalc @codewiz Yeh I know you can set it up with a vTPM for that.
(I think I've seen @th have links to insane things booting like that?)

@penguin42 @globalc @codewiz OVMF secure boot with KVM and a vTPM works, and I use it all the time for testing UEFI stuff, although it doesn't really provide any additional protection against attackers. Typically I tftp the kernel+initrd as a signed EFI executable instead of messing around with GPT.

@codewiz @globalc @penguin42 @angristan @srevinsaju you can simply dd the EFI system partition onto all disks. If a disk dies, the motherboard will load the EFI stub from one of the other disks.

@codewiz @globalc @benoit @angristan @srevinsaju Which sounds like exactly the problem when you just have LVM though - no partition table, no gap?
As for RAID1'ing the UEFI, hmm; UEFI can do a search, so if you replicated the boot partition to two devices, it should hunt if you give it the list of devices to search. (Or they'll sell you an overpriced RAID controller....)

@codewiz You could mount your LV read-only on your host. Then you can select the kernel/initrd paths either in the domain definition XML or with virt-manager :)
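A sketch of the XML route: libvirt's <os> element accepts <kernel>, <initrd> and <cmdline> children for direct kernel boot. The helper below just prints such an element; the function name and the example paths are hypothetical, and machine='q35' is an assumption to adjust to match your domain:

```shell
#!/bin/sh
# Print a libvirt <os> element for direct kernel boot.
# The values are inserted verbatim (no XML escaping), so keep them free
# of '&' and '<'. machine='q35' is an assumption; match your domain.
os_xml() {
    kernel=$1 initrd=$2 cmdline=$3
    cat <<EOF
<os>
  <type arch='x86_64' machine='q35'>hvm</type>
  <kernel>$kernel</kernel>
  <initrd>$initrd</initrd>
  <cmdline>$cmdline</cmdline>
</os>
EOF
}
```

The output can replace the existing <os> element via "virsh edit vm1" (domain name hypothetical). Because the guest filesystem sits directly on the LV, the root= argument points at the whole virtual disk, e.g. root=/dev/vda for a virtio disk.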

@codewiz Also, the "special magic" you mention is just a matter of running partprobe /dev/mapper/vg-lv... Not too much of an issue.
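Another way around the magic, when the LV does contain a partition table, is to mount a partition straight from the LV at a byte offset with "mount -o ro,loop,offset=N". A sketch of the arithmetic, assuming the start sector is read from "fdisk -l" output for the LV (the function name is made up):

```shell
#!/bin/sh
# Byte offset of a partition inside a partitioned LV or disk image, for use
# with "mount -o ro,loop,offset=N". START is the partition's start sector as
# listed by "fdisk -l"; SECTOR_SIZE defaults to 512 bytes.
part_offset() {
    start=$1
    sector_size=${2:-512}
    echo $(( start * sector_size ))
}
```

For a first partition at sector 2048 this gives 1048576, so the mount would look like "mount -o ro,loop,offset=$(part_offset 2048) /dev/vg/lv /mnt" (paths hypothetical).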

@codewiz I think virt-manager/libvirt has XML support for filesystems as a way to attach disk storage.

I don't know if it can be used to accomplish what you described (I've never done that). However, you can mount qcow2 disks directly on a Linux host via qemu-nbd, so I don't see why what you described wouldn't be possible.

I'll look into this later today if you haven't found out an answer.

@codewiz mkdir -p /tmp/foo; mount -oro,loop your-lvm-partition /somewhere; cp /somewhere/bzImage /somewhere/initrd /tmp/foo; umount /somewhere; kvm -kernel /tmp/foo/bzImage -initrd /tmp/foo/initrd -all -the -other -kvm -cruft

;-)

The more sophisticated variant (which gets by without the host kernel touching VM data, which can be awkward if the VM kernel messed with it in unknown ways) would be an emulation/qemu-q35 coreboot image with grub2 or filo or something as the payload. That way the in-"flash" code knows your filesystem and can load kernels directly.

The option to enable such an image in kvm would be "kvm -bios /path/to/bios/image", and I guess libvirt has some under-exercised config field that does the same.
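For the libvirt side, my understanding is that the field is the <loader> element inside <os>: with type='rom', the file is handed to QEMU the same way "-bios" does. A minimal sketch that prints it; the function name and ROM path are hypothetical (a coreboot build leaves its image in build/coreboot.rom):

```shell
#!/bin/sh
# Print a libvirt <loader> element pointing a domain at a firmware ROM such
# as a coreboot image; assumed to be libvirt's counterpart of "-bios ROM".
# The path is inserted verbatim (no XML escaping).
loader_xml() {
    rom=$1
    printf "<loader readonly='yes' type='rom'>%s</loader>\n" "$rom"
}
```

The printed element goes inside the domain's <os> element, next to <type>.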

@patrick I like the qemu + coreboot with a grub payload idea because it should work seamlessly with any distro which updates the grub config after installing a kernel update.

Where do I find the right image? Or how would I build it myself?

@codewiz oops, been meaning to answer this: coreboot doesn't provide binaries by itself and we insist on using our own cross compilers (for reasons), so building it yourself is time-consuming the first time. See https://doc.coreboot.org/tutorial/part1.html

I can see if I can build an image for you if you want.