Show HN: Ephemeral VMs in 1 Microsecond

URL: github.com
12 comments

What do VMs mean in this context?

I did a pass of the codebase and it seems they’re just forking processes?

It’s unclear to me where the safety guarantees come from (compared to using e.g. KVM).

Edit: it appears the safety guarantees come from libriscv[0]. As far as I can tell, these sandboxes are essentially RISC-V programs running in an isolated context (“machine”) where all the Linux syscalls are emulated and thus “safe.” Still curious what potential attack vectors may exist?

[0] https://github.com/libriscv/libriscv/tree/dfb7c85d01f01cb38f...

The whole thing could really do with an explanation of how it works.

Well, the boundary between the host and the guest, the system call API, is always going to be the biggest vector of attacks no matter what the solution used is. But, if you find a problem and fix it, you're back to being safe again, unlike if you don't have any sandboxing at all. You can also put the whole solution in a jail, which is very common nowadays.

libriscv sounds amazing on paper, I’d love to learn more about it

The use of the term VM without further qualification in the title is unfortunate. Emulated VM would have been nicer to avoid confusion with hypervisor style virtual machines.

Starting ephemeral hypervisor VMs quickly is more noteworthy (since they are often slow to start) than starting an emulator VM, which is expected to be fast since it's usually not much more than setting up a data structure and executing a call to an interpreter. I clicked hoping for the former, only to find out the project is the latter.

What exactly is the difference? Are you talking about hardware virtualization via e.g. intel vt-x? Do you mean virtualizing hardware subsystems with drivers instead of forwarding syscalls directly? Running a kernel?

Maybe I’m not seeing how those things are fundamentally different than “setting up a datastructure and calling an interpreter”.

A hypervisor VM is running native code via hardware virtualization. An emulator VM is running an interpreter and/or JIT generally using a different instruction set.

A hypervisor VM typically requires more extensive setup involving hardware configuration and is usually used for running existing native code, so it often means emulating a real machine, including OS and sometimes even firmware/BIOS. There are "lightweight" environments like Firecracker, but the overhead of creating an instance is still heavier than the overhead of a function call. The instance creation overhead is high, but the instruction performance can be close to native. Microsecond VM creation is notable given the typical instance creation overhead in this case.

An emulator VM for a sandbox typically will just be a software CPU emulator with some level of OS emulation. The instance creation overhead is setting up a data structure and issuing a function call to "run" the CPU emulator. The instruction performance is generally much slower than executing native code. Microsecond VM creation is not very notable in this case.
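To make the "data structure plus a function call" point concrete, here is a minimal sketch of a toy emulator in Python. Everything in it (`ToyMachine`, the three opcodes, `handle_request`) is hypothetical and made up for illustration; it is not how libriscv or the linked project is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMachine:
    """Per-instance state of a toy emulated CPU (hypothetical, not libriscv)."""
    program: list
    regs: list = field(default_factory=lambda: [0] * 4)
    pc: int = 0


def run(machine):
    """Interpret instructions until "halt"; returns the value of register 0."""
    while True:
        op, a, b = machine.program[machine.pc]
        machine.pc += 1
        if op == "li":        # load immediate: regs[a] = b
            machine.regs[a] = b
        elif op == "add":     # regs[a] += regs[b]
            machine.regs[a] += machine.regs[b]
        elif op == "halt":
            return machine.regs[0]
        else:
            raise ValueError(f"unknown opcode {op!r}")


# The guest "binary" is shared and read-only; only the machine state is per-instance.
PROGRAM = [("li", 0, 40), ("li", 1, 2), ("add", 0, 1), ("halt", 0, 0)]


def handle_request():
    # "Creating the VM" is just allocating a small structure...
    vm = ToyMachine(PROGRAM)
    # ...and "booting" it is an ordinary function call into the interpreter.
    return run(vm)
```

Each call to `handle_request()` gets fresh registers and a fresh program counter, and the instance is simply dropped afterwards, which is why per-request creation can be so cheap in this model.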

If you're running a long-running process, the hypervisor approach is usually superior. If you're running a very short-lived process (for instance a VM per HTTP request), the emulator approach may work better.

There is also the container approach like Docker, which is somewhat in between in overhead and can run at native speed on bare metal, and the OS-virtualization approach of gVisor, which intercepts syscalls.

a hvm is only running via hardware assisted virtualization if the guest is using the same ISA; a non-native guest is still "real virtualization", if all else is equal, isn't it? in that case, wouldn't the processor be the same thing as a "CPU emulator"? if not, how is it different?

I guess what I'm trying to say is maybe the distinction you're drawing isn't really as distinct as you think it is; if this project had virtualized devices and a kernel driving them instead of passing through syscalls, would that be real virtualization, assuming we're talking about a non-native guest ISA? don't vm guest drivers abstractly just pass through to syscalls / host drivers anyway? what if there was no OS and the user's code implemented those drivers? aren't virtualized devices "just setting up a datastructure and calling a function" too? if not, what are they?

like, do you see how this is really a spectrum or collection of system components with levels of virtualization?

CPU-only virtualization with syscall sandboxes is still more secure and useful than fancy chroot.

Is this better than Firecracker? I was thinking about using that but it needs nested virtualization and the servers that support that aren't as good of a value. Anyone know a good option for nested virtualization that is inexpensive?

Hetzner is really cheap but not sure about the cost effectiveness for the dedicated servers. Actually I think what I saw was that I couldn't get the one I wanted in a US datacenter.

The cited startup time is better than Firecracker's, but it's not a better tool than Firecracker (for a start, it only claims to be a PoC that "contains only the necessary parts for realistic benchmarking"). It looks like it's based on RISC-V emulation, so the approach seems unlikely to get performance comparable to Firecracker.

You might look into gVisor if you're running containerized workloads on a host without virt support (such as a vm without nested virtualization support).

Absolutely not! It's better to use something backed by KVM, so that you can use all the features of the CPU. This is just a proof-of-concept that was fun to make.

You could also use gVisor which doesn't need nested virtualization.

This competes with WASM Serverless, therefore something like Fermyon Spin, which is built on top of it (https://www.fermyon.com/serverless-guide/speed-and-execution...). Wake up a RISC-V emulator on a http request in 1µs, do your thing and exit. Then gone is the RISC-V VM. WASM takes a millisecond or more to spin up, as it is bytecode.

Is there any cloud that provides RISC-V VM's, coupled with SQLite access for persistence?

> This project […] contains only the necessary parts for realistic benchmarking

> The test program is a simple […] return string

I understand how this is required to measure the effects of sandboxing in isolation. And the result is impressive.

In what ways would you expect performance to be affected when workloads are more realistic as well?

I have a bit of experience in this, and adding monitoring, logging and observability doesn't really affect it compared to the non-sandboxing path: All of those things should already be happening. There should already be logging and statistics gathering as part of the larger service.

libriscv in interpreter mode is fast compared to other interpreters, but not near native performance. As I wrote earlier in the thread, using something backed by KVM is what I would do if I were architecting a solution for someone. E.g. my TinyKVM KVM-based userspace emulator would fit the bill.

Tangential, but let's say I want to build a multiplayer game, where players (untrusted users) are allowed to run arbitrary code in some kind of a VM. I've so far established that:

- The VM has to be built in a safe and performant language (like Rust, Go, or - if careful - modern C++), and available as a library to integrate with the rest of the game. However I don't trust myself to write safe C/C++ (the game is being prototyped in LÖVE/Lua).

- Each VM instance needs a tight execution/instruction budget, to avoid stalling the server's main update loop; e.g. a timer/virtual "hardware interrupt", or simply counting cycles, or even something modelled after eBPF. The total number of VM instances running in a single game would also need to be limited somehow (e.g. making a key component scarce and/or non-renewable, or dividing the total instruction budget across all VMs, or requiring a player to be present in a nearby world chunk).
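The "simply counting cycles" option from the list above could look roughly like this. It is a sketch under stated assumptions: the three-instruction toy ISA, `run_with_budget`, and `split_budget` are all invented for illustration and do not come from any existing VM library.

```python
class BudgetExceeded(Exception):
    """Raised when a guest program uses up its instruction budget."""


def run_with_budget(program, budget):
    """Run a toy program, charging one unit per executed instruction.

    Instructions (all hypothetical): ("inc",), ("jmp", target), ("halt",).
    Returns (accumulator, instructions_used).
    """
    acc = 0
    pc = 0
    used = 0
    while pc < len(program):
        if used >= budget:
            # The host regains control here, so the server's update loop cannot stall.
            raise BudgetExceeded(f"stopped after {used} instructions")
        instr = program[pc]
        used += 1
        if instr[0] == "inc":
            acc += 1
            pc += 1
        elif instr[0] == "jmp":
            pc = instr[1]
        elif instr[0] == "halt":
            break
        else:
            raise ValueError(f"unknown instruction {instr!r}")
    return acc, used


def split_budget(total, vm_count):
    """Divide a per-tick instruction budget evenly across active VM instances."""
    return total // vm_count if vm_count else 0
```

A game server might call `run_with_budget(program, split_budget(10_000, active_vms))` once per tick and treat `BudgetExceeded` as "this contraption ran out of power for now"; the 10,000 figure is arbitrary.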

Use cases are something like redstone in Minecraft: curious and technically-inclined players could build contraptions, like auto-farming crops, pranks/traps, defences, fancy gates/moats, etc. Not the core of the gameplay, but rather one aspect of it, for the curious to explore, learn, have fun with.

There are many off-the-shelf VMs that do RISC-V or similar ISAs, and I'm considering picking one of those, but wondering if a RISC instruction set isn't too low-level for such a thing. On the other hand, it would be nice if the knowledge would be directly transferrable to the real world.

Anyone tried to build something similar and can share their experience?

Aren't you describing the ideal scenario for WASM here? Some runtimes like wasmtime have a concept of fuel [1], which can limit the execution time.

One of the Rust-based Lua VMs, piccolo [2], was also made specifically with the ability to be interrupted in the middle of anything, by being stackless.

[1] https://docs.wasmtime.dev/api/wasmtime/struct.Store.html#met...

[2] https://news.ycombinator.com/item?id=40239029 (see "Fuel" on its article)

Actually Piccolo sounds like the ideal solution here, if I were to stick to LÖVE/Lua. It could be weird/confusing to have completely different VMs for the in-game contraptions vs the game engine itself (LuaJIT), but I think the narrow gap could attract players to become modders/contributors. Thanks for the suggestion!

You probably don't need what you're thinking of as a "VM instance" to run a VM, and for games you may well be better off building your own VM, although that depends how far down "arbitrary code" you go.

There are a few Zach-like games which achieve this kind of thing.

The scope of the VMs they build differs: the VMs in Shenzhen I/O are quite different from the one in TIS-100. Likewise, non-Zach Zach-like games such as "Turing Complete" also let you build through VMs.

These aren't the kind of VMs you might expect. They're not full x86/x64 emulators; they're very simple virtual machines that can handle a small, abstracted instruction set.

If you restrict the VM to executing a limited number of instructions/cycles, then you don't need it to be super performant in terms of clock cycles.

More important is to define the memory limits of your VM. This is a key constraint that keeps processing feasible and also keeps the machine understandable for the player.

If I recall correctly, TIS-100 has 1 general register and 1 accumulator register in each general cell. I don't remember now if there are also special memory cells, but if there are, I suspect the total amount of memory is on the order of 64 bytes or so.

Other similar VM games have more memory support, but typically top out at 16KB.

What's the actual gameplay loop you'd like to achieve? Defining that will help define and shape the constraints you put on players.

> You probably don't need what you're thinking of as a "VM instance" to run a VM, and for games you may well be better off building your own VM [...].

Indeed. I want something less awkward than redstone; I'm also considering something like fCPU[1] or eBPF (no jumping back = no loops = super easy to keep an upper bound on execution budget).

[1]: https://mods.factorio.com/mod/fcpu
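The "no jumping back = no loops" idea can be sketched as a load-time check. This is only an illustration of that simplified rule, not the real eBPF verifier, and the toy instruction set and function names are assumptions made up for the example.

```python
def verify_forward_only(program):
    """Reject programs with unknown opcodes or jumps that do not go strictly forward.

    Instructions (hypothetical): ("add", n), ("jmp", target), ("jz", target).
    Returns an upper bound on the number of instructions that can execute.
    """
    for pc, (op, arg) in enumerate(program):
        if op in ("jmp", "jz"):
            # A target equal to len(program) means "jump to the end and stop".
            if not (pc < arg <= len(program)):
                raise ValueError(f"instruction {pc}: jump to {arg} is not forward")
        elif op != "add":
            raise ValueError(f"instruction {pc}: unknown opcode {op!r}")
    # With forward-only jumps the pc strictly increases, so each
    # instruction runs at most once and the program length is the bound.
    return len(program)


def run_verified(program):
    """Verify, then execute. Returns (accumulator, steps_executed)."""
    bound = verify_forward_only(program)
    acc = 0
    pc = 0
    steps = 0
    while pc < len(program):
        op, arg = program[pc]
        steps += 1
        if op == "add":
            acc += arg
            pc += 1
        elif op == "jmp":
            pc = arg
        else:  # "jz": jump if the accumulator is zero
            pc = arg if acc == 0 else pc + 1
    assert steps <= bound
    return acc, steps
```

Because the bound is known before the program runs, the server can reject or price a contraption up front instead of interrupting it mid-execution.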

But one of my side goals is to attract players who might be interested in contributing to the core game, so I also like whytevuhuni's suggestion[2] to embed a Lua interpreter. That's a faraway stretch goal (I still have very little in terms of prototyping), but I'd like to keep my options in mind.

[2]: https://news.ycombinator.com/item?id=42493118

> What's the actual gameplay loop you'd like to achieve? Defining that will help define and shape the constraints you put on players.

The general direction is something like Minecraft (with some Factorio / Oxygen Not Included thrown in), with focus on exploration, self-expression, collaboration, and a kind of lore/storytelling (that's deeply personal, like GRIS but still in a generative sandbox).

For example, people have made pixel art in Minecraft by meticulously placing blocks, one by one, on a huge area, and then creating an in-game map of it all[3]. I'd like my game to feature a pixel art editor instead. Same goes for in-game coding/automation: I want to reduce the friction from an idea to seeing it in-game.

[3]: https://minecraft.wiki/w/Map

Thank you for all the suggestions!

I would go with Lua for the players. You can easily sandbox it by not compiling in the dangerous functions. Using debug.sethook you can limit execution by instruction count (https://www.lua.org/pil/23.2.html). And finally, you can bring your own allocator for Lua to cap memory use.

There are also decades of articles on how lua works with C and C++, and you can find examples for Rust and others too.

QEMU microvm

‘microvm’ virtual platform (microvm)

microvm is a machine type inspired by Firecracker and constructed after its machine model.

It’s a minimalist machine type without PCI nor ACPI support, designed for short-lived guests. microvm also establishes a baseline for benchmarking and optimizing both QEMU and guest operating systems, since it is optimized for both boot time and footprint.

https://www.qemu.org/docs/master/system/i386/microvm.html

Run headless chrome and execute each script on a web page.

I understand you mean it as a joke, but unironically, it's among the most widely deployed sandboxes in the world, which actually would make it a worthy contender if not for the resource requirements, complexity, and the upgrade treadmill.

> if not for the resource requirements, complexity, and the upgrade treadmill.

this is what makes it a joke

[deleted]

Having scanned the codebase, I think this is about quickly and safely launching and managing risc-v binaries as sandboxed processes? Which is useful, but has nothing to do with virtual machines in the usual sense of there being a hypervisor with hardware support for isolation.

Reminds me a bit of Cloudflare's isolates, but the title is super confusing

to check my understanding on what this is offering, I could build something on top of this that offers remote code execution for people without needing to worry about my system being compromised? or other people's processes interacting with one another, but the VM will still be able to make web requests itself?

The VM could only make requests if you add a system call specifically for that purpose, e.g. something like sys_request that takes a URL as an argument plus some input and output arguments. It's not that the VM couldn't in principle open a socket and handle connections by itself, but all of that is closed down.

I think when it comes to being integrated into a web server, you ideally want to use the web server's pipeline to make requests, so that you can benefit from the same logging and observability that Drogon (or other solutions) already have.

If you allow the VM to make requests on its own, you just have the IP and an open socket. Not much to debug with if (let's say) you have a customer asking about something.
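As a rough illustration of the "closed by default, opened one system call at a time" model described above, here is a minimal Python sketch. The `Sandbox` class, the syscall number, and the `fetch` callback are all hypothetical; this is not the libriscv API, only the shape of the idea.

```python
class SyscallDenied(Exception):
    """Raised when the guest invokes a system call the host never installed."""


class Sandbox:
    def __init__(self):
        # Empty table: the guest can do nothing outside its own memory by default.
        self._syscalls = {}

    def install_syscall(self, number, handler):
        """Host-side opt-in: expose exactly one capability to the guest."""
        self._syscalls[number] = handler

    def syscall(self, number, *args):
        """What the emulator would call when the guest executes a syscall."""
        handler = self._syscalls.get(number)
        if handler is None:
            raise SyscallDenied(f"system call {number} is not available")
        return handler(*args)


SYS_REQUEST = 500  # hypothetical syscall number chosen for this example


def make_sys_request(fetch, log):
    """Build a sys_request handler that routes through the host's own pipeline.

    `fetch` stands in for the web server's HTTP client and `log` for its
    logging facility; both are assumptions for the sake of the sketch.
    """
    def sys_request(url):
        log.append(url)    # the host sees and records every outbound request
        return fetch(url)  # the guest never touches a socket directly
    return sys_request
```

With this shape, a guest that tries `SYS_REQUEST` before the host installs it gets `SyscallDenied`, and once installed every request passes through the host's logging rather than through an anonymous socket.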

What are the use cases?

Explained in the first and second paragraphs of the README:

> Multi-tenancy allows one server to be safely shared among many users, each of which cannot access each others or negatively affect the HTTP service.

> Specialized sandboxes are instantiated for each request and immediately destroyed after the request, all within a single microsecond.

Some more practical examples would be nice, including security impact

Whatever happened to Unikernels?

They are still in the same tiny niche

Nothing.