FFmpeg-over-IP – Connect to remote FFmpeg servers

URL: github.com
10 comments

Hi HN!

I'm excited to showcase an update to a personal project of mine. It's called ffmpeg-over-ip, and it allows you to connect to remote ffmpeg servers. If you have one machine with a GPU (could be your Windows gaming laptop, a gaming PC, a MacBook?) and a machine (or VM, Docker container, etc.) without a GPU, you can use the remote GPU to do GPU-accelerated video conversion.

The way it works is pretty neat: there are two components, a server and a client.

- The server (has the GPU) comes with a patched up ffmpeg and listens on a specified port

- The client (without the GPU) connects to the server, takes the file IO requests from the server and runs them locally.

ffmpeg doesn't know that it's not dealing with a local filesystem, so this approach works with multiple inputs or outputs like HLS, and is perfect for home media servers like Plex or Jellyfin or Emby.

One server can offer up its GPU to as many clients as needed, and all clients offer up their own filesystems for their requests. The server also comes with a static build of ffmpeg bundled (built from jellyfin-ffmpeg scripts), so all you have to do is create a config file, set a password, and you're good to go!

It's been about a year and a half since this was last submitted (https://news.ycombinator.com/item?id=41743780). The feedback at the time was around the difficulty of sharing a filesystem between the machines, so that should no longer be a problem.

This has been really useful in my local setup, I hope you find it useful. If you have any further questions, the website has some FAQs (linked in github repo), or you could post them here and I'll answer them for you!

Thanks!

Some operations (downsizing already heavily compressed video with the latest and greatest compression techniques) are CPU or GPU bound, but others, like bunching thousands of high-res JPGs into a lightly compressed timelapse, are more likely to be IO bound. So how does this tool make the trade-off? I imagine some things can best be dealt with locally, whilst for other operations offloading to an external GPU will be beneficial. It also makes a lot of difference if your bandwidth is megabits or gigabits.
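A rough way to reason about the bandwidth side of that question is to compare the link speed with the stream bitrates. The sketch below is only a back-of-the-envelope bound: the function name and the duplex assumption are illustrative and not part of ffmpeg-over-ip, and it ignores protocol overhead, seeks, and anything the GPU itself limits.

```python
def max_realtime_multiple(
    link_mbps: float,
    input_mbps: float,
    output_mbps: float,
    full_duplex: bool = True,
) -> float:
    """Upper bound on transcode speed (in multiples of realtime) set by the network alone.

    Assumption: input travels client -> server and output travels server -> client.
    On a full-duplex link (wired Ethernet) the two directions do not compete, so the
    larger stream is the limit. On a shared medium (e.g. Wi-Fi) they add up.
    """
    if full_duplex:
        needed_mbps = max(input_mbps, output_mbps)
    else:
        needed_mbps = input_mbps + output_mbps
    return link_mbps / needed_mbps


# Illustrative numbers only: a 60 Mbps source transcoded down to 5 Mbps.
gigabit = max_realtime_multiple(1000, 60, 5)
fast_ethernet = max_realtime_multiple(100, 60, 5)
print(f"1 Gbps link:   at most {gigabit:.1f}x realtime")
print(f"100 Mbps link: at most {fast_ethernet:.1f}x realtime")
```

If the bound comes out well above the speed the encoder can reach, the network is unlikely to be the bottleneck; if it is close to 1x or below, offloading probably will not help.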

Nice to see you here!

Was really impressed by your work on Pundle. (It was an amazingly fast HMR dev environment - much like Vite today.) Felt like I was the only one using it, but it was hard to walk away from instant updates.

Thanks for putting a smile on my face! I am glad you liked it! :)

This is pretty neat. I was experimenting with something similar in my ffmpeg frontend: connecting to the local machine (and remote ones) to run arbitrary encoding jobs, thus offloading the encode tasks to another machine, but still with a queuing mechanism locally.

The project is https://ffmpeg-commander.com for generating ffmpeg commands, but with an experimental backend to offload the tasks.

Do you support chunked encoding across multiple servers? It would be a great feature to support larger video files.

Maybe you can submit a patch to ffmpeg.org.

cool idea. can you elaborate on IO and how the ffmpeg-server reads blocks from the client? that would seem to be a big blocker

> cool idea. can you elaborate on IO and how the ffmpeg-server reads blocks from the client? that would seem to be a big blocker

ffmpeg-server runs a patched version of ffmpeg locally. ffmpeg requests to read some chunks (i.e. give me video.mp4) through our patched filesystem (https://github.com/steelbrain/ffmpeg-over-ip/blob/main/fio/f...), and that request gets sent over the same socket that the client established. The client receives the request, does the file operations locally, and sends the results back over the socket to the server, which then passes them to ffmpeg.

ffmpeg has no idea it's not interacting with a local file system
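To make the flow above concrete, here is a minimal sketch of file reads proxied over a socket that the client opened. It is not the project's actual wire format: the length-prefixed JSON framing, the `op` names, and every function name here are assumptions made for illustration.

```python
import json
import socket
import struct
import threading


def send_msg(sock: socket.socket, payload: bytes) -> None:
    # Frame each message with a 4-byte big-endian length prefix.
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    # recv() may return fewer bytes than asked for, so loop until we have n.
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock: socket.socket) -> bytes:
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def serve_file_requests(sock: socket.socket) -> None:
    """Client side: answer read requests against the local filesystem.

    A real client would restrict which paths may be read; this sketch does not.
    """
    while True:
        request = json.loads(recv_msg(sock))
        if request["op"] == "close":
            return
        with open(request["path"], "rb") as f:
            f.seek(request["offset"])
            send_msg(sock, f.read(request["size"]))


def remote_read(sock: socket.socket, path: str, offset: int, size: int) -> bytes:
    """Server side: roughly what a patched read could do instead of touching local disk."""
    request = {"op": "read", "path": path, "offset": offset, "size": size}
    send_msg(sock, json.dumps(request).encode())
    return recv_msg(sock)
```

To try it locally, create a connected pair with `socket.socketpair()`, run `serve_file_requests` on one end in a `threading.Thread`, and call `remote_read` on the other end. Reading past the end of the file returns an empty bytes object, the same as a local read would.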

Is video that cpu/gpu bound that streaming it over the interwebs isn't the issue?

Maybe my use cases for ffmpeg are quite narrow, but I always get a speedup from moving the files off my external hard-drive, suggesting that is my current bottleneck.

> streaming it over the interwebs isn't the issue

The hope is that you stream over LAN not the interwebs!

> I always get a speedup from moving the files off my external hard-drive

Based on your description, it does seem like your ffmpeg may be IO limited

Ah, yeah, so this is probably for more professional workflows where you have a workhorse somewhere. Perhaps even in the cloud as long as the files are close by as well? My use case would be more "my computer sucks, so would be nice to do it on a beefy cloud computer", but of course no time is saved when just reading my files is slow, heh.

very clever and thanks for explaining. for gpu-bound processes, which are common ffmpeg use cases, this is a great approach

ffmpeg has great http input and output support. I've been using this quite a bit recently. Wrapping ffmpeg with node.js and using the built in http server and client to interact with it.

It's even reduced load considerably because most of the time the disk doesn't even need to be touched.

yep same here, i do the same thing for my video pipeline. spawn ffmpeg as a child process from node, pipe stdin/stdout directly and skip disk entirely for intermediate steps. concat demuxer + xfade filters for stitching scenes together. the only time i touch disk is the final output and even thats optional if youre uploading straight to s3 or whatever
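For anyone who wants to try the pipe approach described above, here is a small sketch in Python rather than node. The helper names are made up for illustration, and it assumes an ffmpeg build with libx264 on the PATH. Note that the input has to be in a streamable container (MPEG-TS or Matroska, for example), since ffmpeg cannot seek on a pipe.

```python
import subprocess


def build_pipe_command(output_format: str = "mpegts") -> list[str]:
    """Build an ffmpeg command that reads from stdin and writes to stdout."""
    return [
        "ffmpeg",
        "-hide_banner",
        "-i", "pipe:0",       # read the input from stdin
        "-c:v", "libx264",    # assumes this encoder is compiled in
        "-f", output_format,  # stdout has no file extension, so name the muxer
        "pipe:1",             # write the output to stdout
    ]


def transcode_bytes(data: bytes) -> bytes:
    """Run ffmpeg as a child process without touching disk for input or output."""
    result = subprocess.run(
        build_pipe_command(),
        input=data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if ffmpeg exits non-zero
    )
    return result.stdout
```

MPEG-TS is used as the default output because it can be written sequentially; plain MP4 needs a seekable output unless it is fragmented.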

What's the point of this?

A single CPU core on a 9500T or a Ryzen V1500B is fast enough to real-time re-encode 60mbps 4K H264 to 1080p 5mbps h264, aka, for a core use case - transcoding for web for Jellyfin over cellular, for example - you haven't needed hardware video engines on PCs for 9 YEARS.

I have no idea why people are so hung up on hardware video encoding. It's completely wrong. The quality is worse. The efficiency is a red herring - you will still use every CPU core for IO threads in ffmpeg, if you don't configure that away, which you do not. And it requires really annoying setup and premium features on stuff like Plex. It just makes no sense!

If latency is important to you, well then hardware engines make sense. But you are throwing away the latency sending it over the network. The only use case (basically) is video game streaming, and in that case, you'll have a local GPU.

I have never read one of these ffmpeg network hardware encode innovations to have an actual benchmark comparison to single thread software transcoding tasks.

I know you mean well but really. It makes NO sense.

> The efficiency is a red herring - you will still use every CPU core for IO threads in ffmpeg, if you don't configure that away, which you do not. And it requires really annoying setup and premium features on stuff like Plex. It just makes no sense!

I would love to learn more about this! What can I do to fully optimize ffmpeg hardware encoding?

My use case is transcoding a massive media library to AV1 for the space gains. I am aware this comes with a slight drop in quality (which I would also be keen to learn about how to minimize), but so far, in my testing, GPU encoding has been the fastest/most efficient, especially with Nvidia cards.

You would use your full system, saturating the CPU and GPU, including unlocking the number of simultaneous video sessions for consumer NVIDIA GPUs. That said, software AV1 looks a lot better than hardware AV1 per bit.

Thank you for sharing your experience. Seems like this is not relevant to your setup & usecase.

People who need this know who they are. Not everything is for everybody.

What is the use case?

I'd argue this is for nobody haha

Nobody using jellyfin plex or whatever needs it: they should just use software transcoding, it's better in pretty much every way.

I've traveled around a lot in the past couple years so my situation (read: homelab equipment) has been changing and my usecase has been changing with it. It started out as:

- I don't want to unplug the GPU from my gaming PC and plug it into my Linux server

- Then: I don't want to figure out PCI forwarding, I'll just open a port and NFS to the containers/VMs (ffmpeg-over-ip v4 needed a shared filesystem)

- Now: I have a homelab of 4 mini PCs and one of them has an RTX 3090 over Oculink. I need it for local LLMs but also video encoding, and I don't want to do both on the same machine.

But you've asked a more fundamental question, why would people need hardware accelerated video decoding in the first place? I need it because my TV doesn't support all the codecs and I still want to watch my movies at 4K without stuttering.

You can transcode in realtime in software to your TV. You don't need the GPU at all. Even on ancient USFF PCs.

I'll tell my TV you said that and I'll see if it stops buffering during playback :)

This doesn't appear to be true. My Plex media server is ancient and it really struggles if it has to do any kind of transcoding. Definitely can't handle high bitrate 4k stuff.

> You can transcode in realtime in software

Sometimes you want faster-than-realtime encoding, such as when backing up your video archive.

A CPU using all its cores is much faster than realtime.

As a rule, strong feelings about issues do not emerge from deep understanding.

[flagged]

Why are you using github as your personal proprietary app depot?

nice, problem is that, with Hikvision and Dahua getting banned these days, the majority of IP cameras on the market do not do ONVIF or RTSP, or do neither, what a shame.

Nearly all ip cameras support ONVIF and/or RTSP.

Get a TP Link Tapo! They are like 20-30 bucks and come with ONVIF.

EZVIZ is another ban-evading arm of Hikvision, easily available in Europe, and has RTSP (confirmed) with alleged support for ONVIF as well.

On the surface this seems like a terrible idea:

FFmpeg is mountains of extremely complex C code whose entire job is processing untrusted inputs.

Choosing to make such code network-enabled if you can't trust your inputs, I would recommend to sandbox if at all possible. Otherwise you are asking for trouble.

Thank you for your comment!

The usecase for something like this is when you control both sides, server & client. There is some basic HMAC auth built into each request.

> I would recommend to sandbox if at all possible.

Since the server is a standard binary that doesn't need any special permissions, you could create the most locked-down user on your server that only has access to a limited set of files and the GPUs, and it'll work just fine. This is encouraged.
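For readers unfamiliar with HMAC request auth, the general idea can be sketched as below. This is a generic illustration, not the scheme ffmpeg-over-ip actually uses: the message layout, the timestamp check, and the function names are all assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_request(secret: bytes, body: bytes, timestamp: int) -> str:
    """Return a hex HMAC-SHA256 over the timestamp and the request body."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(
    secret: bytes,
    body: bytes,
    timestamp: int,
    signature: str,
    now: Optional[int] = None,
    max_skew_seconds: int = 60,
) -> bool:
    """Check the signature and reject requests with a stale timestamp."""
    current = int(time.time()) if now is None else now
    if abs(current - timestamp) > max_skew_seconds:
        return False  # limits how long a captured request can be replayed
    expected = sign_request(secret, body, timestamp)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected, signature)
```

Both sides share the password from the config file; the client signs each request and the server recomputes the signature before doing any work. HMAC authenticates requests but does not encrypt them, so it is not a substitute for keeping the port off the public internet.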

I've been meaning to build exactly this for a while, and my waiting has been rewarded by someone else doing it!

Had you thought about using FUSE on the server side, rather than patching ffmpeg? Like a reverse sshfs? That avoids patching the ffmpeg binary, which allows usage of weird and wonderful ffmpeg binaries with other people's patches.

I'd be interested in seeing how well it works with SBC GPUs - many have hardware decoding and encoding, and their vendors love to fork ffmpeg.

I've explored sftp since ffmpeg has built-in support for it (-i sftp://...), but the support is quite buggy in code, I hope to submit some patches upstream to be able to change it. FTP in contrast seemed much more stable, at least looking at the code. FTP had some other shortcomings that made it undesirable for my usecase.

That was one motivation; the other was that it would require rewriting arguments going into the server. What you're describing was essentially what ffmpeg-over-ip v4 (and its earlier versions!) was, and the constant feedback I'd heard was that sharing filesystems is too much work, ssh servers on Windows and macOS are a bad experience, and people want to use a bundled solution.

Forking ffmpeg was no easy task! It took forever to figure out the build process. I eventually caved in and started using the jellyfin build scripts, but that has the downside of being a few versions behind upstream HEAD.

Sharing filesystems is hard when you make users do it in advance.

I was thinking of the server end of an ffmpeg-over-ip system bringing up a FUSE filesystem backed by something similar to your VFS-served-by-the-client. Combine that either with argument rewriting, or chrooting into the FUSE filesystem.

As another commenter said, where's plan 9 when you need it? If you go the FUSE route there are existing 9P implementations for both server and FUSE client you can use.

How does this differ in performance from rffmpeg?

https://github.com/joshuaboniface/rffmpeg

I tried to answer it about a year and a half ago and that answer is still mostly correct: https://news.ycombinator.com/item?id=41743932

You can mix and match operating systems, macOS, Windows, Linux, you do not need sudo privileges.

rffmpeg needs a shared file system, which could be a huge pain to set up: https://github.com/joshuaboniface/rffmpeg/blob/master/docs/S...

ffmpeg-over-ip patches ffmpeg and only needs one port open for the server, then you just run the binary, no mounts needed at all.

This is software which basically replicates what Plan9 gives you out of the box.

Dammit I really wish Plan9 had taken off. It isn’t perfect but it does a much, much better job of helping me run my applications in ways that I want.

If anyone doesn’t already know, one method of Plan9 remote access is to “cpu” into a remote machine which has the hardware you need. Your local filesystems go with you, and your environment on the remote machine consists of your local filesystems mounted to the remote machine, but only for you. All applications you run in this context execute on the CPU of the remote machine and have access to the remote machine’s hardware, but your local filesystems. Imagine SSHing into a remote machine and your entire environment goes with you, development tools and all. That’s what Plan9 does for you.

So if I “cpu” into a machine without ffmpeg, but with a GPU and I run ffmpeg, not only will it work, but I can tell ffmpeg to use a hardware encoder with a simple command line flag, and it’ll work.

The great thing about Plan 9 is it can make other distributed systems look so complicated. The worst thing about Plan 9 is that it can make other distributed systems look so complicated!

So this is port forwarding? (I'm too tired and can't concentrate to read this and am going to bed.)

tl;dr:

- Client makes request to server (which opens a bidirectional network socket)

- Server uses that bidirectional socket, spawns a local patched ffmpeg with vfs-like characteristics

- ffmpeg (using the client-server bidirectional socket) does input/output operations, treating the client filesystem as if it was local

Thus the client doesn't need to open any ports or expose its filesystem in a traditional mounting manner, and one server can handle the filesystems & requests of any number of clients.

ffmpeg already has network capabilities. You can let it open a tcp socket and stream input from there and write output to another TCP socket. How is this different? Is this just a wrapper around this functionality for more convenience or does it provide any fundamentally new features?

Using ffmpeg with its native network capabilities for a media server use case, where you need to stream your input to it and then get multiple outputs (think HLS) streamed back, is not possible at this point in time. HTTP, FTP, and SFTP all have their limitations: some are outright broken for HLS use cases, others don't support seeking while streaming.

I would have very much loved to use the built-in capabilities instead of patching ffmpeg to add a vfs layer and spend a ton of time figuring out the build pipeline once you add all the codecs and hwaccels. I do hope to be able to change this in the future, I've identified several bugs that I intend to submit patches for.

This is not a special case. Everything you mentioned above can actually be achieved using cli. You can create listeners, configure pipelines, and sinks(granted not ergonomic). Sinks can be HTTP post for example, and sources can be tcp listeners + protocols on top. You can also configure the buffering strategies for each pipeline.

This sounds like a nice project, but is it a Show HN without putting that in the title, because of the stigma that Show HN has recently acquired?

Show HN? What is that? :-)

I've been reading hacker news for ages, but somehow I missed that there was some convention relating to showing technology on hacker news.

You're going to have to chalk this up to human error: in the excitement to post it, I omitted Show HN.

Ah, I see it now. It's a menu option. Never noticed :-)

Missed opportunity to call this FoIP

Thanks for the update and continuing to share this project. What does the roadmap look like into the future?

This is neat. The thing I would most want in the README is a benchmark section showing where it wins and where it does not. My guess is long GPU bound transcodes look great and tiny file churn workloads probably do not. Having that boundary spelled out would make adoption a lot easier.

I am thinking of adding a Windows application with an installer and a tray icon that you can use for some basic settings like changing port or password, or toggling automatic startup.

For linux, I am thinking of adding convenience helpers around systemd service installation

Very cool. PeerTube supports remote runners [1] [2], might take a look for inspiration. As a distributed compute enthusiast, big fan of this model for media processing.

[1] https://docs.joinpeertube.org/maintain/tools#peertube-runner

[2] https://docs.joinpeertube.org/admin/remote-runners

Very cool! Thank you for sharing! I didn't know this existed, so now I'm curious how they solved it :)

My usecase is just-in-time media transcoding, I'll see if PeerTube remote runners support it

Very nice. Could this concept be applied to have ffmpeg running in webassembly, and over HTTP?