FFmpeg to Google: Fund Us or Stop Sending Bugs

als@lemmy.blahaj.zone · 22 days ago


solrize@lemmy.ml · edit-2 21 days ago

Yes, but the FFMPEG developers do not know this until after they triage all the bug reports they are getting swamped with.

With a concrete bug report like “using codec xyz and input file f3 10 4d 26 f5 0a a1 7e cd 3a 41 6c 36 66 21 d8… ffmpeg crashes with an oob memory error”, it’s pretty simple to confirm that such a crash happens. The hard part is finding the cause and fixing it. I had understood the bug search to be fuzzing controlled by AI so I referred to it as fuzzing. Apparently though the AI is also writing the bug report now, so yeah ok, maybe there is potential slop there.

“Don’t ship software with vulnerabilities” sounds good in a vacuum,

I said KNOWN vulnerabilities. Make it known vulnerabilities without known mitigations if you prefer.

I wrote a few of those GNU coreutils that the Rusties are now rewriting. I don’t remember hearing of any CVEs connected with any of them, though that is mostly because they are uncomplicated.

Here are all the Debian security advisories for the past year or so. There aren’t THAT many, and they are mostly in complicated network programs, the Linux kernel, etc. Also, a lot aren’t actual vulns: https://www.debian.org/security/

This was the first search hit about ffmpeg CVEs, from June 2024, so not about the current incident. It lists four CVEs: three memory errors (buffer overflow, use-after-free) and one off-by-one error. The class of errors in the first three is supposedly completely eliminated by Rust. No idea about the fourth. I’m not claiming that a Rust reimplementation of ffmpeg is anywhere near feasible. Dunno if the current set of CVEs is comparable, but it’s a likely guess. Anyway, as SPJ likes to say about Haskell’s type system, the idea is to stop fixing bugs one by one and instead eliminate entire classes of bugs. We can’t fix everything, but we can certainly do better than we are doing now.

I saw earlier you mentioned google keeping vulnerabilities secret, and using them against people or something like that,

That was listed as an example of what not to do, not a proposal of an approach to take.

moonpiedumplings@programming.dev · 21 days ago

With a concrete bug report like “using codec xyz and input file f3 10 4d 26 f5 0a a1 7e cd 3a 41 6c 36 66 21 d8… ffmpeg crashes with an oob memory error”, it’s pretty simple to confirm that such a crash happens

Google’s Big Sleep was pretty good: it gave a Python program that generated an invalid file. It looked plausible, and it was a real issue. The problem is that literally every other generative AI bug report looks equally plausible. As I mentioned before, curl is having a similar issue.

And here’s what the lead maintainer of curl has to say:

Stenberg said the amount of time it takes project maintainers to triage each AI-assisted vulnerability report made via HackerOne, only for them to be deemed invalid, is tantamount to a DDoS attack on the project.

So you can claim testing may be simple, but it looks like that isn’t the case. I would say one of the problems is that all these people are volunteers, so they probably have a very limited amount of time to spend on these projects.

This was the first search hit about ffmpeg CVEs, from June 2024, so not about the current incident. It lists four CVEs: three memory errors (buffer overflow, use-after-free) and one off-by-one error. The class of errors in the first three is supposedly completely eliminated by Rust.

FFmpeg is not just C code, but also large portions of handwritten, ultra-optimized assembly code (per architecture, too…). You are free to rewrite it in Rust if you so desire, but I stated it above and will state it again: ffmpeg traded security for performance. Rust currently isn’t as performant as optimized C code, and I highly doubt that even unsafe Rust can beat hand-optimized assembly; C can’t, anyway.

(Google and many big tech companies like ultra-performant projects because performance equals power savings equals cost savings at scale. But this means weaker security when it comes to projects like ffmpeg…)

solrize@lemmy.ml · edit-2 21 days ago

Have any of the Google-submitted vulnerability reports turned out to be invalid? Project Zero was pretty well regarded.

Yes, I know about the asm code in ffmpeg, though IDK if it’s doing anything that could lead to a use-after-free error. I can understand an OOB reference happening in the asm code, since codecs are full of lookup tables and have to jump around inside video frames for motion estimation, but I’d hope no dynamic memory allocation is happening there. Instead it would be more like a GPU kernel. But I haven’t examined any of it.

Anyway there’s a big difference between submitting concrete input data that causes an observable crash, and sending a pile of useless spew from a static analyzer and saying “here, have fun”. The curl guy was getting a lot of absolute crap submitted as reports.

From the GCC manual “bug criteria” section:

If the compiler gets a fatal signal, for any input whatever, that is a compiler bug. Reliable compilers never crash.

That sounds like timelessly good advice to me.

moonpiedumplings@programming.dev · 21 days ago

Project Zero

Project Zero was entirely humans though, no GenAI. Big Sleep has been reliable so far, but there is no real reason for ffmpeg developers to prioritize Big Sleep’s 6.0-rated CVEs over other, potentially more critical CVEs. The problem is that Google’s security team would still be breathing down the necks of these developers and demanding fixes for the vulns they submitted, which is kinda BS when they aren’t chipping in at all.

Anyway there’s a big difference between submitting concrete input data that causes an observable crash, and sending a pile of useless spew from a static analyzer and saying “here, have fun”

Nah, the fake bug reports often come with fake “test cases” too. That’s what makes the LLM-generated bug reports so difficult to deal with.

solrize@lemmy.ml · 21 days ago

6.0 is pretty serious according to the rubric. Are there some worse ones? Yes Google is acting obnoxious per your description. It makes no sense to me that they are balking about supplying some funds. They used to be fairly forthcoming with such support.

I can imagine a CI system for bug reports where you put in the test case and it gets run under the real software to confirm whether an error results, if one has been claimed. No error => invalid test case.
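A rough sketch of what that check could look like, in Python. This is only an illustration of the idea, not anything ffmpeg or curl actually runs: the function name `classify_report` and the three verdict strings are made up here, and it assumes a POSIX system (where a process killed by a signal shows up as a negative return code) and, optionally, a target built with AddressSanitizer so that memory errors are printed to stderr.

```python
import subprocess
import sys


def classify_report(cmd, timeout_s=30):
    """Run the reproducer command from a bug report and classify the result.

    `cmd` is the argument list the reporter claims triggers the bug.
    Returns "crash", "timeout", or "not-reproduced" (labels are hypothetical).
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # A hang may or may not be a bug; flag it separately for a human.
        return "timeout"

    # On POSIX, a negative return code means the process died from a signal
    # (e.g. SIGSEGV or SIGABRT), which is the "fatal signal" case.
    if proc.returncode < 0:
        return "crash"

    # Assumption: the binary was built with AddressSanitizer, which reports
    # memory errors on stderr and then exits with a non-zero status.
    if proc.returncode != 0 and b"AddressSanitizer" in proc.stderr:
        return "crash"

    # No observable error: the claimed test case did not reproduce.
    return "not-reproduced"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python triage.py ffmpeg -i poc.bin -f null -
    # ("poc.bin" stands in for whatever file the report attached.)
    print(classify_report(sys.argv[1:]))
```

A "not-reproduced" verdict would let the report be bounced automatically before a maintainer looks at it. Running untrusted reproducers obviously needs a sandbox, which this sketch leaves out.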

TehPers@beehaw.org · edit-2 20 days ago

Rust currently isn’t as performant as optimized C code, and I highly doubt that even unsafe Rust can beat hand-optimized assembly; C can’t, anyway.

A bit tangential, but to address this point: nothing beats the most optimized assembly code. At best, programming languages can only hope to match it.

Rust does have macros for inlining assembly into your program, but it’s horribly unsafe and not super easy to work with.

Rewriting ffmpeg in Rust is not a solution here (as you’re saying).