When Anthropic published its Mythos announcement, it seemed impressive at first, almost worrying. But on a thorough read, the public evidence is less clean than the headline effect. The often-cited "under $20,000" figure does not mean Mythos casually found one devastating bug for that price; in Anthropic's own writeup, that budget covered a large search process with roughly a thousand scaffolded runs and several dozen findings. That is still notable, but it is a very different claim from the dramatic version people repeat. Mozilla followed with a post about using Mythos to identify a large number of AI-found issues in Firefox 150, and it pushes the narrative in the same direction: AI has arrived for vulnerability research. After all, the post is titled "The zero-days are numbered".
It is a bold take, and it may even be true. But the public evidence does not support the strongest version of that claim, and unless you work for one of the companies Anthropic chose to partner with, it is not simple to figure out whether these public claims are just marketing or a real game changer.
The interesting question is not whether Mythos found bugs. It clearly did. The interesting question is what kind of bugs were found, how serious they were, and whether those findings actually change the balance between defenders and attackers.
I spent a few hours going through the Firefox commit history, advisory references, and linked bugs to get a better sense of what Mozilla's numbers really mean. This is not a full audit of every patch, but it is enough to form a more grounded view than the marketing cycle usually allows.
The claim
Mozilla reported that 271 vulnerabilities associated with Mythos were identified in Firefox 150. At the same time, the Firefox 150 security advisory does not map that claim to a single clean list of 271 Firefox-only bug IDs. It contains many individual CVEs from different reporters, including at least three entries explicitly credited to Anthropic, as well as several aggregated "memory safety bugs" entries:
Those four entries alone link to hundreds of bugs. That should immediately make anyone cautious about reading the headline number too literally. A large AI-assisted cleanup campaign can still be important without every individual fix representing a directly exploitable, high-end vulnerability. The linked bug counts here are 1, 55, 154, and 107 respectively, which makes 317 in total. But that still should not be compared directly to Mozilla's "271 vulnerabilities identified" claim, because the aggregated CVE buckets also cover Thunderbird and ESR releases, not just Firefox 150.
There is also a basic accounting problem here: Mozilla's 271 figure, Bugzilla bug IDs, advisory CVEs, and individual commits are not the same unit. Publicly, you can reconstruct pieces of the picture, but not a single authoritative Firefox-only list that cleanly explains the 271 number. That does not mean Mozilla is wrong. It means outsiders should be careful not to over-interpret the advisory as if it were a perfect ledger of the claim.
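To make the accounting problem concrete: CVE buckets can overlap and span multiple products, so per-bucket bug counts do not simply add up. A toy sketch (the CVE IDs and bug numbers below are made up for this example):

```python
def distinct_bugs(cve_to_bugs: dict[str, set[str]]) -> int:
    """Count distinct bug IDs across CVE buckets.

    Summing per-bucket sizes (as in 1 + 55 + 154 + 107 = 317) silently
    double-counts any bug referenced by more than one CVE entry.
    """
    seen: set[str] = set()
    for bugs in cve_to_bugs.values():
        seen |= bugs
    return len(seen)

# Hypothetical overlapping buckets: naive sum is 5, distinct bugs are 4.
buckets = {
    "CVE-2026-0001": {"100", "101", "102"},
    "CVE-2026-0002": {"102", "103"},
}
```

The same mismatch applies between bug IDs, CVEs, and commits: each is a different unit, and conversions between them are many-to-many.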
What the data suggests
I vibecoded a small tool to group commits, bugs, CVEs, and touched subsystems, and to display some statistics. I also made a rough attempt at scoring the bugs based on keywords, in order to prioritize the ones that look actually actionable. You can use it to browse quickly through the commits, and the script sources are available at the end of the summary for reproducibility.
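My actual script is the one linked at the end; as a rough illustration of the kind of keyword scoring I mean (the patterns and weights below are invented for this example, not the ones my tool uses):

```python
import re
from collections import defaultdict

# Illustrative weights: higher means the commit message "looks" more actionable.
KEYWORDS = {
    r"use[- ]after[- ]free|uaf": 5,
    r"overflow|out[- ]of[- ]bounds|oob": 4,
    r"race|thread": 3,
    r"crashtest": 2,
    r"null (deref|pointer|check)": 1,
}

def score_commit(message: str) -> int:
    """Sum the weights of every keyword pattern found in a commit message."""
    text = message.lower()
    return sum(w for pat, w in KEYWORDS.items() if re.search(pat, text))

def group_by_bug(commits: list[tuple[str, str]]) -> dict[str, int]:
    """Map 'Bug NNNNN' references to the best score among their commits."""
    scores: dict[str, int] = defaultdict(int)
    for _sha, message in commits:
        m = re.search(r"bug\s+(\d+)", message, re.IGNORECASE)
        if m:
            bug = m.group(1)
            scores[bug] = max(scores[bug], score_commit(message))
    return dict(scores)
```

This kind of scoring is crude on purpose: it only ranks which patches deserve a manual look first, it proves nothing about exploitability.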
Even if you ignore the exact totals, the shape of the data is informative:
Hundreds of commits and bug references are involved.
The touched code is spread across major Firefox attack surface areas such as dom, gfx, netwerk, js, and layout.
The patch set mixes obvious safety fixes, defensive cleanups, lifecycle hardening, API usage tightening, and some changes that look closer to real exploit primitives.
Among the CVEs, some patches do not appear security-related (e.g. avoiding a null dereference), although they are relevant to program stability.
That distinction matters. "Found a bug" is not the same statement as "found an exploitable vulnerability", and it is definitely not the same statement as "found a weaponizable chain component".
In browser exploitation, there is a wide spectrum between:
a harmless correctness bug,
a crash-only bug,
a bug that creates a memory corruption primitive,
and a bug that survives into a reliable exploit chain.
If you collapse that spectrum into a single headline number, you get attention, but you lose precision.
Stats between tags FIREFOX_BETA_149_END and FIREFOX_BETA_150_END
I'm using these tags as a rough release window, not as a precise Mythos boundary. That distinction matters. The stats below describe the Firefox 150 development interval broadly, and not a cleanly isolated set of Mythos-derived fixes. So they are useful for showing scale and patch distribution, but they should not be read as "these are the 271 Mythos vulnerabilities".
Commits: 6,115
Bug IDs: 3,209
High-Priority Candidates: 252
Bugs with (high) CVE: 301 (counting non-Mythos CVEs as well)
Commits with (high) CVE: 340
Changed lines: 3,438,679
Median lines / commit: 52
Mean lines / commit: 562.34
Largest patch: 480,735
Commits with crashtest: 47
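These figures come from a script over `git log`; a minimal, self-contained sketch of how the per-commit line stats can be derived (the `@%H` commit marker in the format string is just a convention chosen for this example):

```python
import statistics

def commit_line_counts(numstat_output: str) -> list[int]:
    """Parse `git log --numstat --format=@%H` output into per-commit
    changed-line totals (lines added + lines removed)."""
    counts: list[int] = []
    current, started = 0, False
    for line in numstat_output.splitlines():
        if line.startswith("@"):        # commit boundary marker from --format=@%H
            if started:
                counts.append(current)
            current, started = 0, True
        elif line.strip():              # numstat line: "added<TAB>removed<TAB>path"
            added, removed, _path = line.split("\t", 2)
            # Binary files report "-" instead of a number; count them as zero.
            current += int(added) if added.isdigit() else 0
            current += int(removed) if removed.isdigit() else 0
    if started:
        counts.append(current)
    return counts

def summarize(counts: list[int]) -> dict[str, float]:
    """Aggregate the per-commit totals into the headline statistics."""
    return {
        "commits": len(counts),
        "median": statistics.median(counts),
        "mean": round(statistics.mean(counts), 2),
        "largest": max(counts),
    }
```

Feeding it the output of `git log FIREFOX_BETA_149_END..FIREFOX_BETA_150_END --numstat --format=@%H` produces figures of the kind shown above, modulo whatever filtering one applies on top.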
We can also notice that many commits associated with those bugs predate the Anthropic post by days or weeks, with an obvious spike on April 2. That is not surprising: advisory aggregation happens late, and some fixes that end up grouped under a release CVE were clearly authored earlier, for example on March 5.
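Spotting that spike is straightforward once you have the author dates; a small sketch, assuming ISO 8601 dates from `git log --format=%aI`:

```python
from collections import Counter
from datetime import datetime

def commits_per_day(author_dates: list[str]) -> Counter:
    """Bucket ISO 8601 author dates (as printed by `git log --format=%aI`)
    by calendar day."""
    return Counter(
        datetime.fromisoformat(d).date().isoformat() for d in author_dates
    )

def spikes(per_day: Counter, top: int = 3) -> list[tuple[str, int]]:
    """Return the busiest days; advisory-driven landings tend to cluster there."""
    return per_day.most_common(top)
```

Plotting the full histogram over the release window is what makes the April 2 cluster, and the earlier stragglers, visible at a glance.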
Are these "real vulnerabilities"?
This depends on the standard you care about.
If you are a defender, the answer is straightforward: yes, broadly speaking, many of these fixes matter. Memory-safety issues, lifetime mistakes, race conditions, incorrect ownership, and serialization problems are exactly the kinds of patterns that defenders want removed before an attacker gets to them. Even when a bug is not independently exploitable, it can still reduce safety margins or become useful when combined with another issue; think, for example, of a relative or arbitrary read primitive.
If you are thinking like an attacker, the bar is higher. A bug is only truly interesting if it buys leverage: control of memory, type confusion, privilege boundary crossing, sandbox escape, or something else that materially advances exploitation. By that standard, a lot of the published fixes look more like hardening and bug debt reduction than obvious exploit gold.
That is not a criticism. Hardening is good. But it is not the same thing as proving that a model is now outperforming top offensive researchers at finding high-value browser chains.
This brings me to the context of a vulnerability. For a defender, a vulnerability is a vulnerability regardless of its exploitability context. In browsers, some attack surfaces are hidden behind additional user interactions, very specific setups, runtime options, and so on, which makes a vulnerability there too unreliable to weaponize. As an attacker, you would typically never spend effort on such a surface.
What stands out in the patch set
A quick pass through the linked fixes shows several recurring categories:
reference lifetime fixes,
ownership and cleanup corrections,
race-condition and async teardown fixes,
bounds checks and integer handling,
safer serialization and IPC handling,
upstream library updates and vendor syncs.
Some of those are exactly where dangerous bugs come from. Others are better understood as preventative maintenance that happened to be triggered by large-scale automated review.
This is why one issue such as 2014596 for CVE-2026-6746 stands out more than the giant aggregate CVE buckets. A concrete use-after-free is easy to reason about as a potentially exploitable security issue. A long list of "memory safety bugs fixed" is directionally important, but analytically much weaker unless you inspect the individual bugs.
What Mythos seems good at
The strongest charitable reading of the Firefox 150 data is this:
Mythos appears to be very good at surfacing suspicious patterns at scale.
That is already valuable. A model that can find cleanup bugs, lifetime hazards, API misuse, unsafe assumptions, and latent memory-safety issues across a codebase the size of Firefox is useful even if only a fraction of those findings are directly exploitable. For a defensive team, that can translate into faster hardening, broader code review coverage, and less time wasted on manual triage. Publicly, that is the part that looks well supported.
This is probably the most important practical outcome. Security teams do not need a model to independently invent a full exploit chain for it to have significant value.
However, its value compared to other LLMs is not clear. If you have tried running any model at finding bugs in a codebase, or even written your own agents, you are most likely confident that it would flag most of the patterns Mythos found. Take Google's Big Sleep, for instance: there is a chance it has already been far more relevant than Mythos, and there have been no such dramatic announcements around it.
What remains unproven
The offensive claim is much harder to support.
From the public evidence, we still do not know how many tokens, runs, and analyst-hours were required, how much human filtering was needed, how many findings were duplicates or low-value crashes, how Mythos compares to other strong models on the same targets, and how many of the fixed bugs would have materially mattered in a real exploit-development context.
I suspect Mozilla did not spend time proving exploitability, and Mythos did not provide PoCs for them (although some commits include crashtests). Without knowing how many of the fixed bugs were actually exploitable, it is hard to call this a security revolution rather than a successful large-scale bug-mining campaign.
The distinction is important because browser security is not measured by the number of bug fixes; it is measured by whether attackers lose meaningful capabilities. And that is not yet obvious here.
Defender relevance vs attacker relevance
This is where I currently land.
For defenders, Mythos looks relevant right now. Even if many of the findings are "just" stability issues, suspicious cleanup bugs, or latent memory-safety hazards, removing them improves the codebase and reduces future opportunity for attackers. However, I doubt it would produce similar results on what I think are more robust codebases, and I am eager to find out whether Mythos does as well on Apple Safari or Google Chrome. If I had to bet, I would say it won't :)
For attackers, the story is less convincing. Nothing in Mozilla's disclosure alone proves that Mythos has suddenly erased the usual offensive edge. If anything, the public evidence suggests that AI is currently easier to defend as broad hardening support than as proof of singular, decisive exploit discovery.
That is also why I would treat public attacker claims separately from Mozilla's numbers. For example, one team publicly stated that their RCE and sandbox escape chain was still alive after the release. That is not strong evidence by itself, but it is a useful reminder that "many fixes landed" is not the same statement as "the offensive problem is solved".
That may change. But this Firefox release does not prove it has changed already.
Conclusion
The Firefox 150 data suggests a tool that is genuinely useful for defensive security work, especially at scale, but the public record does not justify the strongest claims people want to make from it. The headline number is impressive, yet it bundles together bugs of very different significance and does not publicly resolve into a clean accounting.
So my current view is simple:
as a defensive assistant, Mythos looks credible;
as evidence of a dramatic offensive breakthrough, the Firefox case is still weak;
and as usual with AI security announcements, the most interesting part is hidden in the operational details we do not get to see.
Stay safe out there, read between the lines, beware of the hype posts, and don't fall for the narrative they want to push.
Appendix
Appendix A: Playing the game of "is it exploitable"
I gave it a try myself, deciding whether the bugs were relevant. It is a good exercise when you want to learn about an attack surface. Take my comments with mountains of salt.
```cpp
[...]
 nsSSLIOLayerHelpers::~nsSSLIOLayerHelpers() {
-  Preferences::RemoveObserver(this, "security.tls.version.fallback-limit");
-  Preferences::RemoveObserver(this, "security.tls.insecure_fallback_hosts");
+  // Pref observers must have been removed before destruction, since the
+  // destructor may run off the main thread.
+  MOZ_ASSERT(!mRegisteredPrefObservers,
+             "Pref observers should have been removed before destruction");
 }
[...]
```
This change suggests that the nsSSLIOLayerHelpers object may be deleted on a separate thread, while nsSSLIOLayerHelpers::GlobalCleanup() is meant to run on the main thread. If true, this possibly leads to a thread-affinity bug, as Preferences::RemoveObserver is meant to run on the main thread as well.
From afar, this sounds possibly exploitable, but hard to tell without inspecting the actual thread activity that happens with the Preferences class, raceability window, etc.
It seems GlobalCleanup is only called when Firefox is shut down, in which case we would have to keep a TLS socket open while the user closes the browser, making it obviously not weaponizable.
Exploitability: Maybe
Context: Unrealistic
```diff
diff --git a/dom/media/webrtc/MediaEngineFake.cpp b/dom/media/webrtc/MediaEngineFake.cpp
index f59c37f0587aa..8123aa05e55a0 100644
--- a/dom/media/webrtc/MediaEngineFake.cpp
+++ b/dom/media/webrtc/MediaEngineFake.cpp
@@ -151,7 +151,6 @@ MediaEngineFakeVideoSource::CreateFrom(
     const MediaEngineFakeVideoSource* aSource) {
   auto src = MakeRefPtr<MediaEngineFakeVideoSource>();
   *static_cast<MediaTrackSettings*>(src->mSettings) = *aSource->mSettings;
-  src->mOpts = aSource->mOpts;
   return src.forget();
 }
```
The MediaEngineFakeVideoSource is a fake video source that can be used in WebRTC tests when one does not have an actual camera to plug into the source feed. The patch is very simple: when cloning the video source, it no longer copies the original source's options, which are per-instance stale data. The copy is not useful because the video source's options are meant to be initialized later, during allocation. However, if the source were used without going through the Allocate path afterwards, this could be a problem. I consider aSource not freed, since the copy of mSettings was kept, so it seems mOpts would only contain options previously allocated through a normal path.
I feel like if this leads to memory corruption, it would easily be found by fuzzing the MediaDevices API.
Exploitability: Low
Context: Realistic
This commit adds null checks before calling methods on a pointer.
Exploitability: None
Context: N/A
```diff
diff --git a/js/src/wasm/WasmIonCompile.cpp b/js/src/wasm/WasmIonCompile.cpp
index 0e0df7c0eef77..19d78084d452d 100644
--- a/js/src/wasm/WasmIonCompile.cpp
+++ b/js/src/wasm/WasmIonCompile.cpp
@@ -5544,7 +5544,8 @@ class FunctionCompiler {
     MInstruction* dstData = MWasmLoadField::New(
         alloc(), dstArrayObject, nullptr, WasmArrayObject::offsetOfData(),
         mozilla::Nothing(), MIRType::WasmArrayData, MWideningOp::None,
-        AliasSet::Load(AliasSet::WasmArrayDataPointer));
+        AliasSet::Load(AliasSet::WasmArrayDataPointer),
+        mozilla::Some(trapSiteDesc()));
     if (!dstData) {
       return false;
     }
@@ -5553,7 +5554,8 @@ class FunctionCompiler {
     MInstruction* srcData = MWasmLoadField::New(
         alloc(), srcArrayObject, nullptr, WasmArrayObject::offsetOfData(),
         mozilla::Nothing(), MIRType::WasmArrayData, MWideningOp::None,
-        AliasSet::Load(AliasSet::WasmArrayDataPointer));
+        AliasSet::Load(AliasSet::WasmArrayDataPointer),
+        mozilla::Some(trapSiteDesc()));
     if (!srcData) {
       return false;
     }
```
I don't know enough about SpiderMonkey's JIT to tell if that would be exploitable or not.
Before this commit, the two loads were created as plain movable field loads. In MWasmLoadField, that means "no trap metadata" and the instruction is treated as movable; with trap metadata present, it becomes a guard instead (js/src/jit/MIR-wasm.h:2754). That matters because wasm field loads from object pointers can fault on null, and the backend uses maybeTrap() to attach the correct wasm trap site to the emitted faulting instruction (js/src/jit/Lowering.cpp:8731, js/src/jit/CodeGenerator.cpp:10684).
I'll let you decide.
Exploitability: ?
Context: Realistic
Appendix B: Complete commit list where bugs are associated to a CVE, sorted per component