> Anthropic researcher Julia Merz dismissed AISLE’s approach as: “We took the needle the model found, isolated the relevant handful of the haystack, and then gave it to a small child, who found the needle as well.”
(I'm the Founder and Chief Scientist of AISLE)
This is a reasonable counter-argument on its face, but as I argued in my original piece, the lower cost means you could run LLMs on literally **every single line of code**. That would be equivalent to splitting the haystack into small chunks and having a person search each chunk in parallel.
You could argue that if you did this, you'd get too many false positive candidate vulnerabilities. To that I'd say the following:
a) This is basically what Mythos already does, even according to Anthropic's own report (modulo a quick prioritization pass based on a heuristic estimate: https://red.anthropic.com/2026/mythos-preview/), where they literally say:
> instead of processing literally every file for each software project that we evaluate, we first ask Claude to rank how likely each file in the project is to have interesting bugs on a scale of 1 to 5
> We start Claude on the files most likely to have bugs and go down the list in order of priority.
People reacted to the announcement as if you told the LLM "go find bugs" and it just did it. In fact, a heavy scaffold is applied: the system goes through files one by one (or in parallel) and investigates each. That is much less mysterious, but much closer to how any such zero-day discovery production system must operate.
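The prioritization step Anthropic describes (rank every file 1 to 5, then work down the list) reduces to a simple triage loop. Here is a minimal sketch; `rate_file_risk` is a hypothetical stand-in for the model call, not Mythos's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def rate_file_risk(path: str, source: str) -> int:
    """Hypothetical stand-in for the LLM call that rates a file 1-5
    for how likely it is to contain interesting bugs."""
    # A real implementation would prompt a model; this crude keyword
    # heuristic exists purely so the sketch runs end to end.
    risky_hints = ("parse", "net", "ioctl", "copy")
    return 1 + sum(h in path or h in source for h in risky_hints)

def prioritize(files: dict[str, str]) -> list[str]:
    """Rank every file, highest risk score first (Mythos-style triage)."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        scores = dict(zip(files, pool.map(lambda p: rate_file_risk(p, files[p]), files)))
    return sorted(files, key=lambda p: scores[p], reverse=True)
```

The scanner would then start on `prioritize(project)[0]` and walk down the list, exactly as the quoted report describes.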
b) I actually went along and built a simple, extremely parallel version of a zero-day scanner that I call `nano-analyzer`. I open-sourced it here https://github.com/weareaisle/nano-analyzer for the benefit of the community. With it:
1) I was able to reliably re-discover Anthropic's Mythos FreeBSD finding, CVE-2026-4747, with models as small as 3.6B parameters: models so tiny that they can't even follow output formatting instructions well.
2) I discovered a number of previously unknown kernel bugs that are now being addressed by the maintainers, both in the open and via non-public responsible disclosure.
You can read on it here: https://aisle.com/blog/system-over-model-zero-day-discovery-at-the-jagged-frontier
TL;DR: In zero-day discovery, you can very much compensate for lower intelligence-per-token with sheer throughput and parallelism.
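The throughput-over-intelligence claim reduces to a simple pattern: split the code into chunks small enough for a tiny model's context window and fan the scanner out over all of them at once. A minimal sketch follows; `scan_chunk` is a hypothetical stand-in for the small-model call, not the actual nano-analyzer code:

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_LINES = 40  # small enough for a tiny model's context window

def chunk(source: str, size: int = CHUNK_LINES) -> list[str]:
    """Split a file into fixed-size line chunks so every line is covered."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]

def scan_chunk(code: str) -> list[str]:
    """Hypothetical stand-in for a small-model call that flags candidate
    vulnerabilities. A real scanner would prompt an LLM; this toy version
    just pattern-matches one classic unsafe call."""
    return [line.strip() for line in code.splitlines() if "strcpy(" in line]

def scan_project(files: dict[str, str], workers: int = 32) -> dict[str, list[str]]:
    """Fan the scanner out over every chunk of every file in parallel."""
    jobs = [(path, c) for path, src in files.items() for c in chunk(src)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda j: (j[0], scan_chunk(j[1])), jobs)
    findings: dict[str, list[str]] = {}
    for path, hits in results:
        findings.setdefault(path, []).extend(hits)
    return findings
```

A production system would then triage the (many) candidate findings with a second, stricter pass, which is where the false-positive concern raised above gets handled.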
The timing is what makes this hard to defend. You could construct a principled argument about vendor diversity or procurement risk, but Mythos drops the same week, and suddenly the ban isn't abstract anymore: it's a real capability gap at the exact moment that capability became relevant. The argument that OpenAI fills the space doesn't hold, because the risk was never "any AI vendor"; it's about being tied to one vendor's specific threat models and one vendor's specific research bets. Consolidating to a single provider doesn't reduce dependency, it just changes whose dependency it is.
Pete Hegseth will not admit he was wrong. But he might graciously allow Anthropic's return, hoping it slips by without drawing Trump’s attention.
Does anyone believe OpenAI's promises or guarantees? They're all liars. They get that from the top, their king, Trump. When he fails, I hope to God they do too.