A quick update on using Claude Fable 5 for research
Building on the last post on Anthropic's new Mythos-class model, I wanted to push Fable 5 to its limits with the research goal I set it. This is what I got.
I guess it was inevitable: having tried out Anthropic’s new “Mythos-class” Fable 5 AI model with a single-shot research prompt while on the way home from vacation a couple of days ago, I couldn’t resist the temptation to see what it could do given greater latitude, and a lot more tokens!
This is what happened.
My previous post set out to see how far Fable would get with a simple one-shot prompt that asked it to research and write a paper about the impact of Mythos-class AI models on higher education. The intent was not to generate ground-breaking new knowledge (one-shot prompts are definitely not the way to go here), but rather to see just how far Fable could get on its own.
You can read the resulting paper here. It’s superficially very good. The hypotheses are worthy of consideration, the reasoning is not bad, the research and the paper’s coherence and through-line are very good, the citations are not hallucinated, and many of the arguments made leverage prior work reasonably well.
But it is still flawed. The research tends to rely on secondary sources rather than original papers. There are questions around the validity of the insights drawn from published work in some cases. It’s not clear how original and insightful the research truly is. And Fable’s writing, while an improvement on previous models, still errs on the side of homogenized style over substance.
There was enough here though to make me wonder what would happen if I passed the task over to Fable 5 in Claude Code, where it could fully lean into its capabilities.
And so I set up a new project in Claude Code using Fable 5 in “ultracode” mode, passed on the original prompt and paper (together with a couple of critical and detailed reviews from separate sessions using Fable, as well as my own feedback), included a folder of documents cited in the original paper along with a couple of additions of my own, and asked it to “write a rigorously researched and defensively argued preprint” based on the supplied material, without further direction from me.
Watching Fable work based on what I asked of it in Claude Code was fascinating. The model devised a plan of action that would put many PhD students to shame. It developed tentative hypotheses then tested and modified them. It spawned tens of sub-agents to do research, test claims, write and rewrite drafts, and a lot more. It downloaded and audited primary sources as necessary. And it folded in layer after layer of checks and balances to ensure what it was doing was grounded in what is known and what is defensible.
This, I must admit, all came at a cost. I was using Anthropic’s $200 a month Max plan and, ironically, maxed it out, and had to purchase more credits to complete the project. (Fable even let me know I needed to do this!) According to Fable’s own audit, the project used over 80 agents, called on the use of 2,000 tools, verified over 120 primary sources (including downloading and auditing them), ran for nearly 15 hours on the task-clock, and consumed over 8 million tokens.
And, according to the audit, nearly half of the token usage was spent on verification. This was Fable’s choice: I set the goal and then let it run, but didn’t specify how it should meet that goal.
The “preprint” that Fable produced was certainly interesting. But in a good way.
In being given autonomy to pursue what I’d asked of it, Fable interpreted its goal as both researching the original question (using the first paper as a starting point) and reflecting on its own role in the process. As a result, the preprint it produced was deeply reflective, with meta-layers lying above the academic substance.
This is not what I expected. Fable produced a dense, 53-page document that explicitly refers to the first paper and prompt, and that documents in depth its own processes and decisions. It is also academically quite rigorous, although this took some time to ascertain given the density and terseness of the writing. At this point I had not provided any instructions on writing style, and so this is “raw Fable” and not particularly friendly to human readers.
This preprint (it’s really an unvarnished research report) underwent one iteration (ending up as version 3, with the original one-shot prompt preprint being version 1), after I paid for additional tokens to complete a couple of failed tasks and provided a handful of PDFs that Fable couldn’t access directly. Other than this, I was completely hands-off on its production.
This “preprint” is not the end of this story, as I was looking for something more polished and less self-referential. But I’m including it here as it provides an instructive insight into the rigor behind Fable’s work. It’s hard going reading it, but worth it if you want to dig under the hood of what Fable did, and why:
(Note that it refers to a number of associated Fable-generated files that aren’t included here, but that I have access to, and that are in themselves instructive as to just how rigorous the process was.)
Building on this, I asked Fable to write a stand-alone paper that was true to the original one-shot prompt, that didn’t include the self-referential meta-layer, and that retained the full rigor of the preprint.
And here I did intervene, just a little, because the writing style of the first iteration was awful. Technically, the content was solid. But it took so much effort and energy to digest that reading it was painful.
And so over two further iterations I asked Fable to rewrite and re-format the paper in a way that would make it easier for human readers to digest — using its own discretion in how it interpreted this (although I did provide some guidance on what a reader like me finds palatable versus hard going). Through these I was very explicit about not losing any of the academic rigor, and the need to check this.
This was the result:
And here I have a bit of a problem, but not one you might expect.
Even on a quick read, this paper is substantially better than the one produced from the one-shot prompt. By a long way.
After the “legibility” rewrite (and to be clear, this still reads like an AI paper, although in this case the substance is more important than the style), the hypotheses it puts forward, the claims it makes, the reasoning it backs them up with, the evidence it presents, and the coherence and reasoning underlying everything all stand up to considerable scrutiny.
But because of the quality and extent of the “intellectual labor” represented by the paper, it takes a substantial level of human expertise and intellectual labor to evaluate it.
This is not a paper that can be read and evaluated in a few minutes, or one that can be substantially assessed by someone without considerable knowledge that spans multiple fields. And here, I even find myself reaching my own limits in assessing its rigor and validity, and the value of the insights it offers. And my expertise intersects pretty closely with the work.
If I’m being honest, I would need to spend days with this — probably more — before I was sure in myself where the value of the work lies.
On one hand, this is what I would expect from substantial intellectual work written by accomplished human researchers. On the other though, it does highlight substantial questions around how AI-generated research is evaluated when increasingly few humans have the expertise or intellectual capacity to fully understand and assess it.
It also suggests that any quick responses to AI-generated work like this are either coming from genius-class humans, are themselves the product of AI, or are not based on knowledgeable assessment.
Where this leaves us, I’m not sure — especially as this one example is almost definitely a relatively poor reflection of emerging capabilities. But it does suggest — just as the paper itself does — that agentic frontier systems like Fable 5 will increasingly challenge how we navigate the intersection of AI, expertise, and knowledge generation.
Not because AI is in some way “smarter” or “better” than us. But because it is getting so good at emulating the processes through which new knowledge is constructed and tested — while doing this at speed and with access to prior knowledge at a scale and depth that far transcends human capabilities — that mere humans are going to find it increasingly hard to keep up.
Postscript
This was very much written to stimulate informed conversation. Frontier AI models and systems are still very much a moving target, and there’s a serious risk — as many commentators have noted — of falling for the illusion that these models are more capable than they actually are. This is an inherent risk with a technology which has a mastery of language that is potentially capable of slipping by our critical reasoning and persuading us of things that don’t hold up to scrutiny.
And yet, it would be foolish to discount emerging capabilities around autonomous AI research and knowledge generation. Just as it would be foolish to ignore the consequences of these capabilities for the roles of learning and education (especially higher education) in a world built on the assumption that intelligence, expertise, and new knowledge are valuable because they are scarce.
The paper Fable wrote addresses this directly. And while this post is primarily about the process, I would strongly encourage anyone who takes the future of higher education seriously to read it.
Of course, the paper may be little more than smoke and mirrors, which is where the conversation it spawns is so important — as long as it is based on informed expertise and reason and not assumption.
But my sense is that we are seeing the emergence of capabilities that have the capacity to both challenge and extend how we think about knowledge, research, learning, and value-creation — and ultimately, what it means to thrive as humans in an age of AI.
If true, this should be an absolute top priority for any university that takes student success and the future of human flourishing seriously — and certainly far more seriously than incessant conversations around pre-2023 level AI capabilities.
Especially as we are potentially at the edge of a precipice where AI systems are capable of generating new knowledge and insights faster than we are currently capable of validating and even understanding them — or their consequences.




