SubQ and 12-Million-Token Context Windows Might Break AI Wide Open
06/05/2026
SubQ feels like one of the first real glimpses into what happens when AI stops forgetting everything every five minutes.
I think most people still fundamentally misunderstand what makes AI useful.
Everyone keeps obsessing over benchmarks, reasoning scores, IQ tests, coding tests, "this model beats PhD students", whatever. Which is cool I guess. But honestly the thing that keeps slapping me in the face lately is memory. Context. Continuity. That's the actual magic. Or at least the beginning of it.
SubQ announcing 12-million-token context windows feels absolutely ridiculous when you stop and think about it for more than thirty seconds. Not "haha marketing ridiculous", I mean paradigm shift ridiculous. Like one of those moments where something quietly changes and everyone shrugs because the numbers sound abstract and nerdy, then three years later you realise that was the moment the entire direction shifted.
Because current AI still kinda has goldfish brain.
Not entirely, obviously. It's insanely useful already. But even with ChatGPT, Claude, Gemini, Codex, all of them eventually hit the same wall where the model starts forgetting things, collapsing context, making weird assumptions because earlier information fell out of memory. Which then creates this whole industry of hacks trying to compensate for that weakness. RAG pipelines, embeddings, vector databases, chunking strategies, memory systems, summarisers summarising summarisers. Half of AI engineering right now honestly feels like desperately trying to stop the model from forgetting what the fuck it's doing.

And now suddenly we're talking about 12 million tokens.
That is an absolutely stupid amount of context.
Like genuinely stupid.
People hear "tokens" and mentally disconnect because nobody thinks in tokens. But we're basically talking millions of words. Entire documentation systems. Whole codebases. Company history. Every meeting transcript. Support tickets. Research archives. Internal wikis. Design discussions. Slack channels full of engineers arguing about architecture decisions from two years ago. Entire books. Massive piles of structured context all sitting directly inside the active reasoning window of the model.
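To put a rough number on "millions of words", here's a back-of-envelope sketch. The 0.75 words-per-token ratio is a commonly quoted rule of thumb for English text, not a SubQ figure, and the 90,000-word book length is just an assumption for illustration.

```python
# Rough scale estimate. Both ratios below are assumptions, not SubQ figures.
CONTEXT_TOKENS = 12_000_000  # window size from the announcement
WORDS_PER_TOKEN = 0.75       # assumed rule of thumb for English text
WORDS_PER_BOOK = 90_000      # assumed length of a "typical" book


def estimate_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Convert a token count into an approximate English word count."""
    return round(tokens * words_per_token)


def estimate_books(tokens: int, words_per_book: int = WORDS_PER_BOOK) -> float:
    """Express a token count as a number of assumed-length books."""
    return estimate_words(tokens) / words_per_book


if __name__ == "__main__":
    print(f"~{estimate_words(CONTEXT_TOKENS):,} words")
    print(f"~{estimate_books(CONTEXT_TOKENS):.0f} books")
```

Under those assumptions that's somewhere around nine million words, or roughly a hundred books, in one window. Real ratios shift with language and content type, so treat it as an order-of-magnitude figure.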
Not retrieval either. That's the important part.
Actually there.
Actually usable while thinking.
That changes things in a way I don't think most people have processed yet.
I've already noticed this myself with AGENTS.md and SKILL.md workflows. The difference between a model with no context versus a model loaded up properly with references, documentation, workflows and project history is honestly absurd. Same exact model underneath. Same intelligence. Completely different output quality. It stops behaving like some chaotic junior developer who just drank six energy drinks and starts behaving more like somebody who's been inside the project for months.
That's why I keep saying context matters more than people think.
A lot more.
Because humans aren't really magical either when you boil it down. Most expertise is contextual memory combined with pattern recognition. A senior developer isn't usually just "better at coding". They're carrying years of context in their head. Old failures. Infrastructure limitations. Weird bugs. Political realities inside companies. Tradeoffs. Technical debt. Things that broke before. Systems that looked smart but turned into disasters six months later.

Context is expertise half the time.
Which is why this whole thing gets weird so fast once models stop forgetting everything every few thousand lines.
I don't even think the average person understands what coding is going to look like once these giant context windows become normal. Right now AI coding still feels slightly drunk sometimes. It writes good code, then forgets architecture decisions twenty prompts later. It fixes one thing and breaks another because the wider system fell out of context. It rewrites functions that already exist somewhere else in the repo because it lost track of them. Every developer using AI has seen this happen repeatedly.
Now imagine feeding the model: the entire monorepo, every API document, infrastructure configs, deployment history, bug reports, style guides, architecture discussions, previous tickets, security reviews, customer complaints and internal engineering notes all at once. Not searchable. Present.
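If you want a sanity check on whether your own "everything at once" pile would even fit, here's a minimal sketch. It assumes the rough heuristic of about four characters per token for English text and code, which is not SubQ's actual tokenizer, and the function names and file-suffix list are made up for illustration.

```python
from pathlib import Path

CONTEXT_BUDGET = 12_000_000  # window size from the announcement
CHARS_PER_TOKEN = 4          # assumed heuristic, not a real tokenizer

# Hypothetical set of text-like files worth loading; adjust for your repo.
TEXT_SUFFIXES = (".py", ".ts", ".md", ".txt", ".json", ".yaml", ".yml")


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string from its character length."""
    return len(text) // CHARS_PER_TOKEN


def repo_token_estimate(root: str, suffixes: tuple = TEXT_SUFFIXES) -> int:
    """Sum estimated tokens over every matching file under root."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            total += estimate_tokens(text)
    return total


def fits_in_context(tokens: int, budget: int = CONTEXT_BUDGET) -> bool:
    """True if the estimated token count fits inside the budget."""
    return tokens <= budget


if __name__ == "__main__":
    tokens = repo_token_estimate(".")
    print(f"~{tokens:,} tokens, fits: {fits_in_context(tokens)}")
```

Run it from a project root to get a ballpark. A real tokenizer will give different counts, so leave yourself headroom.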
That starts feeling less like autocomplete and more like another engineer sitting beside you who actually remembers the entire company.
Which honestly is both exciting as hell and slightly terrifying.
And weirdly enough I think this kills a huge amount of fake prompt-engineering nonsense too. Good. Honestly. I'm tired of seeing "YOU ARE A WORLD CLASS SEO NINJA WITH 45 YEARS EXPERIENCE" style prompting as if roleplaying with the AI suddenly unlocks hidden intelligence. Most of that stuff feels like astrology for tech bros. The thing that actually improves outputs consistently isn't pretending the AI is Gandalf, it's giving it useful context.
- Real examples.
- Real workflows.
- Real documentation.
- Real memory.
- Real project history.
That's why these giant context windows matter so much. They move AI away from prompt theatre and toward genuine knowledge systems.
The thing rattling around in my head lately is that I don't think future AI systems will feel stateless anymore. That's the actual psychological shift coming. Right now sessions still feel temporary. Disposable. You open ChatGPT, do some work, eventually the context degrades and you start over. Future systems probably won't work like that at all. They'll remember your projects, writing style, workflows, business structure, previous conversations, infrastructure setups, preferences, failures, all continuously over time.
Not because the AI is "alive" or any of that sci-fi nonsense. More because continuity creates capability.
A model that remembers the last thousand hours of work is going to be far more useful than one that starts fresh every session. Humans work like this too. Imagine if your employees developed amnesia every afternoon. That's basically current AI systems.

And honestly I don't think businesses are remotely prepared for what happens once this limitation mostly disappears.
Because once memory becomes effectively infinite for practical workflows, the bottleneck shifts somewhere else entirely. Then it becomes about orchestration. Systems. Knowledge organisation. Workflow design. Persistent tooling. Which is probably why I've become obsessed lately with self-improving AGENTS.md systems, specialist workflows, persistent context layers and all this weird agentic infrastructure stuff. It feels less like prompting and more like building digital organisations.
SubQ feels like fuel dumped directly onto that fire.
The really crazy part is that everyone still evaluates AI improvements linearly. Slightly smarter model. Slightly bigger context. Slightly better reasoning. But these things multiply together. They don't add together. Better reasoning combined with massive memory combined with tools combined with persistence combined with workflow automation gets weird incredibly fast.
Like... really fast.
A mediocre model with perfect memory can outperform a smarter model with terrible memory in a shocking number of real-world situations. Add strong reasoning on top of giant context windows and suddenly you're getting systems that start looking uncomfortably capable.
Not AGI. I still think people throw that term around way too casually. But definitely something much bigger than chatbots.
And I genuinely think five years from now we'll look back at today's tiny context windows the same way we look back at dial-up internet now. Technically impressive for the time, but unbelievably primitive compared to what came after.