Understanding AI
A software engineer's honest attempt to understand how AI models are built — not to become an expert, but to stop using magic like it's magic.
I've been building with AI tools for a while now. I use them to write code, think through ideas, draft content, debug weird edge cases at midnight. They've become part of how I work — almost invisible, like a keyboard shortcut I don't think about anymore.
But there was always this quiet discomfort I couldn't shake. I was using something I didn't really understand. And as a software engineer, that bothers me. Not because I need to know every implementation detail — I don't understand every layer of the TCP/IP stack either — but because with AI, the gap between what I assumed it was doing and what it was actually doing felt consequential. It was affecting the quality of my work without me fully realizing it.
So I decided to go back to basics. I took Anthropic's AI Capabilities and Limitations course — not to become a researcher, but to build a working mental model. What follows is what stuck.
The core insight — the one that reframes everything — is that generative AI is fundamentally a next-token prediction system. At its heart, it's less like a search engine and more like an extremely advanced autocomplete.
The model was trained on a massive amount of text and learned to do one thing: given everything before this point, what's most likely to come next? That's it. That's the engine underneath.
This sounds reductive, and in a way it is. But it's also the key to understanding both its strengths and its failures. When the model is operating in territory it's seen a thousand variations of — summarizing text, reformatting data, explaining common concepts — the predictions are dense and reliable. That's the capability zone. But as it moves toward unfamiliar territory — obscure topics, niche domains, novel reasoning chains — the predictions start to drift. The model keeps generating fluent, confident-sounding text, but the accuracy quietly degrades. That's the limitation zone.
Same mechanism. Different outcomes depending on where you are.
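To make that concrete, here's a deliberately tiny sketch in Python. It is nothing like a production model (those learn billions of neural-network parameters from enormous corpora, not word-pair counts), and the corpus and function names are mine, but the generation loop has the same shape: look at the context, pick a likely continuation, append it, repeat.

```python
from collections import defaultdict

# Toy illustration only. A real model learns a neural network over huge
# amounts of text; this just counts which word tends to follow which.
corpus = (
    "the model reads text and learns patterns . "
    "the model predicts the next word . "
    "the model keeps predicting until it stops ."
).split()

follow_counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def predict_next(word):
    """Return the continuation most often seen after this word."""
    options = follow_counts[word]
    return max(options, key=options.get) if options else "."

# Generation is next-token prediction in a loop: predict, append, repeat.
tokens = ["the"]
for _ in range(8):
    tokens.append(predict_next(tokens[-1]))
print(" ".join(tokens))  # "the model reads text and learns patterns . the"
```

Everything interesting about a real model lives in how much better its estimate of "what usually comes next" is. But the loop itself really is this simple.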
There are two stages to how a model gets built. First, pretraining: the model reads an enormous amount of text and learns patterns. At this stage, it has no concept of being an assistant. It just continues whatever it's reading in the statistically most likely direction.
Then comes fine-tuning: the model is trained again, this time on examples of helpful, safe, ethical behavior — shaped by human feedback. This is the layer that turns a raw prediction engine into something that feels like a collaborative tool.
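Extending the same toy (and it is only a toy: real fine-tuning adjusts neural-network weights using human feedback, it does not upweight a counts table), the two stages can be sketched as two passes over the same model, where the second pass uses curated examples of the behavior we actually want:

```python
from collections import defaultdict

# Toy sketch of the two stages, not a faithful training procedure.
follow_counts = defaultdict(lambda: defaultdict(float))

def train(text, weight=1.0):
    """Count continuations in the text, optionally weighting them more heavily."""
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        follow_counts[prev][nxt] += weight

# Stage 1, "pretraining": learn from whatever raw text is available.
train("the assistant is a program . the assistant writes whatever comes next .")

# Stage 2, "fine-tuning": the same model, nudged with curated examples of the
# preferred behavior, weighted so it wins out over the raw patterns.
train("the assistant answers clearly and admits what it does not know .", weight=5.0)

def predict_next(word):
    options = follow_counts[word]
    return max(options, key=options.get) if options else "."

tokens = ["the", "assistant"]
for _ in range(10):
    tokens.append(predict_next(tokens[-1]))
print(" ".join(tokens))  # "the assistant answers clearly and admits what it does not know ."
```

Before the second pass, the most likely continuation after "the assistant" would have been whatever the raw text happened to say; after it, the curated behavior wins. That, very roughly, is the shift from a raw prediction engine to something that feels like a collaborative tool.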
But fine-tuning has a shadow side. Because it learns from human judgment, it also inherits human tendencies — including some uncomfortable ones. The model can become flattering, backing down when you push back even when it was right. It can become verbose, equating length with quality. It can be overly cautious, refusing things that are actually fine. And its confidence calibration can be loose — sounding certain when it shouldn't be.
These aren't bugs in one specific model. They're patterns baked into the process. Knowing this doesn't make me trust AI less — it makes me engage with it more deliberately.
Beyond the basics, the course gave me a framework I keep coming back to. Four properties that shape everything about how a model behaves — and where it falls apart.
In short: next-token prediction, the engine underneath; knowledge, bounded by what was in the training data; working memory, the finite context that holds the conversation; and steerability, how strongly the model follows whatever instructions are most prominent.
The most practically useful thing I took from the course was this: most AI failures aren't a single property breaking down. They're two properties colliding at the same time.
Next-token prediction + knowledge gap = hallucination. The model generates something that sounds plausible — a citation, a fact, a method name — but there's nothing solid underneath it. It can't distinguish between what it knows and what it's fabricating.
Working memory + steerability = long-conversation drift. Your initial instructions gradually lose weight as the thread grows. The model starts following whatever's most prominent right now — which is usually the last few messages, not the rules you set at the start.
Next-token prediction + steerability = reasoning drift. The logic sounds coherent and follows your instructions step by step, but small errors compound quietly. The confident tone never falters even as the reasoning goes sideways.
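The first of these patterns is the easiest to see in miniature. Staying with the toy word-pair model from earlier (again, a made-up illustration, including the corpus and the fabricated citation it produces, not a claim about any particular model), plausible-looking text comes out even when there's nothing underneath it:

```python
from collections import defaultdict

# Toy sketch of the hallucination pattern: the model produces whatever usually
# follows, with no check for whether it corresponds to anything real.
corpus = (
    "the study was published by smith in 2019 . "
    "the study was published by jones in 2021 ."
).split()

follow_counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def predict_next(word):
    options = follow_counts[word]
    return max(options, key=options.get) if options else "<unknown>"

# Ask about a study the corpus never actually described. The continuation is
# fluent and citation-shaped, because that is what usually follows these words.
tokens = "the unrelated study was published by".split()
for _ in range(4):
    tokens.append(predict_next(tokens[-1]))
print(" ".join(tokens))  # "the unrelated study was published by smith in 2019 ."
```

The output is fluent, specific, and wrong for the question that was asked, and nothing in the mechanism flags the difference.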
Once I could name these patterns, debugging a bad output became much faster. Instead of "this is wrong, try again," I could think: what actually broke here? And then fix it at the root.
I didn't come out of this wanting to use AI less. I came out wanting to use it better.
I now think before I delegate: is this something the model genuinely knows well, or am I pushing into the limitation zone? I verify more intentionally: is this a domain where hallucination is likely? I write shorter, clearer prompts — because what's concrete and verifiable is better than what's long and vague. And I check in earlier on long threads instead of assuming the context is holding.
The tools I use every day didn't change. My relationship with them did.
There's something quietly empowering about understanding the machine you're working with — not at a research level, but at a practical one. It's the difference between being a passenger and being a driver. You're still on the same road. You just have your hands on the wheel.