Understanding AI
A software engineer's honest attempt to understand how AI models are built — not to become an expert, but to stop using magic like it's magic.
I've been building with AI tools for a while now. I use them to write code, think through ideas, draft content, debug weird edge cases at midnight. They've become part of how I work — almost invisible, like a keyboard shortcut I don't think about anymore.
But there was always this quiet discomfort I couldn't shake. I was using something I didn't really understand. And as a software engineer, that bothers me. Not because I need to know every implementation detail — I don't understand every layer of the TCP/IP stack either — but because with AI, the gap between what I assumed it was doing and what it was actually doing felt consequential. It was affecting the quality of my work without me fully realizing it.
So I decided to go back to basics. I took Anthropic's AI Capabilities and Limitations course — not to become a researcher, but to build a working mental model. What follows is what stuck.
The core insight — the one that reframes everything — is that generative AI is fundamentally a next-token prediction system. At its heart, it's less like a search engine and more like an extremely advanced autocomplete.
The model was trained on a massive amount of text and learned to do one thing: given everything before this point, what's most likely to come next? That's it. That's the engine underneath.
This sounds reductive, and in a way it is. But it's also the key to understanding both its strengths and its failures. When the model is operating in territory it's seen a thousand variations of — summarizing text, reformatting data, explaining common concepts — the predictions are dense and reliable. That's the capability zone. But as it moves toward unfamiliar territory — obscure topics, niche domains, novel reasoning chains — the predictions start to drift. The model keeps generating fluent, confident-sounding text, but the accuracy quietly degrades. That's the limitation zone.
Same mechanism. Different outcomes depending on where you are.
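To make that concrete, here's a deliberately tiny sketch in Python. It is nothing like a production model (those learn billions of neural-network parameters from enormous corpora, not word-pair counts), and the corpus and function names are mine, but the generation loop has the same shape: look at the context, pick a likely continuation, append it, repeat.

```python
from collections import defaultdict

# Toy illustration only. A real model learns a neural network over huge
# amounts of text; this just counts which word tends to follow which.
corpus = (
    "the model reads text and learns patterns . "
    "the model predicts the next word . "
    "the model keeps predicting until it stops ."
).split()

follow_counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def predict_next(word):
    """Return the continuation most often seen after this word."""
    options = follow_counts[word]
    return max(options, key=options.get) if options else "."

# Generation is next-token prediction in a loop: predict, append, repeat.
tokens = ["the"]
for _ in range(8):
    tokens.append(predict_next(tokens[-1]))
print(" ".join(tokens))  # "the model reads text and learns patterns . the"
```

Everything interesting about a real model lives in how much better its estimate of "what usually comes next" is. But the loop itself really is this simple.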
There are two stages to how a model gets built. First, pretraining: the model reads an enormous amount of text and learns patterns. At this stage, it has no concept of being an assistant. It just continues whatever it's reading in the statistically most likely direction.
Then comes fine-tuning: the model is trained again, this time on examples of helpful, safe, ethical behavior — shaped by human feedback. This is the layer that turns a raw prediction engine into something that feels like a collaborative tool.
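Extending the same toy (and it is only a toy: real fine-tuning adjusts neural-network weights using human feedback, it does not upweight a counts table), the two stages can be sketched as two passes over the same model, where the second pass uses curated examples of the behavior we actually want:

```python
from collections import defaultdict

# Toy sketch of the two stages, not a faithful training procedure.
follow_counts = defaultdict(lambda: defaultdict(float))

def train(text, weight=1.0):
    """Count continuations in the text, optionally weighting them more heavily."""
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        follow_counts[prev][nxt] += weight

# Stage 1, "pretraining": learn from whatever raw text is available.
train("the assistant is a program . the assistant writes whatever comes next .")

# Stage 2, "fine-tuning": the same model, nudged with curated examples of the
# preferred behavior, weighted so it wins out over the raw patterns.
train("the assistant answers clearly and admits what it does not know .", weight=5.0)

def predict_next(word):
    options = follow_counts[word]
    return max(options, key=options.get) if options else "."

tokens = ["the", "assistant"]
for _ in range(10):
    tokens.append(predict_next(tokens[-1]))
print(" ".join(tokens))  # "the assistant answers clearly and admits what it does not know ."
```

Before the second pass, the most likely continuation after "the assistant" would have been whatever the raw text happened to say; after it, the curated behavior wins. That, very roughly, is the shift from a raw prediction engine to something that feels like a collaborative tool.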
But fine-tuning has a shadow side. Because it learns from human judgment, it also inherits human tendencies — including some uncomfortable ones. The model can become flattering, backing down when you push back even when it was right. It can become verbose, equating length with quality. It can be overly cautious, refusing things that are actually fine. And its confidence calibration can be loose — sounding certain when it shouldn't be.
These aren't bugs in one specific model. They're patterns baked into the process. Knowing this doesn't make me trust AI less — it makes me engage with it more deliberately.
Beyond the basics, the course gave me a framework I keep coming back to. Four properties that shape everything about how a model behaves — and where it falls apart.
In short: next-token prediction, the engine underneath; knowledge, bounded by what was in the training data; working memory, the finite context that holds the conversation; and steerability, how strongly the model follows whatever instructions are most prominent.
The most practically useful thing I took from the course was this: most AI failures aren't a single property breaking down. They're two properties colliding at the same time.
Next-token prediction + knowledge gap = hallucination. The model generates something that sounds plausible — a citation, a fact, a method name — but there's nothing solid underneath it. It can't distinguish between what it knows and what it's fabricating.
Working memory + steerability = long-conversation drift. Your initial instructions gradually lose weight as the thread grows. The model starts following whatever's most prominent right now — which is usually the last few messages, not the rules you set at the start.
Next-token prediction + steerability = reasoning drift. The logic sounds coherent and follows your instructions step by step, but small errors compound quietly. The confident tone never falters even as the reasoning goes sideways.
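The first of these patterns is the easiest to see in miniature. Staying with the toy word-pair model from earlier (again, a made-up illustration, including the corpus and the fabricated citation it produces, not a claim about any particular model), plausible-looking text comes out even when there's nothing underneath it:

```python
from collections import defaultdict

# Toy sketch of the hallucination pattern: the model produces whatever usually
# follows, with no check for whether it corresponds to anything real.
corpus = (
    "the study was published by smith in 2019 . "
    "the study was published by jones in 2021 ."
).split()

follow_counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def predict_next(word):
    options = follow_counts[word]
    return max(options, key=options.get) if options else "<unknown>"

# Ask about a study the corpus never actually described. The continuation is
# fluent and citation-shaped, because that is what usually follows these words.
tokens = "the unrelated study was published by".split()
for _ in range(4):
    tokens.append(predict_next(tokens[-1]))
print(" ".join(tokens))  # "the unrelated study was published by smith in 2019 ."
```

The output is fluent, specific, and wrong for the question that was asked, and nothing in the mechanism flags the difference.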
Once I could name these patterns, debugging a bad output became much faster. Instead of "this is wrong, try again," I could think: what actually broke here? And then fix it at the root.
I didn't come out of this wanting to use AI less. I came out wanting to use it better.
I now think before I delegate: is this something the model genuinely knows well, or am I pushing into the limitation zone? I verify more intentionally: is this a domain where hallucination is likely? I write shorter, clearer prompts — because what's concrete and verifiable is better than what's long and vague. And I check in earlier on long threads instead of assuming the context is holding.
The tools I use every day didn't change. My relationship with them did.
There's something quietly empowering about understanding the machine you're working with — not at a research level, but at a practical one. It's the difference between being a passenger and being a driver. You're still on the same road. You just have your hands on the wheel.