Model
Context
At the agentic level, the model is the foundation everything else rests on. A model (specifically, a large language model or LLM) is the inference engine that powers agents, coding assistants, and every other agentic workflow. When you interact with an AI coding assistant, the model is the part that reads your prompt, processes it within a context window, and produces a response.
Understanding what a model is and isn’t helps you work with it effectively. A model isn’t a database, a search engine, or a compiler. At its foundation, it’s a neural network trained on vast amounts of text and code that has learned statistical patterns in language. But that undersells what modern models actually do. Frontier models decompose multi-step problems, plan solutions, self-correct when they notice errors, and generate working code for tasks they’ve never seen expressed in exactly that form. The “just predicts the next word” framing is like saying a chess engine “just evaluates board positions.” Technically accurate, practically misleading.
Problem
How do you develop an accurate mental model of the model itself, so you can anticipate its strengths and weaknesses when directing it?
People new to agentic coding often treat the model as either a magic oracle (it knows everything) or a simple autocomplete (it just predicts the next word). Both framings lead to poor results. The oracle framing leads to uncritical acceptance of output. The autocomplete framing leads to underusing the model’s genuine capabilities for reasoning, planning, and synthesis.
Forces
- Fluency makes model output sound authoritative regardless of correctness.
- Training data shapes what the model “knows,” but that knowledge has a cutoff date and reflects the biases and errors of its sources.
- Scale gives models broad competence across languages, frameworks, and domains, but depth varies.
- Stochasticity means the same prompt can produce different outputs on different runs; agent harnesses routinely set temperature to zero for deterministic tasks, which reduces (though does not fully eliminate) this variation.
- Capability spectrum means no single model is best at everything. Fast models, reasoning models, and specialized coding models each suit different tasks.
Solution
Think of the model as a highly capable but context-dependent collaborator. It has broad knowledge but no persistent memory across sessions (unless you provide memory mechanisms). It reasons well within its context window but can’t access information outside that window. It generates plausible output by default and correct output when given sufficient context and clear constraints.
Properties worth internalizing:
Models are stateless between calls. Each request starts fresh. The model doesn’t remember your last conversation unless previous context is explicitly included. This is why instruction files and memory patterns exist.
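The statelessness described above can be sketched in a few lines. This is illustrative, not a real SDK: `call_model` is a hypothetical stand-in for any chat-completion API, and the point is that the caller, not the model, carries the memory by resending the full history with every request.

```python
# Sketch of stateless chat: the model sees only what each request contains.
# call_model is a hypothetical stand-in for a real chat-completion API;
# here it just reports how much context it actually received.

def call_model(messages):
    return f"(model saw {len(messages)} messages)"

history = []

def chat(user_text):
    # To simulate "memory", the caller must resend the whole history
    # with every request -- the model itself retains nothing between calls.
    history.append({"role": "user", "content": user_text})
    reply = call_model(history)
    history.append({"role": "assistant", "content": reply})
    return reply

first = chat("Use tabs, not spaces.")
second = chat("Now format this file.")  # sees the earlier instruction only
                                        # because history was resent
```

Instruction files and memory patterns are, at bottom, automated versions of this resending step.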
Models have knowledge cutoffs. They were trained on data up to a specific date. They don’t know about libraries released last week or APIs that changed last month. In agentic settings, tools partially compensate: an agent with web search, file reading, and documentation retrieval can look up current information rather than relying on stale training data. But the model still can’t know what it doesn’t know, so providing current documentation for recent technologies remains good practice.
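The "provide current documentation" practice can be sketched as a small retrieval step before the model call. Everything here is a placeholder assumption: `fetch_docs` stands in for whatever tool the agent has (web search, file reading, a docs index), and `call_model` for the actual model API.

```python
# Sketch: prepend current documentation to the prompt so the model
# isn't relying on stale training data. fetch_docs and call_model are
# hypothetical placeholders for a retrieval tool and a model API.

def fetch_docs(library):
    # Stand-in for web search / file reading / documentation retrieval.
    return f"{library} v2 docs: create_client() replaced Client() in v2."

def call_model(prompt):
    return f"(model prompted with {len(prompt)} chars)"

def ask_with_current_docs(question, library):
    docs = fetch_docs(library)
    prompt = (
        "Use ONLY the documentation below; your training data may be "
        "out of date for this library.\n\n"
        f"--- DOCUMENTATION ---\n{docs}\n\n"
        f"--- TASK ---\n{question}"
    )
    return call_model(prompt)

ask_with_current_docs("Set up a client for the inventory service.", "acme-sdk")
```

The explicit "use ONLY the documentation below" framing matters: without it, the model may silently fall back on whatever older version of the library it saw in training.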
Models optimize for plausibility. When uncertain, a model produces the most likely-sounding response, not an admission of uncertainty. This is why AI smells exist and why verification loops matter.
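A verification loop makes this concrete: instead of trusting a fluent answer, run the candidate against a check and retry on failure. The sketch below is a toy under stated assumptions: `generate_code` is a fake model call whose first draft is deliberately plausible-but-wrong, mimicking a real revision cycle.

```python
# Sketch of a verification loop: never trust plausible output; check it.
# generate_code is a hypothetical model call. The first "draft" is
# confident but wrong; the retry is correct -- mimicking a revision cycle.

def generate_code(task, attempt):
    if attempt == 0:
        return "def add(a, b): return a - b"   # fluent, wrong
    return "def add(a, b): return a + b"

def generate_verified(task, max_attempts=3):
    for attempt in range(max_attempts):
        code = generate_code(task, attempt)
        scope = {}
        exec(code, scope)                 # run the candidate
        if scope["add"](2, 3) == 5:       # verify against a concrete case
            return code
    raise RuntimeError("no candidate passed verification")

generate_verified("implement add")
```

Real harnesses replace the single assertion with test suites, type checkers, or linters, but the shape is the same: generate, check, retry.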
Models respond to framing. The same question asked differently produces different quality responses. This is the entire basis of prompt engineering and context engineering.
Models process more than text. Frontier models accept images, audio, and video alongside text. For agentic coding, this means a model can examine screenshots of a broken UI, read diagrams and architecture sketches, or inspect visual test output. Multimodal input expands what you can communicate in a prompt beyond what words alone can express.
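A multimodal prompt typically arrives at the model as structured content rather than a flat string. The payload below mirrors the general shape of common chat APIs but is an illustrative assumption, not any vendor's exact schema.

```python
# Sketch of a multimodal message payload: text plus an image in one
# user turn. Field names are illustrative, not a specific vendor schema.
import base64

def build_multimodal_message(text, image_bytes):
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            # Binary data is usually base64-encoded for transport.
            {"type": "image",
             "data": base64.b64encode(image_bytes).decode("ascii")},
        ],
    }

msg = build_multimodal_message(
    "Why is this button misaligned?",
    b"\x89PNG...",  # stand-in for real screenshot bytes
)
```

The practical upshot: a screenshot of the broken UI and a one-line question often communicates more than a paragraph describing the layout in words.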
Models differ and the differences matter. Fast, inexpensive models handle boilerplate generation, summarization, and simple transformations well. Reasoning models with extended thinking excel at architecture decisions, complex debugging, and multi-step planning. Specialized coding models may outperform general-purpose models on targeted code generation tasks. Matching the model to the task is a practical skill. Using a reasoning model for string formatting wastes time and money; using a fast model for a tricky concurrency bug wastes attempts.
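Matching the model to the task can even be automated with a simple router. The model names and the classification heuristic below are illustrative assumptions, not real product identifiers; real harnesses often use richer signals (prompt length, file count, past failure rate).

```python
# Sketch of model-to-task matching: route by task kind instead of
# sending everything to one model. Names and the heuristic are
# illustrative assumptions, not real model identifiers.

def pick_model(task_kind):
    fast = "fast-cheap-model"        # boilerplate, summaries, transforms
    reasoning = "reasoning-model"    # architecture, tricky bugs, planning
    hard_kinds = {"debugging", "architecture", "planning"}
    return reasoning if task_kind in hard_kinds else fast

pick_model("summarization")   # routes to the fast model
pick_model("debugging")       # routes to the reasoning model
```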
How It Plays Out
A developer asks a model to implement a sorting algorithm. The model produces a clean, correct quicksort. Encouraged, the developer asks it to integrate with a proprietary internal API. The model produces confident-looking code that calls endpoints and uses data structures that don’t exist. It has no knowledge of this private API. The developer learns to provide API documentation in the context when asking for integration work.
A team uses a model to review a pull request. The model identifies a potential race condition that three human reviewers missed, because it systematically traced the concurrent access paths. The same model, in the same review, suggests a “best practice” that’s actually outdated advice from a deprecated framework. The team learns that model output requires verification even when parts of it are excellent.
A prompt that applies the lesson: “I need you to integrate with our internal inventory API. Here is the full API documentation — read it before generating any code, because you won’t have training data on this private system.”
Consequences
Understanding the model’s nature lets you work with it productively rather than fighting its limitations. You learn to provide the context it needs, verify the output it produces, and choose the right model for each task.
The cost is that you must maintain a dual awareness: appreciating the model’s capabilities while remaining skeptical of any individual output. This is a cognitive skill that takes practice to develop. Over time, it becomes second nature, similar to how experienced developers learn to trust a compiler’s output while distrusting their own assumptions.
Related Patterns
- Enables: Prompt – the prompt is how you communicate with the model.
- Enables: Context Window – the context window is the model’s working memory.
- Enables: Agent – an agent is a model placed in a loop with tools.
- Refined by: Harness (Agentic) – the harness makes the model practically usable.
- Uses: Smell (AI Smell) – understanding the model explains why AI smells occur.
- Extended by: Tool – tools let models overcome knowledge cutoffs by accessing live information.
Sources
- The concept of the large language model traces to Vaswani et al., “Attention Is All You Need” (2017), which introduced the transformer architecture underlying virtually all modern LLMs.
- Jason Wei et al., “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models” (2022), demonstrated that models can perform multi-step reasoning when prompted appropriately, challenging the “just predicts the next word” framing.
- OpenAI’s release of o1 (September 2024) marked the emergence of dedicated reasoning models that spend compute on extended thinking before responding, establishing the fast-vs-reasoning model distinction as a practical concern for practitioners.