Logging
Record what your software does as it runs, so you can understand its behavior after the fact.
Understand This First
- Observability – the capability that logging helps you achieve.
- Side Effect – logging is itself a side effect, and it records others.
Context
Your code runs. Something happens. Maybe the right thing, maybe the wrong thing. Either way, the moment passes and the state that produced the outcome is gone. You need a record.
This is a tactical practice that sits at the foundation of runtime understanding. Where Tests verify behavior before code ships, logging captures behavior while code runs. The two serve different questions: tests ask “does it work?” and logs ask “what did it do?”
Problem
Software doesn’t come with a flight recorder by default. When a function returns the wrong result, when a background job stops processing, when a user reports something that works on your machine but not on theirs, your first question is always the same: what happened? Without a record, you’re guessing. You reconstruct the state from memory, from reading code, from “I think it probably went down this path.” Guessing is slow, unreliable, and gets worse as systems grow.
How do you give yourself a reliable account of what your software did, without drowning in noise or leaking sensitive information?
Forces
- You need enough detail to diagnose problems, but too much output buries the signal.
- Log entries are useful only if they carry context: which request, which user, which step.
- Sensitive data (passwords, personal information, API keys) must never appear in logs.
- Logging has runtime cost: disk writes, network calls, CPU cycles spent formatting messages.
- Logs must be readable by both humans and machines. Free-form sentences are easy to write and hard to search.
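The sensitive-data force can be addressed mechanically rather than by reviewer vigilance alone. One approach, sketched below in Python, is a logging filter that redacts known-sensitive fields before any handler sees them. The field names in `SENSITIVE_KEYS` are illustrative assumptions, not a complete list:

```python
import logging

SENSITIVE_KEYS = {"password", "api_key", "ssn"}  # illustrative, not exhaustive

class RedactingFilter(logging.Filter):
    """Replace values of sensitive fields on a record before it is emitted."""
    def filter(self, record: logging.LogRecord) -> bool:
        for key in SENSITIVE_KEYS:
            if hasattr(record, key):
                setattr(record, key, "[REDACTED]")
        return True  # keep the record; only its sensitive fields change

logger = logging.getLogger("auth")
logger.addFilter(RedactingFilter())
```

A filter like this is a safety net, not a substitute for not logging secrets in the first place: it only catches fields attached under names it knows about.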
Solution
Instrument your code to emit structured records of significant events as they happen. Every record should answer three questions: what happened, when it happened, and in what context.
Structured logging means each entry is a set of named fields rather than a prose sentence. Instead of “User placed order successfully”, emit {event: "order_placed", user_id: 42, order_id: 789, total: 34.50, duration_ms: 230}. Structured entries are searchable, filterable, and parseable by automated systems.
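At its simplest, a structured entry is a dict of named fields serialized as one JSON line. A minimal sketch in Python, mirroring the order example above (the helper name `format_event` is ours, not a library API):

```python
import json
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("orders")

def format_event(event: str, **fields) -> str:
    """Render one structured log entry as a single JSON line."""
    return json.dumps({"event": event, "ts": time.time(), **fields})

# Instead of: log.info("User placed order successfully")
log.info(format_event("order_placed", user_id=42, order_id=789,
                      total=34.50, duration_ms=230))
```

One JSON object per line is the common convention because aggregation systems can parse each line independently, without multi-line state.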
Severity levels separate routine events from problems. The standard progression is DEBUG, INFO, WARN, ERROR, and FATAL. Use them consistently:
- DEBUG records details you’d want during development but not in production under normal conditions: variable values, branch decisions, cache hits.
- INFO records things worth knowing during normal operation: a request served, a job completed, a connection established.
- WARN records recoverable anomalies: a retry succeeded, a deprecated endpoint was called, a configuration fell back to a default.
- ERROR records failures that need attention: a request that couldn’t be fulfilled, a connection that dropped, a payment that was declined.
- FATAL records failures that stop the process: out of memory, missing required configuration, corrupted state.
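In Python's standard logging module, the same ladder appears as DEBUG, INFO, WARNING, ERROR, and CRITICAL (FATAL exists as an alias for CRITICAL). A quick sketch of the levels in use, with messages drawn from the examples above:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("payments")

log.debug("cache hit for rate table")            # suppressed at INFO level
log.info("connection established")               # routine operation
log.warning("retry succeeded after 2 attempts")  # recoverable anomaly
log.error("payment declined: card expired")      # failure needing attention
log.critical("missing required configuration")   # process cannot continue
```

Setting the level to INFO in production and DEBUG in development is the usual split: the DEBUG calls stay in the code but cost almost nothing when suppressed.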
Context propagation ties related log entries together. When a web request generates log entries across five functions and two services, each entry should carry the same request ID. When you investigate a problem, that ID lets you pull every log entry for that request in order, reconstructing the full story.
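Within a single Python service, one way to propagate a request ID without threading it through every function signature is a context variable plus a logging filter. A sketch under those assumptions (the variable and filter names are ours):

```python
import contextvars
import logging
import uuid

# Holds the current request's ID for whatever code runs in this context.
request_id = contextvars.ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    """Stamp every record with the current request ID."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id.get()
        return True

logging.basicConfig(level=logging.INFO,
                    format="request_id=%(request_id)s %(message)s")
log = logging.getLogger("api")
log.addFilter(RequestIdFilter())

def handle_request():
    request_id.set(str(uuid.uuid4()))  # set once at the service boundary
    log.info("request received")       # every entry below carries the same ID
    log.info("request served")

handle_request()
```

Across service boundaries, the same idea is carried in a header (for example, the W3C Trace Context `traceparent` header) so downstream services can stamp their entries with the same ID.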
The key discipline is knowing what not to log. Log decisions, outcomes, and errors. Don’t log every variable assignment or loop iteration. A good log reads like a concise narrative of what the system did, not a line-by-line transcript of how it did it.
How It Plays Out
A payment processing service handles thousands of transactions per hour. Each transaction logs its start (INFO: payment_initiated), the authorization result (INFO: payment_authorized or WARN: payment_declined), and completion (INFO: payment_settled). Every entry carries the transaction ID, customer ID, and amount. When a customer reports a charge they don’t recognize, a support engineer searches by customer ID and finds the full sequence of events for every transaction that customer made that day. The investigation takes two minutes instead of two hours.
A team building a REST API adds structured logging to every endpoint. Three weeks later, they notice that WARN entries for the /search endpoint spike every afternoon. The logs show a third-party geocoding service timing out during peak hours. They add a local cache and the warnings disappear. Without logging, they would have discovered the problem only when users started complaining about slow searches, and they’d have had no data pointing to the geocoding service as the cause.
In agentic coding workflows, logging is how you understand what an agent did and why. An AI coding agent works through a task: it reads files, runs tests, edits code, and runs tests again. The session log records each tool call, each model decision, and each test result. When the agent produces unexpected output, you read the log to trace its reasoning. Did it misread the test output? Did it edit the wrong file? The log is your only window into the agent’s process. Without it, debugging an agent’s work means re-running the entire session and hoping to catch the mistake the second time.
When directing an agent to add logging to an existing codebase, specify the severity level and the fields you want in each log entry. “Add INFO logging to the order processing pipeline. Each entry should include order_id, step_name, and duration_ms.” Without this specificity, agents tend to add print statements with free-form strings.
Consequences
Benefits:
- Problems are diagnosed faster because you have a factual record instead of guesses.
- Patterns emerge from log data that you’d never spot from individual incidents: a slow dependency that only affects certain regions, an error that correlates with a specific client version.
- On-call engineers can investigate incidents without needing the original developer’s knowledge of the code.
- Automated monitoring and alerting systems can consume structured logs to detect anomalies without human attention.
Liabilities:
- Log storage costs money. High-throughput services can generate gigabytes per day.
- Poorly designed logging creates noise that makes real signals harder to find.
- Sensitive data in logs creates security and compliance risks. Log contents must be reviewed as carefully as any other output.
- Logging adds latency if writes are synchronous. In performance-sensitive paths, asynchronous logging or sampling may be necessary.
- Stale log statements that reference removed features or renamed fields become misleading. Logging code needs maintenance like any other code.
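The synchronous-write liability above can be mitigated by decoupling the hot path from the slow sink: the logging call only enqueues the record, and a background thread performs the actual I/O. Python ships this pattern as QueueHandler and QueueListener; a minimal sketch (the `StreamHandler` stands in for a slower file or network handler):

```python
import logging
import logging.handlers
import queue

log_queue = queue.Queue(-1)  # unbounded queue between hot path and writer
queue_handler = logging.handlers.QueueHandler(log_queue)

# The listener owns the slow handler and drains the queue on its own thread.
slow_handler = logging.StreamHandler()
listener = logging.handlers.QueueListener(log_queue, slow_handler)
listener.start()

log = logging.getLogger("hot_path")
log.setLevel(logging.INFO)
log.addHandler(queue_handler)  # enqueueing is cheap

log.info("request served")  # returns immediately; I/O happens off-thread
listener.stop()             # flush remaining records at shutdown
```

Sampling is the complementary option: emit only a fraction of high-volume DEBUG or INFO entries while keeping every WARN and above.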
Related Patterns
- Enables: Observability – logging is the primary mechanism for achieving runtime observability.
- Prevents: Silent Failure – logged events make failures visible that would otherwise go unnoticed.
- Supports investigation of: Failure Mode – logs are the primary evidence when diagnosing which failure mode occurred.
- Supports investigation of: Regression – when a regression reaches production, logs help identify when it started and what changed.
- Complements: Test – tests verify behavior before deployment; logging captures behavior after.
- Specialized by: Progress Log – a progress log is a structured log designed for agentic session journals.
Sources
- The practice of logging predates modern software engineering. System operators have maintained logs of machine behavior since the earliest mainframe installations. The term “log” itself comes from nautical tradition, where a ship’s log recorded speed, weather, and events during a voyage.
- The severity level convention (DEBUG through FATAL) was popularized by Apache Log4j, created by Ceki Gülcü in 2001. Log4j established the pattern that nearly every logging framework since has followed, across languages and platforms.
- The shift from free-form text logging to structured logging was driven by the growth of log aggregation systems (Splunk, Elasticsearch, Datadog) in the 2010s, which made machine-parseable log formats a practical necessity at scale.