Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Data Structure

Concept

A foundational idea to recognize and understand.

Understand This First

  • Data Model – data structures implement parts of the data model in running code.

Context

A Data Model says what your system knows about; a Schema says how the database stores it. A data structure says how your running program organizes information in memory so that the operations you need are fast and practical. This is an architectural pattern. Choosing the wrong data structure can make an operation that should take milliseconds take minutes instead.

Problem

How do you organize data in a running program so that the operations you care about — searching, sorting, inserting, grouping — are efficient?

Raw data has no inherent organization. A list of a million customer records could be stored as an unordered pile, but then finding one customer by ID requires scanning every record. The same data in a hash map lets you find any customer instantly. The choice of structure determines what is easy and what is expensive.

Forces

  • Different operations favor different structures: fast lookup suggests a hash map; sorted iteration suggests a tree; first-in-first-out processing suggests a queue.
  • Memory usage and speed often trade off against each other — structures that enable fast lookup may use more memory.
  • The structure must match how the data is actually used, not how it looks conceptually.
  • Simpler structures are easier to understand and debug; complex ones carry a maintenance burden.

Solution

Choose data structures based on the operations your code actually performs, not on how the data looks in the real world. The core structures you will encounter repeatedly are:

Arrays and lists store ordered sequences. Good for iteration and indexed access; poor for searching unless sorted.

Hash maps (also called dictionaries or associative arrays) map keys to values. Excellent for fast lookup by key; no inherent ordering.

Trees organize data hierarchically. Good for sorted operations, range queries, and representing naturally hierarchical data like file systems.

Queues and stacks control the order of processing. Queues process first-in-first-out (like a line at a store); stacks process last-in-first-out (like a stack of plates).

Sets store unique values and answer “is this item present?” quickly.

You don’t need to implement these from scratch; every modern programming language provides them in its standard library. Your job is to pick the right one. When working with an AI agent, specifying the data structure in your instructions (“use a dictionary keyed by user ID”) produces far better code than leaving the choice to the agent, which may default to simple lists even when they’re inappropriate.

How It Plays Out

A developer building a spell-checker needs to determine whether each word in a document exists in a dictionary of 100,000 valid words. Using a list and scanning it for each word would be agonizingly slow. Using a set — which answers “is this word present?” in near-constant time — makes the spell-checker instant.

An AI agent asked to “find duplicate entries in this data” might iterate through nested loops (comparing every item to every other item), which is slow for large datasets. Instructing the agent to “use a set to track seen items and flag duplicates” produces a solution that runs in a fraction of the time.

Tip

When reviewing AI-generated code, check the data structures early. Agents tend to reach for simple lists and arrays by default. A quick note like “use a hash map for lookups” in your prompt can prevent serious performance problems.

Example Prompt

“The duplicate-detection function uses nested loops, which is slow on large lists. Rewrite it to use a set for tracking seen items so lookups are O(1) instead of O(n).”

Consequences

The right data structure makes code faster, simpler, and more readable. It often eliminates the need for clever algorithms because the structure itself handles the hard work. It communicates intent, too: seeing a queue in the code tells a reader “this processes items in order.”

The cost is that data structures require understanding. Choosing poorly (a list where a hash map belongs, a tree where a set suffices) creates invisible performance traps. Over-engineering, using a complex structure where a simple one would work, adds unnecessary complexity. And data structures in memory are transient; if you need persistence, you eventually reach for a Database or Serialization.

  • Uses / Depends on: Data Model — data structures implement parts of the data model in running code.
  • Contrasts with: Schema (Database) — schemas organize data at rest; data structures organize data in motion.
  • Enables: State — data structures are the containers that hold program state.
  • Enables: Serialization — serialization converts in-memory data structures to a portable format.
  • Contrasts with: Domain Model — data structures are implementation-level; a domain model is conceptual.