🎭 Detailed List of Sessions for DDIA Book Reading Odyssey

Return to the DDIA odyssey page.

Note: this page lists only the recommended sessions. Optional accelerator sessions are held regularly and are not listed here.

Introduction and Kick-off

Session Intro.1: What is the goal of reading?

  • Bloom's taxonomy

  • Why do YOU want to read DDIA?

  • Power of abstractions and jargon

  • What does the book have to offer?


Session Intro.2: Design Your Reading System 

  • Designing your environment for reading
  • Managing distractions while reading
  • Create a personalized plan for your reading
  • Learn to read in multiple passes
  • Note taking techniques to maximize focus
  • Format and requirements of discussions

Chapter 1: Reliable, Scalable, and Maintainable Applications

Discussion 1.1 (Pages: 3-13)

  • Thinking about data systems
  • Reliability
    • Hardware Faults
    • Software Errors
    • Human Errors
    • How Important Is Reliability?
  • Scalability
    • Describing Load

Discussion 1.2 (Pages: 13-23)

  • Scalability
    • Describing Performance
      • Percentiles in Practice
    • Approaches for Coping with Load
  • Maintainability
    • Operability: Making Life Easy for Operations
    • Simplicity: Managing Complexity
    • Evolvability: Making Change Easy
  • Summary

Masterclass 1.M: Intuitive view of system performance

  • Basic building blocks of a system
    • Simple view of a large complex system
    • Breaking a system down into components
  • Dealing with confusing numbers
  • Fundamental laws of system performance
    • Utilization law
    • Little's law
    • Response time law
    • Forced flow law
    • Flow balance assumption
  • Understanding wait time vs service time vs response time
  • Examples
    • Deeper view of web-server and DB
    • Performance of cache
    • Performance with retries
    • Performance with distributions (and not constants)
  • Describing performance of a system
    • Describing performance (scalability)
    • Describing availability (reliability)
  • Bottlenecks
    • Solving for just correctness (and not scale)
    • Finding the bottlenecks
    • Focusing on bottlenecks and solving for them
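
The laws named in this masterclass can be tried out with a few lines of arithmetic. The sketch below is only an illustration and is not taken from the session material: the function names and the example numbers are assumptions, and it uses the standard operational-analysis forms of the laws (U = X * S, N = X * R, X_k = V_k * X, R = N / X - Z).

```python
# Illustrative sketch only: names and numbers are assumptions,
# not part of the masterclass material.

def utilization(throughput_per_s: float, service_time_s: float) -> float:
    """Utilization law: U = X * S (fraction of time the resource is busy)."""
    return throughput_per_s * service_time_s


def items_in_system(throughput_per_s: float, response_time_s: float) -> float:
    """Little's law: N = X * R (average number of requests in the system)."""
    return throughput_per_s * response_time_s


def component_throughput(system_throughput_per_s: float,
                         visits_per_request: float) -> float:
    """Forced flow law: X_k = V_k * X (throughput seen by one component)."""
    return visits_per_request * system_throughput_per_s


def interactive_response_time(users: float, throughput_per_s: float,
                              think_time_s: float) -> float:
    """Interactive response time law: R = N / X - Z."""
    return users / throughput_per_s - think_time_s
```

For example, a database that serves 50 queries per second at 10 ms of service time each is about 50% utilized, and a web tier that makes 3 database calls per request at 100 requests per second puts roughly 300 queries per second on the database.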

Case Study 1.C: Intuitive view of your production system

  • Build an intuitive model for your production system
  • Discuss if you aren't able to solve some part of it
  • Show and see the skeleton of systems trivially



Chapter 2: Data Models and Query Languages

Discussion 2.1 (Pages: 27-48)

  • Relational Model Versus Document Model
    • The Birth of NoSQL
    • The Object-Relational Mismatch
    • Many-to-One and Many-to-Many Relationships
    • Are Document Databases Repeating History?
      • The network model
      • The relational model
      • Comparison to document databases
    • Relational Versus Document Databases Today
      • Schema flexibility in the document model
      • Data locality for queries
      • Convergence of document and relational databases
  • Query Languages for Data
    • Declarative Queries on the Web
    • MapReduce Querying

Discussion 2.2 (Pages: 49-64)

  • Graph-Like Data Models
    • Property Graphs
    • The Cypher Query Language
    • Graph Queries in SQL
    • Triple-Stores and SPARQL
      • The semantic web
      • The RDF data model
      • The SPARQL query language
      • Graph databases compared to the network model
    • The Foundation: Datalog
  • Summary

Masterclass 2.M: Schema considerations for query performance

  • Relational DB: Normalize vs denormalize

  • Document DB: Nest vs Separate

  • Data Warehouses: Schema considerations

  • Time series DB: Schema considerations


Case Study 2.C: Cost considerations for DBs

  • Pre-provisioned large capacity
  • Auto-expanding disks
  • Needless replicas
  • Avoiding provisioning for peaks

Chapter 3: Storage and Retrieval

Discussion 3.1 (Pages: 69-79)

  • Data Structures That Power Your Database
    • Hash Indexes
    • SSTables and LSM-Trees
      • Constructing and maintaining SSTables
      • Making an LSM-tree out of SSTables
      • Performance optimizations

Discussion 3.2 (Pages: 79-90)

  • Data Structures That Power Your Database
    • B-Trees
      • Making B-trees reliable
      • B-tree optimizations
    • Comparing B-Trees and LSM-Trees
      • Advantages of LSM-trees
      • Downsides of LSM-trees
    • Other Indexing Structures 
      • Storing values within the index
      • Multi-column indexes
      • Full-text search and fuzzy indexes
      • Keeping everything in memory

Discussion 3.3 (Pages: 90-104)

  • Transaction Processing or Analytics?
    • Data Warehousing
      • The divergence between OLTP databases and data warehouses
    • Stars and Snowflakes: Schemas for Analytics
  • Column-Oriented Storage
    • Column Compression
    • Sort Order in Column Storage
      • Several different sort orders
    • Writing to Column-Oriented Storage
    • Aggregation: Data Cubes and Materialized Views 
  • Summary

Masterclass 3.M: Indexes!

  • Hash indexes
  • GIN
  • BRIN
  • Full text indexes
  • 2dsphere
  • Hands-on!

Case Study 3.C: Production issues due to DB slowdown

  • Slow query analysis: A primer
  • Solve 5 simulated production issues

Chapter 4: Encoding and Evolution

Discussion 4.1 (Pages: 111-128)

  • Formats for Encoding Data
    • Language-Specific Formats
    • JSON, XML, and Binary Variants
      • Binary encoding
    • Thrift and Protocol Buffers
      • Field tags and schema evolution
      • Datatypes and schema evolution
    • Avro
      • The writer's schema and the reader's schema
      • Schema evolution rules
      • But what is the writer's schema?
      • Dynamically generated schemas
      • Code generation and dynamically typed languages
    • The Merits of Schemas 

Discussion 4.2 (Pages: 128-140)

  • Modes of Dataflow
    • Dataflow Through Databases
      • Different values written at different times
      • Archival storage
    • Dataflow Through Services: REST and RPC
      • Web services
      • The problems with remote procedure calls (RPCs)
      • Current directions for RPC
      • Data encoding and evolution for RPC
    • Message-Passing Dataflow
      • Message brokers
      • Distributed actor frameworks
  • Summary

Masterclass 4.M: Zero Downtime Deployments (ZDD)

  • Easy: Backward-compatible rollouts

    • Blue-green deployments

    • Rolling deployments

  • Hard: Backward-incompatible rollouts, through backward-compatibility layers

  • What can’t be done in a zero downtime manner?

  • Examples

    • API changes

    • DB schema changes

    • Cache schema changes

    • Queue changes


Case Study 4.C: ZDD in your production projects

  • Talk about interesting variations of system evolution
  • Talk about ugly downtime situations that you think could have been avoided with ZDD
  • Challenges that you see in getting ZDD implemented

Chapter 5: Replication

Discussion 5.1 (Pages: 151-161)

  • Leaders and Followers
    • Synchronous Versus Asynchronous Replication
      • Research on replication
    • Setting Up New Followers
    • Handling Node Outages
      • Follower failure: Catch-up recovery
      • Leader failure: Failover
    • Implementation of Replication Logs
      • Statement-based replication
      • Write-ahead log (WAL) shipping
      • Logical (row-based) log replication
      • Trigger-based replication

Discussion 5.2 (Pages: 161-167)

  • Problems with Replication Lag
    • Reading Your Own Writes
    • Monotonic Reads
    • Consistent Prefix Reads
    • Solutions for Replication Lag

Discussion 5.3 (Pages: 168-177)

  • Multi-Leader Replication
    • Use Cases for Multi-Leader Replication
      • Multi-datacenter operation
      • Clients with offline operation
      • Collaborative editing
    • Handling Write Conflicts
      • Synchronous versus asynchronous conflict detection
      • Conflict avoidance
      • Converging toward a consistent state
      • Custom conflict resolution logic
      • Atomic conflict resolution
      • What is a conflict?
    • Multi-Leader Replication Topologies

Discussion 5.4 (Pages: 177-193)

  • Leaderless Replication
    • Writing to the Database When a Node Is Down
      • Read repair and anti-entropy
      • Quorums for reading and writing
    • Limitations of Quorum Consistency
      • Monitoring staleness
    • Sloppy Quorums and Hinted Handoff
      • Multi-datacenter operation
    • Detecting Concurrent Writes
      • Last write wins (discarding concurrent writes)
      • The "happens-before" relationship and concurrency
      • Concurrency, Time, and Relativity
      • Capturing the happens-before relationship
      • Merging concurrently written values
      • Version vectors
  • Summary

Masterclass 5.M: Replication hands-on with Postgres and MongoDB


Case Study 5.C: Google Docs (real-time collaborative editor)


Chapter 6: Partitioning

Discussion 6.1 (Pages: 199-209)

  • Partitioning and Replication
  • Partitioning of Key-Value Data
    • Partitioning by Key Range
    • Partitioning by Hash of Key
      • Consistent Hashing
    • Skewed Workloads and Relieving Hot Spots
  • Partitioning and Secondary Indexes
    • Partitioning Secondary Indexes by Document
    • Partitioning Secondary Indexes by Term 

Discussion 6.2 (Pages: 209-218)

  • Rebalancing Partitions
    • Strategies for Rebalancing
      • How not to do it: hash mod N
      • Fixed number of partitions
      • Dynamic partitioning
      • Partitioning proportionally to nodes
    • Operations: Automatic or Manual Rebalancing
  • Request Routing
    • Parallel Query Execution
  • Summary
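
To make the "hash mod N" item above concrete, here is a minimal sketch that compares it with a fixed number of partitions when a cluster grows from 10 to 11 nodes. Everything in it is an assumption made for illustration (the key names, the choice of 256 partitions, MD5 as a stable hash, and the simple "new node takes every 11th partition" rule); it is not the book's or any database's actual implementation.

```python
import hashlib

NUM_PARTITIONS = 256  # assumed fixed partition count, chosen for illustration


def stable_hash(key: str) -> int:
    """Hash that is stable across runs (Python's built-in hash() is salted)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


def node_by_mod_n(key: str, num_nodes: int) -> int:
    """The 'how not to do it' scheme: node = hash(key) mod N."""
    return stable_hash(key) % num_nodes


def initial_assignment(num_nodes: int) -> dict:
    """Fixed partitions: spread partitions over the initial nodes."""
    return {p: p % num_nodes for p in range(NUM_PARTITIONS)}


def add_node(assignment: dict, new_node: int, total_nodes: int) -> dict:
    """The new node takes over a few whole partitions; the rest stay put."""
    updated = dict(assignment)
    for p in range(0, NUM_PARTITIONS, total_nodes):
        updated[p] = new_node
    return updated


def node_by_partition(key: str, assignment: dict) -> int:
    """Key -> partition never changes; only partition -> node does."""
    return assignment[stable_hash(key) % NUM_PARTITIONS]


keys = [f"user-{i}" for i in range(10_000)]

moved_mod_n = sum(
    node_by_mod_n(k, 10) != node_by_mod_n(k, 11) for k in keys
) / len(keys)

before = initial_assignment(10)
after = add_node(before, new_node=10, total_nodes=11)
moved_fixed = sum(
    node_by_partition(k, before) != node_by_partition(k, after) for k in keys
) / len(keys)

print(f"hash mod N moved {moved_mod_n:.0%} of keys")
print(f"fixed partitions moved {moved_fixed:.0%} of keys")
```

With hash mod N, most keys change nodes when N changes, whereas with fixed partitions only the keys in the reassigned partitions move.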

Masterclass 6.M: Partitioning considerations in the real world

  • Triggers for partitioning
  • Selecting partitioning key
  • Online schema changes
    • Considerations
    • Techniques
  • Versioned schemas for DBs
  • Versioned schemas for queues

Case Study 6.C: Scaling for eCommerce flash sale

  • Partitioning concerns

Chapter 7: Transactions

Discussion 7.1 (Pages: 221-232)

  • The Slippery Concept of a Transaction
    • The Meaning of ACID
      • Atomicity
      • Consistency
      • Isolation
      • Durability
      • Replication and durability
    • Single-Object and Multi-Object Operations 
      • Single-object writes
      • The need for multi-object transactions
      • Handling errors and aborts

Discussion 7.2 (Pages: 233-242)

  • Weak Isolation Levels
    • Read Committed
      • No dirty reads
      • No dirty writes
      • Implementing read committed
    • Snapshot Isolation and Repeatable Read
      • Implementing snapshot isolation
      • Visibility rules for observing a consistent snapshot
      • Indexes and snapshot isolation
      • Repeatable read and naming confusion

Discussion 7.3 (Pages: 242-251)

  • Weak Isolation Levels
    • Preventing Lost Updates
      • Atomic write operations
      • Explicit locking
      • Automatically detecting lost updates
      • Compare-and-set
      • Conflict resolution and replication
    • Write Skew and Phantoms 
      • Characterizing write skew
      • More examples of write skew
      • Phantoms causing write skew
      • Materializing conflicts

Discussion 7.4 (Pages: 251-267)

  • Serializability
    • Actual Serial Execution
      • Encapsulating transactions in stored procedures
      • Pros and cons of stored procedures
      • Partitioning
      • Summary of serial execution
    • Two-Phase Locking (2PL)
      • Implementation of two-phase locking
      • Performance of two-phase locking
      • Predicate locks
      • Index-range locks
    • Serializable Snapshot Isolation (SSI)
      • Pessimistic versus optimistic concurrency control
      • Decisions based on an outdated premise
      • Detecting stale MVCC reads
      • Detecting writes that affect prior reads
      • Performance of serializable snapshot isolation
  • Summary 

Masterclass 7.M: MVCC and SSI Implementation Details

  • MVCC and SSI: Recap
  • Construct a design for MVCC
  • MVCC in PostgreSQL
    • Row storage
    • Versioning: transaction ID
    • Visibility rules for transactions
    • Vacuuming
  • SSI in PostgreSQL
  • MVCC in MySQL

Case Study 7.C: Scaling for eCommerce flash sale

  • Isolation concerns

Chapter 8: The Trouble with Distributed Systems

Discussion 8.1 (Pages: 273-287)

  • Faults and Partial Failures
    • Cloud Computing and Supercomputing
  • Unreliable Networks
    • Network Faults in Practice
    • Detecting Faults
    • Timeouts and Unbounded Delays
      • Network congestion and queueing
      • TCP vs UDP
    • Synchronous Versus Asynchronous Networks
      • Can we not simply make network delays predictable?
      • Latency and Resource Utilization

Discussion 8.2 (Pages: 287-299)

  • Unreliable Clocks
    • Monotonic Versus Time-of-Day Clocks
      • Time-of-day clocks
      • Monotonic clocks
    • Clock Synchronization and Accuracy
    • Relying on Synchronized Clocks
      • Timestamps for ordering events
      • Clock readings have a confidence interval
      • Synchronized clocks for global snapshots
    • Process Pauses
      • Response time guarantees
      • Limiting the impact of garbage collection

Discussion 8.3 (Pages: 300-312)

  • Knowledge, Truth, and Lies
    • The Truth Is Defined by the Majority
      • The leader and the lock
      • Fencing tokens
    • Byzantine Faults
      • Weak forms of lying
    • System Model and Reality
      • Correctness of an algorithm
      • Safety and liveness
      • Mapping system models to the real world
  • Summary 

Masterclass 8.M: Mental models for resiliency in production systems

  • Embrace Failure (Assume unreliability)
    • Timeouts and their antipatterns
    • Retries and their antipatterns
    • Circuit breakers: what they are
    • Circuit breakers: antipatterns
    • NTP sync for clocks
  • Isolate and contain (Limit blast radius)
    • Avoid cascading failure
    • Bulkhead pattern
    • Async communication
  • Maintain consistency (Despite uncertainty)
    • Safe retries: what
    • Safe retries: how to achieve idempotency?
      • HTTP methods
      • Idempotency keys
      • Resource versioning / ETags
      • Database constraints
      • Transactional outbox pattern
      • State machines
    • Ordering events for debugging
  • Observe and adapt (Know your system)
    • Debugging and diagnosis through observability
    • Validating resilience (chaos engineering)
    • Learning from failures
      • Incident RCAs
      • Blameless postmortems
  • Define your boundaries and contracts (Manage dependencies)
    • Inter-service communication
      • API contracts
      • API versioning

Case Study 8.C: Complex production incident triggered by payment-gateway scaling issues in an eCommerce platform


Chapter 9: Consistency and Consensus

Discussion 9.1 (Pages: 321-338)

  • Consistency Guarantees
  • Linearizability
    • What Makes a System Linearizable?
      • Linearizability vs Serializability
    • Relying on Linearizability
      • Locking and leader election
      • Constraints and uniqueness guarantees
      • Cross-channel timing dependencies
    • Implementing Linearizable Systems
      • Linearizability and quorums
    • The Cost of Linearizability
      • The CAP theorem
      • The unhelpful CAP Theorem
      • Linearizability and network delays

Discussion 9.2 (Pages: 339-352)

  • Ordering Guarantees
    • Ordering and Causality
      • The causal order is not a total order
      • Linearizability is stronger than causal consistency
      • Capturing causal dependencies
    • Sequence Number Ordering 
      • Noncausal sequence number generators
      • Lamport timestamps
      • Timestamp ordering is not sufficient
    • Total Order Broadcast
      • Using total order broadcast
      • Implementing linearizable storage using total order broadcast
      • Implementing total order broadcast using linearizable storage
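
As a companion to the "Lamport timestamps" item in this discussion, the following is a minimal sketch of a Lamport clock. The class and method names are assumptions made for illustration; the rule it implements is the usual one: each timestamp is a (counter, node ID) pair, and a node that sees a larger counter moves its own counter past it.

```python
class LamportClock:
    """Minimal Lamport clock; timestamps are (counter, node_id) tuples."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counter = 0

    def tick(self) -> tuple:
        """Local event or message send: increment the counter."""
        self.counter += 1
        return (self.counter, self.node_id)

    def receive(self, remote_timestamp: tuple) -> tuple:
        """Message receive: jump past the largest counter seen so far."""
        remote_counter, _ = remote_timestamp
        self.counter = max(self.counter, remote_counter) + 1
        return (self.counter, self.node_id)


# Two nodes exchanging messages: each causally later event gets a
# larger timestamp, and node_id breaks ties between equal counters.
a = LamportClock("A")
b = LamportClock("B")
t1 = a.tick()        # A sends a message
t2 = b.receive(t1)   # B receives it
t3 = b.tick()        # B sends a reply
t4 = a.receive(t3)   # A receives the reply
```

Comparing the tuples gives a total order that is consistent with causality, though, as the "Timestamp ordering is not sufficient" item suggests, a node cannot tell from its own timestamp alone whether another node has concurrently issued a smaller one.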

Discussion 9.3 (Pages: 352-364)

  • Distributed Transactions and Consensus
    • The impossibility of consensus
    • Atomic Commit and Two-Phase Commit (2PC)
      • From single-node to distributed atomic commit
      • Introduction to two-phase commit
      • Don't confuse 2PC and 2PL
      • A system of promises
      • Coordinator failure
      • Three-phase commit
    • Distributed Transactions in Practice
      • Exactly-once message processing
      • XA transactions
      • Holding locks while in doubt
      • Recovering from coordinator failure
      • Limitations of distributed transactions

Discussion 9.4 (Pages: 364-375)

  • Distributed Transactions and Consensus
    • Fault-Tolerant Consensus
      • Consensus algorithms and total order broadcast
      • Single-leader replication and consensus
      • Epoch numbering and quorums
      • Limitations of consensus
    • Membership and Coordination Services
      • Allocating work to nodes
      • Service discovery
      • Membership services
  • Summary

Masterclass 9.M: Details of Paxos and Raft

  • Abstraction for fault tolerance: Consensus
  • Consensus vs Consistency
  • Fault tolerant consensus
  • Paxos deep-dive
  • Raft deep-dive

Case Study 9.C: MongoDB down in production

  • Limitations of Raft

Chapter 10: Batch Processing

Discussion 10.1 (Pages: 389-397)

  • Batch Processing with Unix Tools
    • Simple Log Analysis
      • Chain of commands versus custom program
      • Sorting versus in-memory aggregation
    • The Unix Philosophy
      • A uniform interface
      • Separation of logic and wiring
      • Transparency and experimentation

Discussion 10.2 (Pages: 397-418)

  • MapReduce and Distributed Filesystems
    • MapReduce Job Execution
      • Distributed execution of MapReduce
      • MapReduce workflows
    • Reduce-Side Joins and Grouping
      • Example: analysis of user activity events
      • Sort-merge joins
      • Bringing related data together in the same place
      • Group by
      • Handling skew
    • Map-Side Joins
      • Broadcast hash joins
      • Partitioned hash joins
      • Map-side merge joins
      • MapReduce workflows with map-side joins
    • The Output of Batch Workflows
      • Building search indexes
      • Key-value stores as batch process output
      • Philosophy of batch process outputs
    • Comparing Hadoop to Distributed Databases
      • Diversity of storage
      • Diversity of processing models
      • Designing for frequent faults

Discussion 10.3 (Pages: 418-431)

  • Beyond MapReduce
    • Materialization of Intermediate State
      • Dataflow engines
      • Fault tolerance
      • Discussion of materialization
    • Graphs and Iterative Processing
      • The Pregel processing model
      • Fault tolerance
      • Parallel execution
    • High-Level APIs and Languages
      • The move toward declarative query languages
      • Specialization for different domains
  • Summary

Masterclass 10.M: Unix Shell-Scripting Internals


Case Study 10.C: Hands-on Apache Airflow


Chapter 11: Stream Processing

Discussion 11.1 (Pages: 439-451)

  • Transmitting Event Streams
    • Messaging Systems
      • Direct messaging from producers to consumers
      • Message brokers
      • Message brokers compared to databases
      • Multiple consumers
      • Acknowledgements and redelivery
    • Partitioned Logs
      • Using logs for message storage
      • Logs compared to traditional messaging
      • Consumer offsets
      • Disk space usage
      • When consumers cannot keep up with producers
      • Replaying old messages

Discussion 11.2 (Pages: 451-464)

  • Databases and Streams
    • Keeping Systems in Sync
    • Change Data Capture
      • Implementing change data capture
      • Initial snapshot
      • Log compaction
      • API support for change streams
    • Event Sourcing
      • Deriving current state from the event log
      • Commands and events
    • State, Streams, and Immutability
      • Advantages of immutable events
      • Deriving several views from the same event log
      • Concurrency control
      • Limitations of immutability

Discussion 11.3 (Pages: 464-481)

  • Processing Streams
    • Uses of Stream Processing
      • Complex event processing
      • Stream analytics
      • Maintaining materialized views
      • Search on streams
      • Message passing and RPC
    • Reasoning About Time
      • Event time versus processing time
      • Knowing when you are ready
      • Whose clock are you using, anyway?
      • Types of windows
    • Stream Joins
      • Stream-stream join (window join)
      • Stream-table join (stream enrichment)
      • Table-table join (materialized view maintenance)
      • Time-dependence of joins
    • Fault Tolerance
      • Microbatching and checkpointing
      • Atomic commit revisited
      • Idempotence
      • Rebuilding state after a failure
  • Summary

Masterclass 11.M: CDC Hands-on

  • Heterogeneous replication

    • Debezium and CDC

    • Hands-on

  • Use cases:

    • Search → Postgres to Elasticsearch

    • Caching → Postgres to Redis

    • Search and reporting → Postgres to MongoDB

    • Warehousing → Postgres/MongoDB to BigQuery/Hadoop/Teradata/etc.

  • Cost considerations for CDC

    • Write amplification

    • Network costs

    • Architectural hygiene vs cost


Case Study 11.C: Real-Time Credit Card Fraud Detection


Chapter 12: The Future of Data Systems

Discussion 12.1 (Pages: 489-498)

  • Data Integration
    • Combining Specialized Tools by Deriving Data
      • Reasoning about dataflows
      • Derived state versus distributed transactions
      • The limits of total ordering
      • Ordering events to capture causality
    • Batch and Stream Processing
      • Maintaining derived state
      • Reprocessing data for application evolution
      • Schema migrations on railways
      • The lambda architecture
      • Unifying batch and stream processing

Discussion 12.2 (Pages: 499-515)

  • Unbundling Databases
    • Composing Data Storage Technologies
      • Creating an index
      • The meta-database of everything
      • Making unbundling work
      • Unbundled versus integrated systems
      • What's missing?
    • Designing Applications Around Dataflow
      • Application code as a derivation function
      • Separation of application code and state
      • Dataflow: Interplay between state changes and application code
      • Stream processor and services
    • Observing Derived State
      • Materialized views and caching
      • Stateful, offline-capable clients
      • Pushing state changes to clients
      • End-to-end event streams
      • Reads are events too
      • Multi-partition data processing

Discussion 12.3 (Pages: 515-533)

  • Aiming for Correctness
    • The End-to-End Argument for Databases
      • Exactly-once execution of an operation
      • Duplicate suppression
      • Operation identifiers
      • The end-to-end argument
      • Applying end-to-end thinking in data systems
    • Enforcing Constraints
      • Uniqueness constraints require consensus
      • Uniqueness in log-based messaging
      • Multi-partition request processing
    • Timeliness and Integrity
      • Correctness of dataflow systems
      • Loosely interpreted constraints
      • Coordination-avoiding data systems
    • Trust, but Verify
      • Maintaining integrity in the face of software bugs
      • Don't just blindly trust what they promise
      • A culture of verification
      • Designing for auditability
      • The end-to-end argument again
      • Tools for auditable data systems

Discussion 12.4 (Pages: 533-544)

  • Doing the Right Thing
    • Predictive Analytics
      • Bias and discrimination
      • Responsibility and accountability
      • Feedback loops
    • Privacy and Tracking
      • Surveillance
      • Consent and freedom of choice
      • Privacy and use of data
      • Data as assets and power
      • Remembering the Industrial Revolution
      • Legislation and self-regulation
  • Summary

Conclusion

Discussion Retro.1

Let's celebrate our collective achievement with a final look back at how far we've come together. We'll share our biggest takeaways, gather your valuable insights, and officially conclude our memorable Odyssey.

