🎭 Detailed List of Sessions for DDIA Book Reading Odyssey

Return to the DDIA odyssey page here.

Please note that this page lists only the recommended sessions. The optional accelerator sessions run regularly and are not listed here.

Introduction and Kick-off

Session Intro.1: What is the goal of reading?

Bloom's taxonomy
Why do YOU want to read DDIA?
Power of abstractions and jargon
What does the book have to offer?

Session Intro.2: Design Your Reading System

Designing your environment for reading
Managing distractions while reading
Create a personalized plan for your reading
Learn to read in multiple passes
Note-taking techniques to maximize focus
Format and requirements of discussions

Chapter 1: Reliable, Scalable, and Maintainable Applications

Discussion 1.1 (Pages: 3-13)

Thinking about data systems
Reliability
- Hardware Faults
- Software Errors
- Human Errors
- How Important Is Reliability?
Scalability
- Describing Load

Discussion 1.2 (Pages: 13-23)

Scalability
- Describing Performance
- - Percentiles in Practice
- Approaches for Coping with Load
Maintainability
- Operability: Making Life Easy for Operations
- Simplicity: Managing Complexity
- Evolvability: Making Change Easy
Summary

Masterclass 1.M: Intuitive view of system performance

Basic building blocks of a system
- Simple view of a large complex system
- Breaking a system down into components
Dealing with confusing numbers
Fundamental laws of system performance
- Utilization law
- Little's law
- Response time law
- Forced flow law
- Flow balance assumption
Understanding wait time vs service time vs response time
Examples
- Deeper view of web-server and DB
- Performance of cache
- Performance with retries
- Performance with distributions (and not constants)
Describing performance of a system
- Describing performance (scalability)
- Describing availability (reliability)
Bottlenecks
- Solving for just correctness (and not scale)
- Finding the bottlenecks
- Focusing on bottlenecks and solving for them
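
As a rough illustration of two of the laws listed in this masterclass, the sketch below applies the utilization law (U = X * S) and Little's law (N = X * R) to made-up numbers. The function names and the sample figures are assumptions chosen for illustration; they are not taken from the session material.

```python
def utilization(throughput_per_s: float, service_time_s: float) -> float:
    """Utilization law: U = X * S (throughput times service time per request)."""
    return throughput_per_s * service_time_s


def items_in_system(throughput_per_s: float, response_time_s: float) -> float:
    """Little's law: N = X * R (throughput times time spent in the system)."""
    return throughput_per_s * response_time_s


# Hypothetical server: 50 requests/s, 10 ms service time, 40 ms response time.
u = utilization(50, 0.010)
n = items_in_system(50, 0.040)
print(f"utilization={u:.2f}, requests in system={n:.1f}")
```

The gap between the 10 ms service time and the 40 ms response time in this example is the wait time, which is the distinction the masterclass draws between wait, service, and response time.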

Case Study 1.C: Intuitive view of your production system

Build an intuitive model for your production system
Discuss if you aren't able to solve some part of it
Show and see the skeleton of systems trivially

Chapter 2: Data Models and Query Languages

Discussion 2.1 (Pages: 27-48)

Relational Model Versus Document Model
- The Birth of NoSQL
- The Object-Relational Mismatch
- Many-to-One and Many-to-Many Relationships
- Are Document Databases Repeating History?
- - The network model
  - The relational model
  - Comparison to document databases
- Relational Versus Document Databases Today
- - Schema flexibility in the document model
  - Data locality for queries
  - Convergence of document and relational databases
Query Languages for Data
- Declarative Queries on the Web
- MapReduce Querying

Discussion 2.2 (Pages: 49-64)

Graph-Like Data Models
- Property Graphs
- The Cypher Query Language
- Graph Queries in SQL
- Triple-Stores and SPARQL
- - The semantic web
  - The RDF data model
  - The SPARQL query language
  - Graph databases compared to the network model
- The Foundation: Datalog
Summary

Masterclass 2.M: Schema considerations for query performance

Relational DB: Normalize vs denormalize
Document DB: Nest vs Separate
Data Warehouses: Schema considerations
Time series DB: Schema considerations

Case Study 2.C: Cost considerations for DBs

Pre-provisioned large capacity
Auto-expanding disks
Needless replicas
Avoiding provisioning for peaks

Chapter 3: Storage and Retrieval

Discussion 3.1 (Pages: 69-79)

Data Structures That Power Your Database
- Hash Indexes
- SSTables and LSM-Trees
- - Constructing and maintaining SSTables
  - Making an LSM-tree out of SSTables
  - Performance optimizations

Discussion 3.2 (Pages: 79-90)

Data Structures That Power Your Database
- B-Trees
- - Making B-trees reliable
  - B-tree optimizations
- Comparing B-Trees and LSM-Trees
- - Advantages of LSM-trees
  - Downsides of LSM-trees
- Other Indexing Structures
- - Storing values within the index
  - Multi-column indexes
  - Full-text search and fuzzy indexes
  - Keeping everything in memory

Discussion 3.3 (Pages: 90-104)

Transaction Processing or Analytics?
- Data Warehousing
- - The divergence between OLTP databases and data warehouses
- Stars and Snowflakes: Schemas for Analytics
Column-Oriented Storage
- Column Compression
- Sort Order in Column Storage
- - Several different sort orders
- Writing to Column-Oriented Storage
- Aggregation: Data Cubes and Materialized Views
Summary

Masterclass 3.M: Indexes!

Hash indexes
GIN
BRIN
Full-text indexes
2dsphere
Hands-on!

Case Study 3.C: Production issues due to DB slowdown

Slow query analysis: A primer
Solve 5 simulated production issues

Chapter 4: Encoding and Evolution

Discussion 4.1 (Pages: 111-128)

Formats for Encoding Data
- Language-Specific Formats
- JSON, XML, and Binary Variants
- - Binary encoding
- Thrift and Protocol Buffers
- - Field tags and schema evolution
  - Datatypes and schema evolution
- Avro
- - The writer's schema and the reader's schema
  - Schema evolution rules
  - But what is the writer's schema?
  - Dynamically generated schemas
  - Code generation and dynamically typed languages
- The Merits of Schemas

Discussion 4.2 (Pages: 128-140)

Modes of Dataflow
- Dataflow Through Databases
- - Different values written at different times
  - Archival storage
- Dataflow Through Services: REST and RPC
- - Web services
  - The problems with remote procedure calls (RPCs)
  - Current directions for RPC
  - Data encoding and evolution for RPC
- Message-Passing Dataflow
- - Message brokers
  - Distributed actor frameworks
Summary

Masterclass 4.M: Zero Downtime Deployments (ZDD)

Easy: Backward-compatible rollouts
- Blue-green deployments
- Rolling deployments
Hard: Backward-incompatible rollouts through backward-compatibility layers
What can’t be done in a zero downtime manner?
Examples
- API changes
- DB schema changes
- Cache schema changes
- Queue changes

Case Study 4.C: ZDD in your production projects

Talk about interesting variations of system evolution
Talk about ugly downtime situations that you think could be avoided with ZDD
Challenges that you see in getting ZDD implemented

Chapter 5: Replication

Discussion 5.1 (Pages: 151-161)

Leaders and Followers
- Synchronous Versus Asynchronous Replication
- - Research on replication
- Setting Up New Followers
- Handling Node Outages
- - Follower failure: Catch-up recovery
  - Leader failure: Failover
- Implementation of Replication Logs
- - Statement-based replication
  - Write-ahead log (WAL) shipping
  - Logical (row-based) log replication
  - Trigger-based replication

Discussion 5.2 (Pages: 161-167)

Problems with Replication Lag
- Reading Your Own Writes
- Monotonic Reads
- Consistent Prefix Reads
- Solutions for Replication Lag

Discussion 5.3 (Pages: 168-177)

Multi-Leader Replication
- Use Cases for Multi-Leader Replication
- - Multi-datacenter operation
  - Clients with offline operation
  - Collaborative editing
- Handling Write Conflicts
- - Synchronous versus asynchronous conflict detection
  - Conflict avoidance
  - Converging toward a consistent state
  - Custom conflict resolution logic
  - Atomic conflict resolution
  - What is a conflict?
- Multi-Leader Replication Topologies

Discussion 5.4 (Pages: 177-193)

Leaderless Replication
- Writing to the Database When a Node Is Down
- - Read repair and anti-entropy
  - Quorums for reading and writing
- Limitations of Quorum Consistency
- - Monitoring staleness
- Sloppy Quorums and Hinted Handoff
- - Multi-datacenter operation
- Detecting Concurrent Writes
- - Last write wins (discarding concurrent writes)
  - The "happens-before" relationship and concurrency
  - Concurrency, Time, and Relativity
  - Capturing the happens-before relationship
  - Merging concurrently written values
  - Version vectors
Summary

Masterclass 5.M: Replication hands-on with Postgres and MongoDB

Case Study 5.C: Google Docs (real-time collaborative editor)

Chapter 6: Partitioning

Discussion 6.1 (Pages: 199-209)

Partitioning and Replication
Partitioning of Key-Value Data
- Partitioning by Key Range
- Partitioning by Hash of Key
- - Consistent Hashing
- Skewed Workloads and Relieving Hot Spots
Partitioning and Secondary Indexes
- Partitioning Secondary Indexes by Document
- Partitioning Secondary Indexes by Term

Discussion 6.2 (Pages: 209-218)

Rebalancing Partitions
- Strategies for Rebalancing
- - How not to do it: hash mod N
  - Fixed number of partitions
  - Dynamic partitioning
  - Partitioning proportionally to nodes
- Operations: Automatic or Manual Rebalancing
Request Routing
- Parallel Query Execution
Summary
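
The "How not to do it: hash mod N" item in this discussion can be sketched in a few lines. The example below is a minimal illustration, assuming SHA-256 as the hash function and a made-up set of keys; `node_for` and `fraction_moved` are hypothetical helper names. It counts how many keys land on a different node when the cluster grows from 10 to 11 nodes.

```python
import hashlib


def node_for(key: str, num_nodes: int) -> int:
    """Assign a key to a node using hash(key) mod N."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_nodes


def fraction_moved(keys, old_n: int, new_n: int) -> float:
    """Fraction of keys whose node assignment changes when N changes."""
    moved = sum(1 for k in keys if node_for(k, old_n) != node_for(k, new_n))
    return moved / len(keys)


# Hypothetical key set; any reasonably large set of distinct keys will do.
keys = [f"user-{i}" for i in range(10_000)]
moved = fraction_moved(keys, old_n=10, new_n=11)
print(f"{moved:.0%} of keys change node when going from 10 to 11 nodes")
```

With a well-mixed hash, a key stays put only when its hash gives the same remainder modulo 10 and modulo 11, which happens for roughly 1 key in 11. The large majority of keys therefore move, which is why the other strategies listed above avoid tying the assignment directly to N.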

Masterclass 6.M: Partitioning considerations in the real world

Triggers for partitioning
Selecting partitioning key
Online schema changes
- Considerations
- Techniques
Versioned schemas for DBs
Versioned schemas for queues

Case Study 6.C: Scaling for eCommerce flash sale

Partitioning concerns

Chapter 7: Transactions

Discussion 7.1 (Pages: 221-232)

The Slippery Concept of a Transaction
- The Meaning of ACID
- - Atomicity
  - Consistency
  - Isolation
  - Durability
  - Replication and durability
- Single-Object and Multi-Object Operations
- - Single-object writes
  - The need for multi-object transactions
  - Handling errors and aborts

Discussion 7.2 (Pages: 233-242)

Weak Isolation Levels
- Read Committed
- - No dirty reads
  - No dirty writes
  - Implementing read committed
- Snapshot Isolation and Repeatable Read
- - Implementing snapshot isolation
  - Visibility rules for observing a consistent snapshot
  - Indexes and snapshot isolation
  - Repeatable read and naming confusion

Discussion 7.3 (Pages: 242-251)

Weak Isolation Levels
- Preventing Lost Updates
- - Atomic write operations
  - Explicit locking
  - Automatically detecting lost updates
  - Compare-and-set
  - Conflict resolution and replication
- Write Skew and Phantoms
- - Characterizing write skew
  - More examples of write skew
  - Phantoms causing write skew
  - Materializing conflicts

Discussion 7.4 (Pages: 251-267)

Serializability
- Actual Serial Execution
- - Encapsulating transactions in stored procedures
  - Pros and cons of stored procedures
  - Partitioning
  - Summary of serial execution
- Two-Phase Locking (2PL)
- - Implementation of two-phase locking
  - Performance of two-phase locking
  - Predicate locks
  - Index-range locks
- Serializable Snapshot Isolation (SSI)
- - Pessimistic versus optimistic concurrency control
  - Decisions based on an outdated premise
  - Detecting stale MVCC reads
  - Detecting writes that affect prior reads
  - Performance of serializable snapshot isolation
Summary

Masterclass 7.M: MVCC and SSI Implementation Details

MVCC and SSI: Recap
Construct a design for MVCC
MVCC in PostgreSQL
- Row storage
- Versioning: transaction id
- Visibility rules for transactions
- Vacuuming
SSI in PostgreSQL
MVCC in MySQL

Case Study 7.C: Scaling for eCommerce flash sale

Isolation concerns

Chapter 8: The Trouble with Distributed Systems

Discussion 8.1 (Pages: 273-287)

Faults and Partial Failures
- Cloud Computing and Supercomputing
Unreliable Networks
- Network Faults in Practice
- Detecting Faults
- Timeouts and Unbounded Delays
- - Network congestion and queueing
  - TCP vs UDP
- Synchronous Versus Asynchronous Networks
- - Can we not simply make network delays predictable?
  - Latency and Resource Utilization

Discussion 8.2 (Pages: 287-299)

Unreliable Clocks
- Monotonic Versus Time-of-Day Clocks
- - Time-of-day clocks
  - Monotonic clocks
- Clock Synchronization and Accuracy
- Relying on Synchronized Clocks
- - Timestamps for ordering events
  - Clock readings have a confidence interval
  - Synchronized clocks for global snapshots
- Process Pauses
- - Response time guarantees
  - Limiting the impact of garbage collection

Discussion 8.3 (Pages: 300-312)

Knowledge, Truth, and Lies
- The Truth Is Defined by the Majority
- - The leader and the lock
  - Fencing tokens
- Byzantine Faults
- - Weak forms of lying
- System Model and Reality
- - Correctness of an algorithm
  - Safety and liveness
  - Mapping system models to the real world
Summary

Masterclass 8.M: Mental models for resiliency in production systems

Embrace Failure (Assume unreliability)
- Timeouts and their antipatterns
- Retries and their antipatterns
- Circuit breakers: what they are
- Circuit breakers: antipatterns
- NTP sync for clocks
Isolate and contain (Limit blast radius)
- Avoid cascading failure
- Bulkhead pattern
- Async communication
Maintain consistency (Despite uncertainty)
- Safe retries: what
- Safe retries: how to achieve idempotency?
- - HTTP methods
  - Idempotency keys
  - Resource versioning / ETags
  - Database constraints
  - Transactional outbox pattern
  - State machines
- Ordering events for debugging
Observe and adapt (Know your system)
- Debugging and diagnosis through observability
- Validating resilience (chaos engineering)
- Learning from failures
- - Incident RCAs
  - Blameless postmortems
Define your boundaries and contracts (Manage dependencies)
- Inter-service communication
- - API contracts
  - API versioning

Case Study 8.C: Complex production incident triggered by payment-gateway scaling issues in an eCommerce platform

Chapter 9: Consistency and Consensus

Discussion 9.1 (Pages: 321-338)

Consistency Guarantees
Linearizability
- What Makes a System Linearizable?
- - Linearizability vs Serializability
- Relying on Linearizability
- - Locking and leader election
  - Constraints and uniqueness guarantees
  - Cross-channel timing dependencies
- Implementing Linearizable Systems
- - Linearizability and quorums
- The Cost of Linearizability
- - The CAP theorem
  - The unhelpful CAP Theorem
  - Linearizability and network delays

Discussion 9.2 (Pages: 339-352)

Ordering Guarantees
- Ordering and Causality
- - The causal order is not a total order
  - Linearizability is stronger than causal consistency
  - Capturing causal dependencies
- Sequence Number Ordering
- - Noncausal sequence number generators
  - Lamport timestamps
  - Timestamp ordering is not sufficient
- Total Order Broadcast
- - Using total order broadcast
  - Implementing linearizable storage using total order broadcast
  - Implementing total order broadcast using linearizable storage
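
The "Lamport timestamps" item in this discussion can be sketched as a (counter, node ID) pair where every node tracks the highest counter it has seen. The class below is a minimal illustration, not a production implementation; the `LamportClock` name and its method names are assumptions made for this example.

```python
class LamportClock:
    """Generates (counter, node_id) timestamps that respect causality."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counter = 0

    def tick(self):
        """Record a local event and return its timestamp."""
        self.counter += 1
        return (self.counter, self.node_id)

    def receive(self, timestamp):
        """Record the receipt of a message carrying another node's timestamp."""
        seen_counter, _ = timestamp
        # Jump ahead to the largest counter seen so far, then advance by one.
        self.counter = max(self.counter, seen_counter)
        return self.tick()


a = LamportClock("A")
b = LamportClock("B")
t1 = a.tick()        # local event on A
t2 = a.tick()        # A sends a message stamped t2
t3 = b.receive(t2)   # B receives it; B's timestamp is ordered after t2
t4 = b.tick()        # later local event on B
print(sorted([t4, t1, t3, t2]))
```

Timestamps compare first by counter and then by node ID, so sorting them yields a total order that is consistent with causality. As the "Timestamp ordering is not sufficient" item notes, that order is only known after the fact, which is what motivates total order broadcast.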

Discussion 9.3 (Pages: 352-364)

Distributed Transactions and Consensus
- The impossibility of consensus
- Atomic Commit and Two-Phase Commit (2PC)
- - From single-node to distributed atomic commit
  - Introduction to two-phase commit
  - Don't confuse 2PC and 2PL
  - A system of promises
  - Coordinator failure
  - Three-phase commit
- Distributed Transactions in Practice
- - Exactly-once message processing
  - XA transactions
  - Holding locks while in doubt
  - Recovering from coordinator failure
  - Limitations of distributed transactions

Discussion 9.4 (Pages: 364-375)

Distributed Transactions and Consensus
- Fault-Tolerant Consensus
- - Consensus algorithms and total order broadcast
  - Single-leader replication and consensus
  - Epoch numbering and quorums
  - Limitations of consensus
- Membership and Coordination Services
- - Allocating work to nodes
  - Service discovery
  - Membership services
Summary

Masterclass 9.M: Details of Paxos and Raft

Abstraction for fault tolerance: Consensus
Consensus vs Consistency
Fault-tolerant consensus
Paxos deep-dive
Raft deep-dive

Case Study 9.C: MongoDB down in production

Limitations of Raft

Chapter 10: Batch Processing

Discussion 10.1 (Pages: 389-397)

Batch Processing with Unix Tools
- Simple Log Analysis
- - Chain of commands versus custom program
  - Sorting versus in-memory aggregation
- The Unix Philosophy
- - A uniform interface
  - Separation of logic and wiring
  - Transparency and experimentation

Discussion 10.2 (Pages: 397-418)

MapReduce and Distributed Filesystems
- MapReduce Job Execution
- - Distributed execution of MapReduce
  - MapReduce workflows
- Reduce-Side Joins and Grouping
- - Example: analysis of user activity events
  - Sort-merge joins
  - Bringing related data together in the same place
  - Group by
  - Handling skew
- Map-Side Joins
- - Broadcast hash joins
  - Partitioned hash joins
  - Map-side merge joins
  - MapReduce workflows with map-side joins
- The Output of Batch Workflows
- - Building search indexes
  - Key-value stores as batch process output
  - Philosophy of batch process outputs
- Comparing Hadoop to Distributed Databases
- - Diversity of storage
  - Diversity of processing models
  - Designing for frequent faults

Discussion 10.3 (Pages: 418-431)

Beyond MapReduce
- Materialization of Intermediate State
- - Dataflow engines
  - Fault tolerance
  - Discussion of materialization
- Graphs and Iterative Processing
- - The Pregel processing model
  - Fault tolerance
  - Parallel execution
- High-Level APIs and Languages
- - The move toward declarative query languages
  - Specialization for different domains
Summary

Masterclass 10.M: Unix Shell-Scripting Internals

Case Study 10.C: Hands-on Apache Airflow

Chapter 11: Stream Processing

Discussion 11.1 (Pages: 439-451)

Transmitting Event Streams
- Messaging Systems
- - Direct messaging from producers to consumers
  - Message brokers
  - Message brokers compared to databases
  - Multiple consumers
  - Acknowledgements and redelivery
- Partitioned Logs
- - Using logs for message storage
  - Logs compared to traditional messaging
  - Consumer offsets
  - Disk space usage
  - When consumers cannot keep up with producers
  - Replaying old messages

Discussion 11.2 (Pages: 451-464)

Databases and Streams
- Keeping Systems in Sync
- Change Data Capture
- - Implementing change data capture
  - Initial snapshot
  - Log compaction
  - API support for change streams
- Event Sourcing
- - Deriving current state from the event log
  - Commands and events
- State, Streams, and Immutability
- - Advantages of immutable events
  - Deriving several views from the same event log
  - Concurrency control
  - Limitations of immutability

Discussion 11.3 (Pages: 464-481)

Processing Streams
- Uses of Stream Processing
- - Complex event processing
  - Stream analytics
  - Maintaining materialized views
  - Search on streams
  - Message passing and RPC
- Reasoning About Time
- - Event time versus processing time
  - Knowing when you are ready
  - Whose clock are you using, anyway?
  - Types of windows
- Stream Joins
- - Stream-stream join (window join)
  - Stream-table join (stream enrichment)
  - Table-table join (materialized view maintenance)
  - Time-dependence of joins
- Fault Tolerance
- - Microbatching and checkpointing
  - Atomic commit revisited
  - Idempotence
  - Rebuilding state after a failure
Summary

Masterclass 11.M: CDC Hands-on

Heterogeneous replication
- Debezium and CDC
- Hands-on
Use cases:
- Search → Postgres to Elasticsearch
- Caching → Postgres to Redis
- Search and reporting → Postgres to MongoDB
- Warehousing → Postgres/MongoDB to BigQuery/Hadoop/Teradata/etc.
Cost considerations for CDC
- Write amplification
- Network costs
- Architectural hygiene vs cost

Case Study 11.C: Real-Time Credit Card Fraud Detection

Chapter 12: The Future of Data Systems

Discussion 12.1 (Pages: 489-498)

Data Integration
- Combining Specialized Tools by Deriving Data
- - Reasoning about dataflows
  - Derived state versus distributed transactions
  - The limits of total ordering
  - Ordering events to capture causality
- Batch and Stream Processing
- - Maintaining derived state
  - Reprocessing data for application evolution
  - Schema migrations on railways
  - The lambda architecture
  - Unifying batch and stream processing

Discussion 12.2 (Pages: 499-515)

Unbundling Databases
- Composing Data Storage Technologies
- - Creating an index
  - The meta-database of everything
  - Making unbundling work
  - Unbundled versus integrated systems
  - What's missing?
- Designing Applications Around Dataflow
- - Application code as a derivation function
  - Separation of application code and state
  - Dataflow: Interplay between state changes and application code
  - Stream processors and services
- Observing Derived State
- - Materialized views and caching
  - Stateful, offline-capable clients
  - Pushing state changes to clients
  - End-to-end event streams
  - Reads are events too
  - Multi-partition data processing

Discussion 12.3 (Pages: 515-533)

Aiming for Correctness
- The End-to-End Argument for Databases
- - Exactly-once execution of an operation
  - Duplicate suppression
  - Operation identifiers
  - The end-to-end argument
  - Applying end-to-end thinking in data systems
- Enforcing Constraints
- - Uniqueness constraints require consensus
  - Uniqueness in log-based messaging
  - Multi-partition request processing
- Timeliness and Integrity
- - Correctness of dataflow systems
  - Loosely interpreted constraints
  - Coordination-avoiding data systems
- Trust, but Verify
- - Maintaining integrity in the face of software bugs
  - Don't just blindly trust what they promise
  - A culture of verification
  - Designing for auditability
  - The end-to-end argument again
  - Tools for auditable data systems

Discussion 12.4 (Pages: 533-544)

Doing the Right Thing
- Predictive Analytics
- - Bias and discrimination
  - Responsibility and accountability
  - Feedback loops
- Privacy and Tracking
- - Surveillance
  - Consent and freedom of choice
  - Privacy and use of data
  - Data as assets and power
  - Remembering the Industrial Revolution
  - Legislation and self-regulation
Summary

Conclusion

Discussion Retro.1

Let's celebrate our collective achievement with a final look back at how far we've come together. We'll share our biggest takeaways, gather your valuable insights, and officially conclude our memorable Odyssey.

Return to the DDIA odyssey page here.