Detailed List of Sessions for DDIA Book Reading Odyssey
Return to the DDIA Odyssey page here.
Please note that this page lists only the recommended sessions. The optional accelerator sessions run regularly and are not listed here.
Introduction and Kick-off
Session Intro.1: What is the goal of reading?
- Bloom's taxonomy
- Why do YOU want to read DDIA?
- Power of abstractions and jargon
- What does the book have to offer?
Session Intro.2: Design Your Reading System
Chapter 1: Reliable, Scalable, and Maintainable Applications
Discussion 1.1 (Pages: 3-13)
- Thinking about data systems
- Reliability
- Hardware Faults
- Software Errors
- Human Errors
- How Important Is Reliability?
- Scalability
Discussion 1.2 (Pages: 13-23)
- Scalability
- Describing Performance
- Approaches for Coping with Load
- Maintainability
- Operability: Making Life Easy for Operations
- Simplicity: Managing Complexity
- Evolvability: Making Change Easy
- Summary
Masterclass 1.M: Intuitive view of system performance
- Basic building blocks of a system
- Simple view of a large complex system
- Breaking a system down into components
- Dealing with confusing numbers
- Fundamental laws of system performance
- Utilization law
- Little's law
- Response time law
- Forced flow law
- Flow balance assumption
- Understanding wait time vs service time vs response time
- Examples
- Deeper view of web-server and DB
- Performance of cache
- Performance with retries
- Performance with distributions (and not constants)
- Describing performance of a system
- Describing performance (scalability)
- Describing availability (reliability)
- Bottlenecks
- Solving for just correctness (and not scale)
- Finding the bottlenecks
- Focusing on bottlenecks and solving for them
Case Study 1.C: Intuitive view of your production system
- Build an intuitive model for your production system
- Discuss if you aren't able to solve some part of it
- Show and see the skeleton of systems trivially
Chapter 2: Data Models and Query Languages
Discussion 2.1 (Pages: 27-48)
- Relational Model Versus Document Model
- The Birth of NoSQL
- The Object-Relational Mismatch
- Many-to-One and Many-to-Many Relationships
- Are Document Databases Repeating History?
- The network model
- The relational model
- Comparison to document databases
- Relational Versus Document Databases Today
- Schema flexibility in the document model
- Data locality for queries
- Convergence of document and relational databases
- Query Languages for Data
- Declarative Queries on the Web
- MapReduce Querying
Discussion 2.2 (Pages: 49-64)
- Graph-Like Data Models
- Property Graphs
- The Cypher Query Language
- Graph Queries in SQL
- Triple-Stores and SPARQL
- The semantic web
- The RDF data model
- The SPARQL query language
- Graph databases compared to the network model
- The Foundation: Datalog
- Summary
Masterclass 2.M: Schema considerations for query performance
- Relational DB: Normalize vs denormalize
- Document DB: Nest vs separate
- Data warehouses: Schema considerations
- Time series DB: Schema considerations
Case Study 2.C: Cost considerations for DBs
- Pre-provisioned large capacity
- Auto-expanding disks
- Needless replicas
- Avoiding provisioning for peaks
Chapter 3: Storage and Retrieval
Discussion 3.1 (Pages: 69-79)
- Data Structures That Power Your Database
- Hash Indexes
- SSTables and LSM-Trees
- Constructing and maintaining SSTables
- Making an LSM-tree out of SSTables
- Performance optimizations
Discussion 3.2 (Pages: 79-90)
- Data Structures That Power Your Database
- B-Trees
- Making B-trees reliable
- B-tree optimizations
- Comparing B-Trees and LSM-Trees
- Advantages of LSM-trees
- Downsides of LSM-trees
- Other Indexing Structures
- Storing values within the index
- Multi-column indexes
- Full-text search and fuzzy indexes
- Keeping everything in memory
Discussion 3.3 (Pages: 90-104)
- Transaction Processing or Analytics?
- Data Warehousing
- The divergence between OLTP databases and data warehouses
- Stars and Snowflakes: Schemas for Analytics
- Column-Oriented Storage
- Column Compression
- Sort Order in Column Storage
- Several different sort orders
- Writing to Column-Oriented Storage
- Aggregation: Data Cubes and Materialized Views
- Summary
Masterclass 3.M: Indexes!
- Hash indexes
- GIN
- BRIN
- Full text indexes
- 2dsphere
- Hands-on!
Case Study 3.C: Production issues due to DB slowdown
- Slow query analysis: A primer
- Solve 5 simulated production issues
Chapter 4: Encoding and Evolution
Discussion 4.1 (Pages: 111-128)
- Formats for Encoding Data
- Language-Specific Formats
- JSON, XML, and Binary Variants
- Thrift and Protocol Buffers
- Field tags and schema evolution
- Datatypes and schema evolution
- Avro
- The writer's schema and the reader's schema
- Schema evolution rules
- But what is the writer's schema?
- Dynamically generated schemas
- Code generation and dynamically typed languages
- The Merits of Schemas
Discussion 4.2 (Pages: 128-140)
- Modes of Dataflow
- Dataflow Through Databases
- Different values written at different times
- Archival storage
- Dataflow Through Services: REST and RPC
- Web services
- The problems with remote procedure calls (RPCs)
- Current directions for RPC
- Data encoding and evolution for RPC
- Message-Passing Dataflow
- Message brokers
- Distributed actor frameworks
- Summary
Masterclass 4.M: Zero Downtime Deployments (ZDD)
- Easy: Backward-compatible rollouts
- Blue-green deployments
- Rolling deployments
- Hard: Backward-incompatible rollouts, through backward-compatibility layers
- What can't be done in a zero-downtime manner?
- Examples
- API changes
- DB schema changes
- Cache schema changes
- Queue changes
Case Study 4.C: ZDD in your production projects
- Talk about interesting variations of system evolution
- Talk about ugly downtime situations that you think could be avoided with ZDD
- Challenges that you see in getting ZDD implemented
Chapter 5: Replication
Discussion 5.1 (Pages: 151-161)
- Leaders and Followers
- Synchronous Versus Asynchronous Replication
- Setting Up New Followers
- Handling Node Outages
- Follower failure: Catch-up recovery
- Leader failure: Failover
- Implementation of Replication Logs
- Statement-based replication
- Write-ahead log (WAL) shipping
- Logical (row-based) log replication
- Trigger-based replication
Discussion 5.2 (Pages: 161-167)
- Problems with Replication Lag
- Reading Your Own Writes
- Monotonic Reads
- Consistent Prefix Reads
- Solutions for Replication Lag
Discussion 5.3 (Pages: 168-177)
- Multi-Leader Replication
- Use Cases for Multi-Leader Replication
- Multi-datacenter operation
- Clients with offline operation
- Collaborative editing
- Handling Write Conflicts
- Synchronous versus asynchronous conflict detection
- Conflict avoidance
- Converging toward a consistent state
- Custom conflict resolution logic
- Atomic conflict resolution
- What is a conflict?
- Multi-Leader Replication Topologies
Discussion 5.4 (Pages: 177-193)
- Leaderless Replication
- Writing to the Database When a Node Is Down
- Read repair and anti-entropy
- Quorums for reading and writing
- Limitations of Quorum Consistency
- Sloppy Quorums and Hinted Handoff
- Multi-datacenter operation
- Detecting Concurrent Writes
- Last write wins (discarding concurrent writes)
- The "happens-before" relationship and concurrency
- Concurrency, Time, and Relativity
- Capturing the happens-before relationship
- Merging concurrently written values
- Version vectors
- Summary
Masterclass 5.M: Replication hands-on with Postgres and MongoDB
Case Study 5.C: Google Docs (real-time collaborative editor)
Chapter 6: Partitioning
Discussion 6.1 (Pages: 199-209)
- Partitioning and Replication
- Partitioning of Key-Value Data
- Partitioning by Key Range
- Partitioning by Hash of Key
- Skewed Workloads and Relieving Hot Spots
- Partitioning and Secondary Indexes
- Partitioning Secondary Indexes by Document
- Partitioning Secondary Indexes by Term
Discussion 6.2 (Pages: 209-218)
- Rebalancing Partitions
- Strategies for Rebalancing
- How not to do it: hash mod N
- Fixed number of partitions
- Dynamic partitioning
- Partitioning proportionally to nodes
- Operations: Automatic or Manual Rebalancing
- Request Routing
- Summary
Masterclass 6.M: Partitioning considerations in the real world
- Triggers for partitioning
- Selecting partitioning key
- Online schema changes
- Versioned schemas for DBs
- Versioned schemas for queues
Case Study 6.C: Scaling for eCommerce flash sale
Chapter 7: Transactions
Discussion 7.1 (Pages: 221-232)
- The Slippery Concept of a Transaction
- The Meaning of ACID
- Atomicity
- Consistency
- Isolation
- Durability
- Replication and durability
- Single-Object and Multi-Object Operations
- Single-object writes
- The need for multi-object transactions
- Handling errors and aborts
Discussion 7.2 (Pages: 233-242)
- Weak Isolation Levels
- Read Committed
- No dirty reads
- No dirty writes
- Implementing read committed
- Snapshot Isolation and Repeatable Read
- Implementing snapshot isolation
- Visibility rules for observing a consistent snapshot
- Indexes and snapshot isolation
- Repeatable read and naming confusion
Discussion 7.3 (Pages: 242-251)
- Weak Isolation Levels
- Preventing Lost Updates
- Atomic write operations
- Explicit locking
- Automatically detecting lost updates
- Compare-and-set
- Conflict resolution and replication
- Write Skew and Phantoms
- Characterizing write skew
- More examples of write skew
- Phantoms causing write skew
- Materializing conflicts
Discussion 7.4 (Pages: 251-267)
- Serializability
- Actual Serial Execution
- Encapsulating transactions in stored procedures
- Pros and cons of stored procedures
- Partitioning
- Summary of serial execution
- Two-Phase Locking (2PL)
- Implementation of two-phase locking
- Performance of two-phase locking
- Predicate locks
- Index-range locks
- Serializable Snapshot Isolation (SSI)
- Pessimistic versus optimistic concurrency control
- Decisions based on an outdated premise
- Detecting stale MVCC reads
- Detecting writes that affect prior reads
- Performance of serializable snapshot isolation
- Summary
Masterclass 7.M: MVCC and SSI Implementation Details
- MVCC and SSI: Recap
- Construct a design for MVCC
- MVCC in PostgreSQL
- Row storage
- Versioning: transaction id
- Visibility rules for transactions
- Vacuuming
- SSI in PostgreSQL
- MVCC in MySQL
Case Study 7.C: Scaling for eCommerce flash sale
Chapter 8: The Trouble with Distributed Systems
Discussion 8.1 (Pages: 273-287)
- Faults and Partial Failures
- Cloud Computing and Supercomputing
- Unreliable Networks
- Network Faults in Practice
- Detecting Faults
- Timeouts and Unbounded Delays
- Network congestion and queueing
- TCP vs UDP
- Synchronous Versus Asynchronous Networks
- Can we not simply make network delays predictable?
- Latency and Resource Utilization
Discussion 8.2 (Pages: 287-299)
- Unreliable Clocks
- Monotonic Versus Time-of-Day Clocks
- Time-of-day clocks
- Monotonic clocks
- Clock Synchronization and Accuracy
- Relying on Synchronized Clocks
- Timestamps for ordering events
- Clock readings have a confidence interval
- Synchronized clocks for global snapshots
- Process Pauses
- Response time guarantees
- Limiting the impact of garbage collection
Discussion 8.3 (Pages: 300-312)
- Knowledge, Truth, and Lies
- The Truth Is Defined by the Majority
- The leader and the lock
- Fencing tokens
- Byzantine Faults
- System Model and Reality
- Correctness of an algorithm
- Safety and liveness
- Mapping system models to the real world
- Summary
Masterclass 8.M: Mental models for resiliency in production systems
- Embrace Failure (Assume unreliability)
- Timeouts and their antipatterns
- Retries and their antipatterns
- Circuit breakers: what they are
- Circuit breakers: antipatterns
- NTP sync for clocks
- Isolate and contain (Limit blast radius)
- Avoid cascading failure
- Bulkhead pattern
- Async communication
- Maintain consistency (Despite uncertainty)
- Safe retries: what
- Safe retries: how to achieve idempotency?
- HTTP methods
- Idempotency keys
- Resource versioning / ETags
- Database constraints
- Transactional outbox pattern
- State machines
- Ordering events for debugging
- Observe and adapt (Know your system)
- Debugging and diagnosis through observability
- Validating resilience (chaos engineering)
- Learning from failures
- Incident RCAs
- Blameless postmortems
- Define your boundaries and contracts (Manage dependencies)
- Inter-service communication
- API contracts
- API versioning
Case Study 8.C: Complex production incident triggered by payment-gateway scaling issues in an eCommerce platform
Chapter 9: Consistency and Consensus
Discussion 9.1 (Pages: 321-338)
- Consistency Guarantees
- Linearizability
- What Makes a System Linearizable?
- Linearizability vs Serializability
- Relying on Linearizability
- Locking and leader election
- Constraints and uniqueness guarantees
- Cross-channel timing dependencies
- Implementing Linearizable Systems
- Linearizability and quorums
- The Cost of Linearizability
- The CAP theorem
- The unhelpful CAP Theorem
- Linearizability and network delays
Discussion 9.2 (Pages: 339-352)
- Ordering Guarantees
- Ordering and Causality
- The causal order is not a total order
- Linearizability is stronger than causal consistency
- Capturing causal dependencies
- Sequence Number Ordering
- Noncausal sequence number generators
- Lamport timestamps
- Timestamp ordering is not sufficient
- Total Order Broadcast
- Using total order broadcast
- Implementing linearizable storage using total order broadcast
- Implementing total order broadcast using linearizable storage
Discussion 9.3 (Pages: 352-364)
- Distributed Transactions and Consensus
- The impossibility of consensus
- Atomic Commit and Two-Phase Commit (2PC)
- From single-node to distributed atomic commit
- Introduction to two-phase commit
- Don't confuse 2PC and 2PL
- A system of promises
- Coordinator failure
- Three-phase commit
- Distributed Transactions in Practice
- Exactly-once message processing
- XA transactions
- Holding locks while in doubt
- Recovering from coordinator failure
- Limitations of distributed transactions
Discussion 9.4 (Pages: 364-375)
- Distributed Transactions and Consensus
- Fault-Tolerant Consensus
- Consensus algorithms and total order broadcast
- Single-leader replication and consensus
- Epoch numbering and quorums
- Limitations of consensus
- Membership and Coordination Services
- Allocating work to nodes
- Service discovery
- Membership services
- Summary
Masterclass 9.M: Details of Paxos and Raft
- Abstraction for fault tolerance: Consensus
- Consensus vs Consistency
- Fault tolerant consensus
- Paxos deep-dive
- Raft deep-dive
Case Study 9.C: MongoDB down in production
Chapter 10: Batch Processing
Discussion 10.1 (Pages: 389-397)
- Batch Processing with Unix Tools
- Simple Log Analysis
- Chain of commands versus custom program
- Sorting versus in-memory aggregation
- The Unix Philosophy
- A uniform interface
- Separation of logic and wiring
- Transparency and experimentation
Discussion 10.2 (Pages: 397-418)
- MapReduce and Distributed Filesystems
- MapReduce Job Execution
- Distributed execution of MapReduce
- MapReduce workflows
- Reduce-Side Joins and Grouping
- Example: analysis of user activity events
- Sort-merge joins
- Bringing related data together in the same place
- Group by
- Handling skew
- Map-Side Joins
- Broadcast hash joins
- Partitioned hash joins
- Map-side merge joins
- MapReduce workflows with map-side joins
- The Output of Batch Workflows
- Building search indexes
- Key-value stores as batch process output
- Philosophy of batch process outputs
- Comparing Hadoop to Distributed Databases
- Diversity of storage
- Diversity of processing models
- Designing for frequent faults
Discussion 10.3 (Pages: 418-431)
- Beyond MapReduce
- Materialization of Intermediate State
- Dataflow engines
- Fault tolerance
- Discussion of materialization
- Graphs and Iterative Processing
- The Pregel processing model
- Fault tolerance
- Parallel execution
- High-Level APIs and Languages
- The move toward declarative query languages
- Specialization for different domains
- Summary
Masterclass 10.M: Unix Shell-Scripting Internals
Case Study 10.C: Hands-on Apache Airflow
Chapter 11: Stream Processing
Discussion 11.1 (Pages: 439-451)
- Transmitting Event Streams
- Messaging Systems
- Direct messaging from producers to consumers
- Message brokers
- Message brokers compared to databases
- Multiple consumers
- Acknowledgements and redelivery
- Partitioned Logs
- Using logs for message storage
- Logs compared to traditional messaging
- Consumer offsets
- Disk space usage
- When consumers cannot keep up with producers
- Replaying old messages
Discussion 11.2 (Pages: 451-464)
- Databases and Streams
- Keeping Systems in Sync
- Change Data Capture
- Implementing change data capture
- Initial snapshot
- Log compaction
- API support for change streams
- Event Sourcing
- Deriving current state from the event log
- Commands and events
- State, Streams, and Immutability
- Advantages of immutable events
- Deriving several views from the same event log
- Concurrency control
- Limitations of immutability
Discussion 11.3 (Pages: 464-481)
- Processing Streams
- Uses of Stream Processing
- Complex event processing
- Stream analytics
- Maintaining materialized views
- Search on streams
- Message passing and RPC
- Reasoning About Time
- Event time versus processing time
- Knowing when you are ready
- Whose clock are you using, anyway?
- Types of windows
- Stream Joins
- Stream-stream join (window join)
- Stream-table join (stream enrichment)
- Table-table join (materialized view maintenance)
- Time-dependence of joins
- Fault Tolerance
- Microbatching and checkpointing
- Atomic commit revisited
- Idempotence
- Rebuilding state after a failure
- Summary
Masterclass 11.M: CDC Hands-on
Case Study 11.C: Real-Time Credit Card Fraud Detection
Chapter 12: The Future of Data Systems
Discussion 12.1 (Pages: 489-498)
- Data Integration
- Combining Specialized Tools by Deriving Data
- Reasoning about dataflows
- Derived state versus distributed transactions
- The limits of total ordering
- Ordering events to capture causality
- Batch and Stream Processing
- Maintaining derived state
- Reprocessing data for application evolution
- Schema migrations on railways
- The lambda architecture
- Unifying batch and stream processing
Discussion 12.2 (Pages: 499-515)
- Unbundling Databases
- Composing Data Storage Technologies
- Creating an index
- The meta-database of everything
- Making unbundling work
- Unbundled versus integrated systems
- What's missing?
- Designing Applications Around Dataflow
- Application code as a derivation function
- Separation of application code and state
- Dataflow: Interplay between state changes and application code
- Stream processor and services
- Observing Derived State
- Materialized views and caching
- Stateful, offline-capable clients
- Pushing state changes to clients
- End-to-end event streams
- Reads are events too
- Multi-partition data processing
Discussion 12.3 (Pages: 515-533)
- Aiming for Correctness
- The End-to-End Argument for Databases
- Exactly-once execution of an operation
- Duplicate suppression
- Operation identifiers
- The end-to-end argument
- Applying end-to-end thinking in data systems
- Enforcing Constraints
- Uniqueness constraints require consensus
- Uniqueness in log-based messaging
- Multi-partition request processing
- Timeliness and Integrity
- Correctness of dataflow systems
- Loosely interpreted constraints
- Coordination-avoiding data systems
- Trust, but Verify
- Maintaining integrity in the face of software bugs
- Don't just blindly trust what they promise
- A culture of verification
- Designing for auditability
- The end-to-end argument again
- Tools for auditable data systems
Discussion 12.4 (Pages: 533-544)
- Doing the Right Thing
- Predictive Analytics
- Bias and discrimination
- Responsibility and accountability
- Feedback loops
- Privacy and Tracking
- Surveillance
- Consent and freedom of choice
- Privacy and use of data
- Data as assets and power
- Remembering the Industrial Revolution
- Legislation and self-regulation
- Summary
Conclusion
Discussion Retro.1
Let's celebrate our collective achievement with a final look back at how far we've come together. We'll share our biggest takeaways, gather your valuable insights, and officially conclude our memorable Odyssey.