Posts

Showing posts from November, 2024

Blood draw

[trigger warning: blood] I had my first blood draw in 13 years yesterday. The lengthy gap is not random. My last blood draw had gone horribly wrong. The last time That previous visit had been for a fasting blood draw. Until then, I'd never had issues with blood draw before. Nurses always complimented me on my veins. One of them said that I had "veins like a garden hose, they are impossible to miss." So, I felt zero fear/anxiety for the procedure. But I think I made the mistake of looking at the syringe. My blood flows fast, and blood draw goes twice as quickly for me than for my wife. I saw my bright red blood enthusiastically gushing the syringe. That was the last thing I remember.  Then came nothing. It was the void, and after an indeterminate time, came the reboot.  I literally experienced my brain booting up like an old Intel 488 computer booting up DOS. First the BIOS kicked in, it checked for available memory, scanned disk to locate the operating system, and starte...

Everything is a Transaction: Unifying Logical Concurrency Control and Physical Data Structure Maintenance in Database Management Systems

Image
This paper from CIDR'21 introduces the Deferred Action Framework (DAF). This framework aims to unify transaction control and data structure maintenance under multi-version concurrency control (MVCC) database systems, particularly for complex maintenance tasks like garbage collection and index cleanup. In MVCC systems, transactions and data maintenance tasks (like garbage collection) often interfere, requiring complex coordination. This can lead to issues in system performance and code maintainability, as physical structures (e.g., B+ trees) aren't inherently designed to handle multi-versioning. The paper proposes DAF to handle deferred actions—maintenance tasks that are queued to execute only when they no longer interfere with active transactions. DAF relies on timestamp-based ordering and executes tasks only when their actions won’t affect concurrent transactions. DAF guarantees to process actions deferred at some timestamp t only after all transactions started before t have ...

DDIA: Chp 10. Batch Processing

Image
Batch processing allows large-scale data transformations through bulk-synchronous processing. The simplicity of this approach allowed building reliable, scalable, maintainable applications with it. If you recall, "reliable-scalable-maintainable" was what we set out to learn when we began the DDIA book. This story of MapReduce starts when Google engineers realize there were a lot of repetitive tasks involved when computing over large data. These tasks often involved individually processing elements and then gathering and fusing their output. Interestingly, this bores a striking resemblance to electromechanical IBM card-sorting machines from the 1940-50s. MapReduce also got some inspiration from the map reduce operations in Lisp: (map square '(1 2 3 4)) gives us  (1 4 9 16), and (reduce + '(1 4 9 16))  gives us 30. The key innovation of Google's MapReduce framework was its ability to simplify parallel processing by abstracting away complex network communication and ...

DBSP: Automatic Incremental View Maintenance for Rich Query Languages

Image
Incremental computation represents a transformative (!) approach to data processing. Instead of recomputing everything when your input changes slightly, incremental computation aims to reuse the original output and efficiently update the results. Efficiently means performing work proportional only to input and output changes. This paper introduces DBSP, a programming language inspired by signal processing (hence the name DB-SP). DBSP is  simple, yet it offers extensive computational capabilities. With just four operators, it covers complex database queries, including entire relational algebra, set and multiset computations, nested relations, aggregations, recursive queries, and streaming computations. Basic DBSP operators  The language is designed to capture computation on streams. Streams are represented as infinite vectors indexed by consecutive time. Each stream can represent values of any type and incorporates basic mathematical operations like addition, subtraction, and ...

UB Hacking 2024

I attended the University at Buffalo Hacking event over the weekend. It was fun. There were 90+ projects, I judged 15 projects. There were some interesting talks as well. It was good to see youth energy. It feels good to teach next generation something. Another thing,  GeoGuessr played as a group game under time pressure is a lot of fun. This may be a great family activity. Why should you care about proofs, if all you want to do is coding? Atri and Andrew decided this would be a good talk to give at a hackathon. Daring! They did a good job imparting their passion about the joys and benefits of mathematical thinking. They talked about Paul Erdos 's the book of proofs concept, and the difference between a  correct proof versus a great proof from "the book". They talked about the deep insight that you can achieve through an abstract mathematical thinking. They also mentioned that if you don't have good insight to the problem or your program, you will have a hard time de...

Popular posts from this blog

Hints for Distributed Systems Design

Learning about distributed systems: where to start?

Making database systems usable

Looming Liability Machines (LLMs)

Advice to the young

Foundational distributed systems papers

Distributed Transactions at Scale in Amazon DynamoDB

Linearizability: A Correctness Condition for Concurrent Objects

Understanding the Performance Implications of Storage-Disaggregated Databases

Designing Data Intensive Applications (DDIA) Book