Posts

Showing posts from November, 2024

DDIA: Chp 10. Batch Processing

Batch processing allows large-scale data transformations through bulk-synchronous processing. The simplicity of this approach allowed building reliable, scalable, maintainable applications with it. If you recall, "reliable-scalable-maintainable" was what we set out to learn when we began the DDIA book. The story of MapReduce starts when Google engineers realized there were a lot of repetitive tasks involved when computing over large data. These tasks often involved individually processing elements and then gathering and fusing their output. Interestingly, this bears a striking resemblance to the electromechanical IBM card-sorting machines of the 1940s and 1950s. MapReduce also got some inspiration from the map and reduce operations in Lisp: (map square '(1 2 3 4)) gives us (1 4 9 16), and (reduce + '(1 4 9 16)) gives us 30. The key innovation of Google's MapReduce framework was its ability to simplify parallel processing by abstracting away complex network communication and
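To make the Lisp example above concrete, here is a minimal Python sketch of the same map and reduce steps, followed by a toy single-process version of the "process each element, then gather and fuse" pattern. The `map_reduce` helper, its `mapper`/`reducer` parameters, and the word-count input are my own illustrative assumptions; this is not Google's MapReduce API, and it does no distribution or fault tolerance.

```python
from collections import defaultdict
from functools import reduce
from operator import add


def square(x):
    return x * x


# Same as (map square '(1 2 3 4)) and (reduce + '(1 4 9 16)) in the Lisp example.
squares = list(map(square, [1, 2, 3, 4]))
total = reduce(add, squares)


def map_reduce(records, mapper, reducer):
    """Hypothetical in-memory helper: map each record to (key, value) pairs,
    group the values by key, then reduce each group to one result."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


# Toy word count: the mapper emits (word, 1), the reducer sums the ones.
word_counts = map_reduce(
    ["a b a", "b c"],
    mapper=lambda line: [(word, 1) for word in line.split()],
    reducer=sum,
)

print(squares, total, word_counts)
```

The grouping step in the middle is the part a real framework would spread across machines; here it is just a dictionary.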

DBSP: Automatic Incremental View Maintenance for Rich Query Languages

Incremental computation represents a transformative (!) approach to data processing. Instead of recomputing everything when your input changes slightly, incremental computation aims to reuse the original output and efficiently update the results. Efficiently means performing work proportional only to input and output changes. This paper introduces DBSP, a programming language inspired by signal processing (hence the name DB-SP). DBSP is simple, yet it offers extensive computational capabilities. With just four operators, it covers complex database queries, including the entire relational algebra, set and multiset computations, nested relations, aggregations, recursive queries, and streaming computations. Basic DBSP operators: the language is designed to capture computation on streams. Streams are represented as infinite vectors indexed by consecutive time steps. Each stream can represent values of any type and incorporates basic mathematical operations like addition, subtraction, and a zero el
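As a rough illustration of "work proportional only to input and output changes", here is a small Python sketch that maintains a per-item count view from signed deltas rather than recomputing it from the full input. The `IncrementalCount` class and its `apply` method are hypothetical names for this sketch; they are not DBSP's operators or API, just one simple instance of the incremental idea.

```python
from collections import Counter


class IncrementalCount:
    """Hypothetical incrementally maintained view: item -> multiplicity."""

    def __init__(self):
        self.counts = Counter()

    def apply(self, delta):
        """Apply a change batch mapping item -> signed weight
        (+n for n insertions, -n for n deletions).
        Returns only the output entries that were touched,
        so the work done is proportional to the size of `delta`."""
        changed = {}
        for item, weight in delta.items():
            self.counts[item] += weight
            if self.counts[item] == 0:
                del self.counts[item]  # drop items that are no longer present
            changed[item] = self.counts.get(item, 0)
        return changed


view = IncrementalCount()
first = view.apply({"x": 2, "y": 1})    # initial load as one delta
second = view.apply({"x": -2, "z": 1})  # small change: delete x twice, insert z
snapshot = dict(view.counts)

print(first, second, snapshot)
```

The second call touches only `x` and `z`; the entry for `y` is never revisited, which is the point of updating from changes instead of from the whole input.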

UB Hacking 2024

I attended the University at Buffalo Hacking event over the weekend. It was fun. There were 90+ projects, and I judged 15 of them. There were some interesting talks as well. It was good to see youth energy. It feels good to teach the next generation something. Another thing: GeoGuessr played as a group game under time pressure is a lot of fun. This may be a great family activity. Why should you care about proofs, if all you want to do is coding? Atri and Andrew decided this would be a good talk to give at a hackathon. Daring! They did a good job imparting their passion about the joys and benefits of mathematical thinking. They talked about Paul Erdős's concept of "the book" of proofs, and the difference between a correct proof and a great proof from "the book". They talked about the deep insight that you can achieve through abstract mathematical thinking. They also mentioned that if you don't have good insight into the problem or your program, you will have a hard time de
