Making database systems usable

C. J. Date's Sigmod 1983 keynote, "Database Usability", was prescient. Usability is the most important thing to the customers. They care less about impressive benchmarks or clever algorithms, and more about whether they can operate and use a database efficiently to query, update, analyze, and persist their data with minimal headache. (BTW, does anyone have a link to the contents of this Sigmod'83 talk? There is no transcript around, except for this short abstract.)

The paper we cover today is from Sigmod 2007. It takes on the database usability problem raised in that 1983 keynote head-on, and calls out that the king is still naked. 

Let's give some context for the year 2007. Yes, XML format was still popular then. The use-case in the paper is XQuery. The paper does not contain any  reference to json. MongoDB would be released in 2009 with the document model; and that seems to be great timing for some of the usability pains mentioned in the paper! Web 2.0 was in full swing, social media was rising. Cloud computing and big data was emerging. NoSQL movement was just starting.


Database usability is an important but wicked problem

Why is database usability important? "In places where database systems are used extensively, we find an army of database administrators, consultants, and other technical experts all busily helping users get data into and out of a database. For almost all organizations, the indirect cost of maintaining a technical support team far exceeds the direct cost of hardware infrastructure and database product licenses. Not only are support staff expensive, they also interpose themselves between the users and the databases. Users cannot interact with the database directly and are therefore less likely to try less straightforward operations. This hidden opportunity cost may be greater than the visible costs of hardware/software and technical staff."

This is spot on. And the paper follows up on this with flight booking example. You youngins may not remember, but pre-2005 we had to call a travel agent to book our flights and get paper tickets in return. This sucked, we don't have any transparency in to the process, we couldn't explore options (price, convenience, dates, airports) and customize our trip. Having access to flight booking via web was really a great improvement for user experience. 

For flight booking, slapping a web-search interface to the flight databases solved the problem, but you still needed the database including its querying engine for this solution. But things are not that easy for operationalizing databases. It is all about how you manage expectations when you are serving the customer (as I wrote before). The paper compares the database problem with the web search problem to make this point. Users expect the ability to query the database in a more sophisticated way than text based web-querying. Secondly, users expect more precise and complete answers from database search. Finally, users expect to create and update databases, not just query them in a read-only manner.

The big takeaway for me is that database usability is a wicked problem. This is a technical term, look it up. Maybe by defining it well, we can tame this a bit, and downgrade it from the wicked problem to the messy problem category. 


Case study: MIMI protein interaction database system

After the sequencing of the human genome was completed, biologists began focusing their attention on the proteins expressed by these genes, their interactions, and their functions. This team build MIMI to facilitate this, and collaborated with University of Michigan biologists throughout a multi-year project.

"Given the XML representation of the data, XQuery was our first choice for accessing the database. Indeed, some users wanted the power of a declarative query language, even if they didn’t have the training to write such queries. A majority of users, however, were complete technophobes and preferred forms-based interfaces. (Such interfaces do a good job today for specific applications—quite complex back-end queries can be run, for instance in an airline reservations database, while the user is shielded from this complexity by a simple form-based query interface.) Aside from these were a few users who wanted to download the entire dataset and write Perl scripts to slice and dice it. Our challenge in MiMI was to provide easy-to-use interfaces beyond a few hand- designed forms for some common queries. In fact, MiMI allows users to access data through various interfaces, which are depicted in Figure 1."

This sounds good on paper right? But when you have users, you run into a whole bunch of problems you haven't anticipated. The paper lists some of these anecdotes, and writes: "As we analyze our accomplishments, and more impor- tantly, the many remaining issues described above, we have come to realize that the usability of a database system is much more than skin deep. Our work on query interfaces may contribute towards the usability of a system, but they are far from enough to provide the optimal user experience. In the next section, we enunciate what we believe are the major database usability problems."

The title of the next section is "The persistence of pain".

The subsection titles are even funnier. Painful relations (this takes the prime spot!), Painful options, Unexpected pain, Unseen pain, and Birthing pain.


Painful relations

In this subsection, the paper complains about the relational model.

"Whereas a single table of data is natural for most peo- ple, joins between multiple tables are not. Unfortunately, normalization is at the center of relational design. Indeed normalization saves space, avoids update anomalies, and is a desirable property from many perspectives. However, the use of joins in a relational model does not retain the integrity of data objects that a user regards as one unit."


The paper gets back to the flight example, but this time from the POV of the backend database in Fig 2. It says that "Such “splattering” of data decreases the usability of the database in terms of schema comprehension, join computation, and query expression." 

Actually Figure 2 doesn't look that bad, but things devolve from there. "The users will have to stitch the information back together to answer most of the real queries. The fundamental issue is that joins destroy the connections between information pertaining to the same real world entities and are nonintuitive to most normal users. We note that many commercial database systems carefully denormalize their schema to reduce the number of joins required, although the purpose there is to speed up query evaluation. Finally, queries become painful to express across multiple tables. Because joins innately disrupt data cohesion, such queries are problematic for many users. For example, consider a query as simple as “Find all flights from Detroit to Beijing” in our airline database. Even though we are interested only in information about flights, the city names that specify the selection predicate are found only in the airports relation, which must be joined twice with flight info to express our query."

The paper mentions that: "The current solution to manage this pain is to hire database administrators (DBAs) and offer them copious amounts of money not to leave once they have learned the company’s database schema well."


The other pains

Painful options: Too many options overwhelms users. While simplicity is crucial for novices, experts require advanced tools. The challenge lies in balancing functionality for diverse user needs without cluttering the interface.

Unexpected pains: Systems often produce confusing results without explanation. Users need clear insights into why outcomes differ from their expectations. This requires transparent system behavior, explainability, and intuitive error messaging.

Unseen pain: The lack of WYSIWYG (What You See Is What You Get) interfaces makes query formulation difficult. Real-time, predictive capabilities during query construction help users refine searches effectively. This involves instant feedback and suggestions as users type.

Birthing pain: Creating and populating databases is complex for ordinary users. They struggle with schema design and data reorganization for existing structures. Simplifying these processes without sacrificing database integrity is crucial.


The painless future

There is no silver bullet. Of course, you knew this, because database usability is a wicked problem. The paper makes some suggestions, and leaves it at that. 

Presentation data models: Expand the concept of views beyond tables to support various presentation data models (object, geographic, network, multidimensional, tabular). This flexibility would allow users to interact with data in formats most suitable for their needs.

Integrated Provenance: Incorporate data provenance (both "why" and "where") into presentation models. This feature would help users understand the origin and transformations of their data, enhancing transparency and trust in results.

Intuitive Interaction: Develop presentation data models capable of direct manipulation through familiar actions like point-and-click and drag-and-drop. This approach would make databases more accessible to non-technical users, reducing the learning curve.

Flexible Schema Design: Support "schema-later" and "heterogeneous" database creation, allowing users to start with unstructured data and add structure as needed. This flexibility would enable faster database setup and evolution as requirements change.


Discussion

Ok, it is 17 years after the publication of this paper. Where are we at in terms of database usability? What were the biggest improvements you can think of? What were the steps taken to reduce the friction?

The Designing Data Intensive Applications Book (2017) goes into usability/maintainability discussion a bit in Chapters 1 and 2. It compares relational model and document model. It also talks about Object-relational mapping (ORM) frameworks a bit. But it seems like we are still long way from making databases usable. 

If nothing else, since we had to make database systems also distributed systems we added to these problems, especially for the operationalization of databases. We opened Pandora's box in terms of partial failures, unexplained performance problems due to geo-distribution, data migration, storage disaggregation, and metastable failures. Well we have our work cut out for us. 

Well, let's look at this more optimistically and consider recent developments that may be beneficial. How about LLMs? LLMs may help, but if not used intentionally in a principled manner, they make more of a mess. They would be adding to the painful options, unexpected pains. They probably can't do much to address the unseen pain, and maybe do a probabilistic (best effort) job in addressing birthing pain, which may make things worse. But applied in a principled way, I believe LLMs can go a long way to improve database usability. It is someone else's paper to write, and when this is written, I will be interested in learning from it. 

Comments

Anonymous said…
https://dl.acm.org/doi/10.1145/971695.582194 C.J.Date's paper

Popular posts from this blog

Hints for Distributed Systems Design

Learning about distributed systems: where to start?

Looming Liability Machines (LLMs)

Foundational distributed systems papers

Advice to the young

Linearizability: A Correctness Condition for Concurrent Objects

Understanding the Performance Implications of Storage-Disaggregated Databases

Scalable OLTP in the Cloud: What’s the BIG DEAL?

Designing Data Intensive Applications (DDIA) Book