OSDI2022, continued

Venue, logistics, and travel

The conference was held at the Omni La Costa Resort in Carlsbad, CA. Carlsbad is 30 miles from San Diego, which still makes it an easy place to reach compared to earlier SOSP/OSDI venues. This didn't stop my Lyft driver from complaining; he asked me why the conference was not being held in one of the ample city venues. I told him this is part of the tradition.

The resort was nice and clean. It had splash pools, which was great for people traveling with kids. (I, of course, tried the water slides; they were fun.) The resort also had a huge golf course, around which I ran in the early mornings with my body clock still on eastern time.

San Diego is a very exotic place, 15 miles from the border with Tijuana, Mexico. There are palm trees everywhere. Just seeing the palm trees triggers a visceral relaxation reflex, as several friends confirmed. We started calling palm trees log-structured trees, in contrast to the regular, run-of-the-mill binary trees we see in the north.



The conference enforced strict vaccination and masking rules. Everyone was checked for vaccination cards. This was the first time someone checked my vaccination card. Everyone was masked in the meeting rooms. Having breakfast, lunch, and breaks in the open air in the foyer was nice. The food was good, but the coffee was terrible.

My flights were with Delta, connecting through Detroit. All my flights were punctual, knock on wood. The masking rate on the flights dropped to 1/20, the same rate as in supermarkets in Buffalo. When I flew to Seattle a month earlier, the masking rate on the flight was close to 1/3. It looks like people are giving up on masking, even though Covid spread has started to climb again.

Monday and Tuesday were busy all day with sessions, breaks, and evening programs at the conference. But we still found some time for extra sightseeing on Monday night and Wednesday afternoon. AbuTalib had a rental car and took Aleksey and me on excursions, including a 10pm ocean beach visit in Carlsbad on Monday night, a La Jolla beach walk on Wednesday afternoon, and dinners on Sunday and Wednesday at a nice, underrated Chinese restaurant.

Some statistics from the conference

253 papers were submitted to OSDI, of which 49 were accepted, giving a 19.4% acceptance rate. This was down from a record 400 submissions at OSDI 2020.

The 253 submissions led to 1130 reviews, 2900 online program committee (PC) discussion comments, and a 3-day online PC meeting. Marcos and Hakim, the PC chairs, included a nice cost calculation for the reviews, which is the first time I have seen this done. The reviewing for OSDI 22 alone cost a total of over \$1 million. This was calculated by assuming an average of 4.5 hours per review at a \$200 per hour salary, which results in an average cost of \$4K per submission, with 3-5 reviews for each paper. I was on the program committee and can attest that reviewing takes a lot of time out of an already very busy schedule. This \$1 million is of course a hidden cost: no actual money changed hands, and the institutions of the PC members are left to absorb it. The monetary cost being hidden doesn't mean that it should be ignored, so kudos to Marcos and Hakim for drawing attention to this. I don't know if a more sustainable model could be designed going forward. This cost did not even take into account the artefact reviewing effort for checking the reproducibility of the experiments in the papers (27 out of 49 papers got certificates).
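The chairs' estimate can be reproduced as a back-of-the-envelope calculation. The sketch below uses only the figures quoted above (1130 reviews, 4.5 hours per review, \$200 per hour, 253 submissions); the variable names are mine.

```python
# Back-of-the-envelope reproduction of the reviewing cost estimate.
# All inputs are the figures quoted in the post; nothing else is assumed.
SUBMISSIONS = 253
REVIEWS = 1130
HOURS_PER_REVIEW = 4.5
DOLLARS_PER_HOUR = 200

total_cost = REVIEWS * HOURS_PER_REVIEW * DOLLARS_PER_HOUR
cost_per_submission = total_cost / SUBMISSIONS
reviews_per_submission = REVIEWS / SUBMISSIONS

print(f"total: ${total_cost:,.0f}")                                # total: $1,017,000
print(f"per submission: ${cost_per_submission:,.0f}")              # per submission: $4,020
print(f"reviews per submission: {reviews_per_submission:.1f}")     # reviews per submission: 4.5
```

The numbers line up with the "over \$1 million" and "\$4K per submission" figures, and the average of about 4.5 reviews per paper sits inside the stated 3-5 range.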

Jay Lepreau best paper awards went to the following papers:

OSDI 2023 will be in Boston, colocated with ATC again. The submission deadline is in December.


Trustworthy open source: the consequences of success (Keynote from Eric Brewer)


Eric is the VP of infrastructure at Google. He talked about what has been keeping him busy and concerned: the security problems of opensource software. Opensource packages such as Linux, OpenSSL, LLVM, and Kubernetes are widely used in industry and government, and are also part of the critical infrastructure for electrical grids, water supplies, oil pipelines, telecommunications, and mobile networked devices.

It is no secret that opensource software has security problems. This is also the case with closed source software. It has been said that because many eyeballs look at opensource, it should be less prone to bugs and security problems. But the eyeballs on opensource are not always on the right things. 30% of packages have a single maintainer, and sometimes packages have zero eyes looking at issues. Some examples are:

  • event-stream (with 8 million downloads/week): an uninterested owner unknowingly handed it over to an attacker, who changed the software to do bitcoin mining
  • ua-parser-js: hackers stole the maintainer's credentials
  • left-pad: the owner removed the package, which broke many things. This package (used for padding the beginning of strings with characters) is only 11 lines, but was getting 2M downloads/week
  • colors.js: changed in protest of the invasion of Ukraine, deleted files if run in Russia


The problem is that these packages have been optimized for ease of use over accountability. 90% of vulnerabilities are in your dependencies, and it is not unusual to have hundreds of dependencies. As in the case of log4j, which is used in 8% of Java packages (Maven), a dependency is often ingrained deep in the stack of a project, requiring a fix to everything on the path. It is also possible to get stuck on an older version (because of semantic changes in newer ones) that can't be easily updated. To identify dependency problems, Eric mentioned the opensource tool they developed, https://deps.dev, which helps build accurate transitive dependency graphs for projects and comes with a web interface and BigQuery support.
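To make the "everything on the path" point concrete, here is a minimal sketch of a transitive dependency walk. It does not use deps.dev or any real package index; the graph is a hand-written, hypothetical example (all package names other than log4j are made up), and the function names are mine.

```python
from collections import deque

# Hypothetical dependency graph: package -> direct dependencies.
# The package names are made up for illustration, apart from log4j.
DEPS = {
    "my-app": ["web-framework", "metrics-lib"],
    "web-framework": ["http-client", "logging-facade"],
    "metrics-lib": ["logging-facade"],
    "logging-facade": ["log4j"],
    "http-client": [],
    "log4j": [],
}

def transitive_deps(graph, root):
    """Return every package reachable from root, excluding root itself."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue
        seen.add(pkg)
        queue.extend(graph.get(pkg, []))
    return seen

def paths_to(graph, root, target, path=None):
    """Return all dependency paths from root to target (graph assumed acyclic)."""
    path = (path or []) + [root]
    if root == target:
        return [path]
    found = []
    for dep in graph.get(root, []):
        found.extend(paths_to(graph, dep, target, path))
    return found

for p in paths_to(DEPS, "my-app", "log4j"):
    print(" -> ".join(p))
# my-app -> web-framework -> logging-facade -> log4j
# my-app -> metrics-lib -> logging-facade -> log4j
```

Even in this toy graph, my-app never names log4j directly, yet it reaches it through two separate paths, and every package on those paths may need a new release before the fix lands.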

Opensource is free, but it is provided as is. You can't send the developers, who are often hobbyists and volunteers, a legal letter and demand a fix. (It turns out that is exactly what one company did to the log4j open source maintainers.)

Eric said it is important to separate the two roles: distribution vs accountability. The package manager does the distribution, but accountability is about who is going to fix the package if it breaks. He proposed adding a curation layer to address accountability. This could be a support model like Red Hat's. One way to implement a curation model is to pay the maintainer for support. If the developers are not interested in providing support, others could be hired for curation. Given how important this role is for the security of critical infrastructure, it is important to put a curation model in place. Eric mentioned a Biden executive order and some regulations that would be coming with it.

Eric then talked about the two phases of a supply-chain attack: (1) create an attack factory, (2) use those assets. He mentioned SolarWinds as an example, but focused on the Codecov (an opensource code coverage package) example. A malicious actor got credentials from Mercari, a Japanese e-commerce retailer, using leaked credentials in a Docker container, and then modified the Codecov bash uploader with a single-line addition. This line copied the environment variables to the attacker's collection site. Credentials and secrets are typically passed in as environment variables. The attack farmed credentials for 2 months, and then, with credentials from Mercari and HashiCorp, the attackers accessed their GitHub repos. They could see anything in Mercari's private GitHub repos, scan logs, and other things.
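To see why a single added line was enough, note that any script run inside a CI job can read the job's entire environment. The sketch below is defensive and sends nothing anywhere: it only lists the names of variables that look like secrets, which is what a tampered uploader script would have been able to read. The marker list and the function name are my assumptions, not something from the talk.

```python
import os

# Hypothetical substrings that often appear in names of secret-bearing variables.
SECRET_MARKERS = ("TOKEN", "SECRET", "PASSWORD", "KEY", "CREDENTIAL")

def secret_like_names(env):
    """Return sorted names of variables whose name suggests a secret."""
    return sorted(
        name for name in env
        if any(marker in name.upper() for marker in SECRET_MARKERS)
    )

if __name__ == "__main__":
    # Names only, never values, so running this in CI does not leak anything.
    for name in secret_like_names(os.environ):
        print(name)
```

Running this as a CI step gives a rough inventory of what every third-party script in that job can see, which is a reasonable starting point for scoping secrets more narrowly.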

Eric mentioned that Google addresses supply-chain problems internally by leveraging Google-internal tools and practices, such as a monorepo, universal libraries, a single trusted build system, and proof that code review happened. But this solution is limited to Google. He mentioned helpful efforts such as sustainable trustworthy opensource and OpenSSF.
