Deep reading

Twenty years ago, a well-known professor in the computer networking field told me that he reviews any paper in 30 minutes. Not just reading the paper, but also writing the review, mind you. All in 30 minutes!

I said, "I am slow; it takes me 4 hours to read a paper." I lied. It actually took me 8+ hours to read a paper, because I was a graduate student and didn't have much background or paper-reading experience.

Things have improved, but it still takes me 4-8 hours to read a paper and understand everything so I can write an informative conference review or blog post about it. (Of course, I am talking about good research papers, not content-free publication-for-the-sake-of-publication papers.)

Maybe because of that early encounter with that flamboyant networking professor, I always felt that I was very slow at reading papers. I checked with fellow professors who read 100+ papers a year. They also tell me that it takes them 4-8 hours. And if the paper is involved, draws on pieces from different domains, or has some unfamiliar flavors, it takes much more than that.

I am writing this because I think more people should speak up about how time-consuming it is to read a paper.

Skimming versus reading

A lot of people responded to my Twitter poll saying that "reading a paper" is undefined. Maybe I should have clarified that I meant reading in order to write a conference peer review of the paper. But, then again, if somebody says "Yes, I read that paper" but insists that reading that 10-page research paper took less than an hour, I think they shouldn't have used "read" in the first place. They should have said, "I skimmed the paper."

I have never been good at skimming papers. So for me "reading a paper" has always meant reading to fully understand it. I am a contextual learner (maybe the technical term is a constructivist learner), so I feel the need to fully understand. I cannot draw any lesson from a paper by skimming it. I treat any paper with a healthy dose of suspicion, so I refuse to draw any lesson before I fully understand what is being done. If you need more elaboration on this, I wrote a post 9 years ago about how I read a research paper.

I remember walking to the office of our distributed systems seminar professor in 2000. He had suggested that, for the seminar class, students should only skim the paper for one hour, and we could use the two-hour seminar class to understand the paper together. I complained to him that I was incapable of skimming and half-understanding a paper. When I read a paper I need to fully understand it, and I cannot let the paper go with a superficial reading, because then I don't understand anything. (These were distributed systems theory papers, so I still think I had a valid point there and was not just being cocky.) I tried to pitch him the idea that students should come having grappled with the paper for many hours, and that we should use the seminar hours for deeper discussions that deepen our understanding.

I still strongly believe in a deep-work approach to reading a paper. When you are done reading the paper, you should fully understand it, know on what grounds and in what context you can refute some of its claims, and know which parts of it are fertile ground for extension or future work.

Whenever I advocate this deep-reading approach, I get pushback from some people that this is too radical and extreme. Of course, don't apply it to a random paper that you may not be interested in understanding. But if you are interested in the problem, and it is a good research paper, I think this is the approach that provides the most value.

Comments

Unknown said…
I was reading the article you wrote 9 years ago ("How I read a research paper"), advised by my professor.
