Auto-dropboxify all web-downloaded documents


I often download PDF presentations from the web, but when I reopen them on my laptop later, I wonder whether they have been updated in the meantime. It would be nice to have a service that automatically updates my local copy (à la Dropbox) whenever the authoritative source copy changes. (The difference from the Dropbox model is that here the source does not need to cooperate by deploying sync software such as Dropbox, or even be aware that its document is being synced.)

So here is my project idea: auto-sync all documents downloaded from the web by URL, so that the local copy of a file is kept up to date with the authoritative copy on the web. I think many people will find this useful. Other examples of web-downloaded documents that would benefit from auto-updating include city documents (e.g., street-parking rules, garbage collection rules), event calendars, tax documents, and CVs. (Do you have any other examples?)

This shouldn't be too hard to build. A completely client-side solution is possible: client-side software that keeps track of web-downloaded documents and periodically checks over HTTP whether any of them has changed would do it. If a change is detected, the software should download the new copy and, the next time the user opens the local document, prompt her to choose between the updated copy and the old one.
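To make the client-side check concrete, here is a minimal sketch in Python. It assumes the server supports standard HTTP conditional requests (the ETag and Last-Modified validators), and it falls back to comparing a SHA-256 hash of the body when the server ignores them. The names TrackedDoc, http_fetch, conditional_headers, and check_for_update are hypothetical, made up for this illustration; the fetcher is passed in as a parameter so the logic can be tried without touching the network.

```python
import hashlib
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class TrackedDoc:
    """Hypothetical record kept for each web-downloaded document."""
    url: str
    etag: Optional[str] = None           # validator from the last download
    last_modified: Optional[str] = None  # validator from the last download
    sha256: Optional[str] = None         # hash of the last downloaded body


# A fetcher takes (url, request headers) and returns
# (status code, lower-cased response headers, body).
Fetcher = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


def http_fetch(url: str, headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Real fetcher built on urllib; network failures other than HTTP errors propagate."""
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=30) as resp:
            found = {k.lower(): v for k, v in resp.headers.items()}
            return resp.status, found, resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for non-2xx statuses, including 304 Not Modified.
        return err.code, {}, b""


def conditional_headers(doc: TrackedDoc) -> Dict[str, str]:
    """Ask the server to send the body only if the document changed."""
    headers = {}
    if doc.etag:
        headers["If-None-Match"] = doc.etag
    if doc.last_modified:
        headers["If-Modified-Since"] = doc.last_modified
    return headers


def check_for_update(doc: TrackedDoc, fetch: Fetcher = http_fetch) -> Optional[bytes]:
    """Return the new body if the document changed, otherwise None."""
    status, headers, body = fetch(doc.url, conditional_headers(doc))
    if status != 200:
        # 304 means unchanged; any other status is treated as "no known change".
        return None
    digest = hashlib.sha256(body).hexdigest()
    changed = digest != doc.sha256
    # Remember the validators and hash for the next periodic check.
    doc.etag = headers.get("etag")
    doc.last_modified = headers.get("last-modified")
    doc.sha256 = digest
    return body if changed else None
```

In this sketch a document whose hash was never recorded is reported as changed on the first check, so a real client would record the hash at download time. A returned body would be saved next to the local file as a pending copy, so the user can be asked which version to keep the next time she opens the document.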

Of course, a cloud-hosted, push-based synchronization solution would be more scalable and efficient. It would also be gentler on the original content provider: instead of thousands of clients each periodically checking for updates, only the cloud-hosted synchronization service checks whether a document has changed. Upon detecting an update, the cloud service pushes the updated document to all clients that possess it. I am sure there are a lot of details to work out to make this service efficient. And there may even be a way to monetize this service eventually, who knows? This cloud-hosted synchronization service can be seen as extending CDNs to reach out and embrace PCs and laptops as the last hop for content replication.
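The core of the cloud-hosted variant is a registry that maps each URL to the clients holding a copy, so the origin server sees one check per document regardless of how many clients subscribe. Here is a rough sketch of that fan-out in Python. Everything in it is an assumption for illustration: the SyncService class, its method names, and the injected check and push callables (how updates are detected and how they actually reach a client are left open).

```python
from collections import defaultdict
from typing import Callable, Dict, Optional, Set


class SyncService:
    """Hypothetical cloud-side registry: one origin check per URL, pushed to all holders."""

    def __init__(
        self,
        check: Callable[[str], Optional[bytes]],
        push: Callable[[str, str, bytes], None],
    ) -> None:
        # check(url) returns the new body if the document changed, else None.
        self._check = check
        # push(client_id, url, body) delivers an update to one client.
        self._push = push
        self._subscribers: Dict[str, Set[str]] = defaultdict(set)

    def subscribe(self, client_id: str, url: str) -> None:
        """Record that a client holds a local copy of the document at url."""
        self._subscribers[url].add(client_id)

    def poll_once(self) -> int:
        """Check every tracked URL exactly once; return the number of pushes made."""
        pushed = 0
        for url, clients in self._subscribers.items():
            new_body = self._check(url)
            if new_body is None:
                continue  # unchanged: the origin was contacted once, nobody is notified
            for client_id in sorted(clients):
                self._push(client_id, url, new_body)
                pushed += 1
        return pushed
```

With three clients subscribed to the same URL, a polling round contacts the origin once and makes three pushes, which is where the savings over pure client-side polling come from.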

A further generalization of this "auto-sync documents downloaded from the web" idea is to allow any host to modify or update the document while still keeping all the other copies in sync. This then turns into a web-scale, application-level virtual filesystem. Implementing that would be trickier.

Comments

Ted Herman said…
What about www.archive.org? Or, as Fukuyama suggested, is history ended (though that's not what he meant in this context)?
Murat said…
@ Ted Herman
My idea is selfish. I don't have the persistence or archiving of web documents (for the good of mankind) in mind. I am lazy, and I don't want to check on the web whether the documents I downloaded earlier got updated. If a document is updated, I get access (or at least the choice to access) to the updated copy.
Mert Emin said…
Deploy Button (https://deploybutton.com) works the other way around. You can keep your documents on the web up to date with the ones in your Dropbox, GitHub, etc.
Good tips. I've been a Dropbox user for years, and this is very helpful!
