[Discuss] Sync Revisited: inotify
Tom Metro
tmetro+blu at gmail.com
Wed Jul 30 17:57:43 EDT 2014
Richard Pieri wrote:
>> *BTSync*
>
> The in-memory databases peaked for me at around 500MB. They've made
> improvements for directories with lots of small files but there's a
> limit to how much optimization can be done. Note that this isn't a
> problem unique to BTSync; anything using in-memory databases for file
> metadata is going to chew up lots of RAM.
>
> Startup times are the slowest aspect. The software needs to scan
> everything to build up the in-memory databases. It's a killer on my
> notebook.
This could be avoided with architectural changes, specifically by using
inotify[1] on Linux to monitor the file system. (See also fsnotify[2].) A
background daemon would receive notifications of file changes and write
them to an indexed database. It would take the configured sync frequency
into account and delay writing high-frequency changes to disk. (For
example, if you sync hourly, there's no point in recording that a file was
modified 200 times in the last hour; one entry is enough.)
1. http://inotify.aiken.cz/?section=inotify&page=why&lang=en
2. http://lwn.net/Articles/318618/
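To make the proposed daemon a bit more concrete, here is a minimal Python sketch of its core. It assumes Linux with a glibc-style libc, and it calls inotify_init1() and inotify_add_watch() through ctypes rather than through a third-party binding. ChangeIndexer, the `changes` table and flush_interval are hypothetical names. It watches a single directory (inotify watches are not recursive, so a real daemon would add one watch per subdirectory). Repeated events for the same path overwrite a single in-memory entry, and entries reach SQLite only when the flush interval has elapsed. A queue overflow (IN_Q_OVERFLOW) means events were lost, so the sketch only flags that a full rescan is needed.

```python
import ctypes
import ctypes.util
import os
import sqlite3
import struct
import time

# inotify event bits, as defined in <sys/inotify.h> on Linux.
IN_CLOSE_WRITE = 0x00000008
IN_MOVED_FROM = 0x00000040
IN_MOVED_TO = 0x00000080
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200
IN_Q_OVERFLOW = 0x00004000  # always reported, never requested
WATCH_MASK = IN_CLOSE_WRITE | IN_MOVED_FROM | IN_MOVED_TO | IN_CREATE | IN_DELETE

# struct inotify_event header: int wd; uint32_t mask, cookie, len; then name.
EVENT_HEADER = struct.Struct("iIII")

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)


def _check(ret):
    """Turn a negative libc return value into an OSError."""
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


class ChangeIndexer:
    """Hypothetical daemon core: watch one directory (not recursively) and
    coalesce its change events into a SQLite table."""

    def __init__(self, directory, db_path, flush_interval=3600.0):
        # IN_NONBLOCK has the same value as O_NONBLOCK on Linux.
        self.fd = _check(libc.inotify_init1(os.O_NONBLOCK))
        _check(libc.inotify_add_watch(self.fd, os.fsencode(directory), WATCH_MASK))
        self.directory = directory
        self.flush_interval = flush_interval
        self.pending = {}  # path -> wall-clock time of its latest event
        self.needs_rescan = False
        self.last_flush = time.monotonic()
        self.db = sqlite3.connect(db_path)
        self.db.execute("CREATE TABLE IF NOT EXISTS changes"
                        " (path TEXT PRIMARY KEY, last_seen REAL)")

    def poll(self):
        """Drain all queued events into memory; return how many were read."""
        count = 0
        while True:
            try:
                buf = os.read(self.fd, 64 * 1024)
            except BlockingIOError:
                return count  # queue is empty
            offset = 0
            while offset < len(buf):
                _wd, mask, _cookie, name_len = EVENT_HEADER.unpack_from(buf, offset)
                offset += EVENT_HEADER.size
                name = buf[offset:offset + name_len].rstrip(b"\0")
                offset += name_len
                if mask & IN_Q_OVERFLOW:
                    self.needs_rescan = True  # events were dropped
                elif name:
                    path = os.path.join(self.directory, os.fsdecode(name))
                    # Repeat events for the same path overwrite one entry.
                    self.pending[path] = time.time()
                    count += 1

    def maybe_flush(self):
        """Write to disk only if the flush interval has elapsed."""
        if time.monotonic() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        with self.db:  # one transaction per flush
            self.db.executemany(
                "INSERT OR REPLACE INTO changes (path, last_seen) VALUES (?, ?)",
                list(self.pending.items()))
        self.pending.clear()
        self.last_flush = time.monotonic()

    def close(self):
        os.close(self.fd)  # closing the inotify fd also drops the watch
        self.db.close()
```

A real daemon would wait on `ix.fd` with select() or poll(), then call `ix.poll()` and `ix.maybe_flush()`. The sync tool would then read the `changes` table instead of walking the whole tree.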
This may not entirely eliminate the need for a full disk scan. An unclean
shutdown might leave change notifications unwritten, and an external drive
that was modified on another machine brings changes the daemon never saw.
So the tool would still need to do a full scan occasionally.
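The occasional full scan amounts to walking the tree and comparing it against the last known state. Here is a minimal sketch, assuming that file size plus modification time is an acceptable change signal. rsync's default quick check makes the same trade-off; hashing contents would be more robust but much slower. The function names are hypothetical.

```python
import os


def snapshot(root):
    """Map each file under root (by relative path) to (size, mtime_ns).
    Symlinks are recorded as links, not followed."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except FileNotFoundError:
                continue  # removed between listing and stat
            result[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return result


def diff_snapshots(old, new):
    """Return (added, removed, modified) relative paths, each sorted."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, removed, modified
```

A daemon could run this after an unclean shutdown, or when an external drive is mounted, and then resume trusting its change log.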
Another possibility would be to build a system that leverages the
journal of a journaling file system, though such a solution would
obviously be tightly coupled to a specific file system.
Even though inotify and file system journals are not cross-platform, that
doesn't negate the value of using these techniques to speed things up
on the Linux side.
There are some lesser-known sync tools that make use of inotify:
Lsyncd
https://code.google.com/p/lsyncd/
(installation guide)
https://docs.google.com/document/d/1XpqM5h5YMwuQqzdknyDDnjcQVYGjAsyAxfYprqSnhd0/edit?pli=1
sersync (scroll down for the English version; claims to use multiple
threads and to be optimized for large files)
https://code.google.com/p/sersync/
Openduckbill
https://code.google.com/p/openduckbill/
Of these, Lsyncd is the most recently updated (a year ago). The other two
haven't been updated in years. My guess is that a search of GitHub would
turn up some fresher but equally below-the-radar tools. So far none of
these has broken into the mainstream, though perhaps since I last checked
one of the better-known backup/sync tools has added inotify support.
There are inotify command-line tools (inotifywait/inotifywatch) and a
cron-like tool that fires scripts on inotify events rather than at
scheduled times (incron, see [1] above). Together they make it fairly easy
to hack together some scripts around an existing sync tool for more
efficient syncing. Two of the three projects above essentially do this.
-Tom
--
Tom Metro
The Perl Shop, Newton, MA, USA
"Predictable On-demand Perl Consulting."
http://www.theperlshop.com/