Thoughts on Monitoring file changes with Linux over the network

Monitoring a directory for changes on Linux is possible through the well-known inotify mechanism. With inotify you set a watch on a directory, choose which events on its contents to watch, and receive messages on a file descriptor when something happens. This works perfectly when the directory is on local storage, like a hard drive, an SSD or a USB drive. It is not sufficient when the directory is on a network filesystem, where the storage is on another computer. Another user working in the same directory, connected via the same or another filesystem, can remove a file, and the watch you've set will not be notified.
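As a minimal sketch of the local case, the following sets a watch and reads the events from the file descriptor. It is written in Python and calls the C library through ctypes, because the Python standard library has no inotify wrapper. The helper names (watch_directory, read_events, demo) are made up for this illustration, and it only runs on Linux.

```python
import ctypes
import os
import struct
import tempfile

# Event bits from <sys/inotify.h>.
IN_ATTRIB = 0x00000004
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200

# struct inotify_event starts with: int wd; uint32_t mask, cookie, len;
EVENT_HEADER = struct.Struct("iIII")

# dlopen(NULL): use the C library already loaded into this process.
libc = ctypes.CDLL(None, use_errno=True)


def watch_directory(path, mask=IN_CREATE | IN_DELETE | IN_ATTRIB):
    """Return an inotify file descriptor with one watch set on `path`."""
    fd = libc.inotify_init1(0)
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init1 failed")
    wd = libc.inotify_add_watch(fd, os.fsencode(path), mask)
    if wd < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, "inotify_add_watch failed")
    return fd


def read_events(fd):
    """Block until events arrive, then return them as (mask, name) pairs."""
    buf = os.read(fd, 4096)
    events, offset = [], 0
    while offset < len(buf):
        _wd, mask, _cookie, length = EVENT_HEADER.unpack_from(buf, offset)
        offset += EVENT_HEADER.size
        # The name is padded with NUL bytes up to `length`.
        name = buf[offset:offset + length].rstrip(b"\0").decode()
        offset += length
        events.append((mask, name))
    return events


def demo():
    """Create a file in a watched temporary directory and return the events."""
    with tempfile.TemporaryDirectory() as directory:
        fd = watch_directory(directory)
        try:
            open(os.path.join(directory, "a.txt"), "w").close()
            return read_events(fd)
        finally:
            os.close(fd)
```

On local storage the event for a.txt arrives as expected. If the same directory lived on a network filesystem and the file were created from another host, read_events would simply keep blocking.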

Why is this?

By design, inotify sees the result of an operation (like mkdir or chmod), but the kind of filesystem the watch is on is unknown (a black box) to inotify. The filesystem in turn does not "know" a watch has been set, and thus cannot take the right action, like notifying the remote host that somebody wants to watch a directory on it.

As long as you are the only user, there is no problem. It becomes a problem when more users are working in the directory you want to watch.

You can compare this behaviour with a public library. When you are the only user, you know which books are available and which are not, because you know which ones you have borrowed. This is no longer possible when other users are borrowing books as well.

In that case, somebody in the library should administer what is borrowed by any user (which is the usual case), and you have to contact this person to know if a book is available or not. That's like asking somebody to inform you when a book, which is currently not present, is available again.

With Linux, this step of asking the library to inform you does not work. Here, of course, the library is the remote storage, and the server is the "somebody" who works in the library.

Making this work with Linux means making sure the remote server is notified that a watch has been set.

Actually, filesystems like CIFS and recent versions of NFS contain support for sending a watch to the server: for CIFS, on line 6438 of fs/cifs/cifssmb.c of kernel 4.1.2, the SMB message for this (NT_TRANSACT_NOTIFY_CHANGE) is commented out but still present. It was commented out because it worked with dnotify, which has not been the default notification system for Linux for a long time now.

Making forwarding of watches work on Linux with network filesystems and FUSE is possible via kernel space.

Recently I've tried to implement this "forwarding of the watch to the server" with FUSE. I had to patch:

The fsnotify kernel subsystem, to notify the FUSE kernel module a watch has been set or removed on an inode.

The FUSE kernel module, to take action after being informed by fsnotify. I introduced a new operation code, FUSE_FSNOTIFY, which the kernel module sends to the userspace daemon together with the inode number and mask.

The FUSE library to receive and process the FUSE_FSNOTIFY call by calling the right function of the userspace filesystem.

The FUSE library to receive and process the fs events and report those back to the VFS.

Taking a closer look at how things work: once the userspace filesystem has successfully set the watch on its backend (note that it may also reply ENOSYS), the backend can send an event on the watch at any moment until the watch is removed. What to do with this event?

A possible scenario:

Introduce an extra FUSE opcode, FUSE_FSNOTIFY_EVENT. The library translates the mask in the event received from the backend protocol into something fsnotify understands, and sends it back to the FUSE module using the new opcode, together with the inode of the watch, the name of the entry, and the translated mask. The FUSE module in turn passes it to the fsnotify subsystem, which informs the listeners (inotify and/or fanotify) and tells them the event happened on the backend. This requires an extra event flag, for example IN_REMOTE in the inotify event mask and FAN_REMOTE for fanotify. It's up to the listener what to do with this information. The local VFS may or may not be up to date already.

Notes:

Translating a mask from a backend into something fsnotify understands ranges from very easy to not so easy, depending on the event. Basic events like the creation (or removal) of an entry in the watched directory are simple (FS_CREATE and FS_DELETE respectively), and a change of owner is not hard either (FS_ATTRIB), but a change to an extended attribute (SMB uses those a lot) can only be translated into something as generic as FS_ATTRIB.
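The translation could look roughly like the sketch below. The FS_* values are the ones defined in include/linux/fsnotify_backend.h. The backend event names are purely hypothetical, since every protocol has its own codes, and translate_backend_events is a name made up for this illustration.

```python
# Event bits as defined in include/linux/fsnotify_backend.h.
FS_ATTRIB = 0x00000004
FS_CREATE = 0x00000100
FS_DELETE = 0x00000200

# Hypothetical backend event names; a real protocol defines its own codes.
BACKEND_TO_FSNOTIFY = {
    "entry_created": FS_CREATE,
    "entry_removed": FS_DELETE,
    "owner_changed": FS_ATTRIB,
    "xattr_changed": FS_ATTRIB,  # nothing more specific is available
}


def translate_backend_events(backend_events):
    """Fold a list of backend event names into one fsnotify mask.

    Events without a known equivalent are silently skipped.
    """
    mask = 0
    for event in backend_events:
        mask |= BACKEND_TO_FSNOTIFY.get(event, 0)
    return mask
```

Note how owner_changed and xattr_changed collapse into the same bit: after translation the listener can no longer tell them apart.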

The FUSE module should check that the watch and/or inode is still valid, and that the event mask matches the mask of the watch.
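In the kernel this check would be written in C. As a model of the logic only, here is a Python sketch in which the set of active watches is assumed to be a plain dictionary from inode number to watch mask. The name should_deliver is invented for this example.

```python
# Event bits as defined in include/linux/fsnotify_backend.h.
FS_ATTRIB = 0x00000004
FS_CREATE = 0x00000100
FS_DELETE = 0x00000200


def should_deliver(watches, inode, event_mask):
    """Decide whether a reported remote event should reach the listeners.

    `watches` maps an inode number to the mask of the watch set on it.
    A missing inode means the watch has been removed in the meantime.
    """
    watch_mask = watches.get(inode)
    if watch_mask is None:
        return False  # watch (or inode) no longer valid
    return (watch_mask & event_mask) != 0  # at least one bit was asked for
```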

Extra mask bits IN_REMOTE (for inotify) and FAN_REMOTE (for fanotify) are required.

Duplicate events are to be avoided. This is tricky. Take, for example, the creation of a file in the watched directory on the same host the watch is on. When this operation succeeds, it causes a local fsnotify event FS_CREATE. It also causes an FS_CREATE | FS_REMOTE event: since the operation has been performed successfully on the backend, the backend sends a message that travels back along the path backend → FUSE library → FUSE kernel module → fsnotify subsystem → inotify and/or fanotify.

One way to tackle this is to ask the backend to only send events initiated by others. For the backend, it is pretty simple to compare the initiator (host) of a FS event with the host making the connection.
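On the backend side this filter is little more than a comparison. The sketch below assumes the backend records the initiating host with every event. The BackendEvent type and its fields are hypothetical and not taken from any real protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendEvent:
    initiator: str  # host that performed the operation (assumed to be known)
    name: str       # entry in the watched directory
    kind: str       # for example "create" or "delete"


def events_for_watcher(events, watcher_host):
    """Keep only the events that were not initiated by the watching host."""
    return [event for event in events if event.initiator != watcher_host]
```

With this in place, the host that created a file never gets its own create reported back, so no duplicate can occur for it.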

Another solution is to compare the reported event with the local cache in the FUSE library and FUSE module. In the example of creating a file, the library (and the FUSE module) should check whether the entry already exists in the watched directory. If not, the event was not initiated by this host. For a delete, this is similar.
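A sketch of this comparison, assuming the local cache of the watched directory can be represented as a set of entry names. The function name is_remote_origin is invented here, and races between the local operation and the arrival of the event are ignored.

```python
def is_remote_origin(kind, name, cached_entries):
    """Guess whether a reported create or delete came from another host.

    `cached_entries` is the set of names this host currently knows in
    the watched directory.
    """
    if kind == "create":
        # A local create would already have put the entry in the cache.
        return name not in cached_entries
    if kind == "delete":
        # A local delete would already have dropped the entry from the cache.
        return name in cached_entries
    raise ValueError(f"the cache cannot decide for event kind {kind!r}")
```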

For other events, like a file being written to or an owner being changed, this method is not sufficient. Additional information about what has changed remotely (like the new size or new owner) has to be in the message sent by the remote host.

If that information is not provided by the backend, another solution is to let the daemon that watches FS events on behalf of clients maintain a cache of recent local events. If a remote event is reported and no local equivalent is found in the cache, it was initiated by another host. This can become tricky because the events are reported over a connection for a certain user, and other users may or may not be authorized to receive them. And how big will this cache be?
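One possible answer to the size question is to bound the cache both in age and in number of entries. The sketch below does that. The class name, the limits and the (inode, name, mask) key are all assumptions of mine, and the per-user authorization problem is not modelled at all.

```python
import time
from collections import OrderedDict


class RecentLocalEvents:
    """Remember recent local events so that their remote echo can be dropped."""

    def __init__(self, max_age=2.0, max_entries=1024, clock=time.monotonic):
        self._max_age = max_age
        self._max_entries = max_entries
        self._clock = clock
        self._events = OrderedDict()  # (inode, name, mask) -> timestamp

    def _expire(self):
        """Drop entries older than max_age; the oldest entries come first."""
        cutoff = self._clock() - self._max_age
        while self._events:
            key, stamp = next(iter(self._events.items()))
            if stamp >= cutoff:
                break
            del self._events[key]

    def record(self, inode, name, mask):
        """Store a local event, evicting the oldest ones when the cache is full."""
        key = (inode, name, mask)
        self._events.pop(key, None)  # re-insert at the newest position
        self._events[key] = self._clock()
        while len(self._events) > self._max_entries:
            self._events.popitem(last=False)

    def is_echo(self, inode, name, mask):
        """Return True, and forget the entry, if a matching local event is cached."""
        self._expire()
        return self._events.pop((inode, name, mask), None) is not None
```

A remote event for which is_echo returns False would be treated as initiated by another host. Choosing max_age too small lets echoes through as duplicates, while choosing it too large risks swallowing a genuine remote event that happens to look the same.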

I've used FUSE above, but I think it's similar for other filesystems like CIFS and NFS.

Oh and yes, there is still another option: just poll every 5 seconds or so.
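For completeness, a rough sketch of what such polling could look like: take a snapshot of the directory, wait, take another, and compare. The function names and the choice of stat fields are mine. Anything that happens and is undone again between two polls goes unnoticed.

```python
import os
import time


def snapshot(path):
    """Map each entry name in `path` to the stat fields we compare on."""
    result = {}
    with os.scandir(path) as entries:
        for entry in entries:
            try:
                st = entry.stat(follow_symlinks=False)
            except FileNotFoundError:
                continue  # removed between listing and stat
            result[entry.name] = (st.st_mtime_ns, st.st_size,
                                  st.st_mode, st.st_uid, st.st_gid)
    return result


def diff_snapshots(old, new):
    """Return sorted lists of created, deleted and changed entry names."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    changed = sorted(name for name in old.keys() & new.keys()
                     if old[name] != new[name])
    return created, deleted, changed


def watch_by_polling(path, interval=5.0):
    """Yield a (created, deleted, changed) tuple whenever a poll finds a difference."""
    previous = snapshot(path)
    while True:
        time.sleep(interval)
        current = snapshot(path)
        diff = diff_snapshots(previous, current)
        if any(diff):
            yield diff
        previous = current
```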

Stef Bon