Ben RUBSON
2018-04-09 10:28:55 UTC
Hi,
This follows the "Why does readdir() open files ?" discussion we had a few
days ago.
Thank you Corinna for your answers and support there!
So, the context is Cygwin, especially rsync, working over a FUSE FS.
This FUSE FS is assumed to be mounted on `/cygdrive/x` below.
I finally found that readdir() does not open every file.
`ls /cygdrive/x` does not fire any open() call. Perfect.
However, `ls -l /cygdrive/x` does: every file is opened, with read access.
The same goes for `rsync -an /cygdrive/x /tmp/`, which is a dry run that
only collects the files' attributes.
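To make the difference concrete, here is a minimal C sketch (not taken from the coreutils or rsync sources; `walk_dir` is a hypothetical helper) of the two access patterns. Plain `ls` only needs the names returned by readdir(), while `ls -l` and `rsync -an` also need each file's attributes, so they call lstat() on every entry. That per-entry lstat() is assumed to be where Cygwin's path checking, and therefore the extra open, comes in.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* List one directory level (hypothetical helper, for illustration).
 * do_stat == 0: names only, from readdir(), like plain `ls`.
 * do_stat != 0: also lstat() each entry, like `ls -l` or `rsync -an`,
 *               which need size, mode and times for every file.
 * Returns the number of entries seen (excluding "." and ".."),
 * or -1 if the directory cannot be opened. */
long walk_dir(const char *dir, int do_stat)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    long count = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (do_stat) {
            char path[4096];
            struct stat st;
            snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
            /* Assumption: on Cygwin each lstat() goes through
             * path_conv::check() -> symlink_info::check().
             * Entries that vanish or cannot be stat'ed are skipped. */
            if (lstat(path, &st) != 0)
                continue;
        }
        count++;
    }
    closedir(d);
    return count;
}
```

Running `walk_dir("/cygdrive/x", 0)` and then `walk_dir("/cygdrive/x", 1)` while watching the FUSE FS should show opens only in the second case, if the behaviour described here holds.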
I then went through the Cygwin code and found that the NtCreateFile/NtOpenFile
calls from symlink_info::check() in path.cc may be the culprits.
To confirm this, I added write access to these calls, and found that
every file was then opened with write access.
Later in this function there is a fallback to an NtQueryDirectoryFile call.
I assume (assume only, I may be wrong) this one does not open the requested
file.
So, as a test, I disabled the NtCreateFile/NtOpenFile calls above, but every
file was still opened, with read access.
I did not manage to find out where this came from.
So, any reason why `ls -l` (or `rsync -an`) opens every file ?
I assume this is because they stat every file, so path_conv::check(),
which calls symlink_info::check(), is performed on every path?
Could this be done without opening every file?
The goal is to avoid the performance cost of opening every file while
browsing or stat'ing huge directories, as rsync does, for example.
As an example, the "acl" mount option adds an open call for every file.
With the "noacl" option, I get a 23% time gain when browsing a file tree
(*).
I would therefore expect a similar improvement if we can avoid this last
remaining open call.
(*) Consistent tests performed on a tree of 10,000 (sub)directories and
1,000,000 files.
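For anyone wanting to reproduce this kind of measurement, here is a minimal C sketch of a timed full-tree walk. It is not the benchmark used for the figures above; `time_tree_walk` is a hypothetical helper. nftw() with FTW_PHYS lstat()s every entry without following symlinks, which is roughly the per-file work rsync does when collecting attributes. Running it on the same tree with "acl" and "noacl" mounts (or before and after a Cygwin change) would give comparable wall-clock times.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <sys/stat.h>
#include <time.h>

static long visited; /* entries seen by the current walk */

/* nftw() callback: nftw has already lstat()'ed the entry; just count it. */
static int count_entry(const char *path, const struct stat *sb,
                       int type, struct FTW *ftwbuf)
{
    (void)path; (void)sb; (void)type; (void)ftwbuf;
    visited++;
    return 0; /* keep walking */
}

/* Walk the whole tree under root (hypothetical helper), lstat()ing every
 * file and directory (FTW_PHYS: symlinks are not followed).
 * Returns elapsed wall-clock seconds, or -1.0 if the walk fails.
 * On success, *count receives the number of entries visited,
 * including root itself. */
double time_tree_walk(const char *root, long *count)
{
    struct timespec t0, t1;

    visited = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (nftw(root, count_entry, 64, FTW_PHYS) != 0)
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (count != NULL)
        *count = visited;
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Since page caching affects such timings heavily, averaging several runs per configuration is probably needed to get consistent numbers.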
Thank you very much for your support!
Best regards,
Ben