Discussion:
Why does (stat() ?) open files ?
Ben RUBSON
2018-04-09 10:28:55 UTC
Hi,

This follows the "Why does readdir() open files ?" discussion we had a few
days ago.
Thank you Corinna for your answers and support there !

So, context is Cygwin, especially rsync, working over a Fuse FS.
This Fuse FS is assumed to be mounted on `/cygdrive/x` below.

I finally found that readdir() does not open every file.
`ls /cygdrive/x` does not fire any open() call. Perfect.

However, `ls -l /cygdrive/x` does: every file is opened, with read access.
So does `rsync -an /cygdrive/x /tmp/`, which is a dry run that only gathers
file attributes.

I then went through Cygwin code and found that NtCreateFile/NtOpenFile
calls from symlink_info::check() in path.cc may be the culprits.
To demonstrate this I added write access to these calls, and found that
every file was then opened with write access.

Later in this function we have a fallback to a NtQueryDirectoryFile call.
I assume (assume only, I may be wrong) this one does not open the requested
file.
So as a test I disabled NtCreateFile/NtOpenFile calls above, but every file
was then still opened, with read access.
I did not manage to find out where this came from.

So, any reason why `ls -l` (or `rsync -an`) opens every file ?
I assume this is because they stat every file, so path_conv::check(),
calling symlink_info::check(), is performed on every path ?
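
For illustration, here is a minimal sketch of the access pattern in question:
a directory walk in which readdir() only yields names, so each entry needs
its own lstat() call to obtain its attributes. The function name
`stat_every_entry` is made up for this example, and the sketch only shows the
POSIX side. Whether each lstat() ends up in path_conv::check() on Cygwin is
the assumption stated above, not something the code proves.

```c
#define _XOPEN_SOURCE 700   /* for lstat() and PATH_MAX under strict C modes */
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: walk one directory the way a long listing has to.
   readdir() returns names only, so every entry gets its own lstat() call.
   Returns the number of entries successfully stat'ed, or -1 if the
   directory cannot be opened. */
long stat_every_entry(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    long count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        /* Skip the directory itself and its parent. */
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;

        char path[PATH_MAX];
        int n = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;               /* path too long for the buffer, skip it */

        struct stat st;
        /* One attribute query per file: this is the call that, on the FS
           side, shows up as one open per file in the observations above. */
        if (lstat(path, &st) == 0)
            count++;
    }
    closedir(d);
    return count;
}
```

Calling `stat_every_entry("/cygdrive/x")` while tracing the FS side should
reproduce the one-open-per-file behaviour seen with `ls -l`.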

Could we make this without opening every file ?

Goal is to avoid the performance impact of opening every file while
browsing / stating huge directories, which for example rsync does.

As an example, the mount option "acl" adds an open call to every file.
With the "noacl" option, browsing a file tree takes 23% less time (*).
I would then expect a similar improvement if we can avoid this last
remaining open call.
(*) (consistent tests performed on a tree of 10.000 (sub)directories and
1.000.000 files)

Thank you very much for your support !

Best regards,

Ben
Corinna Vinschen
2018-04-09 10:52:34 UTC
Post by Ben RUBSON
I then went through Cygwin code and found that NtCreateFile/NtOpenFile calls
from symlink_info::check() in path.cc may be the culprits.
To demonstrate this I added write access to these calls, and found that
every file was then opened with write access.
Later in this function we have a fallback to a NtQueryDirectoryFile call.
I assume (assume only, I may be wrong) this one does not open the requested
file.
It's nice that you're testing all this, but you should ask *why* Cygwin
does it in the first place. The reason is that the information one can
gather without opening the file on Windows is insufficient to fill in
all of the stat struct. The directory info returned by
NtQueryDirectoryFile just isn't sufficient, thus it's only a fallback.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
Ben RUBSON
2018-04-09 12:12:39 UTC
Post by Corinna Vinschen
It's nice that you're testing all this, but you should ask *why* Cygwin
does it in the first place. The reason is that the information one can
gather without opening the file on Windows is insufficient to fill in
all of the stat struct. The directory info returned by
NtQueryDirectoryFile just isn't sufficient, thus it's only a fallback.
Corinna, thank you very much for your answer.
What info would be missing without opening the file ?

Do you know where the open call could come from, when only using
NtQueryDirectoryFile in symlink_info::check() ?
(certainly related to the previous question)

Thank you !

Ben
Corinna Vinschen
2018-04-09 13:07:42 UTC
Post by Ben RUBSON
Corinna, thank you very much for your answer.
What info would be missing without opening the file ?
uid, gid, number of links.
Post by Ben RUBSON
Do you know where the open call could come from, when only using
NtQueryDirectoryFile in symlink_info::check() ?
(certainly related to the previous question)
The answer here is that your FS should handle the open call differently
depending on the access mask. The NtCreateFile call in symlink_info::check
opens the file with READ_CONTROL | FILE_READ_ATTRIBUTES | FILE_READ_EA
only. All these access flags only request *meta* info, not actual data
from the data stream of the file. In other words, a Windows open call
without FILE_READ_DATA/FILE_WRITE_DATA and related flags does not
actually have to open the file at all in the FS driver. It only has to
provide metadata subsequently, an operation which you usually can have at
much lower cost if the remote FS is running on a POSIX OS:

- NtCreateFile called with only metadata access flags does not have to
open the file.

- Make sure to ignore the EaBuffer and EaLength parameters, rather than
to return a failure (this avoids yet another NtOpenFile call).

- Just call stat/statvfs on the remote file as required to fulfill
subsequent NtQueryVolumeInformationFile and NtQueryInformationFile
calls.
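
The first point could be sketched roughly as follows on the FS side. This is
only an illustration under stated assumptions, not WinFsp or Cygwin code: the
`ACC_*` names and `wants_metadata_only` are made up for the example, the bit
values are the ones documented for the corresponding Windows access rights,
and allowing SYNCHRONIZE in the metadata-only set is an extra assumption on
top of the three flags named above. The check is a whitelist, so generic
rights or any unknown bit are treated as needing a real open.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of Windows access-right bits (values as documented in the
   Windows SDK), defined under hypothetical ACC_* names so the sketch
   builds on any platform. */
#define ACC_FILE_READ_DATA        0x00000001u
#define ACC_FILE_WRITE_DATA       0x00000002u
#define ACC_FILE_READ_EA          0x00000008u
#define ACC_FILE_READ_ATTRIBUTES  0x00000080u
#define ACC_READ_CONTROL          0x00020000u
#define ACC_SYNCHRONIZE           0x00100000u
#define ACC_GENERIC_READ          0x80000000u

/* Rights that only ask for metadata. SYNCHRONIZE is an assumption of this
   sketch; the other three are the flags used by symlink_info::check(). */
#define ACC_METADATA_ONLY_MASK \
    (ACC_READ_CONTROL | ACC_FILE_READ_ATTRIBUTES | \
     ACC_FILE_READ_EA | ACC_SYNCHRONIZE)

/* Returns true when every requested right is in the metadata-only set,
   i.e. the driver could answer later queries from a stat-like lookup
   instead of opening the file's data stream. */
bool wants_metadata_only(uint32_t desired_access)
{
    return (desired_access & ~ACC_METADATA_ONLY_MASK) == 0;
}
```

A driver's create handler would call this on the requested access mask and
take the cheap path only when it returns true.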

Does that make sense?


Corinna
Ben RUBSON
2018-04-10 14:28:54 UTC
Post by Corinna Vinschen
Post by Ben RUBSON
Corinna, thank you very much for your answer.
What info would be missing without opening the file ?
uid, gid, number of links.
If we use the "noacl" mount option, I think uid and gid do not really make
sense and could be forced to some default value ?
The number of links should not be critical either.
Thus gathering files' information without opening them could be possible.
However, it sounds like the more correct way to handle this is the one in
your answer below, though I'm not sure it will be possible.
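
To make the idea concrete, here is a rough sketch of filling a struct stat
from directory-listing data only, with the three missing fields forced to
defaults. Everything in it is hypothetical: `struct dir_entry_info`,
`stat_from_dir_info`, the default permission bits and the choice of 1 for the
link count are assumptions for illustration, and this is not how Cygwin
currently behaves.

```c
#define _XOPEN_SOURCE 700   /* for S_IFDIR / S_IFREG under strict C modes */
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical subset of what a directory listing can provide per entry. */
struct dir_entry_info {
    off_t  size;          /* file size in bytes */
    time_t mtime;         /* last modification time */
    int    is_directory;  /* non-zero for directories */
};

/* Build a struct stat without opening the file. uid, gid and the link
   count are not available from the listing, so they are set to
   caller-supplied or fixed defaults. */
void stat_from_dir_info(const struct dir_entry_info *in,
                        uid_t default_uid, gid_t default_gid,
                        struct stat *out)
{
    memset(out, 0, sizeof *out);
    out->st_size  = in->size;
    out->st_mtime = in->mtime;
    /* Permission bits are placeholders chosen for this sketch. */
    out->st_mode  = in->is_directory ? (S_IFDIR | 0755) : (S_IFREG | 0644);
    out->st_uid   = default_uid;   /* forced default, as proposed above */
    out->st_gid   = default_gid;   /* forced default, as proposed above */
    out->st_nlink = 1;             /* real link count is unknown */
}
```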
Post by Corinna Vinschen
Post by Ben RUBSON
Do you know where the open call could come from, when only using
NtQueryDirectoryFile in symlink_info::check() ?
(certainly related to the previous question)
The answer here is that your FS should handle the open call differently
depending on the access mask. The NtCreateFile call in symlink_info::check
opens the file with READ_CONTROL | FILE_READ_ATTRIBUTES | FILE_READ_EA
only. All these access flags only request *meta* info, not actual data
from the data stream of the file. In other words, a Windows open call
without FILE_READ_DATA/FILE_WRITE_DATA and related flags does not
actually have to open the file at all in the FS driver. It only has to
provide metadata subsequently, an operation which you usually can have at
much lower cost if the remote FS is running on a POSIX OS:
- NtCreateFile called with only metadata access flags does not have to
open the file.
- Make sure to ignore the EaBuffer and EaLength parameters, rather than
to return a failure (this avoids yet another NtOpenFile call).
- Just call stat/statvfs on the remote file as required to fulfill
subsequent NtQueryVolumeInformationFile and NtQueryInformationFile
calls.
Does that make sense?
Thank you very much for your detailed answer Corinna !

Yes, it does make sense, at least I understand how it should work.
The FS is not really running on a POSIX OS, as it runs on the Windows
machine itself, using the WinFsp FUSE API.
I'll check with the WinFsp team whether something can be done along these
lines.

Thank you !

Ben

Corinna Vinschen
2018-04-09 13:09:38 UTC
Post by Ben RUBSON
Post by Corinna Vinschen
It's nice that you're testing all this, but you should ask *why* Cygwin
does it in the first place. The reason is that the information one can
gather without opening the file on Windows is insufficient to fill in
all of the stat struct. The directory info returned by
NtQueryDirectoryFile just isn't sufficient, thus it's only a fallback.
Corinna, thank you very much for your answer.
What info would be missing without opening the file ?
Do you know where the open call could come from, when only using
NtQueryDirectoryFile in symlink_info::check() ?
(certainly related to the previous question)
NtQueryDirectoryFile requires an open directory handle of course.
NtOpenFile on the dir is called a few lines prior to NtQueryDirectoryFile.


Corinna
Ben RUBSON
2018-04-10 14:11:04 UTC
Post by Corinna Vinschen
NtQueryDirectoryFile requires an open directory handle of course.
NtOpenFile on the dir is called a few lines prior to NtQueryDirectoryFile.
Yes, I agree, but the open call I'm talking about is made on every file, not
only on the containing directory.
I wonder whether another function later in the code path re-opens every
file, because some information is not provided by symlink_info::check()
once its NtCreateFile/NtOpenFile calls are disabled.

Ben