Discussion:
src/winsup/cygwin ChangeLog fhandler_proc.cc f ...
Eric Blake
2014-10-09 14:31:42 UTC
Permalink
* fhandler_proc.cc (fhandler_proc::readdir): Set dirent d_type.
* fhandler_process.cc (fhandler_process::readdir): Ditto.
* fhandler_procnet.cc (fhandler_procnet::readdir): Ditto.
* fhandler_procsys.cc (fhandler_procsys::readdir): Ditto.
* fhandler_procsysvipc.cc (fhandler_procsysvipc::readdir): Ditto.
* fhandler_virtual.h (virt_ftype_to_dtype): Define new inline function
to generate dirent d_type from virtual_ftype_t.
@@ -357,10 +358,17 @@
res = ENMFILE;
else
{
+ struct stat st;
+ char *file = tp.c_get ();
+
sys_wcstombs (de->d_name, NAME_MAX + 1, f.dbi.ObjectName.Buffer,
f.dbi.ObjectName.Length / sizeof (WCHAR));
de->d_ino = hash_path_name (get_ino (), de->d_name);
- de->d_type = 0;
+ stpcpy (stpcpy (stpcpy (file, get_name ()), "/"), de->d_name);
+ if (!lstat64 (file, &st))
+ de->d_type = IFTODT (st.st_mode);
+ else
+ de->d_type = DT_UNKNOWN;
The whole point of d_type is for optimization, to tell a process when it
can avoid the overhead of an lstat() because the system was able to
obtain the information in a cheaper manner. But if you have to resort
to an lstat() to get the information, then you are wasting cycles on the
case of a user that doesn't care about d_type. I'd rather we always
return DT_UNKNOWN if the only way we'd get a better type is by calling
lstat().
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Corinna Vinschen
2014-10-09 14:51:28 UTC
Permalink
Post by Eric Blake
* fhandler_proc.cc (fhandler_proc::readdir): Set dirent d_type.
* fhandler_process.cc (fhandler_process::readdir): Ditto.
* fhandler_procnet.cc (fhandler_procnet::readdir): Ditto.
* fhandler_procsys.cc (fhandler_procsys::readdir): Ditto.
* fhandler_procsysvipc.cc (fhandler_procsysvipc::readdir): Ditto.
* fhandler_virtual.h (virt_ftype_to_dtype): Define new inline function
to generate dirent d_type from virtual_ftype_t.
@@ -357,10 +358,17 @@
res = ENMFILE;
else
{
+ struct stat st;
+ char *file = tp.c_get ();
+
sys_wcstombs (de->d_name, NAME_MAX + 1, f.dbi.ObjectName.Buffer,
f.dbi.ObjectName.Length / sizeof (WCHAR));
de->d_ino = hash_path_name (get_ino (), de->d_name);
- de->d_type = 0;
+ stpcpy (stpcpy (stpcpy (file, get_name ()), "/"), de->d_name);
+ if (!lstat64 (file, &st))
+ de->d_type = IFTODT (st.st_mode);
+ else
+ de->d_type = DT_UNKNOWN;
The whole point of d_type is for optimization, to tell a process when it
can avoid the overhead of an lstat() because the system was able to
obtain the information in a cheaper manner. But if you have to resort
to an lstat() to get the information, then you are wasting cycles on the
case of a user that doesn't care about d_type. I'd rather we always
return DT_UNKNOWN if the only way we'd get a better type is by calling
lstat().
I see. The idea here was to try and, at least on my machine, it
was still *very* fast, likely because the whole thing occurs only
in globally allocated memory and there's no disk access or paging
involved.

The question is, what exactly do we lose? /proc/sys isn't often
accessed at all (I guess) and what could be gained? Yaakov asked
for setting d_type under /proc, so he might enlighten us which
tools make heavy use of the stuff, so the net gain is > 0...


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
Eric Blake
2014-10-09 15:07:24 UTC
Permalink
Post by Corinna Vinschen
Post by Eric Blake
The whole point of d_type is for optimization, to tell a process when it
can avoid the overhead of an lstat() because the system was able to
obtain the information in a cheaper manner. But if you have to resort
to an lstat() to get the information, then you are wasting cycles on the
case of a user that doesn't care about d_type. I'd rather we always
return DT_UNKNOWN if the only way we'd get a better type is by calling
lstat().
I see. The idea here was to try and, at least on my machine, it
was still *very* fast, likely because the whole thing occurs only
in globally allocated memory and there's no disk access or paging
involved.
The question is, what exactly do we lose? /proc/sys isn't often
accessed at all (I guess) and what could be gained? Yaakov asked
for setting d_type under /proc, so he might enlighten us which
tools make heavy use of the stuff, so the net gain is > 0...
Some modes of 'find' and 'ls' (such as ls -F) are faster if d_type is
accurate (because they avoided an lstat); there, returning DT_UNKNOWN
requires the lstat. In other cases (like ls -l) an lstat is always
required. Anywhere that lstat is slow, embedding an lstat into d_type
determination as well as a followup lstat is going to make directory
traversal twice as slow (well, maybe the second call is faster because
of caching effects); conversely, anywhere that lstat is not required by
the caller, it is wasted effort during the readdir. But as you say,
lstat in /proc/sys is mostly stuff in memory and already fast, so maybe
it doesn't hurt to leave it in.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Corinna Vinschen
2014-10-09 16:22:02 UTC
Permalink
Post by Eric Blake
Post by Corinna Vinschen
Post by Eric Blake
The whole point of d_type is for optimization, to tell a process when it
can avoid the overhead of an lstat() because the system was able to
obtain the information in a cheaper manner. But if you have to resort
to an lstat() to get the information, then you are wasting cycles on the
case of a user that doesn't care about d_type. I'd rather we always
return DT_UNKNOWN if the only way we'd get a better type is by calling
lstat().
I see. The idea here was to try and, at least on my machine, it
was still *very* fast, likely because the whole thing occurs only
in globally allocated memory and there's no disk access or paging
involved.
The question is, what exactly do we lose? /proc/sys isn't often
accessed at all (I guess) and what could be gained? Yaakov asked
for setting d_type under /proc, so he might enlighten us which
tools make heavy use of the stuff, so the net gain is > 0...
Some modes of 'find' and 'ls' (such as ls -F) are faster if d_type is
accurate (because they avoided an lstat); there, returning DT_UNKNOWN
requires the lstat. In other cases (like ls -l) an lstat is always
required. Anywhere that lstat is slow, embedding an lstat into d_type
determination as well as a followup lstat is going to make directory
traversal twice as slow (well, maybe the second call is faster because
of caching effects); conversely, anywhere that lstat is not required by
the caller, it is wasted effort during the readdir. But as you say,
lstat in /proc/sys is mostly stuff in memory and already fast, so maybe
it doesn't hurt to leave it in.
I made a quick test on 64 bit W8.1 and a non-opimized Cygwin DLL.

time ls -l --color=always /proc/sys/Device/

takes a constant 0.53 secs without my patch, and a constant 0.83 secs
with my patch. So it's actually rather noticable, being more than 50%
slower. It's hard to justify such a hit...


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
Corinna Vinschen
2014-10-09 17:47:40 UTC
Permalink
Post by Corinna Vinschen
Post by Eric Blake
Post by Corinna Vinschen
Post by Eric Blake
The whole point of d_type is for optimization, to tell a process when it
can avoid the overhead of an lstat() because the system was able to
obtain the information in a cheaper manner. But if you have to resort
to an lstat() to get the information, then you are wasting cycles on the
case of a user that doesn't care about d_type. I'd rather we always
return DT_UNKNOWN if the only way we'd get a better type is by calling
lstat().
I see. The idea here was to try and, at least on my machine, it
was still *very* fast, likely because the whole thing occurs only
in globally allocated memory and there's no disk access or paging
involved.
The question is, what exactly do we lose? /proc/sys isn't often
accessed at all (I guess) and what could be gained? Yaakov asked
for setting d_type under /proc, so he might enlighten us which
tools make heavy use of the stuff, so the net gain is > 0...
Some modes of 'find' and 'ls' (such as ls -F) are faster if d_type is
accurate (because they avoided an lstat); there, returning DT_UNKNOWN
requires the lstat. In other cases (like ls -l) an lstat is always
required. Anywhere that lstat is slow, embedding an lstat into d_type
determination as well as a followup lstat is going to make directory
traversal twice as slow (well, maybe the second call is faster because
of caching effects); conversely, anywhere that lstat is not required by
the caller, it is wasted effort during the readdir. But as you say,
lstat in /proc/sys is mostly stuff in memory and already fast, so maybe
it doesn't hurt to leave it in.
I made a quick test on 64 bit W8.1 and a non-opimized Cygwin DLL.
time ls -l --color=always /proc/sys/Device/
takes a constant 0.53 secs without my patch, and a constant 0.83 secs
with my patch. So it's actually rather noticable, being more than 50%
slower. It's hard to justify such a hit...
I applied another technique which has no noticable performance hit. It
doesn't recognize all objects, only directories, symlinks, and partially
character devices, but on the upside it uses only information which has
already been provided anyway.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
Eric Blake
2014-10-09 18:07:24 UTC
Permalink
Post by Corinna Vinschen
I applied another technique which has no noticable performance hit. It
doesn't recognize all objects, only directories, symlinks, and partially
character devices, but on the upside it uses only information which has
already been provided anyway.
Yep, that looks a lot cleaner - reusing stuff we have for free is the
whole point of d_type optimization :)
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Loading...