Discussion:
[PATCH] winsup/cygwin: Protect fork() against dll- and exe-updates.
Corinna Vinschen
2015-07-27 07:50:53 UTC
Hi Michael,
Hi!
When starting to port Gentoo Prefix to Cygwin, the first real problem
discovered is that fork() does use the original executable's location
to Create the child's Process, probably finding linked dlls that just
have emerged in the current directory (sth. like /my/prefix/usr/bin),
causing "Loaded different DLL with same basename in forked child" errors.
Unfortunately there's some red tape to get over with, first. We need a
copyright assignment from you before we can go much further. See
https://cygwin.com/contrib.html, "Before you get started". Please fill
out the standard assignment form and send the signed PDF to the address
given therein.
Diving into the details, I'm coming up with (a patch-draft based on) the
idea to create hardlinks for the binaries-in-use into some cygwin-specific
directory, like \proc\<ntpid>\object\ ('ve seen this name on AIX),
and use these hardlinks instead to create the new child's process.
Thoughts so far?
Well, yes. Off the top of my head a couple of potential problems come
to mind:

- /proc is already available as virtual filesystem as on Linux. Reusing
it for some purposes is ok, but in this case we're talking about a
real directory of the same name, which then would be hidden beneath
the virtual one. Is that deliberate? The directory wouldn't be
accessible from Cygwin applications while native Windows apps would
see the dir. I think hidden is bad. Something like this should take
place in a visible cache dir. /var/cache or /var/spool come to mind.

Also, using the Windows PID as dir name seems a bit weird, given that
the virtual /proc obviously uses the Cygwin PID. This sounds like a
source for confusion.

- What about running Cygwin on filesystems not supporting hardlinks,
like FAT?

- There's a meme along the lines of "Why is Cygwin soooo Slow!!!1!!11".

The most important factor for this slowness is the way fork() has to
be emulated. The method you're proposing would add to the overhead by
having to create hardlinks on fork and deleting them again at exit or
execve time.

Did you run tests to find out the cost of this additional overhead?
\proc\<ntpid>\object\bash.exe -> /bin/bash.exe
\proc\<ntpid>\object\bash.exe.local (empty file for dll redirection)
\proc\<ntpid>\object\cygwin1.dll -> /bin/cygwin1.dll
\proc\<ntpid>\object\cygreadline7.dll -> /bin/cygreadline7.dll
CreateProcess("\proc\<ntpid>\object\bash.exe", "/bin/bash.exe", ...)
Resulting in another \proc\<ntpid>\object\ directory with same hardlinks.
*) dll-redirection for LoadLibrary using "app.exe.local" file does operate on
the dll's basename only, breaking perl's Hash::Util and List::Util at least.
So creating hardlinks for dynamically loaded dlls is disabled for now.
Eventually, manifests and/or app.exe.config could help here, but I'm still
failing to really grok them...
Hmm. The DLLs are loaded dynamically anyway, so they will be loaded
dynamically in the child as well in dll_list::load_after_fork_impl. Why
not simply hardlinking them using a unique filename (e.g. using the
inode number), storing the unique number or name in the dll struct and
then calling LoadLibrary on this name?
*) Who can clean up \proc\<ntpid>\ directory after power-loss or similar?
For now, if stale \proc\<ntpid>\ is found, it is removed beforehand.
But when this was from a different user, cleanup will fail. However,
using \proc\S-<current-user-id>\<ntpid>\ instead could help here...
Yes, that seems necessary. The requirement to remove a complete
directory on process startup is a lot of effort, though. I'm feeling a
sweat attack coming...
*) Is it really necessary to create these hardlinks in the real filesystem?
I could imagine creating them directly in $Recycle.bin instead, or some
(other) memory-only thing...
Uh, well, they are hardlinks after all. They must be created on the
same filesystem.
Thoughts welcome!
In general I like the basic idea behind this. But given the overhead it
adds to the already slow fork, I'm rather reluctant. I'm sure this
needs at least a lot more discussion (for which the cygwin-developers
mailing list, to which I've redirected this, would be better suited). For instance:

- What if an EXE/DLL is replaced more than once during the lifetime of
a process?

- What about reducing the overhead by implementing some kind of generic
exe/dll cache used by all processes? It would reduce the requirement
to cleanup, reduce the footprint of the cache, speed up subsequent
forks.

- Given that the /bin directory alone can be easily 0.5 Gigs and more,
the cache(s) can take as much memory. This really asks for some
cleanup mechanism.

- The heretical question of course: Is the underlying problem really
worth the additional overhead? The patch is pretty intrusive.
Is there a simpler way to achieve the same or, at least, a similar
result?
Thank you!
/haubi/
Thank you,
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
Michael Haubenwallner
2015-07-28 16:40:25 UTC
Hi Corinna,
Post by Corinna Vinschen
When starting to port Gentoo Prefix to Cygwin, the first real problem
discovered is that fork() does use the original executable's location
Unfortunately there's some red tape to get over with, first. We need a
copyright assignment from you before we can go much further.
Copyright assignment is in progress.
Post by Corinna Vinschen
Diving into the details, I'm coming up with (a patch-draft based on) the
idea to create hardlinks for the binaries-in-use into some cygwin-specific
directory, like \proc\<ntpid>\object\ ('ve seen this name on AIX),
and use these hardlinks instead to create the new child's process.
Thoughts so far?
Well, yes. Off the top of my head a couple of potential problems come
to mind:
- /proc is already available as virtual filesystem as on Linux. Reusing
it for some purposes is ok, but in this case we're talking about a
real directory of the same name, which then would be hidden beneath
the virtual one. Is that deliberate? The directory wouldn't be
accessible from Cygwin applications while native Windows apps would
see the dir. I think hidden is bad. Something like this should take
place in a visible cache dir. /var/cache or /var/spool come to mind.
Also, using the Windows PID as dir name seems a bit weird, given that
the virtual /proc obviously uses the Cygwin PID. This sounds like a
source for confusion.
There's no particular reason for /proc/ actually - just came to my mind
first. I've also seen /run/ on recent Linux boxes...
Post by Corinna Vinschen
- What about running Cygwin on filesystems not supporting hardlinks,
like FAT?
This is the functional reason to keep these hardlinks optional:
I don't want Cygwin itself to require NTFS, but Gentoo Prefix only - which
IMHO is a corner use-case for Cygwin, but requires an updates-protected fork.
Post by Corinna Vinschen
- There's a meme along the lines of "Why is Cygwin soooo Slow!!!1!!11".
The most important factor for this slowness is the way fork() has to
be emulated. The method you're proposing would add to the overhead by
having to create hardlinks on fork and deleting them again at exit or
execve time.
Agreed, and this is the performance reason to keep hardlinks optional.

However, I've been using Interix before - and Cygwin feels faster even
with hardlinks enabled. And I do prefer "slow" over "broken" setups.
Post by Corinna Vinschen
Did you run tests to find out the cost of this additional overhead?
On a 4-core Windows Server 2012r2 VM, three times building cygwin takes:
$ ( time for x in {1..3}; do
rm -rf cygwin-2.2.0-0.1.x86_64 ;
cygport -8 cygwin.cygport prep compile install ;
done ) > outfile 2>&1

vanilla cygwin-2.2.0-0.1:
real 31m35.530s
user 30m11.245s
sys 22m20.365s

patched cygwin-2.2.0-0.1:
real 52m26.049s
user 30m39.820s
sys 26m41.637s
Post by Corinna Vinschen
*) dll-redirection for LoadLibrary using "app.exe.local" file does operate on
the dll's basename only, breaking perl's Hash::Util and List::Util at least.
So creating hardlinks for dynamically loaded dlls is disabled for now.
Eventually, manifests and/or app.exe.config could help here, but I'm still
failing to really grok them...
Hmm. The DLLs are loaded dynamically anyway, so they will be loaded
dynamically in the child as well in dll_list::load_after_fork_impl. Why
not simply hardlinking them using a unique filename (e.g. using the
inode number), storing the unique number or name in the dll struct and
then calling LoadLibrary on this name?
This might be necessary in the initial dlopen() already: I've tried hardlinks
for loaded dlls mangling the full path into the hardlink's filename, but
encountered different load addresses in the child - most likely due to the
now different dll's filename.
Haven't tried creating a mangled subdir only to keep the filename, though.
Post by Corinna Vinschen
*) Who can clean up \proc\<ntpid>\ directory after power-loss or similar?
For now, if stale \proc\<ntpid>\ is found, it is removed beforehand.
But when this was from a different user, cleanup will fail. However,
using \proc\S-<current-user-id>\<ntpid>\ instead could help here...
Yes, that seems necessary. The requirement to remove a complete
directory on process startup is a lot of effort, though. I'm feeling a
sweat attack coming...
Removing that subdirectory at process startup should be rarely needed, as the
original process removes it at pinfo::exit - unless something goes really wrong.
Post by Corinna Vinschen
Thoughts welcome!
In general I like the basic idea behind this. But given the overhead it
adds to the already slow fork, I'm rather reluctant. I'm sure this
needs at least a lot more discussion (for which the cygwin-developers
- What if an EXE/DLL is replaced more than once during the lifetime of
a process?
This wouldn't make any difference: The hardlinks are created upon the first
use of some exe/dll in parent (even if that process won't ever use fork),
and the forked child gets the parent's first-use versions. Still there is
a short timeframe between process startup and hardlink creation, but that
is not a real problem (yet). The main issue is with programs (shell scripts,
bots) that do run the Gentoo Prefix bootstrap&update itself - these are
started before the actual dll/exe updates and do some forks afterwards.
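A minimal sketch of why a hardlink created at first use keeps the first-use version, assuming POSIX-style hardlink semantics as Cygwin exposes them on NTFS. The directory names `bin` and `cache`, the file name `cygfoo.dll`, and the rename-over update step are hypothetical stand-ins and are not taken from the patch:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    bin_dir = os.path.join(root, "bin")
    cache_dir = os.path.join(root, "cache")  # stand-in for the per-process directory
    os.mkdir(bin_dir)
    os.mkdir(cache_dir)

    # The DLL as it exists when the parent process first uses it.
    dll = os.path.join(bin_dir, "cygfoo.dll")
    with open(dll, "w") as f:
        f.write("version 1")

    # First use: pin the file by hardlinking it into the cache directory.
    pinned_link = os.path.join(cache_dir, "cygfoo.dll")
    os.link(dll, pinned_link)

    # Package update: a new file is renamed over the original name.
    updated = dll + ".new"
    with open(updated, "w") as f:
        f.write("version 2")
    os.replace(updated, dll)

    # A later fork would use the pinned link, not the updated file.
    with open(pinned_link) as f:
        pinned = f.read()
    with open(dll) as f:
        current = f.read()

print(pinned, "|", current)
```

The pinned link still refers to the old file contents, while the original path now names the updated file. A second update would not change the pinned link either, which is the point made above.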
Post by Corinna Vinschen
- What about reducing the overhead by implementing some kind of generic
exe/dll cache used by all processes? It would reduce the requirement
to cleanup, reduce the footprint of the cache, speed up subsequent
forks.
I'm all for it, but I've no idea of currently available cross-process
mechanisms in cygwin/windows that could help here ...

But: From the LoadLibrary docs I've got the impression that even an exe
can be dynamically loaded. Iff it would be possible to branch to an exe's
main() - what if there is some pool of cygwin-starter processes, that do
nothing but wait for some cygwin-process-start event, to dynamically load
the exe in question and branch to its main()? And if that starter does
create the new exe's (and its linked dll's) optional hardlinks before
dynamically loading them, even the timeframe mentioned above would be gone.
Post by Corinna Vinschen
- Given that the /bin directory alone can be easily 0.5 Gigs and more,
the cache(s) can take as much memory. This really asks for some
cleanup mechanism.
Of course unused hardlinks have to be removed at some time. However,
without performing updates (of dlls in use), only hardlinks would be
cached, which shouldn't consume a lot of diskspace.
Post by Corinna Vinschen
- The heretical question of course: Is the underlying problem really
worth the additional overhead? The patch is pretty intrusive.
The underlying problem is:
Gentoo Prefix breaks on Cygwin with current fork implementation. OTOH,
both - enabling the hardlink creation plus the performance overhead - is
acceptable to me (and for now) to allow for Gentoo Prefix on Cygwin.
The alternative - to not have some POSIX-like buildsystem on Windows (since
Interix is gone) for our otherwise portable application - is... an issue.
While our final application is built using the MSVC toolchain - wrapped
to some gcc-like commandline tools using parity[1] (which provides some
additional features like preloading, embedded rpath, transparent
static/dynamic/runtime linking), the application's build system relies
on a POSIX-like system. We have tried using Linux with remote execution[2]
of the MSVC toolchain, but that failed due to filesystem sync issues.
[1] http://sourceforge.net/projects/parity/
[2] https://github.com/mduft/rex/
Post by Corinna Vinschen
Is there a simpler way to achieve the same or, at least, a similar
result?
Hmm - most likely there is a faster way than the current patch,
but I doubt there is a simpler way...
Post by Corinna Vinschen
Thank you,
Corinna
Thank you!
/haubi/
Corinna Vinschen
2015-07-29 13:22:11 UTC
Hi Michael,
Post by Michael Haubenwallner
Hi Corinna,
Post by Corinna Vinschen
When starting to port Gentoo Prefix to Cygwin, the first real problem
discovered is that fork() does use the original executable's location
Unfortunately there's some red tape to get over with, first. We need a
copyright assignment from you before we can go much further.
Copyright assignment is in progress.
Cool.
Post by Michael Haubenwallner
Post by Corinna Vinschen
- /proc is already available as virtual filesystem as on Linux.
[blah]
Also, using the Windows PID as dir name seems a bit weird, given that
the virtual /proc obviously uses the Cygwin PID. This sounds like a
source for confusion.
There's no particular reason for /proc/ actually - just came to my mind
first. I've also seen /run/ on recent Linux boxes...
Yeah, /run might be a good option, albeit there may be installations
out there already using this path for their own dubious purposes.
Reusing a path existing in a cygwin installation by default would
avoid collisions. /var/run perhaps.
Post by Michael Haubenwallner
I don't want Cygwin itself to require NTFS, but Gentoo Prefix only - which
IMHO is a corner use-case for Cygwin, but requires an updates-protected fork.
Some people use Cygwin from a USB stick.
Post by Michael Haubenwallner
However, I've been using Interix before - and Cygwin feels faster even
with hardlinks enabled.
FTR: Me too, and I have not the faintest idea why, given that Interix
can fork natively while Cygwin has to go to great lengths to emulate it.
Post by Michael Haubenwallner
And I do prefer "slow" over "broken" setups.
...with a catch...
Post by Michael Haubenwallner
Post by Corinna Vinschen
Did you run tests to find out the cost of this additional overhead?
$ ( time for x in {1..3}; do
rm -rf cygwin-2.2.0-0.1.x86_64 ;
cygport -8 cygwin.cygport prep compile install ;
done ) > outfile 2>&1
vanilla cygwin-2.2.0-0.1:
real 31m35.530s
user 30m11.245s
sys 22m20.365s
patched cygwin-2.2.0-0.1:
real 52m26.049s
user 30m39.820s
sys 26m41.637s
So roughly 66% slowdown. It's quite a lot...
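The 66% figure follows from the two `real` (wall-clock) values. A small sketch of the arithmetic; the helper name `parse_time` is only illustrative:

```python
def parse_time(value):
    """Convert a bash `time` value such as '31m35.530s' to seconds."""
    minutes, seconds = value.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)

# Wall-clock times of the three-build loop, taken from the measurements.
vanilla_real = parse_time("31m35.530s")
patched_real = parse_time("52m26.049s")

# Relative increase of the patched run over the vanilla run, in percent.
slowdown_pct = (patched_real / vanilla_real - 1) * 100
print(f"wall-clock slowdown: {slowdown_pct:.0f}%")
```

The `user` time barely changes between the two runs, so the extra cost shows up in wall-clock and `sys` time rather than in user-mode CPU time.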
Post by Michael Haubenwallner
Post by Corinna Vinschen
*) dll-redirection for LoadLibrary using "app.exe.local" file does operate on
the dll's basename only, breaking perl's Hash::Util and List::Util at least.
So creating hardlinks for dynamically loaded dlls is disabled for now.
Eventually, manifests and/or app.exe.config could help here, but I'm still
failing to really grok them...
Hmm. The DLLs are loaded dynamically anyway, so they will be loaded
dynamically in the child as well in dll_list::load_after_fork_impl. Why
not simply hardlinking them using a unique filename (e.g. using the
inode number), storing the unique number or name in the dll struct and
then calling LoadLibrary on this name?
This might be necessary in the initial dlopen() already: I've tried hardlinks
for loaded dlls mangling the full path into the hardlink's filename, but
encountered different load addresses in the child - most likely due to the
now different dll's filename.
Huh? That shouldn't happen. The address is determined by the file's
PE/COFF header, not by the name. However, did you reuse the name field
in the dll structure or did you create another name field for the
mangled name? In the first case there may be some checks in dll_init.cc
not working. That's why I said to use an extra field for the mangled
name.
Post by Michael Haubenwallner
Post by Corinna Vinschen
- What if an EXE/DLL is replaced more than once during the lifetime of
a process?
This wouldn't make any difference: The hardlinks are created upon the first
use of some exe/dll in parent (even if that process won't ever use fork),
So, here's a question. What if the directory is only created on
first fork? Given that only few processes actually call fork, shouldn't
that speed up typical usage profiles a lot? Even with `configure' or
`make', at least half of the involved processes don't fork.
Post by Michael Haubenwallner
and the forked child gets the parent's first-use versions. Still there is
a short timeframe between process startup and hardlink creation, but that
is not a real problem (yet).
This may even be academic, but it is something to keep in mind.
Post by Michael Haubenwallner
Post by Corinna Vinschen
- What about reducing the overhead by implementing some kind of generic
exe/dll cache used by all processes? It would reduce the requirement
to cleanup, reduce the footprint of the cache, speed up subsequent
forks.
I'm all for it, but I've no idea of currently available cross-process
mechanisms in cygwin/windows that could help here ...
Yeah, scratching my head myself, but we might want to discuss it
nevertheless. Maybe somebody has a good idea?
Post by Michael Haubenwallner
But: From the LoadLibrary docs I've got the impression that even an exe
can be dynamically loaded. Iff it would be possible to branch to an exe's
main() - what if there is some pool of cygwin-starter processes, that do
nothing but wait for some cygwin-process-start event, to dynamically load
the exe in question and branch to its main()? And if that starter does
create the new exe's (and its linked dll's) optional hardlinks before
dynamically loading them, even the timeframe mentioned above would be gone.
This sounds hellishly complicated. Apart from the code complexity,
who's going to set up the process pool? Who's managing the pool?
Post by Michael Haubenwallner
Post by Corinna Vinschen
the cache(s) can take as much memory. This really asks for some
cleanup mechanism.
Of course unused hardlinks have to be removed at some time. However,
without performing updates (of dlls in use), only hardlinks would be
cached, which shouldn't consume a lot of diskspace.
Performing updates was the idea, but I see your point.
Post by Michael Haubenwallner
Post by Corinna Vinschen
- The heretical question of course: Is the underlying problem really
worth the additional overhead? The patch is pretty intrusive.
Gentoo Prefix breaks on Cygwin with current fork implementation. OTOH,
both - enabling the hardlink creation plus the performance overhead - is
acceptable to me (and for now) to allow for Gentoo Prefix on Cygwin.
The alternative - to not have some POSIX-like buildsystem on Windows (since
Interix is gone) for our otherwise portable application - is... an issue.
Here's the catch. What you're doing is a deviation from how Cygwin
is trying to operate. If at all possible, Cygwin applications should
run in any environment. Cygwin is just some "operating system", and
despite striving for POSIX compatibility, we can't manage it under all
circumstances.

This in turn usually requires porting. Any application running under
multiple OSes has code to make sure differences in the various OSes
(and there are lots of them, even between the supposedly POSIX compatible
ones) are handled gracefully.

So you'd usually port gentoo prefix to Cygwin, not vice versa. And
to close the loop, your change to Cygwin requires changing the users'
environment, plus a noticeable slowdown of the entire installation, just
to be able to run your application.

I'd expect that gentoo prefix, if there *is* an interest to port it to
Cygwin, would try to run under Cygwin as is. And it should preferably
run under Cygwin in any environment, not only in the environment adding
the exe/dll hardlinks.

Do you understand what bugs me?
Post by Michael Haubenwallner
Post by Corinna Vinschen
Is there a simpler way to achieve the same or, at least, a similar
result?
Hmm - most likely there is a faster way than the current patch,
but I doubt there is a simpler way...
Your patch is rather intrusive. It's not "simple" as I understand it.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
Michael Haubenwallner
2015-07-31 11:32:12 UTC
Hi Corinna,
Post by Corinna Vinschen
Post by Michael Haubenwallner
Post by Corinna Vinschen
When starting to port Gentoo Prefix to Cygwin, the first real problem
discovered is that fork() does use the original executable's location
Unfortunately there's some red tape to get over with, first. We need a
copyright assignment from you before we can go much further.
Copyright assignment submitted.
Post by Corinna Vinschen
Post by Michael Haubenwallner
Post by Corinna Vinschen
- /proc is already available as virtual filesystem as on Linux.
[blah]
Also, using the Windows PID as dir name seems a bit weird, given that
the virtual /proc obviously uses the Cygwin PID. This sounds like a
source for confusion.
For the moment, using Windows PID as directory name is necessary, as the
Cygwin PID may be shared by multiple Windows processes, which feels like
it would require more sophisticated setup/cleanup logic.
Post by Corinna Vinschen
Post by Michael Haubenwallner
There's no particular reason for /proc/ actually - just came to my mind
first. I've also seen /run/ on recent Linux boxes...
Yeah, /run might be a good option, albeit there may be installations
out there already using this path for their own dubious purposes.
Reusing a path existing in a cygwin installation by default would
avoid collisions. /var/run perhaps.
've updated the patch to use /var/run/wproc/<sid>/<ntpid>/ now,
where /var/run/wproc/ needs to be created manually for enabling.
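The opt-in layout can be sketched as follows. This is an illustration, not the patch's code: the function name `wproc_dir`, the example SID string, and the example PID are assumptions, and the only behaviour taken from the description above is that the feature is enabled when the base directory exists:

```python
import os
import tempfile

def wproc_dir(base, sid, ntpid):
    """Return <base>/<sid>/<ntpid>, or None when the base directory is absent.

    A missing base directory means the hardlink feature is not enabled.
    """
    if not os.path.isdir(base):
        return None
    return os.path.join(base, sid, str(ntpid))

with tempfile.TemporaryDirectory() as root:
    base = os.path.join(root, "var", "run", "wproc")

    # Base directory not created yet: feature disabled.
    disabled = wproc_dir(base, "S-1-5-21-1000", 4242)

    # An administrator creates the base directory manually: feature enabled.
    os.makedirs(base)
    enabled = wproc_dir(base, "S-1-5-21-1000", 4242)

print(disabled, enabled)
```

Keying the path on the user's SID first keeps each user's stale directories inside a subtree that user is able to clean up.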
Post by Corinna Vinschen
Post by Michael Haubenwallner
I don't want Cygwin itself to require NTFS, but Gentoo Prefix only - which
IMHO is a corner use-case for Cygwin, but requires an updates-protected fork.
Some people use Cygwin from a USB stick.
Cool idea - does make sense of course. But to not slow down Cygwin,
one should not create /var/run/wproc/ on the USB stick...
Post by Corinna Vinschen
Post by Michael Haubenwallner
However, I've been using Interix before - and Cygwin feels faster even
with hardlinks enabled.
FTR: Me too, and I have not the faintest idea why, given that Interix
can fork natively while Cygwin has to go to great lengths to emulate it.
Won't be surprised actually if Interix does similar - just hidden as "native".

<snip>
Post by Corinna Vinschen
Post by Michael Haubenwallner
Post by Corinna Vinschen
*) dll-redirection for LoadLibrary using "app.exe.local" file does operate on
the dll's basename only, breaking perl's Hash::Util and List::Util at least.
So creating hardlinks for dynamically loaded dlls is disabled for now.
Eventually, manifests and/or app.exe.config could help here, but I'm still
failing to really grok them...
Hmm. The DLLs are loaded dynamically anyway, so they will be loaded
dynamically in the child as well in dll_list::load_after_fork_impl. Why
not simply hardlinking them using a unique filename (e.g. using the
inode number), storing the unique number or name in the dll struct and
then calling LoadLibrary on this name?
This might be necessary in the initial dlopen() already: I've tried hardlinks
for loaded dlls mangling the full path into the hardlink's filename, but
encountered different load addresses in the child - most likely due to the
now different dll's filename.
Huh? That shouldn't happen. The address is determined by the file's
PE/COFF header, not by the name. However, did you reuse the name field
in the dll structure or did you create another name field for the
mangled name? In the first case there may be some checks in dll_init.cc
not working. That's why I said to use an extra field for the mangled
name.
Fixed now. The basename of the loaded dll has to be preserved, so it can
be found in the child as "already loaded" link-dep of another dll that
is loaded afterwards.
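One way to get a unique link while preserving the basename is to make the directory unique instead of the file name. The sketch below assumes a subdirectory named after the device and inode numbers; the helper name `pin_dll`, the `cache` directory, and the naming scheme are hypothetical and not necessarily what the patch does:

```python
import os
import tempfile

def pin_dll(dll_path, cache_dir):
    """Hardlink dll_path to cache_dir/<dev>-<inode>/<basename> and return the link."""
    st = os.stat(dll_path)
    # The subdirectory name is unique per file; the basename stays untouched.
    subdir = os.path.join(cache_dir, f"{st.st_dev:x}-{st.st_ino:x}")
    os.makedirs(subdir, exist_ok=True)
    link = os.path.join(subdir, os.path.basename(dll_path))
    if not os.path.exists(link):
        os.link(dll_path, link)
    return link

with tempfile.TemporaryDirectory() as root:
    cache = os.path.join(root, "cache")

    # Two different DLLs sharing one basename, as with Hash::Util and List::Util.
    originals = []
    for module in ("Hash", "List"):
        os.makedirs(os.path.join(root, module))
        path = os.path.join(root, module, "Util.dll")
        with open(path, "w") as f:
            f.write(module)
        originals.append(path)

    links = [pin_dll(path, cache) for path in originals]
    basenames = [os.path.basename(link) for link in links]
    same_file = [os.path.samefile(o, l) for o, l in zip(originals, links)]

print(basenames, len(set(links)), same_file)
```

Both links keep the name `Util.dll` yet live at distinct paths, so a basename lookup still works and the two modules no longer collide.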
Post by Corinna Vinschen
Post by Michael Haubenwallner
Post by Corinna Vinschen
- What if an EXE/DLL is replaced more than once during the lifetime of
a process?
This wouldn't make any difference: The hardlinks are created upon the first
use of some exe/dll in parent (even if that process won't ever use fork),
So, here's a question. What if the directory is only created on
first fork? Given that only few processes actually call fork, shouldn't
that speed up typical usage profiles a lot? Even with `configure' or
`make', at least half of the involved processes don't fork.
Yeah - but how to create the original file-name (in another directory) so it
does refer to the original inode number, when the original file-name has been
renamed/unlinked during the upgrade? This is why I create the hardlink at
load-time already, where the original file-name is still available.

And WTF is ReFS? Is NTFS the next dead horse I'm gonna ride after Interix?
Post by Corinna Vinschen
Post by Michael Haubenwallner
and the forked child gets the parent's first-use versions. Still there is
a short timeframe between process startup and hardlink creation, but that
is not a real problem (yet).
This may even be academic, but it is something to keep in mind.
Post by Michael Haubenwallner
Post by Corinna Vinschen
- What about reducing the overhead by implementing some kind of generic
exe/dll cache used by all processes? It would reduce the requirement
to cleanup, reduce the footprint of the cache, speed up subsequent
forks.
I'm all for it, but I've no idea of currently available cross-process
mechanisms in cygwin/windows that could help here ...
Yeah, scratching my head myself, but we might want to discuss it
nevertheless. Maybe somebody has a good idea?
Just found NtCreateFile (CreateOptions=FILE_OPEN_BY_FILE_ID) - but I've not
found a mechanism yet to re-create a fully useable filesystem entry out of
the FILE_ID and/or the HMODULE only.

<snip>
Post by Corinna Vinschen
Post by Michael Haubenwallner
Post by Corinna Vinschen
- The heretical question of course: Is the underlying problem really
worth the additional overhead? The patch is pretty intrusive.
Gentoo Prefix breaks on Cygwin with current fork implementation. OTOH,
both - enabling the hardlink creation plus the performance overhead - is
acceptable to me (and for now) to allow for Gentoo Prefix on Cygwin.
The alternative - to not have some POSIX-like buildsystem on Windows (since
Interix is gone) for our otherwise portable application - is... an issue.
Here's the catch. What you're doing is a deviation from how Cygwin
is trying to operate. If at all possible, Cygwin applications should
run in any environment. Cygwin is just some "operating system", and
despite striving for POSIX compatibility, we can't manage it under all
circumstances.
This in turn usually requires porting. Any application running under
multiple OSes has code to make sure differences in the various OSes
(and there are lots of them, even between the supposedly POSIX compatible
ones) are handled gracefully.
So you'd usually port gentoo prefix to Cygwin, not vice versa. And
to close the loop, your change to Cygwin requires changing the users'
environment, plus a noticeable slowdown of the entire installation, just
to be able to run your application.
I'd expect that gentoo prefix, if there *is* an interest to port it to
Cygwin, would try to run under Cygwin as is. And it should preferably
run under Cygwin in any environment, not only in the environment adding
the exe/dll hardlinks.
Do you understand what bugs me?
I think I do understand - and I do agree with your point of view!

But still, before trying to work around any issues with the underlying OS,
I prefer to fix them if possible.

OTOH: For the moment, I'm in an evaluation phase whether Gentoo Prefix can
be ported to Windows - be it restricted to NTFS only, as FAT is ancient,
even if still used on USB sticks.

Now for Windows as the underlying OS, an existing workaround for the missing
fork() is available in newlib-cygwin package already, even if that one needs
some patching to allow for Gentoo Prefix. Although patching is business as
usual in Gentoo Prefix too - though upstream acceptable patches are preferred.

Still, for Gentoo Prefix I do prefer to run on "Cygwin" rather than on
"Windows", even if that feels like nitpicking. But probably I have to
mirror my own cygwin distro anyway for complete long term support...:
http://video.fosdem.org/2015/devroom-distributions/providing_an_lts_distro_with_gentoo_prefix.mp4
Post by Corinna Vinschen
Post by Michael Haubenwallner
Post by Corinna Vinschen
Is there a simpler way to achieve the same or, at least, a similar
result?
Hmm - most likely there is a faster way than the current patch,
but I doubt there is a simpler way...
Your patch is rather intrusive. It's not "simple" as I understand it.
Was it really myself that called this patch "simple"? ;)

Thanks!
/haubi/
Corinna Vinschen
2015-08-03 20:21:58 UTC
Hi Michael,
Post by Michael Haubenwallner
Hi Corinna,
Post by Corinna Vinschen
Post by Michael Haubenwallner
Post by Corinna Vinschen
When starting to port Gentoo Prefix to Cygwin, the first real problem
discovered is that fork() does use the original executable's location
Unfortunately there's some red tape to get over with, first. We need a
copyright assignment from you before we can go much further.
Copyright assignment submitted.
By snail mail or by mail to ges-info AT redhat DOT com? We didn't
receive a mail yet...
Post by Michael Haubenwallner
Post by Corinna Vinschen
Post by Michael Haubenwallner
Post by Corinna Vinschen
- /proc is already available as virtual filesystem as on Linux.
[blah]
Also, using the Windows PID as dir name seems a bit weird, given that
the virtual /proc obviously uses the Cygwin PID. This sounds like a
source for confusion.
For the moment, using Windows PID as directory name is necessary, as the
Cygwin PID may be shared by multiple Windows processes, which feels like
it would require more sophisticated setup/cleanup logic.
Yeah, that's a secondary problem for now.
Post by Michael Haubenwallner
Post by Corinna Vinschen
Post by Michael Haubenwallner
There's no particular reason for /proc/ actually - just came to my mind
first. I've also seen /run/ on recent Linux boxes...
Yeah, /run might be a good option, albeit there may be installations
out there already using this path for their own dubious purposes.
Reusing a path existing in a cygwin installation by default would
avoid collisions. /var/run perhaps.
've updated the patch to use /var/run/wproc/<sid>/<ntpid>/ now,
where /var/run/wproc/ needs to be created manually for enabling.
Ok.
Post by Michael Haubenwallner
[...]
Post by Corinna Vinschen
Post by Michael Haubenwallner
However, I've been using Interix before - and Cygwin feels faster even
with hardlinks enabled.
FTR: Me too, and I have not the faintest idea why, given that Interix
can fork natively while Cygwin has to go to great lengths to emulate it.
Won't be surprised actually if Interix does similar - just hidden as "native".
Well, afaics they are using the native calls under the hood. Are you
comparing interix w/ the 64 bit Cygwin by any chance? Isn't Interix
running in 32 bit and thus under WOW64 only?
Post by Michael Haubenwallner
<snip>
Post by Corinna Vinschen
Post by Michael Haubenwallner
I've tried hardlinks
for loaded dlls mangling the full path into the hardlink's filename, but
encountered different load addresses in the child - most likely due to the
now different dll's filename.
Huh? That shouldn't happen. The address is determined by the file's
PE/COFF header, not by the name. However, did you reuse the name field
in the dll structure or did you create another name field for the
mangled name? In the first case there may be some checks in dll_init.cc
not working. That's why I said to use an extra field for the mangled
name.
Fixed now. The basename of the loaded dll has to be preserved, so it can
be found in the child as "already loaded" link-dep of another dll that
is loaded afterwards.
Cool.
Post by Michael Haubenwallner
Post by Corinna Vinschen
[...]
So, here's a question. What if the directory is only created on
first fork? Given that only few processes actually call fork, shouldn't
that speed up typical usage profiles a lot? Even with `configure' or
`make', at least half of the involved processes don't fork.
Yeah - but how to create the original file-name (in another directory) so it
does refer to the original inode number, when the original file-name has been
renamed/unlinked during the upgrade?
A process could open a handle to the file by itself at DLL load time and
just keep it in the dll table until the file is unloaded. The original
filename is stored at DLL load time anyway.

The real action could then take place at fork time since the handle to
the file is still valid, even after rename/unlink.
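
A rough POSIX analogy of this idea, not the NT calls Cygwin would actually use: a descriptor opened at "load time" still reads the old image after the name has been unlinked and reused by an upgrade. The function names and the scratch path passed in are invented for illustration only.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `text` to `path`, creating or truncating it. Returns 0 on success. */
static int write_file(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    size_t len = strlen(text);
    ssize_t n = write(fd, text, len);
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}

/*
 * Simulate "open at load time, use at fork time":
 *   1. create a file and open it (stands in for DLL load),
 *   2. unlink it and create a different file under the same name (upgrade),
 *   3. read through the descriptor from step 1 (stands in for fork).
 * Returns 1 if the old content is still readable, 0 if not, -1 on error.
 */
static int old_content_survives_upgrade(const char *path)
{
    char buf[16] = {0};
    int result = -1;

    if (write_file(path, "old") != 0)
        return -1;
    int fd = open(path, O_RDONLY);              /* "DLL load time" */
    if (fd < 0)
        return -1;

    if (unlink(path) == 0 && write_file(path, "new") == 0) {  /* upgrade */
        ssize_t n = read(fd, buf, sizeof buf - 1);            /* "fork time" */
        if (n >= 0)
            result = (strcmp(buf, "old") == 0);
    }
    close(fd);
    unlink(path);                               /* remove the scratch file */
    return result;
}
```

On Windows the equivalent would be a HANDLE kept in the dll table, and whether a hardlink can be created from it at fork time is exactly the open question here. The sketch only shows that the open reference outlives the name.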
Post by Michael Haubenwallner
And WTF is ReFS? Is NTFS the next dead horse I'm gonna ride after Interix?
Don't ask. ReFS is a filesystem designed for "big data". I don't think
it will replace NTFS any time soon (and really *really* hope it won't).
Post by Michael Haubenwallner
Post by Corinna Vinschen
Post by Michael Haubenwallner
Post by Corinna Vinschen
- What about reducing the overhead by implementing some kind of generic
exe/dll cache used by all processes? It would reduce the requirement
to cleanup, reduce the footprint of the cache, speed up subsequent
forks.
I'm all for it, but I've no idea of currently available cross-process
mechanisms in cygwin/windows that could help here ...
Yeah, scratching my head myself, but we might want to discuss it
nevertheless. Maybe somebody has a good idea?
Just found NtCreateFile (CreateOptions=FILE_OPEN_BY_FILE_ID) - but I've not
found a mechanism yet to re-create a fully useable filesystem entry out of
the FILE_ID and/or the HMODULE only.
The HMODULE is only the address of the section so, no, there's no way to
do an NtCreateFile with this information alone. As for the file id, I
never tried NtOpenFile w/ FILE_OPEN_BY_FILE_ID. The usage description
in the WDK docs is a bit vague, but it does work. You just have to have
the file id from some earlier call to NtQueryInformationFile.
Post by Michael Haubenwallner
<snip>
Post by Corinna Vinschen
Here's the catch. What you're doing is a deviation from how Cygwin
is trying to operate. If at all possible, Cygwin applications should
run in any environment. Cygwin is just some "operating system", and
despite striving for POSIX compatibility, we can't manage it under all
circumstances.
This in turn usually requires porting. Any application running under
multiple OSes has code to make sure differences in the various OSes
(and there are lots of them, even between the supposedly POSIX compatible
ones) are handled gracefully.
So you'd usually port gentoo prefix to Cygwin, not vice versa. And
to close the loop, your change to Cygwin requires changing the users'
environment, plus a noticeable slowdown of the entire installation, just
to be able to run your application.
I'd expect that gentoo prefix, if there *is* an interest to port it to
Cygwin, would try to run under Cygwin as is. And it should preferably
run under Cygwin in any environment, not only in the environment adding
the exe/dll hardlinks.
Do you understand what bugs me?
I think I do understand - and I do agree with your point of view!
But still, before trying to work around any issues with the underlying OS,
I prefer to fix them if possible.
The problem is that it's not a generic fix but one that requires changing
the behaviour of the Cygwin DLL in a way which noticeably affects the
entire installation. The current performance hit is rather cruel, just
so that *one* package works as desired...
Post by Michael Haubenwallner
[...]
Still, for Gentoo Prefix I do prefer to run on "Cygwin" rather than on
"Windows", even if that feels like nitpicking.
Not as far as I'm concerned.
Post by Michael Haubenwallner
Post by Corinna Vinschen
Your patch is rather intrusive. It's not "simple" as I understand it.
Was it really myself that called this patch "simple"? ;)
Heh :)


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
cyg Simple
2015-08-07 15:01:59 UTC
Post by Corinna Vinschen
The HMODULE is only the address of the section so, no, there's no way to
do an NtCreateFile with this information alone. As for the file id, I
never tried NtOpenFile w/ FILE_OPEN_BY_FILE_ID. The usage description
in the WDK docs is a bit vague, but it does work. You just have to have
the file id from some earlier call to NtQueryInformationFile.
I seem to remember that the file id isn't guaranteed to be constant on
Windows FS, especially FAT, unless you keep the file open.

--
cyg Simple
Corinna Vinschen
2015-08-07 16:18:41 UTC
Post by cyg Simple
Post by Corinna Vinschen
The HMODULE is only the address of the section so, no, there's no way to
do an NtCreateFile with this information alone. As for the file id, I
never tried NtOpenFile w/ FILE_OPEN_BY_FILE_ID. The usage description
in the WDK docs is a bit vague, but it does work. You just have to have
the file id from some earlier call to NtQueryInformationFile.
I seem to remember that the file id isn't guaranteed to be constant on
Windows FS, especially FAT, unless you keep the file open.
FAT doesn't support opening files by file ID. On NTFS, file IDs are
constant and unique per file as long as it exists. See the remarks
section in

https://msdn.microsoft.com/en-us/library/windows/desktop/aa363788%28v=vs.85%29.aspx


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
cyg Simple
2015-08-07 16:37:21 UTC
Post by Corinna Vinschen
Post by cyg Simple
Post by Corinna Vinschen
The HMODULE is only the address of the section so, no, there's no way to
do an NtCreateFile with this information alone. As for the file id, I
never tried NtOpenFile w/ FILE_OPEN_BY_FILE_ID. The usage description
in the WDK docs is a bit vague, but it does work. You just have to have
the file id from some earlier call to NtQueryInformationFile.
I seem to remember that the file id isn't guaranteed to be constant on
Windows FS, especially FAT, unless you keep the file open.
FAT doesn't support opening files by file ID. On NTFS, file IDs are
constant and unique per file as long as it exists. See the remarks
section in
https://msdn.microsoft.com/en-us/library/windows/desktop/aa363788%28v=vs.85%29.aspx
Okay good. The first paragraph was what I was remembering.

But ReplaceFile could use the same ID on the replacement file; that
could cause issues if you expect the name to be the same.

--
cyg Simple
Corinna Vinschen
2015-08-07 16:58:06 UTC
Post by cyg Simple
Post by Corinna Vinschen
Post by cyg Simple
Post by Corinna Vinschen
The HMODULE is only the address of the section so, no, there's no way to
do an NtCreateFile with this information alone. As for the file id, I
never tried NtOpenFile w/ FILE_OPEN_BY_FILE_ID. The usage description
in the WDK docs is a bit vague, but it does work. You just have to have
the file id from some earlier call to NtQueryInformationFile.
I seem to remember that the file id isn't guaranteed to be constant on
Windows FS, especially FAT, unless you keep the file open.
FAT doesn't support opening files by file ID. On NTFS, file IDs are
constant and unique per file as long as it exists. See the remarks
section in
https://msdn.microsoft.com/en-us/library/windows/desktop/aa363788%28v=vs.85%29.aspx
Okay good. The first paragraph was what I was remembering.
But ReplaceFile could use the same ID on the replacement file; that
could cause issues if you expect the name to be the same.
Not here, afaics. As long as the DLL is in use (and it certainly is in
this scenario) it keeps its file ID, even if it got moved to the trash.
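
For illustration only, here is the same property in POSIX terms, with the (st_dev, st_ino) pair standing in for the NTFS file ID. This is an analogy under that assumption, not the Windows API; `same_file` and `identity_follows_file` are made-up names.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Two stat results name the same file iff device and inode both match. */
static int same_file(const struct stat *a, const struct stat *b)
{
    return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}

/*
 * Open `path`, rename it to `moved` while it is still open, then create a
 * replacement under the old name.
 * Returns 1 if the open file kept its identity across the rename and the
 * replacement got a different one, 0 if either check fails, -1 on error.
 */
static int identity_follows_file(const char *path, const char *moved)
{
    struct stat at_load, via_fd, via_moved, replacement;
    int result = -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    if (fstat(fd, &at_load) == 0 && rename(path, moved) == 0) {
        /* The "upgrade": a new file appears under the original name. */
        int fd_new = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (fd_new >= 0) {
            if (fstat(fd, &via_fd) == 0 &&
                stat(moved, &via_moved) == 0 &&
                fstat(fd_new, &replacement) == 0)
                result = same_file(&at_load, &via_fd) &&
                         same_file(&at_load, &via_moved) &&
                         !same_file(&at_load, &replacement);
            close(fd_new);
        }
    }
    close(fd);
    unlink(path);   /* remove both scratch files */
    unlink(moved);
    return result;
}
```

The point of the check on the replacement is that an ID recorded at load time identifies the old image, not whatever currently sits under the old name.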


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat