Discussion:
Extend faq.using to discuss fork failures
Ryan Johnson
2011-08-19 13:43:10 UTC
Hi all,

I propose to add an entry to cygwin's faq.using which covers fork
failures. Frankly, I'm surprised it wasn't there years ago... it's
certainly frequently-asked, and the answer is always the same. Right now
users have to trawl the archives to figure out what to do (or more
likely, just blindly spam the list and get told to rebase and/or trawl
the list archives).

Also, what is the status of "the spawn family of calls provided by
Cygwin" [1]? There's nothing about it at the API page [2], and a search
through the user guide [3] comes up empty as well. Searching /usr/include
turns up only /usr/include/process.h, which contains only the function
declarations and a single comment -- "This file comes with MSDOS and
WIN32 systems" -- indicating that Windows, not cygwin, provides the
functions (which, incidentally, are deprecated in favor of the
posix-compliant _spawn* instead [4]). Would it make sense to update the
docs to mention these are native Windows functions, and update the
headers to include the non-deprecated function signatures?

[1] http://www.cygwin.com/cygwin-ug-net/highlights.html#ov-hi-process
[2] http://cygwin.com/cygwin-api/
[3] http://cygwin.com/cygwin-ug-net/cygwin-ug-net-nochunks.html.gz
[4] http://msdn.microsoft.com/en-us/library/ms235383%28v=vs.80%29.aspx

Seed text below...

Thoughts?
Ryan

Why does fork fail so often on my system?

Unix-like applications make extensive use of fork(), a function which
spawns an exact copy of the running process. Notable fork-using
applications include bash (and bash scripts), make, gcc, python, ruby,
perl and emacs. Unfortunately, the Windows ecosystem is quite hostile to
a reliable fork implementation, and reports of fork failures are
probably the single most common thread topic in the cygwin mailing list.

Common error messages include:
- unable to remap $dll to same address as parent
- couldn't allocate heap
- died waiting for dll loading
- child -1 - died waiting for longjmp before initialization
- STATUS_ACCESS_VIOLATION
- resource temporarily unavailable

The problem often (re)appears or worsens after installing or updating
cygwin packages (which can undo the effects of rebaseall and peflagsall,
see below). Applications which dynamically compile and load dlls (e.g.
perl, ruby, some lisps, building gcc from sources) are also especially
prone to fork failures for the same reason. Fork failures in general
also became significantly more common with the introduction of Vista and
Win7, whose address space layout randomization (ASLR) often causes child
processes to spawn with dlls, thread stacks, heaps, and other memory
objects allocated in different locations than the parent. While cygwin
compensates for as many of these relocations as possible, there always
remains a possibility of fork failures.

If you find that frequent fork failures interfere with normal use of
cygwin, please try the following steps:

1. Disable or uninstall applications known to interfere with cygwin (see
http://cygwin.com/faq/faq.using.html#faq.using.bloda). Many of them
inject dlls into processes at inconsistent locations, which breaks
fork() semantics.

2. Rebase your system (see /usr/share/doc/Cygwin/rebase-3.0.1.README).
Every dll in the system specifies a base address -- the preferred memory
location it should load at -- and the Windows loader does not break ties
consistently when it encounters base address conflicts.

3. With Vista and later, use peflagsall to set the TS-aware bit on all
cygwin dlls (see /usr/share/doc/Cygwin/rebase-3.0.1.README, reboot
needed for changes to take effect). This exploits a side effect of
address space layout randomization which (ironically) causes dlls to
nearly always load at the same address.

4. If you have access to the source code of the offending application
(this applies to all cygwin packages), consider replacing calls to
fork() with calls to the spawn family of functions. These are a native
(= reliable and highly efficient) replacement for fork+exec, which is by
far the most common usage of fork(), and are documented at
http://msdn.microsoft.com/en-us/library/20y988d2%28v=VS.100%29.aspx.
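
Step 2 above turns on base address conflicts: two dlls whose preferred load ranges overlap cannot both get their preferred address. The sketch below only illustrates that idea and is not how the rebase tool itself works; the dll names, base addresses, and sizes are hypothetical values invented for the example.

```python
from itertools import combinations


def find_base_conflicts(images):
    """Return pairs of image names whose [base, base + size) ranges overlap.

    images: iterable of (name, base_address, size_in_bytes) tuples.
    """
    conflicts = []
    for (n1, b1, s1), (n2, b2, s2) in combinations(images, 2):
        # Two half-open ranges overlap when each starts before the other ends.
        if b1 < b2 + s2 and b2 < b1 + s1:
            conflicts.append((n1, n2))
    return conflicts


# Hypothetical dll names, bases and sizes, invented for illustration.
example_images = [
    ("cygfoo.dll", 0x61000000, 0x40000),
    ("cygbar.dll", 0x61020000, 0x30000),  # starts inside cygfoo.dll's range
    ("cygbaz.dll", 0x62000000, 0x10000),
]
```

With these made-up values, only cygfoo.dll and cygbar.dll collide; rebasing amounts to choosing bases so that no such pair exists.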
Corinna Vinschen
2011-08-19 14:35:15 UTC
Post by Ryan Johnson
Hi all,
I propose to add an entry to cygwin's faq.using which covers fork
failures. Frankly, I'm surprised it wasn't there years ago... it's
certainly frequently-asked, and the answer is always the same. Right
now users have to trawl the archives to figure out what to do (or
more likely, just blindly spam the list and get told to rebase
and/or trawl the list archives).
If you convert the text into a patch against faq-using.xml and send
it to cygwin-patches, I'd take it.
Post by Ryan Johnson
Also, what is the status of "the spawn family of calls provided by
Cygwin" [1]? There's nothing about it at the API page [2], and a
search through the user guide [3] comes up empty as well.
Have a look into the section called "Process Creation". Granted, it's
not much. These functions are practically foster children only.
Post by Ryan Johnson
Searching
/usr/include turns up only /usr/include/process.h, which contains
only the function declarations and a single comment -- "This file
comes with MSDOS and WIN32 systems" -- indicating that Windows, not
cygwin, provides the functions (which, incidentally, are deprecated
No, no, Cygwin provides these functions as well. Apart from that, the
process.h file is a problem since it duplicates the exec function
declarations which are given in sys/unistd.h. I'll remove them. If you
want to document them as special Cygwin functions, feel free to add a
spawn.sgml file to winsup/cygwin which can be included into the section
about Cygwin-specific functions, like, for instance, path.sgml or
security.sgml.

Your text looks good, except...
Post by Ryan Johnson
3. With Vista and later, use peflagsall to set the TS-aware bit on
all cygwin dlls (see /usr/share/doc/Cygwin/rebase-3.0.1.README,
reboot needed for changes to take effect). This exploits a side
effect of address space layout randomization which (ironically)
causes dlls to nearly always load at the same address.
I'm not sure I ever read about that. On one hand, the TS-aware flag is
set by gcc 4.x automatically, on the other hand, the TS flag is only
relevant for actual Terminal Servers.
Do you mean the dynamicbase flag, maybe, as described by Chuck at one
point, years ago, on the cygwin-apps list? Still, I doubt that this
flag has any positive effect, as far as I understand how it works.


Thanks,
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Ryan Johnson
2011-08-19 15:33:24 UTC
Post by Corinna Vinschen
Post by Ryan Johnson
Hi all,
I propose to add an entry to cygwin's faq.using which covers fork
failures. Frankly, I'm surprised it wasn't there years ago... it's
certainly frequently-asked, and the answer is always the same. Right
now users have to trawl the archives to figure out what to do (or
more likely, just blindly spam the list and get told to rebase
and/or trawl the list archives).
If you convert the text into a patch against faq-using.xml and send
it to cygwin-patches, I'd take it.
Silly me... I should have realized that even the web site's content
would be under revision control.
Post by Corinna Vinschen
Post by Ryan Johnson
Also, what is the status of "the spawn family of calls provided by
Cygwin" [1]? There's nothing about it at the API page [2], and a
search through the user guide [3] comes up empty as well.
Have a look into the section called "Process Creation". Granted, it's
not much. These functions are practically foster children only.
I was quoting from the Process Creation section. I can't find anywhere
else that mentions spawn at all.
Post by Corinna Vinschen
Post by Ryan Johnson
Searching
/usr/include turns up only /usr/include/process.h, which contains
only the function declarations and a single comment -- "This file
comes with MSDOS and WIN32 systems" -- indicating that Windows, not
cygwin, provides the functions (which, incidentally, are deprecated
No, no, Cygwin provides these functions as well.
Does that mean Cygwin provides an independent implementation of the
functions which should be used instead of the ones from Windows, or just
that those functions are among the windows-native calls which cygwin
makes available out of the box?
Post by Corinna Vinschen
Apart from that, the
process.h file is a problem since it duplicates the exec function
declarations which are given in sys/unistd.h. I'll remove them. If you
want to document them as special Cygwin functions, feel free to add a
spawn.sgml file to winsup/cygwin which can be included into the section
about Cygwin-specific functions, like, for instance, path.sgml or
security.sgml.
Whether that makes sense (and what gets written) would depend on the
answer to the above, I think.
Post by Corinna Vinschen
Post by Ryan Johnson
3. With Vista and later, use peflagsall to set the TS-aware bit on
all cygwin dlls (see /usr/share/doc/Cygwin/rebase-3.0.1.README,
reboot needed for changes to take effect). This exploits a side
effect of address space layout randomization which (ironically)
causes dlls to nearly always load at the same address.
I'm not sure I ever read about that. On one hand, the TS-aware flag is
set by gcc 4.x automatically, on the other hand, the TS flag is only
relevant for actual Terminal Servers.
Do you mean the dynamicbase flag, maybe, as described by Chuck at one
point, years ago, on the cygwin-apps list? Still, I doubt that this
flag has any positive effect, as far as I understand how it works.
Oops... up to now I always thought tsaware was the flag that affected ASLR.

Reading from this MSDN article [1] clarified things a bit for me. All
dynamicbase-marked dlls are randomized (including all system dlls).
Unmarked dlls, and those which load them, will not be randomized. So,
rebasing dynamicbase dlls would seem to mean little or nothing, and we
might be able to get away with not rebasing dynamicbase dlls at all (on
Vista and later, of course). In particular, shipping dynamicbase dlls
would greatly reduce the need to run rebaseall, because the
newly-arrived/clobbered dlls would go right into Windows' ASLR bitmap.
Further, dlls which rebaseall can't catch (like those created and
dlopened dynamically) might also get a bit of a break. I guess that
argues for changing gcc/binutils rather than running peflagsall, tho.

Also, we might want to consider *not* marking .exe dynamicbase, because
"When a thread starts in a process linked with /DYNAMICBASE, Windows
Vista and later moves the thread's stack to a random location." There's
no way to turn off heap or mmap randomization, tho, AFAICT.

[1] http://msdn.microsoft.com/en-us/library/bb430720.aspx
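
Since tsaware and dynamicbase got mixed up above, here is a small sketch that decodes the two bits in question from a PE DllCharacteristics value. The two constants are the standard PE/COFF values; the function name is made up for illustration, and reading the field out of an actual dll is left out.

```python
# Bit values of the PE/COFF DllCharacteristics field.
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040           # "dynamicbase"
IMAGE_DLLCHARACTERISTICS_TERMINAL_SERVER_AWARE = 0x8000  # "tsaware"


def describe_dll_characteristics(value):
    """Return which of the two flags discussed here are set in value."""
    names = []
    if value & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE:
        names.append("dynamicbase")
    if value & IMAGE_DLLCHARACTERISTICS_TERMINAL_SERVER_AWARE:
        names.append("tsaware")
    return names
```

A value of 0x8000 therefore means tsaware only, which says nothing about whether the image opts in to ASLR.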

Ryan
Corinna Vinschen
2011-08-19 15:56:18 UTC
Post by Ryan Johnson
Post by Corinna Vinschen
No, no, Cygwin provides these functions as well.
Does that mean Cygwin provides an independent implementation of the
functions which should be used instead of the ones from Windows, or
just that those functions are among the windows-native calls which
cygwin makes available out of the box?
You're misunderstanding how Cygwin works. There are no windows-native
calls which Cygwin makes available. Cygwin implements its own set
of spawn functions. See winsup/cygwin/spawn.cc.
Post by Ryan Johnson
Post by Corinna Vinschen
Apart from that, the
process.h file is a problem since it duplicates the exec function
declarations which are given in sys/unistd.h. I'll remove them. If you
want to document them as special Cygwin functions, feel free to add a
spawn.sgml file to winsup/cygwin which can be included into the section
about Cygwin-specific functions, like, for instance, path.sgml or
security.sgml.
Whether that makes sense (and what gets written) would depend on the
answer to the above, I think.
On second thought, I agree with Chris. The spawn functions should not
be mentioned at all, since they only exist on Windows usually. We should
rather implement posix_spawn and posix_spawnp at one point(*).

(*) http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_spawn.html
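
For readers unfamiliar with the difference, the sketch below contrasts the fork+exec pattern with a single posix_spawn call, using Python's os module as a stand-in. It assumes a POSIX platform where os.fork and os.posix_spawn are available (Python 3.8 or later) and makes no claim about what Cygwin itself provides; the helper names are invented for the example.

```python
import os
import sys


def run_fork_exec(argv):
    """Run argv via fork() followed by exec(); return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: replace the freshly copied process image right away.
        try:
            os.execv(argv[0], argv)
        finally:
            # Only reached if exec failed.
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    # Assumes the child exited normally rather than being killed by a signal.
    return os.WEXITSTATUS(status)


def run_posix_spawn(argv):
    """Run argv via posix_spawn(); no copy of the parent is requested."""
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


# Example child: the current Python interpreter exiting with status 7.
child = [sys.executable, "-c", "raise SystemExit(7)"]
```

Both helpers run the same child and report the same exit status; the difference is that the first asks the system to duplicate the parent before replacing it, which is the step that fails in the reports above.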
Post by Ryan Johnson
Post by Corinna Vinschen
Post by Ryan Johnson
3. With Vista and later, use peflagsall to set the TS-aware bit on
all cygwin dlls (see /usr/share/doc/Cygwin/rebase-3.0.1.README,
reboot needed for changes to take effect). This exploits a side
effect of address space layout randomization which (ironically)
causes dlls to nearly always load at the same address.
I'm not sure I ever read about that. On one hand, the TS-aware flag is
set by gcc 4.x automatically, on the other hand, the TS flag is only
relevant for actual Terminal Servers.
Do you mean the dynamicbase flag, maybe, as described by Chuck at one
point, years ago, on the cygwin-apps list? Still, I doubt that this
flag has any positive effect, as far as I understand how it works.
Oops... up to now I always thought tsaware was the flag that affected ASLR.
Reading from this MSDN article [1] clarified things a bit for me.
All dynamicbase-marked dlls are randomized (including all system
dlls). Unmarked dlls, and those which load them, will not be
randomized. So, rebasing dynamicbase dlls would seem to mean little
or nothing, and we might be able to get away with not rebasing
dynamicbase dlls at all (on Vista and later, of course). In
particular, shipping dynamicbase dlls would greatly reduce the need
to run rebaseall, because the newly-arrived/clobbered dlls would go
right into Windows' ASLR bitmap. Further, dlls which rebaseall can't
catch (like those created and dlopened dynamically) might also get a
bit of a break. I guess that argues for changing gcc/binutils rather
than running peflagsall, tho.
Hmm, I'm wondering if that's a solution. AFAIK the number of ASLR DLL
slots is less than the number of DLLs we're shipping in the distro. The
problem is to make sure that the DLLs are loaded into the same spot in
parent and child process on fork. I'm not sure you can guarantee that
by setting the dynamicbase flags.


Corinna
Ryan Johnson
2011-08-19 16:11:42 UTC
Post by Corinna Vinschen
Post by Ryan Johnson
Post by Corinna Vinschen
No, no, Cygwin provides these functions as well.
Does that mean Cygwin provides an independent implementation of the
functions which should be used instead of the ones from Windows, or
just that those functions are among the windows-native calls which
cygwin makes available out of the box?
You're misunderstanding how Cygwin works. There are no windows-native
calls which Cygwin makes available. Cygwin implements its own set
of spawn functions. See winsup/cygwin/spawn.cc.
Gotcha. The code comment in process.h just got me confused.
Post by Corinna Vinschen
Post by Ryan Johnson
Post by Corinna Vinschen
Apart from that, the
process.h file is a problem since it duplicates the exec function
declarations which are given in sys/unistd.h. I'll remove them. If you
want to document them as special Cygwin functions, feel free to add a
spawn.sgml file to winsup/cygwin which can be included into the section
about Cygwin-specific functions, like, for instance, path.sgml or
security.sgml.
Whether that makes sense (and what gets written) would depend on the
answer to the above, I think.
On second thought, I agree with Chris. The spawn functions should not
be mentioned at all, since they only exist on Windows usually. We should
rather implement posix_spawn and posix_spawnp at one point(*).
(*) http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_spawn.html
That would be ideal, other than the SHTDI part...
Post by Corinna Vinschen
Post by Ryan Johnson
Post by Corinna Vinschen
Post by Ryan Johnson
3. With Vista and later, use peflagsall to set the TS-aware bit on
all cygwin dlls (see /usr/share/doc/Cygwin/rebase-3.0.1.README,
reboot needed for changes to take effect). This exploits a side
effect of address space layout randomization which (ironically)
causes dlls to nearly always load at the same address.
I'm not sure I ever read about that. On one hand, the TS-aware flag is
set by gcc 4.x automatically, on the other hand, the TS flag is only
relevant for actual Terminal Servers.
Do you mean the dynamicbase flag, maybe, as described by Chuck at one
point, years ago, on the cygwin-apps list? Still, I doubt that this
flag has any positive effect, as far as I understand how it works.
Oops... up to now I always thought tsaware was the flag that affected ASLR.
Reading from this MSDN article [1] clarified things a bit for me.
All dynamicbase-marked dlls are randomized (including all system
dlls). Unmarked dlls, and those which load them, will not be
randomized. So, rebasing dynamicbase dlls would seem to mean little
or nothing, and we might be able to get away with not rebasing
dynamicbase dlls at all (on Vista and later, of course). In
particular, shipping dynamicbase dlls would greatly reduce the need
to run rebaseall, because the newly-arrived/clobbered dlls would go
right into Windows' ASLR bitmap. Further, dlls which rebaseall can't
catch (like those created and dlopened dynamically) might also get a
bit of a break. I guess that argues for changing gcc/binutils rather
than running peflagsall, tho.
Hmm, I'm wondering if that's a solution. AFAIK the number of ASLR DLL
slots is less than the number of DLLs we're shipping in the distro.
I see. In that case, would it make sense to have gcc/binutils emit
dynamicbase dlls by default (to catch cases rebaseall doesn't handle
well) and then remove the flag for dlls we distribute, depending on
rebaseall to keep them in line? Only thing is, I don't know how ASLR
would interact with dlls that appear out of nowhere like that (I guess
it would work until the ASLR bitmap fills?)

Ryan
Christopher Faylor
2011-08-19 16:15:21 UTC
Post by Ryan Johnson
Post by Corinna Vinschen
(*) http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_spawn.html
That would be ideal, other than the SHTDI part...
I actually told Linus Torvalds that I'd do it a few years ago. So
I'm sorta on the hook, I guess.

cgf
Corinna Vinschen
2011-08-19 16:28:06 UTC
Post by Christopher Faylor
Post by Ryan Johnson
Post by Corinna Vinschen
(*) http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_spawn.html
That would be ideal, other than the SHTDI part...
I actually told Linus Torvalds that I'd do it a few years ago. So
I'm sorta on the hook, I guess.
Now, *that's* why he wrote Linux. He was disappointed by the missing
posix_spawn functionality...


Corinna
Christopher Faylor
2011-08-19 16:31:35 UTC
Post by Corinna Vinschen
Post by Christopher Faylor
Post by Ryan Johnson
Post by Corinna Vinschen
(*) http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_spawn.html
That would be ideal, other than the SHTDI part...
I actually told Linus Torvalds that I'd do it a few years ago. So
I'm sorta on the hook, I guess.
Now, *that's* why he wrote Linux. He was disappointed by the missing
posix_spawn functionality...
Maybe, but this was in the git context. Someone was being mean to me on
the git list, before I stopped following it, and the idea of using
posix_spawn for speed came up. The assumption was that posix_spawn
would be faster than fork on Cygwin. But, if you look at all of the
things that these functions do, it is not, IMO, a foregone conclusion.

cgf
Corinna Vinschen
2011-08-19 16:36:08 UTC
Post by Christopher Faylor
Post by Corinna Vinschen
Post by Christopher Faylor
Post by Ryan Johnson
Post by Corinna Vinschen
(*) http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_spawn.html
That would be ideal, other than the SHTDI part...
I actually told Linus Torvalds that I'd do it a few years ago. So
I'm sorta on the hook, I guess.
Now, *that's* why he wrote Linux. He was disappointed by the missing
posix_spawn functionality...
Maybe, but this was in the git context. Someone was being mean to me on
the git list, before I stopped following it, and the idea of using
posix_spawn for speed came up. The assumption was that posix_spawn
would be faster than fork on Cygwin. But, if you look at all of the
things that these functions do, it is not, IMO, a foregone conclusion.
Yup, the definition of these functions makes me kind of fuzzy.


Corinna
Charles Wilson
2011-08-19 16:13:51 UTC
Post by Corinna Vinschen
Hmm, I'm wondering if that's a solution. AFAIK the number of ASLR DLL
slots is less than the number of DLLs we're shipping in the distro.
Yes, that's true as well -- although it's based on memory size: The
total address range managed by ASLR is 0x50000000 to 0x78000000, in
0x2800 64KB chunks, or 640MB. But a lot of that is reserved for system
DLLs which are already ASLR'ed. There's simply not enough room there
for everything.
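
The range arithmetic can be checked directly. This sketch takes the two bounds and the 64KB chunk size quoted above as given and only does the subtraction and division; the variable names are made up for the example.

```python
# Bounds and chunk size as quoted in the post above.
ASLR_START = 0x50000000
ASLR_END = 0x78000000
CHUNK = 64 * 1024  # 64KB

span_bytes = ASLR_END - ASLR_START
chunk_count = span_bytes // CHUNK
span_mib = span_bytes // (1024 * 1024)

print(hex(chunk_count), span_mib)  # prints: 0x2800 640
```

So the quoted range holds 0x2800 chunks of 64KB each, for a total of 640MB before subtracting whatever the system dlls already occupy.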
Post by Corinna Vinschen
The
problem is to make sure that the DLLs are loaded into the same spot in
parent and child process on fork. I'm not sure you can guarantee that
by setting the dynamicbase flags.
Nope. See the links in my previous post in this thread.

--
Chuck
Christopher Faylor
2011-08-19 14:41:58 UTC
Post by Ryan Johnson
Hi all,
I propose to add an entry to cygwin's faq.using which covers fork
failures. Frankly, I'm surprised it wasn't there years ago... it's
certainly frequently-asked, and the answer is always the same. Right now
users have to trawl the archives to figure out what to do (or more
likely, just blindly spam the list and get told to rebase and/or trawl
the list archives).
Also, what is the status of "the spawn family of calls provided by
Cygwin" [1]? There's nothing about it at the API page [2], and a search
through the user guide [3] comes up empty as well. Searching /usr/include
turns up only /usr/include/process.h, which contains only the function
declarations and a single comment -- "This file comes with MSDOS and
WIN32 systems" -- indicating that Windows, not cygwin, provides the
functions (which, incidentally, are deprecated in favor of the
posix-compliant _spawn* instead [4]). Would it make sense to update the
docs to mention these are native Windows functions, and update the
headers to include the non-deprecated function signatures?
I appreciate that you're trying to do this. I was actually going to
ask someone if they wanted to write a section like this but assumed
I wouldn't get any takers.

Wrt the spawn functions, they hark back to a time when Cygwin was
confused about what API it was exporting. They *are* deprecated. I
don't see any pressing need to document them.

(And, yes, we know about the posix functions with _spawn in their names)
Post by Ryan Johnson
[1] http://www.cygwin.com/cygwin-ug-net/highlights.html#ov-hi-process
[2] http://cygwin.com/cygwin-api/
[3] http://cygwin.com/cygwin-ug-net/cygwin-ug-net-nochunks.html.gz
[4] http://msdn.microsoft.com/en-us/library/ms235383%28v=vs.80%29.aspx
Seed text below...
Thoughts?
Ryan
Why does fork fail so often on my system?
I'd prefer something like "Is there a way to fix fork failures?"
Post by Ryan Johnson
..., and reports of fork failures are
probably the single most common thread topic in the cygwin mailing list.
I don't think comments like this are appropriate. If there was a
magical time when we've fixed fork failures (and Corinna's proposed
changes to run rebase during setup.exe should at least cut back on them)
then this would be out-of-date. It doesn't provide any useful information
to the user anyway.
Post by Ryan Johnson
- unable to remap $dll to same address as parent
- couldn't allocate heap
- died waiting for dll loading
- child -1 - died waiting for longjmp before initialization
- STATUS_ACCESS_VIOLATION
- resource temporarily unavailable
The problem often (re)appears or worsens after installing or updating
cygwin packages (which can undo the effects of rebaseall and peflagsall,
see below). Applications which dynamically compile and load dlls (e.g.
perl, ruby, some lisps, building gcc from sources) are also especially
prone to fork failures for the same reason. Fork failures in general
also became significantly more common with the introduction of Vista and
Win7, whose address space layout randomization (ASLR) often causes child
processes to spawn with dlls, thread stacks, heaps, and other memory
objects allocated in different locations than the parent. While cygwin
compensates for as many of these relocations as possible, there always
remains a possibility of fork failures.
If you find that frequent fork failures interfere with normal use of
cygwin, please try the following steps:
1. Disable or uninstall applications known to interfere with cygwin (see
http://cygwin.com/faq/faq.using.html#faq.using.bloda). Many of them
inject dlls into processes at inconsistent locations, which breaks
fork() semantics.
2. Rebase your system (see /usr/share/doc/Cygwin/rebase-3.0.1.README).
Every dll in the system specifies a base address -- the preferred memory
location it should load at -- and the Windows loader does not break ties
consistently when it encounters base address conflicts.
3. With Vista and later, use peflagsall to set the TS-aware bit on all
cygwin dlls (see /usr/share/doc/Cygwin/rebase-3.0.1.README, reboot
needed for changes to take effect). This exploits a side effect of
address space layout randomization which (ironically) causes dlls to
nearly always load at the same address.
4. If you have access to the source code of the offending application
(this applies to all cygwin packages), consider replacing calls to
fork() with calls to the spawn family of functions. These are a native
(= reliable and highly efficient) replacement for fork+exec, which is by
far the most common usage of fork(), and are documented at
http://msdn.microsoft.com/en-us/library/20y988d2%28v=VS.100%29.aspx.
I appreciate your thoroughness but I think there are way too many words
above. The FAQ should be solution-oriented. If it is important to
discuss the details behind why fork() fails then maybe another section
could be added. Otherwise, I'd prefer to see something which shows the
error messages and then, as briefly as possible, shows solutions.

While people do ask "Why does fork fail?", the majority of the askers
don't really care. They are really asking "How do I make Cygwin fork
work?" So, I don't think that it is really FAQ-appropriate to dive
too deep here.

And, again, we don't want to tell people to use non-POSIX solutions
except as a last resort. Telling people to rewrite their source code
flies in the face of what Cygwin is trying to do.

(And, yes, I presciently can hear the argument to the above paragraph
coming)

cgf
Corinna Vinschen
2011-08-19 15:12:55 UTC
Post by Christopher Faylor
Post by Ryan Johnson
4. If you have access to the source code of the offending application
(this applies to all cygwin packages), consider replacing calls to
fork() with calls to the spawn family of functions. These are a native
(= reliable and highly efficient) replacement for fork+exec, which is by
far the most common usage of fork(), and are documented at
http://msdn.microsoft.com/en-us/library/20y988d2%28v=VS.100%29.aspx.
[...]
And, again, we don't want to tell people to use non-POSIX solutions
except as a last resort. Telling people to rewrite their source code
flies in the face of what Cygwin is trying to do.
Oh boy, I missed that part. Yes, I agree fully with your point of view.


Corinna
Ryan Johnson
2011-08-19 16:01:18 UTC
Post by Christopher Faylor
Post by Ryan Johnson
Hi all,
I propose to add an entry to cygwin's faq.using which covers fork
failures. Frankly, I'm surprised it wasn't there years ago... it's
certainly frequently-asked, and the answer is always the same. Right now
users have to trawl the archives to figure out what to do (or more
likely, just blindly spam the list and get told to rebase and/or trawl
the list archives).
I appreciate that you're trying to do this. I was actually going to
ask someone if they wanted to write a section like this but assumed
I wouldn't get any takers.
Well, at some point there will be significantly fewer characters typed
to make the patch than to keep answering emails complaining about fork
failures...
Post by Christopher Faylor
Wrt the spawn functions, they hark back to a time when Cygwin was
confused about what API it was exporting. They *are* deprecated. I
don't see any pressing need to document them.
(And, yes, we know about the posix functions with _spawn in their names)
Should we nuke the corresponding text from the user guide and be done
with it, then?
Post by Christopher Faylor
Post by Ryan Johnson
..., and reports of fork failures are
probably the single most common thread topic in the cygwin mailing list.
I don't think comments like this are appropriate. If there was a
magical time when we've fixed fork failures (and Corinna's proposed
changes to run rebase during setup.exe should at least cut back on them)
then this would be out-of-date. It doesn't provide any useful information
to the user anyway.
Fair enough.

BTW, I hope that rebasing wouldn't require every invocation of setup.exe
to shut down all cygwin processes... I really like how right now I can
pull new packages some configure script needs, without having to shut
down 4-5 sessions of emacs.
Post by Christopher Faylor
Post by Ryan Johnson
- unable to remap $dll to same address as parent
- couldn't allocate heap
- died waiting for dll loading
- child -1 - died waiting for longjmp before initialization
- STATUS_ACCESS_VIOLATION
- resource temporarily unavailable
The problem often (re)appears or worsens after installing or updating
cygwin packages (which can undo the effects of rebaseall and peflagsall,
see below). Applications which dynamically compile and load dlls (e.g.
perl, ruby, some lisps, building gcc from sources) are also especially
prone to fork failures for the same reason. Fork failures in general
also became significantly more common with the introduction of Vista and
Win7, whose address space layout randomization (ASLR) often causes child
processes to spawn with dlls, thread stacks, heaps, and other memory
objects allocated in different locations than the parent. While cygwin
compensates for as many of these relocations as possible, there always
remains a possibility of fork failures.
If you find that frequent fork failures interfere with normal use of
cygwin, please try the following steps:
1. Disable or uninstall applications known to interfere with cygwin (see
http://cygwin.com/faq/faq.using.html#faq.using.bloda). Many of them
inject dlls into processes at inconsistent locations, which breaks
fork() semantics.
2. Rebase your system (see /usr/share/doc/Cygwin/rebase-3.0.1.README).
Every dll in the system specifies a base address -- the preferred memory
location it should load at -- and the Windows loader does not break ties
consistently when it encounters base address conflicts.
3. With Vista and later, use peflagsall to set the TS-aware bit on all
cygwin dlls (see /usr/share/doc/Cygwin/rebase-3.0.1.README, reboot
needed for changes to take effect). This exploits a side effect of
address space layout randomization which (ironically) causes dlls to
nearly always load at the same address.
4. If you have access to the source code of the offending application
(this applies to all cygwin packages), consider replacing calls to
fork() with calls to the spawn family of functions. These are a native
(= reliable and highly efficient) replacement for fork+exec, which is by
far the most common usage of fork(), and are documented at
http://msdn.microsoft.com/en-us/library/20y988d2%28v=VS.100%29.aspx.
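For readers unfamiliar with the fork+exec pattern mentioned in item 4, the
following C sketch puts it next to a spawn-style equivalent. It uses the
POSIX posix_spawnp() call rather than the MSDN functions linked above, and
the helper names (wait_for_exit, run_with_fork_exec, run_with_posix_spawn)
are invented for this example. Treat it as a rough illustration under those
assumptions, not a drop-in conversion recipe.

```c
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Waits for pid and returns its exit status, or -1 if it did not exit normally. */
static int wait_for_exit(pid_t pid)
{
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Classic pattern: duplicate the whole process, then replace the copy. */
int run_with_fork_exec(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;                /* this is where a fork failure shows up */
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);               /* only reached if exec itself failed */
    }
    return wait_for_exit(pid);
}

/* Spawn-style pattern: ask for the new program directly, with no
   intermediate copy of the caller. */
int run_with_posix_spawn(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid;
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;
    return wait_for_exit(pid);
}
```

Both helpers take a NULL-terminated argument vector such as
{ "sh", "-c", "exit 3", NULL } and return the child's exit status. The
conversion is only this mechanical when nothing happens between fork() and
exec(); code that changes directories, signals, or file descriptors in the
child needs more care.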
Post by Christopher Faylor
I appreciate your thoroughness but I think there are way too many words
above. The FAQ should be solution-oriented. If it is important to
discuss the details behind why fork() fails then maybe another section
could be added. Otherwise, I'd prefer to see something which shows the
error messages and then, as briefly as possible, shows solutions.
While people do ask "Why does fork fail?", the majority of the askers
don't really care. They are really asking "How do I make Cygwin fork
work?" So, I don't think that it is really FAQ-appropriate to dive
too deep here.
I'm definitely a fan of brevity. My main motivation for all the verbiage
was so that users who read the faq aren't as shocked when they do all of
the above and fork still fails more often than they'd like, and so
they'd have some idea of which steps are most applicable to their situation.

Two sections might work very well: "How can I prevent fork failures?"
and "Why does fork() still fail after I run rebaseall?" Does that sound
good to you?
Post by Christopher Faylor
And, again, we don't want to tell people to use non-POSIX solutions
except as a last resort. Telling people to rewrite their source code
flies in the face of what Cygwin is trying to do.
(And, yes, I presciently can hear the argument to the above paragraph
coming)
You would prefer that it remain an unadvertized last resort, then?

I guess the idea is less imposing to me after porting lots of code
between linux/gcc and solaris/suncc.

Off topic: to be honest, I'd *love* it if bash, make, and gcc used spawn
instead of fork+exec when compiled under cygwin, though I don't know how
I/O redirection would fit in.
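On the I/O redirection question, one possible approach (a sketch only,
assuming the POSIX posix_spawn interface rather than the spawn*() calls in
process.h) is to describe the redirections as "file actions" that are
applied in the child before the new program starts. The helper name
spawn_capture_stdout is hypothetical, and whether bash, make, or gcc could
adopt anything like this is a separate question.

```c
#include <assert.h>
#include <spawn.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Runs argv with its stdout redirected into a pipe and copies up to
   cap - 1 bytes of the output into buf (always NUL-terminated).
   Returns the child's exit status, or -1 on error or abnormal exit.
   If the output does not fit, the child may be killed by SIGPIPE. */
int spawn_capture_stdout(char *const argv[], char *buf, size_t cap)
{
    assert(argv != NULL && argv[0] != NULL && buf != NULL);

    int fds[2];
    if (cap == 0 || pipe(fds) != 0)
        return -1;

    posix_spawn_file_actions_t actions;
    if (posix_spawn_file_actions_init(&actions) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    /* In the child: drop the read end, make the write end stdout,
       then drop the now-redundant original write end. */
    posix_spawn_file_actions_addclose(&actions, fds[0]);
    posix_spawn_file_actions_adddup2(&actions, fds[1], STDOUT_FILENO);
    posix_spawn_file_actions_addclose(&actions, fds[1]);

    pid_t pid;
    int err = posix_spawnp(&pid, argv[0], &actions, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&actions);
    close(fds[1]);                 /* parent keeps only the read end */
    if (err != 0) {
        close(fds[0]);
        return -1;
    }

    /* Read until end of file or until the buffer is full. */
    size_t used = 0;
    ssize_t n;
    while (used < cap - 1 &&
           (n = read(fds[0], buf + used, cap - 1 - used)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with { "sh", "-c", "echo hello", NULL } and a 64-byte buffer
should leave "hello\n" in the buffer. The same three file-action calls cover
redirecting stdin or stderr by changing the target descriptor.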

Ryan
Corinna Vinschen
2011-08-19 16:13:22 UTC
Post by Ryan Johnson
BTW, I hope that rebasing wouldn't require every invocation of
setup.exe to shut down all cygwin processes... I really like how
right now I can pull new packages some configure script needs,
without having to shut down 4-5 sessions of emacs.
Seriously, you could propose a patch to rebase, which leaves blocked
DLLs alone and only tries to rebase the colliding ones, if any. Plus a
patch to rebaseall to add a flag
"--don't-test-for-ash-only-just-ignore-what-you-can't-change". Well, a
single char option might be better...

The rebase sources are available by CVS:

$ cvs -d "pserver:anoncvs-9JcytcrH/bA+***@public.gmane.org:/cvs/cygwin-apps co rebase

Send patches to the cygwin-apps list.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Christopher Faylor
2011-08-19 16:21:17 UTC
Post by Corinna Vinschen
Post by Ryan Johnson
BTW, I hope that rebasing wouldn't require every invocation of
setup.exe to shut down all cygwin processes... I really like how
right now I can pull new packages some configure script needs,
without having to shut down 4-5 sessions of emacs.
Seriously, you could propose a patch to rebase, which leaves blocked
DLLs alone and only tries to rebase the colliding ones, if any. Plus a
patch to rebaseall to add a flag
"--don't-test-for-ash-only-just-ignore-what-you-can't-change". Well, a
single char option might be better...
Send patches to the cygwin-apps list.
As I mentioned in cygwin-apps, I think a --backup option of some kind
would be useful so you could restore to a previous state.

Why won't anyone implement my idea????????????????!!!!!!!!!!!!

cgf
Christopher Faylor
2011-08-19 16:24:58 UTC
Post by Christopher Faylor
Post by Corinna Vinschen
Post by Ryan Johnson
BTW, I hope that rebasing wouldn't require every invocation of
setup.exe to shut down all cygwin processes... I really like how
right now I can pull new packages some configure script needs,
without having to shut down 4-5 sessions of emacs.
Seriously, you could propose a patch to rebase, which leaves blocked
DLLs alone and only tries to rebase the colliding ones, if any. Plus a
patch to rebaseall to add a flag
"--don't-test-for-ash-only-just-ignore-what-you-can't-change". Well, a
single char option might be better...
Send patches to the cygwin-apps list.
As I mentioned in cygwin-apps, I think a --backup option of some kind
would be useful so you could restore to a previous state.
Why won't anyone implement my idea????????????????!!!!!!!!!!!!
!!!!!
Ryan Johnson
2011-08-19 16:31:45 UTC
Post by Christopher Faylor
Post by Corinna Vinschen
Post by Ryan Johnson
BTW, I hope that rebasing wouldn't require every invocation of
setup.exe to shut down all cygwin processes... I really like how
right now I can pull new packages some configure script needs,
without having to shut down 4-5 sessions of emacs.
Seriously, you could propose a patch to rebase, which leaves blocked
DLLs alone and only tries to rebase the colliding ones, if any. Plus a
patch to rebaseall to add a flag
"--don't-test-for-ash-only-just-ignore-what-you-can't-change". Well, a
single char option might be better...
Send patches to the cygwin-apps list.
As I mentioned in cygwin-apps, I think a --backup option of some kind
would be useful so you could restore to a previous state.
Why won't anyone implement my idea????????????????!!!!!!!!!!!!
!!!!!
No doubt those last five bangs will mark the tipping point for somebody
who's teetering on the edge of volunteering >;)

Unfortunately my semester is about to start and free time will be a
luxury for the next four months.

Ryan
Corinna Vinschen
2011-08-19 16:40:39 UTC
Post by Christopher Faylor
Post by Corinna Vinschen
Post by Ryan Johnson
BTW, I hope that rebasing wouldn't require every invocation of
setup.exe to shut down all cygwin processes... I really like how
right now I can pull new packages some configure script needs,
without having to shut down 4-5 sessions of emacs.
Seriously, you could propose a patch to rebase, which leaves blocked
DLLs alone and only tries to rebase the colliding ones, if any. Plus a
patch to rebaseall to add a flag
"--don't-test-for-ash-only-just-ignore-what-you-can't-change". Well, a
single char option might be better...
Send patches to the cygwin-apps list.
As I mentioned in cygwin-apps, I think a --backup option of some kind
would be useful so you could restore to a previous state.
Why won't anyone implement my idea????????????????!!!!!!!!!!!!
I still don't see how this would be useful. I occasionally see
situations in which rebase *seems* to have broken a DLL. Or, to put it
more carefully, some DLL was suddenly broken, without being able to lay
a finger on the actual cause. However, reverting the DLL base address
to the former state never worked as a fix for me. Only reinstalling the
DLL worked, and a subsequent rebasing did not break the DLL. So, to
repeat myself, I don't see how a reverse or backup mode would be useful.


Corinna
Christopher Faylor
2011-08-19 17:09:36 UTC
Post by Corinna Vinschen
Post by Christopher Faylor
Post by Corinna Vinschen
Post by Ryan Johnson
BTW, I hope that rebasing wouldn't require every invocation of
setup.exe to shut down all cygwin processes... I really like how
right now I can pull new packages some configure script needs,
without having to shut down 4-5 sessions of emacs.
Seriously, you could propose a patch to rebase, which leaves blocked
DLLs alone and only tries to rebase the colliding ones, if any. Plus a
patch to rebaseall to add a flag
"--don't-test-for-ash-only-just-ignore-what-you-can't-change". Well, a
single char option might be better...
Send patches to the cygwin-apps list.
As I mentioned in cygwin-apps, I think a --backup option of some kind
would be useful so you could restore to a previous state.
Why won't anyone implement my idea????????????????!!!!!!!!!!!!
I still don't see how this would be useful. I occasionally see
situations in which rebase *seems* to have broken a DLL. Or, to put it
more carefully, some DLL was suddenly broken, without being able to lay
a finger on the actual cause. However, reverting the DLL base address
to the former state never worked as a fix for me. Only reinstalling the
DLL worked, and a subsequent rebasing did not break the DLL. So, to
repeat myself, I don't see how a reverse or backup mode would be useful.
I mean a real backup, as in copying the old dll somewhere.

This was really in the context of someone who had previously set up
their system just the way they wanted it who could be surprised when
setup.exe kindly helps them out by resetting the base address of all of
their dlls. Since we can't easily ask "Do you want to do this?" from
a setup post-install script it just seemed to me that we have to have
a foolproof recovery if things blow up.

cgf

Christopher Faylor
2011-08-19 16:14:20 UTC
Post by Ryan Johnson
Post by Christopher Faylor
I appreciate your thoroughness but I think there are way too many words
above. The FAQ should be solution-oriented. If it is important to
discuss the details behind why fork() fails then maybe another section
could be added. Otherwise, I'd prefer to see something which shows the
error messages and then, as briefly as possible, shows solutions.
While people do ask "Why does fork fail?", the majority of the askers
don't really care. They are really asking "How do I make Cygwin fork
work?" So, I don't think that it is really FAQ-appropriate to dive
too deep here.
I'm definitely a fan of brevity. My main motivation for all the verbiage
was so that users who read the faq aren't as shocked when they do all of
the above and fork still fails more often than they'd like, and so
they'd have some idea of which steps are most applicable to their situation.
Two sections might work very well: "How can I prevent fork failures?"
and "Why does fork() still fail after I run rebaseall?" Does that sound
good to you?
That sounds better, yes. I think the former should be solution-oriented
and as succinct as possible. The latter will be something that we can
point to when people claim to have run the former while ignoring any
follow-on text. (They will, of course, first, have tried to send
html-only email to the list and will have complained to postmaster that
the mailing list software thought they were spammers)
Post by Ryan Johnson
Post by Christopher Faylor
And, again, we don't want to tell people to use non-POSIX solutions
except as a last resort. Telling people to rewrite their source code
flies in the face of what Cygwin is trying to do.
(And, yes, I presciently can hear the argument to the above paragraph
coming)
You would prefer that it remain an unadvertized last resort, then?
Yes. I would rather not tell people to rewrite their code.
Post by Ryan Johnson
I guess the idea is less imposing to me after porting lots of code
between linux/gcc and solaris/suncc.
Off topic: to be honest, I'd *love* it if bash, make, and gcc used
spawn instead of fork+exec when compiled under cygwin, though I don't
know how I/O redirection would fit in.
You know about MinGW, right?

cgf
Charles Wilson
2011-08-19 16:06:57 UTC
Post by Ryan Johnson
I propose to add an entry to cygwin's faq.using which covers fork
failures.
Good idea.
Post by Ryan Johnson
[spawn stuff in /usr/include/process.h]
As described upthread, the declarations in this header are declaring
*cygwin* implementations of spawn() functions. To get to the native
windows versions, you'd need to
1) include /usr/i686-pc-mingw32/sys-root/mingw/include/process.h
2) and link against /usr/i686-pc-mingw32/sys-root/mingw/lib/libmsvcrt.a

-- which is to say, use the mingw compiler. And that takes you right
out of "cygwin" solutions...
Post by Ryan Johnson
Why does fork fail so often on my system?
2. Rebase your system (see /usr/share/doc/Cygwin/rebase-3.0.1.README).
Don't refer to the file by its version number; it's either going to
change, or disappear as soon as Jason gets back. Maybe:
/usr/share/doc/Cygwin/rebase-x.y.z.README
or
/usr/share/doc/Cygwin/rebase*.README
Post by Ryan Johnson
3. With Vista and later, use peflagsall to set the TS-aware bit on all
cygwin dlls
As Corinna mentioned, you probably mean the dynamic base bit.
But...this usually causes more harm than good IIRC -- see
http://cygwin.com/ml/cygwin-apps/2011-06/msg00070.html
for a summary and some links to other discussions.
Post by Ryan Johnson
4. If you have access to the source code of the offending application
(this applies to all cygwin packages), consider replacing calls to
fork() with calls to the spawn family of functions. These are a native
(= reliable and highly efficient) replacement for fork+exec, which is by
far the most common usage of fork(), and are documented at
http://msdn.microsoft.com/en-us/library/20y988d2%28v=VS.100%29.aspx.
As stated earlier, recommending the use of msvcrt spawn*() is very
un-cygwin -- even if it were possible to do so (see above wrt
/usr/i686-pc-mingw32/sys-root/mingw/*). Simply using *cygwin's*
(deprecated) spawn*() functions won't help much either unless the user
REALLY knows what they are doing; and if THAT were the case, (a) they
wouldn't need this FAQ, and (b) they'd know to use the POSIX *_spawn()
functions instead.

--
Chuck
Christopher Faylor
2011-08-19 16:18:14 UTC
Post by Charles Wilson
As stated earlier, recommending the use of msvcrt spawn*() is very
un-cygwin -- even if it were possible to do so (see above wrt
/usr/i686-pc-mingw32/sys-root/mingw/*). Simply using *cygwin's*
(deprecated) spawn*() functions won't help much either unless the user
REALLY knows what they are doing; and if THAT were the case, (a) they
wouldn't need this FAQ, and (b) they'd know to use the POSIX *_spawn()
functions instead.
Bingo. Right.

Those are even better reasons for not including the description than
"It's un-Cygwin". Converting from fork()/exec() to spawn requires
expertise, so casually mentioning this as a "solution" is bound to
generate "How exactly do I do that? Do I just replace all calls to
fork() with spawn()?" type questions.

cgf