Christopher Faylor
2012-07-02 16:01:57 UTC
[redirecting to cygwin-developers]
Thanks for confirming. I suspect that this snapshot really only masks the
problem. The stack was getting corrupted by something but I was having a
really hard time figuring out what was doing it.
Compiling path.cc without optimization or passwd.cc without
-fomit-frame-pointer "fixed" the problem but clearly something is wrong.
I tried building Cygwin with stack probes and that made the DLL
unrunnable. I tried instrumenting it with -finstrument-functions and
that made the problem go away.
The problem is that something, somewhere along the line, replaces a
frame pointer in the stack with 0x10c (268) and the return address with
zero. So, eventually, when a function returns, %ebp is set to 0x10c and
the program jumps to address zero. That is one manifestation. In others
the 0x10c is still there but it is interpreted as a pointer to something
and dereferencing it causes a SEGV.
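For anyone wanting to look for the same signature in a captured stack dump, here is a minimal sketch of the check. It only models the saved-%ebp/return-address pairs as plain data; the FrameRecord type and find_corrupt_frame() are made-up names for illustration and are not anything in the Cygwin source.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model of one saved x86 frame record: with frame pointers
// enabled, each frame stores the caller's %ebp followed by the return address.
struct FrameRecord {
    std::uint32_t saved_ebp;
    std::uint32_t return_addr;
};

// Returns the index of the first record that matches the signature described
// above (saved %ebp == 0x10c and return address == 0), or -1 if none does.
int find_corrupt_frame(const FrameRecord *frames, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        if (frames[i].saved_ebp == 0x10c && frames[i].return_addr == 0)
            return static_cast<int>(i);
    }
    return -1;
}
```

This only covers the first manifestation (return to address zero). In the other case the return address may be intact, so a looser check on saved_ebp alone might be needed.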
0x10c is 256 + 12 but I haven't been able to find that anywhere in the
source.
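Just to pin the arithmetic down, a small sketch that checks the decomposition at compile time. The names are illustrative only, and it assumes a compiler with C++11 support (constexpr and static_assert).

```cpp
// The corrupting value from the report, checked at compile time.
constexpr unsigned kCorruptValue = 0x10c;  // illustrative name

static_assert(kCorruptValue == 268, "0x10c is 268 decimal");
static_assert(kCorruptValue == 256 + 12, "0x10c is 256 + 12");

// Split a value into its multiple-of-256 part and the remaining low byte.
constexpr unsigned high_part(unsigned v) { return v & ~0xffu; }
constexpr unsigned low_part(unsigned v) { return v & 0xffu; }

static_assert(high_part(kCorruptValue) == 256, "high part is 256");
static_assert(low_part(kCorruptValue) == 12, "low part is 12");
```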
Your test triggered a problem which has existed in Cygwin since last November.
I noticed it in snapshots going back to 2011-11-14. But, when I tried to
build a version of Cygwin from before that time, it still manifested the
problem. I did change to gcc 4.5.3 around that time so I'm thinking that
either this version of gcc exposed a problem in Cygwin or Cygwin has exposed
a problem in this version of gcc. There was an odd problem in select() where
it seemed like constructors weren't being properly run on a local variable
when alloca was used in the same function.
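The shape of that select() situation, as a minimal sketch: a local object with a constructor in a function that also calls alloca. This is not the actual select() code. Tracked and constructor_ran_with_alloca() are hypothetical names, and __builtin_alloca assumes gcc or a compatible compiler. With a correct compiler the function should return true.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical type that records whether its constructor has run.
struct Tracked {
    static const int kMagic = 0x5ca1ab1e;
    int magic;
    Tracked() : magic(kMagic) {}
    bool constructed() const { return magic == kMagic; }
};

// Returns true if the local object was constructed even though the same
// function also carves a variable-sized buffer off the stack with alloca.
bool constructor_ran_with_alloca(std::size_t n) {
    Tracked local;
    char *buf = static_cast<char *>(__builtin_alloca(n));
    std::memset(buf, 0xff, n);  // touch the buffer so it is really used
    return local.constructed();
}
```

Whether a check this small reproduces anything will depend on the optimization level and flags, so it is only a starting point for a test case.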
cgf
Sorry, Marco. Nevermind. I duplicated this. No need to upload anything.
I'm still working on it.
it seems solved on 20120702 snapshots