[64bit] emacs is unable to call subprocesses if display-time-mode is set
Ken Brown
2013-03-30 10:54:05 UTC
When you set display-time-mode in emacs, the mode line near the bottom
of the screen shows the current time. The code that does this involves
setting itimers.

After I set display-time-mode, every attempt to start a subprocess
within emacs fails. Steps to reproduce:

1. Install my build of 64-bit emacs, which was just uploaded to
64bit/release.

2. Start emacs via `emacs -Q' in a Cygwin terminal.

3. You should now be in the *scratch* buffer. Set display-time-mode:

<alt-x>display-time-mode<ret>

[You should see the time displayed in the mode line.]

4. Type the following text in the *scratch* buffer, position the cursor
at the end, and type `<cntl-j>':

(call-process "/bin/ls" nil t t)

emacs will report "Can't exec program: /bin/ls".

I tried to step through the emacs code in gdb, but gdb became
unresponsive after a while and I had to kill it with the Task Manager.

I also tried strace, with the following results:

(a) If I attach strace to a running emacs process and then carry out
steps 3 and 4 above, the emacs output in step 4 changes to "Segmentation
fault". The strace output does in fact show a SEGV. I've posted the
strace output from one such run at

http://sanibeltranquility.com/cygwin/strace.out

(b) If instead I run emacs under strace from the beginning, the bug
disappears.

Ken
Ken Brown
2013-03-30 11:01:49 UTC
I forgot to say that I'm running cygwin-1.7.18-14 on Windows 7.

Ken
Corinna Vinschen
2013-03-30 11:17:14 UTC
Post by Ken Brown
When you set display-time-mode in emacs, the mode line near the
bottom of the screen shows the current time. The code that does
this involves setting itimers.
Can you extract a simple testcase from the itimer code? That would help
a lot to track down this case. I'm a bit scared of emacs...

Other than that, this is a bit too complicated for the Easter weekend.
I'll start to look into it when I'm back to work on Tuesday, ok?

Thanks for the report.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
Ken Brown
2013-03-30 14:27:42 UTC
Post by Corinna Vinschen
Other than that, this is a bit too complicated for the Easter weekend.
I'll start to look into it when I'm back to work on Tuesday, ok?
Sure, that's fine. I won't have a chance to try to make a simple
testcase before then anyway.

Ken
Ken Brown
2013-04-01 12:48:50 UTC
Post by Corinna Vinschen
Can you extract a simple testcase from the itimer code? That would help
a lot to track down this case. I'm a bit scared of emacs...
I was wrong about itimers. It turns out that emacs uses two different
kinds of timers. One type is defined in C code and uses itimers, and
the other type is defined in Lisp code. It's the latter that's involved
here. So it won't be easy to make a test case in plain C.

I'm also finding that the order in which I do things affects whether or
not the bug shows up. For example, if I start a shell within emacs
before setting display-time-mode, the bug disappears. I'll keep looking
at the emacs code, but maybe you'll be able to see something in the
strace output.

Ken
Ken Brown
2013-04-01 16:04:22 UTC
Sorry for yet another email, but I just wanted to let you know that you
shouldn't put much time into this unless something jumps out at you.

I just looked at this with gdb again and noticed that the function
`Fcall_process' [which is the C function that implements the lisp
function `call-process'] is being called with an argument nargs =
4305072226, which is 0x1009A3062; the value should be 4. nargs is of
type ptrdiff_t. I'll try to figure out why this is happening.
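
As a quick sanity check on the numbers above, here is a small C sketch. The helper names are made up for illustration and are not taken from the emacs sources. It confirms that 4305072226 and 0x1009A3062 are the same value, and that the value does not fit in 32 bits, which is one reason it looks more like an address than an argument count:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: nonzero if v cannot be represented in 32 bits,
   whether read as signed or as unsigned. */
int exceeds_32_bits(int64_t v)
{
    return v < INT32_MIN || v > (int64_t)UINT32_MAX;
}

/* Hypothetical helper: print a suspicious argument count in decimal
   and in hex, flagging values that are wider than 32 bits. */
void describe_nargs(int64_t nargs)
{
    /* The two spellings reported by gdb denote the same number. */
    assert(INT64_C(4305072226) == INT64_C(0x1009A3062));

    printf("nargs = %lld = 0x%llX%s\n",
           (long long)nargs, (unsigned long long)nargs,
           exceeds_32_bits(nargs) ? " (wider than 32 bits)" : "");
}
```

Calling describe_nargs(4305072226) and describe_nargs(4) from a main() shows the difference between the reported value and the expected one.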

Ken
Corinna Vinschen
2013-04-02 09:11:05 UTC
Post by Ken Brown
Sorry for yet another email, but I just wanted to let you know that
you shouldn't put much time into this unless something jumps out at
you.
I just looked at this with gdb again and noticed that the function
`Fcall_process' [which is the C function that implements the lisp
function `call-process'] is being called with an argument nargs =
4305072226, which is 0x1009A3062; the value should be 4. nargs is
of type ptrdiff_t. I'll try to figure out why this is happening.
Thanks, Ken! I'll hold off for the time being. Btw., your
copyright assignment has arrived.


Thanks,
Corinna
Ken Brown
2013-04-02 18:04:20 UTC
Post by Ken Brown
I just looked at this with gdb again and noticed that the function
`Fcall_process' [which is the C function that implements the lisp
function `call-process'] is being called with an argument nargs =
4305072226, which is 0x1009A3062; the value should be 4. nargs is of
type ptrdiff_t. I'll try to figure out why this is happening.
It turns out that gdb is giving me bogus information. I don't know if
that's caused by a gdb bug, a Cygwin bug, an emacs bug, or something
else. But I inserted a printf command into Fcall_process, and it is
indeed passed the value of 4 for `nargs', not the crazy value that gdb
reported. I'll see if I can come up with a simple test case for this
gdb problem, but it may or may not be related to the bug I reported.

If you wouldn't mind taking a look at the original bug when you get a
chance, maybe you can spot something using strace or whatever other
tools you have. (BTW, I just retested with cygwin-1.7.18-15, and the
bug is still there.) If you want to confirm the gdb issue, install
emacs-debuginfo and run gdb with a breakpoint at Fcall_process before
carrying out my recipe.

Thanks.

Ken
Corinna Vinschen
2013-04-02 19:00:27 UTC
Post by Ken Brown
It turns out that gdb is giving me bogus information. I don't know
if that's caused by a gdb bug, a Cygwin bug, an emacs bug, or
something else.
GDB sometimes can't show correct information if you haven't stepped far
enough into the function, since the debug information isn't complete. A
single step to the next line fixes that most of the time.
Post by Ken Brown
If you wouldn't mind taking a look at the original bug when you get
a chance, maybe you can spot something using strace or whatever
other tools you have. (BTW, I just retested with cygwin-1.7.18-15,
and the bug is still there.) If you want to confirm the gdb issue,
install emacs-debuginfo and run gdb with a breakpoint at
Fcall_process before carrying out my recipe.
I can try tomorrow, but a testcase is ultimately more helpful. The
strace doesn't contain a lot of info, except that the crash occurs in
the function cmalloc, which allocates space on the cygheap. It's not
clear what function has been called at this point, though.


Corinna
Ken Brown
2013-04-02 19:41:42 UTC
Post by Corinna Vinschen
GDB sometimes can't show correct information if you haven't stepped far
enough into the function, since the debug information isn't complete. A
single step to the next line fixes that most of the time.
Thanks, I didn't know that. I just tried and sure enough gdb reported
nargs == 4 after one step.
Post by Corinna Vinschen
I can try tomorrow, but a testcase is ultimately more helpful.
Thanks. In the meantime, I'll keep trying to make a test case.

Ken
Ken Brown
2013-04-02 22:57:14 UTC
Post by Corinna Vinschen
I can try tomorrow, but a testcase is ultimately more helpful. The
strace doesn't contain a lot of info, except that the crash occurs in
the function cmalloc, which allocates space on the cygheap. It's not
clear what function has been called at this point, though.
How did you figure out that the crash occurs in cmalloc? I tried
addr2line, but it gave me no information:

$ addr2line -e /bin/cygwin1.dll 1800429F4
??:?

Anyway, I tried to run emacs under gdb with a breakpoint at cmalloc.
But after I pressed c enough times to get back to the running emacs,
emacs became unresponsive, and I had to kill gdb with the Task Manager.
The same steps in 32-bit Cygwin worked fine. This might just be
another symptom of the bug.

Ken
Corinna Vinschen
2013-04-03 11:41:02 UTC
Post by Ken Brown
How did you figure out that the crash occurs in cmalloc? I tried
addr2line, but it gave me no information:
$ addr2line -e /bin/cygwin1.dll 1800429F4
??:?
Are you sure the stackdump is from the same version of the Cygwin
DLL you're running now?
Post by Ken Brown
Anyway, I tried to run emacs under gdb with a breakpoint at cmalloc.
But after I pressed c enough times to get back to the running emacs,
emacs became unresponsive, and I had to kill gdb with the Task
Manager. The same steps in 32-bit Cygwin worked fine. This might
just be another symptom of the bug.
Maybe. Using GDB on 64 bit seems to be a bit shaky right now, but
I hope this changes over time.

I didn't try emacs myself yet, but I will today.


Corinna
Ken Brown
2013-04-03 13:49:12 UTC
Post by Corinna Vinschen
Post by Ken Brown
How did you figure out that the crash occurs in cmalloc? I tried
addr2line, but it gave me no information:
$ addr2line -e /bin/cygwin1.dll 1800429F4
??:?
Are you sure the stackdump is from the same version of the Cygwin
DLL you're running now?
I didn't get a stackdump. I was relying on the following line from the
strace output:

Process 4928, exception c0000005 at 00000001800429F4

I just reproduced that today, with the same address, so there's no
question of the DLL version.
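
For pulling the faulting address out of a line like the one above so it can be handed to addr2line, a minimal C sketch follows. The function name is hypothetical, and the line format is assumed from this single example rather than from any strace documentation. The code c0000005 is the Windows access-violation status, which matches the SEGV reported earlier.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical parser for a line of the form
     "Process <pid>, exception <hex code> at <hex address>"
   Returns 1 if all three fields were read, 0 otherwise. */
int parse_exception_line(const char *line, int *pid,
                         unsigned int *code, unsigned long long *addr)
{
    assert(line && pid && code && addr);

    /* %x reads the 32-bit exception code, %llx the 64-bit address. */
    return sscanf(line, "Process %d, exception %x at %llx",
                  pid, code, addr) == 3;
}
```

The address can then be printed with "%llX" and passed to addr2line together with the matching debug file.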

Ken
Corinna Vinschen
2013-04-03 14:05:35 UTC
Post by Ken Brown
Post by Corinna Vinschen
Post by Ken Brown
How did you figure out that the crash occurs in cmalloc? I tried
addr2line, but it gave me no information:
$ addr2line -e /bin/cygwin1.dll 1800429F4
??:?
Are you sure the stackdump is from the same version of the Cygwin
DLL you're running now?
I didn't get a stackdump. I was relying on the following line from the
strace output:
Process 4928, exception c0000005 at 00000001800429F4
I just reproduced that today, with the same address, so there's no
question of the DLL version.
Hmm. Did you install the cygwin-debuginfo package? If all else
fails, try `addr2line -e /usr/lib/debug/usr/bin/cygwin1.dbg'. I get

$ addr2line -e /usr/lib/debug/usr/bin/cygwin1.dbg 1800429F4
/usr/src/debug/cygwin-1.7.18-15/winsup/cygwin/cygheap.cc:298

I can reproduce the issue but it's tricky to debug. The exception
occurs in the forked process and GDB can't follow fork on Cygwin.

Btw., can you create an emacs which is built without optimization
and is not stripped? This would simplify debugging a bit...


Corinna
Christopher Faylor
2013-04-03 14:17:08 UTC
Post by Corinna Vinschen
Hmm. Did you install the cygwin-debuginfo package? If all else
fails, try `addr2line -e /usr/lib/debug/usr/bin/cygwin1.dbg'. I get
$ addr2line -e /usr/lib/debug/usr/bin/cygwin1.dbg 1800429F4
/usr/src/debug/cygwin-1.7.18-15/winsup/cygwin/cygheap.cc:298
Is it time to scrap the malloc functions in cygheap and roll them into
the standard malloc? I think modern mallocs allow you to segregate
regions the way we'd want to for cygheap.

cgf
Corinna Vinschen
2013-04-03 14:28:04 UTC
Post by Christopher Faylor
Is it time to scrap the malloc functions in cygheap and roll them into
the standard malloc? I think modern mallocs allow you to segregate
regions the way we'd want to for cygheap.
We should do that at some point if possible, but in theory it should at
least work as is for now. A crash in cmalloc might also be a crash
in another malloc implementation :-P


Still investigating,
Corinna
Ken Brown
2013-04-03 18:00:14 UTC
Post by Corinna Vinschen
Hmm. Did you install the cygwin-debuginfo package? If all else
fails, try `addr2line -e /usr/lib/debug/usr/bin/cygwin1.dbg'. I get
$ addr2line -e /usr/lib/debug/usr/bin/cygwin1.dbg 1800429F4
/usr/src/debug/cygwin-1.7.18-15/winsup/cygwin/cygheap.cc:298
Ah, that works. I thought addr2line was supposed to know where to find
cygwin1.dbg.
Post by Corinna Vinschen
I can reproduce the issue but it's tricky to debug. The exception
occurs in the forked process and GDB can't follow fork on Cygwin.
Btw., can you create an emacs which is built without optimization
and is not stripped? This would simplify debugging a bit...
It was already built without optimization. I'm working on building a
non-stripped version, but cygport isn't cooperating. If you put
"RESTRICT=strip" into the .cygport file, then cygport doesn't package
the sources. So you get a binary with debugging symbols, but you don't
get the corresponding sources.

I'll send Yaakov a patch, but in the meantime I'm working around this.
It shouldn't be long.

Ken
Corinna Vinschen
2013-04-03 19:03:54 UTC
Post by Ken Brown
It was already built without optimization. I'm working on building
a non-stripped version, but cygport isn't cooperating. If you put
"RESTRICT=strip" into the .cygport file, then cygport doesn't
package the sources. So you get a binary with debugging symbols,
but you don't get the corresponding sources.
I'll send Yaakov a patch, but in the meantime I'm working around
this. It shouldn't be long.
I'm still debugging this and something is very fishy when building
the environment for a process-to-exec. I tracked it down to a
specific string duplication in Cygwin's build_env function which
seems to overwrite administrative data on the cygheap for some
reason I don't quite follow yet.

This may take some time...


Corinna
Corinna Vinschen
2013-04-03 20:02:08 UTC
Post by Corinna Vinschen
I'm still debugging this and something is very fishy when building
the environment for a process-to-exec. I tracked it down to a
specific string duplication in Cygwin's build_env function which
seems to overwrite administrative data on the cygheap for some
reason I didn't quite follow yet.
This may take some time...
I found it. When using the display-time-mode option, emacs opens and
reads /proc/loadavg. The problem was that the buffer allocated in
format_proc_loadavg is too small, so the subsequent sprintf overwrites
unrelated data on the cygheap.

In fact, this problem occurs on 32 bit as well, so I fixed it in CVS
HEAD first. It's kind of a miracle that this has never
been encountered in the 32 bit version before. The problem has existed
since at least Cygwin 1.7.10.

I'm going to build a new 64 bit Cygwin right now, which I will upload
in half an hour or so. Please give it a try.


Thanks for the report and the instructions on how to track down the
problem.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
Ryan Johnson
2013-04-03 20:48:12 UTC
Post by Corinna Vinschen
Post by Corinna Vinschen
Post by Ken Brown
Post by Corinna Vinschen
I can reproduce the issue but it's tricky to debug. The exception
occurs in the forked process and GDB can't follow fork on Cygwin.
Btw., can you create an emacs which is built without optimization
and non-stripped? This would simplify debugging a bit...
It was already built without optimization. I'm working on building
a non-stripped version, but cygport isn't cooperating. If you put
"RESTRICT=strip" into the .cygport file, then cygport doesn't
package the sources. So you get a binary with debugging symbols,
but you don't get the corresponding sources.
I'll send Yaakov a patch, but in the meantime I'm working around
this. It shouldn't be long.
I'm still debugging this and something is very fishy when building
the environment for a process-to-exec. I tracked it down to a
specific string duplication in Cygwin's build_env function which
seems to overwrite administrative data on the cygheap for some
reason I didn't quite follow yet.
This may take some time...
I found it. When using the display-time-mode option, emacs opens and
reads /proc/loadavg. The problem was that the buffer allocated in
format_proc_loadavg is too small, so the subsequent sprintf overwrites
unrelated data on the cygheap.
In fact, this problem occurs on 32 bit as well, so I fixed it in CVS
HEAD first. It's kind of a miracle that this has never
been encountered in the 32 bit version before. The problem has existed
since at least Cygwin 1.7.10.
Sounds like a classic Schroedinbug [1]... thanks for the fix.

[1] http://www.catb.org/jargon/html/S/schroedinbug.html

Ryan
Corinna Vinschen
2013-04-03 20:49:04 UTC
Post by Corinna Vinschen
Post by Corinna Vinschen
Post by Ken Brown
Post by Corinna Vinschen
I can reproduce the issue but it's tricky to debug. The exception
occurs in the forked process and GDB can't follow fork on Cygwin.
Btw., can you create an emacs which is built without optimization
and non-stripped? This would simplify debugging a bit...
It was already built without optimization. I'm working on building
a non-stripped version, but cygport isn't cooperating. If you put
"RESTRICT=strip" into the .cygport file, then cygport doesn't
package the sources. So you get a binary with debugging symbols,
but you don't get the corresponding sources.
I'll send Yaakov a patch, but in the meantime I'm working around
this. It shouldn't be long.
I'm still debugging this and something is very fishy when building
the environment for a process-to-exec. I tracked it down to a
specific string duplication in Cygwin's build_env function which
seems to overwrite administrative data on the cygheap for some
reason I didn't quite follow yet.
This may take some time...
I found it. When using the display-time-mode option, emacs opens and
reads /proc/loadavg. The problem was that the buffer allocated in
format_proc_loadavg is too small, so the subsequent sprintf overwrites
unrelated data on the cygheap.
In fact, this problem occurs on 32 bit as well, so I fixed it in CVS
HEAD first. It's kind of a miracle that this has never
been encountered in the 32 bit version before. The problem has existed
since at least Cygwin 1.7.10.
I'm going to build a new 64 bit Cygwin right now, which I will upload
in half an hour or so. Please give it a try.
I just uploaded 1.7.18-16 to the 64 bit repo.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
Ken Brown
2013-04-03 22:02:12 UTC
Post by Corinna Vinschen
Post by Corinna Vinschen
Post by Corinna Vinschen
Post by Ken Brown
Post by Corinna Vinschen
I can reproduce the issue but it's tricky to debug. The exception
occurs in the forked process and GDB can't follow fork on Cygwin.
Btw., can you create an emacs which is built without optimization
and non-stripped? This would simplify debugging a bit...
It was already built without optimization. I'm working on building
a non-stripped version, but cygport isn't cooperating. If you put
"RESTRICT=strip" into the .cygport file, then cygport doesn't
package the sources. So you get a binary with debugging symbols,
but you don't get the corresponding sources.
I'll send Yaakov a patch, but in the meantime I'm working around
this. It shouldn't be long.
I'm still debugging this and something is very fishy when building
the environment for a process-to-exec. I tracked it down to a
specific string duplication in Cygwin's build_env function which
seems to overwrite administrative data on the cygheap for some
reason I didn't quite follow yet.
This may take some time...
I found it. When using the display-time-mode option, emacs opens and
reads /proc/loadavg. The problem was that the buffer allocated in
format_proc_loadavg is too small, so the subsequent sprintf overwrites
unrelated data on the cygheap.
In fact, this problem occurs on 32 bit as well, so I fixed it in CVS
HEAD first. It's kind of a miracle that this has never
been encountered in the 32 bit version before. The problem has existed
since at least Cygwin 1.7.10.
I'm going to build a new 64 bit Cygwin right now, which I will upload
in half an hour or so. Please give it a try.
I just uploaded 1.7.18-16 to the 64 bit repo.
The bug is gone. Thank you!

Ken
