Discussion:
/dev/clipboard corrupted
Thomas Wolff
2012-07-03 16:29:19 UTC
Permalink
[taking this thread to cygwin-developers]
You know, we just love STCs. Send you small test program here, plus a
short instruction how you created the clipboard content and how to
call
the testcase to see the problem.
Sure, so here it is. Open clipboard.txt with notepad, ^A^C to copy
all, then run the program to see bytes skipped.
Actually it seems to skip as many bytes per read() as there were
additional UTF-8 bytes (more bytes than characters) in the preceding
read block.
Checking the code again, variable pos seems to be used both as an
index into the clipboard (WCHAR) and an offset to the resulting
string length (char) which would explain the effect (not having
checked all the details though as I'm not familiar with the used
APIs).
Thanks for the testcase. I applied a patch which is supposed to fix the
problem. It should be in the next developer snapshot. Please give it
a try.
The patch (loaded from CVS) seems to almost fix the issue but also
another bug crept in.

* Looking at the code is quite confusing as long as I assumed
sys_wcstombs would work like wcstombs;
the latter is obviously designed to convert complete nul-terminated
wcs strings only as
there is no way to control the number of wcs characters, neither as
consumed in the result
(as your new comment also mentions) nor as setting a limit -
the latter is apparently different, telling from the comment in
strfuncs.cc.
I had tried a patch using wctomb instead, as follows, but it didn't
work;
maybe for some reason the standard functions cannot be used in this
context?
int outlen = 0;
/* Make sure buffer has room for max. length UTF-8 (or GB18030/UHC)
plus final NUL;
this does not work if the total buffer is shorter,
so some read-ahead will be needed for a complete solution */
while (outlen < (int) len - 4 && pos < (int) glen /* IS THIS
CORRECT? */ )
{
int ret1 = wctomb ((char *) ptr + outlen, buf [pos]);
if (ret1 == -1)
{
((char *) ptr) [outlen] = 0x7F; /* ?? */
ret1 = 1;
}
pos ++; /* clipboard buffer position */
outlen += ret1; /* output size */
}
ret = outlen;

* The current (CVS) code will not work if even the first character to be
converted
needs more bytes than the buffer provides, e.g. if the application
calls read() with length 1 only.
Some extra buffering would be needed to make it work then.

* I assume the current code will also fail in non-UTF-8 locales;
if the wcs block being converted contains a non-convertible character,
it would abort since wcstombs returns -1
(assuming here that sys_wcstombs behaves alike in this respect)
and not even deliver the characters before the failing one.

* I had previously observed that with a read size of n only n-1 bytes
would be delivered
and thought this was on purpose because wcstombs appends a final nul
to its result.
Now n bytes are returned (if available) and in fact the byte behind
the read() buffer is
overwritten (see modified test program).

------
Thomas
Corinna Vinschen
2012-07-04 08:15:03 UTC
Permalink
Post by Thomas Wolff
[taking this thread to cygwin-developers]
You know, we just love STCs. Send you small test program here, plus a
short instruction how you created the clipboard content and how to
call
the testcase to see the problem.
Sure, so here it is. Open clipboard.txt with notepad, ^A^C to copy
all, then run the program to see bytes skipped.
Actually it seems to skip as many bytes per read() as there were
additional UTF-8 bytes (more bytes than characters) in the preceding
read block.
Checking the code again, variable pos seems to be used both as an
index into the clipboard (WCHAR) and an offset to the resulting
string length (char) which would explain the effect (not having
checked all the details though as I'm not familiar with the used
APIs).
Thanks for the testcase. I applied a patch which is supposed to fix the
problem. It should be in the next developer snapshot. Please give it
a try.
The patch (loaded from CVS) seems to almost fix the issue but also
another bug crept in.
* Looking at the code is quite confusing as long as I assumed
sys_wcstombs would work like wcstombs;
the latter is obviously designed to convert complete
nul-terminated wcs strings only as
there is no way to control the number of wcs characters, neither
as consumed in the result
(as your new comment also mentions) nor as setting a limit -
the latter is apparently different, telling from the comment in
strfuncs.cc.
I had tried a patch using wctomb instead, as follows, but it
didn't work;
maybe for some reason the standard functions cannot be used in
this context?
You didn't tell how it fails. You *could* use the standard functions,
but be aware that they are locale-dependent and your testcase does not
call setlocale, so it's running in the C locale. Also, don't use
wctomb. Ever. It's not thread-safe. Use wcrtomb with an explicit
local shift state variable instead. Also, len - 4 should be
len - MB_CUR_MAX to be independent of the max length of the current
codeset.
Post by Thomas Wolff
int outlen = 0;
/* Make sure buffer has room for max. length UTF-8 (or GB18030/UHC)
plus final NUL;
this does not work if the total buffer is shorter,
so some read-ahead will be needed for a complete solution */
while (outlen < (int) len - 4 && pos < (int) glen /* IS THIS
CORRECT? */ )
{
int ret1 = wctomb ((char *) ptr + outlen, buf [pos]);
if (ret1 == -1)
{
((char *) ptr) [outlen] = 0x7F; /* ?? */
ret1 = 1;
}
pos ++; /* clipboard buffer position */
outlen += ret1; /* output size */
}
ret = outlen;
* The current (CVS) code will not work if even the first character
to be converted
needs more bytes than the buffer provides, e.g. if the
application calls read() with length 1 only.
Some extra buffering would be needed to make it work then.
Yes, indeed. I'll have a look.
Post by Thomas Wolff
* I assume the current code will also fail in non-UTF-8 locales;
if the wcs block being converted contains a non-convertible character,
it would abort since wcstombs returns -1
(assuming here that sys_wcstombs behaves alike in this respect)
It doesn't. In fact, sys_cp_wcstombs was designed to never fail
to convert a string. See the source code in strfunc.cc, line 447ff.
Post by Thomas Wolff
and not even deliver the characters before the failing one.
* I had previously observed that with a read size of n only n-1
bytes would be delivered
and thought this was on purpose because wcstombs appends a final
nul to its result.
Now n bytes are returned (if available) and in fact the byte
behind the read() buffer is
overwritten (see modified test program).
It's not Cygwin overwriting the byte, your testcase is...
Post by Thomas Wolff
------
Thomas
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
int
main (int argc, char * * argv)
{
char * fn = "/dev/clipboard";
int fd = open (fn, O_RDONLY | O_BINARY, 0);
if (fd < 0) {
exit (fd);
}
int out_tty = isatty (1);
int filebuflen = 100;
argc --;
if (argc > 0) {
int ret = sscanf (argv [1], "%d", & filebuflen);
}
char * filebuf = malloc (filebuflen + 1);
filebuf [filebuflen] = 77;
fprintf (stderr, "filebuflen %d (overflow byte %d)\n", filebuflen, filebuf [filebuflen]);
int n;
do {
n = read (fd, filebuf, filebuflen);
if (out_tty) {
filebuf [n] = 0;
Hmm, what if n == filebuf?
Post by Thomas Wolff
printf ("read %d bytes: <%s> (overflow byte %d)\n", n, filebuf, filebuf [filebuflen]);
} else {
fprintf (stderr, "read %d bytes\n", n);
write (1, filebuf, n);
}
} while (n > 0);
close (fd);
}
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Corinna Vinschen
2012-07-04 08:27:07 UTC
Permalink
Post by Corinna Vinschen
Post by Thomas Wolff
filebuf [filebuflen] = 77;
fprintf (stderr, "filebuflen %d (overflow byte %d)\n", filebuflen, filebuf [filebuflen]);
int n;
do {
n = read (fd, filebuf, filebuflen);
if (out_tty) {
filebuf [n] = 0;
Hmm, what if n == filebuf?
Grr. Make that "n == filebuflen".

Try this testcase instead:

==== SNIP ====
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main (int argc, char * * argv)
{
char * fn = "/dev/clipboard";
int fd = open (fn, O_RDONLY | O_BINARY, 0);
if (fd < 0) {
exit (fd);
}

int out_tty = isatty (1);

int filebuflen = 100;
argc --;
if (argc > 0) {
int ret = sscanf (argv [1], "%d", & filebuflen);
}
fprintf (stderr, "filebuflen %d\n", filebuflen);
char * filebuf = malloc (filebuflen + 1);

int n;
do {
memset (filebuf, '\x01', filebuflen + 1);
n = read (fd, filebuf, filebuflen);
if (filebuf[n] != '\x01')
printf ("OVERWRITTEN\n");
if (out_tty) {
filebuf [n] = 0;
printf ("read %d bytes: <%s>\n", n, filebuf);
} else {
fprintf (stderr, "read %d bytes\n", n);
write (1, filebuf, n);
}
} while (n > 0);

close (fd);
}
==== SNAP ====


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Corinna Vinschen
2012-07-04 09:03:15 UTC
Permalink
Post by Corinna Vinschen
Post by Thomas Wolff
* The current (CVS) code will not work if even the first character
to be converted
needs more bytes than the buffer provides, e.g. if the
application calls read() with length 1 only.
Some extra buffering would be needed to make it work then.
Yes, indeed. I'll have a look.
No, not yet. This isn't exactly a regression from former behaviour.
Please provide a patch or remind me in a few weeks again.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Thomas Wolff
2012-07-08 23:15:12 UTC
Permalink
Post by Corinna Vinschen
Post by Thomas Wolff
...
* The current (CVS) code will not work if even the first character
to be converted
needs more bytes than the buffer provides, e.g. if the
application calls read() with length 1 only.
Some extra buffering would be needed to make it work then.
Yes, indeed. I'll have a look.
...
Post by Corinna Vinschen
No, not yet. This isn't exactly a regression from former behaviour.
Please provide a patch or remind me in a few weeks again.
About the buffering I may send a patch when I find time.
Post by Corinna Vinschen
Post by Thomas Wolff
* I assume the current code will also fail in non-UTF-8 locales;
if the wcs block being converted contains a non-convertible character,
it would abort since wcstombs returns -1
(assuming here that sys_wcstombs behaves alike in this respect)
It doesn't. In fact, sys_cp_wcstombs was designed to never fail
to convert a string. See the source code in strfunc.cc, line 447ff.
Tested meanwhile. My assumption was too speculative, sorry.
Post by Corinna Vinschen
Post by Thomas Wolff
* I had previously observed that with a read size of n only n-1
bytes would be delivered
and thought this was on purpose because wcstombs appends a final
nul to its result.
Now n bytes are returned (if available) and in fact the byte
behind the read() buffer is
overwritten (see modified test program).
It's not Cygwin overwriting the byte, your testcase is...
Both were, actually...
Post by Corinna Vinschen
Post by Thomas Wolff
...
n = read (fd, filebuf, filebuflen);
if (out_tty) {
filebuf [n] = 0;
Hmm, what if n == filebuflen?
I admit my test case was bogus in this respect *BLUSH*.
However, yet the error is there, as a fixed test case reveals.
Also your own testcase does print out "OVERWRITTEN".
This complies with the comment in strfuncs.cc:sys_cp_wcstombs:
- The functions always create 0-terminated results, no matter what.
The fix is easy:
if (pos < glen)
{
len --; /* leave space for final NUL */
...

------
Thomas
Corinna Vinschen
2012-07-19 09:41:58 UTC
Permalink
Post by Thomas Wolff
Post by Corinna Vinschen
Post by Thomas Wolff
...
* The current (CVS) code will not work if even the first character
to be converted
needs more bytes than the buffer provides, e.g. if the
application calls read() with length 1 only.
Some extra buffering would be needed to make it work then.
Yes, indeed. I'll have a look.
...
Post by Corinna Vinschen
No, not yet. This isn't exactly a regression from former behaviour.
Please provide a patch or remind me in a few weeks again.
About the buffering I may send a patch when I find time.
Post by Corinna Vinschen
Post by Thomas Wolff
* I assume the current code will also fail in non-UTF-8 locales;
if the wcs block being converted contains a non-convertible character,
it would abort since wcstombs returns -1
(assuming here that sys_wcstombs behaves alike in this respect)
It doesn't. In fact, sys_cp_wcstombs was designed to never fail
to convert a string. See the source code in strfunc.cc, line 447ff.
Tested meanwhile. My assumption was too speculative, sorry.
Post by Corinna Vinschen
Post by Thomas Wolff
* I had previously observed that with a read size of n only n-1
bytes would be delivered
and thought this was on purpose because wcstombs appends a final
nul to its result.
Now n bytes are returned (if available) and in fact the byte
behind the read() buffer is
overwritten (see modified test program).
It's not Cygwin overwriting the byte, your testcase is...
Both were, actually...
Post by Corinna Vinschen
Post by Thomas Wolff
...
n = read (fd, filebuf, filebuflen);
if (out_tty) {
filebuf [n] = 0;
Hmm, what if n == filebuflen?
I admit my test case was bogus in this respect *BLUSH*.
However, yet the error is there, as a fixed test case reveals.
Also your own testcase does print out "OVERWRITTEN".
I don't see this. I use the same "Scoloplos" testcase and my
code does not print "OVERWRITTEN".


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Thomas Wolff
2012-07-20 22:14:12 UTC
Permalink
Post by Corinna Vinschen
Post by Thomas Wolff
Post by Corinna Vinschen
Post by Thomas Wolff
...
* The current (CVS) code will not work if even the first character
to be converted
needs more bytes than the buffer provides, e.g. if the
application calls read() with length 1 only.
Some extra buffering would be needed to make it work then.
Yes, indeed. I'll have a look.
...
Post by Corinna Vinschen
No, not yet. This isn't exactly a regression from former behaviour.
Please provide a patch or remind me in a few weeks again.
About the buffering I may send a patch when I find time.
(Gave it a short try but no success yet.)
Post by Corinna Vinschen
Post by Thomas Wolff
Post by Corinna Vinschen
Post by Thomas Wolff
* I had previously observed that with a read size of n only n-1
bytes would be delivered
and thought this was on purpose because wcstombs appends a final
nul to its result.
Now n bytes are returned (if available) and in fact the byte
behind the read() buffer is
overwritten (see modified test program).
It's not Cygwin overwriting the byte, your testcase is...
Both were, actually...
Post by Corinna Vinschen
Post by Thomas Wolff
...
n = read (fd, filebuf, filebuflen);
if (out_tty) {
filebuf [n] = 0;
Hmm, what if n == filebuflen?
I admit my test case was bogus in this respect *BLUSH*.
However, yet the error is there, as a fixed test case reveals.
Also your own testcase does print out "OVERWRITTEN".
I don't see this. I use the same "Scoloplos" testcase and my
code does not print "OVERWRITTEN".
Beware that for the test case, you need to do a Windows paste; cp
testfile /dev/clipboard will not do because you would take the
"CYGWIN_NATIVE" branch then. Copy from notepad or mouse-copy from mintty
will do.
Thomas
Corinna Vinschen
2012-07-21 09:12:24 UTC
Permalink
Post by Thomas Wolff
Post by Corinna Vinschen
Post by Thomas Wolff
Post by Corinna Vinschen
Post by Thomas Wolff
...
* The current (CVS) code will not work if even the first character
to be converted
needs more bytes than the buffer provides, e.g. if the
application calls read() with length 1 only.
Some extra buffering would be needed to make it work then.
Yes, indeed. I'll have a look.
...
Post by Corinna Vinschen
No, not yet. This isn't exactly a regression from former behaviour.
Please provide a patch or remind me in a few weeks again.
About the buffering I may send a patch when I find time.
(Gave it a short try but no success yet.)
Post by Corinna Vinschen
Post by Thomas Wolff
Post by Corinna Vinschen
Post by Thomas Wolff
* I had previously observed that with a read size of n only n-1
bytes would be delivered
and thought this was on purpose because wcstombs appends a final
nul to its result.
Now n bytes are returned (if available) and in fact the byte
behind the read() buffer is
overwritten (see modified test program).
It's not Cygwin overwriting the byte, your testcase is...
Both were, actually...
Post by Corinna Vinschen
Post by Thomas Wolff
...
n = read (fd, filebuf, filebuflen);
if (out_tty) {
filebuf [n] = 0;
Hmm, what if n == filebuflen?
I admit my test case was bogus in this respect *BLUSH*.
However, yet the error is there, as a fixed test case reveals.
Also your own testcase does print out "OVERWRITTEN".
I don't see this. I use the same "Scoloplos" testcase and my
code does not print "OVERWRITTEN".
Beware that for the test case, you need to do a Windows paste; cp
testfile /dev/clipboard will not do because you would take the
"CYGWIN_NATIVE" branch then. Copy from notepad or mouse-copy from
mintty will do.
No, it doesn't. The clipboard was filled with Ctrl-A, Ctrl-C from
within notepad.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Gregg Levine
2012-07-21 17:12:35 UTC
Permalink
Post by Corinna Vinschen
Post by Thomas Wolff
Post by Corinna Vinschen
Post by Thomas Wolff
Post by Corinna Vinschen
Post by Thomas Wolff
...
* The current (CVS) code will not work if even the first character
to be converted
needs more bytes than the buffer provides, e.g. if the
application calls read() with length 1 only.
Some extra buffering would be needed to make it work then.
Yes, indeed. I'll have a look.
...
Post by Corinna Vinschen
No, not yet. This isn't exactly a regression from former behaviour.
Please provide a patch or remind me in a few weeks again.
About the buffering I may send a patch when I find time.
(Gave it a short try but no success yet.)
Post by Corinna Vinschen
Post by Thomas Wolff
Post by Corinna Vinschen
Post by Thomas Wolff
* I had previously observed that with a read size of n only n-1
bytes would be delivered
and thought this was on purpose because wcstombs appends a final
nul to its result.
Now n bytes are returned (if available) and in fact the byte
behind the read() buffer is
overwritten (see modified test program).
It's not Cygwin overwriting the byte, your testcase is...
Both were, actually...
Post by Corinna Vinschen
Post by Thomas Wolff
...
n = read (fd, filebuf, filebuflen);
if (out_tty) {
filebuf [n] = 0;
Hmm, what if n == filebuflen?
I admit my test case was bogus in this respect *BLUSH*.
However, yet the error is there, as a fixed test case reveals.
Also your own testcase does print out "OVERWRITTEN".
I don't see this. I use the same "Scoloplos" testcase and my
code does not print "OVERWRITTEN".
Beware that for the test case, you need to do a Windows paste; cp
testfile /dev/clipboard will not do because you would take the
"CYGWIN_NATIVE" branch then. Copy from notepad or mouse-copy from
mintty will do.
No, it doesn't. The clipboard was filled with Ctrl-A, Ctrl-C from
within notepad.
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Hello!
Group I just might be materializing into this discussion without all
of the facts, but every time I've used the clipboard functions to
paste something into a terminal window, such as the storage point for
a project via Google Code, it just worked.

Meaning that I first created a directory for the stuff, changed into
it, and then pasted the location for its code and then watched as it
retrieved the code from that hosting entity. So far this has worked
twice for me. And incidentally Corinna this was all done before your
timely update was announced and is now being retrieved. (Thank you for
that one.)

-----
Gregg C Levine gregg.drwho8-***@public.gmane.org
"This signature fought the Time Wars, time and again."
Christopher Faylor
2012-07-21 17:19:36 UTC
Permalink
Post by Gregg Levine
Post by Corinna Vinschen
Post by Thomas Wolff
Post by Corinna Vinschen
Post by Thomas Wolff
Post by Corinna Vinschen
Post by Thomas Wolff
...
* The current (CVS) code will not work if even the first character
to be converted
needs more bytes than the buffer provides, e.g. if the
application calls read() with length 1 only.
Some extra buffering would be needed to make it work then.
Yes, indeed. I'll have a look.
...
Post by Corinna Vinschen
No, not yet. This isn't exactly a regression from former behaviour.
Please provide a patch or remind me in a few weeks again.
About the buffering I may send a patch when I find time.
(Gave it a short try but no success yet.)
Post by Corinna Vinschen
Post by Thomas Wolff
Post by Corinna Vinschen
Post by Thomas Wolff
* I had previously observed that with a read size of n only n-1
bytes would be delivered
and thought this was on purpose because wcstombs appends a final
nul to its result.
Now n bytes are returned (if available) and in fact the byte
behind the read() buffer is
overwritten (see modified test program).
It's not Cygwin overwriting the byte, your testcase is...
Both were, actually...
Post by Corinna Vinschen
Post by Thomas Wolff
...
n = read (fd, filebuf, filebuflen);
if (out_tty) {
filebuf [n] = 0;
Hmm, what if n == filebuflen?
I admit my test case was bogus in this respect *BLUSH*.
However, yet the error is there, as a fixed test case reveals.
Also your own testcase does print out "OVERWRITTEN".
I don't see this. I use the same "Scoloplos" testcase and my
code does not print "OVERWRITTEN".
Beware that for the test case, you need to do a Windows paste; cp
testfile /dev/clipboard will not do because you would take the
"CYGWIN_NATIVE" branch then. Copy from notepad or mouse-copy from
mintty will do.
No, it doesn't. The clipboard was filled with Ctrl-A, Ctrl-C from
within notepad.
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Hello!
Group I just might be materializing into this discussion without all
of the facts,
You should not be attempting to respond to this thread without
understanding the facts. This isn't a chatty mailing list for relating
experiences. We're dealing with real issues here.
Post by Gregg Levine
Meaning that I first created a directory for the stuff, changed into
it, and then pasted the location for its code and then watched as it
retrieved the code from that hosting entity. So far this has worked
twice for me. And incidentally Corinna this was all done before your
timely update was announced and is now being retrieved. (Thank you for
that one.)
I'm sure Corinna appreciates the appreciation but this really has
nothing to do with cygwin-developers. If you want to talk about your
Cygwin experience, use the main cygwin list.
Thomas Wolff
2012-08-14 20:54:47 UTC
Permalink
Post by Thomas Wolff
Post by Corinna Vinschen
Post by Thomas Wolff
...
* The current (CVS) code will not work if even the first character
to be converted
needs more bytes than the buffer provides, e.g. if the
application calls read() with length 1 only.
Some extra buffering would be needed to make it work then.
Yes, indeed. I'll have a look.
...
Post by Corinna Vinschen
No, not yet. This isn't exactly a regression from former behaviour.
Please provide a patch or remind me in a few weeks again.
About the buffering I may send a patch when I find time.
Made a patch today, sending to cygwin-patches.
Post by Thomas Wolff
Post by Corinna Vinschen
Post by Thomas Wolff
* I had previously observed that with a read size of n only n-1
bytes would be delivered
and thought this was on purpose because wcstombs appends a final
nul to its result.
Now n bytes are returned (if available) and in fact the byte
behind the read() buffer is
overwritten (see modified test program).
It's not Cygwin overwriting the byte, your testcase is...
Both were, actually...
...
I can't reproduce my previous observation with the current snapshop.
Apparently it works now,
together with my patch for all cases.
------
Thomas

Loading...