Thomas Wolff
2012-07-03 16:29:19 UTC
[taking this thread to cygwin-developers]
You know, we just love STCs. Send your small test program here, plus a
short instruction how you created the clipboard content and how to run
the testcase to see the problem.
Sure, so here it is. Open clipboard.txt with notepad, ^A^C to copy
all, then run the program to see bytes skipped.
Actually it seems to skip as many bytes per read() as there were
additional UTF-8 bytes (more bytes than characters) in the preceding
read block.
Checking the code again, the variable pos seems to be used both as an
index into the clipboard buffer (WCHAR units) and as an offset into
the resulting string (char units), which would explain the effect
(I have not checked all the details, though, as I'm not familiar with
the APIs used).
Thanks for the testcase. I applied a patch which is supposed to fix the
problem. It should be in the next developer snapshot. Please give it
a try.
The patch (loaded from CVS) seems to almost fix the issue, but another
bug has crept in.
* Looking at the code was quite confusing as long as I assumed
sys_wcstombs would work like wcstombs; the latter is obviously
designed to convert only complete nul-terminated wide-character
strings, as there is no way to control the number of wide characters,
neither to learn how many were consumed (as your new comment also
mentions) nor to set a limit. sys_wcstombs is apparently different,
judging from the comment in strfuncs.cc.
I had tried a patch using wctomb instead, as follows, but it didn't
work; maybe for some reason the standard functions cannot be used in
this context?
int outlen = 0;
/* Make sure the buffer has room for a maximum-length UTF-8 (or
   GB18030/UHC) sequence plus the final NUL; this does not work if the
   total buffer is shorter, so some read-ahead will be needed for a
   complete solution.  */
while (outlen < (int) len - 4 && pos < (int) glen /* IS THIS CORRECT? */)
  {
    int ret1 = wctomb ((char *) ptr + outlen, buf[pos]);
    if (ret1 == -1)
      {
        ((char *) ptr)[outlen] = 0x7F;  /* ?? */
        ret1 = 1;
      }
    pos++;            /* clipboard buffer position */
    outlen += ret1;   /* output size */
  }
ret = outlen;
* The current (CVS) code will not work if even the first character to
be converted needs more bytes than the buffer provides, e.g. if the
application calls read() with a length of only 1. Some extra buffering
would be needed to make that work.
* I assume the current code will also fail in non-UTF-8 locales: if
the wcs block being converted contains a non-convertible character, it
would abort, since wcstombs returns -1 (assuming here that
sys_wcstombs behaves alike in this respect), and not even deliver the
characters before the failing one.
* I had previously observed that with a read size of n only n-1 bytes
would be delivered, and thought this was on purpose because wcstombs
appends a final nul to its result. Now n bytes are returned (if
available), and in fact the byte behind the read() buffer is
overwritten (see the modified test program).
------
Thomas