Discussion:
Implementing aio_* and lio_* (async i/o) on Cygwin
Mark Geisert
2017-12-12 09:11:45 UTC
I've got a proof-of-concept implementation working but haven't integrated
it into my local Cygwin source tree yet. This implementation uses a pool
of pthreads to issue reads and writes asynchronously. Using threads
lets the code operate transparently with the usual Cygwin-supplied file
descriptors/handles that are set up for synchronous-only operation.
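A minimal sketch of the thread-based idea, assuming a POSIX system with pthreads: a worker thread performs an ordinary blocking pread() on a synchronous descriptor while the caller carries on. The names struct async_read, async_read_start() and async_read_wait() are hypothetical and are not taken from the PoC, Cygwin, or POSIX. A real pool would reuse a fixed set of workers rather than create one thread per request.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* One outstanding read request (hypothetical, loosely modelled on struct aiocb). */
struct async_read {
    int fd;
    void *buf;
    size_t len;
    off_t offset;
    ssize_t result;     /* bytes read, or -1 on error */
    int error;          /* errno captured by the worker, 0 on success */
    pthread_t worker;
};

/* Worker body: a plain blocking pread() on an ordinary synchronous fd. */
static void *async_read_worker(void *arg)
{
    struct async_read *op = arg;

    op->result = pread(op->fd, op->buf, op->len, op->offset);
    op->error = (op->result < 0) ? errno : 0;
    return NULL;
}

/* Start the read and return at once; returns 0 or a pthread error number. */
static int async_read_start(struct async_read *op)
{
    return pthread_create(&op->worker, NULL, async_read_worker, op);
}

/* Block until the worker is done, then report its result like read(). */
static ssize_t async_read_wait(struct async_read *op)
{
    pthread_join(op->worker, NULL);
    if (op->result < 0)
        errno = op->error;
    return op->result;
}
```

Here async_read_wait() plays roughly the role that aio_suspend() followed by aio_return() plays in the real interface.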

If I were to use Win32 overlapped i/o, as Corinna suggested as a
possibility, there would have to be some supporting changes to Cygwin
internals, I think. Either file descriptors would have to always permit
overlapped usage, or open() and/or fcntl() would have to allow some new
flag to specify overlapped capability when creating a new fd/handle.

There is an O_ASYNC flag in some non-POSIX Unixes but what little
documentation I can find online makes it sound like an inexact match to
the need here. It's used to signal (via SIGIO arrival) that space is
available for a write, or that data is available for a read. What it
would need to do for aio_* semantics is to signal that a write or a read
has completed. For disk files one *might* consider these two viewpoints
equal but I'd like to hear if anybody thinks differently.

I'm also unsure one can add overlapped capability to an existing Win32
handle. If you can, then we could have fcntl() add or remove it from a fd
on the fly. If you can't add it to an existing handle, then an open()
would have to ask for the capability when creating an fd/handle.

Wouldn't it be easier (less modification of existing code) to just use the
thread model?

Any thoughts on any part of this would be appreciated.
Thanks,

..mark

P.S. Google search aio_read(), aio_write() man pages for background.
P.P.S. I could put my PoC code up on my GitHub space for inspection.
Corinna Vinschen
2017-12-12 14:00:38 UTC
I've got a proof-of-concept implementation working but haven't integrated it
into my local Cygwin source tree yet. This implementation uses a pool of
pthreads to issue reads and writes asynchronously. Using threads lets the code
operate transparently with the usual Cygwin-supplied file
descriptors/handles that are set up for synchronous-only operation.
If I were to use Win32 overlapped i/o, as Corinna suggested as a
possibility, there would have to be some supporting changes to Cygwin
internals, I think. Either file descriptors would have to always permit
overlapped usage, or open() and/or fcntl() would have to allow some new flag
to specify overlapped capability when creating a new fd/handle.
If you use threads, you will have to change your code to use cygthreads
instead of pthreads. Pthreads are a userspace concept, cygthreads are
used for internal tasks.

However, using async IO, the only change necessary would be to change
prw_open to open an async handle, and pread/pwrite to wait for the
result if the current I/O is NOT async. It may be necessary to add a
parameter or two to the pread/pwrite fhandler methods, but all the rest
of the functionality could be in the callers.

Please stop thinking in Win32 overlapped I/O. I have no idea why
Microsoft uses this term. Under the ntdll hood, a handle is either
synchronous (keeping track of file position, waiting for read/write to
complete) or asynchronous (not keeping track of file position, returning
STATUS_PENDING rather than waiting for completion), depending on whether
it was opened with FILE_SYNCHRONOUS_IO_{NON}ALERT.

On async handles, you can wait for an event object to be signalled, or
you can ask the NtReadFile/NtWriteFile functions to call a completion
routine.
There is an O_ASYNC flag in some non-POSIX Unixes but what little
documentation [...]
No, if we had to create a new open flag for this functionality it would
be wrong.
[...] need here. It's used to signal (via SIGIO arrival) that space is available
for a write, or that data is available for a read. What it would need to do
for aio_* semantics is to signal that a write or a read has completed.
Yes, but with an arbitrary signal. Alternatively aio can start a
pthread on completion, basically all the stuff sigevent is supposed to
handle.
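For reference, this is what the SIGEV_THREAD variant of struct sigevent looks like from the application side: the implementation runs a notification function in a new thread when the request completes. This is a sketch against the POSIX <aio.h> interface as found on, e.g., glibc (older glibc versions need -lrt). read_with_thread_notify() and on_complete() are hypothetical helper names, and the mutex/condition variable pair exists only so the example can wait for the notification.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static int done_flag;

/* Runs in a thread created by the AIO implementation after completion. */
static void on_complete(union sigval sv)
{
    (void) sv;  /* sv.sival_ptr carries the aiocb pointer if it is needed */
    pthread_mutex_lock(&done_lock);
    done_flag = 1;
    pthread_cond_signal(&done_cond);
    pthread_mutex_unlock(&done_lock);
}

/* Queue an aio_read() whose completion is reported via SIGEV_THREAD,
 * wait until the notification thread has run, then return aio_return(). */
static ssize_t read_with_thread_notify(int fd, void *buf, size_t len, off_t off)
{
    struct aiocb cb;

    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = off;
    cb.aio_sigevent.sigev_notify = SIGEV_THREAD;
    cb.aio_sigevent.sigev_notify_function = on_complete;
    cb.aio_sigevent.sigev_value.sival_ptr = &cb;

    done_flag = 0;              /* no other thread exists yet */
    if (aio_read(&cb) == -1)
        return -1;

    pthread_mutex_lock(&done_lock);
    while (!done_flag)
        pthread_cond_wait(&done_cond, &done_lock);
    pthread_mutex_unlock(&done_lock);

    return aio_return(&cb);
}
```

SIGEV_SIGNAL (deliver an arbitrary signal) and SIGEV_NONE (poll with aio_error()) are the other two notification modes a sigevent can request.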
I'm also unsure one can add overlapped capability to an existing Win32
handle.
That's what prw_open is for.
Wouldn't it be easier (less modification of existing code) to just use the
thread model?
Not sure. You will have to change the threading stuff anyway and what's
working in user space is pretty different from how you do it in Cygwin.
Also, wouldn't it be nice to use the methods already provided by the OS
for just such an opportunity? Changing the pread/pwrite/prw_open
methods to support aio_read/aio_write should be pretty straightforward.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
Mark Geisert
2017-12-13 06:48:18 UTC
Hi Corinna,
Post by Corinna Vinschen
I've got a proof-of-concept implementation working but haven't integrated it
into my local Cygwin source tree yet. This implementation uses a pool of
pthreads to issue reads and writes asynchronously. Using threads lets the code
operate transparently with the usual Cygwin-supplied file
descriptors/handles that are set up for synchronous-only operation.
If I were to use Win32 overlapped i/o, as Corinna suggested as a
possibility, there would have to be some supporting changes to Cygwin
internals, I think. Either file descriptors would have to always permit
overlapped usage, or open() and/or fcntl() would have to allow some new flag
to specify overlapped capability when creating a new fd/handle.
If you use threads, you will have to change your code to use cygthreads
instead of pthreads. Pthreads are a userspace concept, cygthreads are
used for internal tasks.
Understood in general. I had originally implemented a libaio that was
built on top of Cygwin, so there was an "app" viewpoint in my thinking.
But I'm coming to understand what you're talking about: really doing aio_*
and lio_* as Cygwin syscalls using the same capabilities within Cygwin
that other syscalls use. So cygthreads it is, only if they're needed.
Post by Corinna Vinschen
However, using async IO, the only change necessary would be to change
prw_open to open an async handle, and pread/pwrite to wait for the
result if the current I/O is NOT async. It may be necessary to add a
parameter or two to the pread/pwrite fhandler methods, but all the rest
of the functionality could be in the callers.
Please stop thinking in Win32 overlapped I/O. I have no idea why
Microsoft uses this term. Under the ntdll hood, a handle is either
synchronous (keeping track of file position, waiting for read/write to
complete) or asynchronous (not keeping track of file position, returning
STATUS_PENDING rather than waiting for completion), depending on whether
it was opened with FILE_SYNCHRONOUS_IO_{NON}ALERT.
On async handles, you can wait for an event object to be signalled, or
you can ask the NtReadFile/NtWriteFile functions to call a completion
routine.
OK. I understand.
Post by Corinna Vinschen
There is an O_ASYNC flag in some non-POSIX Unixes but what little
documentation [...]
No, if we had to create a new open flag for this functionality it would
be wrong.
[...] need here. It's used to signal (via SIGIO arrival) that space is available
for a write, or that data is available for a read. What it would need to do
for aio_* semantics is to signal that a write or a read has completed.
Yes, but with an arbitrary signal. Alternatively aio can start a
pthread on completion, basically all the stuff sigevent is supposed to
handle.
I'm also unsure one can add overlapped capability to an existing Win32
handle.
That's what prw_open is for.
Ah, the light of enlightenment has lit.
Post by Corinna Vinschen
Wouldn't it be easier (less modification of existing code) to just use the
thread model?
Not sure. You will have to change the threading stuff anyway and what's
working in user space is pretty different from how you do it in Cygwin.
Also, wouldn't it be nice to use the methods already provided by the OS
for just such an opportunity? Changing the pread/pwrite/prw_open
methods to support aio_read/aio_write should be pretty straightforward.
Yes, I've got the idea now. I'll go work on this direction and report
back when I have something, modulo holidays and stuff.

Right now I have a patch to implement sigtimedwait() since it's similar to
the already implemented sigwait() and sigwaitinfo(). I also have a patch
that extends support for getting and setting cygthread and pthread names
via cygwin_external(). I'll submit these two patches shortly.
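For readers unfamiliar with sigtimedwait(), a minimal usage sketch assuming standard POSIX semantics: the signal is blocked first so that it stays pending, and the call returns either the signal number or -1 with errno set to EAGAIN once the timeout expires. The helper names wait_usr1_ms() and block_usr1() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <time.h>

/* Block SIGUSR1 so it stays pending instead of being delivered. */
static void block_usr1(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Wait up to `ms` milliseconds for a (blocked) SIGUSR1.
 * Returns SIGUSR1, or -1 with errno == EAGAIN on timeout. */
static int wait_usr1_ms(long ms)
{
    sigset_t set;
    siginfo_t info;
    struct timespec timeout = { ms / 1000, (ms % 1000) * 1000000L };

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    return sigtimedwait(&set, &info, &timeout);
}
```

Unlike sigwaitinfo(), which can block indefinitely, the timeout lets a waiter give up and report that nothing arrived.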

Thanks for the patient internals explanations, as always.

..mark
Corinna Vinschen
2017-12-13 10:02:12 UTC
Post by Mark Geisert
Hi Corinna,
[...]
Right now I have a patch to implement sigtimedwait() since it's similar to
the already implemented sigwait() and sigwaitinfo().
Sorry for skipping over everything else, but that's really cool!
Post by Mark Geisert
I also have a patch
that extends support for getting and setting cygthread and pthread names via
cygwin_external(). I'll submit these two patches shortly.
That sounds a bit weird. Why would you want to set cygthread names
via cygwin_external? Those are not exposed to userspace.

And getting and setting pthread names is already implemented via
the Linux-like calls pthread_getname_np / pthread_setname_np...
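For comparison, a sketch of that existing userspace interface, assuming glibc-style declarations (which require _GNU_SOURCE). name_self_and_check() is a hypothetical helper. On Linux, thread names are limited to 16 bytes including the terminating NUL; other implementations may use a different limit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Name the calling thread, read the name back, and compare.
 * Returns 0 on a match, a pthread error number, or -1 on mismatch. */
static int name_self_and_check(const char *name)
{
    char got[16];   /* 16 bytes including NUL, the Linux limit */
    int rc = pthread_setname_np(pthread_self(), name);

    if (rc != 0)
        return rc;
    rc = pthread_getname_np(pthread_self(), got, sizeof got);
    if (rc != 0)
        return rc;
    return strcmp(got, name) == 0 ? 0 : -1;
}
```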


Thanks,
Corinna
Mark Geisert
2017-12-14 05:13:20 UTC
Post by Corinna Vinschen
Post by Mark Geisert
Hi Corinna,
[...]
Right now I have a patch to implement sigtimedwait() since it's similar to
the already implemented sigwait() and sigwaitinfo().
Sorry for skipping over everything else, but that's really cool!
Thanks :), I needed it for the userspace aio_suspend() and lio_listio().
Post by Corinna Vinschen
Post by Mark Geisert
I also have a patch
that extends support for getting and setting cygthread and pthread names via
cygwin_external(). I'll submit these two patches shortly.
That sounds a bit weird. Why would you want to set cygthread names
via cygwin_external? Those are not exposed to userspace.
And getting and setting pthread names is already implemented via
the Linux-like calls pthread_getname_np / pthread_setname_np...
The thread name patch to cygwin_internal() [sic] is for the benefit of
strace, the forthcoming cygmon, and any other utility that might want to
display thread names of some other process. The pthread part of the patch
does use those functions you mentioned; the cygthread part uses the
existing cygthread::name() and a new ::setname() I supply.

..mark
Mark Geisert
2018-03-20 05:53:11 UTC
Post by Corinna Vinschen
However, using async IO, the only change necessary would be to change
prw_open to open an async handle, and pread/pwrite to wait for the
result if the current I/O is NOT async. It may be necessary to add a
parameter or two to the pread/pwrite fhandler methods, but all the rest
of the functionality could be in the callers.
I see what you're suggesting; seems sly in a nice way :).

A small part I'm missing is in interfacing to the layer above this. Are the
aio_* functions supposed to be implemented as "real" syscalls (in syscalls.cc)?
Or should they be implemented in their own aio.cc (which is where I have them
ATM) and call pread()/pwrite() to do their dirty work? I'm unsure how "central"
a syscall has to be to merit syscalls.cc location.

Regardless of which file the code is in, I was thinking I should copy
syscalls.cc's pread() contents into aio_read(), for example, to start with.
Then add extra param(s) as needed to the fhandler method call as you suggested.

The problem with having aio_read() call pread() directly is how to tell pread()
we want an async read. That's why I was suggesting O_ASYNC previously, though I
now understand that's the wrong direction to go. What is the correct direction?
Thanks much,

..mark
Corinna Vinschen
2018-03-20 10:10:58 UTC
Post by Mark Geisert
Post by Corinna Vinschen
However, using async IO, the only change necessary would be to change
prw_open to open an async handle, and pread/pwrite to wait for the
result if the current I/O is NOT async. It may be necessary to add a
parameter or two to the pread/pwrite fhandler methods, but all the rest
of the functionality could be in the callers.
I see what you're suggesting; seems sly in a nice way :).
A small part I'm missing is in interfacing to the layer above this. Are the
aio_* functions supposed to be implemented as "real" syscalls (in
syscalls.cc)? Or should they be implemented in their own aio.cc (which is
where I have them ATM) and call pread()/pwrite() to do their dirty work?
I'm unsure how "central" a syscall has to be to merit syscalls.cc location.
syscalls.cc is historical. Cygwin has long since outgrown keeping all of
its "syscalls" in a single file anyway, so having the interfaces in an
aio.cc file doesn't hurt.
Post by Mark Geisert
Regardless of which file the code is in, I was thinking I should copy
syscalls.cc's pread() contents into aio_read(), for example, to start with.
Then add extra param(s) as needed to the fhandler method call as you suggested.
That's definitely ok. When I was talking about pread/pwrite in our
preliminary discussion on cygwin-patches, I was always talking about the
fhandler_disk_file::pread and fhandler_disk_file::pwrite methods in
fact, not the syscalls of the same name. I'm sorry I was unclear there :}
Post by Mark Geisert
The problem with having aio_read() call pread() directly is how to tell
pread() we want an async read. That's why I was suggesting O_ASYNC
previously, though I now understand that's the wrong direction to go. What
is the correct direction?
Looks like you're on the right track, just go ahead. We can fix the
finer details later.


Thanks,
Corinna
Mark Geisert
2018-03-29 05:57:43 UTC
Post by Corinna Vinschen
Post by Mark Geisert
Regardless of which file the code is in, I was thinking I should copy
syscalls.cc's pread() contents into aio_read(), for example, to start with.
Then add extra param(s) as needed to the fhandler method call as you suggested.
That's definitely ok. When I was talking about pread/pwrite in our
preliminary discussion on cygwin-patches, I was always talking about the
fhandler_disk_file::pread and fhandler_disk_file::pwrite methods in
fact, not the syscalls of the same name. I'm sorry I was unclear there :}
[...]
Post by Corinna Vinschen
Looks like you're on the right track, just go ahead. We can fix the
finer details later.
OK, I've posted a 3-part patch set to cygwin-patches. This is still a WIP so it
doesn't yet include what's been decided above. It currently queues async ops to
cygthread worker threads.

I plan to update aio_read() and aio_write() to try to launch async ops inline,
and only if they fail with ESPIPE would they be queued for worker thread action.
(E.g., ops on sockets or other devices without pread/pwrite support.)
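The inline-first, queue-on-ESPIPE plan can be sketched as a simple dispatch decision. This is hypothetical illustration code, not the patch: the "inline" path is a plain synchronous pread() standing in for launching an async operation, and the "queued" path calls read() directly where the planned design would hand the request to a cygthread worker. The enum path_taken and read_inline_or_fallback() are names invented for this sketch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Which path a request took (hypothetical, for illustration only). */
enum path_taken { PATH_INLINE, PATH_QUEUED };

/* Try a positional read first.  If the descriptor cannot be positioned
 * (pread() fails with ESPIPE, e.g. a pipe or socket), fall back to the
 * "queued" path.  Here that path simply calls read(); in the planned
 * design a worker thread would perform it instead. */
static ssize_t read_inline_or_fallback(int fd, void *buf, size_t len,
                                       off_t off, enum path_taken *how)
{
    ssize_t n = pread(fd, buf, len, off);

    if (n >= 0 || errno != ESPIPE) {
        *how = PATH_INLINE;     /* success, or an error unrelated to seeking */
        return n;
    }
    *how = PATH_QUEUED;         /* stand-in for queueing to a worker */
    return read(fd, buf, len);
}
```

Only ESPIPE triggers the fallback; any other error from the inline attempt is reported to the caller as-is.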

I think parts 1 and 2 of the patch set are essentially finished, unless there
are issues to be corrected. But part 3 will need some more work done.
Thanks,

..mark
Corinna Vinschen
2018-04-03 07:57:17 UTC
Hi Mark,
Post by Mark Geisert
Post by Corinna Vinschen
Post by Mark Geisert
Regardless of which file the code is in, I was thinking I should copy
syscalls.cc's pread() contents into aio_read(), for example, to start with.
Then add extra param(s) as needed to the fhandler method call as you suggested.
That's definitely ok. When I was talking about pread/pwrite in our
preliminary discussion on cygwin-patches, I was always talking about the
fhandler_disk_file::pread and fhandler_disk_file::pwrite methods in
fact, not the syscalls of the same name. I'm sorry I was unclear there :}
[...]
Post by Corinna Vinschen
Looks like you're on the right track, just go ahead. We can fix the
finer details later.
OK, I've posted a 3-part patch set to cygwin-patches. This is still a WIP
so it doesn't yet include what's been decided above. It currently queues
async ops to cygthread worker threads.
I plan to update aio_read() and aio_write() to try to launch async ops
inline, and only if they fail with ESPIPE would they be queued for worker
thread action. (E.g., ops on sockets or other devices without pread/pwrite
support.)
I think parts 1 and 2 of the patch set are essentially finished, unless
there are issues to be corrected. But part 3 will need some more work done.
Thanks,
Thank you! I'm a bit low on available time ATM so please be patient.
I'll check your patch as time permits.

A word about your commit messages. Just stating the change in the
commit header is a bit low on detail. Don't be shy about adding details
to your commit messages: what you did, why you did it, what to look
for.

Some testcase (here on cygwin-developers, not as patch) would be
nice, too.


Thanks,
Corinna
Mark Geisert
2018-04-03 09:14:31 UTC
Hi Corinna,
I appreciate the update. I'm about to debug a revised implementation that
takes into account your most recent comments. So...
Post by Corinna Vinschen
Hi Mark,
Post by Mark Geisert
Post by Corinna Vinschen
Post by Mark Geisert
Regardless of which file the code is in, I was thinking I should copy
syscalls.cc's pread() contents into aio_read(), for example, to start with.
Then add extra param(s) as needed to the fhandler method call as you suggested.
That's definitely ok. When I was talking about pread/pwrite in our
preliminary discussion on cygwin-patches, I was always talking about the
fhandler_disk_file::pread and fhandler_disk_file::pwrite methods in
fact, not the syscalls of the same name. I'm sorry I was unclear there :}
[...]
Post by Corinna Vinschen
Looks like you're on the right track, just go ahead. We can fix the
finer details later.
OK, I've posted a 3-part patch set to cygwin-patches. This is still a WIP
so it doesn't yet include what's been decided above. It currently queues
async ops to cygthread worker threads.
I plan to update aio_read() and aio_write() to try to launch async ops
inline, and only if they fail with ESPIPE would they be queued for worker
thread action. (E.g., ops on sockets or other devices without pread/pwrite
support.)
I think parts 1 and 2 of the patch set are essentially finished, unless
there are issues to be corrected. But part 3 will need some more work done.
Thanks,
Thank you! I'm a bit low on available time ATM so please be patient.
I'll check your patch as time permits.
No worries. Take the "part 3" patch with a large grain of salt if/when
you get to it.
Post by Corinna Vinschen
A word about your commit messages. Just stating the change in the
commit header is a bit low on detail. Don't be shy about adding details
to your commit messages: what you did, why you did it, what to look
for.
Yes, absolutely. I neglected to say what each patch part does. I seem to
be stumbling over every patch posting convention one by one :-?. Onward!
Post by Corinna Vinschen
Some testcase (here on cygwin-developers, not as patch) would be
nice, too.
I do have a couple already. One is a (fairly large) test app I have that
times various methods of copying the heap from one process to a child.
AIO is one of those methods. I could whittle that down to something
using only AIO. And the Linux man page for aio(7) has a sample program
that can test AIO on something other than disk files.
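A testcase of the kind requested could start as small as a single aio_write()/aio_read() round trip. The sketch below assumes the POSIX <aio.h> interface (aio_write, aio_read, aio_error, aio_suspend, aio_return) and a seekable regular file; init_cb(), aio_wait_one() and aio_roundtrip() are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Fill in a control block for a request with no completion notification. */
static void init_cb(struct aiocb *cb, int fd, void *buf, size_t len, off_t off)
{
    memset(cb, 0, sizeof *cb);
    cb->aio_fildes = fd;
    cb->aio_buf = buf;
    cb->aio_nbytes = len;
    cb->aio_offset = off;
    cb->aio_sigevent.sigev_notify = SIGEV_NONE;
}

/* Block until the single request `cb` completes; return aio_return(). */
static ssize_t aio_wait_one(struct aiocb *cb)
{
    const struct aiocb *const list[1] = { cb };
    int err;

    while ((err = aio_error(cb)) == EINPROGRESS)
        aio_suspend(list, 1, NULL);     /* NULL timeout: wait indefinitely */
    if (err != 0) {
        if (err > 0)
            errno = err;    /* the request itself failed */
        return -1;          /* err == -1: aio_error() already set errno */
    }
    return aio_return(cb);
}

/* Write `len` bytes at `off` with aio_write(), read them back into `out`
 * with aio_read(), and compare.  Returns 0 on a matching round trip. */
static int aio_roundtrip(int fd, const char *data, size_t len, off_t off, char *out)
{
    struct aiocb cb;

    init_cb(&cb, fd, (void *) data, len, off);
    if (aio_write(&cb) == -1 || aio_wait_one(&cb) != (ssize_t) len)
        return -1;

    init_cb(&cb, fd, out, len, off);
    if (aio_read(&cb) == -1 || aio_wait_one(&cb) != (ssize_t) len)
        return -1;

    return memcmp(data, out, len) == 0 ? 0 : -1;
}
```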
Thanks & Regards,

..mark
Mark Geisert
2018-04-19 09:21:46 UTC
Post by Corinna Vinschen
Some testcase (here on cygwin-developers, not as patch) would be
nice, too.
I do have a couple already. One is a (fairly large) test app I have that times
various methods of copying the heap from one process to a child. AIO is one of
those methods. I could whittle that down to something using only AIO. And the
Linux man page for aio(7) has a sample program that can test AIO on something
other than disk files.
In addition to those two test programs, there's iozone. That supports AIO
operations but would need to be ported to Cygwin. IMNSHO iozone needs a -NG
re-write. It is 5 source files, no header files, and 1K of its 30K lines are
#ifdef's. And what it calls a Windows build is actually a Cygwin (32-bits)
build. But it's there if needed, modulo some work porting it.

I'm using the Linux man page aio(7) example to debug the AIO code now for the
non-diskfile case.

The "heap transfer" program I mentioned earlier, heapxfer, allows me to specify
heap size and number of simultaneous AIOs. Simple cases, such as staying within
AIO_MAX AIOs, work fine. I recently finished debugging a testcase writing 1GB
of data to a file using 512 AIOs. So the first AIO_MAX AIOs launched as inline
AIOs, while the remainder were queued. Then as worker threads became available,
they launched inline AIOs themselves. Found a couple of nits but it's working now.
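The lio_* half of the interface can be exercised in a similar way by submitting several writes in a single lio_listio() call. This is a sketch assuming POSIX <aio.h>; NCHUNKS, CHUNK and write_chunks_listio() are hypothetical, and NCHUNKS must not exceed the implementation's list limit (POSIX only guarantees a minimum of 2; sysconf(_SC_AIO_LISTIO_MAX) reports the actual value). It is not the heapxfer program.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NCHUNKS 8
#define CHUNK   4096

/* Write NCHUNKS contiguous chunks of `src` to `fd` as NCHUNKS separate
 * requests submitted with one lio_listio(LIO_WAIT, ...) call.
 * Returns the total number of bytes written, or -1 on any failure. */
static ssize_t write_chunks_listio(int fd, const char *src)
{
    struct aiocb cbs[NCHUNKS];
    struct aiocb *list[NCHUNKS];
    ssize_t total = 0;

    for (int i = 0; i < NCHUNKS; i++) {
        memset(&cbs[i], 0, sizeof cbs[i]);
        cbs[i].aio_fildes = fd;
        cbs[i].aio_lio_opcode = LIO_WRITE;
        cbs[i].aio_buf = (void *) (src + (size_t) i * CHUNK);
        cbs[i].aio_nbytes = CHUNK;
        cbs[i].aio_offset = (off_t) i * CHUNK;
        cbs[i].aio_sigevent.sigev_notify = SIGEV_NONE;
        list[i] = &cbs[i];
    }

    /* LIO_WAIT: return only after every request in the list has completed. */
    if (lio_listio(LIO_WAIT, list, NCHUNKS, NULL) == -1)
        return -1;

    for (int i = 0; i < NCHUNKS; i++) {
        if (aio_error(&cbs[i]) != 0)
            return -1;
        total += aio_return(&cbs[i]);
    }
    return total;
}
```

With LIO_NOWAIT instead, the call would return immediately and completion would have to be detected per request or through the list's own sigevent.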

Comments or questions welcome. I'll keep going on the testing.
Cheers,

..mark
Mark Geisert
2018-06-12 08:18:09 UTC
Updating my previous post...
Post by Mark Geisert
Post by Corinna Vinschen
Some testcase (here on cygwin-developers, not as patch) would be
nice, too.
I do have a couple already. One is a (fairly large) test app I have that times
various methods of copying the heap from one process to a child. AIO is one of
those methods. I could whittle that down to something using only AIO. And the
Linux man page for aio(7) has a sample program that can test AIO on something
other than disk files.
The man page sample program, which does a single aio_read() on each file/device
named in args, seems to work for all cases except the specific one demonstrated
there, which is more than one /dev/stdin. On Cygwin, satisfying the first
aio_read() causes the second to be satisfied with no input. On Linux, the
program waits for each aio_read() to be satisfied in sequence.
Post by Mark Geisert
In addition to those two test programs, there's iozone. That supports AIO
operations but would need to be ported to Cygwin. IMNSHO iozone needs a -NG
re-write. It is 5 source files, no header files, and 1K of its 30K lines are
#ifdef's. And what it calls a Windows build is actually a Cygwin (32-bits)
build. But it's there if needed, modulo some work porting it.
I have ported iozone to 64-bit Cygwin (not re-written) and I can see it will be
very helpful in stress-testing the AIO code. At the moment I'm debugging an odd
strace message after many thousands of I/Os:
0 [sig] iozone 12248 wait_sig: garbled signal pipe data nb 176, sig 0
which seems to say the code is internally sending "signal 0", but there's no
obvious way that could be occurring.
Post by Mark Geisert
The "heap transfer" program I mentioned earlier, heapxfer, allows me to specify
heap size and number of simultaneous AIOs. Simple cases, such as staying within
AIO_MAX AIOs, work fine. I recently finished debugging a testcase writing 1GB
of data to a file using 512 AIOs. So the first AIO_MAX AIOs launched as inline
AIOs, while the remainder were queued. Then as worker threads became available,
they launched inline AIOs themselves. Found a couple of nits but it's working now.
Most recently with heapxfer I've been testing aio_write()s of 2047MB in 2047
AIOs on my 2Core/4Thread system. Found and fixed another obscure buglet.

I will be AFK June 15..25 but wanted to post status since it's been a while
since my last posting. Comments welcome but in any case I'll keep testing.
Thanks & Regards,

..mark
Corinna Vinschen
2018-06-12 09:03:02 UTC
Post by Mark Geisert
Updating my previous post...
Post by Mark Geisert
Post by Corinna Vinschen
Some testcase (here on cygwin-developers, not as patch) would be
nice, too.
I do have a couple already. One is a (fairly large) test app I have that times
various methods of copying the heap from one process to a child. AIO is one of
those methods. I could whittle that down to something using only AIO. And the
Linux man page for aio(7) has a sample program that can test AIO on something
other than disk files.
The man page sample program, which does a single aio_read() on each
file/device named in args, seems to work for all cases except the specific
one demonstrated there, which is more than one /dev/stdin. On Cygwin,
satisfying the first aio_read() causes the second to be satisfied with no
input. On Linux, the program waits for each aio_read() to be satisfied in
sequence.
Post by Mark Geisert
In addition to those two test programs, there's iozone. That supports AIO
operations but would need to be ported to Cygwin. IMNSHO iozone needs a -NG
re-write. It is 5 source files, no header files, and 1K of its 30K lines are
#ifdef's. And what it calls a Windows build is actually a Cygwin (32-bits)
build. But it's there if needed, modulo some work porting it.
I have ported iozone to 64-bit Cygwin (not re-written) and I can see it will
be very helpful in stress-testing the AIO code. At the moment I'm debugging
an odd strace message after many thousands of I/Os:
0 [sig] iozone 12248 wait_sig: garbled signal pipe data nb 176, sig 0
which seems to say the code is internally sending "signal 0", but there's no
obvious way that could be occurring.
This is weird indeed.
Post by Mark Geisert
Post by Mark Geisert
The "heap transfer" program I mentioned earlier, heapxfer, allows me to specify
heap size and number of simultaneous AIOs. Simple cases, such as staying within
AIO_MAX AIOs, work fine. I recently finished debugging a testcase writing 1GB
of data to a file using 512 AIOs. So the first AIO_MAX AIOs launched as inline
AIOs, while the remainder were queued. Then as worker threads became available,
they launched inline AIOs themselves. Found a couple of nits but it's working now.
Most recently with heapxfer I've been testing aio_write()s of 2047MB in 2047
AIOs on my 2Core/4Thread system. Found and fixed another obscure buglet.
I will be AFK June 15..25 but wanted to post status since it's been a while
since my last posting. Comments welcome but in any case I'll keep testing.
Thanks & Regards,
No worries and enjoy your vacation. Apart from the few nits I really
like the code you provided!


Thanks,
Corinna