Discussion:
STC for libapr1 failure
Ryan Johnson
2011-08-26 13:04:31 UTC
A lock on a file is represented by an event object. Process A holds the
lock corresponding with event a. Process B tries to lock, but the lock
of process A blocks that. So B now waits for event a, until it gets
signalled. Now A unlocks, thus signalling event a and closing the handle
afterwards. But A's time slice isn't up yet, so it tries again to lock
the file, before B returned from the wait for a. And here a wrong
condition fails to recognize the situation. It finds the event object,
but since it's recognized as "that's me", it doesn't treat the event as
a blocking factor. This in turn is the allowance to create its own lock
event object. However, the object still exists, since B still has an
open handle to it. So creating the event fails, and rightfully so.
What I don't have is an idea how to fix this problem correctly. I have
to think about that. Stay tuned.
If I understand correctly, the file locking protocol is to create an
event having the same name as the file, in the same directory as that
file, and use that to synchronize locks on the file? And the problem
arises if a process releases the lock, signals waiting processes, then
immediately declares victory and closes the event handle because the
lock is now "free" ?

Nasty problem. Comes up with database locking implementations as well:
how to know when it's safe to garbage collect a lock's storage, when the
synchronization primitive that would normally tell you is embedded in
that same storage...

The simplest solution I've come up with so far is to keep around a set
of permanent mutex locks, and assign every (file/database) lock to one
mutex, e.g. by hashing. Then, locks can only be created or destroyed by
a process holding the owner lock, which eliminates the race window where
we know the lock is free but don't know whether its storage is still
valid. Normally 4-8 mutex locks is plenty to avoid bottlenecks.
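
A minimal sketch of the idea, for illustration only: it assumes an in-process table rather than real cross-process objects, and every name in it (StripedLockTable, try_acquire, release, kStripes) is hypothetical and not taken from flock.cc. Each lock name hashes to one of a fixed set of permanent mutexes, and a lock record is only created or destroyed while that mutex is held, so nobody can observe a "free" lock whose storage is half gone.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <string>

// Hypothetical illustration; not the Cygwin implementation.
class StripedLockTable {
    // One shard of lock records per permanent mutex: lock name -> owner id.
    using Shard = std::map<std::string, int>;
    static constexpr std::size_t kStripes = 8;  // "4-8 is plenty" per the post

    std::array<std::mutex, kStripes> stripes_;  // never created or destroyed at runtime
    std::array<Shard, kStripes> shards_;

    // Assign every lock name to exactly one stripe by hashing.
    static std::size_t index(const std::string& name) {
        return std::hash<std::string>{}(name) % kStripes;
    }

public:
    // Create the lock record for `owner`; fails if the lock already exists.
    bool try_acquire(const std::string& name, int owner) {
        const std::size_t i = index(name);
        std::lock_guard<std::mutex> guard(stripes_[i]);
        if (shards_[i].count(name) != 0)
            return false;                 // still held, storage still valid
        shards_[i].emplace(name, owner);  // creation happens under the stripe mutex
        return true;
    }

    // Destroy the lock record; only the owner may do so.
    bool release(const std::string& name, int owner) {
        const std::size_t i = index(name);
        std::lock_guard<std::mutex> guard(stripes_[i]);
        auto it = shards_[i].find(name);
        if (it == shards_[i].end() || it->second != owner)
            return false;
        shards_[i].erase(it);             // destruction happens under the stripe mutex
        return true;
    }

    // Report whether a lock record currently exists.
    bool exists(const std::string& name) {
        const std::size_t i = index(name);
        std::lock_guard<std::mutex> guard(stripes_[i]);
        return shards_[i].count(name) != 0;
    }
};
```

The stripe mutexes live as long as the table does, so the question of when it is safe to free the synchronization primitive never comes up for them; only the per-lock records come and go.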

Thoughts?
Ryan
Corinna Vinschen
2011-08-26 15:47:08 UTC
Post by Ryan Johnson
A lock on a file is represented by an event object. Process A holds the
lock corresponding with event a. Process B tries to lock, but the lock
of process A blocks that. So B now waits for event a, until it gets
signalled. Now A unlocks, thus signalling event a and closing the handle
afterwards. But A's time slice isn't up yet, so it tries again to lock
the file, before B returned from the wait for a. And here a wrong
condition fails to recognize the situation. It finds the event object,
but since it's recognized as "that's me", it doesn't treat the event as
a blocking factor. This in turn is the allowance to create its own lock
event object. However, the object still exists, since B still has an
open handle to it. So creating the event fails, and rightfully so.
What I don't have is an idea how to fix this problem correctly. I have
to think about that. Stay tuned.
If I understand correctly, the file locking protocol is to create an
event having the same name as the file, in the same directory as
that file, and use that to synchronize locks on the file? And the
problem arises if a process releases the lock, signals waiting
processes, then immediately declares victory and closes the event
handle because the lock is now "free" ?
It's more complicated. Have a look into flock.cc. The code which
handles locks on a file is guarded by a file-specific mutex. The locks
themselves are represented by event objects whose names are supposed to
identify a given lock. Blocking locks for a given scenario are fetched
by enumerating the lock objects for a file and then testing whether
any one of them results in a blocking condition. The problem here is
that process A no longer knows that it held the lock at one point and
tries to create the exact same event again.
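
The interleaving can be replayed with a toy model. This is only a sketch under stated assumptions: a named object exists until its last handle is closed, and creating an object under an existing name fails. NamedObjects, replay_race and the name "lock-event-a" are hypothetical; none of this is the Windows API or the flock.cc code.

```cpp
#include <map>
#include <string>

// Toy namespace of named objects, each with a count of open handles.
class NamedObjects {
    std::map<std::string, int> handles_;  // object name -> open handle count

public:
    // Create a new object; fails if the name is still in use.
    bool create(const std::string& name) {
        if (handles_.count(name) != 0)
            return false;
        handles_[name] = 1;
        return true;
    }

    // Open an additional handle to an existing object.
    bool open(const std::string& name) {
        auto it = handles_.find(name);
        if (it == handles_.end())
            return false;
        ++it->second;
        return true;
    }

    // Close one handle; the object disappears with its last handle.
    void close(const std::string& name) {
        auto it = handles_.find(name);
        if (it == handles_.end())
            return;
        if (--it->second == 0)
            handles_.erase(it);
    }

    bool exists(const std::string& name) const {
        return handles_.count(name) != 0;
    }
};

// Replays the sequence from the thread; returns whether A's re-lock succeeded.
inline bool replay_race(NamedObjects& ns) {
    const std::string ev = "lock-event-a";
    ns.create(ev);                // A locks: creates the event
    ns.open(ev);                  // B is blocked: opens the event to wait on it
    ns.close(ev);                 // A unlocks: signals, then closes its handle
    bool relock = ns.create(ev);  // A re-locks before B has returned from the wait
    ns.close(ev);                 // B finally wakes up and closes its handle
    return relock;
}
```

In this model A's second create fails because B's handle keeps the old object alive, which matches the failure described above; once B has closed its handle, the same create would succeed.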

But the overall situation is more complicated, especially in the case of
BSD flock locks due to their semantics. It looks like I have to add a
per-lock mutex to the whole thing and abandon some really dumb assumptions
I made at one point...


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat