Ryan Johnson
2011-08-26 13:04:31 UTC
A lock on a file is represented by an event object. Process A holds the
lock corresponding with event a. Process B tries to lock, but the lock
of process A blocks that. So B now waits for event a, until it gets
signalled. Now A unlocks, thus signalling event a and closing the handle
afterwards. But A's time slice isn't up yet, so it tries again to lock
the file, before B returned from the wait for a. And here a wrong
condition fails to recognize the situation. It finds the event object,
but since it's recognized as "that's me", it doesn't treat the event as
a blocking factor. This in turn is the allowance to create its own lock
event object. However, the object still exists, since B still has an
open handle to it. So creating the event fails, and rightfully so.
What I don't have is an idea how to fix this problem correctly. I have
to think about that. Stay tuned.
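
The sequence described above can be replayed deterministically with a toy
model. This is only a sketch: NamedObjectTable and its methods are
hypothetical stand-ins for a kernel namespace of refcounted named objects,
not Cygwin or Win32 calls, and it assumes the new lock event is created
exclusively under the same name as the old one.

```python
class NamedObjectTable:
    """Toy stand-in for a namespace of refcounted named objects."""

    def __init__(self):
        self._handles = {}  # name -> number of open handles

    def create_exclusive(self, name):
        # Fails while the object still exists, i.e. any handle is open.
        if self._handles.get(name, 0) > 0:
            raise FileExistsError(name)
        self._handles[name] = 1

    def open_existing(self, name):
        if self._handles.get(name, 0) == 0:
            raise FileNotFoundError(name)
        self._handles[name] += 1

    def close(self, name):
        self._handles[name] -= 1
        if self._handles[name] == 0:
            del self._handles[name]  # last handle closed: object destroyed


def replay_race():
    table = NamedObjectTable()
    table.create_exclusive("a")  # A locks the file: creates event a
    table.open_existing("a")     # B is blocked: opens a and waits on it
    table.close("a")             # A unlocks: signals a, closes its handle
    try:
        table.create_exclusive("a")  # A relocks before B has woken up
    except FileExistsError:
        return "create failed: B still holds a handle"
    return "create succeeded"
```

In this model the second create fails only because B's handle keeps the
object alive; had B closed its handle first, the create would succeed.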
If I understand correctly, the file locking protocol is to create an
event having the same name as the file, in the same directory as that
file, and use that to synchronize locks on the file? And the problem
arises if a process releases the lock, signals waiting processes, then
immediately declares victory and closes the event handle because the
lock is now "free"?
Nasty problem. Comes up with database locking implementations as well:
how to know when it's safe to garbage collect a lock's storage, when the
synchronization primitive that would normally tell you is embedded in
that same storage...
The simplest solution I've come up with so far is to keep around a set
of permanent mutex locks, and assign every (file/database) lock to one
mutex, e.g. by hashing. Then, locks can only be created or destroyed by
a process holding the owner lock, which eliminates the race window where
we know the lock is free but don't know whether its storage is still
valid. Normally 4-8 mutex locks are plenty to avoid bottlenecks.
Thoughts?
Ryan