Post by Corinna Vinschen
Post by Jeffrey Altman
I tried representing AFS Symlinks using a Microsoft-assigned Reparse
Point Tag. The downside of following that approach was that Cygwin does
not properly handle Reparse Point Tags that it does not recognize. By
discarding the RP attribute while preserving the other reparse point stat
information (timestamps, attributes, size, etc.), Cygwin introduces
data-corrupting behaviors into Cygwin applications.
You never explained why this happens and at which point in the code. So
far it was the right thing to do, and I'm pretty sure you know why. I
won't change that unless you can show me where and when this leads to
wrong behaviour. I asked for these details but you didn't offer an
explanation besides the fact itself so far. And it would have been
no problem to add a special handling for AFS in the cases where it went
wrong. I guess this is kind of a moot point, now that you converted
to native symlinks, but this had to be said.
I highlighted the bad section of code in the patch Christopher commented
on. The code in question is:
path.cc symlink_info::check_reparse_point() final else block.
/* Maybe it's a reparse point, but it's certainly not one we recognize.
Drop REPARSE attribute so we don't try to use the flag accidentally.
It's just some arbitrary file or directory for us. */
fileattr &= ~FILE_ATTRIBUTE_REPARSE_POINT;
As a result of this change, the timestamps and size of the reparse point
are reported to the application instead of the reparse point target's
stat information.
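
To make the effect of the quoted statement concrete, here is a minimal sketch in plain C. The two attribute values are the ones defined in the Windows SDK headers; the helper names drop_reparse_attribute() and looks_like_reparse_point() are hypothetical and do not exist in path.cc.

```c
#include <stdint.h>

/* Attribute bits as defined in the Windows SDK headers. */
#define FILE_ATTRIBUTE_DIRECTORY     0x00000010u
#define FILE_ATTRIBUTE_REPARSE_POINT 0x00000400u

/* Hypothetical helper mirroring the quoted statement: clear the
   reparse bit and leave every other attribute untouched. */
static uint32_t drop_reparse_attribute(uint32_t fileattr)
{
    return fileattr & ~FILE_ATTRIBUTE_REPARSE_POINT;
}

/* Hypothetical predicate that later code might use to decide whether
   an object still needs reparse point handling. */
static int looks_like_reparse_point(uint32_t fileattr)
{
    return (fileattr & FILE_ATTRIBUTE_REPARSE_POINT) != 0;
}
```

Once the bit is cleared, nothing downstream can tell that the object was a reparse point, so the timestamps and size that were collected for the reparse point itself are passed through unchanged.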
There are two options that I believe could be implemented here in place
of discarding the reparse point attribute:
1. Open the reparse point target and read its stat information. Replace
the stat information of the reparse point with that of the target.
2. Open the reparse point target. Perform a FileNameInformation query
to obtain the actual path of the target. Replace the reparse point with
a virtual symlink using the FileNameInformation response as the target path.
I believe the 2nd option is the better of the two because it is possible
for a file system driver to implement CreateFile followed by
FileNameInformation queries without requiring that the target be
accessed unless the target is required to determine authorization.
In either case, appropriate checks for reparse points as targets of
reparse points and recursion must be implemented.
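
The recursion check mentioned above can be sketched without any Windows calls by walking an in-memory table. Everything here is an assumption made for illustration: struct node, find_node(), resolve_target() and the depth limit of 10 are hypothetical and not taken from Cygwin.

```c
#include <stddef.h>
#include <string.h>

#define MAX_REPARSE_DEPTH 10   /* assumed limit, chosen for illustration */

/* Hypothetical stand-in for a file system object. */
struct node {
    const char *path;     /* name of the object */
    const char *target;   /* non-NULL if the object is a reparse point */
    long size;            /* stat information of the object itself */
};

static const struct node *find_node(const struct node *tab, size_t n,
                                    const char *path)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].path, path) == 0)
            return &tab[i];
    return NULL;
}

/* Follow reparse points until a plain object is reached.  Returns NULL
   for a dangling target or when the chain exceeds the depth limit,
   which also catches loops. */
static const struct node *resolve_target(const struct node *tab, size_t n,
                                         const char *path)
{
    const struct node *cur = find_node(tab, n, path);
    int depth = 0;

    while (cur != NULL && cur->target != NULL) {
        if (++depth > MAX_REPARSE_DEPTH)
            return NULL;
        cur = find_node(tab, n, cur->target);
    }
    return cur;
}
```

With option 1, the stat information of the returned node would replace that of the reparse point; with option 2, its path would become the target of the virtual symlink.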
Post by Corinna Vinschen
Post by Jeffrey Altman
However, there is a very clear test that can be applied to determine
when Microsoft Symlinks should be generated in preference to Cygwin
[...]
There are probably additional approaches but none of them are clean and
transparent. The second two involved significantly more complexity than
maintaining support within Cygwin's path.cc and could potentially
introduce incompatibilities with future Cygwin path.cc changes.
No test could be sufficient to switch on native symlinks automatically.
We were all very excited when it became clear that Microsoft introduced
native symlinks on NTFS with Vista, and I was early on playing around
with them and trying to integrate them into Cygwin. My local testcase
uses DeviceIoControl to work around any restrictions imposed by
CreateSymbolicLink. And I'm still playing around with them every now
and then, thinking that we could use them, but the restrictions are
disappointing me each time anew.
There are some downsides to native symlinks which make them hard to
justify, if not downright useless in a POSIX environment.
- The inability of normal users to create symlinks by default.
This can be worked around by changing the policy, but it's still a
PITA. Normal users don't know about the policy, some of them don't
even have the "Local Security Policy" MMC snap-in. Even in a
corporate environment it requires changing the policy settings and
we all know how admins don't like to *soften* a policy. But let's
say we can help along with a FAQ entry.
Working around the policy by issuing DeviceIoControl() operations is
possible but will open another can of worms. I do not believe that
Cygwin should provide a backdoor.
Post by Corinna Vinschen
- Native symlinks are marked as file or directory.
This has been added clearly for the benefit of Windows Explorer.
But it's a PITA as well because it destroys interoperability. It's
common that POSIX symlinks are created before the target exists.
How on earth should the symlink(2) function know if the target is
supposed to be a dir or a file. But Explorer as well as CMD will do
the wrong thing if the symlink is using a non-matching dir/file
marker.
The target of the symlink must be resolved and the
FILE_ATTRIBUTE_DIRECTORY flag set appropriately for all
GetFileAttributes[Ex] and Find*File[Ex] operation responses. It is the
inclusion of stat information in the directory enumeration output which
mandates this behavior.
Given the inclusion of stat information and the fact that reparse points
can refer to objects that have a very high latency to access, it is a
reasonable design choice to require that the reparse point expose the
FILE_ATTRIBUTE_DIRECTORY bit that the target will have.
I have come to the conclusion that given the need to provide stat
information in the directory enumeration, the implementation of reparse
points is sane. The implementation permits directory enumeration to be
fast by not requiring the target objects be opened. For example, a
reparse point to an object stored in a HSM may take hours to load.
Another is a reparse point to a backup snapshot which may require
extended time to restore before it can be accessed.
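
The file/directory marker problem discussed above can be illustrated with a small sketch. SYMBOLIC_LINK_FLAG_DIRECTORY is the flag value that the Windows SDK defines for CreateSymbolicLink; enum target_kind and native_symlink_flags() are hypothetical names introduced only for this example.

```c
#define SYMBOLIC_LINK_FLAG_DIRECTORY 0x1u   /* value from the Windows SDK */

/* Hypothetical description of what is known about the target at the
   time the symlink is created. */
enum target_kind { TARGET_MISSING, TARGET_FILE, TARGET_DIRECTORY };

/* Pick the creation flags for a native symlink.  Returns 0 and stores
   the flags when the kind of the target is known, or -1 when the
   target does not exist yet and the marker cannot be decided. */
static int native_symlink_flags(enum target_kind kind, unsigned *flags)
{
    switch (kind) {
    case TARGET_FILE:
        *flags = 0;
        return 0;
    case TARGET_DIRECTORY:
        *flags = SYMBOLIC_LINK_FLAG_DIRECTORY;
        return 0;
    default:
        return -1;   /* POSIX allows this case, the marker does not */
    }
}
```

The -1 branch is the case raised in the quoted text: a POSIX symlink may be created before its target exists, and at that point there is nothing from which the marker can be derived.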
Post by Corinna Vinschen
- Only Windows paths are stored.
In a POSIX env a symlink created by POSIX tools should point to a
POSIX path. For instance, a change of mount points changes where a
symlink actually points, and the symlink should not magically
keep working afterwards.
But, hey, native symlinks store the path twice, the SubstituteName
and the PrintName. Shouldn't it be possible to store the Windows
path in one of them and the POSIX path in the other? Yes and no.
It's possible to write into these members whatever you like, but for
some weird reason, both members have to be Windows paths to work for
native Windows tools.
But we could store the POSIX path with backslashes, thus working
around the issue, no? No. An absolute path starting with a
backslash is possible, but the Windows tools will evaluate it as
root-relative to the current drive. cd to another drive in cmd,
and interop is broken again.
It took me a long time to understand how these fields are used. The
field names were poorly chosen.
The SubstituteName is a path that is used as-is by the Multiple UNC
Provider to redirect a request to the correct file system for
processing. This is always an absolute path. In other words, this is
the kernel version of the path.
The PrintName is a user-land UNC path or relative path which is not only
intended for user readability but also for user-land tools such as
robocopy to use when moving a symlink from one location to another.
When storing absolute paths, you must store them as absolute paths from
the device namespace not from the drive namespace. For example, here is
a symlink stored in AFS which refers to C:\.
[\\afs\yfs\user\jaltman]junction local_disk
\\afs\yfs\user\jaltman\local_disk: SYMBOLIC LINK
Print Name : c:\
Substitute Name: \??\c:\
And here is a symlink stored in c:\ which refers to the root of AFS.
C:\afs: SYMBOLIC LINK
Print Name : \\afs\all
Substitute Name: \??\UNC\afs\all
The output is from the Sysinternals tool junction v1.06.
Note the inclusion of \??\UNC prior to UNC references and \??\<drive>:\
for DOS device name references. The DOS device maps to a volume name
and you could provide a link to the volume instead of the DOS device if
that was desirable.
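
The mapping between the two fields shown in the junction output can be sketched as follows. This is only an illustration of the prefix rules described above: make_substitute_name() is a hypothetical helper, it works on narrow strings although the real reparse buffer holds UTF-16, and it handles only absolute drive and UNC paths.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Build the kernel-side SubstituteName from an absolute PrintName.
   Returns 0 on success, -1 if the input is not an absolute drive or
   UNC path or if the output buffer is too small. */
static int make_substitute_name(const char *print_name,
                                char *out, size_t outlen)
{
    int n;

    if (print_name[0] == '\\' && print_name[1] == '\\'
        && print_name[2] != '\0') {
        /* \\server\share  ->  \??\UNC\server\share */
        n = snprintf(out, outlen, "\\??\\UNC\\%s", print_name + 2);
    } else if (isalpha((unsigned char)print_name[0])
               && print_name[1] == ':' && print_name[2] == '\\') {
        /* c:\dir  ->  \??\c:\dir */
        n = snprintf(out, outlen, "\\??\\%s", print_name);
    } else {
        return -1;   /* relative or root-relative: no absolute form */
    }
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

A root-relative path such as \foo is rejected on purpose, because it has no device-namespace form until a drive is chosen.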
Does this help?
Post by Corinna Vinschen
- Remote and local symlinks may behave differently in different environments.
Apart from the security policy, symlinks are also affected by an
fsutil setting. The admins can decide if symlinks work at all, or
if symlinks don't work depending on their own location and the
location of the target they are pointing to (local->local, local->remote,
remote->local, remote->remote)
So it's possible that local->local symlinks can be resolved while
opening local->remote symlinks simply fail with ugly status codes.
How on earth do you integrate that reliably into an environment in
which a symlink is a plain and simple thing, readable and writable
by everyone, wherever located, just apart from parent dir permissions.
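
The four evaluation classes named above can be modelled as a bit mask. This is a sketch of the decision only: the bit layout, enum names and symlink_may_resolve() are hypothetical and say nothing about how Windows stores the setting internally.

```c
/* Where a link or its target lives. */
enum location { LOC_LOCAL, LOC_REMOTE };

/* One bit per evaluation class; the layout is an assumption. */
enum eval_class {
    EVAL_L2L = 1 << 0,   /* local link  -> local target  */
    EVAL_L2R = 1 << 1,   /* local link  -> remote target */
    EVAL_R2L = 1 << 2,   /* remote link -> local target  */
    EVAL_R2R = 1 << 3    /* remote link -> remote target */
};

static unsigned classify(enum location link, enum location target)
{
    if (link == LOC_LOCAL)
        return target == LOC_LOCAL ? EVAL_L2L : EVAL_L2R;
    return target == LOC_LOCAL ? EVAL_R2L : EVAL_R2R;
}

/* Non-zero if the administrator's setting allows this combination. */
static int symlink_may_resolve(unsigned enabled,
                               enum location link, enum location target)
{
    return (enabled & classify(link, target)) != 0;
}
```

A POSIX application has no equivalent of this table, which is why the same symlink can resolve on one machine and fail on another.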
Post by Jeffrey Altman
As I see it, as flawed as Microsoft Symlinks are they are the common
interface that enables mixed applications to communicate with one
another. As such, where they can be used, they should be used. What is
the point of cross-platform support if mixed platform applications
cannot transparently share the data?
Cygwin is a POSIX environment in the first place. Interop is fine,
but if it collides with POSIX, we're clearly favoring POSIX.
Understood. Which is why I haven't suggested that cygwin symlinks be
replaced by microsoft symlinks in cases where they cannot be used safely.
Post by Corinna Vinschen
Having said that.
Chris and I had a private discussion (not the first one on the subject!)
and we're willing to revisit the use of native symlinks in Cygwin but
it will be a while before that happens. A change to the path handling
code like this is not something that we'd consider for 1.7.18 which is
long overdue anyway.
Understood.
Post by Corinna Vinschen
What I will do is to add a new CYGWIN environment variable option, along
the lines of the winsymlinks option(*), or, which is very likely the
more elegant solution, a mount option, which will result in trying to
create native symlinks first, and a Cygwin symlink only if creating
a native symlink failed. That should help you along.
An environment variable should address James' use case. For creating
Symlinks in AFS a test for File System name "AFSRDRFsd" in the volume
information can be used as an indicator that the DOS SYSTEM attribute is not
supported.
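
The "native first, Cygwin symlink as fallback" behaviour and the AFS test can be sketched as below. None of this is Cygwin code: the function-pointer type, create_symlink_native_first(), is_afs_volume() and the two stubs are hypothetical, and the exact string comparison against "AFSRDRFsd" is an assumption about how the file system name would be matched.

```c
#include <stddef.h>
#include <string.h>

/* A symlink creator returns 0 on success, non-zero on failure. */
typedef int (*symlink_fn)(const char *target, const char *linkpath);

enum symlink_kind { SYMLINK_FAILED = -1, SYMLINK_NATIVE, SYMLINK_CYGWIN };

/* Try the native creator first and use the Cygwin-style creator only
   if the native attempt fails. */
static enum symlink_kind create_symlink_native_first(symlink_fn native,
                                                     symlink_fn fallback,
                                                     const char *target,
                                                     const char *linkpath)
{
    if (native(target, linkpath) == 0)
        return SYMLINK_NATIVE;
    if (fallback(target, linkpath) == 0)
        return SYMLINK_CYGWIN;
    return SYMLINK_FAILED;
}

/* File system name test from the text above; fs_name stands for the
   name reported in the volume information. */
static int is_afs_volume(const char *fs_name)
{
    return fs_name != NULL && strcmp(fs_name, "AFSRDRFsd") == 0;
}

/* Stand-in creators for demonstration; real ones would call the OS. */
static int stub_ok(const char *target, const char *linkpath)
{
    (void)target; (void)linkpath;
    return 0;
}

static int stub_fail(const char *target, const char *linkpath)
{
    (void)target; (void)linkpath;
    return -1;
}
```

For example, create_symlink_native_first(stub_fail, stub_ok, "target", "link") yields SYMLINK_CYGWIN, which models a user without the privilege to create native symlinks.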
Post by Corinna Vinschen
Corinna
(*) In your blog you were musing why Cygwin supports lnk files but
not native symlinks. Here's the answer: lnk files support using
POSIX paths.
Where by POSIX paths you mean specifying a path with forward slashes and
without also indicating the type of the object.
Thank you.
Jeffrey Altman