Discussion:
Performance optimization in av::fixup - use buffered IO, not mapped file
Daniel Colascione
2012-12-11 00:58:33 UTC
Emacs "make bootstrap" runs Emacs as a compiler, generating .elc files from .el
files. The build system runs Emacs once for each .el file we compile, of which
there are thousands. Now, Emacs takes about two seconds to start on my system,
so compiling thousands of files takes a while; the actual .el to .elc
compilation is nearly instantaneous.

According to xperf, Emacs spends most of its startup time re-reading emacs.exe
code pages from disk.

~/edev/trunk.nox/src
$ time ./emacs --batch -Q --eval '(kill-emacs)'

real 0m2.236s
user 0m0.015s
sys 0m0.015s

~/edev/trunk.nox/src
$ time ./emacs --batch -Q --eval '(kill-emacs)'

real 0m2.343s
user 0m0.062s
sys 0m0.016s

We shouldn't need to read this file more than once. After the first run, the
system should be able to read emacs.exe from the standby list, not the disk.

Now, if we run emacs.exe from cmd, not bash, that's exactly what happens:

C:\Users\dancol\edev\trunk.nox\src
type bench-emacs.cmd
@echo off
echo %TIME%
.\emacs --batch -Q --eval "(kill-emacs)"
echo %TIME%

C:\Users\dancol\edev\trunk.nox\src
.\bench-emacs
16:39:46.31
16:39:48.73

C:\Users\dancol\edev\trunk.nox\src
.\bench-emacs
16:39:50.91
16:39:50.96

C:\Users\dancol\edev\trunk.nox\src
.\bench-emacs
16:39:51.32
16:39:51.37
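
The cmd script above times one launch per invocation. A minimal C sketch of the same measurement, repeated inside one process so that cold and warm runs are easy to compare, might look like the following. `time_command` is a hypothetical helper, not a tool used in this thread, and it assumes a POSIX-like environment (Cygwin, for example) that provides `clock_gettime` and `system`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Run `command` through the shell `runs` times and store the wall-clock
   duration of each run, in seconds, in out[0..runs-1].
   Returns 0 on success, -1 if the clock or the shell could not be used.
   (Hypothetical helper, for illustration only.) */
int
time_command (const char *command, int runs, double *out)
{
  for (int i = 0; i < runs; i++)
    {
      struct timespec start, end;

      if (clock_gettime (CLOCK_MONOTONIC, &start) != 0)
        return -1;
      if (system (command) == -1)
        return -1;
      if (clock_gettime (CLOCK_MONOTONIC, &end) != 0)
        return -1;

      out[i] = (double) (end.tv_sec - start.tv_sec)
               + (double) (end.tv_nsec - start.tv_nsec) / 1e9;
    }
  return 0;
}
```

Calling `time_command ("./emacs --batch -Q --eval '(kill-emacs)'", 3, t)` and printing `t[0]`, `t[1]`, `t[2]` would show whether the second and third launches are faster than the first. Because `system` starts a shell for each run, the numbers include shell startup, so they are only useful for comparing runs against each other.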

I came up with a simple test case that reproduces in cmd the behavior I see when
I run Emacs from bash. I've reproduced the program below. Here, I've compiled
a.exe with -DSLOW:

C:\Users\dancol\edev\trunk.nox\src
type .\bench-emacs2.cmd
@echo off
%TMP%\a.exe emacs.exe
echo %TIME%
.\emacs --batch -Q --eval "(kill-emacs)"
echo %TIME%

C:\Users\dancol\edev\trunk.nox\src
.\bench-emacs2
Success
16:41:55.12
16:41:57.24

C:\Users\dancol\edev\trunk.nox\src
.\bench-emacs2
Success
16:41:57.62
16:41:59.69

C:\Users\dancol\edev\trunk.nox\src
.\bench-emacs2
Success
16:42:00.05
16:42:02.20

Here's the program that generates a.exe:

#define UNICODE 1
#define _UNICODE 1
#include <windows.h>
#include <stdio.h>

int
main(int argc, char* argv[])
{
    HANDLE file;
    HANDLE section;
    LARGE_INTEGER size;
    BYTE Buffer[64*1024];
    DWORD BytesRead;

    if (argc < 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }

    /* Open the target for reading, sharing it with everyone. */
    file = CreateFileA(argv[1],
                       SYNCHRONIZE | GENERIC_READ,
                       FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                       NULL,
                       OPEN_EXISTING,
                       FILE_ATTRIBUTE_NORMAL,
                       NULL);

    if (file == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "CreateFile: 0x%lx\n", GetLastError());
        return 1;
    }

    if (!GetFileSizeEx(file, &size)) {
        fprintf(stderr, "GetFileSizeEx: 0x%lx\n", GetLastError());
        return 1;
    }

    /* We only care about the first 64 KiB of the file. */
    if (size.QuadPart > 64*1024) {
        size.LowPart = 64*1024;
    }

#if defined FAST
    /* Ordinary buffered read of the first 64 KiB. */
    if (!ReadFile(file, Buffer, sizeof (Buffer), &BytesRead, NULL)) {
        fprintf(stderr, "ReadFile: 0x%lx\n", GetLastError());
        return 1;
    }

    printf("Read %lu bytes\n", BytesRead);
#elif defined SLOW
    /* Create a section object for the file, but never map a view of it. */
    section = CreateFileMapping(file, NULL, PAGE_READONLY, 0, 64*1024, NULL);
    if (!section) {
        fprintf(stderr, "CreateFileMapping: 0x%lx\n", GetLastError());
        return 1;
    }
#else
#error Define FAST or SLOW
#endif

    printf("Success\n");
    return 0;
}

As you can see, a.exe merely creates a section object for emacs.exe; it doesn't
even map it into memory. Still, after running a.exe on emacs.exe, the system
reloads all emacs.exe's code pages the next time we run emacs.exe.

If we build a.exe with -DFAST instead of -DSLOW, then a.exe grabs the first 64k
of emacs.exe using ordinary, buffered ReadFile instead of trying to create a
section object. When compiled this way, a.exe seems to have no effect on Emacs
startup time:

C:\Users\dancol\edev\trunk.nox\src
.\bench-emacs
16:48:38.25
16:48:40.54

C:\Users\dancol\edev\trunk.nox\src
.\bench-emacs
16:48:42.03
16:48:42.08

C:\Users\dancol\edev\trunk.nox\src
.\bench-emacs
16:48:42.38
16:48:42.43

a.exe with -DSLOW mimics what av::fixup does when trying to determine whether an
executable is a Cygwin program. If av::fixup used ordinary ReadFile instead of
memory-mapped IO, program start performance would increase drastically, at least
for my workload.
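
To make the suggestion concrete, here is a minimal sketch of inspecting an executable's header with ordinary buffered reads rather than a file mapping. It is not what av::fixup actually does: it only checks for the DOS "MZ" magic and the "PE\0\0" signature, it uses portable stdio instead of the NT calls Cygwin would use, and `looks_like_pe` is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `path` looks like a PE image, 0 if it does not, and -1 if
   the file cannot be opened.  Only the first 64 KiB are read, through
   the normal buffered read path.  The static buffer keeps the stack
   small but makes this helper non-reentrant.
   (Hypothetical helper, for illustration only.) */
int
looks_like_pe (const char *path)
{
  static unsigned char buf[64 * 1024];
  FILE *fp = fopen (path, "rb");
  size_t n;
  uint32_t pe_off;

  if (fp == NULL)
    return -1;
  n = fread (buf, 1, sizeof buf, fp);
  fclose (fp);

  /* DOS header: "MZ" magic, with e_lfanew stored at offset 0x3c. */
  if (n < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
    return 0;

  /* e_lfanew is a little-endian 32-bit offset to the PE signature. */
  pe_off = (uint32_t) buf[0x3c]
           | (uint32_t) buf[0x3d] << 8
           | (uint32_t) buf[0x3e] << 16
           | (uint32_t) buf[0x3f] << 24;

  /* The signature must lie entirely inside the bytes we read. */
  if (pe_off > n - 4)
    return 0;

  return memcmp (buf + pe_off, "PE\0\0", 4) == 0;
}
```

A real check would go on to look at the import table to decide whether the image links against the Cygwin DLL; the point here is only that the header can be examined without creating a section object.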

I'm running 2K8R2. I'm not running any AV products, disk scanners, or other
exotic pieces of software. CYGWIN=detect_bloda reports nothing.

$ uname -a
CYGWIN_NT-6.1-WOW64 xyzzy 1.7.17(0.262/5/3) 2012-10-19 14:39 i686 Cygwin
Ryan Johnson
2012-12-11 01:36:23 UTC
Post by Daniel Colascione
Emacs "make bootstrap" runs Emacs as a compiler, generating .elc files from .el
files. The build system runs Emacs once for each .el file we compile, of which
there are thousands. Now, Emacs takes about two seconds to start on my system,
so compiling thousands of files takes a while; the actual .el to .elc
compilation is nearly instantaneous.
According to xperf, Emacs spends most of its startup time re-reading emacs.exe
code pages from disk.
<snip>
Post by Daniel Colascione
We shouldn't need to read this file more than once. After the first run, the
system should be able to read emacs.exe from the standby list, not the disk.
GCC bootstrap has *exactly* the same problem! Loading the binary from
./xgcc for each line of all those configure scripts takes longer than
everything else put together; I could never figure out what was wrong,
since the stage1 and stage2 binaries are "only" about 90MB, and
stripping (down to 25MB) didn't help at all.

I never reported it because I figured there was no way cygwin could make
Windows decide what files to cache or not cache... but it would
definitely be nice to see the problem go away.
<snip>
Post by Daniel Colascione
I came up with a simple test case that reproduces in cmd the behavior I see when
I run Emacs from bash. I've reproduced the program below. Here, I've compiled
I tried to test on my machine (w7 x64), but I can't get it to compile
(GetFileSizeEx not found at link time):

i686-pc-mingw32-gcc emacs-slow.c -DSLOW -lkernel32

I realize I'm probably just a windows compilation noob, but what command
did you use when compiling?

Ryan
Daniel Colascione
2012-12-11 01:41:34 UTC
Post by Daniel Colascione
We shouldn't need to read this file more than once. After the first run, the
system should be able to read emacs.exe from the standby list, not the disk.
GCC bootstrap has *exactly* the same problem! Loading the binary from ./xgcc for
each line of all those configure scripts takes longer than everything else put
together; I could never figure out what was wrong, since the stage1 and stage2
binaries are "only" about 90MB, and stripping (down to 25MB) didn't help at all.
Right: stripping wouldn't affect the number of pages demand-faulted from the image.
I never reported it because I figured there was no way cygwin could make Windows
decide what files to cache or not cache... but it would definitely be nice to
see the problem go away.
This behavior is definitely surprising. It's sort of an evil-genie way to
provide cache coherency, I suppose.
<snip>
Post by Daniel Colascione
I came up with a simple test case that reproduces in cmd the behavior I see when
I run Emacs from bash. I've reproduced the program below. Here, I've compiled
I tried to test on my machine (w7 x64), but I can't get it to compile
i686-pc-mingw32-gcc emacs-slow.c -DSLOW -lkernel32
I just used i686-w64-mingw32-gcc. I didn't even need -lkernel32.
Ryan Johnson
2012-12-11 02:32:12 UTC
Post by Daniel Colascione
Post by Daniel Colascione
We shouldn't need to read this file more than once. After the first run, the
system should be able to read emacs.exe from the standby list, not the disk.
GCC bootstrap has *exactly* the same problem! Loading the binary from ./xgcc for
each line of all those configure scripts takes longer than everything else put
together; I could never figure out what was wrong, since the stage1 and stage2
binaries are "only" about 90MB, and stripping (down to 25MB) didn't help at all.
Right: stripping wouldn't affect the number of pages demand-faulted from the image.
I never reported it because I figured there was no way cygwin could make Windows
decide what files to cache or not cache... but it would definitely be nice to
see the problem go away.
This behavior is definitely surprising. It's sort of an evil-genie way to
provide cache coherency, I suppose.
<snip>
Post by Daniel Colascione
I came up with a simple test case that reproduces in cmd the behavior I see when
I run Emacs from bash. I've reproduced the program below. Here, I've compiled
I tried to test on my machine (w7 x64), but I can't get it to compile
i686-pc-mingw32-gcc emacs-slow.c -DSLOW -lkernel32
I just used i686-w64-mingw32-gcc. I didn't even need -lkernel32.
OK, that compiles, but I can't repro:

$ time emacs-nox --batch -Q --eval '(kill-emacs)'

real 0m0.057s
user 0m0.031s
sys 0m0.015s

$ time emacs-nox --batch -Q --eval '(kill-emacs)'

real 0m0.058s
user 0m0.031s
sys 0m0.015s

$ time ./emacs-slow $(cygpath -wa $(which emacs-nox.exe))
Success

real 0m0.111s
user 0m0.000s
sys 0m0.060s

$ time emacs-nox --batch -Q --eval '(kill-emacs)'

real 0m0.058s
user 0m0.015s
sys 0m0.030s

$ time ./emacs-fast $(cygpath -wa $(which emacs-nox.exe))
Read 65536 bytes
Success

real 0m0.148s
user 0m0.015s
sys 0m0.075s

$ time emacs-nox --batch -Q --eval '(kill-emacs)'

real 0m0.058s
user 0m0.030s
sys 0m0.015s

Ideas?
Ryan
Daniel Colascione
2012-12-11 03:51:47 UTC
Post by Ryan Johnson
Post by Daniel Colascione
I never reported it because I figured there was no way cygwin could make Windows
decide what files to cache or not cache... but it would definitely be nice to
see the problem go away.
This behavior is definitely surprising. It's sort of an evil-genie way to
provide cache coherency, I suppose.
<snip>
Post by Daniel Colascione
I came up with a simple test case that reproduces in cmd the behavior I see when
I run Emacs from bash. I've reproduced the program below. Here, I've compiled
I tried to test on my machine (w7 x64), but I can't get it to compile
i686-pc-mingw32-gcc emacs-slow.c -DSLOW -lkernel32
I just used i686-w64-mingw32-gcc. I didn't even need -lkernel32.
I can't repro it with the stock emacs-nox or emacs-X11 either, even as
administrator. But I can consistently repro it with any Emacs I build. More
bizarrely, if I take an emacs binary I've built and hardlink it into /bin, I can
repro the problem. If I _copy_ the binary instead of hardlinking it, I can't
repro, even though the copy has the same hard link count and ACL as the
hardlinked version.

The key to generating a binary that repros the problem is to unexec emacs, then
try to repro with that generated binary, not a copy of it. Maybe the process of
modifying the binary during unexec borks the system's ASLR cache; I can't
reproduce the problem with a simple testcase that copies and modifies the
binary, but my test doesn't actually change the binary --- it just writes back
to the destination binary exactly what we read from it.
Ryan Johnson
2012-12-11 14:52:57 UTC
Post by Daniel Colascione
Post by Daniel Colascione
I never reported it because I figured there was no way cygwin could make Windows
decide what files to cache or not cache... but it would definitely be nice to
see the problem go away.
This behavior is definitely surprising. It's sort of an evil-genie way to
provide cache coherency, I suppose.
<snip>
Post by Daniel Colascione
I came up with a simple test case that reproduces in cmd the behavior I see when
I run Emacs from bash. I've reproduced the program below. Here, I've compiled
I tried to test on my machine (w7 x64), but I can't get it to compile
i686-pc-mingw32-gcc emacs-slow.c -DSLOW -lkernel32
I just used i686-w64-mingw32-gcc. I didn't even need -lkernel32.
I can't repro it with the stock emacs-nox or emacs-X11 either, even as
administrator. But I can consistently repro it with any Emacs I build. More
bizarrely, if I take an emacs binary I've built and hardlink it into /bin, I can
repro the problem. If I _copy_ the binary instead of hardlinking it, I can't
repro, even though the copy has the same hard link count and ACL as the
hardlinked version.
The key to generating a binary that repros the problem is to unexec emacs, then
try to repro with that generated binary, not a copy of it. Maybe the process of
modifying the binary during unexec borks the system's ASLR cache; I can't
reproduce the problem with a simple testcase that copies and modifies the
binary, but my test doesn't actually change the binary --- it just writes back
to the destination binary exactly what we read from it.
That doesn't explain why gcc stage1/2 compilers (in ./xgcc) would have
the problem. At least, I don't think gcc self-modifies when invoked from
./configure...

But you're right. Once the compiler has been officially installed
somewhere (not necessarily /usr/bin) it runs normally. Very weird.

Ryan
Daniel Colascione
2012-12-11 16:45:29 UTC
Post by Ryan Johnson
Post by Daniel Colascione
The key to generating a binary that repros the problem is to unexec emacs, then
try to repro with that generated binary, not a copy of it. Maybe the process of
modifying the binary during unexec borks the system's ASLR cache; I can't
reproduce the problem with a simple testcase that copies and
modifies the
binary, but my test doesn't actually change the binary --- it just writes back
to the destination binary exactly what we read from it.
That doesn't explain why gcc stage1/2 compilers (in ./xgcc) would have
the problem. At least, I don't think gcc self-modifies when invoked
from ./configure...
But you're right. Once the compiler has been officially installed
somewhere (not necessarily /usr/bin) it runs normally. Very weird.
I don't have an explanation, but one workaround would be to patch the
gcc and Emacs configuration scripts to copy their bootstrap binaries
before using them. (I'd rather just have av::fixup use buffered IO,
though, since that workaround would apply generally.)
Daniel Colascione
2012-12-12 01:06:24 UTC
Post by Daniel Colascione
The key to generating a binary that repros the problem is to unexec emacs, then
try to repro with that generated binary, not a copy of it.
The real explanation is a lot simpler: the binary is sparse. When you create a
file mapping object for a sparse file, Windows discards all cached pages for
that file. It makes sense that compilers (and Emacs unexec) would create sparse
files as they seek around inside their outputs.
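
As a rough illustration of how a seek followed by a write can leave a hole, here is a small POSIX sketch. The helper names are hypothetical. The sparseness test compares allocated blocks against the logical size and assumes `st_blocks` counts 512-byte units, which is common but not guaranteed by POSIX. Whether a hole is actually created depends on the file system and, on Cygwin, on the sparse-file handling discussed in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or truncate) `path`, seek `gap` bytes past the start, and
   write a single byte there, the way a linker might when it lays out
   sections out of order.  Returns 0 on success, -1 on failure.
   (Hypothetical helper, for illustration only.) */
int
write_after_gap (const char *path, off_t gap)
{
  int fd = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
  int ok;

  if (fd < 0)
    return -1;
  ok = lseek (fd, gap, SEEK_SET) == gap && write (fd, "x", 1) == 1;
  if (close (fd) != 0)
    ok = 0;
  return ok ? 0 : -1;
}

/* Heuristic: return 1 if fewer bytes are allocated than the logical
   size, 0 if not, -1 if the file cannot be examined.  Assumes 512-byte
   st_blocks units; POSIX leaves the unit unspecified. */
int
is_sparse (const char *path)
{
  struct stat st;

  if (stat (path, &st) != 0)
    return -1;
  return (off_t) st.st_blocks * 512 < st.st_size;
}
```

After `write_after_gap ("out.bin", 8 * 1024 * 1024)`, the file's size is 8 MiB plus one byte; `is_sparse ("out.bin")` then reports whether the file system stored the gap as a hole.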
Daniel Colascione
2012-12-12 03:13:04 UTC
Post by Daniel Colascione
Post by Daniel Colascione
The key to generating a binary that repros the problem is to unexec emacs, then
try to repro with that generated binary, not a copy of it.
The real explanation is a lot simpler: the binary is sparse. When you create a
file mapping object for a sparse file, Windows discards all cached pages for
that file. It makes sense that compilers (and Emacs unexec) would create sparse
files as they seek around inside their outputs.
Anyway, the binary is sparse because our linker produces sparse files.

Would the Cygwin developers accept this patch? With it, applications would need
to explicitly use ftruncate to make files sparse. Considering the horrible and
unexpected performance implications of sparse files, I don't think generating
them automatically from a sequence of seeks and writes is the right thing to do.


Index: fhandler.cc
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/fhandler.cc,v
retrieving revision 1.429
diff -u -r1.429 fhandler.cc
--- fhandler.cc 16 Aug 2012 23:34:43 -0000 1.429
+++ fhandler.cc 12 Dec 2012 03:10:31 -0000
@@ -821,32 +821,6 @@
FILE_POSITION_INFORMATION fpi;
FILE_STANDARD_INFORMATION fsi;

- if (did_lseek ())
- {
- did_lseek (false); /* don't do it again */
-
- if (!(get_flags () & O_APPEND)
- && NT_SUCCESS (NtQueryInformationFile (get_output_handle (),
- &io, &fsi, sizeof fsi,
- FileStandardInformation))
- && NT_SUCCESS (NtQueryInformationFile (get_output_handle (),
- &io, &fpi, sizeof fpi,
- FilePositionInformation))
- && fpi.CurrentByteOffset.QuadPart
- >= fsi.EndOfFile.QuadPart + (128 * 1024)
- && (pc.fs_flags () & FILE_SUPPORTS_SPARSE_FILES))
- {
- /* If the file system supports sparse files and the application
- is writing after a long seek beyond EOF, convert the file to
- a sparse file. */
- NTSTATUS status;
- status = NtFsControlFile (get_output_handle (), NULL, NULL, NULL,
- &io, FSCTL_SET_SPARSE, NULL, 0, NULL, 0);
- debug_printf ("%p = NtFsControlFile(%S, FSCTL_SET_SPARSE)",
- status, pc.get_nt_native_path ());
- }
- }
-
if (wbinary ())
res = raw_write (ptr, len);
else
@@ -1069,10 +1043,6 @@
}
_off64_t res = fpi.CurrentByteOffset.QuadPart;

- /* When next we write(), we will check to see if *this* seek went beyond
- the end of the file and if so, potentially sparsify the file. */
- did_lseek (true);
-
/* If this was a SEEK_CUR with offset 0, we still might have
readahead that we have to take into account when calculating
the actual position for the application. */
Index: fhandler.h
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/fhandler.h,v
retrieving revision 1.475
diff -u -r1.475 fhandler.h
--- fhandler.h 13 Oct 2012 12:34:17 -0000 1.475
+++ fhandler.h 12 Dec 2012 03:10:31 -0000
@@ -138,10 +138,6 @@
unsigned wbinary : 1; /* binary write mode */
unsigned wbinset : 1; /* binary write mode explicitly set */
unsigned nohandle : 1; /* No handle associated with fhandler. */
- unsigned did_lseek : 1; /* set when lseek is called as a flag that
- _write should check if we've moved
- beyond EOF, zero filling or making
- file sparse if so. */
unsigned query_open : 3; /* open file without requesting either
read or write access */
unsigned close_on_exec : 1; /* close-on-exec */
@@ -151,7 +147,7 @@
public:
status_flags () :
rbinary (0), rbinset (0), wbinary (0), wbinset (0), nohandle (0),
- did_lseek (0), query_open (no_query), close_on_exec (0),
+ query_open (no_query), close_on_exec (0),
need_fork_fixup (0), isclosed (0)
{}
} status, open_status;
@@ -243,7 +239,6 @@
IMPLEMENT_STATUS_FLAG (bool, wbinset)
IMPLEMENT_STATUS_FLAG (bool, rbinset)
IMPLEMENT_STATUS_FLAG (bool, nohandle)
- IMPLEMENT_STATUS_FLAG (bool, did_lseek)
IMPLEMENT_STATUS_FLAG (query_state, query_open)
IMPLEMENT_STATUS_FLAG (bool, close_on_exec)
IMPLEMENT_STATUS_FLAG (bool, need_fork_fixup)
Corinna Vinschen
2012-12-12 09:32:07 UTC
Post by Daniel Colascione
Post by Daniel Colascione
Post by Daniel Colascione
The key to generating a binary that repros the problem is to unexec emacs, then
try to repro with that generated binary, not a copy of it.
The real explanation is a lot simpler: the binary is sparse. When you create a
file mapping object for a sparse file, Windows discards all cached pages for
that file. It makes sense that compilers (and Emacs unexec) would create sparse
files as they seek around inside their outputs.
Anyway, the binary is sparse because our linker produces sparse files.
Would the Cygwin developers accept this patch?
It's ok with me to remove this code, but there's a small problem. On
2012-08-17, you wrote off-list, that you're going to send the signed
copyright assignment form. I never got the ok from my manager. Did you
miss to send the CA, or did my manager miss to inform me?


Corinna

P.S: A patch also needs a ChangeLog entry...
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Ryan Johnson
2012-12-12 16:02:02 UTC
Post by Corinna Vinschen
Post by Daniel Colascione
Post by Daniel Colascione
Post by Daniel Colascione
The key to generating a binary that repros the problem is to unexec emacs, then
try to repro with that generated binary, not a copy of it.
The real explanation is a lot simpler: the binary is sparse. When you create a
file mapping object for a sparse file, Windows discards all cached pages for
that file. It makes sense that compilers (and Emacs unexec) would create sparse
files as they seek around inside their outputs.
Anyway, the binary is sparse because our linker produces sparse files.
Would the Cygwin developers accept this patch?
It's ok with me to remove this code, but there's a small problem. On
2012-08-17, you wrote off-list, that you're going to send the signed
copyright assignment form. I never got the ok from my manager. Did you
miss to send the CA, or did my manager miss to inform me?
Honest question: can the deletion of someone else's code actually be
copyrighted or claimed as IP? Or is the problem that the patch itself is
copyrighted?

I only wonder because---ignoring deletions---the patch changes precisely
one line of code, which I would have thought was small enough not to
need copyright assignment.

Ryan
Daniel Colascione
2012-12-12 21:03:00 UTC
Post by Corinna Vinschen
Post by Daniel Colascione
Post by Daniel Colascione
Post by Daniel Colascione
The key to generating a binary that repros the problem is to unexec emacs, then
try to repro with that generated binary, not a copy of it.
The real explanation is a lot simpler: the binary is sparse. When you create a
file mapping object for a sparse file, Windows discards all cached pages for
that file. It makes sense that compilers (and Emacs unexec) would create sparse
files as they seek around inside their outputs.
Anyway, the binary is sparse because our linker produces sparse files.
Would the Cygwin developers accept this patch?
It's ok with me to remove this code
On second thought, this patch probably isn't the best idea. Windows might fix
this behavior. Maybe it's better to add a per-OS-version flag.
Post by Corinna Vinschen
, but there's a small problem. On
2012-08-17, you wrote off-list, that you're going to send the signed
copyright assignment form. I never got the ok from my manager. Did you
miss to send the CA, or did my manager miss to inform me?
I did send it. Can you check with management? If it never arrived, I can send
another copy easily enough.
Corinna Vinschen
2012-12-13 08:29:42 UTC
Post by Daniel Colascione
Post by Corinna Vinschen
Post by Daniel Colascione
Post by Daniel Colascione
Post by Daniel Colascione
The key to generating a binary that repros the problem is to unexec emacs, then
try to repro with that generated binary, not a copy of it.
The real explanation is a lot simpler: the binary is sparse. When you create a
file mapping object for a sparse file, Windows discards all cached pages for
that file. It makes sense that compilers (and Emacs unexec) would create sparse
files as they seek around inside their outputs.
Anyway, the binary is sparse because our linker produces sparse files.
Would the Cygwin developers accept this patch?
It's ok with me to remove this code
On second thought, this patch probably isn't the best idea. Windows might fix
this behavior. Maybe it's better to add a per-OS-version flag.
Post by Corinna Vinschen
, but there's a small problem. On
2012-08-17, you wrote off-list, that you're going to send the signed
copyright assignment form. I never got the ok from my manager. Did you
miss to send the CA, or did my manager miss to inform me?
I did send it. Can you check with management? If it never arrived, I can send
another copy easily enough.
I'll check.


Corinna
Corinna Vinschen
2013-01-09 14:28:30 UTC
Hi Daniel,
Post by Corinna Vinschen
Post by Daniel Colascione
Post by Corinna Vinschen
Post by Daniel Colascione
Post by Daniel Colascione
Post by Daniel Colascione
The key to generating a binary that repros the problem is to unexec emacs, then
try to repro with that generated binary, not a copy of it.
The real explanation is a lot simpler: the binary is sparse. When you create a
file mapping object for a sparse file, Windows discards all cached pages for
that file. It makes sense that compilers (and Emacs unexec) would create sparse
files as they seek around inside their outputs.
Anyway, the binary is sparse because our linker produces sparse files.
Would the Cygwin developers accept this patch?
It's ok with me to remove this code
On second thought, this patch probably isn't the best idea. Windows might fix
this behavior. Maybe it's better to add a per-OS-version flag.
Post by Corinna Vinschen
, but there's a small problem. On
2012-08-17, you wrote off-list, that you're going to send the signed
copyright assignment form. I never got the ok from my manager. Did you
miss to send the CA, or did my manager miss to inform me?
I did send it. Can you check with management? If it never arrived, I can send
another copy easily enough.
I'll check.
my manager searched and asked everywhere he could think of, but he
didn't find your CA. Sorry for the hassle, but would you mind to send
it again?


Thanks,
Corinna
Corinna Vinschen
2013-11-28 16:56:15 UTC
Hi Daniel,
Post by Corinna Vinschen
Post by Corinna Vinschen
Post by Daniel Colascione
Post by Corinna Vinschen
, but there's a small problem. On
2012-08-17, you wrote off-list, that you're going to send the signed
copyright assignment form. I never got the ok from my manager. Did you
miss to send the CA, or did my manager miss to inform me?
I did send it. Can you check with management? If it never arrived, I can send
another copy easily enough.
I'll check.
my manager searched and asked everywhere he could think of, but he
didn't find your CA. Sorry for the hassle, but would you mind to send
it again?
you never followed up on this. As you might have seen, we imported the
posix_spawn implementation from newlib, using fork/execve, into Cygwin
for the time being, so, even if it's slow, we can at least provide the
API already.

Are you still interested to work on integrating posix_spawn into Cygwin
using the underlying child_info_spawn class? If so, we would be happy
to get another copyright assignment from you.


Thanks,
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Maintainer cygwin AT cygwin DOT com
Red Hat
Daniel Colascione
2013-12-03 16:17:20 UTC
Hi Corinna,
Post by Corinna Vinschen
Post by Corinna Vinschen
Post by Corinna Vinschen
Post by Daniel Colascione
Post by Corinna Vinschen
, but there's a small problem. On
2012-08-17, you wrote off-list, that you're going to send the signed
copyright assignment form. I never got the ok from my manager. Did you
miss to send the CA, or did my manager miss to inform me?
I did send it. Can you check with management? If it never arrived, I can send
another copy easily enough.
I'll check.
my manager searched and asked everywhere he could think of, but he
didn't find your CA. Sorry for the hassle, but would you mind to send
it again?
you never followed up on this. As you might have seen, we imported the
posix_spawn implementation from newlib, using fork/execve, into Cygwin
for the time being, so, even if it's slow, we can at least provide the
API already.
Are you still interested to work on integrating posix_spawn into Cygwin
using the underlying child_info_spawn class? If so, we would be happy
to get another copyright assignment from you.
I am still interested; I may have some time in the near future to do the
work. In the meantime, I'll try to send another copyright assignment form.
Corinna Vinschen
2013-12-03 16:35:04 UTC
Post by Daniel Colascione
Hi Corinna,
Post by Corinna Vinschen
Post by Corinna Vinschen
Post by Corinna Vinschen
Post by Daniel Colascione
Post by Corinna Vinschen
, but there's a small problem. On
2012-08-17, you wrote off-list, that you're going to send the signed
copyright assignment form. I never got the ok from my manager. Did you
miss to send the CA, or did my manager miss to inform me?
I did send it. Can you check with management? If it never arrived, I can send
another copy easily enough.
I'll check.
my manager searched and asked everywhere he could think of, but he
didn't find your CA. Sorry for the hassle, but would you mind to send
it again?
you never followed up on this. As you might have seen, we imported the
posix_spawn implementation from newlib, using fork/execve, into Cygwin
for the time being, so, even if it's slow, we can at least provide the
API already.
Are you still interested to work on integrating posix_spawn into Cygwin
using the underlying child_info_spawn class? If so, we would be happy
to get another copyright assignment from you.
I am still interested; I may have some time in the near future to do
the work. In the meantime, I'll try to send another copyright
assignment form.
Good, we're looking forward to it. I'll inform my manager to keep an
eye on your CA, this time :}


Thanks,
Corinna
Eric Blake
2012-12-12 13:11:28 UTC
Post by Daniel Colascione
Post by Daniel Colascione
Post by Daniel Colascione
The key to generating a binary that repros the problem is to unexec emacs, then
try to repro with that generated binary, not a copy of it.
The real explanation is a lot simpler: the binary is sparse. When you create a
file mapping object for a sparse file, Windows discards all cached pages for
that file. It makes sense that compilers (and Emacs unexec) would create sparse
files as they seek around inside their outputs.
Anyway, the binary is sparse because our linker produces sparse files.
Would the Cygwin developers accept this patch? With it, applications would need
to explicitly use ftruncate to make files sparse.
Eww. That would be a regression for coreutils, and a waste of disk
space for files where sparse is a benefit.
Post by Daniel Colascione
Considering the horrible and
unexpected performance implications of sparse files, I don't think generating
them automatically from a sequence of seeks and writes is the right thing to do.
Why can't we instead use posix_fallocate() as a means of identifying a
file that must not be sparse, and then just patch the compiler to use
posix_fallocate() to never generate a sparse executable (but let all
other sparse files continue to behave as normal)?
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Corinna Vinschen
2012-12-12 13:22:37 UTC
Post by Daniel Colascione
Anyway, the binary is sparse because our linker produces sparse files.
Would the Cygwin developers accept this patch? With it, applications would need
to explicitly use ftruncate to make files sparse.
Eww. That would be a regression for coreutils, [...]
Really? How so?
Post by Daniel Colascione
Considering the horrible and
unexpected performance implications of sparse files, I don't think generating
them automatically from a sequence of seeks and writes is the right thing to do.
Why can't we instead use posix_fallocate() as a means of identifying a
file that must not be sparse, and then just patch the compiler to use
posix_fallocate() to never generate a sparse executable (but let all
other sparse files continue to behave as normal)?
posix_fallocate is not allowed to generate sparse files, due to the
following restriction:

"If posix_fallocate() returns successfully, subsequent writes to the
specified file data shall not fail due to the lack of free space on
the file system storage media."

See
http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_fallocate.html

Therefore only ftruncate and lseek potentially generate sparse files.

On second thought, I don't quite understand what you mean by "use
posix_fallocate() as a means of identifying a file that must not be
sparse". Can you explain, please?


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Eric Blake
2012-12-12 14:04:23 UTC
Permalink
Post by Corinna Vinschen
Post by Daniel Colascione
Anyway, the binary is sparse because our linker produces sparse files.
Would the Cygwin developers accept this patch? With it, applications would need
to explicitly use ftruncate to make files sparse.
Eww. That would be a regression for coreutils, [...]
Really? How so?
When using 'cp --sparse=always', coreutils relies on lseek() to create
sparse files. Removing this code from cygwin would mean that coreutils
now has to be rewritten to explicitly ftruncate() instead of lseek() for
creating sparse files.
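
For readers unfamiliar with the pattern being discussed, here is a minimal sketch of how a seek followed by a write leaves a hole that a filesystem may store sparsely. The function name write_with_hole is made up for illustration; it is not taken from coreutils, and whether the gap actually becomes a sparse region depends on the filesystem and, on Cygwin, on the size heuristic discussed in this thread.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes of `data`, skip `hole` bytes with
 * lseek(), then write the same data again. Returns the resulting file
 * size, or -1 on error. */
off_t write_with_hole(const char *path, const void *data, size_t len, off_t hole)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    struct stat st;
    off_t size = -1;

    if (write(fd, data, len) == (ssize_t)len &&
        lseek(fd, hole, SEEK_CUR) != (off_t)-1 &&  /* skip: no bytes are written here */
        write(fd, data, len) == (ssize_t)len &&
        fstat(fd, &st) == 0)
        size = st.st_size;

    close(fd);
    return size;
}
```

The logical size always includes the skipped range; comparing st_size with st_blocks (or running du) is one way to see whether the filesystem really left the gap unallocated.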
Post by Corinna Vinschen
Post by Daniel Colascione
Considering the horrible and
unexpected performance implications of sparse files, I don't think generating
them automatically from a sequence of seeks and writes is the right thing to do.
Why can't we instead use posix_fallocate() as a means of identifying a
file that must not be sparse, and then just patch the compiler to use
posix_fallocate() to never generate a sparse executable (but let all
other sparse files continue to behave as normal)?
posix_fallocate is not allowed to generate sparse files, due to the
"If posix_fallocate() returns successfully, subsequent writes to the
specified file data shall not fail due to the lack of free space on
the file system storage media."
See
http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_fallocate.html
Therefore only ftruncate and lseek potentially generate sparse files.
On second thought, I don't quite understand what you mean by "use
posix_fallocate() as a means of identifying a file that must not be
sparse". Can you explain, please?
Since we know that an executable must NOT be sparse in order to make it
more efficient with the Windows loader, then gcc should use
posix_fallocate() to guarantee that the file is NOT sparse, even if it
happens to issue a sequence of lseek() that would default to making it
sparse without the fallocate.

In other words, I'm proposing that we delete nothing from cygwin1.dll,
and instead fix the problem apps (gcc, emacs unexec) that actually
create executables, so that the files they create are non-sparse because
we have proven that they should not be sparse for performance reasons.
Meanwhile, all non-executable files (such as virtual machine disk
images, which are typically much bigger than executables, and where
being sparse really does matter) do not have to jump through extra hoops
of using ftruncate() when plain lseek() would do to keep them sparse.

Oh, and while I'm thinking about it, it would be nice to copy Linux'
fallocate(FALLOC_FL_PUNCH_HOLE) for punching holes into already-existing
files, rather than only being able to create holes by sequentially
building a file with each new hole possible only as the file size is
extended.
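
As a rough sketch of the Linux call mentioned above (Linux-specific, not available on Cygwin at the time of this thread), punching a hole into an existing file might look like this. The helper name punch_hole is hypothetical.

```c
#define _GNU_SOURCE            /* fallocate() and FALLOC_FL_* are Linux extensions */
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: deallocate `len` bytes starting at `offset` in an
 * existing file without changing its size. Returns 0 on success, or an
 * errno value (for example EOPNOTSUPP when the filesystem cannot punch
 * holes). */
int punch_hole(const char *path, off_t offset, off_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return errno;

    int err = 0;
    /* FALLOC_FL_PUNCH_HOLE must be combined with FALLOC_FL_KEEP_SIZE. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, offset, len) != 0)
        err = errno;

    close(fd);
    return err;
}
```

After a successful call, reads from the punched range return zeros and the file size is unchanged.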
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Corinna Vinschen
2012-12-12 14:31:01 UTC
Permalink
Post by Eric Blake
Post by Corinna Vinschen
Eww. That would be a regression for coreutils, [...]
Really? How so?
When using 'cp --sparse=always', coreutils relies on lseek() to create
sparse files. Removing this code from cygwin would mean that coreutils
now has to be rewritten to explicitly ftruncate() instead of lseek() for
creating sparse files.
On Cygwin only or on Linux as well?
Post by Eric Blake
Post by Corinna Vinschen
Why can't we instead use posix_fallocate() as a means of identifying a
file that must not be sparse, and then just patch the compiler to use
posix_fallocate() to never generate a sparse executable (but let all
other sparse files continue to behave as normal)?
posix_fallocate is not allowed to generate sparse files, due to the
"If posix_fallocate() returns successfully, subsequent writes to the
specified file data shall not fail due to the lack of free space on
the file system storage media."
See
http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_fallocate.html
Therefore only ftruncate and lseek potentially generate sparse files.
On second thought, I don't quite understand what you mean by "use
posix_fallocate() as a means of identifying a file that must not be
sparse". Can you explain, please?
Since we know that an executable must NOT be sparse in order to make it
more efficient with the Windows loader, then gcc should use
posix_fallocate() to guarantee that the file is NOT sparse, even if it
happens to issue a sequence of lseek() that would default to making it
sparse without the fallocate.
In other words, I'm proposing that we delete nothing from cygwin1.dll,
and instead fix the problem apps (gcc, emacs unexec) that actually
create executables, so that the files they create are non-sparse because
we have proven that they should not be sparse for performance reasons.
Meanwhile, all non-executable files (such as virtual machine disk
images, which are typically much bigger than executables, and where
being sparse really does matter) do not have to jump through extra hoops
of using ftruncate() when plain lseek() would do to keep them sparse.
Couldn't Devil's advocate also argue that coreutils are wrong?
Post by Eric Blake
Oh, and while I'm thinking about it, it would be nice to copy Linux'
fallocate(FALLOC_FL_PUNCH_HOLE) for punching holes into already-existing
files, rather than only being able to create holes by sequentially
building a file with each new hole possible only as the file size is
extended.
Hmm, that might be possible by utilising the FSCTL_SET_SPARSE and
FSCTL_SET_ZERO_DATA DeviceIoControl codes. However, we don't export
fallocate at all right now. This is a clear case of PHC(*)


Corinna

(*) Patches happily considered.
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Eric Blake
2012-12-12 15:16:53 UTC
Permalink
Post by Corinna Vinschen
Post by Eric Blake
Post by Corinna Vinschen
Eww. That would be a regression for coreutils, [...]
Really? How so?
When using 'cp --sparse=always', coreutils relies on lseek() to create
sparse files. Removing this code from cygwin would mean that coreutils
now has to be rewritten to explicitly ftruncate() instead of lseek() for
creating sparse files.
On Cygwin only or on Linux as well?
On cygwin only.
Post by Corinna Vinschen
Post by Eric Blake
of using ftruncate() when plain lseek() would do to keep them sparse.
Couldn't Devil's advocate also argue that coreutils are wrong?
If ftruncate() is the only way on cygwin to make a sparse file, I
suppose coreutils could adapt.
Post by Corinna Vinschen
Post by Eric Blake
Oh, and while I'm thinking about it, it would be nice to copy Linux'
fallocate(FALLOC_FL_PUNCH_HOLE) for punching holes into already-existing
files, rather than only being able to create holes by sequentially
building a file with each new hole possible only as the file size is
extended.
Hmm, that might be possible by utilising the FSCTL_SET_SPARSE and
FSCTL_SET_ZERO_DATA DeviceIoControl codes. However, we don't export
fallocate at all right now. This is a clear case of PHC(*)
Corinna
(*) Patches happily considered.
Yep, I thought as much on this one :)
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Corinna Vinschen
2012-12-12 15:37:17 UTC
Permalink
Post by Eric Blake
Post by Corinna Vinschen
Post by Eric Blake
Post by Corinna Vinschen
Eww. That would be a regression for coreutils, [...]
Really? How so?
When using 'cp --sparse=always', coreutils relies on lseek() to create
sparse files. Removing this code from cygwin would mean that coreutils
now has to be rewritten to explicitly ftruncate() instead of lseek() for
creating sparse files.
On Cygwin only or on Linux as well?
On cygwin only.
Oh, erm... I asked the wrong question, apparently. What I meant to ask
was if cp uses lseek-only on Cygwin only. But, no, it uses lseek in the
general case and expects the attempt to sparsify the file to be honored.
Ok.
Post by Eric Blake
Post by Corinna Vinschen
Post by Eric Blake
of using ftruncate() when plain lseek() would do to keep them sparse.
Couldn't Devil's advocate also argue that coreutils are wrong?
If ftruncate() is the only way on cygwin to make a sparse file, I
suppose coreutils could adapt.
I guess so, but I was more interested to try to do the right thing in
Cygwin in the first place. Given cp's behaviour, and after reading up on
the issue (again), it seems that Cygwin is basically doing the right
thing. The crude check for a minimum 128K hole was already a trade off
between getting automatic sparse files, as on ext FS, and Windows' NTFS
driver slowness in handling sparse files.

FTR: Given this slowness, there might be some value in having a mount
option to suppress sparse file generation.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Ryan Johnson
2012-12-12 15:39:53 UTC
Permalink
Post by Eric Blake
Post by Corinna Vinschen
Post by Eric Blake
Post by Daniel Colascione
Considering the horrible and
unexpected performance implications of sparse files, I don't think generating
them automatically from a sequence of seeks and writes is the right thing to do.
Why can't we instead use posix_fallocate() as a means of identifying a
file that must not be sparse, and then just patch the compiler to use
posix_fallocate() to never generate a sparse executable (but let all
other sparse files continue to behave as normal)?
posix_fallocate is not allowed to generate sparse files, due to the
following restriction: "If posix_fallocate() returns successfully,
subsequent writes to the specified file data shall not fail due to
the lack of free space on the file system storage media." See
http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_fallocate.html
Therefore only ftruncate and lseek potentially generate sparse files.
On second thought, I don't quite understand what you mean by "use
posix_fallocate() as a means of identifying a file that must not be
sparse". Can you explain, please?
Since we know that an executable must NOT be sparse in order to make it
more efficient with the Windows loader, then gcc should use
posix_fallocate() to guarantee that the file is NOT sparse, even if it
happens to issue a sequence of lseek() that would default to making it
sparse without the fallocate.
In other words, I'm proposing that we delete nothing from cygwin1.dll,
and instead fix the problem apps (gcc, emacs unexec) that actually
create executables, so that the files they create are non-sparse because
we have proven that they should not be sparse for performance reasons.
Meanwhile, all non-executable files (such as virtual machine disk
images, which are typically much bigger than executables, and where
being sparse really does matter) do not have to jump through extra hoops
of using ftruncate() when plain lseek() would do to keep them sparse.
Does gcc/ld/whatever know the final file size before the first write?

You have to posix_fallocate the entire file before any write that might
create a hole, because the sparse flag poisons the loader, and persists
even if all gaps are later filled. For example, if I invoke the
following commands:

cp --sparse=always $(which emacs-nox) sparse
cp --sparse=never $(which emacs-nox) dense
for f in sparse dense; do echo $f; time ./$f -Q --batch --eval '(kill-emacs)'; done
cp --sparse=never dense sparse
for f in sparse dense; do echo $f; time ./$f -Q --batch --eval '(kill-emacs)'; done
du dense sparse
sparse
real 0m1.791s
dense
real 0m0.606s
sparse
real 0m3.158s
dense
real 0m0.081s
16728 dense
16768 sparse
Given that we're talking about cygwin-specific patches for emacs and
binutils anyway, would it be better to add a cygwin-specific fcntl call
that clears the file's sparse flag?

Ryan
Corinna Vinschen
2012-12-12 15:56:07 UTC
Permalink
Post by Ryan Johnson
Post by Eric Blake
In other words, I'm proposing that we delete nothing from cygwin1.dll,
and instead fix the problem apps (gcc, emacs unexec) that actually
create executables, so that the files they create are non-sparse because
we have proven that they should not be sparse for performance reasons.
Meanwhile, all non-executable files (such as virtual machine disk
images, which are typically much bigger than executables, and where
being sparse really does matter) do not have to jump through extra hoops
of using ftruncate() when plain lseek() would do to keep them sparse.
Does gcc/ld/whatever know the final file size before the first write?
You have to posix_fallocate the entire file before any write that
might create a hole, because the sparse flag poisons the loader, and
persists even if all gaps are later filled. For example, if I invoke
cp --sparse=always $(which emacs-nox) sparse
cp --sparse=never $(which emacs-nox) dense
for f in sparse dense; do echo $f; time ./$f -Q --batch --eval
'(kill-emacs)'; done
cp --sparse=never dense sparse
for f in sparse dense; do echo $f; time ./$f -Q --batch --eval
'(kill-emacs)'; done
du dense sparse
Post by Eric Blake
sparse
real 0m1.791s
dense
real 0m0.606s
sparse
real 0m3.158s
dense
real 0m0.081s
16728 dense
16768 sparse
Given that we're talking about cygwin-specific patches for emacs and
binutils anyway, would it be better to add a cygwin-specific fcntl
call that clears the file's sparse flag?
This is not supported in pre-Vista OSes.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Ryan Johnson
2012-12-12 16:02:28 UTC
Permalink
Post by Ryan Johnson
Post by Eric Blake
Post by Corinna Vinschen
Post by Eric Blake
Post by Daniel Colascione
Considering the horrible and
unexpected performance implications of sparse files, I don't think generating
them automatically from a sequence of seeks and writes is the right thing to do.
Why can't we instead use posix_fallocate() as a means of identifying a
file that must not be sparse, and then just patch the compiler to use
posix_fallocate() to never generate a sparse executable (but let all
other sparse files continue to behave as normal)?
posix_fallocate is not allowed to generate sparse files, due to the
following restriction: "If posix_fallocate() returns successfully,
subsequent writes to the specified file data shall not fail due to
the lack of free space on the file system storage media." See
http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_fallocate.html
Therefore only ftruncate and lseek potentially generate sparse
files. On second thought, I don't quite understand what you mean by
"use posix_fallocate() as a means of identifying a file that must
not be sparse". Can you explain, please?
Since we know that an executable must NOT be sparse in order to make it
more efficient with the Windows loader, then gcc should use
posix_fallocate() to guarantee that the file is NOT sparse, even if it
happens to issue a sequence of lseek() that would default to making it
sparse without the fallocate.
In other words, I'm proposing that we delete nothing from cygwin1.dll,
and instead fix the problem apps (gcc, emacs unexec) that actually
create executables, so that the files they create are non-sparse because
we have proven that they should not be sparse for performance reasons.
Meanwhile, all non-executable files (such as virtual machine disk
images, which are typically much bigger than executables, and where
being sparse really does matter) do not have to jump through extra hoops
of using ftruncate() when plain lseek() would do to keep them sparse.
Does gcc/ld/whatever know the final file size before the first write?
You have to posix_fallocate the entire file before any write that
might create a hole, because the sparse flag poisons the loader, and
persists even if all gaps are later filled.
Heh... hit send too soon.

Alternatively, a quick experiment verifies that calling pwrite instead
of lseek+write bypasses the sparse-ifying "optimization." If emacs and
binutils always seek before writing, it might be as simple as patching
them to use pwrite instead. That would even improve performance on other
platforms with pwrite, by cutting the syscall count in half.

A quick scan of binutils sources suggests that all*** file writes go
through libiberty/simple-object.c:simple_object_internal_write, which
indeed uses an lseek+write pair. As long as gcc uses libiberty to write
out executables as well, it should pick up the fix automatically.

The emacs (24.0.96) unexecw.c copies the entire executable file over
using read/write pairs (= not sparse), and then patches the output using
seek/write pairs. Again, an easy conversion to use pwrite instead.

*** I'm actually not sure what gold does, but we don't care because it
doesn't target cygwin anyway.

Ryan
Eric Blake
2012-12-12 16:47:23 UTC
Permalink
Post by Ryan Johnson
Does gcc/ld/whatever know the final file size before the first write?
No, but does it need to? posix_fallocate() does not change file
contents; it merely says that anywhere there was previously a hole must
now be guaranteed to be backed by disk. So gcc would write the file as
usual, and then just before close()ing the fd, do a final
posix_fallocate(fd, 0, len) with len determined by the final file size.
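
A minimal sketch of that final step, assuming the writer still holds the open descriptor. The helper name allocate_whole_file is hypothetical, and whether this is enough to undo the NTFS sparse attribute on Cygwin is exactly what is being debated in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask the filesystem to back every byte of an
 * already-written file with real storage. Returns 0 on success or an
 * error number, following the posix_fallocate() convention (that call
 * returns the error instead of setting errno). */
int allocate_whole_file(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return errno;
    if (st.st_size == 0)
        return 0;            /* posix_fallocate() rejects a zero length */
    return posix_fallocate(fd, 0, st.st_size);
}
```

It would be called once, after the last write and just before close(fd).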
Post by Ryan Johnson
You have to posix_fallocate the entire file before any write that might
create a hole, because the sparse flag poisons the loader,
Is there really a flag stuck into the file when it becomes sparse?
Post by Ryan Johnson
and persists
even if all gaps are later filled. For example, if I invoke the
cp --sparse=always $(which emacs-nox) sparse
cp --sparse=never $(which emacs-nox) dense
for f in sparse dense; do echo $f; time ./$f -Q --batch --eval
'(kill-emacs)'; done
cp --sparse=never dense sparse
for f in sparse dense; do echo $f; time ./$f -Q --batch --eval
'(kill-emacs)'; done
du dense sparse
This doesn't point to a flag in the file, so much as cached information
(the file system is remembering that 'sparse' used to be sparse, even if
it is no longer sparse). But your point about a file being cached at
some point while it is sparse, even if it is later made non-sparse, is
interesting.
Post by Ryan Johnson
Post by Daniel Colascione
sparse
real 0m1.791s
dense
real 0m0.606s
sparse
real 0m3.158s
dense
real 0m0.081s
16728 dense
16768 sparse
Given that we're talking about cygwin-specific patches for emacs and
binutils anyway, would it be better to add a cygwin-specific fcntl call
that clears the file's sparse flag?
What flag is there to clear? Your cp demonstration showed that even
when we do a byte-for-byte copy of every byte (and the file is
non-sparse), the file system cache remembers that it used to be sparse.
How do we defeat that file system cache?
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Ryan Johnson
2012-12-12 16:57:09 UTC
Permalink
Post by Eric Blake
Post by Ryan Johnson
Does gcc/ld/whatever know the final file size before the first write?
No, but does it need to? posix_fallocate() does not change file
contents; it merely says that anywhere there was previously a hole must
now be guaranteed to be backed by disk. So gcc would write the file as
usual, and then just before close()ing the fd, do a final
posix_fallocate(fd, 0, len) with len determined by the final file size.
Post by Ryan Johnson
You have to posix_fallocate the entire file before any write that might
create a hole, because the sparse flag poisons the loader,
Is there really a flag stuck into the file when it becomes sparse?
Post by Ryan Johnson
and persists
even if all gaps are later filled. For example, if I invoke the
cp --sparse=always $(which emacs-nox) sparse
cp --sparse=never $(which emacs-nox) dense
for f in sparse dense; do echo $f; time ./$f -Q --batch --eval
'(kill-emacs)'; done
cp --sparse=never dense sparse
for f in sparse dense; do echo $f; time ./$f -Q --batch --eval
'(kill-emacs)'; done
du dense sparse
This doesn't point to a flag in the file, so much as cached information
(the file system is remembering that 'sparse' used to be sparse, even if
it is no longer sparse). But your point about a file being cached at
some point while it is sparse, even if it is later made non-sparse, is
interesting.
Post by Ryan Johnson
Post by Daniel Colascione
sparse
real 0m1.791s
dense
real 0m0.606s
sparse
real 0m3.158s
dense
real 0m0.081s
16728 dense
16768 sparse
Given that we're talking about cygwin-specific patches for emacs and
binutils anyway, would it be better to add a cygwin-specific fcntl call
that clears the file's sparse flag?
What flag is there to clear? Your cp demonstration showed that even
when we do a byte-for-byte copy of every byte (and the file is
non-sparse), the file system cache remembers that it used to be sparse.
How do we defeat that file system cache?
See [1] and §2.3.59 of [2]. There is a metadata flag associated with the
file itself, independent of the file's contents. You can even truncate a
sparse file to zero bytes, then grow it back out with dd, and it will
remain "sparse."

[1] http://msdn.microsoft.com/en-us/library/aa365564.aspx
[2]
http://download.microsoft.com/download/9/5/E/95EF66AF-9026-4BB0-A41D-A4F81802D92C/%5BMS-FSCC%5D.pdf

Ryan
Corinna Vinschen
2012-12-12 16:58:27 UTC
Permalink
Post by Eric Blake
Post by Ryan Johnson
Does gcc/ld/whatever know the final file size before the first write?
No, but does it need to? posix_fallocate() does not change file
contents; it merely says that anywhere there was previously a hole must
now be guaranteed to be backed by disk. So gcc would write the file as
usual, and then just before close()ing the fd, do a final
posix_fallocate(fd, 0, len) with len determined by the final file size.
Post by Ryan Johnson
You have to posix_fallocate the entire file before any write that might
create a hole, because the sparse flag poisons the loader,
Is there really a flag stuck into the file when it becomes sparse?
Yes. And, as I wrote, you can't remove it pre-Vista.
Post by Eric Blake
Post by Ryan Johnson
cp --sparse=never $(which emacs-nox) dense
for f in sparse dense; do echo $f; time ./$f -Q --batch --eval
'(kill-emacs)'; done
cp --sparse=never dense sparse
for f in sparse dense; do echo $f; time ./$f -Q --batch --eval
'(kill-emacs)'; done
du dense sparse
This doesn't point to a flag in the file, so much as cached information
(the file system is remembering that 'sparse' used to be sparse, even if
it is no longer sparse). But your point about a file being cached at
some point while it is sparse, even if it is later made non-sparse, is
interesting.
Post by Ryan Johnson
Post by Daniel Colascione
sparse
real 0m1.791s
dense
real 0m0.606s
sparse
real 0m3.158s
dense
real 0m0.081s
16728 dense
16768 sparse
Given that we're talking about cygwin-specific patches for emacs and
binutils anyway, would it be better to add a cygwin-specific fcntl call
that clears the file's sparse flag?
What flag is there to clear? Your cp demonstration showed that even
when we do a byte-for-byte copy of every byte (and the file is
non-sparse), the file system cache remembers that it used to be sparse.
How do we defeat that file system cache?
Another question is, is that behaviour reproducible? Does it happen the
second time the "new" non-sparse sparse file is called? You don't even
know if the slowness is because the write of the file is still in flight.
Windows caching can be pretty slow at times, but it recovers quickly
if a file is used again, usually.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Ryan Johnson
2012-12-12 17:06:08 UTC
Permalink
Post by Corinna Vinschen
Post by Eric Blake
Post by Ryan Johnson
Does gcc/ld/whatever know the final file size before the first write?
No, but does it need to? posix_fallocate() does not change file
contents; it merely says that anywhere there was previously a hole must
now be guaranteed to be backed by disk. So gcc would write the file as
usual, and then just before close()ing the fd, do a final
posix_fallocate(fd, 0, len) with len determined by the final file size.
Post by Ryan Johnson
You have to posix_fallocate the entire file before any write that might
create a hole, because the sparse flag poisons the loader,
Is there really a flag stuck into the file when it becomes sparse?
Yes. And, as I wrote, you can't remove it pre-Vista.
Post by Eric Blake
Post by Ryan Johnson
cp --sparse=never $(which emacs-nox) dense
for f in sparse dense; do echo $f; time ./$f -Q --batch --eval
'(kill-emacs)'; done
cp --sparse=never dense sparse
for f in sparse dense; do echo $f; time ./$f -Q --batch --eval
'(kill-emacs)'; done
du dense sparse
This doesn't point to a flag in the file, so much as cached information
(the file system is remembering that 'sparse' used to be sparse, even if
it is no longer sparse). But your point about a file being cached at
some point while it is sparse, even if it is later made non-sparse, is
interesting.
Post by Ryan Johnson
Post by Daniel Colascione
sparse
real 0m1.791s
dense
real 0m0.606s
sparse
real 0m3.158s
dense
real 0m0.081s
16728 dense
16768 sparse
Given that we're talking about cygwin-specific patches for emacs and
binutils anyway, would it be better to add a cygwin-specific fcntl call
that clears the file's sparse flag?
What flag is there to clear? Your cp demonstration showed that even
when we do a byte-for-byte copy of every byte (and the file is
non-sparse), the file system cache remembers that it used to be sparse.
How do we defeat that file system cache?
Another question is, is that behaviour reproducible? Does it happen the
second time the "new" non-sparse sparse file is called? You don't even
know if the slowness is because the write of the file is still in flight.
Windows caching can be pretty slow at times, but it recovers quickly
if a file is used again, usually.
It's painfully reproducible. It takes nearly two hours for a gcc
bootstrap compiler to configure the various bits of the next stage. It's
the same for emacs unexec (as OP reported).

I've seen how slow the cache is: it can take up to a minute before du
reports the actual number of pages in a freshly-copied sparse file. I
thought cp --sparse=always had a bug at first...

Even after du stabilizes, though, the slow loading persists
indefinitely. It doesn't matter how many times or how recently the
binary was last executed, you'll still pay the full cost to pull it off
disk again, easily confirmed with Resource Monitor (the same file being
read by umpteen different processes simultaneously).

$ for i in $(seq 20); do time ./sparse -Q --batch --eval '(kill-emacs)';
done 2>&1 | grep real | awk '{print $2}'
0m1.714s
0m1.548s
0m1.588s
0m1.570s
0m1.528s
0m1.563s
0m1.512s
0m1.676s
0m1.638s
0m1.663s
0m1.533s
0m1.567s
0m1.466s
0m1.669s
0m1.575s
0m1.489s
0m1.658s
0m1.497s
0m1.515s
0m1.541s
Ryan
Christopher Faylor
2012-12-12 17:11:14 UTC
Permalink
Post by Ryan Johnson
It's painfully reproducible. It takes nearly two hours for a gcc
bootstrap compiler to configure the various bits of the next stage. It's
the same for emacs unexec (as OP reported).
I'd like to see a small controlled test case which demonstrates the
problem. If the claims here are all true then it should be very easy to
demonstrate without resorting to bootstrapping the compiler.

And, given my comment about setup.exe, I suspect that this isn't going
to be as alarming an issue for the normal Cygwin user as it is for
people who, e.g., rebuild their own compilers. I'll bet setup.exe
doesn't know anything about sparse files so all of the executables
and dll should not be sparse.

Can anyone confirm or deny that?

cgf
Corinna Vinschen
2012-12-12 17:15:02 UTC
Permalink
Post by Christopher Faylor
Post by Ryan Johnson
It's painfully reproducible. It takes nearly two hours for a gcc
bootstrap compiler to configure the various bits of the next stage. It's
the same for emacs unexec (as OP reported).
I'd like to see a small controlled test case which demonstrates the
problem. If the claims here are all true then it should be very easy to
demonstrate without resorting to bootstrapping the compiler.
And, given my comment about setup.exe, I suspect that this isn't going
to be as alarming an issue for the normal Cygwin user as it is for
people who, e.g., rebuild their own compilers.
Which would speak for adding a "nosparse" mount option.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Corinna Vinschen
2012-12-12 17:27:33 UTC
Permalink
Post by Corinna Vinschen
Post by Christopher Faylor
Post by Ryan Johnson
It's painfully reproducible. It takes nearly two hours for a gcc
bootstrap compiler to configure the various bits of the next stage. It's
the same for emacs unexec (as OP reported).
I'd like to see a small controlled test case which demonstrates the
problem. If the claims here are all true then it should be very easy to
demonstrate without resorting to bootstrapping the compiler.
And, given my comment about setup.exe, I suspect that this isn't going
to be as alarming an issue for the normal Cygwin user as it is for
people who, e.g., rebuild their own compilers.
Which would speak for adding a "nosparse" mount option.
...or not. The problem is not to add the mount option, but to make sure
people know that option and use it when "cygwin is slow". This rather
speaks for making nosparse the default and "sparse" the option, I guess.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Christopher Faylor
2012-12-12 19:19:08 UTC
Permalink
Post by Corinna Vinschen
Post by Corinna Vinschen
Post by Christopher Faylor
Post by Ryan Johnson
It's painfully reproducible. It takes nearly two hours for a gcc
bootstrap compiler to configure the various bits of the next stage. It's
the same for emacs unexec (as OP reported).
I'd like to see a small controlled test case which demonstrates the
problem. If the claims here are all true then it should be very easy to
demonstrate without resorting to bootstrapping the compiler.
And, given my comment about setup.exe, I suspect that this isn't going
to be as alarming an issue for the normal Cygwin user as it is for
people who, e.g., rebuild their own compilers.
Which would speak for adding a "nosparse" mount option.
...or not. The problem is not to add the mount option, but to make sure
people know that option and use it when "cygwin is slow". This rather
speaks for making nosparse the default and "sparse" the option, I guess.
Actually, if the assertions here are correct then wouldn't mounting /bin
as "cygexec" also make things better?

(Yes, I know that this screws up mingw binaries like strace and cygcheck)

cgf
Corinna Vinschen
2012-12-12 19:35:56 UTC
Permalink
Post by Christopher Faylor
Post by Corinna Vinschen
Post by Corinna Vinschen
Post by Christopher Faylor
Post by Ryan Johnson
It's painfully reproducible. It takes nearly two hours for a gcc
bootstrap compiler to configure the various bits of the next stage. It's
the same for emacs unexec (as OP reported).
I'd like to see a small controlled test case which demonstrates the
problem. If the claims here are all true then it should be very easy to
demonstrate without resorting to bootstrapping the compiler.
And, given my comment about setup.exe, I suspect that this isn't going
to be as alarming an issue for the normal Cygwin user as it is for
people who, e.g., rebuild their own compilers.
Which would speak for adding a "nosparse" mount option.
...or not. The problem is not to add the mount option, but to make sure
people know that option and use it when "cygwin is slow". This rather
speaks for making nosparse the default and "sparse" the option, I guess.
Actually, if the assertions here are correct then wouldn't mounting /bin
as "cygexec" also make things better?
No. /bin isn't the problem. Running self-built executables is. If you
have a build system which runs self-built executables as part of the
build process (xgcc), you'd end up with a very slow build, apparently.

Idle musing: It would be interesting to know if building and testing a
gcc toolchain would run much faster if we disable the automatic
sparse-file creation in lseek/write.


Corinna
Christopher Faylor
2012-12-12 19:41:04 UTC
Post by Corinna Vinschen
Post by Christopher Faylor
Post by Corinna Vinschen
Post by Corinna Vinschen
Post by Christopher Faylor
Post by Ryan Johnson
It's painfully reproducible. It takes nearly two hours for a gcc
bootstrap compiler to configure the various bits of the next stage. It's
the same for emacs unexec (as OP reported).
I'd like to see a small controlled test case which demonstrates the
problem. If the claims here are all true then it should be very easy to
demonstrate without resorting to bootstrapping the compiler.
And, given my comment about setup.exe, I suspect that this isn't going
to be as alarming an issue for the normal Cygwin user as it is for
people who, e.g., rebuild their own compilers.
Which would speak for adding a "nosparse" mount option.
...or not. The problem is not to add the mount option, but to make sure
people know that option and use it when "cygwin is slow". This rather
speaks for making nosparse the default and "sparse" the option, I guess.
Actually, if the assertions here are correct then wouldn't mounting /bin
as "cygexec" also make things better?
No. /bin isn't the problem. Running self-built executables is. If you
have a build system which runs self-built executables as part of the
build process (xgcc), you'd end up with a very slow build, apparently.
The operative part of the above wasn't "/bin", it was mounting with the
"cygexec" option. That should bypass the mmap in av::fixup. I was
under the impression that was what screwed things up since it somehow
flushed the cache.

cgf
Ryan Johnson
2012-12-12 21:16:07 UTC
Post by Corinna Vinschen
Idle musing: It would be interesting to know if building and testing a
gcc toolchain would run much faster if we disable the automatic
sparse-file creation in lseek/write.
gcc-4.7.1

After de-sparsing all executables generated by `make stage1-all-gcc',
the subsequent invocation of `make -j2 stage1-all' generates 475 .o
files in two minutes on my laptop.
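
The message does not show how the executables were de-sparsed. A minimal sketch of one way to do it, reusing the cp/mv recipe suggested elsewhere in this thread, might look like the following. The function name desparse_tree is hypothetical, and the '*.exe' pattern is an assumption about what the build produces.

```shell
# Rewrite every *.exe below a directory as a fresh, non-sparse file.
# The copy is written under a new name and then renamed over the
# original, because overwriting an existing file in place does not
# clear the sparse flag.
desparse_tree() {
    find "${1:-.}" -type f -name '*.exe' | while IFS= read -r exe; do
        cp --sparse=never -p -- "$exe" "$exe.tmp" &&
            mv -f -- "$exe.tmp" "$exe"
    done
}
```

Calling `desparse_tree gcc` would then process every .exe below a (hypothetical) gcc build directory. File names containing newlines are not handled by this sketch.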

Assuming sparse-file penalties of 2s per invocation, that's 16
cpu-minutes saved, or 8.9x speedup for that part of the build.

Building stage2-all-gcc (which includes several ./configure script runs
using the now de-sparsed stage1 gcc) generates near-zero disk activity
and appears to run as fast as the stage1 configure did. There are eight
configure scripts to run, and assuming each has 50 tests that invoke gcc
means another 13 cpu-minutes saved.

Do that two more times, and we've saved 97 cpu-minutes to finish bootstrap.

I've never run the test suite, but it has what, 50k c and c++ files in
it? That would mean 28 cpu-hours saved.
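
Most of these estimates are plain multiplications. A small sketch, assuming the 2-second penalty per invocation used above (the helper name penalty_seconds is made up for illustration):

```shell
# Seconds lost to the sparse-file penalty for a given number of
# invocations, assuming a fixed per-invocation penalty (default 2 s).
penalty_seconds() {
    echo $(( $1 * ${2:-2} ))
}

penalty_seconds 475            # 475 compiler runs: prints 950 (about 16 minutes)
penalty_seconds $(( 8 * 50 ))  # 8 configure scripts x 50 tests: prints 800 (about 13 minutes)
penalty_seconds 50000          # ~50k test-suite files: prints 100000 (about 28 hours)
```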

Ryan
Ryan Johnson
2012-12-12 18:42:43 UTC
Post by Christopher Faylor
Post by Ryan Johnson
It's painfully reproducible. It takes nearly two hours for a gcc
bootstrap compiler to configure the various bits of the next stage. It's
the same for emacs unexec (as OP reported).
I'd like to see a small controlled test case which demonstrates the
problem. If the claims here are all true then it should be very easy to
demonstrate without resorting to bootstrapping the compiler.
# fast: ~60ms (fresh, non-sparse copy)
cp --sparse=never $(which emacs-nox) foo
for i in $(seq 10); do time ./foo -Q --batch --eval '(kill-emacs)'; done

# slow: ~800ms (overwriting with --sparse=always marks foo as sparse)
cp --sparse=always $(which emacs-nox) foo
for i in $(seq 10); do time ./foo -Q --batch --eval '(kill-emacs)'; done

# even slower: ~3.5s (overwriting in place keeps the sparse flag but
# fills the holes, so size on disk doubled)
cp --sparse=never $(which emacs-nox) foo
for i in $(seq 10); do time ./foo -Q --batch --eval '(kill-emacs)'; done

# fast again: ~60ms (removing foo first yields a new, non-sparse file)
rm foo
cp --sparse=never $(which emacs-nox) foo
for i in $(seq 10); do time ./foo -Q --batch --eval '(kill-emacs)'; done
Post by Christopher Faylor
And, given my comment about setup.exe, I suspect that this isn't going
to be as alarming an issue for the normal Cygwin user as it is for
people who, e.g., rebuild their own compilers. I'll bet setup.exe
doesn't know anything about sparse files so all of the executables
and dll should not be sparse.
Can anyone confirm or deny that?
Confirmed. The sparse flag is lost during normal copies because cygwin
has to ask for it specifically during an lseek or ftruncate past end.
Clobbering an existing file does not clear the flag, however.

Further, gcc doesn't seem to produce executables with holes in them, it
just writes some of the sections out of order. So even if you did make
setup.exe sparse file aware, it still shouldn't matter except for files
like emacs that do horrible things to themselves (cue post-install script).
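
To make the "seek past end, then write" pattern concrete, here is a rough sketch using dd and GNU stat. The function names make_holey and alloc_report are hypothetical. Whether the resulting file really ends up sparse depends on the file system and, under Cygwin, on whether the sparse flag gets set, so the reported numbers are not guaranteed to differ.

```shell
# Write 4 bytes at the start and 4 bytes at offset 1 MiB. Seeking past
# the end of the file before writing leaves a hole in between, which is
# the lseek/write pattern discussed in this thread.
make_holey() {
    printf 'head' > "$1"
    printf 'tail' | dd of="$1" bs=1 seek=1048576 conv=notrunc 2>/dev/null
}

# Print the apparent size and the allocated size in bytes (GNU stat:
# %s = size, %b = allocated blocks, %B = bytes per block).
alloc_report() (
    set -- "$1" $(stat -c '%s %b %B' "$1")
    echo "$1: apparent=$2 allocated=$(( $3 * $4 ))"
)
```

Running `make_holey foo; alloc_report foo` shows an apparent size of 1048580 bytes; a much smaller allocated size suggests the hole was not written out. Note that an NTFS file can carry the sparse flag even when all of its holes have been filled, so equal numbers do not prove the flag is clear. On Windows, `fsutil sparse queryflag foo` reports the flag directly.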

Ryan
Daniel Colascione
2012-12-12 21:04:57 UTC
Post by Eric Blake
Post by Corinna Vinschen
Post by Daniel Colascione
Anyway, the binary is sparse because our linker produces sparse files.
Would the Cygwin developers accept this patch? With it, applications would need
to explicitly use ftruncate to make files sparse.
Eww. That would be a regression for coreutils, [...]
Really? How so?
When using 'cp --sparse=always', coreutils relies on lseek() to create
sparse files. Removing this code from cygwin would mean that coreutils
now has to be rewritten to explicitly ftruncate() instead of lseek() for
creating sparse files.
Post by Corinna Vinschen
Post by Daniel Colascione
Considering the horrible and
unexpected performance implications of sparse files, I don't think generating
them automatically from a sequence of seeks and writes is the right thing to do.
Why can't we instead use posix_fallocate() as a means of identifying a
file that must not be sparse, and then just patch the compiler to use
posix_fallocate() to never generate a sparse executable (but let all
other sparse files continue to behave as normal)?
posix_fallocate is not allowed to generate sparse files, due to the
"If posix_fallocate() returns successfully, subsequent writes to the
specified file data shall not fail due to the lack of free space on
the file system storage media."
See
http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_fallocate.html
Therefore only ftruncate and lseek potentially generate sparse files.
On second thought, I don't quite understand what you mean by "use
posix_fallocate() as a means of identifying a file that must not be
sparse". Can you explain, please?
Since we know that an executable must NOT be sparse in order to make it
more efficient with the Windows loader, then gcc should use
posix_fallocate() to guarantee that the file is NOT sparse, even if it
happens to issue a sequence of lseek() that would default to making it
sparse without the fallocate.
In other words, I'm proposing that we delete nothing from cygwin1.dll,
and instead fix the problem apps (gcc, emacs unexec)
I've already committed to Emacs trunk a fix for the unexec issue. It's probably
not necessary to backport this fix to the 24.3 release: presumably, people who
build Cygwin emacs from source will build the trunk.
Christopher Faylor
2012-12-12 17:03:53 UTC
Post by Daniel Colascione
Post by Daniel Colascione
Post by Daniel Colascione
The key to generating a binary that repros the problem is to unexec emacs, then
try to repro with that generated binary, not a copy of it.
The real explanation is a lot simpler: the binary is sparse. When you create a
file mapping object for a sparse file, Windows discards all cached pages for
that file. It makes sense that compilers (and Emacs unexec) would create sparse
files as they seek around inside their outputs.
Anyway, the binary is sparse because our linker produces sparse files.
Would the Cygwin developers accept this patch? With it, applications would need
to explicitly use ftruncate to make files sparse. Considering the horrible and
unexpected performance implications of sparse files, I don't think generating
them automatically from a sequence of seeks and writes is the right thing to do.
I don't know if this was already done (don't see it in a quick glance at
the archives) but, if this is just a simple case of executable files
being sparse, it seems like an obvious optimization would be to just
do, e.g.,

cp --sparse=never -p foo.exe foo.exe.tmp
mv foo.exe.tmp foo.exe

Wouldn't that remove the sparseness and wouldn't you see astounding
performance improvements as a result?

I don't think we should be considering ripping code out of Cygwin
without some actual data to back up claims. Testing something like the
above should make it easier to justify.

I'm actually rather surprised that setup.exe's tar code would maintain an
executable's sparseness.

cgf
Ryan Johnson
2012-12-12 17:11:46 UTC
Post by Christopher Faylor
Post by Daniel Colascione
Post by Daniel Colascione
Post by Daniel Colascione
The key to generating a binary that repros the problem is to unexec emacs, then
try to repro with that generated binary, not a copy of it.
The real explanation is a lot simpler: the binary is sparse. When you create a
file mapping object for a sparse file, Windows discards all cached pages for
that file. It makes sense that compilers (and Emacs unexec) would create sparse
files as they seek around inside their outputs.
Anyway, the binary is sparse because our linker produces sparse files.
Would the Cygwin developers accept this patch? With it, applications would need
to explicitly use ftruncate to make files sparse. Considering the horrible and
unexpected performance implications of sparse files, I don't think generating
them automatically from a sequence of seeks and writes is the right thing to do.
I don't know if this was already done (don't see it in a quick glance at
the archives) but, if this is just a simple case of executable files
being sparse, it seems like an obvious optimization would be to just
do, e.g.,
cp --sparse=never -p foo.exe foo.exe.tmp
mv foo.exe.tmp foo.exe
Wouldn't that remove the sparseness and wouldn't you see astounding
performance improvements as a result?
Nope. You'd have to rm foo.exe first.

Doing so fixes the problem nicely, though, as you suggest.
Post by Christopher Faylor
I don't think we should be considering ripping code out of Cygwin
without some actual data to back up claims. Testing something like the
above should make it easier to justify.
I'm actually rather surprised that setup.exe's tar code would maintain an
executable's sparseness.
Setup is fine. It's home-brew stuff that suffers, unless/until invoking
`make install' copies the sparse file to its final destination, losing
the sparse property along the way.

Personally, I'm still in shock that the loader barfs so badly over
sparse files... normal reads via mmap and fread use the fs cache just fine.

Ryan
Christopher Faylor
2012-12-12 19:21:39 UTC
Post by Ryan Johnson
Post by Christopher Faylor
Post by Daniel Colascione
Post by Daniel Colascione
Post by Daniel Colascione
The key to generating a binary that repros the problem is to unexec emacs, then
try to repro with that generated binary, not a copy of it.
The real explanation is a lot simpler: the binary is sparse. When you create a
file mapping object for a sparse file, Windows discards all cached pages for
that file. It makes sense that compilers (and Emacs unexec) would create sparse
files as they seek around inside their outputs.
Anyway, the binary is sparse because our linker produces sparse files.
Would the Cygwin developers accept this patch? With it, applications would need
to explicitly use ftruncate to make files sparse. Considering the horrible and
unexpected performance implications of sparse files, I don't think generating
them automatically from a sequence of seeks and writes is the right thing to do.
I don't know if this was already done (don't see it in a quick glance at
the archives) but, if this is just a simple case of executable files
being sparse, it seems like an obvious optimization would be to just
do, e.g.,
cp --sparse=never -p foo.exe foo.exe.tmp
mv foo.exe.tmp foo.exe
Wouldn't that remove the sparseness and wouldn't you see astounding
performance improvements as a result?
Nope. You'd have to rm foo.exe first.
"mv" does that automatically, doesn't it?
Post by Ryan Johnson
Doing so fixes the problem nicely, though, as you suggest.
Post by Christopher Faylor
I don't think we should be considering ripping code out of Cygwin
without some actual data to back up claims. Testing something like the
above should make it easier to justify.
I'm actually rather surprised that setup.exe's tar code would maintain an
executable's sparseness.
Setup is fine. It's home-brew stuff that suffers, unless/until invoking
`make install' copies the sparse file to its final destination, losing
the sparse property along the way.
Personally, I'm still in shock that the loader barfs so badly over
sparse files... normal reads via mmap and fread use the fs cache just fine.
If we're talking about the loader then why are the examples I asked for
using "cp"? That's not really apples-to-apples.

cgf
Ryan Johnson
2012-12-12 20:09:49 UTC
Post by Christopher Faylor
Post by Ryan Johnson
Post by Christopher Faylor
Post by Daniel Colascione
Post by Daniel Colascione
Post by Daniel Colascione
The key to generating a binary that repros the problem is to unexec emacs, then
try to repro with that generated binary, not a copy of it.
The real explanation is a lot simpler: the binary is sparse. When you create a
file mapping object for a sparse file, Windows discards all cached pages for
that file. It makes sense that compilers (and Emacs unexec) would create sparse
files as they seek around inside their outputs.
Anyway, the binary is sparse because our linker produces sparse files.
Would the Cygwin developers accept this patch? With it, applications would need
to explicitly use ftruncate to make files sparse. Considering the horrible and
unexpected performance implications of sparse files, I don't think generating
them automatically from a sequence of seeks and writes is the right thing to do.
I don't know if this was already done (don't see it in a quick glance at
the archives) but, if this is just a simple case of executable files
being sparse, it seems like an obvious optimization would be to just
do, e.g.,
cp --sparse=never -p foo.exe foo.exe.tmp
mv foo.exe.tmp foo.exe
Wouldn't that remove the sparseness and wouldn't you see astounding
performance improvements as a result?
Nope. You'd have to rm foo.exe first.
"mv" does that automatically, doesn't it?
Oops. Thinko. I had "cp" in my head.
Post by Christopher Faylor
Post by Ryan Johnson
Doing so fixes the problem nicely, though, as you suggest.
Post by Christopher Faylor
I don't think we should be considering ripping code out of Cygwin
without some actual data to back up claims. Testing something like the
above should make it easier to justify.
I'm actually rather surprised that setup.exe's tar code would maintain an
executable's sparseness.
Setup is fine. It's home-brew stuff that suffers, unless/until invoking
`make install' copies the sparse file to its final destination, losing
the sparse property along the way.
Personally, I'm still in shock that the loader barfs so badly over
sparse files... normal reads via mmap and fread use the fs cache just fine.
If we're talking about the loader then why are the examples I asked for
using "cp"? That's not really apples-to-apples.
I think I'm miscommunicating something here... let me try again.

Problem:

Attempts to execute a file with the NTFS "sparse" attribute force the
Windows loader to bypass the fs cache and fetch the file from disk. Long
start times result, and are a pain if the executable is invoked
frequently. The problem gets worse if you fill in all the holes, because
it's still "sparse" but now has more bytes on disk to fetch. This is
arguably a bug in Windows.

Steps to repro:

1. Arrange for the creation of a sparse file
2. Execute said file, and enjoy the delay while the loader bypasses the
file cache to get the data directly from disk

"cp" with its "--sparse" option is merely an easy way to accomplish step
#1. Step #2 is where all the fun happens.

STC attached. Output on my machine is below.

Workaround: copy the file to strip the flag, disable the sparse file
optimization in lseek(), or replace lseek/write pairs that write out
executable files with calls to pwrite (which apparently lacks the
optimization).


$ ./slow.sh 2>&1 | grep '\(real\|sparse\)'
non-sparse original runs quickly
real 0m0.078s
sparse copy can be read quickly from fs cache
real 0m0.036s
sparse copy slow-to-run in spite of being cached
real 0m2.969s
sparse copy no longer fully cached
real 0m0.911s
filling all holes makes sparse penalty worse
real 0m5.289s

Ryan
Daniel Colascione
2012-12-12 21:20:57 UTC
Post by Ryan Johnson
Post by Christopher Faylor
Post by Ryan Johnson
Post by Christopher Faylor
Post by Daniel Colascione
Post by Daniel Colascione
Post by Daniel Colascione
The key to generating a binary that repros the problem is to unexec emacs, then
try to repro with that generated binary, not a copy of it.
The real explanation is a lot simpler: the binary is sparse. When you create a
file mapping object for a sparse file, Windows discards all cached pages for
that file. It makes sense that compilers (and Emacs unexec) would create sparse
files as they seek around inside their outputs.
Anyway, the binary is sparse because our linker produces sparse files.
Would the Cygwin developers accept this patch? With it, applications would need
to explicitly use ftruncate to make files sparse. Considering the horrible and
unexpected performance implications of sparse files, I don't think generating
them automatically from a sequence of seeks and writes is the right thing to do.
I don't know if this was already done (don't see it in a quick glance at
the archives) but, if this is just a simple case of executable files
being sparse, it seems like an obvious optimization would be to just
do, e.g.,
cp --sparse=never -p foo.exe foo.exe.tmp
mv foo.exe.tmp foo.exe
Wouldn't that remove the sparseness and wouldn't you see astounding
performance improvements as a result?
Nope. You'd have to rm foo.exe first.
"mv" does that automatically, doesn't it?
Oops. Thinko. I had "cp" in my head.
Post by Christopher Faylor
Post by Ryan Johnson
Doing so fixes the problem nicely, though, as you suggest.
Post by Christopher Faylor
I don't think we should be considering ripping code out of Cygwin
without some actual data to back up claims. Testing something like the
above should make it easier to justify.
I'm actually rather surprised that setup.exe's tar code would maintain an
executable's sparseness.
Setup is fine. It's home-brew stuff that suffers, unless/until invoking
`make install' copies the sparse file to its final destination, losing
the sparse property along the way.
Personally, I'm still in shock that the loader barfs so badly over
sparse files... normal reads via mmap and fread use the fs cache just fine.
If we're talking about the loader then why are the examples I asked for
using "cp"? That's not really apples-to-apples.
I think I'm miscommunicating something here... let me try again.
Attempts to execute a file with the NTFS "sparse" attribute force the Windows
loader to bypass the fs cache and fetch the file from disk.
Almost. What actually happens is that the sparse attribute forces the system to
flush the cache _for the whole file_ when av::fixup mmaps the binary, forcing
the subsequent execution of the image we mmapped to reload everything from disk.
Cygexec will work around the problem by avoiding the av::fixup mmap, but we
shouldn't expect users to mount everything, including various build directories,
cygexec. (I also haven't checked whether anything in /bin actually ends up
sparse. It'd be interesting to see.)

We could also work around the problem by making av::fixup use regular buffered
reads instead of mmap. Regular buffered reads don't seem to trigger the cache flush.

Still, I think that creating sparse files only on ftruncate is the right thing
to do, at least for existing OS versions. Basically, almost nobody _uses_ sparse
files in Windows (except us), so the paths that deal with them are optimized for
simplicity and correctness, not performance.
Corinna Vinschen
2012-12-13 09:09:42 UTC
Post by Daniel Colascione
Post by Ryan Johnson
I think I'm miscommunicating something here... let me try again.
Attempts to execute a file with the NTFS "sparse" attribute force the Windows
loader to bypass the fs cache and fetch the file from disk.
Almost. What actually happens is that the sparse attribute forces the system to
flush the cache _for the whole file_ when av::fixup mmaps the binary, forcing
the subsequent execution of the image we mmapped to reload everything from disk.
Cygexec will work around the problem by avoiding the av::fixup mmap, but we
shouldn't expect users to mount everything, including various build directories,
cygexec. (I also haven't checked whether anything in /bin actually ends up
sparse. It'd be interesting to see.)
cygexec is not such a good solution in the long run. For 64 bit Cygwin
we *have* to know if the process to execute is a 32 or 64 bit process
so we can't quite avoid the hook_or_detect_cygwin test.
Post by Daniel Colascione
We could also work around the problem by making av::fixup use regular buffered
reads instead of mmap. Regular buffered reads don't seem to trigger the cache flush.
hook_or_detect_cygwin needs a second mapping if the file is big enough,
so rewriting this code to use ReadFile calls isn't that simple. The
question here is this. Assuming we change MapViewOfFile to
VirtualAlloc/ReadFile, will that make things slower? If so, it's not
worth it. We would slow down the general case for the border case.

Of course we could use VA/RF only on sparse files since we know if an
executable is sparse, thanks to the FILE_ATTRIBUTE_SPARSE_FILE attribute.
Post by Daniel Colascione
Still, I think that creating sparse files only on ftruncate is the right thing
to do, at least for existing OS versions. Basically, almost nobody _uses_ sparse
files in Windows (except us), so the paths that deal with them are optimized for
simplicity and correctness, not performance.
It is kind of annoying that sparse files are something special in
Windows, rather than the normal case.

After a night's sleep I tend to agree with you, though. I found an old
discussion in the cygwin archives from 2003 which also handled sparse
file slowness. It seems this Windows code hasn't changed a lot since
then.

Cygwin is used a lot for development, and it is famed for its slowness.
If we can notably speed up normal operation by disabling the automatic
sparse handling in lseek/write, I think we should do it. For those
requiring sparseness on NTFS, we can add a "sparse" mount option and
run the code in lseek/write only if that mount flag is set. And
for symmetry we should probably do the same in ftruncate.


Corinna
Daniel Colascione
2012-12-13 09:44:34 UTC
Post by Corinna Vinschen
Post by Daniel Colascione
(I also haven't checked whether anything in /bin actually ends up
sparse. It'd be interesting to see.)
I checked. Nothing in my /bin was sparse.
Post by Corinna Vinschen
cygexec is not such a good solution in the long run. For 64 bit Cygwin
we *have* to know if the process to execute is a 32 or 64 bit process
so we can't quite avoid the hook_or_detect_cygwin test.
Today, I run with /bin mounted cygexec. Will I not be able to do
that once I have mixed 32- and 64-bit Cygwin binaries in PATH?
Post by Corinna Vinschen
Post by Daniel Colascione
We could also work around the problem by making av::fixup use regular buffered
reads instead of mmap. Regular buffered reads don't seem to trigger the cache flush.
hook_or_detect_cygwin needs a second mapping if the file is big enough,
Right --- it walks various tables.
Post by Corinna Vinschen
so rewriting this code to use ReadFile calls isn't that simple. The
question here is this. Assuming we change MapViewOfFile to
VirtualAlloc
Why VirtualAlloc? For smaller allocations, alloca should suffice,
and for pathologically large ones, we could just use the normal
process heap.
Post by Corinna Vinschen
/ReadFile, will that make things slower?
Not having benchmarked a ReadFile version of av::fixup, it's hard to
make a definitive statement. Still, the ReadFile version of my test
program had no effect on overall execution time. Using ReadFile
instead of a section view shouldn't affect the total number of IOs
either (assuming readahead is disabled by FILE_FLAG_RANDOM_ACCESS).
The performance penalty would be an extra memcpy, which, for the IO
sizes we're talking about, is noise.

This approach still makes me nervous. I'd be worried that 1) using
ReadFile wouldn't actually eliminate the redundant loads in some
cases, and 2) there might be other hidden performance problems with
sparse files. Examples of the latter problem might be cache flushes
caused by programs that mmap sparse data files and antivirus
programs that mmap all executables prior to execution in order to do
some kind of scanning.
Post by Corinna Vinschen
Of course we could use VA/RF only on sparse files since we know if an
executable is sparse, thanks to the FILE_ATTRIBUTE_SPARSE_FILE attribute.
Good point.
Post by Corinna Vinschen
Post by Daniel Colascione
Still, I think that creating sparse files only on ftruncate is the right thing
to do, at least for existing OS versions. Basically, almost nobody _uses_ sparse
files in Windows (except us), so the paths that deal with them are optimized for
simplicity and correctness, not performance.
It is kind of annoying that sparse files are something special in
Windows, rather than the normal case.
After a night's sleep I tend to agree with you, though. I found an old
discussion in the cygwin archives from 2003 which also handled sparse
file slowness. It seems this Windows code hasn't changed a lot since
then.
I saw that thread as well, but I didn't see any mention of sparse
file caching behavior. The thread didn't seem all that productive
(or pleasant).
Post by Corinna Vinschen
Cygwin is used a lot for development, and it is famed for its slowness.
If we can notably speed up normal operation by disabling the automatic
sparse handling in lseek/write, I think we should do it. For those
requiring sparseness on NTFS, we can add a "sparse" mount option and
run the code in lseek/write only if that mount flag is set. And
for symmetry we should probably do the same in ftruncate.
What about using the automatic sparse handling in lseek/write and
ftruncate only when the file being operated on is already sparse?
Once a file is marked sparse, we've already taken the performance
hit, so we might as well punch holes where appropriate.

As you mentioned above, we can figure that out very cheaply. Tools
that depend on being able to create sparse files (e.g. cp in some
modes) could be patched to turn on the sparse flag as needed without
having to rewrite all the primary lseek/ftruncate logic.
Corinna Vinschen
2012-12-13 10:39:41 UTC
Post by Daniel Colascione
Post by Corinna Vinschen
cygexec is not such a good solution in the long run. For 64 bit Cygwin
we *have* to know if the process to execute is a 32 or 64 bit process
so we can't quite avoid the hook_or_detect_cygwin test.
Today, I run with /bin mounted cygexec. Will I not be able to do
that once I have mixed 32- and 64-bit Cygwin binaries in PATH?
Depends on the way "cygexec" is implemented. At execve time we cannot
pass over internal data (cygheap in the first place) the same way as
today if the parent is 32 and the child is 64 bit or vice versa. In
these cases we need to use another technique, still to be developed.
Whatever this method, obviously this requires us to know the target CPU
of an executable at execve time. This requires us *at least* to call
GetBinaryType, which, I suppose, is nothing else but a CreateFile,
ReadFile, Close with a check for the machine type in the IMAGE_NT_HEADERS
file header... which is pretty much exactly what av::fixup does to
figure out the executable type. See the extended version in the 64 bit
branch.
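
The kind of header check described above can be illustrated outside Cygwin. The sketch below is not av::fixup and not GetBinaryType; it is a stand-alone shell function (pe_machine is a made-up name) that reads a few header bytes with ordinary buffered reads and reports the machine type. The offsets and values come from the PE/COFF format: the 4-byte little-endian e_lfanew field at offset 60 points at the "PE\0\0" signature, and the 2-byte Machine field follows it.

```shell
# Print the machine type of a PE executable: i386, x86_64,
# unknown(0x....), or not-pe. Only a handful of bytes are read.
pe_machine() (
    f=$1
    not_pe() { echo not-pe; exit 0; }

    # DOS header starts with "MZ".
    [ "$(head -c 2 "$f")" = "MZ" ] || not_pe

    # e_lfanew: read the 4 bytes one by one so the result does not
    # depend on the byte order of the machine running this script.
    set -- $(od -An -tu1 -j60 -N4 "$f" 2>/dev/null)
    [ $# -eq 4 ] || not_pe
    off=$(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))

    # PE signature: 'P' 'E' 0 0.
    sig=$(od -An -tx1 -j"$off" -N4 "$f" 2>/dev/null | tr -d ' \n')
    [ "$sig" = "50450000" ] || not_pe

    # Machine field of the COFF file header, 2 bytes little-endian.
    set -- $(od -An -tu1 -j$(( off + 4 )) -N2 "$f" 2>/dev/null)
    [ $# -eq 2 ] || not_pe
    machine=$(( $1 + ($2 << 8) ))

    case $machine in
        332)   echo i386 ;;    # 0x014c
        34404) echo x86_64 ;;  # 0x8664
        *)     printf 'unknown(0x%04x)\n' "$machine" ;;
    esac
)
```

For a 32-bit executable, `pe_machine ./foo.exe` would print i386. A real implementation would also have to validate the header sizes and cope with truncated or malicious files more carefully than this sketch does.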
Post by Daniel Colascione
Post by Corinna Vinschen
so rewriting this code to use ReadFile calls isn't that simple. The
question here is this. Assuming we change MapViewOfFile to
VirtualAlloc
Why VirtualAlloc? For smaller allocations, alloca should suffice,
and for pathologically large ones, we could just use the normal
process heap.
alloca is kind of dangerous since we have a lot of stack pressure.
We could use tmp_pathbuf for the first 64K and malloc for the buffer
in hook_or_detect_cygwin.
Post by Daniel Colascione
Post by Corinna Vinschen
/ReadFile, will that make things slower?
Not having benchmarked a ReadFile version of av::fixup, it's hard to
make a definitive statement. Still, the ReadFile version of my test
program had no effect on overall execution time. Using ReadFile
instead of a section view shouldn't affect the total number of IOs
either (assuming readahead is disabled by FILE_FLAG_RANDOM_ACCESS).
The performance penalty would be an extra memcpy, which, for the IO
sizes we're talking about, is noise.
This approach still makes me nervous. I'd be worried that 1) using
ReadFile wouldn't actually eliminate the redundant loads in some
cases, and 2) there might be other hidden performance problems with
sparse files. Examples of the latter problem might be cache flushes
caused by programs that mmap sparse data files and antivirus
programs that mmap all executables prior to execution in order to do
some kind of scanning.
Valid point. One has to wonder why sparse files were implemented at
all, if nobody seems to care for them, of course...

(Same with transactional NTFS, btw. It's the only way to implement
really POSIX compliant unlink() or rename(), but its implementation
is slow, and it's supposed to die again in the near future. Oh well)
Post by Daniel Colascione
Post by Corinna Vinschen
After a night's sleep I tend to agree with you, though. I found an old
discussion in the cygwin archives from 2003 which also handled sparse
file slowness. It seems this Windows code hasn't changed a lot since
then.
I saw that thread as well, but I didn't see any mention of sparse
file caching behavior. The thread didn't seem all that productive
(or pleasant).
Well, it led to the 128K test. Memory was still sparse in 2003 ;)
Post by Daniel Colascione
Post by Corinna Vinschen
Cygwin is used a lot for development, and it is famed for its slowness.
If we can notably speed up normal operation by disabling the automatic
sparse handling in lseek/write, I think we should do it. For those
requiring sparseness on NTFS, we can add a "sparse" mount option and
run the code in lseek/write only if that mount flag is set. And
for symmetry we should probably do the same in ftruncate.
What about using the automatic sparse handling in lseek/write and
ftruncate only when the file being operated on is already sparse?
That doesn't make sense. If the file is already sparse, there's no
reason to set the sparse flag in write or ftruncate again. Also, if you
set the sparse flag only on already sparse files, you will never be able
to create sparse files.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Václav Zeman
2012-12-13 11:05:26 UTC
Post by Corinna Vinschen
Valid point. One has to wonder why sparse files were implemented at
all, if nobody seems to care for them, of course...
The NTFS journal is a sparse file/stream ([1]). So this is one place
where they do care about it. Also, I would guess that feature parity
with other common file systems is another reason why.

[1] http://technet.microsoft.com/en-us/library/cc976808.aspx
--
VZ
Corinna Vinschen
2012-12-13 14:30:36 UTC
Post by Corinna Vinschen
Post by Daniel Colascione
Post by Corinna Vinschen
Cygwin is used a lot for development, and it is famed for its slowness.
If we can notably speed up normal operation by disabling the automatic
sparse handling in lseek/write, I think we should do it. For those
requiring sparseness on NTFS, we can add a "sparse" mount option and
run the code in lseek/write only if that mount flag is set. And
for symmetry we should probably do the same in ftruncate.
What about using the automatic sparse handling in lseek/write and
ftruncate only when the file being operated on is already sparse?
That doesn't make sense. If the file is already sparse, there's no
reason to set the sparse flag in write or ftruncate again. Also, if you
set the sparse flag only on already sparse files, you will never be able
to create sparse files.
Below is my cut on the issue. It introduces a "sparse" mount option and
sparsifies a file only if this mount option is set. I also slightly
improved the code in fhandler_base::write and ftruncate so that it will
not try to set the sparse flag on already sparse files.

The code compiles, so it's basically correct. It's just not tested. ;)
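
The conditions the patch spreads across lseek and write can be read as one predicate. The following is only a sketch of that logic, not code from Cygwin: the struct, its field names, and the function name are hypothetical stand-ins for the mount flag, the file system flag, the cached file attribute, and the open flags that the real code consults.

```cpp
#include <cstdint>

// Illustrative inputs only: these names are hypothetical and are not
// Cygwin identifiers. Each field mirrors one condition from the patch.
struct sparse_state
{
  bool mount_sparse;        // mount point carries the "sparse" option
  bool fs_supports_sparse;  // file system reports FILE_SUPPORTS_SPARSE_FILES
  bool already_sparse;      // FILE_ATTRIBUTE_SPARSE_FILE is already set
  bool append_mode;         // file was opened with O_APPEND
};

// A write that lands at least this far past EOF triggers sparsification.
constexpr std::int64_t sparse_gap = 128 * 1024;

// Returns true when a write at `pos` into a file whose current end is
// `eof` should first mark the file sparse.
bool
should_sparsify (const sparse_state &s, std::int64_t pos, std::int64_t eof)
{
  return s.mount_sparse
         && s.fs_supports_sparse
         && !s.append_mode
         && !s.already_sparse
         && pos >= eof + sparse_gap;
}
```

With the mount option unset the predicate is always false, which is the point of the change: the sparse flag is never set behind the user's back on an ordinary mount.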


Corinna


* fhandler.cc (fhandler_base::write): Don't attempt to sparsify
an already sparse file. Drop check for FILE_SUPPORTS_SPARSE_FILES
flag. Explicitly set FILE_ATTRIBUTE_SPARSE_FILE attribute in
cached attributes.
(fhandler_base::lseek): Only set did_lseek if sparseness is supported.
* fhandler_disk_file.cc (fhandler_disk_file::ftruncate): Don't attempt
to sparsify an already sparse file. Explicitly set
FILE_ATTRIBUTE_SPARSE_FILE attribute in cached attributes.
* mount.cc (oopt): Add "sparse" flag.
(fillout_mntent): Ditto.
* path.h (enum path_types): Add PATH_SPARSE.
(path_conv::support_sparse): New method.
(path_conv::fs_flags): Constify.
(path_conv::fs_name_len): Ditto.
* include/sys/mount.h: Replace unused MOUNT_MIXED flag with MOUNT_SPARSE.


Index: fhandler.cc
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/fhandler.cc,v
retrieving revision 1.429
diff -u -p -r1.429 fhandler.cc
--- fhandler.cc 16 Aug 2012 23:34:43 -0000 1.429
+++ fhandler.cc 13 Dec 2012 14:29:35 -0000
@@ -817,15 +817,17 @@ ssize_t __stdcall
fhandler_base::write (const void *ptr, size_t len)
{
int res;
- IO_STATUS_BLOCK io;
- FILE_POSITION_INFORMATION fpi;
- FILE_STANDARD_INFORMATION fsi;

if (did_lseek ())
{
+ IO_STATUS_BLOCK io;
+ FILE_POSITION_INFORMATION fpi;
+ FILE_STANDARD_INFORMATION fsi;
+
did_lseek (false); /* don't do it again */

if (!(get_flags () & O_APPEND)
+ && !has_attribute (FILE_ATTRIBUTE_SPARSE_FILE)
&& NT_SUCCESS (NtQueryInformationFile (get_output_handle (),
&io, &fsi, sizeof fsi,
FileStandardInformation))
@@ -833,8 +835,7 @@ fhandler_base::write (const void *ptr, s
&io, &fpi, sizeof fpi,
FilePositionInformation))
&& fpi.CurrentByteOffset.QuadPart
- >= fsi.EndOfFile.QuadPart + (128 * 1024)
- && (pc.fs_flags () & FILE_SUPPORTS_SPARSE_FILES))
+ >= fsi.EndOfFile.QuadPart + (128 * 1024))
{
/* If the file system supports sparse files and the application
is writing after a long seek beyond EOF, convert the file to
@@ -842,6 +843,9 @@ fhandler_base::write (const void *ptr, s
NTSTATUS status;
status = NtFsControlFile (get_output_handle (), NULL, NULL, NULL,
&io, FSCTL_SET_SPARSE, NULL, 0, NULL, 0);
+ if (NT_SUCCESS (status))
+ pc.file_attributes (pc.file_attributes ()
+ | FILE_ATTRIBUTE_SPARSE_FILE);
debug_printf ("%p = NtFsControlFile(%S, FSCTL_SET_SPARSE)",
status, pc.get_nt_native_path ());
}
@@ -1071,7 +1075,8 @@ fhandler_base::lseek (_off64_t offset, i

/* When next we write(), we will check to see if *this* seek went beyond
the end of the file and if so, potentially sparsify the file. */
- did_lseek (true);
+ if (pc.support_sparse ())
+ did_lseek (true);

/* If this was a SEEK_CUR with offset 0, we still might have
readahead that we have to take into account when calculating
Index: fhandler_disk_file.cc
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/fhandler_disk_file.cc,v
retrieving revision 1.380
diff -u -p -r1.380 fhandler_disk_file.cc
--- fhandler_disk_file.cc 31 Oct 2012 15:02:13 -0000 1.380
+++ fhandler_disk_file.cc 13 Dec 2012 14:29:36 -0000
@@ -1189,12 +1189,15 @@ fhandler_disk_file::ftruncate (_off64_t
feofi.EndOfFile.QuadPart = length;
/* Create sparse files only when called through ftruncate, not when
called through posix_fallocate. */
- if (allow_truncate
- && (pc.fs_flags () & FILE_SUPPORTS_SPARSE_FILES)
+ if (allow_truncate && pc.support_sparse ()
+ && !has_attribute (FILE_ATTRIBUTE_SPARSE_FILE)
&& length >= fsi.EndOfFile.QuadPart + (128 * 1024))
{
status = NtFsControlFile (get_handle (), NULL, NULL, NULL, &io,
FSCTL_SET_SPARSE, NULL, 0, NULL, 0);
+ if (NT_SUCCESS (status))
+ pc.file_attributes (pc.file_attributes ()
+ | FILE_ATTRIBUTE_SPARSE_FILE);
syscall_printf ("%p = NtFsControlFile(%S, FSCTL_SET_SPARSE)",
status, pc.get_nt_native_path ());
}
Index: mount.cc
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/mount.cc,v
retrieving revision 1.95
diff -u -p -r1.95 mount.cc
--- mount.cc 16 Aug 2012 23:34:44 -0000 1.95
+++ mount.cc 13 Dec 2012 14:29:36 -0000
@@ -1028,6 +1028,7 @@ struct opt
{"override", MOUNT_OVERRIDE, 0},
{"posix=0", MOUNT_NOPOSIX, 0},
{"posix=1", MOUNT_NOPOSIX, 1},
+ {"sparse", MOUNT_SPARSE, 0},
{"text", MOUNT_BINARY, 1},
{"user", MOUNT_SYSTEM, 1}
};
@@ -1667,6 +1668,9 @@ fillout_mntent (const char *native_path,
if (flags & MOUNT_NOPOSIX)
strcat (_my_tls.locals.mnt_opts, (char *) ",posix=0");

+ if (!(flags & MOUNT_SPARSE)) /* user mount */
+ strcat (_my_tls.locals.mnt_opts, (char *) ",sparse");
+
if (!(flags & MOUNT_SYSTEM)) /* user mount */
strcat (_my_tls.locals.mnt_opts, (char *) ",user");

Index: path.h
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/path.h,v
retrieving revision 1.171
diff -u -p -r1.171 path.h
--- path.h 31 Jul 2012 19:36:16 -0000 1.171
+++ path.h 13 Dec 2012 14:29:36 -0000
@@ -70,6 +70,7 @@ enum path_types
PATH_EXEC = MOUNT_EXEC,
PATH_NOTEXEC = MOUNT_NOTEXEC,
PATH_CYGWIN_EXEC = MOUNT_CYGWIN_EXEC,
+ PATH_SPARSE = MOUNT_SPARSE,
PATH_RO = MOUNT_RO,
PATH_NOACL = MOUNT_NOACL,
PATH_NOPOSIX = MOUNT_NOPOSIX,
@@ -153,6 +154,11 @@ class path_conv
bool has_acls () const {return !(path_flags & PATH_NOACL) && fs.has_acls (); }
bool hasgood_inode () const {return !(path_flags & PATH_IHASH); }
bool isgood_inode (__ino64_t ino) const;
+ bool support_sparse () const
+ {
+ return (path_flags & PATH_SPARSE)
+ && (fs_flags () & FILE_SUPPORTS_SPARSE_FILES);
+ }
int has_symlinks () const {return path_flags & PATH_HAS_SYMLINKS;}
int has_dos_filenames_only () const {return path_flags & PATH_DOS;}
int has_buggy_open () const {return fs.has_buggy_open ();}
@@ -342,8 +348,8 @@ class path_conv
short get_unitn () const {return dev.get_minor ();}
DWORD file_attributes () const {return fileattr;}
void file_attributes (DWORD new_attr) {fileattr = new_attr;}
- DWORD fs_flags () {return fs.flags ();}
- DWORD fs_name_len () {return fs.name_len ();}
+ DWORD fs_flags () const {return fs.flags ();}
+ DWORD fs_name_len () const {return fs.name_len ();}
bool fs_got_fs () const { return fs.got_fs (); }
bool fs_is_fat () const {return fs.is_fat ();}
bool fs_is_ntfs () const {return fs.is_ntfs ();}
Index: include/sys/mount.h
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/include/sys/mount.h,v
retrieving revision 1.15
diff -u -p -r1.15 mount.h
--- include/sys/mount.h 11 Aug 2010 10:58:06 -0000 1.15
+++ include/sys/mount.h 13 Dec 2012 14:29:36 -0000
@@ -26,8 +26,8 @@ enum
device mount */
MOUNT_CYGWIN_EXEC = 0x00040, /* file or directory is or contains a
cygwin executable */
- MOUNT_MIXED = 0x00080, /* reads are text, writes are binary
- not yet implemented */
+ MOUNT_SPARSE = 0x00080, /* Support automatic sparsifying of
+ files. */
MOUNT_NOTEXEC = 0x00100, /* don't check files for executable magic */
MOUNT_DEVFS = 0x00200, /* /device "filesystem" */
MOUNT_PROC = 0x00400, /* /proc "filesystem" */
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Eric Blake
2012-12-13 17:32:57 UTC
Post by Corinna Vinschen
Below is my cut on the issue. It introduces a "sparse" mount option and
sparsifies a file only if this mount option is set. I also slightly
improved the code in fhandler_base::write and ftruncate so that it will
not try to set the sparse flag on already sparse files.
The code compiles, so it's basically correct. It's just not tested. ;)
Also untested (so far) on my end, but I can agree to this - anyone that
_wants_ sparse files (such as for virtual disk image) will have to
enable that mount point option for the directory of those files in
question, but no one else is forced to use them. As sparse files are
_supposed_ to be an optimization, not being sparse is not a loss in
functionality, just in potential optimizations; but if Windows is not
taking advantage of optimizations even when a file IS sparse, then that
argues that we aren't losing much.

It should not be too hard for me to rig the coreutils testsuite to
ensure that its tests of sparse operations are done on a mount
explicitly set up to allow sparse files.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Corinna Vinschen
2012-12-13 18:53:47 UTC
Post by Eric Blake
Post by Corinna Vinschen
Below is my cut on the issue. It introduces a "sparse" mount option and
sparsifies a file only if this mount option is set. I also slightly
improved the code in fhandler_base::write and ftruncate so that it will
not try to set the sparse flag on already sparse files.
The code compiles, so it's basically correct. It's just not tested. ;)
Also untested (so far) on my end, but I can agree to this - anyone that
_wants_ sparse files (such as for virtual disk image) will have to
enable that mount point option for the directory of those files in
question, but no one else is forced to use them. As sparse files are
_supposed_ to be an optimization, not being sparse is not a loss in
functionality, just in potential optimizations; but if Windows is not
taking advantage of optimizations even when a file IS sparse, then that
argues that we aren't losing much.
It should not be too hard for me to rig the coreutils testsuite to
ensure that its tests of sparse operations are done on a mount
explicitly set up to allow sparse files.
Yup, just add a temporary user mount point.



Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Corinna Vinschen
2012-12-14 11:18:15 UTC
Post by Corinna Vinschen
Post by Eric Blake
Post by Corinna Vinschen
Below is my cut on the issue. It introduces a "sparse" mount option and
sparsifies a file only if this mount option is set. I also slightly
improved the code in fhandler_base::write and ftruncate so that it will
not try to set the sparse flag on already sparse files.
The code compiles, so it's basically correct. It's just not tested. ;)
Also untested (so far) on my end, but I can agree to this - anyone that
_wants_ sparse files (such as for virtual disk image) will have to
enable that mount point option for the directory of those files in
question, but no one else is forced to use them. As sparse files are
_supposed_ to be an optimization, not being sparse is not a loss in
functionality, just in potential optimizations; but if Windows is not
taking advantage of optimizations even when a file IS sparse, then that
argues that we aren't losing much.
It should not be too hard for me to rig the coreutils testsuite to
ensure that its tests of sparse operations are done on a mount
explicitly set up to allow sparse files.
Yup, just add a temporary user mount point.
I applied my patch with a single fix. The logic for adding the "sparse"
option string to mnt_opts was upside down.


Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Ryan Johnson
2012-12-27 22:53:18 UTC
Post by Corinna Vinschen
Post by Corinna Vinschen
Post by Eric Blake
Post by Corinna Vinschen
Below is my cut on the issue. It introduces a "sparse" mount option and
sparsifies a file only if this mount option is set. I also slightly
improved the code in fhandler_base::write and ftruncate so that it will
not try to set the sparse flag on already sparse files.
The code compiles, so it's basically correct. It's just not tested. ;)
Also untested (so far) on my end, but I can agree to this - anyone that
_wants_ sparse files (such as for virtual disk image) will have to
enable that mount point option for the directory of those files in
question, but no one else is forced to use them. As sparse files are
_supposed_ to be an optimization, not being sparse is not a loss in
functionality, just in potential optimizations; but if Windows is not
taking advantage of optimizations even when a file IS sparse, then that
argues that we aren't losing much.
It should not be too hard for me to rig the coreutils testsuite to
ensure that its tests of sparse operations are done on a mount
explicitly set up to allow sparse files.
Yup, just add a temporary user mount point.
I applied my patch with a single fix. The logic for adding the "sparse"
option string to mnt_opts was upside down.
I just finished building gcc-4.7 using the latest cygwin1 snapshot, and
performance is vastly, drastically improved. Stages 2 and 3 went as fast
as Stage 1, and the disk light only flickered occasionally instead of
staying pegged like it used to.

Thanks for finding and fixing this!

Ryan
Jin-woo Ye
2013-01-01 14:08:50 UTC
Post by Ryan Johnson
Post by Corinna Vinschen
Post by Corinna Vinschen
Post by Eric Blake
Post by Corinna Vinschen
Below is my cut on the issue. It introduces a "sparse" mount option and
sparsifies a file only if this mount option is set. I also slightly
improved the code in fhandler_base::write and ftruncate so that it will
not try to set the sparse flag on already sparse files.
The code compiles, so it's basically correct. It's just not tested. ;)
Also untested (so far) on my end, but I can agree to this - anyone that
_wants_ sparse files (such as for virtual disk image) will have to
enable that mount point option for the directory of those files in
question, but no one else is forced to use them. As sparse files are
_supposed_ to be an optimization, not being sparse is not a loss in
functionality, just in potential optimizations; but if Windows is not
taking advantage of optimizations even when a file IS sparse, then that
argues that we aren't losing much.
It should not be too hard for me to rig the coreutils testsuite to
ensure that its tests of sparse operations are done on a mount
explicitly set up to allow sparse files.
Yup, just add a temporary user mount point.
I applied my patch with a single fix. The logic for adding the "sparse"
option string to mnt_opts was upside down.
I just finished building gcc-4.7 using the latest cygwin1 snapshot, and
performance is vastly, drastically improved. Stages 2 and 3 went as fast
as Stage 1, and the disk light only flickered occasionally instead of
staying pegged like it used to.
Thanks for finding and fixing this!
Ryan
Building lib{java|fortran|stdc++} is now much faster than before the fix.
I don't even want to remember how slow building gcc was!
Before the fix, I had to defragment the hard disk while building gcc, or
the build would take more than 3 hours.
Thanks to Corinna for fixing this.
--
Regards.
Daniel Colascione
2012-12-13 19:02:47 UTC
Post by Corinna Vinschen
Post by Daniel Colascione
What about using the automatic sparse handling in lseek/write and
ftruncate only when the file being operated on is already sparse?
That doesn't make sense. If the file is already sparse, there's no
reason to set the sparse flag in write or ftruncate again. Also, if you
set the sparse flag only on already sparse files, you will never be able
to create sparse files.
Yes, you're right. I thought I remembered a separate call we could
have retained to actually punch a hole in a sparse file. I checked
the code, and all we do is set the sparse flag.
Corinna Vinschen
2012-12-13 19:20:43 UTC
Post by Daniel Colascione
Post by Corinna Vinschen
Post by Daniel Colascione
What about using the automatic sparse handling in lseek/write and
ftruncate only when the file being operated on is already sparse?
That doesn't make sense. If the file is already sparse, there's no
reason to set the sparse flag in write or ftruncate again. Also, if you
set the sparse flag only on already sparse files, you will never be able
to create sparse files.
Yes, you're right. I thought I remembered a separate call we could
have retained to actually punch a hole in a sparse file.
That's what Eric mentioned yesterday. The Linux-specific fallocate call
can do that, but this isn't implemented in Cygwin yet(*).


Corinna

(*) http://cygwin.com/ml/cygwin-developers/2012-12/msg00018.html
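
For reference, the Linux-specific call mentioned above looks roughly like this. It is a minimal sketch for Linux with glibc, not something Cygwin offered at the time of this thread; the wrapper name punch_hole is made up for illustration, while fallocate and the two FALLOC_FL_* flags are the real Linux interface.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes fallocate() and the FALLOC_FL_* flags
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <cerrno>

// Hypothetical helper: deallocate the byte range [offset, offset + length)
// of an open file while leaving its size unchanged.  Linux requires
// FALLOC_FL_KEEP_SIZE to be combined with FALLOC_FL_PUNCH_HOLE.
// Returns 0 on success, otherwise the errno value set by fallocate().
int
punch_hole (int fd, off_t offset, off_t length)
{
  if (fallocate (fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                 offset, length) == 0)
    return 0;
  return errno;
}
```

Unlike merely setting a sparse flag, this releases blocks that are already allocated. File systems that cannot do so are expected to fail the call with EOPNOTSUPP, so a caller has to be prepared to fall back to leaving the data in place.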
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
Ryan Johnson
2012-12-13 12:32:17 UTC
Post by Daniel Colascione
Post by Daniel Colascione
(I also haven't checked whether anything in /bin actually ends up
sparse. It'd be interesting to see.)
I checked. Nothing in my /bin was sparse.
No doubt because setup.exe wouldn't have left the sparse flag on
anything. Furthermore, my emacs binaries are the only ones with holes
big enough for `cp' to detect.

So we basically lose nothing by preventing executable files from being
marked as sparse...
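
A check in the spirit of the one `cp' relies on compares the blocks actually allocated with the file's apparent size. The sketch below assumes a POSIX system where st_blocks is counted in 512-byte units; the function names are made up for illustration, and the exact test used by any particular cp version may differ.

```cpp
#include <sys/stat.h>
#include <sys/types.h>

// Heuristic: a file looks sparse when it occupies fewer 512-byte blocks
// than its apparent size would need.
bool
probably_sparse (off_t size, blkcnt_t blocks)
{
  return blocks < size / 512;
}

// Convenience wrapper around stat().  Returns false when the path cannot
// be examined or does not name a regular file.
bool
file_probably_sparse (const char *path)
{
  struct stat sb;
  if (stat (path, &sb) != 0 || !S_ISREG (sb.st_mode))
    return false;
  return probably_sparse (sb.st_size, sb.st_blocks);
}
```

This only shows that holes exist. It says nothing about whether the NTFS sparse attribute is set on the file, which is the property that matters for the caching problem discussed in this thread.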

Ryan
Ryan Johnson
2012-12-12 03:54:13 UTC
Post by Daniel Colascione
Post by Daniel Colascione
The key to generating a binary that repros the problem is to unexec emacs, then
try to repro with that generated binary, not a copy of it.
The real explanation is a lot simpler: the binary is sparse. When you create a
file mapping object for a sparse file, Windows discards all cached pages for
that file.
!!

Wow. How in the world did you figure that out?

Ryan