Way back in April (2014), I was in an Object Oriented Programming class, and I’d just started to master C++. With WinDirStat I had more than just an interesting problem to work on, but a real challenge to take on. But enough about WinDirStat, that’s for another article.
Side note: 90% of the slowdown in WinDirStat was NOT related to actually walking a directory tree, and was fixed by swapping a few data structures. Directory walking was however, still slow.
On windows there are two obvious ways to recursively enumerate files and directories. The first, provided by the Windows API, is with FindFirstFile/FindFirstFileEx & FindNextFile. This is a C/C++ API that will, on every call, (a) fill a predefined structure with information about a single file, or (b) fail and set the last error to ERROR_NO_MORE_FILES (the terrible behavior typical of the Windows API). The second, provided by MFC, is the CFileFind class. CFileFind is nothing more than a convenience wrapper for FindFirstFile, but with some utility functions & overhead.
The biggest downside to both methods is that they require a system call for every file. Internally, FindNextFile opens a file handle and then calls NtQueryDirectoryFile with said handle. This is terribly inefficient, especially if 8dot3 name creation is enabled. Back to the documentation.
Aha! With FindFirstFileEx, we can ask for only the relevant information (with FindExInfoBasic), and even better, there’s a mysterious flag: “FIND_FIRST_EX_LARGE_FETCH”, which is described to mean: “Uses a larger buffer for directory queries, which can increase performance of the find operation.”
Of course, the MSDN documentation provides “just the facts” (as Raymond Chen describes it), and nothing about where FIND_FIRST_EX_LARGE_FETCH should be used or what performance benefit it might provide. Raymond Chen offers some theory, but no hard numbers (and some comments suggest that it makes no difference). There’s also a poorly formatted Visual Studio user suggestion, which suggests using the USN (update sequence number) journal, which is not a viable option.
Perhaps most interestingly, FIND_FIRST_EX_LARGE_FETCH is used in the Chromium codebase!
The source even mentions that it “should speed up large enumerations”, but provides NO evidence.
Sidenote: the USN journal is not viable as it stores only a log of changes. There is no structure information, and it isn’t guaranteed to hold information about every file on the drive. Reconstructing the filesystem tree from the USN journal is an extremely complex task, and I doubt it could be done quickly, if at all.
The truth is: no matter what flags you set, you will see roughly the same picture:
Nearly all of the program’s time is spent in NtOpenFile & NtQueryDirectoryFile, and a single core (I have 8 logical) is pegged at 100%.
std::async as a force multiplier
Before I get to the hard numbers, I want to mention that there’s a parallel option. God damn do I love C++11.