How fast are FindFirstFile/FindFirstFileEx, and CFileFind – actually?
For the past several months, I’ve been working on a fork of WinDirStat, the extremely popular disk usage analysis program.
WinDirStat is fantastically useful, but also fantastically slow. A well populated drive will lock WinDirStat for anywhere from fifteen minutes to a few hours, or worse (mirror).
Way back in April (2014), I was in an Object Oriented Programming class, and I’d just started to master C++. With WinDirStat I had more than just an interesting problem to work on, but a real challenge to take on. But enough about WinDirStat, that’s for another article.
Side note: 90% of the slowdown in WinDirStat was NOT related to actually walking a directory tree, and was fixed by swapping a few data structures. Directory walking was however, still slow.
On windows there are two obvious ways to recursively enumerate files and directories. The first, provided by the Windows API, is with FindFirstFile/FindFirstFileEx & FindNextFile. This is a C/C++ API that will, on every call, (a) fill a predefined structure with information about a single file, or (b) fail and set the last error to ERROR_NO_MORE_FILES (the terrible behavior typical of the Windows API). The second, provided by MFC, is the CFileFind class. CFileFind is nothing more than a convenience wrapper for FindFirstFile, but with some utility functions & overhead.
The biggest downside to both methods is that they require a system call for every file. Internally, FindNextFile opens a file handle and then calls NtQueryDirectoryFile with said handle. This is terribly inefficient, especially if 8dot3 name creation is enabled. Back to the documentation.
Aha! With FindFirstFileEx, we can ask for only the relevant information (with FindExInfoBasic), and even better, there’s a mysterious flag: “FIND_FIRST_EX_LARGE_FETCH”, which is described to mean: “Uses a larger buffer for directory queries, which can increase performance of the find operation.”
Of course, the MSDN documentation provides “just the facts” (as Raymond Chen describes it), and nothing about where FIND_FIRST_EX_LARGE_FETCH should be used or what performance benefit it might provide. Raymond Chen offers some theory, but no hard numbers (and some comments suggest that it makes no difference). There’s also a poorly formatted Visual Studio user suggestion, which suggests using the USN (update sequence number) journal, which is not a viable option.
Perhaps most interestingly, FIND_FIRST_EX_LARGE_FETCH is used in the Chromium codebase!
The source even mentions that it “should speed up large enumerations”, but provides NO evidence.
Sidenote: the USN journal is not viable as it stores only a log of changes. There is no structure information, and it isn’t guaranteed to hold information about every file on the drive. Reconstructing the filesystem tree from the USN journal is an extremely complex task, and I doubt it could be done quickly, if at all.
The truth is: no matter what flags you set, you will see roughly the same picture:
Nearly all of the program’s time is spent in NtOpenFile & NtQueryDirectoryFile, and a single core (I have 8 logical) is pegged at 100%.
std::async as a force multiplier
Before I get to the hard numbers, I want to mention that there’s a parallel option. God damn do I love C++11.
If using the FindFirstFile/FindNextFile method: wherever you would recursively call a function to descend into a directory, emplace_back a future ( i.e. std::async( std::launch::async | std::launch::deferred, yourFunctionHere, FunctionArgsHere )
) to that function. Right before returning, call get( )
on each future.
i.e.:
std::int64_t stdRecurseFindFutures( _In_ std::wstring dir, _In_ const bool isLargeFetch, _In_ const bool isBasicInfo ) { std::int64_t num = 0; dir.reserve( MAX_PATH ); std::wstring normSzDir(dir); normSzDir.reserve( MAX_PATH ); if ( ( dir.back( ) != L'*' ) && ( dir.at( dir.length( ) - 2 ) != L'\\' ) ) { dir += L"\\*"; } else if ( dir.back( ) == L'\\' ) { dir += L"*"; } std::vector<std::future<std::int64_t>> futureDirs; WIN32_FIND_DATA fData; HANDLE fDataHand = NULL; if ( isLargeFetch ) { if ( isBasicInfo ) { fDataHand = FindFirstFileExW( dir.c_str( ), FindExInfoBasic, &fData, FindExSearchNameMatch, NULL, FIND_FIRST_EX_LARGE_FETCH ); } else { fDataHand = FindFirstFileExW( dir.c_str( ), FindExInfoStandard, &fData, FindExSearchNameMatch, NULL, FIND_FIRST_EX_LARGE_FETCH ); } } else { if ( isBasicInfo ) { fDataHand = FindFirstFileExW( dir.c_str( ), FindExInfoBasic, &fData, FindExSearchNameMatch, NULL, 0 ); } else { fDataHand = FindFirstFileExW( dir.c_str( ), FindExInfoStandard, &fData, FindExSearchNameMatch, NULL, 0 ); } } auto res = FindNextFileW( fDataHand, &fData ); //[a lot of code omitted] if ( ( fData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY ) && ( scmpVal != 0 ) ) { futureDirs.emplace_back( std::async( std::launch::async | std::launch::deferred, descendDirectory, fData, normSzDir, isLargeFetch, isBasicInfo, true ) ); } //[more code omitted] res = FindNextFileW( fDataHand, &fData ); } } FindClose( fDataHand ); for ( auto& a : futureDirs ) { num += a.get( ); } return num; }
This code will enumerate many directories in parallel! That was easy!
Methodology
The test program I’ve written, FileFindBench, is on GitHub. All it does, is blast through a directory (given as an argument), as quickly as possible. It doesn’t bother storing any information about the files, or with maintaining state. It tries all combination of FindFirstFileEx’s options, and all options of the async version.
I’m running it against my C:\Users directory, which (for stress testing) has ~2 million files in it. My computer has a fast SSD, and I’m running this test with a warm cache.
Results
I’ve added periods to compensate for wordpress mutilating the formatting, and sorted for timing. (approx Sept. 7, 2014 )
Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 1, Time in seconds: 0.77324 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 1, Time in seconds: 0.789377 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 1, Time in seconds: 0.791148 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 1, Time in seconds: 0.796216 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 1, Time in seconds: 0.796983 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 1, Time in seconds: 0.829399 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 1, Time in seconds: 0.834139 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 1, Time in seconds: 0.836425 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 1, Time in seconds: 0.842552 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 1, Time in seconds: 0.906004 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 1, Time in seconds: 0.919565 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 1, Time in seconds: 0.927472 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 1, Time in seconds: 0.930694 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 1, Time in seconds: 0.93445 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 1, Time in seconds: 0.945352 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 1, Time in seconds: 0.950834 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 1, Time in seconds: 0.965645 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 1, Time in seconds: 0.971893 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 1, Time in seconds: 1.03875 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 1, Time in seconds: 1.09779 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 1, Time in seconds: 1.3805 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 1, Time in seconds: 1.38791 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 1, Time in seconds: 1.40822 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 1, Time in seconds: 1.4307 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 1, Time in seconds: 1.43829 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 1, Time in seconds: 1.46621 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 1, Time in seconds: 1.47866 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 1, Time in seconds: 1.48441 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 1, Time in seconds: 1.48614 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 1, Time in seconds: 1.48666 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 1, Time in seconds: 1.40388 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 1, Time in seconds: 1.42533 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 1, Time in seconds: 1.45934 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 1.49625 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 1.5049 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 1.50675 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 1.50794 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 1.50918 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 1.51068 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 1.51309 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 1, Time in seconds: 1.51632 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 1, Time in seconds: 1.51834 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 1.52094 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 1.52248 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 1.52307 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 1.52398 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 1.52856 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 1, Time in seconds: 1.53466 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 1.53918 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 1.54056 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 1.54065 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 1, Time in seconds: 1.54439 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 1.55298 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 1.55733 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 1, Time in seconds: 1.58185 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 1.58753 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 1, Time in seconds: 1.60224 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 1, Time in seconds: 1.61878 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 2.03071 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 2.17842 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 2.18279 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 2.19601 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 2.20159 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 2.2182 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 2.22085 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 2.22822 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 2.25406 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 2.25552 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 2.25908 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 2.2671 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 2.27266 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 2.31976 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 2.33403 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 2.37965 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 2.39302 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 2.67497 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 2.69628 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 3.68602 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 4.3283 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 5.15483 <span style="line-height: 1.5;">
or, only the synchronous:
Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 1.49625 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 1.5049 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 1.50675 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 1.50794 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 1.50918 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 1.51068 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 1.51309 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 1.52094 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 1.52248 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 1.52307 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 1.52398 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 1.52856 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 1.53918 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 1.54056 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 1.54065 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 1.55298 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 1.55733 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 1.58753 Iteration without FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 2.03071 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 2.17842 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 2.18279 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 2.19601 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 2.20159 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 2.2182 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 2.22085 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 2.22822 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 2.25406 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 2.25552 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 2.25908 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 2.2671 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 2.27266 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 2.31976 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 2.33403 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 2.37965 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 2.39302 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 2.67497 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 2.69628 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 3.68602 Iteration WITH ...FIND_FIRST_EX_LARGE_FETCH, WITH ...isBasicInfo, IsAsync?: 0, Time in seconds: 4.3283 Iteration without FIND_FIRST_EX_LARGE_FETCH, without isBasicInfo, IsAsync?: 0, Time in seconds: 5.15483
First, all of the fastest iterations ( under 1 second ), are asynchronous enumerations – this is expected. For now, ignore the async aspect, so we can focus on FIND_FIRST_EX_LARGE_FETCH and FindExInfoBasic.
FIND_FIRST_EX_LARGE_FETCH negatively impacts performance! Also note that FindExInfoBasic makes little difference; perhaps it improves worst-case timing . Now that’s an interesting result.
Try it for yourself, and post a comment summarizing your results.
Up next: A quick look into the Native API for directory enumeration.
Reblogged this on Dinesh Ram Kali..
LikeLike