How fast are FindFirstFile/FindFirstFileEx, and CFileFind – actually?

•January 13, 2015

For the past several months, I’ve been working on a fork of WinDirStat, the extremely popular disk usage analysis program.

WinDirStat is fantastically useful, but also fantastically slow. A well-populated drive will lock WinDirStat up for anywhere from fifteen minutes to a few hours, or worse (mirror).

Way back in April (2014), I was in an Object Oriented Programming class, and I’d just started to master C++. With WinDirStat I had not just an interesting problem to work on, but a real challenge to take on. But enough about WinDirStat; that’s for another article.

Side note: 90% of the slowdown in WinDirStat was NOT related to actually walking a directory tree, and was fixed by swapping a few data structures. Directory walking was, however, still slow.

On Windows, there are two obvious ways to recursively enumerate files and directories. The first, provided by the Windows API, is FindFirstFile/FindFirstFileEx & FindNextFile. This is a C/C++ API that will, on every call, either (a) fill a predefined structure with information about a single file, or (b) fail and set the last error to ERROR_NO_MORE_FILES (the terrible behavior typical of the Windows API). The second, provided by MFC, is the CFileFind class. CFileFind is nothing more than a convenience wrapper for FindFirstFile, with some utility functions & overhead.
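The enumerate-until-exhausted pattern can be sketched in Python. This is an illustration only, not WinDirStat's code: on Windows, CPython's `os.scandir` is built on FindFirstFileW/FindNextFileW (see PEP 471), so each `DirEntry` roughly corresponds to one filled-in find-data structure. The helper name `walk_tree` is hypothetical.

```python
import os


def walk_tree(root):
    """Return (file_count, total_bytes) for everything under root."""
    file_count = 0
    total_bytes = 0
    pending = [root]  # explicit stack of directories still to enumerate
    while pending:
        current = pending.pop()
        try:
            # The iterator ends when the directory has no more entries,
            # which is where the C API reports ERROR_NO_MORE_FILES.
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        pending.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        file_count += 1
                        total_bytes += entry.stat(follow_symlinks=False).st_size
        except OSError:
            # Unreadable directory (permissions, races): skip it.
            continue
    return file_count, total_bytes
```

Calling `walk_tree("C:\\")` (or any other root) returns the two totals; symbolic links are deliberately not followed, to avoid cycles.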

The biggest downside to both methods is that they require a system call for every file. Internally, FindNextFile opens a file handle and then calls NtQueryDirectoryFile with said handle. This is terribly inefficient, especially if 8dot3 name creation is enabled. Back to the documentation.

Like many Windows APIs, there's an "extended" version.

Aha! With FindFirstFileEx, we can ask for only the relevant information (with FindExInfoBasic), and even better, there’s a mysterious flag, FIND_FIRST_EX_LARGE_FETCH, which the documentation describes as: “Uses a larger buffer for directory queries, which can increase performance of the find operation.”

Of course, the MSDN documentation provides “just the facts” (as Raymond Chen describes it), and nothing about where FIND_FIRST_EX_LARGE_FETCH should be used or what performance benefit it might provide. Raymond Chen offers some theory, but no hard numbers (and some comments suggest that it makes no difference). There’s also a poorly formatted Visual Studio user suggestion, which suggests using the USN (update sequence number) journal, which is not a viable option.

Perhaps most interestingly, FIND_FIRST_EX_LARGE_FETCH is used in the Chromium codebase!

Yup, Chromium is using FIND_FIRST_EX_LARGE_FETCH.

The source even mentions that it “should speed up large enumerations”, but provides NO evidence.

Sidenote: the USN journal is not viable as it stores only a log of changes. There is no structure information, and it isn’t guaranteed to hold information about every file on the drive. Reconstructing the filesystem tree from the USN journal is an extremely complex task, and I doubt it could be done quickly, if at all.

The truth is: no matter what flags you set, you will see roughly the same picture:

Note the "Functions Doing Most Individual Work" table.

Nearly all of the program’s time is spent in NtOpenFile & NtQueryDirectoryFile, and a single core (I have 8 logical) is pegged at 100%.

std::async as a force multiplier

Before I get to the hard numbers, I want to mention that there’s a parallel option. God damn do I love C++11.
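The post's parallel version uses C++11 `std::async`, and its code is not part of this excerpt. As a rough analogue only, here is a Python sketch that hands each top-level subdirectory to a worker via `concurrent.futures.ThreadPoolExecutor`. The names `count_files` and `count_files_parallel` are hypothetical, the default of 8 workers simply mirrors the 8 logical cores mentioned above, and no performance claim is implied.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def count_files(root):
    """Count files under root with a sequential, stack-based walk."""
    count = 0
    pending = [root]
    while pending:
        current = pending.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        pending.append(entry.path)
                    elif entry.is_file(follow_symlinks=False):
                        count += 1
        except OSError:
            continue  # skip directories that cannot be read
    return count


def count_files_parallel(root, max_workers=8):
    """Walk each immediate subdirectory of root in its own worker task."""
    subdirs = []
    count = 0
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            elif entry.is_file(follow_symlinks=False):
                count += 1
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each mapped call plays the role of one std::async task.
        count += sum(pool.map(count_files, subdirs))
    return count
```

Splitting only at the top level keeps the sketch short; a tree with one huge subdirectory would still be walked mostly by a single worker.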

Continue reading ‘How fast are FindFirstFile/FindFirstFileEx, and CFileFind – actually?’

The future of automotive headlamps

•January 8, 2015

Nearly three years ago, a team from Carnegie Mellon’s Robotics Institute, Mines ParisTech, and Texas Instruments approached the problem of driving in the rain with a brilliantly simple idea.

We’ve all been there – come nighttime, your headlights shine as brightly as ever, but at the rain, not the road. The road is still dark, but your eyes are adjusted for bright light.

Their idea is to simply project light precisely around the rain.

Sound hard to you? Well, it turns out that someone’s already solved the hardest part of the problem:

Continue reading ‘The future of automotive headlamps’

Upgrading & migrating pip packages, en masse

•December 16, 2014

Upgrading, faster

Pip, the Python package management system, still lacks an easy way to update all installed packages. The “upgrade-all” ability has been in the works for nearly 4 years now.

In the meantime, many simple hacks have evolved to meet the demand. They’re all simple, and quite slow.

About six months ago I wrote a fast Python script to upgrade all local pip packages.

The idea is simple.

First:

import pip
import queue

Then, query pip for the list of installed packages:

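def buildQueueOfInstalledPackages():
    """Return a queue.Queue holding every distribution pip reports as installed."""
    distQueue = queue.Queue()
    # Each queued item is a distribution object, exactly as pip returns it.
    for dist in pip.get_installed_distributions():
        distQueue.put(dist)
    return distQueue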
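Note that `pip.get_installed_distributions()` is no longer importable from the top-level `pip` package as of pip 10. A minimal sketch of the same queue-building step using only the standard library, assuming Python 3.8 or later, might look like the following. The function name is hypothetical and this is not the script described in this post.

```python
import queue
from importlib import metadata


def build_queue_of_installed_package_names():
    """Queue the name of every distribution visible to this interpreter."""
    name_queue = queue.Queue()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken or missing metadata
            name_queue.put(name)
    return name_queue
```

Unlike the original function, this queues plain name strings rather than distribution objects, which is enough if the names are only going to be passed to an upgrade command.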

Here is where my script gets interesting:

Continue reading ‘Upgrading & migrating pip packages, en masse’

Ambulance drone can help heart attack victims in under 2 minutes

•November 30, 2014

Alexander Riccio:

This is beautiful :)

Originally posted on Gigaom:

Drones get a bad rap from the FAA but there’s growing evidence that more unmanned aircraft in the sky would do more good than harm. We’ve already seen how drones can save the day in search-and-rescue situations, and now a Dutch student is showing people how the devices, which can weigh under 5 pounds, could be a game-changer in medical emergencies.

Alex Momont, an engineer at the Technical University of Delft, has created an airborne defibrillator-delivery system that can reach anyone within a five-square-mile area in less than two minutes. The school has posted this remarkable video showing how it works:

The so-called “Ambulance Drone” was the result of Momont’s Master’s thesis research. On his website, he likens the project to a medical toolbox:

“The first minutes after an accident are critical and essential to provide the right care to prevent escalation. Speeding up emergency response can prevent deaths and accelerate recovery dramatically. This is notably true…

View original 160 more words

PREVIEW: How fast are FindFirstFile/FindFirstFileEx, and CFileFind – actually?

•September 28, 2014

I have a post in the works about the performance of enumerating a directory with FindFirstFile/FindFirstFileEx, and CFileFind. I also investigate the various performance “tricks” (more like myths) used to speed these APIs up.

 

[Image: HowFastAre_preview (sneak peek)]

 

Two key findings:

  1. They’re actually fairly – but not terribly – fast
  2. FIND_FIRST_EX_LARGE_FETCH doesn’t do what you think it does.

An INVALID_POINTER_READ_EXPLOITABLE (buffer overrun) in Notepad++

•August 17, 2014

Earlier this week I tracked down an insidious bug in Notepad++.

Continue reading ‘An INVALID_POINTER_READ_EXPLOITABLE (buffer overrun) in Notepad++’

CrashPlan log categories

•July 1, 2014

I’m a very happy customer of CrashPlan. Offsite backup is a critical component of any backup plan!

Without advanced¹ filesystems² like btrfs³, maintaining up-to-date backups is an arduous task. CrashPlan’s fire-and-forget nature lifts that weight from my shoulders, freeing my mind & time. Better yet, CrashPlan supports Windows & Linux.

However, like many large-scale cross-platform programs, it’s far from perfect. There are many cases where certain files fail to back up, scanning for files slows the entire computer to a grinding halt, backups take longer than they should, file upload does not fully utilize available bandwidth, or memory usage seems inordinate.

Fortunately, CrashPlan has a mature logging infrastructure. Code42 provides some insight on their website (mirror). If you investigate these logs, you’ll notice that the entries (a) are marked with a logging “level” (ERROR, WARN, INFO, DEBUG, TRACE, ALL, OFF), and (b) are categorized.

For (a), the CrashPlan PROe “ADMINISTRATION CONSOLE COMMAND-LINE INTERFACE OVERVIEW” (mirror) suggests that the levels are actually [Error, Warn, Info, Fine, Trace], but I’ve never seen ‘Fine’ in the home edition.

For (b), the aforementioned document says only “The complete list of options is available by contacting our Customer Champions.”

Continue reading ‘CrashPlan log categories’

Goals: The Intended Outcomes of Higher Education

•June 27, 2014

This chapter, written by Howard R. Bowen in “Foundations of American Higher Education”, is a brilliant read.

Marx sought to change the world through changing social institutions, Jesus through changing the hearts of men. Higher education tries to do both.

Update: The Windows Phone app for WordPress makes no distinction between “save” and “post”. Here’s the chapter: Goals: The Intended Outcomes of Higher Education

Make VC++ Compiles Fast Through Parallel Compilation

•April 16, 2014

Alexander Riccio:

Random ASCII always writes brilliant in-depth analyses!

Originally posted on Random ASCII:

The free lunch is over and our CPUs are not getting any faster so if you want faster builds then you have to do parallel builds. Visual Studio supports parallel compilation but it is poorly understood and often not even enabled.

I want to show how, on a humble four-core laptop, enabling parallel compilation can give an actual four-times build speed improvement. I will also show how to avoid some of the easy mistakes that can significantly reduce VC++ compile parallelism and throughput. And, as a geeky side-effect, I’ll explain some details of how VC++’s parallel compilation works.

Plus, pretty pictures.

View original 3,184 more words

“destroyed in a heartbeat”

•April 15, 2014

I recently stumbled across this Slashdot article (mirrored) where, in the comments, MadX says:

*If* such a mechanism was coded in, the nature of open source would mean it would be found by others. This in turn would compromise the trust of the ENTIRE kernel. That trust can take years to build up – but be detroyed in a heartbeat.

Now that has a special irony.

Heartbleed?

“detroyed in a heartbeat”….or a heartbleed?

 