- DelphiTools - https://www.delphitools.info -

Efficient File Enumeration

Delphi offers two ways of enumerating files in a directory and its sub-directories: the first is the classic (and buggy) FindFirst/FindNext, the second is IOUtils TDirectory.GetFiles, which is not very efficient.

Here is why and how I implemented DWScript [1]’s dwsXPlatform.CollectFiles, and a tip about getting a small system-wide boost as a bonus.

8dot3 file naming

The old 8dot3 naming [2] convention dating back to the DOS ancestry of Windows has been obsolete for a while, but it’s still likely to cost you time… or trouble.

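It affects both Delphi methods negatively, because the underlying Windows API function they use (FindFirstFile [3]) is obsolete as well, and obsolete in two ways: it always retrieves the 8dot3 name of every file, and it matches the search mask against both the long name and the 8dot3 name.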

In the case of FindFirst, it means that if you search for ‘*.dpr’, you’ll get .dproj files as well, because the 8dot3 alias of a .dproj file has its extension truncated to .DPR.
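If you stay with FindFirst/FindNext, one way to work around the false positives is to re-check the extension of every result. The following is a minimal sketch, not code from DWScript; ListProjects is a hypothetical helper and the unit names are the unscoped ones used by Delphi XE.

```delphi
program FindFirstFilterSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: lists the *.dpr files of a single directory,
// rejecting entries that only matched through their 8dot3 alias.
procedure ListProjects(const ADir: string; AList: TStrings);
var
  SR: TSearchRec;
  Dir: string;
begin
  Dir := IncludeTrailingPathDelimiter(ADir);
  if FindFirst(Dir + '*.dpr', faAnyFile, SR) = 0 then
  try
    repeat
      // Skip directories, then compare the real (long) extension
      if ((SR.Attr and faDirectory) = 0)
         and SameText(ExtractFileExt(SR.Name), '.dpr') then
        AList.Add(Dir + SR.Name);
    until FindNext(SR) <> 0;
  finally
    FindClose(SR);
  end;
end;

var
  Projects: TStringList;
begin
  Projects := TStringList.Create;
  try
    ListProjects(GetCurrentDir, Projects);
    Writeln(Projects.Text);
  finally
    Projects.Free;
  end;
end.
```

This only handles a plain extension test; arbitrary wildcards need a real mask matcher.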

TDirectory.GetFiles solves the filtering by doing it Delphi-side with TMask from the Masks unit. TMask uses a quite efficiently implemented state machine, but IOUtils invokes it through the MatchesMask function, which creates and destroys a TMask every single time…
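The cost difference is easy to avoid in your own code. Here is a minimal sketch, assuming the TMask class from the Masks unit (System.Masks in newer Delphi versions); FilterNames is a hypothetical helper, not part of IOUtils or DWScript. It builds the mask once and reuses it for every name, instead of calling MatchesMask in the loop.

```delphi
program MaskReuseSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, Masks;

// Hypothetical helper: keeps the names that match AMask.
// The TMask state machine is built once, outside of the loop.
procedure FilterNames(const ANames: array of string; const AMask: string;
  AResult: TStrings);
var
  Mask: TMask;
  I: Integer;
begin
  Mask := TMask.Create(AMask);
  try
    for I := Low(ANames) to High(ANames) do
      if Mask.Matches(ANames[I]) then
        AResult.Add(ANames[I]);
  finally
    Mask.Free;
  end;
end;

var
  Kept: TStringList;
begin
  Kept := TStringList.Create;
  try
    FilterNames(['Project1.dpr', 'Project1.dproj', 'Unit1.pas'], '*.dpr', Kept);
    // Only Project1.dpr is expected to be kept
    Writeln(Kept.Text);
  finally
    Kept.Free;
  end;
end.
```

Calling MatchesMask(Name, '*.dpr') inside the loop would give the same result, but with one TMask allocation and one mask parse per file name.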

The internal logic of IOUtils is also quite complex and heavy-weight (with anonymous procedures, implicit exception frames, implicit conversions and generally redundant code), and the GetFiles implementation doesn’t scale well, as it relies on a dynamic array as return value (FastMM mitigates the issue, but not entirely).

So in practice, if you’ve got a fast SSD or if everything is in the Windows file system memory cache, IOUtils will be the bottleneck, not the file system.


Getting around the 8dot3 names

To avoid the 8dot3 names overhead, the Windows API function to use is FindFirstFileEx [6], available since WinXP and Win2003, which lets you specify that you don’t care about 8dot3 names through the FindExInfoBasic option (the option itself is only supported from Windows 7 and Server 2008 R2 onward).

Note that this won’t solve the filtering issue, so Delphi-side masking is still necessary.
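As an illustration, here is a minimal sketch of a recursive enumeration built on FindFirstFileEx with FindExInfoBasic and a Delphi-side TMask. It is not the actual dwsXPlatform.CollectFiles code: CollectInto, FindFirstFileExRaw and the two constants are names made up for this example, and the function is imported locally with plain DWORD parameters on the assumption that the Windows unit of your Delphi version may not declare FindExInfoBasic. It assumes a Unicode Delphi (2009 or later). FindExInfoBasic is rejected by Windows versions older than Windows 7, so real code would need a fallback to the standard info level.

```delphi
program BasicInfoEnumSketch;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes, Masks;

const
  // Values from the Windows SDK enums, declared locally (assumption: the
  // Windows unit of older Delphi versions does not expose FindExInfoBasic)
  FIND_EX_INFO_BASIC        = 1; // FindExInfoBasic: no 8dot3 name retrieval
  FIND_EX_SEARCH_NAME_MATCH = 0; // FindExSearchNameMatch

// Local import under a hypothetical name, enum parameters passed as DWORD
function FindFirstFileExRaw(lpFileName: PWideChar; fInfoLevelId: DWORD;
  lpFindFileData: Pointer; fSearchOp: DWORD; lpSearchFilter: Pointer;
  dwAdditionalFlags: DWORD): THandle; stdcall;
  external kernel32 name 'FindFirstFileExW';

// ADir must end with a path delimiter. AMask is created once by the caller.
// Note: reparse points are followed blindly in this sketch.
procedure CollectInto(const ADir: string; AMask: TMask; AList: TStrings);
var
  Data: TWin32FindDataW;
  Handle: THandle;
  Name: string;
begin
  Handle := FindFirstFileExRaw(PWideChar(ADir + '*'), FIND_EX_INFO_BASIC,
    @Data, FIND_EX_SEARCH_NAME_MATCH, nil, 0);
  if Handle = INVALID_HANDLE_VALUE then
    Exit;
  try
    repeat
      Name := Data.cFileName;
      if (Data.dwFileAttributes and FILE_ATTRIBUTE_DIRECTORY) <> 0 then
      begin
        // Recurse into sub-directories, skipping the '.' and '..' entries
        if (Name <> '.') and (Name <> '..') then
          CollectInto(ADir + Name + PathDelim, AMask, AList);
      end
      else if AMask.Matches(Name) then
        AList.Add(ADir + Name); // masking is done Delphi-side, on long names
    until not FindNextFileW(Handle, Data);
  finally
    Windows.FindClose(Handle); // qualified: SysUtils also has a FindClose
  end;
end;

var
  Files: TStringList;
  Mask: TMask;
begin
  Files := TStringList.Create;
  try
    Mask := TMask.Create('*.pas');
    try
      CollectInto(IncludeTrailingPathDelimiter(GetCurrentDir), Mask, Files);
    finally
      Mask.Free;
    end;
    Writeln(Files.Count, ' file(s) found');
  finally
    Files.Free;
  end;
end.
```

Appending to a caller-supplied TStrings rather than returning a dynamic array is a design choice of this sketch, made to avoid repeated array reallocations.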

Interestingly enough, you can also get rid of 8dot3 names at the volume level; Jose Barreto’s blog post [7] describes the process.

This won’t just speed up file enumeration, but will also drastically speed up other file system operations like creating or moving lots of files.

Whether the OS will generate 8dot3 names is defined by both a system (registry) and a volume setting. To check if it’s active on a volume you can use the FSUTIL command:

FSUTIL.EXE 8dot3name query D:

It’ll tell you if the setting is active in the registry and for the volume. You can turn it off for the volume with

FSUTIL.EXE 8dot3name set D: 1

This will only affect new files. For existing 8dot3 file name aliases, you can strip them with

FSUTIL.EXE 8dot3name strip /s /v D:\

That will make your volume incompatible with prehistoric software though, and while the newer server versions of Windows have 8dot3 names off by default for new volumes, you can’t rely on them being off in the wild.

Final remarks

FindFirstFileEx also supports the FIND_FIRST_EX_LARGE_FETCH option, which is described in the WinAPI documentation as increasing performance when there are many files. But in my testing, I couldn’t find any case in which it was beneficial, and it even decreased performance when there were few files to be enumerated.

Another option I investigated was FindExSearchLimitToDirectories, which is said to be an advisory flag to only enumerate directories. It’s said to work only on some file systems, but I couldn’t find any on which it did work.

When all is said and done, I’ve found dwsXPlatform [8].CollectFiles can be from two to ten times faster than the XE version of TDirectory.GetFiles. The lower ratio is when everything is already in cache and the CPU is the limiting factor, and the higher ratio is on busy volumes where 8dot3 names are active.