Is there a faster way to scan through a directory recursively in .NET?

Accepted answer
Score: 42

This implementation, which needs a bit of tweaking, is 5-10x faster.

    static List<Info> RecursiveScan2(string directory) {
        IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);
        WIN32_FIND_DATAW findData;
        IntPtr findHandle = INVALID_HANDLE_VALUE;

        var info = new List<Info>();
        try {
            findHandle = FindFirstFileW(directory + @"\*", out findData);
            if (findHandle != INVALID_HANDLE_VALUE) {

                do {
                    if (findData.cFileName == "." || findData.cFileName == "..") continue;

                    string fullpath = directory + (directory.EndsWith("\\") ? "" : "\\") + findData.cFileName;

                    bool isDir = false;

                    if ((findData.dwFileAttributes & FileAttributes.Directory) != 0) {
                        isDir = true;
                        info.AddRange(RecursiveScan2(fullpath));
                    }

                    info.Add(new Info()
                    {
                        CreatedDate = findData.ftCreationTime.ToDateTime(),
                        ModifiedDate = findData.ftLastWriteTime.ToDateTime(),
                        IsDirectory = isDir,
                        Path = fullpath
                    });
                }
                while (FindNextFile(findHandle, out findData));

            }
        } finally {
            if (findHandle != INVALID_HANDLE_VALUE) FindClose(findHandle);
        }
        return info;
    }

The extension method:

    public static class FILETIMEExtensions {
        public static DateTime ToDateTime(this System.Runtime.InteropServices.ComTypes.FILETIME filetime) {
            // Treat the low dword as unsigned: casting it straight to long
            // sign-extends whenever its top bit is set and corrupts the time.
            long highBits = (long)filetime.dwHighDateTime << 32;
            return DateTime.FromFileTimeUtc(highBits | (uint)filetime.dwLowDateTime);
        }
    }

The interop definitions are:

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    public static extern IntPtr FindFirstFileW(string lpFileName, out WIN32_FIND_DATAW lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode)]
    public static extern bool FindNextFile(IntPtr hFindFile, out WIN32_FIND_DATAW lpFindFileData);

    [DllImport("kernel32.dll")]
    public static extern bool FindClose(IntPtr hFindFile);

    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    public struct WIN32_FIND_DATAW {
        public FileAttributes dwFileAttributes;
        internal System.Runtime.InteropServices.ComTypes.FILETIME ftCreationTime;
        internal System.Runtime.InteropServices.ComTypes.FILETIME ftLastAccessTime;
        internal System.Runtime.InteropServices.ComTypes.FILETIME ftLastWriteTime;
        public int nFileSizeHigh;
        public int nFileSizeLow;
        public int dwReserved0;
        public int dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
        public string cAlternateFileName;
    }

Score: 11

There is a long history of the .NET file enumeration methods being slow. The issue is that there is no instantaneous way of enumerating large directory structures. Even the accepted answer here has its issues with GC allocations.

The best I've been able to do is wrapped up in my library and exposed as the FindFile (source) class in the CSharpTest.Net.IO namespace. This class can enumerate files and folders without unneeded GC allocations and string marshaling.

The usage is simple enough, and the RaiseOnAccessDenied property will skip the directories and files the user does not have access to:

    private static long SizeOf(string directory)
    {
        var fcounter = new CSharpTest.Net.IO.FindFile(directory, "*", true, true, true);
        fcounter.RaiseOnAccessDenied = false;

        long size = 0, total = 0;
        fcounter.FileFound +=
            (o, e) =>
            {
                if (!e.IsDirectory)
                {
                    Interlocked.Increment(ref total);
                    size += e.Length;
                }
            };

        Stopwatch sw = Stopwatch.StartNew();
        fcounter.Find();
        Console.WriteLine("Enumerated {0:n0} files totaling {1:n0} bytes in {2:n3} seconds.",
                          total, size, sw.Elapsed.TotalSeconds);
        return size;
    }

For my local C:\ drive this outputs the following:

Enumerated 810,046 files totaling 307,707,792,662 bytes in 232.876 seconds.

Your mileage may vary by drive speed, but this is the fastest method I've found of enumerating files in managed code. The event parameter is a mutating class of type FindFile.FileFoundEventArgs, so be sure you do not keep a reference to it, as its values will change for each event raised.

You might also note that the DateTimes exposed are only in UTC. The reason is that the conversion to local time is semi-expensive. You might consider using UTC times to improve performance rather than converting these to local time.

Score: 5

Depending on how much time you're trying to shave off the function, it may be worth your while to call the Win32 API functions directly, since the existing API does a lot of extra processing to check things that you may not be interested in.

If you haven't done so already, and assuming you don't intend to contribute to the Mono project, I would strongly recommend downloading Reflector and having a look at how Microsoft implemented the API calls you're currently using. This will give you an idea of what you need to call and what you can leave out.

You might, for example, opt to create an iterator that yields directory names instead of a function that returns a list; that way you don't end up iterating over the same list of names two or three times through all the various levels of code.
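A lazy, yield-based walk along those lines might look like the following sketch. It uses the managed Directory API rather than the raw Win32 calls, and the class and method names are illustrative, not from the question's code:

```csharp
using System.Collections.Generic;
using System.IO;

static class DirectoryWalker
{
    // Lazily yields the root and every subdirectory beneath it,
    // without materializing a full list at each recursion level.
    public static IEnumerable<string> EnumerateDirectories(string root)
    {
        var stack = new Stack<string>();
        stack.Push(root);
        while (stack.Count > 0)
        {
            string current = stack.Pop();
            yield return current;
            foreach (string child in Directory.GetDirectories(current))
                stack.Push(child);
        }
    }
}
```

Callers can then filter or count while the walk is still in progress, instead of iterating a completed list a second time at each level.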

Score: 4

I just ran across this. Nice implementation of the native version.

This version, while still slower than the version that uses FindFirst and FindNext, is quite a bit faster than your original .NET version.

    static List<Info> RecursiveMovieFolderScan(string path)
    {
        var info = new List<Info>();
        var dirInfo = new DirectoryInfo(path);
        foreach (var entry in dirInfo.GetFileSystemInfos())
        {
            bool isDir = (entry.Attributes & FileAttributes.Directory) != 0;
            if (isDir)
            {
                info.AddRange(RecursiveMovieFolderScan(entry.FullName));
            }
            info.Add(new Info()
            {
                IsDirectory = isDir,
                CreatedDate = entry.CreationTimeUtc,
                ModifiedDate = entry.LastWriteTimeUtc,
                Path = entry.FullName
            });
        }
        return info;
    }

It should produce the same output as your native version. My testing shows that this version takes about 1.7 times as long as the version that uses FindFirst and FindNext. Timings were obtained in release mode, running without the debugger attached.

Curiously, changing the GetFileSystemInfos to EnumerateFileSystemInfos adds about 5% to the running time in my tests. I rather expected it to run at the same speed or possibly faster because it didn't have to create the array of FileSystemInfo objects.

The following code is shorter still, because it lets the Framework take care of recursion. But it's a good 15% to 20% slower than the version above.

    static List<Info> RecursiveScan3(string path)
    {
        var info = new List<Info>();

        var dirInfo = new DirectoryInfo(path);
        foreach (var entry in dirInfo.EnumerateFileSystemInfos("*", SearchOption.AllDirectories))
        {
            info.Add(new Info()
            {
                IsDirectory = (entry.Attributes & FileAttributes.Directory) != 0,
                CreatedDate = entry.CreationTimeUtc,
                ModifiedDate = entry.LastWriteTimeUtc,
                Path = entry.FullName
            });
        }
        return info;
    }

Again, if you change that to GetFileSystemInfos, it will be slightly (but only slightly) faster.

For my purposes, the first solution above is quite fast enough. The native version runs in about 1.6 seconds. The version that uses DirectoryInfo runs in about 2.9 seconds. I suppose if I were running these scans very frequently, I'd change my mind.

Score: 2

It's pretty shallow: 371 dirs with an average of 10 files in each directory; some dirs contain other subdirs.

This is just a comment, but your numbers do appear to be quite high. I ran the below using essentially the same recursive method you are using, and my times are far lower despite creating string output.

    public void RecurseTest(DirectoryInfo dirInfo, 
                            StringBuilder sb, 
                            int depth)
    {
        _dirCounter++;
        if (depth > _maxDepth)
            _maxDepth = depth;

        var array = dirInfo.GetFileSystemInfos();
        foreach (var item in array)
        {
            sb.Append(item.FullName);
            if (item is DirectoryInfo)
            {
                sb.Append(" (D)");
                sb.AppendLine();

                RecurseTest(item as DirectoryInfo, sb, depth+1);
            }
            else
            { _fileCounter++; }

            sb.AppendLine();
        }
    }

I ran the above code on a number of different directories. On my machine the second call to scan a directory tree was usually faster due to caching, either by the runtime or the file system. Note that this system isn't anything too special, just a one-year-old development workstation.

// cached call
Dirs = 150, files = 420, max depth = 5
Time taken = 53 milliseconds

// cached call
Dirs = 1117, files = 9076, max depth = 11
Time taken = 433 milliseconds

// first call
Dirs = 1052, files = 5903, max depth = 12
Time taken = 11921 milliseconds

// first call
Dirs = 793, files = 10748, max depth = 10
Time taken = 5433 milliseconds (2nd run 363 milliseconds)

Concerned that I wasn't getting the create and modified dates, the code was modified to output these as well, with the following times.

// now grabbing last update and creation time.
Dirs = 150, files = 420, max depth = 5
Time taken = 103 milliseconds (2nd run 93 milliseconds)

Dirs = 1117, files = 9076, max depth = 11
Time taken = 992 milliseconds (2nd run 984 milliseconds)

Dirs = 793, files = 10748, max depth = 10
Time taken = 1382 milliseconds (2nd run 735 milliseconds)

Dirs = 1052, files = 5903, max depth = 12
Time taken = 936 milliseconds (2nd run 595 milliseconds)

Note: the System.Diagnostics.Stopwatch class was used for timing.

Score: 1

I'd use or base myself on this multi-threaded library: http://www.codeproject.com/KB/files/FileFind.aspx

Score: 1

I recently (2020) discovered this post because of a need to count files and directories across slow connections, and this was the fastest implementation I could come up with. The .NET enumeration methods (GetFiles(), GetDirectories()) perform a lot of under-the-hood work that slows them down tremendously by comparison.

This solution utilizes the Win32 API and .NET's Parallel.ForEach() to leverage the thread pool to maximize performance.

P/Invoke:

/// <summary>
/// https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-findfirstfilew
/// </summary>
[DllImport("kernel32.dll", SetLastError = true)]
public static extern IntPtr FindFirstFile(
    string lpFileName,
    ref WIN32_FIND_DATA lpFindFileData
    );

/// <summary>
/// https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-findnextfilew
/// </summary>
[DllImport("kernel32.dll", SetLastError = true)]
public static extern bool FindNextFile(
    IntPtr hFindFile,
    ref WIN32_FIND_DATA lpFindFileData
    );

/// <summary>
/// https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-findclose
/// </summary>
[DllImport("kernel32.dll", SetLastError = true)]
public static extern bool FindClose(
    IntPtr hFindFile
    );

Method:

public static Tuple<long, long> CountFilesDirectories(
    string path,
    CancellationToken token
    )
{
    if (String.IsNullOrWhiteSpace(path))
        throw new ArgumentNullException("path", "The provided path is NULL or empty.");

    // If the provided path doesn't end in a backslash, append one.
    if (path.Last() != '\\')
        path += '\\';

    IntPtr hFile = IntPtr.Zero;
    Win32.Kernel32.WIN32_FIND_DATA fd = new Win32.Kernel32.WIN32_FIND_DATA();

    long files = 0;
    long dirs = 0;

    try
    {
        hFile = Win32.Kernel32.FindFirstFile(
            path + "*", // Discover all files/folders by ending a directory with "*", e.g. "X:\*".
            ref fd
            );

        // If we encounter an error, or there are no files/directories, we return no entries.
        if (hFile.ToInt64() == -1)
            return Tuple.Create<long, long>(0, 0);

        //
        // Find (and count) each file/directory, then iterate through each directory in parallel to maximize performance.
        //

        List<string> directories = new List<string>();

        do
        {
            // If a directory (and not a Reparse Point), and the name is not "." or ".." which exist as concepts in the file system,
            // count the directory and add it to a list so we can iterate over it in parallel later on to maximize performance.
            if ((fd.dwFileAttributes & FileAttributes.Directory) != 0 &&
                (fd.dwFileAttributes & FileAttributes.ReparsePoint) == 0 &&
                fd.cFileName != "." && fd.cFileName != "..")
            {
                directories.Add(System.IO.Path.Combine(path, fd.cFileName));
                dirs++;
            }
            // Otherwise, if this is a file ("archive"), increment the file count.
            else if ((fd.dwFileAttributes & FileAttributes.Archive) != 0)
            {
                files++;
            }
        }
        while (Win32.Kernel32.FindNextFile(hFile, ref fd));

        // Iterate over each discovered directory in parallel to maximize file/directory counting performance,
        // calling itself recursively to traverse each directory completely.
        Parallel.ForEach(
            directories,
            new ParallelOptions()
            {
                CancellationToken = token
            },
            directory =>
            {
                var count = CountFilesDirectories(
                    directory,
                    token
                    );

                lock (directories)
                {
                    files += count.Item1;
                    dirs += count.Item2;
                }
            });
    }
    catch (Exception)
    {
        // Handle as desired.
    }
    finally
    {
        // Only close a valid search handle; INVALID_HANDLE_VALUE (-1) must not be passed to FindClose.
        if (hFile != IntPtr.Zero && hFile.ToInt64() != -1)
            Win32.Kernel32.FindClose(hFile);
    }

    return Tuple.Create<long, long>(files, dirs);
}

On my local system, the performance of GetFiles()/GetDirectories() can be close to this, but across slower connections (VPNs, etc.) I found that this is tremendously faster: 45 minutes vs. 90 seconds to access a remote directory of ~40k files, ~40 GB in size.

This can also fairly easily be modified to include other data, like the total file size of all files counted, or rapidly recursing through and deleting empty directories, starting at the furthest branch.
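For the total-size variant, the find data's two 32-bit size halves need combining into one 64-bit value. A minimal sketch (the field names follow the WIN32_FIND_DATA layout used above; the helper class and method names are mine):

```csharp
static class FindDataSize
{
    // Combine nFileSizeHigh/nFileSizeLow from WIN32_FIND_DATA into a
    // 64-bit length. The low half must be treated as unsigned, or
    // sizes with the low dword's top bit set sign-extend to garbage.
    public static long Combine(int nFileSizeHigh, int nFileSizeLow)
    {
        return ((long)nFileSizeHigh << 32) | (uint)nFileSizeLow;
    }
}
```

The accumulated size would then be added to the per-directory totals the same way the file count is.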

Score: 0

Try this (i.e. do the initialization first, and then reuse your list and your DirectoryInfo objects):

  static List<Info> RecursiveMovieFolderScan1(string path) {
      var info = new List<Info>();
      var dirInfo = new DirectoryInfo(path);
      RecursiveMovieFolderScan(dirInfo, info);
      return info;
  } 

  static List<Info> RecursiveMovieFolderScan(DirectoryInfo dirInfo, List<Info> info){

    foreach (var dir in dirInfo.GetDirectories()) {

        info.Add(new Info() {
            IsDirectory = true,
            CreatedDate = dir.CreationTimeUtc,
            ModifiedDate = dir.LastWriteTimeUtc,
            Path = dir.FullName
        });

        RecursiveMovieFolderScan(dir, info);
    }

    foreach (var file in dirInfo.GetFiles()) {
        info.Add(new Info()
        {
            IsDirectory = false,
            CreatedDate = file.CreationTimeUtc,
            ModifiedDate = file.LastWriteTimeUtc,
            Path = file.FullName
        });
    }

    return info; 
}
Score: 0

Recently I had the same question. I think it is also good to output all folders and files into a text file, and then use a StreamReader to read the text file back and process it with multiple threads.

cmd.exe /u /c dir "M:\" /s /b >"c:\flist1.txt"
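Reading the dump back can then be spread across the thread pool. A minimal sketch (the listing path comes from the command above; the per-line work is a placeholder):

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

class ListingReader
{
    static void Main()
    {
        long count = 0;
        // File.ReadLines streams the listing lazily; Parallel.ForEach
        // fans the per-path work out over the thread pool.
        Parallel.ForEach(File.ReadLines(@"c:\flist1.txt"), line =>
        {
            // ... per-path processing goes here ...
            Interlocked.Increment(ref count);
        });
        Console.WriteLine("{0} paths processed", count);
    }
}
```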

[Update] Hi Moby, you are correct. My approach is slower due to the overhead of reading back the output text file. I actually took some time to test the top answer and cmd.exe with 2 million files.

The top answer: 2010100 files, time: 53023
cmd.exe method: 2010100 files, cmd time: 64907, scan output file time: 19832.

The top answer's method (53023) is faster than cmd.exe (64907), even before considering how to improve reading the output text file. My original point was just to provide a not-too-bad alternative; I still feel sorry, ha.
