How can I sanitize a string for use as a filename?

Accepted answer
Score: 24

You can use the PathGetCharType function, the PathCleanupSpec function, or the following trick:

  function IsValidFilePath(const FileName: String): Boolean;
  var
    S: String;
    I: Integer;
  begin
    Result := False;
    S := FileName;
    repeat
      I := LastDelimiter('\/', S);
      MoveFile(nil, PChar(S));
      if (GetLastError = ERROR_ALREADY_EXISTS) or
         (
           (GetFileAttributes(PChar(Copy(S, I + 1, MaxInt))) = INVALID_FILE_ATTRIBUTES)
           and
           (GetLastError=ERROR_INVALID_NAME)
         ) then
        Exit;
      if I>0 then
        S := Copy(S,1,I-1);
    until I = 0;
    Result := True;
  end;

This code divides the string into parts and uses MoveFile to verify each part. MoveFile will fail for invalid characters or reserved file names (like 'COM') and return success or ERROR_ALREADY_EXISTS for a valid file name.


PathCleanupSpec is in the Jedi Windows API under Win32API/JwaShlObj.pas

Score: 12

Regarding the question of whether there is any API function to sanitize a file name (or even check its validity): there seems to be none. Quoting from the comment on the PathSearchAndQualify() function:

There does not appear to be any Windows API that will validate a path entered by the user; this is left as an ad hoc exercise for each application.

So you can only consult the rules for file name validity from File Names, Paths, and Namespaces (Windows):

  • Use almost any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:

    • The following reserved characters are not allowed:
      < > : " / \ | ? *
    • Characters whose integer representations are in the range from zero through 31 are not allowed.
    • Any other character that the target file system does not allow.
  • Do not use the following reserved device names for the name of a file: CON, PRN, AUX, NUL, COM1..COM9, LPT1..LPT9.
    Also avoid these names followed immediately by an extension; for example, NUL.txt is not recommended.

If you know that your program will only ever write to NTFS file systems, you can probably be sure that there are no other characters that the file system does not allow, so you would only have to check that the file name is not too long (use the MAX_PATH constant) after all invalid chars have been removed (or replaced by underscores, for example).

A program should also make sure that file name sanitizing has not led to file name conflicts, so that it does not silently overwrite other files which ended up with the same name.
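
The rules above could be sketched roughly as follows. This is only an illustration, assuming a Unicode Delphi (2009 or later); the names IsReservedDeviceName, SanitizeFileName, NextFreeFileName and MaxLen are hypothetical helpers made up for this example, not part of the RTL or the Windows API.

```pascal
uses
  System.SysUtils;

const
  // Assumption: 260 mirrors the Windows MAX_PATH value. The limit really
  // applies to the full path, so the caller must account for the directory.
  MaxLen = 260;

// Hypothetical helper: True if Name, ignoring any extension, is one of
// CON, PRN, AUX, NUL, COM1..COM9, LPT1..LPT9.
function IsReservedDeviceName(const Name: string): Boolean;
const
  Devices: array[0..3] of string = ('CON', 'PRN', 'AUX', 'NUL');
var
  Base: string;
  I, DotPos: Integer;
begin
  DotPos := Pos('.', Name);
  if DotPos > 0 then
    Base := Copy(Name, 1, DotPos - 1)
  else
    Base := Name;
  Base := UpperCase(Trim(Base));
  Result := False;
  for I := Low(Devices) to High(Devices) do
    if Base = Devices[I] then
      Exit(True);
  if (Length(Base) = 4) and CharInSet(Base[4], ['1'..'9']) then
    Result := (Copy(Base, 1, 3) = 'COM') or (Copy(Base, 1, 3) = 'LPT');
end;

// Hypothetical helper: replaces reserved and control characters with '_',
// prefixes reserved device names, then truncates to MaxLen characters.
// Note that truncation may cut off the extension.
function SanitizeFileName(const Name: string): string;
const
  Reserved = '<>:"/\|?*';
var
  I: Integer;
begin
  Result := Name;
  for I := 1 to Length(Result) do
    if (Result[I] < #32) or (Pos(Result[I], Reserved) > 0) then
      Result[I] := '_';
  if IsReservedDeviceName(Result) then
    Result := '_' + Result;
  if Length(Result) > MaxLen then
    SetLength(Result, MaxLen);
end;

// Hypothetical helper: appends " (2)", " (3)", ... before the extension
// until no file of that name exists in Dir, to avoid silent overwrites.
function NextFreeFileName(const Dir, Name: string): string;
var
  N: Integer;
begin
  Result := Name;
  N := 1;
  while FileExists(IncludeTrailingPathDelimiter(Dir) + Result) do
  begin
    Inc(N);
    Result := ChangeFileExt(Name, '') + ' (' + IntToStr(N) + ')' +
      ExtractFileExt(Name);
  end;
end;
```

For example, SanitizeFileName('a*b?c.txt') gives 'a_b_c.txt' and SanitizeFileName('NUL.txt') gives '_NUL.txt'. The FileExists check is not atomic, so another process could still create the file between the check and the write.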

Score: 9
{
  CleanFileName
  ---------------------------------------------------------------------------

  Given an input string strip any chars that would result
  in an invalid file name.  This should just be passed the
  filename not the entire path because the slashes will be
  stripped.  The function ensures that the resulting string
  does not have multiple spaces together and does not start
  or end with a space.  If the entire string is removed the
  result would not be a valid file name so an error is raised.

}

function CleanFileName(const InputString: string): string;
var
  i: integer;
  ResultWithSpaces: string;
begin

  ResultWithSpaces := InputString;

  for i := 1 to Length(ResultWithSpaces) do
  begin
    // These chars are invalid in file names.
    case ResultWithSpaces[i] of 
      '/', '\', ':', '*', '?', '"', '<', '>', '|', ' ', #$D, #$A, #9:
        // Use a * to indicate a duplicate space so we can remove
        // them at the end.
        {$WARNINGS OFF} // W1047 Unsafe code 'String index to var param'
        if (i > 1) and
          ((ResultWithSpaces[i - 1] = ' ') or (ResultWithSpaces[i - 1] = '*')) then
          ResultWithSpaces[i] := '*'
        else
          ResultWithSpaces[i] := ' ';

        {$WARNINGS ON}
    end;
  end;

  // A * indicates duplicate spaces.  Remove them.
  result := ReplaceStr(ResultWithSpaces, '*', '');

  // Also trim any leading or trailing spaces
  result := Trim(Result);

  if result = '' then
  begin
    raise(Exception.Create('Resulting FileName was empty Input string was: '
      + InputString));
  end;
end;

Score: 4

For anyone else reading this and wanting to use PathCleanupSpec, I wrote this test routine, which seems to work; there is a definite lack of examples on the 'net. You need to include ShlObj.pas (not sure when PathCleanupSpec was added, but I tested this in Delphi 2010). You will also need to check for XP SP2 or higher.

procedure TMainForm.btnTestClick(Sender: TObject);
var
  Path: array [0..MAX_PATH - 1] of WideChar;
  Filename: array[0..MAX_PATH - 1] of WideChar;
  ReturnValue: integer;
  DebugString: string;

begin
  StringToWideChar('a*dodgy%\filename.$&^abc',FileName, MAX_PATH);
  StringToWideChar('C:\',Path, MAX_PATH);
  ReturnValue:= PathCleanupSpec(Path,Filename);
  DebugString:= ('Cleaned up filename:'+Filename+#13+#10);
  if (ReturnValue and $80000000)=$80000000 then
    DebugString:= DebugString+'Fatal result. The cleaned path is not a valid file name'+#13+#10;
  if (ReturnValue and $00000001)=$00000001 then
    DebugString:= DebugString+'Replaced one or more invalid characters'+#13+#10;
  if (ReturnValue and $00000002)=$00000002 then
    DebugString:= DebugString+'Removed one or more invalid characters'+#13+#10;
  if (ReturnValue and $00000004)=$00000004 then
    DebugString:= DebugString+'The returned path is truncated'+#13+#10;
  if (ReturnValue and $00000008)=$00000008 then
    DebugString:= DebugString+'The input path specified at pszDir is too long to allow the formation of a valid file name from pszSpec'+#13;
  ShowMessage(DebugString);
end;
Score: 2

Well, the easy thing is to use a regex and your favourite language's version of gsub to replace anything that's not a "word character." This character class would be "\w" in most languages with Perl-like regexes, or "[A-Za-z0-9]" as a simple option otherwise.

Particularly, in contrast to some of the examples in other answers, you don't want to look for invalid characters to remove, but look for valid characters to keep. If you're looking for invalid characters, you're always vulnerable to the introduction of new characters, but if you're looking for only valid ones, you might be slightly less efficient (in that you replaced a character you didn't really need to), but at least you'll never be wrong.

Now, if you want to make the new version as much like the old as possible, you might consider replacement. Instead of deleting, you can substitute a character or characters you know to be ok. But doing that is an interesting enough problem that it's probably a good topic for another question.
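
In Delphi, the "keep only valid characters" idea might look something like this. It is a minimal sketch, assuming Delphi XE or later for System.RegularExpressions; the function name KeepAllowedChars is made up for this example, and the allow-list (ASCII letters, digits, underscore, dot, hyphen) is an assumption that goes slightly beyond the "[A-Za-z0-9]" class mentioned above.

```pascal
uses
  System.RegularExpressions;

// Hypothetical helper: replaces every character that is NOT on the
// allow-list with an underscore, rather than hunting for invalid ones.
function KeepAllowedChars(const Name: string): string;
begin
  // [^...] negates the class; the trailing '-' is a literal hyphen.
  Result := TRegEx.Replace(Name, '[^A-Za-z0-9_.-]', '_');
end;
```

For example, KeepAllowedChars('a*b c.txt') gives 'a_b_c.txt'. Accented and other non-ASCII letters are replaced as well, which is the price of the strict allow-list.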

Score: 2
// for all platforms (Windows\Unix), uses IOUtils.
function ReplaceInvalidFileNameChars(const aFileName: string; const aReplaceWith: Char = '_'): string;
var
  i: integer;
begin
  Result := aFileName;
  for i := Low(Result) to High(Result) do
  begin
    if not TPath.IsValidFileNameChar(Result[i]) then
      Result[i] := aReplaceWith;
  end;
end;

Score: 0

Try this on a modern Delphi:

 uses System.IOUtils;
 ...
 Result := TPath.HasValidFileNameChars(FileName, False);

It also allows German umlauts and other characters such as - and _ in a file name.

Score: 0

Use this function; it works fine for me. Its purpose is to return ONE level of directory name.

uses ShlObj...

function  CleanDirName(DirFileName : String) : String;
var
  CheckStr : String;
  Path: array [0..MAX_PATH - 1] of WideChar;
  Filename: array[0..MAX_PATH - 1] of WideChar;
  ReturnValue: integer;

begin
  //--     The following are considered invalid characters in all names.
  //--     \ / : * ? " < > |

  CheckStr := Trim(DirFileName);
  CheckStr := StringReplace(CheckStr,'/','-',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'\','-',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'.','-',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,':',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'?',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'<',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'>',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'|',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'!',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'~',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'+',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'=',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,')',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'(',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'*',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'&',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'^',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'%',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'$',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'#',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'@',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'{',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'}',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'"',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,';',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,',',' ',[rfReplaceAll, rfIgnoreCase]);

  // Apostrophes are removed entirely
  CheckStr := StringReplace(CheckStr,'''','',[rfReplaceAll, rfIgnoreCase]);

  StringToWideChar(CheckStr,FileName, MAX_PATH);
  StringToWideChar('C:\',Path, MAX_PATH);
  ReturnValue:= PathCleanupSpec(Path,Filename);

  // A string cannot be assigned back to the static WideChar array,
  // so collapse double spaces directly into the result.
  Result := StringReplace(string(Filename),'  ',' ',[rfReplaceAll, rfIgnoreCase]);
end;
