[ACCEPTED]-Weird backslash substitution in Ruby-backslash

Accepted answer
Score: 73

Quick Answer

If you want to sidestep all this confusion, use the much less confusing block syntax. Here is an example that replaces each backslash with two backslashes:

"some\\path".gsub('\\') { '\\\\' }

Gruesome Details

The problem is that when using sub (and gsub) without a block, Ruby interprets special character sequences in the replacement parameter. Unfortunately, sub uses the backslash as the escape character for these:

\& (the entire match)
\+ (the last group)
\` (pre-match string)
\' (post-match string)
\0 (same as \&)
\1 (first captured group)
\2 (second captured group)
\\ (a backslash)
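
As a rough illustration of a few of the sequences above, here is a minimal sketch. The sample string and pattern are made up for the example, and single-quoted literals are used where possible so that Ruby's own literal parsing leaves the backslash sequences intact for sub to interpret:

```ruby
s = "foo-bar"
m = /(o+)-(b)/   # matches "oo-b"; pre-match is "f", post-match is "ar"

whole  = s.sub(m, '[\&]')    # entire match         => "f[oo-b]ar"
first  = s.sub(m, '[\1]')    # first captured group => "f[oo]ar"
second = s.sub(m, '[\2]')    # second captured group => "f[b]ar"
pre    = s.sub(m, '[\`]')    # pre-match string     => "f[f]ar"
# A double-quoted literal is used here because \' inside single quotes
# would be consumed by Ruby as an escaped quote.
post   = s.sub(m, "[\\']")   # post-match string    => "f[ar]ar"

puts whole, first, second, pre, post
```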

Like any escaping, this creates an obvious problem. If you want to include the literal value of one of the above sequences (e.g. \1) in the output string, you have to escape it. So, to get Hello \1, you need the replacement string to be Hello \\1. And to represent this as a string literal in Ruby, you have to escape those backslashes again, like this: "Hello \\\\1"

So, there are two different escaping passes. The first one takes the string literal and creates the internal string value. The second takes that internal string value and replaces the sequences above with the matching data.
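
The two passes can be watched separately in a small sketch like the following. The subject string "x" and the variable names are only illustrative:

```ruby
# Pass 1 (Ruby literal parsing): four backslashes in the source become
# two backslashes in the internal string value, i.e. Hello \\1
replacement = "Hello \\\\1"
puts replacement.length        # 9 characters: "Hello " + 2 backslashes + "1"

# Pass 2 (sub's own parsing): \\ becomes one backslash, and the 1 that
# follows is now just a plain character.
escaped = "x".sub(/(x)/, replacement)
puts escaped                   # Hello \1

# With only two backslashes in the source, the internal value is Hello \1,
# which sub treats as a reference to the first captured group.
unescaped = "x".sub(/(x)/, "Hello \\1")
puts unescaped                 # Hello x
```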

If a backslash is not followed by a character that matches one of the above sequences, then the backslash (and the character that follows) will pass through unaltered. This also affects a backslash at the end of the string: it will pass through unaltered. It's easiest to see this logic in the Rubinius code; just look for the to_sub_replacement method in the String class.

Here are some examples of how String#sub parses the replacement string:

  • 1 backslash \ (which has a string literal of "\\")

    Passes through unaltered because the backslash is at the end of the string and has no characters after it.

    Result: \

  • 2 backslashes \\ (which have a string literal of "\\\\")

    The pair of backslashes matches the escaped backslash sequence (see \\ above) and gets converted into a single backslash.

    Result: \

  • 3 backslashes \\\ (which have a string literal of "\\\\\\")

    The first two backslashes match the \\ sequence and get converted to a single backslash. Then the final backslash is at the end of the string, so it passes through unaltered.

    Result: \\

  • 4 backslashes \\\\ (which have a string literal of "\\\\\\\\")

    Two pairs of backslashes each match the \\ sequence and get converted to a single backslash.

    Result: \\

  • 2 backslashes with a character in the middle \a\ (which have a string literal of "\\a\\")

    The \a does not match any of the escape sequences, so it is allowed to pass through unaltered. The trailing backslash is also allowed through.

    Result: \a\

    Note: The same result could be obtained from \\a\\ (with the string literal "\\\\a\\\\")
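
The cases listed above can be checked with a short script along these lines. This is only a sketch: the subject string "x" and the `cases` table are made up for the check, and each key is the double-quoted literal from the corresponding bullet:

```ruby
# replacement literal => value expected after sub has parsed it
cases = {
  "\\"         => "\\",     # 1 backslash   -> 1 backslash
  "\\\\"       => "\\",     # 2 backslashes -> 1 backslash
  "\\\\\\"     => "\\\\",   # 3 backslashes -> 2 backslashes
  "\\\\\\\\"   => "\\\\",   # 4 backslashes -> 2 backslashes
  "\\a\\"      => "\\a\\",  # \a\   passes through unaltered
  "\\\\a\\\\"  => "\\a\\"   # \\a\\ collapses to \a\
}

results = cases.map do |replacement, expected|
  actual = "x".sub(/x/, replacement)
  # puts (not inspect) shows the real characters in each string
  puts "#{replacement}  ->  #{actual}"
  actual == expected
end

puts results.all? ? "all cases behave as described" : "some cases differ"
```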

In hindsight, this could have been less confusing if String#sub had used a different escape character. Then there wouldn't be the need to double-escape all the backslashes.

Score: 18

This is an issue because the backslash (\) serves as an escape character for both Regexps and Strings. You could use the special variable \& to reduce the number of backslashes in the gsub replacement string.

foo.gsub(/\\/,'\&\&\&') #for some string foo replace each \ with \\\

EDIT: I should mention that the value of \& is from a Regexp match, in this case a single backslash.
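
For instance, a small sketch of the \& approach, where the sample value of foo is just an assumption for illustration:

```ruby
foo = 'a\b'                          # internal value: a, one backslash, b

# '\&' in a single-quoted literal reaches gsub as backslash + ampersand,
# which gsub expands to the matched text (here, the single backslash).
doubled = foo.gsub(/\\/, '\&\&')
tripled = foo.gsub(/\\/, '\&\&\&')

puts doubled                         # a\\b
puts tripled                         # a\\\b
```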

Also, I thought that there was a special way to create a string that disabled the escape character, but apparently not. None of these will produce two backslashes:

puts "\\"
puts '\\'
puts %q{\\}
puts %Q{\\}
puts """\\"""
puts '''\\'''
puts <<EOF
\\
EOF

Score: 4

Argh, right after I typed all this out, I realised that \ is used to refer to groups in the replacement string. I guess this means that you need a literal \\ in the replacement string to get one replaced \. To get a literal \\ you need four \s, so to replace one with two you actually need eight(!).

# Double every occurrence of \. There's eight backslashes on the right there!
>> puts '\\'.sub(/\\/, '\\\\\\\\')

Anything I'm missing? Any more efficient ways?

Score: 4

Clearing up a little confusion on the author's second line of code.

You said:

>> puts '\\ <- 2x a, because 2 backslashes get replaced'.sub(/\\/, 'aa')
# aa <- 2x a, because two backslashes get replaced

2 backslashes aren't getting replaced here. You're replacing 1 escaped backslash with two a's ('aa'). That is, if you used .sub(/\\/, 'a'), you would only see one 'a'.

'\\'.sub(/\\/, 'anything') #=> anything

Score: 2

The Pickaxe book mentions this exact problem, actually. Here's another alternative (from page 130 of the latest edition):

str = 'a\b\c'               # => "a\b\c"
str.gsub(/\\/) { '\\\\' }   # => "a\\b\\c"
