C# How to replace Microsoft's Smart Quotes with straight quotation marks?
A more extensive listing of problematic Word characters:
if (buffer.IndexOf('\u2013') > -1) buffer = buffer.Replace('\u2013', '-');
if (buffer.IndexOf('\u2014') > -1) buffer = buffer.Replace('\u2014', '-');
if (buffer.IndexOf('\u2015') > -1) buffer = buffer.Replace('\u2015', '-');
if (buffer.IndexOf('\u2017') > -1) buffer = buffer.Replace('\u2017', '_');
if (buffer.IndexOf('\u2018') > -1) buffer = buffer.Replace('\u2018', '\'');
if (buffer.IndexOf('\u2019') > -1) buffer = buffer.Replace('\u2019', '\'');
if (buffer.IndexOf('\u201a') > -1) buffer = buffer.Replace('\u201a', ',');
if (buffer.IndexOf('\u201b') > -1) buffer = buffer.Replace('\u201b', '\'');
if (buffer.IndexOf('\u201c') > -1) buffer = buffer.Replace('\u201c', '\"');
if (buffer.IndexOf('\u201d') > -1) buffer = buffer.Replace('\u201d', '\"');
if (buffer.IndexOf('\u201e') > -1) buffer = buffer.Replace('\u201e', '\"');
if (buffer.IndexOf('\u2026') > -1) buffer = buffer.Replace("\u2026", "...");
if (buffer.IndexOf('\u2032') > -1) buffer = buffer.Replace('\u2032', '\'');
if (buffer.IndexOf('\u2033') > -1) buffer = buffer.Replace('\u2033', '\"');
When I encountered this problem I wrote an extension method to the String class in C#.
public static class StringExtensions
{
public static string StripIncompatableQuotes(this string s)
{
if (!string.IsNullOrEmpty(s))
return s.Replace('\u2018', '\'').Replace('\u2019', '\'').Replace('\u201c', '\"').Replace('\u201d', '\"');
else
return s;
}
}
This simply replaces the silly 'smart quotes' with normal quotes.
[EDIT] Fixed to also support replacement of 'double smart quotes'.
To extend on Nick van Esch's popular answer, here is the code with the names of the characters in the comments.
if (buffer.IndexOf('\u2013') > -1) buffer = buffer.Replace('\u2013', '-'); // en dash
if (buffer.IndexOf('\u2014') > -1) buffer = buffer.Replace('\u2014', '-'); // em dash
if (buffer.IndexOf('\u2015') > -1) buffer = buffer.Replace('\u2015', '-'); // horizontal bar
if (buffer.IndexOf('\u2017') > -1) buffer = buffer.Replace('\u2017', '_'); // double low line
if (buffer.IndexOf('\u2018') > -1) buffer = buffer.Replace('\u2018', '\''); // left single quotation mark
if (buffer.IndexOf('\u2019') > -1) buffer = buffer.Replace('\u2019', '\''); // right single quotation mark
if (buffer.IndexOf('\u201a') > -1) buffer = buffer.Replace('\u201a', ','); // single low-9 quotation mark
if (buffer.IndexOf('\u201b') > -1) buffer = buffer.Replace('\u201b', '\''); // single high-reversed-9 quotation mark
if (buffer.IndexOf('\u201c') > -1) buffer = buffer.Replace('\u201c', '\"'); // left double quotation mark
if (buffer.IndexOf('\u201d') > -1) buffer = buffer.Replace('\u201d', '\"'); // right double quotation mark
if (buffer.IndexOf('\u201e') > -1) buffer = buffer.Replace('\u201e', '\"'); // double low-9 quotation mark
if (buffer.IndexOf('\u2026') > -1) buffer = buffer.Replace("\u2026", "..."); // horizontal ellipsis
if (buffer.IndexOf('\u2032') > -1) buffer = buffer.Replace('\u2032', '\''); // prime
if (buffer.IndexOf('\u2033') > -1) buffer = buffer.Replace('\u2033', '\"'); // double prime
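Each call in the listing above scans the whole string. As a rough sketch of an alternative, the same mapping can be kept as data and applied in a single pass. The names `WordCharacterCleaner` and `ToAscii` are made up for this illustration; only the character mapping comes from the listing above.

```csharp
using System.Collections.Generic;
using System.Text;

public static class WordCharacterCleaner // hypothetical name, not from the answers
{
    // Same replacements as the listing above, kept as a lookup table.
    private static readonly Dictionary<char, string> Map = new Dictionary<char, string>
    {
        { '\u2013', "-" },   // en dash
        { '\u2014', "-" },   // em dash
        { '\u2015', "-" },   // horizontal bar
        { '\u2017', "_" },   // double low line
        { '\u2018', "'" },   // left single quotation mark
        { '\u2019', "'" },   // right single quotation mark
        { '\u201a', "," },   // single low-9 quotation mark
        { '\u201b', "'" },   // single high-reversed-9 quotation mark
        { '\u201c', "\"" },  // left double quotation mark
        { '\u201d', "\"" },  // right double quotation mark
        { '\u201e', "\"" },  // double low-9 quotation mark
        { '\u2026', "..." }, // horizontal ellipsis
        { '\u2032', "'" },   // prime
        { '\u2033', "\"" }   // double prime
    };

    public static string ToAscii(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            // Append the replacement when the character is mapped, otherwise the character itself.
            if (Map.TryGetValue(c, out string replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Calling `WordCharacterCleaner.ToAscii(buffer)` would then stand in for the whole chain of `Replace` calls. Whether it is actually faster depends on the input and should be measured.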
Note that what you have is inherently a corrupt CSV file. Indiscriminately replacing all typographer's quotes with straight quotes won't necessarily fix your file. For all you know, some of the typographer's quotes were supposed to be there, as part of a field's value. Replacing them with straight quotes might not leave you with a valid CSV file, either.
I don't think there is an algorithmic way to fix a file that is corrupt in the way you describe. Your time might be better spent investigating how you come to have such invalid files in the first place, and then putting a stop to it. Is someone using Word to edit your data files, for instance?
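To illustrate the concern: a field containing typographer's quotes is fine as it stands, but once they become straight quotes the field has to be wrapped in quotes and its inner quotes doubled, or a CSV parser may misread it. The sketch below assumes RFC 4180 style quoting; `CsvField.Escape` is a hypothetical helper, not a .NET API.

```csharp
public static class CsvField // hypothetical helper for illustration
{
    // Quotes a field RFC 4180 style: wraps it in double quotes and doubles any
    // inner double quotes, but only when the field contains a special character.
    public static string Escape(string field)
    {
        if (field == null) return string.Empty;

        bool needsQuoting = field.IndexOfAny(new[] { '"', ',', '\r', '\n' }) >= 0;
        return needsQuoting
            ? "\"" + field.Replace("\"", "\"\"") + "\""
            : field;
    }
}
```

For example, a field holding `She said “hello”` passes through `Escape` unchanged, whereas after replacing the curly quotes the same field has to be written as `"She said ""hello"""`. A blanket replace over the raw file skips this step.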
According to the Character Map application that comes with Windows, the Unicode values for the curly quotes are 0x201c and 0x201d. Replace those values with the straight quote 0x0022, and you should be good to go.
s = s.Replace((char)0x201c, '"');
s = s.Replace((char)0x201d, '"');
I have a whole great big... program... that does precisely this. You can rip out the script and use it at your leisure. It does all sorts of replacements, and is located at http://bitbucket.org/nesteruk/typografix
The VB equivalent of what @Matthew wrote:
Imports System.Runtime.CompilerServices
Public Module StringExtensions
<Extension()>
Public Function StripIncompatableQuotes(BadString As String) As String
If Not String.IsNullOrEmpty(BadString) Then
Return BadString.Replace(ChrW(&H2018), "'").Replace(ChrW(&H2019), "'").Replace(ChrW(&H201C), """").Replace(ChrW(&H201D), """")
Else
Return BadString
End If
End Function
End Module
Using Nick and Barbara's answers, here is example code with performance stats for 1,000,000 loops on my machine:
input = "shmB6BhLe0gdGU8OxYykZ21vuxLjBo5I1ZTJjxWfyRTTlqQlgz0yUtPu8iNCCcsx78EPsObiPkCpRT8nqRtvM3Bku1f9nStmigaw";
input.Replace('\u2013', '-'); // en dash
input.Replace('\u2014', '-'); // em dash
input.Replace('\u2015', '-'); // horizontal bar
input.Replace('\u2017', '_'); // double low line
input.Replace('\u2018', '\''); // left single quotation mark
input.Replace('\u2019', '\''); // right single quotation mark
input.Replace('\u201a', ','); // single low-9 quotation mark
input.Replace('\u201b', '\''); // single high-reversed-9 quotation mark
input.Replace('\u201c', '\"'); // left double quotation mark
input.Replace('\u201d', '\"'); // right double quotation mark
input.Replace('\u201e', '\"'); // double low-9 quotation mark
input.Replace("\u2026", "..."); // horizontal ellipsis
input.Replace('\u2032', '\''); // prime
input.Replace('\u2033', '\"'); // double prime
Time: 958.1011 milliseconds
input = "shmB6BhLe0gdGU8OxYykZ21vuxLjBo5I1ZTJjxWfyRTTlqQlgz0yUtPu8iNCCcsx78EPsObiPkCpRT8nqRtvM3Bku1f9nStmigaw";
var inputArray = input.ToCharArray();
for (int i = 0; i < inputArray.Length; i++)
{
switch (inputArray[i])
{
case '\u2013': // en dash
case '\u2014': // em dash
case '\u2015': // horizontal bar
inputArray[i] = '-';
break;
case '\u2017': // double low line
inputArray[i] = '_';
break;
case '\u2018': // left single quotation mark
case '\u2019': // right single quotation mark
case '\u201b': // single high-reversed-9 quotation mark
case '\u2032': // prime
inputArray[i] = '\'';
break;
case '\u201a': // single low-9 quotation mark
inputArray[i] = ',';
break;
case '\u201c': // left double quotation mark
case '\u201d': // right double quotation mark
case '\u201e': // double low-9 quotation mark
case '\u2033': // double prime
inputArray[i] = '\"';
break;
case '\u2026': // horizontal ellipsis (one array slot holds one char, so this yields "." rather than "...")
inputArray[i] = '.';
break;
}
}
input = new string(inputArray);
Time: 362.0858 milliseconds
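The answer does not show how the timings were taken. One way such loops could be timed is with `System.Diagnostics.Stopwatch`, sketched below. The class name `ReplaceBenchmark` and its `Measure` method are assumptions for illustration, and the numbers will differ from machine to machine.

```csharp
using System;
using System.Diagnostics;

public static class ReplaceBenchmark // hypothetical name
{
    // Runs 'transform' on 'input' the given number of times and returns the elapsed time.
    public static TimeSpan Measure(Func<string, string> transform, string input, int iterations)
    {
        // One warm-up call so JIT compilation is not counted in the timing.
        transform(input);

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            transform(input);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

A call such as `ReplaceBenchmark.Measure(s => s.Replace('\u2018', '\''), input, 1000000).TotalMilliseconds` would give a figure comparable in kind to the ones quoted above. For serious comparisons a dedicated benchmarking library is usually a better choice than a hand-rolled loop.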
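Try this for smart single quotes if the above don't work. The sequences \342\200\230 and \342\200\231 (octal) are the UTF-8 bytes of the two single smart quotes; C# has no octal escapes, so they are written here as \u escapes, which match text whose UTF-8 bytes were decoded one byte per character:
s = s.Replace("\u00E2\u0080\u0098", "'"); // bytes E2 80 98 = U+2018
s = s.Replace("\u00E2\u0080\u0099", "'"); // bytes E2 80 99 = U+2019
Try this as well for smart double quotes:
s = s.Replace("\u00E2\u0080\u009C", "\""); // bytes E2 80 9C = U+201C
s = s.Replace("\u00E2\u0080\u009D", "\""); // bytes E2 80 9D = U+201D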
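The escapes above spell out the UTF-8 bytes of the curly quotes (E2 80 98 and so on), which suggests the text was decoded with the wrong encoding. If that is the case, re-decoding the string may be cleaner than replacing byte patterns one by one. The sketch below assumes the UTF-8 bytes were read as ISO-8859-1, one character per byte; it does not cover Windows-1252 or other mis-decodings, and `MojibakeRepair` is a made-up name.

```csharp
using System.Text;

public static class MojibakeRepair // hypothetical name
{
    // Recovers text whose UTF-8 bytes were wrongly decoded as ISO-8859-1.
    // Assumption: every char in 'garbled' is in the range U+0000..U+00FF.
    public static string Redecode(string garbled)
    {
        if (string.IsNullOrEmpty(garbled)) return garbled;

        // ISO-8859-1 maps each char back to the single byte it was decoded from.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        byte[] originalBytes = latin1.GetBytes(garbled);

        // Decode those bytes the way they were meant to be read.
        return Encoding.UTF8.GetString(originalBytes);
    }
}
```

After re-decoding, the string contains the real U+2018 to U+201D characters, so the ordinary replacements from the other answers apply.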
I also have a program which does this; the source is in this file of CP-1252 Fixer. It additionally defines some mappings for converting characters within RTF strings whilst preserving all formatting, which may be useful to some.
It is also a complete mapping of all "smart quote" characters to their low-ASCII counterparts, entity codes and character references.
Just chiming in: I had done this with Regex replace, just to handle a few at a time based on what I'm replacing them with:
// Requires "using System.Text.RegularExpressions;" and must be declared inside a static class.
public static string ReplaceWordChars(this string text)
{
var s = text;
// smart single quotes and apostrophe, single low-9 quotation mark, single high-reversed-9 quotation mark, prime
s = Regex.Replace(s, "[\u2018\u2019\u201A\u201B\u2032]", "'");
// smart double quotes, double prime
s = Regex.Replace(s, "[\u201C\u201D\u201E\u2033]", "\"");
// ellipsis
s = Regex.Replace(s, "\u2026", "...");
// en and em dashes
s = Regex.Replace(s, "[\u2013\u2014]", "-");
// horizontal bar
s = Regex.Replace(s, "\u2015", "-");
// double low line
s = Regex.Replace(s, "\u2017", "_");
// circumflex
s = Regex.Replace(s, "\u02C6", "^");
// open angle bracket
s = Regex.Replace(s, "\u2039", "<");
// close angle bracket
s = Regex.Replace(s, "\u203A", ">");
// small tilde and non-breaking space
s = Regex.Replace(s, "[\u02DC\u00A0]", " ");
// half
s = Regex.Replace(s, "[\u00BD]", "1/2");
// quarter
s = Regex.Replace(s, "[\u00BC]", "1/4");
// dot
s = Regex.Replace(s, "[\u2022]", "*");
// degrees
s = Regex.Replace(s, "[\u00B0]", " degrees");
return s;
}
Also a few more replacements in there.
This worked for me; you can try the code below:
string replacedstring = ("your string with smart quotes").Replace('\u201d', '"');
Thanks!