[ACCEPTED]-Testing for repeated characters in a string-string

Accepted answer
Score: 16

If the string is sorted, you could just 12 remember each character in turn and check 11 to make sure the next character is never 10 identical to the last character.

Other than 9 that, for strings under ten characters, just 8 testing each character against all the rest 7 is probably as fast or faster than most 6 other things. A bit vector, as suggested 5 by another commenter, may be faster (helps 4 if you have a small set of legal characters.)

Bonus: here's 3 a slick LINQ solution to implement Jon's 2 functionality:

int longestRun =
    s.Select((c, i) => s.Substring(i).TakeWhile(x => x == c).Count()).Max();

So, OK, it's not very fast! You 1 got a problem with that?!

:-)

Score: 9

If the string is short, then just looping 21 and testing may well be the simplest and 20 most efficient way. I mean you could create a 19 hash set (in whatever platform you're using) and 18 iterate through the characters, failing 17 if the character is already in the set and 16 adding it to the set otherwise - but that's 15 only likely to provide any benefit when 14 the strings are longer.

EDIT: Now that we 13 know it's sorted, mquander's answer is the best one IMO. Here's 12 an implementation:

public static bool IsSortedNoRepeats(string text)
{
    if (text.Length == 0)
    {
        return true;
    }
    char current = text[0];
    for (int i=1; i < text.Length; i++)
    {
        char next = text[i];
        if (next <= current)
        {
            return false;
        }
        current = next;
    }
    return true;
}

A shorter alternative 11 if you don't mind repeating the indexer 10 use:

public static bool IsSortedNoRepeats(string text)
{
    for (int i=1; i < text.Length; i++)
    {
        if (text[i] <= text[i-1])
        {
            return false;
        }
    }
    return true;
}

EDIT: Okay, with the "frequency" side, I'll 9 turn the problem round a bit. I'm still 8 going to assume that the string is sorted, so 7 what we want to know is the length of the 6 longest run. When there are no repeats, the 5 longest run length will be 0 (for an empty 4 string) or 1 (for a non-empty string). Otherwise, it'll 3 be 2 or more.

First a string-specific version:

public static int LongestRun(string text)
{
    if (text.Length == 0)
    {
        return 0;
    }
    char current = text[0];
    int currentRun = 1;
    int bestRun = 0;

    for (int i=1; i < text.Length; i++)
    {
        if (current != text[i])
        {
            bestRun = Math.Max(currentRun, bestRun);
            currentRun = 0;
            current = text[i];
        }
        currentRun++;
    }
    // It's possible that the final run is the best one
    return Math.Max(currentRun, bestRun);
}

Now 2 we can also do this as a general extension 1 method on IEnumerable<T>:

public static int LongestRun(this IEnumerable<T> source)
{
    bool first = true;
    T current = default(T);
    int currentRun = 0;
    int bestRun = 0;

    foreach (T element in source)
    {
        if (first || !EqualityComparer<T>.Default(element, current))
        {
            first = false;
            bestRun = Math.Max(currentRun, bestRun);
            currentRun = 0;
            current = element;
        }
    }
    // It's possible that the final run is the best one
    return Math.Max(currentRun, bestRun);
}

Then you can call "AABCD".LongestRun() for example.

Score: 9

This will tell you very quickly if a string 11 contains duplicates:

bool containsDups = "ABCDEA".Length != s.Distinct().Count();

It just checks the number 10 of distinct characters against the original 9 length. If they're different, you've got 8 duplicates...

Edit: I guess this doesn't take 7 care of the frequency of dups you noted 6 in your edit though... but some other suggestions 5 here already take care of that, so I won't 4 post the code as I note a number of them 3 already give you a reasonably elegant solution. I 2 particularly like Joe's implementation using 1 LINQ extensions.

Score: 7

Since you're using 3.5, you could do this 4 in one LINQ query:

var results = stringInput
  .ToCharArray() // not actually needed, I've left it here to show what's actually happening
  .GroupBy(c=>c)
  .Where(g=>g.Count()>1)
  .Select(g=>new {Letter=g.First(),Count=g.Count()})
;

For each character that 3 appears more than once in the input, this 2 will give you the character and the count 1 of occurances.

Score: 6

I think the easiest way to achieve that 3 is to use this simple regex

bool foundMatch = false;
foundMatch = Regex.IsMatch(yourString, @"(\w)\1");

If you need more 2 information about the match (start, length 1 etc)

        Match match = null;
    string testString = "ABCDE AABCD";
    match = Regex.Match(testString, @"(\w)\1+?");
    if (match.Success)
    {
        string matchText = match.Value; // AA
        int matchIndnex = match.Index;  // 6
        int matchLength = match.Length; // 2
    }
Score: 3

Update Now, you'd need an array of counters to 7 maintain a count.

Keep a bit array, with 6 one bit representing a unique character. Turn 5 the bit on when you encounter a character, and 4 run over the string once. A mapping of the 3 bit array index and the character set is 2 upto you to decide. Break if you see that 1 a particular bit is on already.

Score: 2
/(.).*\1/

(or whatever the equivalent is in your regex 5 library's syntax)

Not the most efficient, since 4 it will probably backtrack to every character 3 in the string and then scan forward again. And 2 I don't usually advocate regular expressions. But 1 if you want brevity...

Score: 2

How about something like:

string strString = "AA BRA KA DABRA";

var grp = from c in strString.ToCharArray() 
        group c by c into m
        select new { Key = m.Key, Count = m.Count() };

foreach (var item in grp)
{
    Console.WriteLine(
        string.Format("Character:{0} Appears {1} times", 
        item.Key.ToString(), item.Count));
}

0

Score: 1

I started looking for some info on the net 4 and I got to the following solution.

string input = "aaaaabbcbbbcccddefgg";
        char[] chars = input.ToCharArray();
        Dictionary<char, int> dictionary = new Dictionary<char,int>();

        foreach (char c in chars)
        {
            if (!dictionary.ContainsKey(c))
            {
                dictionary[c] = 1; //
            }
            else
            {
                dictionary[c]++;
            }
        }

        foreach (KeyValuePair<char, int> combo in dictionary)
        {
            if (combo.Value > 1) //If the vale of the key is greater than 1 it means the letter is repeated
            {
                Console.WriteLine("Letter " + combo.Key + " " + "is repeated " + combo.Value.ToString() + " times");
            }

        }

I hope 3 it helps, I had a job interview in which 2 the interviewer asked me to solve this and 1 I understand it is a common question.

Score: 0

When there is no order to work on you could 1 use a dictionary to keep the counts:

String input = "AABCD";
var result = new Dictionary<Char, int>(26);
var chars = input.ToCharArray();
foreach (var c in chars)
{
    if (!result.ContainsKey(c))
    {
        result[c] = 0; // initialize the counter in the result
    }
    result[c]++;
}

foreach (var charCombo in result)
{
    Console.WriteLine("{0}: {1}",charCombo.Key, charCombo.Value);   
}
Score: 0

The hash solution Jon was describing is 6 probably the best. You could use a HybridDictionary 5 since that works well with small and large 4 data sets. Where the letter is the key 3 and the value is the frequency. (Update 2 the frequency every time the add fails or 1 the HybridDictionary returns true for .Contains(key))

More Related questions