Most efficient way to iterate over all the chars in an NSString (Objective-C)

Accepted answer
Score: 148

I think it's important that people understand how to deal with unicode, so I ended up writing a monster answer, but in the spirit of tl;dr I will start with a snippet that should work fine. If you want to know details (which you should!), please continue reading after the snippet.

NSUInteger len = [str length];
unichar buffer[len+1];

[str getCharacters:buffer range:NSMakeRange(0, len)];

NSLog(@"getCharacters:range: with unichar buffer");
for(int i = 0; i < len; i++) {
  NSLog(@"%C", buffer[i]);
}

Still with me? Good!

The current accepted answer seems to be confusing bytes with characters/letters. This is a common problem when encountering unicode, especially from a C background. Strings in Objective-C are represented as unicode characters (unichar), which are 16 bits wide rather than one byte and shouldn't be used with standard C string manipulation functions.

(Edit: This is not the full story! To my great shame, I'd completely forgotten to account for composable characters, where a "letter" is made up of multiple unicode codepoints. This gives you a situation where you can have one "letter" resolving to multiple unichars, which in turn are multiple bytes each. Hoo boy. Please refer to this great answer for the details on that.)

The proper answer to the question depends on whether you want to iterate over the characters/letters (as distinct from the type char) or the bytes of the string (what the type char actually means). In the spirit of limiting confusion, I will use the terms byte and letter from now on, avoiding the possibly ambiguous term character.

If you want to do the former and iterate over the letters in the string, you need to exclusively deal with unichars (sorry, but we're in the future now, you can't ignore it anymore). Finding the number of letters is easy, it's the string's length property (strictly, this is the number of unichars; see the edit above about composed characters). An example snippet is as such (same as above):

NSUInteger len = [str length];
unichar buffer[len+1];

[str getCharacters:buffer range:NSMakeRange(0, len)];

NSLog(@"getCharacters:range: with unichar buffer");
for(int i = 0; i < len; i++) {
  NSLog(@"%C", buffer[i]);
}

If, on the other hand, you want to iterate over the bytes in a string, it starts getting complicated and the result will depend entirely upon the encoding you choose to use. The decent default choice is UTF8, so that's what I will show.

Doing this you have to figure out how many bytes the resulting UTF8 string will be, a step where it's easy to go wrong and use the string's -length. One main reason this is very easy to do wrong, especially for a US developer, is that a string with letters falling into the 7-bit ASCII spectrum will have equal byte and letter lengths. This is because UTF8 encodes 7-bit ASCII letters with a single byte, so a simple test string and basic English text might work perfectly fine.
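To make the byte/letter mismatch concrete, here is a minimal sketch in plain C (which also compiles as Objective-C) that needs no Foundation. The function name utf8_code_point_count is hypothetical, and the sketch assumes its input is valid UTF-8. It counts code points by skipping UTF-8 continuation bytes, so the result can be compared against strlen, which counts bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the code points in a NUL-terminated UTF-8
 * string. Continuation bytes have the bit pattern 10xxxxxx, so every byte
 * that does NOT match that pattern starts a new code point.
 * Assumes the input is valid UTF-8. */
size_t utf8_code_point_count(const char *s) {
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For an ASCII string such as "hello", strlen and this function agree (both give 5). For the UTF-8 encoding of буква, strlen reports 10 bytes while the function reports 5 code points, which is exactly the gap that using -length as a byte count falls into.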

The proper way to do this is to use the method -lengthOfBytesUsingEncoding:NSUTF8StringEncoding (or other encoding), allocate a buffer with that length, then convert the string to the same encoding with -cStringUsingEncoding: and copy it into that buffer. Example code here:

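NSUInteger byteLength = [str lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
char proper_c_buffer[byteLength+1];
// strncpy does not write a terminator when the source fills all byteLength
// bytes, so terminate the buffer explicitly
strncpy(proper_c_buffer, [str cStringUsingEncoding:NSUTF8StringEncoding], byteLength);
proper_c_buffer[byteLength] = '\0';

NSLog(@"strncpy with proper length");
for(NSUInteger i = 0; i < byteLength; i++) {
  NSLog(@"%c", proper_c_buffer[i]);
}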

Just to drive the point home as to why it's important to keep things straight, I will show example code that handles this iteration in four different ways, two wrong and two correct. This is the code:

#import <Foundation/Foundation.h>

int main() {
  NSString *str = @"буква";
  NSUInteger len = [str length];

  // Try to store unicode letters in a char array. This will fail horribly
  // because getCharacters:range: takes a unichar array and will probably
  // overflow or do other terrible things. (the compiler will warn you here,
  // but warnings get ignored)
  char c_buffer[len+1];
  [str getCharacters:c_buffer range:NSMakeRange(0, len)];

  NSLog(@"getCharacters:range: with char buffer");
  for(int i = 0; i < len; i++) {
    NSLog(@"Byte %d: %c", i, c_buffer[i]);
  }

  // Copy the UTF string into a char array, but use the amount of letters
  // as the buffer size, which will truncate many non-ASCII strings.
  strncpy(c_buffer, [str UTF8String], len);

  NSLog(@"strncpy with UTF8String");
  for(int i = 0; i < len; i++) {
    NSLog(@"Byte %d: %c", i, c_buffer[i]);
  }

  // Do It Right (tm) for accessing letters by making a unichar buffer with
  // the proper letter length
  unichar buffer[len+1];
  [str getCharacters:buffer range:NSMakeRange(0, len)];

  NSLog(@"getCharacters:range: with unichar buffer");
  for(int i = 0; i < len; i++) {
    NSLog(@"Letter %d: %C", i, buffer[i]);
  }

  // Do It Right (tm) for accessing bytes, by using the proper
  // encoding-handling methods
  NSUInteger byteLength = [str lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
  char proper_c_buffer[byteLength+1];
  const char *utf8_buffer = [str cStringUsingEncoding:NSUTF8StringEncoding];
  // We copy here because the documentation tells us the string can disappear
  // under us and we should copy it. Just to be safe
  strncpy(proper_c_buffer, utf8_buffer, byteLength);

  NSLog(@"strncpy with proper length");
  for(int i = 0; i < byteLength; i++) {
    NSLog(@"Byte %d: %c", i, proper_c_buffer[i]);
  }
  return 0;
}

Running this code will output the following (with NSLog cruft trimmed out), showing exactly HOW different the byte and letter representations can be (the two last outputs):

getCharacters:range: with char buffer
Byte 0: 1
Byte 1: 
Byte 2: C
Byte 3: 
Byte 4: :
strncpy with UTF8String
Byte 0: Ð
Byte 1: ±
Byte 2: Ñ
Byte 3: 
Byte 4: Ð
getCharacters:range: with unichar buffer
Letter 0: б
Letter 1: у
Letter 2: к
Letter 3: в
Letter 4: а
strncpy with proper length
Byte 0: Ð
Byte 1: ±
Byte 2: Ñ
Byte 3: 
Byte 4: Ð
Byte 5: º
Byte 6: Ð
Byte 7: ²
Byte 8: Ð
Byte 9: °

Score: 29

While Daniel's solution will probably work most of the time, I think the solution is dependent on the context. For example, I have a spelling app and need to iterate over each character as it appears onscreen, which may not correspond to the way it is represented in memory. This is especially true for text provided by the user.

Using something like this category on NSString:

- (void) dumpChars
{
    NSMutableArray  *chars = [NSMutableArray array];
    NSUInteger      len = [self length];
    unichar         buffer[len+1];

    [self getCharacters: buffer range: NSMakeRange(0, len)];
    for (int i=0; i<len; i++) {
        [chars addObject: [NSString stringWithFormat: @"%C", buffer[i]]];
    }

    NSLog(@"%@ = %@", self, [chars componentsJoinedByString: @", "]);
}

And feeding it a word like mañana might produce:

mañana = m, a, ñ, a, n, a

But it could just as easily produce:

mañana = m, a, n, ̃, a, n, a

The former will be produced if the string is in precomposed unicode form and the latter if it's in decomposed form.
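As a rough illustration of the two forms, the sketch below spells out mañana as UTF-16 code units in plain C (which also compiles as Objective-C). The array and function names are hypothetical. The helper is only an approximation: it assumes that combining marks come solely from the block U+0300 to U+036F, which is far from the full Unicode rules that NSStringEnumerationByComposedCharacterSequences follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "mañana" with a precomposed ñ (U+00F1): 6 code units. */
const uint16_t manana_precomposed[] = {
    0x006D, 0x0061, 0x00F1, 0x0061, 0x006E, 0x0061
};

/* "mañana" with n (U+006E) followed by a combining tilde (U+0303):
 * 7 code units for the same visible word. */
const uint16_t manana_decomposed[] = {
    0x006D, 0x0061, 0x006E, 0x0303, 0x0061, 0x006E, 0x0061
};

/* Hypothetical helper: counts code units that are not combining marks.
 * ASSUMPTION: only U+0300..U+036F is treated as combining, which is a
 * simplification of real grapheme cluster segmentation. */
size_t count_without_combining_marks(const uint16_t *units, size_t len) {
    assert(units != NULL);
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (units[i] < 0x0300 || units[i] > 0x036F) {
            count++;
        }
    }
    return count;
}
```

The two arrays hold 6 and 7 code units respectively, yet the helper returns 6 for both, matching the six letters a reader sees. In real code the block-based enumeration is the safer choice, since it does not depend on a hand-picked range of combining marks.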

You might think this could be avoided by using the result of NSString's precomposedStringWithCanonicalMapping or precomposedStringWithCompatibilityMapping, but this is not necessarily the case, as Apple warns in Technical Q&A 1225. For example, a string like e̊gâds (which I totally made up) still produces the following even after converting to a precomposed form.

 e̊gâds = e, ̊, g, â, d, s

The solution for me is to use NSString's enumerateSubstringsInRange passing NSStringEnumerationByComposedCharacterSequences as the enumeration option. Rewriting the earlier example to look like this:

- (void) dumpSequences
{
    NSMutableArray  *chars = [NSMutableArray array];

    [self enumerateSubstringsInRange: NSMakeRange(0, [self length]) options: NSStringEnumerationByComposedCharacterSequences
        usingBlock: ^(NSString *inSubstring, NSRange inSubstringRange, NSRange inEnclosingRange, BOOL *outStop) {
        [chars addObject: inSubstring];
    }];

    NSLog(@"%@ = %@", self, [chars componentsJoinedByString: @", "]);
}

If we feed this version e̊gâds then we get

e̊gâds = e̊, g, â, d, s

as expected, which is what I want.

The section of documentation on Characters and Grapheme Clusters may also be helpful in explaining some of this.

Note: Looks like some of the unicode strings I used are tripping up SO when formatted as code. The strings I used are mañana, and e̊gâds.

Score: 25

Neither. The "Optimize Your Text Manipulations" section of the "Cocoa Performance Guidelines" in the Xcode Documentation recommends:

If you want to iterate over the characters of a string, one of the things you should not do is use the characterAtIndex: method to retrieve each character separately. This method is not designed for repeated access. Instead, consider fetching the characters all at once using the getCharacters:range: method and iterating over the bytes directly.

If you want to search a string for specific characters or substrings, do not iterate through the characters one by one. Instead, use higher level methods such as rangeOfString:, rangeOfCharacterFromSet:, or substringWithRange:, which are optimized for searching the NSString characters.

See this Stack Overflow answer on How to remove whitespace from right end of NSString for an example of how to let rangeOfCharacterFromSet: iterate over the characters of the string instead of doing it yourself.

Score: 19

I would definitely get a char buffer first, then iterate over that.

NSString *someString = ...

unsigned int len = [someString length];
char buffer[len];

//This way:
strncpy(buffer, [someString UTF8String], len);

//Or this way (preferred):

[someString getCharacters:buffer range:NSMakeRange(0, len)];

for(int i = 0; i < len; ++i) {
   char current = buffer[i];
   //do something with current...
}

Score: 3

This is a slightly different solution to the question, but I thought it might be useful for someone. What I wanted was to iterate over the actual Unicode characters in an NSString. So, I found this solution:

NSString * str = @"hello 🤠💩";

NSRange range = NSMakeRange(0, str.length);
[str enumerateSubstringsInRange:range
                          options:NSStringEnumerationByComposedCharacterSequences
                       usingBlock:^(NSString *substring, NSRange substringRange,
                                    NSRange enclosingRange, BOOL *stop)
{
    NSLog(@"%@", substring);
}];
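The reason plain unichar iteration falls short for this string is that each emoji occupies two unichars (a surrogate pair). The sketch below, in plain C (which also compiles as Objective-C), shows the idea with a hypothetical function utf16_code_point_count. It assumes the input is an array of UTF-16 code units and counts a high surrogate followed by a low surrogate as one code point. It does not attempt full composed character sequences, which is what the block-based enumeration above handles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts code points in an array of UTF-16 code units.
 * A high surrogate (0xD800..0xDBFF) immediately followed by a low surrogate
 * (0xDC00..0xDFFF) encodes a single code point, so the pair is counted once.
 * Unpaired surrogates are counted as one unit each. */
size_t utf16_code_point_count(const uint16_t *units, size_t len) {
    assert(units != NULL);
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        count++;
        if (units[i] >= 0xD800 && units[i] <= 0xDBFF &&
            i + 1 < len &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            i++; /* skip the low surrogate that completes the pair */
        }
    }
    return count;
}
```

For "hello 🤠💩" the UTF-16 representation has 10 code units (six for "hello ", then the pairs D83E DD20 and D83D DCA9), while the function reports 8 code points.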

Score: 2

Although you would technically be getting individual NSString values, here is an alternative approach:

NSRange range = NSMakeRange(0, 1);
for (__unused int i = range.location; range.location < [aNSString length]; range.location++) {
  NSLog(@"%@", [aNSString substringWithRange:range]);
}

(The __unused int i bit is necessary to silence the compiler warning.)

Score: 2

Try enumerating the string with blocks.

Create a category on NSString

.h

@interface NSString (Category)

- (void)enumerateCharactersUsingBlock:(void (^)(NSString *character, NSInteger idx, bool *stop))block;

@end

.m

@implementation NSString (Category)

- (void)enumerateCharactersUsingBlock:(void (^)(NSString *character, NSInteger idx, bool *stop))block
{
    bool _stop = NO;
    for(NSInteger i = 0; i < [self length] && !_stop; i++)
    {
        NSString *character = [self substringWithRange:NSMakeRange(i, 1)];
        block(character, i, &_stop);
    }
}
@end

example

NSString *string = @"Hello World";
[string enumerateCharactersUsingBlock:^(NSString *character, NSInteger idx, bool *stop) {
        NSLog(@"char %@, i: %li",character, (long)idx);
}];

Score: 2

You should not use

NSUInteger len = [str length];
unichar buffer[len+1];

you should use memory allocation

NSUInteger len = [str length];
unichar *buffer = (unichar *)malloc((len + 1) * sizeof(unichar));

and in the end use

free(buffer);

in order to avoid memory problems.
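A minimal sketch of the heap-based pattern in plain C (which also compiles as Objective-C) follows. The function name copy_code_units is hypothetical and uint16_t stands in for unichar. Note the parentheses around the whole size expression and the check for a failed allocation; a variable-length array on the stack gives no way to detect failure for very long strings.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies len 16-bit code units into a freshly
 * allocated buffer with one extra zero unit at the end.
 * Returns NULL if the allocation fails; the caller must free() the result. */
uint16_t *copy_code_units(const uint16_t *src, size_t len) {
    assert(src != NULL);
    /* The element count is multiplied by the element size INSIDE malloc. */
    uint16_t *buffer = malloc((len + 1) * sizeof *buffer);
    if (buffer == NULL) {
        return NULL;
    }
    memcpy(buffer, src, len * sizeof *buffer);
    buffer[len] = 0;
    return buffer;
}
```

A caller would check the returned pointer for NULL, iterate over the len units, and then call free on the buffer exactly once.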
