Why do compilers not warn about out-of-bounds static array indices?

Accepted answer
Score: 28

GCC does warn about this. But you need to do two things:

  1. Enable optimization. Without at least -O2, GCC is not doing enough analysis to know what a is, and that you ran off the edge.
  2. Change your example so that a[] is actually used, otherwise GCC generates a no-op program and has completely discarded your assignment.


$ cat foo.c 
int main(void)
{
  int a[10];
  a[13] = 3;  // oops, overwrote the return address
  return a[1];
}
$ gcc -Wall -Wextra  -O2 -c foo.c 
foo.c: In function ‘main’:
foo.c:4: warning: array subscript is above array bounds

BTW: If you returned a[13] in your test program, that wouldn't work either, as GCC optimizes out the array again.
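Independent of the optimizer-driven warning, a constant index can also be checked explicitly at compile time. The following is a minimal sketch, assuming a C11 compiler (for `_Static_assert`); the macro names `ARRAY_LEN` and `CONST_INDEX_OK` and both functions are hypothetical and not part of GCC.

```c
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* 1 if the constant index i is inside arr, 0 otherwise. */
#define CONST_INDEX_OK(arr, i) ((size_t)(i) < ARRAY_LEN(arr))

/* Stores to the last valid element; the assertion is evaluated by the compiler. */
int last_element_after_store(void)
{
    int a[10] = {0};
    _Static_assert(CONST_INDEX_OK(a, 9), "index 9 must be inside a[10]");
    a[9] = 3;
    return a[9];
}

/* Evaluates the same check for the index from the question without touching a[13]. */
int index_13_fits(void)
{
    int a[10];
    return CONST_INDEX_OK(a, 13);
}
```

Changing the asserted index from 9 to 13 should make compilation fail at the `_Static_assert` line, regardless of the optimization level.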

Score: 10

Have you tried -fmudflap with GCC? These are runtime checks, but they are useful, since most often you are dealing with indices calculated at runtime anyway. Instead of silently continuing to work, the program will notify you about those bugs.

-fmudflap -fmudflapth -fmudflapir For front-ends that support it (C and C++), instrument all risky pointer/array dereferencing operations, some standard library string/heap functions, and some other associated constructs with range/validity tests. Modules so instrumented should be immune to buffer overflows, invalid heap use, and some other classes of C/C++ programming errors. The instrumentation relies on a separate runtime library (libmudflap), which will be linked into a program if -fmudflap is given at link time. Run-time behavior of the instrumented program is controlled by the MUDFLAP_OPTIONS environment variable. See "env MUDFLAP_OPTIONS=-help a.out" for its options.

Use -fmudflapth instead of -fmudflap to compile and to link if your program is multi-threaded. Use -fmudflapir, in addition to -fmudflap or -fmudflapth, if instrumentation should ignore pointer reads. This produces less instrumentation (and therefore faster execution) and still provides some protection against outright memory corrupting writes, but allows erroneously read data to propagate within a program.
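To give a rough idea of what a "range/validity test" on an array write amounts to, here is a hand-written sketch. It is not what libmudflap actually generates; `checked_store` and `demo_checked_store` are hypothetical helpers, and the caller has to pass the length explicitly.

```c
#include <stddef.h>

#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Writes value to a[index] only when index is in range.
 * Returns 1 on success, 0 when the write would be out of bounds. */
int checked_store(int *a, size_t len, size_t index, int value)
{
    if (index >= len) {
        return 0;   /* violation: refuse the write instead of corrupting memory */
    }
    a[index] = value;
    return 1;
}

/* Tries the store from the question with a runtime-chosen index. */
int demo_checked_store(size_t index)
{
    int a[10] = {0};
    return checked_store(a, ARRAY_LEN(a), index, 3);
}
```

With this sketch, `demo_checked_store(9)` succeeds and `demo_checked_store(13)` is rejected. The cost is one comparison per access, which is the kind of overhead an instrumenting tool adds automatically.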

Here is what mudflap gives me for your example:

[js@HOST2 cpp]$ gcc -fstack-protector-all -fmudflap -lmudflap mudf.c        
[js@HOST2 cpp]$ ./a.out
*******
mudflap violation 1 (check/write): time=1229801723.191441 ptr=0xbfdd9c04 size=56
pc=0xb7fb126d location=`mudf.c:4:3 (main)'
      /usr/lib/libmudflap.so.0(__mf_check+0x3d) [0xb7fb126d]
      ./a.out(main+0xb9) [0x804887d]
      /usr/lib/libmudflap.so.0(__wrap_main+0x4f) [0xb7fb0a5f]
Nearby object 1: checked region begins 0B into and ends 16B after
mudflap object 0x8509cd8: name=`mudf.c:3:7 (main) a'
bounds=[0xbfdd9c04,0xbfdd9c2b] size=40 area=stack check=0r/3w liveness=3
alloc time=1229801723.191433 pc=0xb7fb09fd
number of nearby objects: 1
[js@HOST2 cpp]$

It has a bunch of options. For example, it can fork off a gdb process upon violations, show you where your program leaked (using -print-leaks), or detect uninitialized variable reads. Use MUDFLAP_OPTIONS=-help ./a.out to get a list of options. Since mudflap only outputs addresses and not filenames and lines of the source, I wrote a little gawk script:

/^ / {
    file = gensub(/([^(]*).*/, "\\1", 1);
    addr = gensub(/.*\[([x[:xdigit:]]*)\]$/, "\\1", 1);
    if(file && addr) {
        cmd = "addr2line -e " file " " addr
        cmd | getline laddr
        print $0 " (" laddr ")"
        close (cmd)
        next;
    }
}

1 # print all other lines

Pipe the output of mudflap into it, and it will display the source file and line of each backtrace entry.

Also -fstack-protector[-all] :

-fstack-protector Emit extra code to check for buffer overflows, such as stack smashing attacks. This is done by adding a guard variable to functions with vulnerable objects. This includes functions that call alloca, and functions with buffers larger than 8 bytes. The guards are initialized when a function is entered and then checked when the function exits. If a guard check fails, an error message is printed and the program exits.

-fstack-protector-all Like -fstack-protector except that all functions are protected.
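The guard-variable idea can be illustrated with a toy model. This is only a sketch of the concept, not how GCC lays out or checks real stack frames: `struct toy_frame`, `GUARD_VALUE`, and `guard_survives` are hypothetical, and the overrun is simulated by copying into the whole struct so that the demo itself stays within defined behavior.

```c
#include <stddef.h>
#include <string.h>

#define GUARD_VALUE 0xC0FFEEu   /* hypothetical fixed value, chosen for the demo */

struct toy_frame {
    char buf[8];      /* the "vulnerable object" */
    unsigned guard;   /* placed after the buffer, like a guard variable */
};

/* Copies n bytes of 'A' into the frame starting at buf, then checks the guard.
 * Returns 1 if the guard is intact, 0 if the copy ran over it. */
int guard_survives(size_t n)
{
    struct toy_frame f;
    char src[sizeof f];

    memset(src, 'A', sizeof src);
    memset(&f, 0, sizeof f);
    f.guard = GUARD_VALUE;          /* "initialized when the function is entered" */

    if (n > sizeof f) {
        n = sizeof f;               /* keep the simulation inside the struct */
    }
    memcpy(&f, src, n);             /* buf is the first member, so this starts at buf */

    return f.guard == GUARD_VALUE;  /* "checked when the function exits" */
}
```

Copying 8 bytes fills `buf` exactly and leaves the guard alone; copying `sizeof(struct toy_frame)` bytes overwrites it, which is the condition under which a protected program would print an error and exit.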

Score: 7

You're right, the behavior is undefined. C99 pointers must point within or just one element beyond declared or heap-allocated data structures.

I've never been able to figure out how the gcc people decide when to warn. I was shocked to learn that -Wall by itself will not warn of uninitialized variables; at minimum you need -O, and even then the warning is sometimes omitted.

I conjecture that because unbounded arrays are so common in C, the compiler probably doesn't have a way in its expression trees to represent an array that has a size known at compile time. So although the information is present at the declaration, I conjecture that at the use it is already lost.

I second the recommendation of valgrind. If you are programming in C, you should run valgrind on every program, all the time, until you can no longer take the performance hit.

Score: 5

It's not a static array.

Undefined behavior or not, it's writing to an address 13 integers from the beginning of the array. What's there is your responsibility. There are several C techniques that intentionally misallocate arrays for reasonable reasons. And this situation is not unusual in incomplete compilation units.
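The answer does not say which techniques it has in mind. One likely candidate is the old "struct hack", where a trailing array is declared shorter than the storage actually allocated for it. The sketch below assumes that reading and uses the C99 flexible array member, which is the standardized form of the same idea; `struct message`, `message_new`, and `message_roundtrip_ok` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* The declared type gives text no size; the real size is decided at allocation time. */
struct message {
    size_t len;
    char text[];   /* C99 flexible array member */
};

/* Allocates the header plus room for a copy of s, including its terminator. */
struct message *message_new(const char *s)
{
    size_t len = strlen(s);
    struct message *m = malloc(sizeof *m + len + 1);
    if (m == NULL) {
        return NULL;
    }
    m->len = len;
    memcpy(m->text, s, len + 1);
    return m;
}

/* Returns 1 if the stored copy matches s, 0 otherwise (including allocation failure). */
int message_roundtrip_ok(const char *s)
{
    struct message *m = message_new(s);
    if (m == NULL) {
        return 0;
    }
    int ok = (m->len == strlen(s)) && (strcmp(m->text, s) == 0);
    free(m);
    return ok;
}
```

Code like this indexes `text` with values the declaration alone cannot bound, which is one reason a compiler cannot treat every large subscript as an error.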

Depending on your flag settings, there are a number of features of this program that would be flagged, such as the fact that the array is never used. And the compiler might just as easily optimize it out of existence and not tell you - a tree falling in the forest.

It's the C way. It's your array, your memory, do what you want with it. :)

(There are any number of lint tools for helping you find this sort of thing, and you should use them liberally. They don't all work through the compiler, though; compiling and linking are often tedious enough as it is.)

Score: 4

The reason C doesn't do it is that C doesn't have the information. A statement like

int a[10];

does two things: it allocates sizeof(int)*10 bytes of space (plus, potentially, a little dead space for alignment), and it puts an entry in the symbol table that reads, conceptually,

a : address of a[0]

or in C terms

a : &a[0]

and that's all. In fact, in C you can interchange *(a+i) with a[i] in (almost*) all cases with no effect BY DEFINITION. So your question is equivalent to asking "why can I add any integer to this (address) value?"

* Pop quiz: what is the one case in which this isn't true?
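The interchangeability of the two spellings can be checked directly. The sketch below is illustrative only, with hypothetical function names; the second function adds one caveat to the "no information" claim: in the scope where the array is declared, `sizeof` still reports the size of the whole array, so the element count is available to the compiler there.

```c
#include <stddef.h>

/* Returns 1 if a[i], *(a + i), and i[a] all name the same element for every i. */
int subscript_forms_agree(void)
{
    int a[10];
    for (size_t i = 0; i < 10; i++) {
        a[i] = (int)(i * i);
    }
    for (size_t i = 0; i < 10; i++) {
        if (a[i] != *(a + i) || a[i] != i[a]) {
            return 0;
        }
    }
    return 1;
}

/* In the declaring scope the array does not decay under sizeof,
 * so the element count can be computed at compile time. */
size_t element_count_at_declaration(void)
{
    int a[10];
    return sizeof a / sizeof a[0];
}
```

Once `a` is passed to a function as an `int *`, that size information is no longer attached to the pointer, which is where the "any integer can be added" view applies.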

Score: 4

The C philosophy is that the programmer is always right. So it will silently allow you to access whatever memory address you give there, assuming that you always know what you are doing, and will not bother you with a warning.

Score: 2

Shouldn't the compiler emit a warning at the very least?

No; C compilers generally do not perform array bounds checks. The obvious negative effect of this is, as you mention, an error with undefined behavior, which can be very difficult to find.

The positive side of this is a possible small performance advantage in certain cases.

Score: 2

I believe that some compilers do in certain cases. For example, if my memory serves me correctly, newer Microsoft compilers have a "Buffer Security Check" option which will detect trivial cases of buffer overruns.

Why don't all compilers do this? Either (as previously mentioned) the internal representation used by the compiler doesn't lend itself to this type of static analysis, or it just isn't high enough on the writers' priority list. Which, to be honest, is a shame either way.

Score: 0

There are some extensions in gcc for that (from the compiler side): http://www.doc.ic.ac.uk/~awl03/projects/miro/

On the other hand, splint, rat, and quite a few other static code analysis tools would have found that.

You can also use valgrind on your code and see the output: http://valgrind.org/

Another widely used library seems to be libefence.

It's simply a design decision that was once made, and it now leads to things like this.

Regards Friedrich

Score: 0

The -fbounds-checking option is available with gcc.

It's worth going through this article: http://www.doc.ic.ac.uk/~phjk/BoundsChecking.html

'le dorfier' has given an apt answer to your question, though: it's your program, and it is the way C behaves.
