[ACCEPTED]-SQL Server 2008 Empty String vs. Space-datalength

Accepted answer
Score: 94

varchars and equality are thorny in TSQL. The LEN function 21 says:

Returns the number of characters, rather 20 than the number of bytes, of the given string 19 expression, excluding trailing blanks.

You need to use DATALENGTH to get a true 18 byte count of the data in question. If you have 17 unicode data, note that the value you get 16 in this situation will not be the same as 15 the length of the text.

print(DATALENGTH(' ')) --1
print(LEN(' '))        --0

When it comes to 14 equality of expressions, the two strings 13 are compared for equality like this:

  • Get Shorter string
  • Pad with blanks until length equals that of longer string
  • Compare the two

It's 12 the middle step that is causing unexpected 11 results - after that step, you are effectively 10 comparing whitespace against whitespace 9 - hence they are seen to be equal.

LIKE behaves 8 better than = in the "blanks" situation 7 because it doesn't perform blank-padding 6 on the pattern you were trying to match:

if '' = ' '
print 'eq'
else
print 'ne'

Will 5 give eq while:

if '' LIKE ' '
print 'eq'
else
print 'ne'

Will give ne

Careful with LIKE though: it 4 is not symmetrical: it treats trailing whitespace 3 as significant in the pattern (RHS) but 2 not the match expression (LHS). The following 1 is taken from here:

declare @Space nvarchar(10)
declare @Space2 nvarchar(10)

set @Space = ''
set @Space2 = ' '

if @Space like @Space2
print '@Space Like @Space2'
else
print '@Space Not Like @Space2'

if @Space2 like @Space
print '@Space2 Like @Space'
else
print '@Space2 Not Like @Space'

@Space Not Like @Space2
@Space2 Like @Space
Score: 20

The = operator is T-SQL is not so much "equals" as 73 it is "are the same word/phrase, according 72 to the collation of the expression's context," and 71 LEN is "the number of characters in 70 the word/phrase." No collations treat 69 trailing blanks as part of the word/phrase 68 preceding them (though they do treat leading 67 blanks as part of the string they precede).

If 66 you need to distinguish 'this' from 'this 65 ', you shouldn't use the "are the 64 same word or phrase" operator because 63 'this' and 'this ' are the same word.

Contributing 62 to the way = works is the idea that the 61 string-equality operator should depend on 60 its arguments' contents and on the collation 59 context of the expression, but it shouldn't 58 depend on the types of the arguments, if 57 they are both string types.

The natural language 56 concept of "these are the same word" isn't 55 typically precise enough to be able to 54 be captured by a mathematical operator like 53 =, and there's no concept of string type 52 in natural language. Context (i.e., collation) matters 51 (and exists in natural language) and is 50 part of the story, and additional properties 49 (some that seem quirky) are part of the 48 definition of = in order to make it well-defined 47 in the unnatural world of data.

On the type 46 issue, you wouldn't want words to change 45 when they are stored in different string 44 types. For example, the types VARCHAR(10), CHAR(10), and 43 CHAR(3) can all hold representations of 42 the word 'cat', and ? = 'cat' should let 41 us decide if a value of any of these types 40 holds the word 'cat' (with issues of case 39 and accent determined by the collation).

Response to JohnFx's comment:

See 38 Using char and varchar Data in Books Online. Quoting from that page, emphasis 37 mine:

Each char and varchar data value has 36 a collation. Collations define attributes 35 such as the bit patterns used to represent 34 each character, comparison rules, and sensitivity to case 33 or accenting.

I agree it could be easier 32 to find, but it's documented.

Worth noting, too, is 31 that SQL's semantics, where = has to do 30 with the real-world data and the context 29 of the comparison (as opposed to something 28 about bits stored on the computer) has been 27 part of SQL for a long time. The premise 26 of RDBMSs and SQL is the faithful representation 25 of real-world data, hence its support for 24 collations many years before similar ideas 23 (such as CultureInfo) entered the realm 22 of Algol-like languages. The premise of 21 those languages (at least until very recently) was 20 problem-solving in engineering, not management 19 of business data. (Recently, the use of 18 similar languages in non-engineering applications 17 like search is making some inroads, but 16 Java, C#, and so on are still struggling 15 with their non-businessy roots.)

In my opinion, it's 14 not fair to criticize SQL for being different 13 from "most programming languages." SQL 12 was designed to support a framework for 11 business data modeling that's very different 10 from engineering, so the language is different 9 (and better for its goal).

Heck, when SQL 8 was first specified, some languages didn't 7 have any built-in string type. And in some 6 languages still, the equals operator between 5 strings doesn't compare character data at 4 all, but compares references! It wouldn't 3 surprise me if in another decade or two, the 2 idea that == is culture-dependent becomes 1 the norm.

Score: 9

I found this blog article which describes the behavior 8 and explains why.

The SQL standard requires that string comparisons, effectively, pad the shorter string with space characters. This leads to the surprising 7 result that N'' = N' ' (the empty string equals 6 a string of one or more space characters) and 5 more generally any string equals another 4 string if they differ only by trailing 3 spaces. This can be a problem in some 2 contexts.

More information also available 1 in MSKB316626

Score: 5

There was a similar question a while ago 6 where I looked into a similar problem here

Instead 5 of LEN(' '), use DATALENGTH(' ') - that gives you the correct value.

The 4 solutions were to use a LIKE clause as explained 3 in my answer in there, and/or include a 2 2nd condition in the WHERE clause to check DATALENGTH too.

Have 1 a read of that question and links in there.

Score: 3

To compare a value to a literal space, you 2 may also use this technique as an alternative 1 to the LIKE statement:

IF ASCII('') = 32 PRINT 'equal' ELSE PRINT 'not equal'
Score: 1

As SQL - 92 8.2 comparison predicate saying:

If the 14 length in characters of X is not equal to 13 the length in characters of Y, then the 12 shorter string is effectively replaced, for 11 the purposes of comparison, with a copy 10 of itself that has been extended to the 9 length of the longer string by concatenation 8 on the right of one or more pad char- acters, where 7 the pad character is chosen based on CS. If CS 6 has the NO PAD attribute, then the pad character 5 is an implementation-dependent character 4 different from any char- acter in the character 3 set of X and Y that collates less than any 2 string under CS. Otherwise, the pad character 1 is a <space>.

Score: 0

Sometimes one has to deal with spaces in 7 data, with or without any other characters, even 6 though the idea of using Null is better 5 - but not always usable. I did run into 4 the described situation and solved it this 3 way:

... where ('>' + @space + '<') <> ('>' + @space2 + '<')

Of course you wouldn't do that for large 2 amount of data but it works quick and easy 1 for some hundred lines ...

Score: 0

How to distinct records on select with fields 5 char/varchar on sql server: example:

declare @mayvar as varchar(10)

set @mayvar = 'data '

select mykey, myfield from mytable where myfield = @mayvar

expected

mykey (int) | myfield (varchar10)

1 | 'data '

obtained

mykey | myfield

1 | 'data' 2 | 'data '

even 4 if I write select mykey, myfield from mytable where myfield = 'data' (without final blank) I get 3 the same results.

how I solved? In this mode:

select mykey, myfield
from mytable
where myfield = @mayvar 
and DATALENGTH(isnull(myfield,'')) = DATALENGTH(@mayvar)

and 2 if there is an index on myfield, it'll be 1 used in each case.

I hope it will be helpful.

Score: 0

Another way is to put it back into a state 4 that the space has value. eg: replace the 3 space with a character known like the _

if REPLACE('hello',' ','_') = REPLACE('hello ',' ','_')
    print 'equal'
else
    print 'not equal'

returns: not 2 equal

Not ideal, and probably slow, but is 1 another quick way forward when needed quickly.

More Related questions