[ACCEPTED]-Subquery using Exists 1 or Exists *-tsql

Accepted answer
Score: 141

No, SQL Server is smart and knows it is being used for an EXISTS, and returns NO DATA to the system.

Quoth Microsoft: http://technet.microsoft.com/en-us/library/ms189259.aspx?ppud=4

The select list of a subquery introduced by EXISTS almost always consists of an asterisk (*). There is no reason to list column names because you are just testing whether rows that meet the conditions specified in the subquery exist.

To check yourself, try running the following:

SELECT whatever
  FROM yourtable
 WHERE EXISTS( SELECT 1/0
                 FROM someothertable 
                WHERE a_valid_clause )

If it were actually doing something with the SELECT list, it would throw a divide-by-zero error. It doesn't.

EDIT: Note, the SQL Standard actually talks about this.

ANSI SQL 1992 Standard, pg 191 http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
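For a version that can be pasted and run as-is, here is a minimal sketch of the same check. The temp tables #Parent and #Child and their contents are hypothetical, made up only for this illustration.

```sql
-- Hypothetical temp tables, used only for this illustration.
CREATE TABLE #Parent (Id INT PRIMARY KEY);
CREATE TABLE #Child  (Id INT PRIMARY KEY);

INSERT INTO #Parent (Id) VALUES (1), (2);
INSERT INTO #Child  (Id) VALUES (1);

-- The select list of the EXISTS subquery is 1/0.
-- If it were evaluated, this statement would raise a divide-by-zero error.
SELECT p.Id
FROM   #Parent AS p
WHERE  EXISTS (SELECT 1/0
               FROM   #Child AS c
               WHERE  c.Id = p.Id);

DROP TABLE #Parent, #Child;
```

The query should complete without error and return only the #Parent row that has a match in #Child.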

3) Case:
a) If the <select list> "*" is simply contained in a <subquery> that is immediately contained in an <exists predicate>, then the <select list> is equivalent to a <value expression> that is an arbitrary <literal>.

Score: 119

The reason for this misconception is presumably the belief that it will end up reading all columns. It is easy to see that this is not the case.

CREATE TABLE T
(
X INT PRIMARY KEY,
Y INT,
Z CHAR(8000)
)

CREATE NONCLUSTERED INDEX NarrowIndex ON T(Y)

IF EXISTS (SELECT * FROM T)
    PRINT 'Y'

This gives the following plan:

[Image: execution plan]

This shows that SQL Server was able to use the narrowest index available to check the result despite the fact that the index does not include all columns. The index access is under a semi join operator which means that it can stop scanning as soon as the first row is returned.

So it is clear the above belief is wrong.

However, Conor Cunningham from the Query Optimiser team explains here that he typically uses SELECT 1 in this case as it can make a minor performance difference in the compilation of the query.

The QP will take and expand all *'s early in the pipeline and bind them to objects (in this case, the list of columns). It will then remove unneeded columns due to the nature of the query.

So for a simple EXISTS subquery like this:

SELECT col1 FROM MyTable WHERE EXISTS (SELECT * FROM Table2 WHERE MyTable.col1=Table2.col2)

The * will be expanded to some potentially big column list and then it will be determined that the semantics of the EXISTS does not require any of those columns, so basically all of them can be removed.

"SELECT 1" will avoid having to examine any unneeded metadata for that table during query compilation.

However, at runtime the two forms of the query will be identical and will have identical runtimes.

I tested four possible ways of expressing this query on an empty table with various numbers of columns: SELECT 1 vs SELECT * vs SELECT Primary_Key vs SELECT Other_Not_Null_Column.

I ran the queries in a loop using OPTION (RECOMPILE) and measured the average number of executions per second. Results below.

+-------------+----------+---------+---------+--------------+
| Num of Cols |    *     |    1    |   PK    | Not Null col |
+-------------+----------+---------+---------+--------------+
| 2           | 2043.5   | 2043.25 | 2073.5  | 2067.5       |
| 4           | 2038.75  | 2041.25 | 2067.5  | 2067.5       |
| 8           | 2015.75  | 2017    | 2059.75 | 2059         |
| 16          | 2005.75  | 2005.25 | 2025.25 | 2035.75      |
| 32          | 1963.25  | 1967.25 | 2001.25 | 1992.75      |
| 64          | 1903     | 1904    | 1936.25 | 1939.75      |
| 128         | 1778.75  | 1779.75 | 1799    | 1806.75      |
| 256         | 1530.75  | 1526.5  | 1542.75 | 1541.25      |
| 512         | 1195     | 1189.75 | 1203.75 | 1198.5       |
| 1024        | 694.75   | 697     | 699     | 699.25       |
+-------------+----------+---------+---------+--------------+
| Total       | 17169.25 | 17171   | 17408   | 17408        |
+-------------+----------+---------+---------+--------------+

As can be seen, there is no consistent winner between SELECT 1 and SELECT * and the difference between the two approaches is negligible. The SELECT Not Null col and SELECT PK do appear slightly faster though.

All four of the queries degrade in performance as the number of columns in the table increases.

As the table is empty, this relationship seems explicable only by the amount of column metadata.

For COUNT(1) it is easy to see from the below that this gets rewritten to COUNT(*) at some point in the process.

SET SHOWPLAN_TEXT ON;

GO

SELECT COUNT(1)
FROM master..spt_values

Which gives the following plan

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1004],0)))
       |--Stream Aggregate(DEFINE:([Expr1004]=Count(*)))
            |--Index Scan(OBJECT:([master].[dbo].[spt_values].[ix2_spt_values_nu_nc]))

Attaching a debugger to the SQL Server process and randomly breaking whilst executing the below

DECLARE @V int 

WHILE (1=1)
    SELECT @V=1 WHERE EXISTS (SELECT 1 FROM ##T) OPTION(RECOMPILE)
    

I found that in the cases where the table has 1,024 columns, most of the time the call stack looks something like the below, indicating that it is indeed spending a large proportion of the time loading column metadata even when SELECT 1 is used. (For the case where the table has 1 column, randomly breaking didn't hit this bit of the call stack in 10 attempts.)

sqlservr.exe!CMEDAccess::GetProxyBaseIntnl()  - 0x1e2c79 bytes  
sqlservr.exe!CMEDProxyRelation::GetColumn()  + 0x57 bytes   
sqlservr.exe!CAlgTableMetadata::LoadColumns()  + 0x256 bytes    
sqlservr.exe!CAlgTableMetadata::Bind()  + 0x15c bytes   
sqlservr.exe!CRelOp_Get::BindTree()  + 0x98 bytes   
sqlservr.exe!COptExpr::BindTree()  + 0x58 bytes 
sqlservr.exe!CRelOp_FromList::BindTree()  + 0x5c bytes  
sqlservr.exe!COptExpr::BindTree()  + 0x58 bytes 
sqlservr.exe!CRelOp_QuerySpec::BindTree()  + 0xbe bytes 
sqlservr.exe!COptExpr::BindTree()  + 0x58 bytes 
sqlservr.exe!CScaOp_Exists::BindScalarTree()  + 0x72 bytes  
... Lines omitted ...
msvcr80.dll!_threadstartex(void * ptd=0x0031d888)  Line 326 + 0x5 bytes C
kernel32.dll!_BaseThreadStart@8()  + 0x37 bytes 

This manual profiling attempt is backed up by the VS 2012 code profiler which shows a very different selection of functions consuming the compilation time for the two cases (Top 15 Functions 1024 columns vs Top 15 Functions 1 column).

Both the SELECT 1 and SELECT * versions wind up checking column permissions and fail if the user is not granted access to all columns in the table.

An example I cribbed from a conversation on the heap

CREATE USER blat WITHOUT LOGIN;
GO
CREATE TABLE dbo.T
(
X INT PRIMARY KEY,
Y INT,
Z CHAR(8000)
)
GO

GRANT SELECT ON dbo.T TO blat;
DENY SELECT ON dbo.T(Z) TO blat;
GO
EXECUTE AS USER = 'blat';
GO

SELECT 1
WHERE  EXISTS (SELECT 1
               FROM   T); 
/*  ↑↑↑↑ 
Fails unexpectedly with 

The SELECT permission was denied on the column 'Z' of the 
           object 'T', database 'tempdb', schema 'dbo'.*/

GO
REVERT;
DROP USER blat
DROP TABLE T

So one might speculate that the minor apparent difference when using SELECT some_not_null_col is that it only winds up checking permissions on that specific column (though it still loads the metadata for all). However, this doesn't seem to fit with the facts, as the percentage difference between the two approaches if anything gets smaller as the number of columns in the underlying table increases.

In any event I won't be rushing out and changing all my queries to this form, as the difference is very minor and only apparent during query compilation. Removing the OPTION (RECOMPILE) so that subsequent executions can use a cached plan gave the following.
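The percentage difference mentioned here can be recomputed from the OPTION (RECOMPILE) results table. The query below is only an arithmetic sketch: it copies the "*" and "Not Null col" figures from that table into a VALUES list, and the column aliases are made up for the illustration.

```sql
-- Figures copied from the OPTION (RECOMPILE) results table (executions per second).
SELECT r.NumCols,
       CAST(100.0 * (r.NotNullCol - r.Star) / r.Star AS DECIMAL(5, 2)) AS PctFasterThanStar
FROM  (VALUES (2,    2043.50, 2067.50),
              (4,    2038.75, 2067.50),
              (8,    2015.75, 2059.00),
              (16,   2005.75, 2035.75),
              (32,   1963.25, 1992.75),
              (64,   1903.00, 1939.75),
              (128,  1778.75, 1806.75),
              (256,  1530.75, 1541.25),
              (512,  1195.00, 1198.50),
              (1024,  694.75,  699.25)) AS r (NumCols, Star, NotNullCol)
ORDER BY r.NumCols;
```

Each output row is the relative gain of SELECT Not Null col over SELECT * for that column count, which makes it easier to see how the gap moves as the table gets wider.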

+-------------+-----------+------------+-----------+--------------+
| Num of Cols |     *     |     1      |    PK     | Not Null col |
+-------------+-----------+------------+-----------+--------------+
| 2           | 144933.25 | 145292     | 146029.25 | 143973.5     |
| 4           | 146084    | 146633.5   | 146018.75 | 146581.25    |
| 8           | 143145.25 | 144393.25  | 145723.5  | 144790.25    |
| 16          | 145191.75 | 145174     | 144755.5  | 146666.75    |
| 32          | 144624    | 145483.75  | 143531    | 145366.25    |
| 64          | 145459.25 | 146175.75  | 147174.25 | 146622.5     |
| 128         | 145625.75 | 143823.25  | 144132    | 144739.25    |
| 256         | 145380.75 | 147224     | 146203.25 | 147078.75    |
| 512         | 146045    | 145609.25  | 145149.25 | 144335.5     |
| 1024        | 148280    | 148076     | 145593.25 | 146534.75    |
+-------------+-----------+------------+-----------+--------------+
| Total       | 1454769   | 1457884.75 | 1454310   | 1456688.75   |
+-------------+-----------+------------+-----------+--------------+

The test script I used can be found here
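For readers without access to that script, the following is a minimal sketch of the kind of measurement loop described above. It is not the original script. The table #T, its two columns, and the ten-second window are assumptions made for the illustration.

```sql
-- Hypothetical two-column table; the original tests varied the column count up to 1,024.
CREATE TABLE #T (X INT PRIMARY KEY, Y INT NOT NULL);

DECLARE @V          INT,
        @Executions INT = 0,
        @Seconds    INT = 10;   -- assumed measurement window
DECLARE @Stop DATETIME2 = DATEADD(SECOND, @Seconds, SYSDATETIME());

WHILE SYSDATETIME() < @Stop
BEGIN
    -- OPTION (RECOMPILE) forces a fresh compilation on every iteration.
    SELECT @V = 1 WHERE EXISTS (SELECT * FROM #T) OPTION (RECOMPILE);
    SET @Executions += 1;
END;

SELECT 1.0 * @Executions / @Seconds AS ExecutionsPerSecond;

DROP TABLE #T;
```

Swapping the inner select list for 1, X, or Y, and removing OPTION (RECOMPILE) for the cached-plan case, gives the other variants compared in the tables.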

Score: 8

Best way to know is to performance test both versions and check out the execution plan for both versions. Pick a table with lots of columns.

Score: 5

There is no difference in SQL Server and it has never been a problem in SQL Server. The optimizer knows that they are the same. If you look at the execution plans, you will see that they are identical.
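One way to compare the plans side by side is sketched below, using SET SHOWPLAN_TEXT and the master..spt_values table that appears elsewhere in this thread. Any table you can read would do; the choice here is only for convenience.

```sql
-- SET SHOWPLAN_TEXT must be the only statement in its batch.
SET SHOWPLAN_TEXT ON;
GO

-- With SHOWPLAN_TEXT on, these are not executed; only their plans are returned.
SELECT 1 WHERE EXISTS (SELECT * FROM master..spt_values);
SELECT 1 WHERE EXISTS (SELECT 1 FROM master..spt_values);
GO

SET SHOWPLAN_TEXT OFF;
GO
```

The two plan texts can then be compared line by line.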

Score: 1

Personally I find it very, very hard to believe that they don't optimize to the same query plan. But the only way to know in your particular situation is to test it. If you do, please report back!
