Thursday, February 28, 2008

Understanding the difference between “IS NULL” and “= NULL”

When a variable is created in SQL with the declare statement it is created with no data and stored in the variable table (vtable) inside SQLs memory space. The vtable contains the name and memory address of the variable. However, when the variable is created no memory address is allocated to the variable and thus the variable is not defined in terms of memory.

When you SET the variable it is allotted a memory address and the initial data is stored in that address. When you SET the value again the data in the memory address pointed to by the variable is then changed to the new value.

Now for the difference and why each behaves the way it does.


“= NULL”

“= NULL” is an expression of value. Meaning, if the variable has been set and memory created for the storage of data it has a value. A variable can in fact be set to NULL which means the data value of the objects is unknown. If the value has been set like so:


DECLARE @val CHAR(4)
SET @val = NULL

You have explicitly set the value of the data to unknown and so when you do:

If @val = NULL
It will evaluate as a true expression.


But if I do:

DECLARE @val CHAR(4)
If @val = NULL
It will evaluate to false.


The reason for this is the fact that I am checking for NULL as the value of @val. Since I have not SET the value of @val no memory address has been assigned and therefore no value exists for @val.

Note: See section on SET ANSI_NULLS (ONOFF) due to differences in SQL 7 and 2000 defaults that cause examples to not work. This is based on SQL 7.



“IS NULL”

Now “IS NULL” is a little trickier and is the preferred method for evaluating the condition of a variable being NULL. When you use the “IS NULL” clause, it checks both the address of the variable and the data within the variable as being unknown. So if I for example do:

DECLARE @val CHAR(4)
If @val IS NULL
PRINT ‘TRUE’
ELSE
PRINT ‘FALSE’

SET @val = NULL
If @val IS NULL
PRINT ‘TRUE’
ELSE
PRINT ‘FALSE’


Both outputs will be TRUE. The reason is in the first @val IS NULL I have only declared the variable and no address space for data has been set which “IS NULL” check for. And in the second the value has been explicitly set to NULL which “IS NULL” checks also.


SET ANSI_NULLS (ONOFF)

Now let me throw a kink in the works. In the previous examples you see that = NULL will work as long as the value is explicitly set. However, when you SET ANSI_NULLS ON things will behave a little different.

Ex.
DECLARE @val CHAR(4)
SET @val = NULL
SET ANSI_NULLS ON
If @val =NULL
PRINT ‘TRUE’
ELSE
PRINT ‘FALSE’

SET ANSI_NULLS OFF
If @val =NULL
PRINT ‘TRUE’
ELSE
PRINT ‘FALSE’

You will note the first time you run the = NULL statement after doing SET ANSI_NULLS ON you get a FALSE and after setting OFF you get a TRUE. The reason is as follows.


Excerpt from SQL BOL article “SET ANSI_NULLS”

The SQL-92 standard requires that an equals (=) or not equal to (<>) comparison against a null value evaluates to FALSE. When SET ANSI_NULLS is ON, a SELECT statement using WHERE column_name = NULL returns zero rows even if there are null values in column_name. A SELECT statement using WHERE column_name <> NULL returns zero rows even if there are nonnull values in column_name.
When SET ANSI_NULLS is OFF, the Equals (=) and Not Equal To (<>) comparison operators do not follow the SQL-92 standard. A SELECT statement using WHERE column_name = NULL returns the rows with null values in column_name. A SELECT statement using WHERE column_name <> NULL returns the rows with nonnull values in the column. In addition, a SELECT statement using WHERE column_name <> XYZ_value returns all rows that are not XYZ value and that are not NULL.
End Excerpt

So as defined by SQL92, “= NULL” should always evaluate false. So even setting the value explicitly means you will never meet the = NULL if condition and your code may not work as intended. The biggest reason where = NULL will shoot you in the foot is this, SQL 7 when shipped and installed is defaulted to ANSI_NULL OFF but SQL 2000 is defaulted to ANSI_NULL ON. Of course you can alter this several ways but if you upgraded a database from 7 to 2000 and found the = NULL worked only when you set if explicitly when you roll out a default 2000 server your code now breaks and can cause data issues.

Yet another reason to use IS NULL instead as under SQL 92 guidelines it is still going to evaluate to TRUE and thus your code is safer for upgrading the server.




Summary

If summary unless you need to check that the value of a variable was set to equal NULL and you have set ANSI_NULLS ON, then always use the “IS NULL” clause to validate if a variable is NULL. By using = NULL instead you can cause yourself a lot of headaches in trying to troubleshoot issues that may arise from it, now or unexpectedly in the future.

No comments: