Is It Not Null?
Of all the weird things that go on in the world of writing SQL, the common
syntax for querying databases, the use of IS NOT NULL
has to be the one that annoys me the most - because it is an entirely
unnecessary bit of syntactic fluff which does not do what pure logic says it
should.
For people who haven't had the joy of a formal introduction to databases,
the key point for this post is that they used three-value logic. In our everyday
lives, we general tend to think of things as either 'true' or 'false'. In
three-valued logic, there is a third value 'unknown' which is represented by
the value NULL
.
Logic using three-values is still intuitive, with some visualisations.
Imagine a circle - if it's full, it's true; if it's empty, it's false; and if
it's either half-full or half-empty, it's unknown. The logical operations
AND
, OR
and
NOT
work as expected in this situation1, so TRUE AND UNKNOWN
is
UNKNOWN
, and so forth. If you want a
quick distraction, you can derive the truth tables in this method2.
There are a couple of other results from this method -
UNKNOWN OR (NOT UNKNOWN)
appears like it should be
true. The key one I want to come to is NOT UNKNOWN
,
which you should have hopefully found to be UNKNOWN
Now, operators - IS
is an equivalence operator, and therefore has the
same precedence as equals; NOT
is a unitary operator, and therefore should
be applied before. Thus, in logical terms, the expressions
IS NOT NULL = IS (NOT NULL) = IS NULL
.
As I said, logically it shoudl do the complete opposite of what it does, which
is return all the values that fall in (x IS TRUE) OR (x IS FALSE)
.
I have fond memories of trying to explain this issue to me Databases Lecture who, I think deliberately, refused to see that there was any problem, having spent the previous lecture drilling the tenants of three-valued logic into our minds. It came back to mind whilst watching the younger students on my course grapple with a rather familiar sounding coursework.
I suppose, to finish you, I should deal with why is does select the values
which are not null - because it is treated as one token. There are some
reasons behind this, primarily that databases don't just contain logical values
but also actual meaningful data. Browsing through the internet, I can see
that some people have gone as far to disassociate UNKNOWN
and NULL
as having
different meanings; in traditional coding, which uses two-valued logic
(boolean values can not be null3
null values represented where data simply didn't exist - it was as known as
the next value.