Myself, Coding, Ranting, and Madness

The Consciousness Stream Continues…

Is It Not Null?

27 Feb 2012 8:00 Tags: None

Of all the weird things that go on in the world of writing SQL, the common syntax for querying databases, the use of IS NOT NULL has to be the one that annoys me the most - because it is an entirely unnecessary bit of syntactic fluff which does not do what pure logic says it should.

For people who haven't had the joy of a formal introduction to databases, the key point for this post is that they used three-value logic. In our everyday lives, we general tend to think of things as either 'true' or 'false'. In three-valued logic, there is a third value 'unknown' which is represented by the value NULL.

Logic using three-values is still intuitive, with some visualisations. Imagine a circle - if it's full, it's true; if it's empty, it's false; and if it's either half-full or half-empty, it's unknown. The logical operations AND, OR and NOT work as expected in this situation1, so TRUE AND UNKNOWN is UNKNOWN, and so forth. If you want a quick distraction, you can derive the truth tables in this method2.

There are a couple of other results from this method - UNKNOWN OR (NOT UNKNOWN) appears like it should be true. The key one I want to come to is NOT UNKNOWN, which you should have hopefully found to be UNKNOWN

Now, operators - IS is an equivalence operator, and therefore has the same precedence as equals; NOT is a unitary operator, and therefore should be applied before. Thus, in logical terms, the expressions IS NOT NULL = IS (NOT NULL) = IS NULL. As I said, logically it shoudl do the complete opposite of what it does, which is return all the values that fall in (x IS TRUE) OR (x IS FALSE).

I have fond memories of trying to explain this issue to me Databases Lecture who, I think deliberately, refused to see that there was any problem, having spent the previous lecture drilling the tenants of three-valued logic into our minds. It came back to mind whilst watching the younger students on my course grapple with a rather familiar sounding coursework.

I suppose, to finish you, I should deal with why is does select the values which are not null - because it is treated as one token. There are some reasons behind this, primarily that databases don't just contain logical values but also actual meaningful data. Browsing through the internet, I can see that some people have gone as far to disassociate UNKNOWN and NULLas having different meanings; in traditional coding, which uses two-valued logic (boolean values can not be null3 null values represented where data simply didn't exist - it was as known as the next value.

  1. 1 This can probably be derived from formal theory, using a circle of unit area and some probability theory
  2. 2 if you order the columns and row True, Unknown, False you may even see some patterns!
  3. 3 Except in things like Java Booleans