Tuesday, January 26, 2016

When simple SQL can be complex

I think SQL is a very simple language, but ofcourse I'm biased.

But even a simple statement might have more complexity to it than you might think.

Do you know what the result is of this statement?
SELECT FALSE = FALSE = TRUE;
scroll down for the answer.



























The answer is: it depends.

You might expect it to return false because the 3 items in the comparison are not equal. But that's not the case.

In PostgreSQL this is the result:
postgres=# SELECT FALSE = FALSE = TRUE;
 ?column? 
----------
 t
(1 row)
So it compares FALSE against FALSE, which results in TRUE and then That is compared against TRUE, which results in TRUE. PostgreSQL has proper boolean literals.

Next up is MySQL:
mysql> SELECT FALSE = FALSE = TRUE;
+----------------------+
| FALSE = FALSE = TRUE |
+----------------------+
|                    1 |
+----------------------+
1 row in set (0.00 sec)
This is similar but it's slightly different. The result is 1 because in MySQL TRUE and FALSE evalueate to 0 and 1. If you use BOOLEAN in your DDL this will be changed to tinyint(1). But note that the (1) is only the display width and doesn't change the storage space (tinyint is 1 byte).

And SQLite has yet another result:
sqlite> SELECT FALSE = FALSE = TRUE;
Error: no such column: FALSE
This is because SQLite doesn't have a boolean type and you're expected to use 0 and 1.
If we use the suggested solution we get the same result as with MySQL.
sqlite> SELECT 0 = 0 = 1;
1

What about the SQL standard?

There is a boolean literal in the SQL:1999 standard according to this Wikipedia article. Note that 1999 is 17 years ago. It is an optional feature so it isn't required. Note that a boolean can have 3 values according to the standard. It can be TRUE, FALSE or UNKNOWN. It suggests that the UNKNOWN literal may evaluate to NULL. Neither MySQL, PostgreSQL or SQLite implements the UNKNOWN literal.

What about commercial databases?

DB2, Oracle and SQL Server don't have a boolean type according to this webpage. For DB2 this has changed, according to this page from IBM BOOLEAN support was added in DB2 9.7.0. It supports TRUE, FALSE and NULL, but not UNKNOWN for what I can see.
Ingres 10.0 has full standards complient support for BOOLEAN according to their wiki.

Interestingly enough there are multiple suggestions about what to use when there is no boolean type: BIT, CHAR(1), NUMBER(1). This blogpost from Peter Zaitsev also lists another option: CHAR(0).

So even something simple as a boolean might be less portable than you might have thought it was.

But what about doing a real three-way compare in SQL?

One solution would be to use the & operator:
postgres=# SELECT FALSE::int & FALSE::int & TRUE::int;
 ?column? 
----------
        0
(1 row)
 
mysql [(none)] > SELECT FALSE & FALSE & TRUE;
+----------------------+
| FALSE & FALSE & TRUE |
+----------------------+
|                    0 |
+----------------------+
1 row in set (0.00 sec)
 
sqlite> SELECT 0 & 0 & 1;
0

8 comments:

  1. Hi Daniel. MySQL doesn't have the UNKNOWN literal, but it has the IS UNKNOWN operator, that is a synonym for IS NULL:

    MariaDB [(none)]> SELECT version(), NULL IS UNKNOWN, NULL IS NOT UNKNOWN;
    +------------------------------+-----------------+---------------------+
    | version() | NULL IS UNKNOWN | NULL IS NOT UNKNOWN |
    +------------------------------+-----------------+---------------------+
    | 10.1.10-MariaDB-1~jessie-log | 1 | 0 |
    +------------------------------+-----------------+---------------------+
    1 row in set (0.00 sec)

    ReplyDelete
  2. Yet, I think that the UNKNOWN keyword is misleading. I'm not sure if it is misleading in the standards, but it is in current MySQL implementation. Consider the following query:

    MariaDB [(none)]> SELECT (@x >= 1 OR @x < 1) IS UNKNOWN;
    +--------------------------------+
    | (@x >= 1 OR @x < 1) IS UNKNOWN |
    +--------------------------------+
    | 1 |
    +--------------------------------+
    1 row in set (0.00 sec)

    This can work if I don't care about logic, or if I consider NULL as a "not exists" flag. But it definitely doesn't work if I consider it as an unknown value. UNKNOWN suggests that a 3-way logic is implemented, but it isn't :)

    ReplyDelete
  3. It doesn't work on PostgreSQL 9.5!

    SELECT FALSE = FALSE = TRUE;
    ERROR: syntax error at or near "="
    LINE 1: SELECT FALSE = FALSE = TRUE;
    ^
    (the error is at the second '=')

    It's maybe related to "Adjust operator precedence to match the SQL standard"

    ReplyDelete
  4. Hi Daniel! Nice write up. I'm a bit puzzled by this:

    "
    But what about doing a real three-way compare in SQL?

    One solution would be to use the & operator:
    "

    What do you mean by "real three-way compare"?
    I'm not a pg expert but it seems to me this too would be evaluated like so:

    SELECT (FALSE::int & FALSE::int) & TRUE::int;

    I mean, the & operator is still a binary operator, just like =. So there will never be a "three-way compare" - only a series of two-way compares.

    ReplyDelete
    Replies
    1. If the objective is to get TRUE/1 only if all three booleans are TRUE/1 then this can be a solution. And yes it will be a series of two way compares.

      Delete
  5. Concetrating on MySQl here! Isn't

    SELECT FALSE = FALSE = TRUE;
    simply evaluated/executed as
    SELECT FALSE = (FALSE = TRUE);
    ?

    This was what I expected, so the result does not surprise me.

    ReplyDelete
  6. The problem with & is that FALSE = FALSE should return TRUE, but FALSE & FALSE returns FALSE;

    To simulate = with Boolean operators one should use NOT XOR instead of AND

    ReplyDelete
  7. T-SQL:
    Msg 102, Level 15, State 1, Line 1
    Incorrect syntax near '='.

    ReplyDelete