yaakov.online

yaakov.online


I fight with computers

SQL Server is smarter than I had assumed

Here's something neat about Microsoft SQL Server that I learned this week. I don't know if it applies to PostgreSQL, MySQL and other such systems, but I'd hope they're also as clever.

Let's assume we're building an improved of the 5-minutes-of-fame app Yo. As well as being able to send a 'Yo' to another user, you can also send the traditional Yiddish complaint, 'Oy'.

To facilitate this, here's a database to store users and their messages:

CREATE TABLE Users (  
    Id INT NOT NULL IDENTITY PRIMARY KEY,
    Name VARCHAR(20) NOT NULL
);

CREATE TABLE Messages (  
    SenderId INT NOT NULL,
    RecipientId INT NOT NULL,
    MessageType CHAR(2) NOT NULL,

    FOREIGN KEY (SenderId) REFERENCES Users (Id),
    FOREIGN KEY (RecipientId) REFERENCES Users (Id),
    CONSTRAINT MessageTypeConstraint CHECK (MessageType IN ('YO', 'OY'))
);

Now let's add some simple test data:

INSERT INTO Users (Name) VALUES ('Bob'), ('Alice');

INSERT INTO Messages (SenderId, RecipientId, MessageType) VALUES  
(1, 2, 'YO'),
(2, 1, 'OY');

Note that I've created a constraint that specifies that Messages.MessageType can only be one of two possible values: 'YO', or 'OY'.

This is a great way to ensure that your data values fall within the ranges you expect, rather than having to debug ages later to figure out how some bad data got into your database. If I try insert a different value, such as 'HI', SQL Server will return an error instead of inserting the bad data.

As it turns out, this also helps SQL Server tune queries.

Consider the following queries:

1. Find all 'Yo' messages:

SELECT * FROM Messages WHERE MessageType = 'YO'  

Let's look at the query execution plan for this. If you've never heard of an execution plan, this shows you how a SQL server will execute your query.

In MySQL and PostgreSQL, you can get this with the EXPLAIN keyword. In Microsoft SQL Server, you can get this by pressing Ctrl+L with your query selected in SQL Server Management Studio.

I've used OPTION (RECOMPILE) in the screenshots below, which forces Microsoft SQL Server to figure out a plan from scratch, rather than using one in it's execution plan cache.

Query Execution Plan for `SELECT * FROM Messages WHERE MessageType = 'YO'`

This shows a pretty simple plan for a simple query on a simple table. It will simply search all rows in the table to find matching ones.

2. Find all 'Oy' messages:

SELECT * FROM Messages WHERE MessageType = 'OY'  

Query Execution Plan for Query Execution Plan for `SELECT * FROM Messages WHERE MessageType = 'OY'`

This one is exactly the same as above. It will simply search all rows in the table to find matching ones.

3. Find all 'Hi' messages:

SELECT * FROM Messages WHERE MessageType = 'HI'  

This is where the magic kicks in:

Query Execution Plan for `SELECT * FROM Messages WHERE MessageType = 'HI'`

SQL Server doesn't scan the table. Or any indexes. In fact, it doesn't even look at the table.

Here it does a 'Constant Scan', which in this case is just a short-circuit of the entire query to return an empty set. SQL Server can do this, because it knows that 'HI' is not a valid value due to the constraint on that column.

By adding database-level validation to this table, SQL Server can better understand my data to sensibly tune queries, and spend no time at all answering bogus ones.