Saturday, February 8, 2014

Unit testing your indexes

During FOSDEM PGDay I watched the "Indexes: The neglected performance all-rounder" talk by Markus Winand. Both his talk and the "SQL Performance Explained" book (which is also available online) are great.

The conclusion of the talk is that we should put more effort into carefully designing indexes. But how can we make sure the indexes are really used, now and in the future? We need to write some tests for it.

So I wrote a small Python script to test index usage per query. This uses the JSON explain format available in MySQL 5.6. It's just a proof-of-concept, so don't expect too much of it yet (but please send pull requests!).

A short example:
import indextest

class tester(indextest.IndexTester):
    def __init__(self):
        self.dbparams = {'user': 'msandbox',
                         'password': 'msandbox',
                         'host': 'localhost',
                         'port': '5615',
                         'database': 'imdb'}

    def test_query1(self):
        q1 = self.query("SELECT * FROM movies WHERE rank>9.8")
        return q1.testEqual('query_block.table.access_type', 'range')

    def test_query2(self):
        q2 = self.query("SELECT * FROM actors WHERE first_name='Tom'")
        return q2.testEqual('query_block.table.access_type', 'range')

    def test_query3(self):
        q3 = self.query("SELECT * FROM actors WHERE first_name='%Tom'")
        return q3.testEqual('query_block.table.key', 'idx_first_name')

run = tester()
$ ./ 
Query: SELECT * FROM movies WHERE rank>9.8
Test: query_block.table.access_type == range
Result: range == range: True

Query: SELECT * FROM actors WHERE first_name='Tom'
Test: query_block.table.access_type == range
Result: range == ref: False

Query: SELECT * FROM actors WHERE first_name='%Tom'
Test: query_block.table.key == idx_first_name
Result: idx_first_name == idx_first_name: True

Tested 3 queries, Pass: 2, Fail: 1

The code is available on GitHub.

1 comment:

  1. That's a nice idea. One thing that comes to my mind is index selectivity: unless you have an exact copy of your production data, the plans might differ between production and test environments.

    Maybe it's better to use a PostgreSQL setting like this:

    auto_explain.log_min_duration = 0

    So all plans are logged in production, and we can use a batch processor to scan the log and validate that all queries use the indexes we expect them to.
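    That scan could look roughly like the sketch below: each logged plan (auto_explain can emit JSON when auto_explain.log_format is set to json) is checked against an expected index per table. The log entries, table names, index names, and the flat EXPLAIN (FORMAT JSON) shape assumed here are all hypothetical:

```python
import json

# Hypothetical rules: which index each table's queries should use
EXPECTED = {
    'movies': 'idx_rank',
    'actors': 'idx_first_name',
}

def check_plan(entry):
    # Return (table, ok) for one logged plan entry. Assumes the shape
    # {"Plan": {"Node Type": ..., "Relation Name": ..., "Index Name": ...}}
    plan = entry['Plan']
    table = plan.get('Relation Name')
    used = plan.get('Index Name')  # absent for e.g. a Seq Scan
    return table, used == EXPECTED.get(table)

# Two hand-written entries standing in for real auto_explain output
log_entries = [
    '{"Plan": {"Node Type": "Index Scan", "Relation Name": "movies", "Index Name": "idx_rank"}}',
    '{"Plan": {"Node Type": "Seq Scan", "Relation Name": "actors"}}',
]

for line in log_entries:
    table, ok = check_plan(json.loads(line))
    print("%s: %s" % (table, "OK" if ok else "unexpected plan"))
```

    With log_min_duration = 0 this validates every query the production workload actually runs, instead of a hand-picked test set.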