Working with SQL Server in Django pt. II

This is a follow-up and brutal takedown of a post I wrote about two years ago. Go here for part I.

I've started using Django again lately and am integrating it with our data warehouse, contrary to my own apparent recommendation against that from a couple of years ago. None of this is in production yet, mind you. Just like before. But this time, development is going a lot more smoothly.

Last episode...

What were the issues we were facing before?

  • We have two databases - the app database and an external one, which is our data warehouse. The data warehouse is what we'll be focusing on here.
  • Our data warehouse is SQL Server, which isn't supported by Django out of the box.
    • Using SQL Server itself isn't too bad because there's a decent package to help with that - django-pyodbc-azure - though it's a little behind.
  • The database uses schemas, which Django abhors.
    • This takes some code.
  • And the database is unmanaged, which isn't fun for testing and local development.
    • More code to get around this.

The journey

Two databases

That one's pretty easy. See Django's docs on multiple databases. The main thing is that you have to create a database router class, which is just a class with a few methods. It would implement an interface if this were any other language, but in this case it doesn't even have to do that.

# database_router.py

from django.conf import settings


class DatabaseRouter:

    def db_for_read(self, model, **hints):
        # Route to whichever database the model declares, falling back to 'default'
        return getattr(model, 'database', 'default')

    def db_for_write(self, model, **hints):
        return getattr(model, 'database', 'default')

    def allow_relation(self, obj1, obj2, **hints):
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Always migrate the app database; only touch the warehouse when
        # explicitly allowed via settings
        if db == 'default':
            return True

        return settings.ALLOW_WAREHOUSE_DB_MIGRATION

And add your router to your settings. We'll look at your db settings next.

# settings.py

DATABASE_ROUTERS = ['utils.database_router.DatabaseRouter']
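
The router also references ALLOW_WAREHOUSE_DB_MIGRATION, which isn't a built-in Django setting - it's just a flag defined alongside the rest of the settings. A minimal sketch (the default value here is my guess at what makes sense for a read-only warehouse):

# settings.py

# Custom flag read by DatabaseRouter.allow_migrate (not a built-in setting).
# Keep it off so Django never tries to migrate the external warehouse;
# flip it (e.g. in test settings) when you actually want tables created.
ALLOW_WAREHOUSE_DB_MIGRATION = False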

The only trick here was to determine which database each model maps to, as I couldn't find an attribute to use by default. So I created one. Initially, I just used a class attribute on each model, like so:

# models.py

class MyModel(models.Model):  
    database = 'data_warehouse'

    ...

That's a bit of a pain to remember to do and gets fixed up later on, but the router still needs that attribute.

Django + SQL server

Again, django-pyodbc-azure.

pip install django-pyodbc-azure

# settings.py

DATABASES = {  
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
    },
    'data_warehouse': {
        'ENGINE': 'sql_server.pyodbc',
        'NAME': 'DataWarehouse',
        'HOST': '192.168.1.1',
        'PORT': '1433',
        'USER': config.get('database', 'username'),
        'PASSWORD': config.get('database', 'password'),
        'AUTOCOMMIT': False,  # Just cuz I treat this as read-only
        'OPTIONS': {
            'driver': 'ODBC Driver 13 for SQL Server',
            'host_is_server': True,
        }
    }
}
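
Side note: config isn't defined in that snippet - it's just wherever your credentials come from. One way to wire it up, assuming a configparser-style ini file (the file name and section names are made up):

# settings.py, somewhere above DATABASES
import configparser
import os

# Hypothetical credentials file with a [database] section holding
# username and password
config = configparser.ConfigParser()
config.read(os.path.join(BASE_DIR, 'database.ini'))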

Creating a model

So now you're about ready to map out some models to your data warehouse, right?

# models.py

class Customer(models.Model):

    database = 'data_warehouse'

    id = models.AutoField(primary_key=True, db_column='Id')
    name = models.CharField(max_length=128, db_column='FullName')

    class Meta:
        managed = False
        db_table = ...?

Hint: remember to use AutoField for the primary key (if it's your standard auto-incrementing pk) - otherwise when you do Customer.objects.create(), the id won't auto-populate and you'll waste two hours trying to get your tests to work.
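
To make that concrete, here's roughly what you'd see in a test (the name value is made up; this assumes the Customer model above):

# With an AutoField primary key, the database assigns the id and Django
# reads it back onto the new instance
customer = Customer.objects.create(name='Ada Lovelace')
assert customer.id is not None

# With a plain IntegerField primary key, the insert still happens, but
# Django won't populate customer.id for you - the two-hour mystery above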

Anyways...by golly, that table is in a schema, how do we specify the schema we want?

That's an ugly bit, but manageable. It involves a bit of SQL injection. But...the good kind...? Django wraps db_table in its own identifier quoting - [brackets] on the SQL Server backend - so sneaking a closing and opening bracket into the table name gives us a schema-qualified name:

db_table = '<schema name>].[<table name>

so our Meta class would be...

class Customer(models.Model):

    class Meta:
        managed = False
        db_table = 'B2C].[Customer'

See? Not too bad. Not fun. But not too bad.
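
If you want to sanity-check what Django actually generates, you can compile a queryset against the warehouse connection - a rough sketch using Django's (stable but internal) compiler API:

qs = Customer.objects.all()
sql, params = qs.query.get_compiler(using='data_warehouse').as_sql()
print(sql)
# The FROM clause should render as [B2C].[Customer]: the backend wraps
# db_table in brackets, and our injected '].[' completes the pair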

Regroup!

Cool, so now we can read from our database. And that's fine, if we have a pre-populated database to read from. But what if we want to run some tests?

Well...the database is unmanaged, so migrations won't be applied to our test db. Fine, so turn off migrations and have Django do the equivalent of a syncdb. Cool, but our models are still unmanaged, so we'll run into issues.

What else? Oh yeah, there's still that ugly stuff we have to do for each data warehouse model. And there are a lot of such models.

Testing

This ended up being fine. Just some code.

First, we'll disable migrations. Turns out it's not as easy as SOUTH_DISABLE_MIGRATIONS = True anymore, but more dynamic, I guess. So, in your settings or wherever:

# settings.py

class DisableMigrations(object):

    def __contains__(self, item):
        # Pretend every app has an entry in MIGRATION_MODULES...
        return True

    def __getitem__(self, item):
        # ...and that the entry is None, i.e. "this app has no migrations"
        return None


MIGRATION_MODULES = DisableMigrations()

Easy? Easy.

To tell Django that you want your model to be managed during tests, you just need to write a custom TestRunner that goes through all of your models, checks their meta, and changes the managed attribute if necessary. I really just copied/pasted this code from here or here

# settings.py

from django.test.runner import DiscoverRunner


class UnManagedModelTestRunner(DiscoverRunner):
    '''
    Test runner that automatically makes all unmanaged models in your Django
    project managed for the duration of the test run.
    Many thanks to the Caktus Group: http://bit.ly/1N8TcHW
    '''

    def setup_test_environment(self, *args, **kwargs):
        from django.apps import apps
        self.unmanaged_models = [m for m in apps.get_models() if not m._meta.managed]

        for m in self.unmanaged_models:
            m._meta.managed = True

        super(UnManagedModelTestRunner, self).setup_test_environment(*args, **kwargs)

    def teardown_test_environment(self, *args, **kwargs):
        super(UnManagedModelTestRunner, self).teardown_test_environment(*args, **kwargs)
        for m in self.unmanaged_models:
            m._meta.managed = False


TEST_RUNNER = 'my_project.settings.UnManagedModelTestRunner'  

Cool, so now you can run your tests and save your test objects to the database. The rest is really just icing.
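
For example, something like this should now pass (a rough sketch - the app path and field value are invented, and depending on your Django version you may need to opt the test into the second database via multi_db = True or databases = {...}):

# tests.py
from django.test import TestCase

from myapp.models import Customer  # hypothetical app path


class CustomerTests(TestCase):

    def test_create_customer(self):
        customer = Customer.objects.create(name='Ada Lovelace')

        # The (normally unmanaged) table exists in the test database, and
        # the AutoField pk comes back populated
        self.assertIsNotNone(customer.id)
        self.assertEqual(Customer.objects.count(), 1)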

Abstract the boilerplate

This part was kind of fun because I had never gotten to mess around with metaclasses before. Essentially, I wanted to add a schema attribute to Meta and have the base class do the SQL injection for me, so that garbage isn't littered all around the code. Not having to specify the database on every model would be nice, too.

I ended up creating a DataWarehouseModel that extends models.Model and takes care of all that for me.

from django.db.models.base import ModelBase  
from django.db import models  
from django.conf import settings


DEFAULT_DB_FORMAT = '{schema}__{table}'

# Ugly string injection hack so that we can access the table under the schema
# See: http://kozelj.org/django-1-6-mssql-and-schemas/
SQL_DB_FORMAT = '{schema}].[{table}'


class DataWarehouseMeta(ModelBase):

    def __new__(typ, name, bases, attrs, **kwargs):
        super_new = super().__new__

        # Also ensure initialization is only performed for subclasses of Model
        # (excluding Model class itself).
        parents = [b for b in bases if isinstance(b, DataWarehouseMeta)]
        if not parents:
            return super_new(typ, name, bases, attrs)

        meta = attrs.get('Meta', None)
        if not meta:
            meta = super_new(typ, name, bases, attrs, **kwargs).Meta

        # ignore abstract models
        is_abstract = getattr(meta, 'abstract', False)
        if is_abstract:
            return super_new(typ, name, bases, attrs, **kwargs)

        # Ensure table is unmanaged unless explicitly set
        is_managed = getattr(meta, 'managed', False)
        meta.managed = is_managed

        # SQL injection garbage
        meta.db_table = typ.format_db_table(bases, meta)

        # Delete my custom attributes so the Meta validation will let the server run
        del meta.warehouse_schema
        del meta.warehouse_table

        attrs['Meta'] = meta
        return super().__new__(typ, name, bases, attrs, **kwargs)

    @classmethod
    def format_db_table(cls, bases, meta):
        table_format = DEFAULT_DB_FORMAT

        model_database = bases[0].database
        db_settings = settings.DATABASES.get(model_database)
        engine = db_settings['ENGINE']

        if engine == 'sql_server.pyodbc':
            table_format = SQL_DB_FORMAT

        return table_format.format(
            schema=meta.warehouse_schema,
            table=meta.warehouse_table
        )


class DataWarehouseModel(models.Model, metaclass=DataWarehouseMeta):

    database = 'data_warehouse'

    class Meta:
        abstract = True

And that class is used like so:

# models.py

class Customer(DataWarehouseModel):

    id = models.IntegerField(...)

    class Meta:
        warehouse_schema = 'B2C'
        warehouse_table = 'Customer'

And that's it - behind the scenes, the metaclass builds db_table as 'B2C].[Customer', the same value we were hard-coding before.

What DataWarehouseMeta is doing is:

  • setting managed = False by default
  • generating db_table from our custom attributes (warehouse_schema + warehouse_table)
  • setting that db_table on the Meta that gets passed up to the superclass - Django's own Meta handling
  • deleting the custom attributes, since Django validates that there aren't any unexpected attributes on its Meta class

This StackOverflow answer was a lot of help.

Summary

So that's it! Nothing a little elbow grease can't fix. I now officially endorse using Django with SQL Server - even with complex databases that use schemas. The caveat is that we're relying on django-pyodbc-azure, which is a couple of versions behind.

If I get a basic example repo going on GitHub, I'll link it here later.