Working with SQL Server in Django pt. II
This is a follow-up and brutal takedown of a post I wrote about two years ago. Go here for part I.
I'm starting to use Django again lately and am integrating with our data warehouse, contrary to my own apparent recommendation against that from a couple of years ago. None of this is in production yet, mind you. Just like before. But this time, development is going a lot smoother.
Last episode...
What were the issues we were facing before?
- We have two databases: the app database and an external one, which is our data warehouse. The data warehouse is what we'll be focusing on here.
- Our data warehouse is SQL Server, which isn't supported by Django out of the box.
- Using SQL Server itself isn't too bad because there's a decent package to help with that - django-pyodbc-azure - though it's a little behind.
- The database uses schemas, which Django abhors.
  - This takes some code.
- The database is unmanaged, which isn't fun for testing and local development.
  - More code to get around this.
The journey
Two databases
That one's pretty easy. See Django's docs. The main thing here is that you have to create a Database Routing class, which is just a class with a few methods. It would extend an interface if this were any other language, but in this case it doesn't even have to do that.
# database_router.py
from django.conf import settings

class DatabaseRouter:
    def db_for_read(self, model, **hints):
        # Route to the model's `database` attribute, falling back to 'default'
        return getattr(model, 'database', 'default')

    def db_for_write(self, model, **hints):
        return getattr(model, 'database', 'default')

    def allow_relation(self, obj1, obj2, **hints):
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Always migrate the app db; gate warehouse migrations behind a setting
        if db == 'default':
            return True
        return settings.ALLOW_WAREHOUSE_DB_MIGRATION
And add your router to your settings. We'll look at your db settings next.
# settings.py
DATABASE_ROUTERS = ['utils.database_router.DatabaseRouter']
The only trick here was to determine which database each model maps to, as I couldn't find an attribute to use by default. So I created one. Initially, I just used a class attribute on each model, like so:
# models.py
class MyModel(models.Model):
    database = 'data_warehouse'
    ...
That's a bit of a pain to remember to do and gets fixed up later on, but the router still needs that attribute.
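If you want to sanity-check the routing, here's a quick illustrative sketch (the module and model names are just the placeholders from the examples above, not anything canonical):

# Illustrative only - the router resolves each model's target database
from utils.database_router import DatabaseRouter
from myapp.models import MyModel  # has database = 'data_warehouse'

router = DatabaseRouter()
print(router.db_for_read(MyModel))   # -> 'data_warehouse'
print(router.db_for_write(MyModel))  # -> 'data_warehouse'
# A model without the attribute falls back to 'default'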
Django + SQL Server
Again, django-pyodbc-azure.
pip install django-pyodbc-azure
# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
    },
    'data_warehouse': {
        'ENGINE': 'sql_server.pyodbc',
        'NAME': 'DataWarehouse',
        'HOST': '192.168.1.1',
        'PORT': '1433',
        'USER': config.get('database', 'username'),
        'PASSWORD': config.get('database', 'password'),
        'AUTOCOMMIT': False,  # Just cuz I treat this as read-only
        'OPTIONS': {
            'driver': 'ODBC Driver 13 for SQL Server',
            'host_is_server': True,
        },
    },
}
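For reference, config above is just how I keep credentials out of the settings file - assume something like a standard ConfigParser pointed at an ini file (your setup may differ):

# settings.py - assumed setup for the `config` object above; the ini
# filename is hypothetical
import configparser

config = configparser.ConfigParser()
config.read(os.path.join(BASE_DIR, 'database.ini'))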
Creating a model
So now you're about ready to map out some models to your data warehouse, right?
# models.py
class Customer(models.Model):
    database = 'data_warehouse'

    id = models.AutoField(primary_key=True, db_column='Id')
    name = models.CharField(max_length=128, db_column='FullName')

    class Meta:
        managed = False
        db_table = ...?
Hint: remember to use AutoField for the primary key (if it's your standard auto-incrementing pk) - otherwise when you do Customer.objects.create(), the id won't auto-populate and you'll waste two hours trying to get your tests to work.
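In other words, with AutoField the id comes back populated right after the insert (a tiny sketch using the Customer model above; the name is just sample data):

# With AutoField as the pk, the generated identity value is fetched on insert
customer = Customer.objects.create(name='Jane Doe')
print(customer.id)  # populated; with a plain IntegerField pk this would stay None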
Anyways...by golly, that table is in a schema, how do we specify the schema we want?
That's an ugly bit, but manageable. It involves a bit of SQL injection. But...the good kind...? The trick is that the backend wraps db_table in square brackets when it builds queries, so closing the bracket early and opening a new one turns a single quoted name into a schema-qualified one:

db_table = '<schema name>].[<table name>'

so our Meta class would be...
class Customer(models.Model):
    class Meta:
        managed = False
        db_table = 'B2C].[Customer'
See? Not too bad. Not fun. But not too bad.
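If you want to see the quoting in action, here's a quick illustrative check (assuming the data_warehouse connection from the settings above):

from django.db import connections

# The SQL Server backend quotes identifiers in [brackets], so our
# pre-closed bracket renders as a schema-qualified name in generated SQL
print(connections['data_warehouse'].ops.quote_name('B2C].[Customer'))
# -> [B2C].[Customer]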
Regroup!
Cool, so now we can read from our database. And that's fine, if we have a pre-populated database to read from. But what if we want to run some tests?
Well...the database is unmanaged, so migrations won't be applied to our test db. Fine, so turn off migrations and have Django do the equivalent of a syncdb. Cool, but our models are still unmanaged, so we'll run into issues.
What else? Oh yeah, there's still that ugly stuff we have to do for each data warehouse model. And there are a lot of such models.
Testing
This ended up being fine. Just some code.
First, we'll disable migrations. Turns out it's not as easy as SOUTH_DISABLE_MIGRATIONS = True anymore, but more dynamic, I guess. So, in your settings or wherever:
# settings.py
class DisableMigrations(object):
    def __contains__(self, item):
        return True

    def __getitem__(self, item):
        return None

MIGRATION_MODULES = DisableMigrations()
Easy? Easy.
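If you'd rather not stub out migrations globally, one variant (my own tweak, nothing canonical) is to only swap in the stub during test runs:

# settings.py
import sys

# Only disable migrations when running `manage.py test`
if 'test' in sys.argv:
    MIGRATION_MODULES = DisableMigrations()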
To tell Django that you want your model to be managed during tests, you just need to write a custom TestRunner that goes through all of your models, checks their meta, and changes the managed attribute if necessary. I really just copied/pasted this code from here or here.
# settings.py
from django.test.runner import DiscoverRunner

class UnManagedModelTestRunner(DiscoverRunner):
    '''
    Test runner that automatically makes all unmanaged models in your Django
    project managed for the duration of the test run.
    Many thanks to the Caktus Group: http://bit.ly/1N8TcHW
    '''
    def setup_test_environment(self, *args, **kwargs):
        from django.apps import apps
        self.unmanaged_models = [m for m in apps.get_models() if not m._meta.managed]
        for m in self.unmanaged_models:
            m._meta.managed = True
        super(UnManagedModelTestRunner, self).setup_test_environment(*args, **kwargs)

    def teardown_test_environment(self, *args, **kwargs):
        super(UnManagedModelTestRunner, self).teardown_test_environment(*args, **kwargs)
        for m in self.unmanaged_models:
            m._meta.managed = False

TEST_RUNNER = 'my_project.settings.UnManagedModelTestRunner'
Cool, so now you can run your tests and save your test objects to the database. The rest is really just icing.
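For example, a minimal test might look like this (a sketch only - the app name is a placeholder, the model comes from the examples above, and depending on your Django version you may need multi_db = True instead of the databases attribute on the TestCase):

# tests.py
from django.test import TestCase
from myapp.models import Customer

class CustomerTests(TestCase):
    databases = {'default', 'data_warehouse'}  # allow queries against both DBs

    def test_create_customer(self):
        customer = Customer.objects.create(name='Jane Doe')
        self.assertIsNotNone(customer.id)  # AutoField populated the pk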
Abstract the boilerplate
This part was kind of fun because I had never gotten to mess around with metaclasses before. Essentially, I wanted to add a schema attribute to Meta, and then have the base class do the SQL injection for me. Then that garbage isn't littered all around the code. Also, not having to specify the database would be nice.
I ended up creating a DataWarehouseModel that extends models.Model and takes care of all that for me.
from django.db.models.base import ModelBase
from django.db import models
from django.conf import settings

DEFAULT_DB_FORMAT = '{schema}__{table}'

# Ugly string injection hack so that we can access the table under the schema
# See: http://kozelj.org/django-1-6-mssql-and-schemas/
SQL_DB_FORMAT = '{schema}].[{table}'

class DataWarehouseMeta(ModelBase):
    def __new__(typ, name, bases, attrs, **kwargs):
        super_new = super().__new__
        # Ensure initialization is only performed for subclasses of Model
        # (excluding the Model class itself).
        parents = [b for b in bases if isinstance(b, DataWarehouseMeta)]
        if not parents:
            return super_new(typ, name, bases, attrs)

        meta = attrs.get('Meta', None)
        if not meta:
            meta = super_new(typ, name, bases, attrs, **kwargs).Meta

        # Ignore abstract models
        is_abstract = getattr(meta, 'abstract', False)
        if is_abstract:
            return super_new(typ, name, bases, attrs, **kwargs)

        # Ensure table is unmanaged unless explicitly set
        is_managed = getattr(meta, 'managed', False)
        meta.managed = is_managed

        # SQL injection garbage
        meta.db_table = typ.format_db_table(bases, meta)

        # Delete my custom attributes so the Meta validation will let the server run
        del meta.warehouse_schema
        del meta.warehouse_table
        attrs['Meta'] = meta
        return super().__new__(typ, name, bases, attrs, **kwargs)

    @classmethod
    def format_db_table(cls, bases, meta):
        table_format = DEFAULT_DB_FORMAT
        model_database = bases[0].database
        db_settings = settings.DATABASES.get(model_database)
        engine = db_settings['ENGINE']
        if engine == 'sql_server.pyodbc':
            table_format = SQL_DB_FORMAT
        return table_format.format(
            schema=meta.warehouse_schema,
            table=meta.warehouse_table
        )

class DataWarehouseModel(models.Model, metaclass=DataWarehouseMeta):
    database = 'data_warehouse'

    class Meta:
        abstract = True
And that class is used like so:
# models.py
class Customer(DataWarehouseModel):
    id = models.IntegerField(...)

    class Meta:
        warehouse_schema = 'B2C'
        warehouse_table = 'Customer'
And that's it.
What DataWarehouseMeta is doing is:
- setting managed = False by default
- generating db_table from our custom attributes (warehouse_schema + warehouse_table)
- setting db_table to what will be passed to the super class - Django's Meta implementation
- deleting the custom attributes, since Django makes sure there aren't any extra attributes lying around its Meta class
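And a quick shell session to convince yourself it all wired up (illustrative, using the Customer model above):

>>> from myapp.models import Customer
>>> Customer._meta.db_table
'B2C].[Customer'
>>> Customer._meta.managed
False
>>> Customer.database
'data_warehouse'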
This StackOverflow answer was a lot of help.
Summary
So that's it! Nothing a little elbow grease can't fix. I now officially endorse using Django with SQL Server - even complex databases with schemas. The caveat is that we are relying on django-pyodbc-azure, which is a couple of versions behind.
If I get a basic example repo going on GitHub, I'll link it here later.