Jakub Korab's post Unit Testing the Database Tier discusses testing of database heavy applications. He explains why unit testing is difficult in database heavy applications and gives some very good pointers for making it work. Please read the whole article as it is great.
The highlights are:
- Unit testing database code is a bit of a funny problem. Most developers can pretty easily get their heads around unit testing a piece of Java code using interfaces and mock objects. When it comes to database code or DAOs, it suddenly becomes particularly difficult.
- Most large development projects go like this: The database guys update the schema. The developers write the code. The developers need a particular data set to exercise the various use cases so they add it to the schema. It all becomes a bit messy.
- The database schema generally is not version controlled, as it is constantly being redefined using DDL statements run by the DBAs.
- Bob [the developer working next to you] didn’t actually run your test because he was too busy with his own and the test suite isn’t clean anyway because everyone is falling over each other.
- In Ruby on Rails a developer’s workspace has multiple environments by default - development, test and production. You develop against the development schema. When you run unit tests the schema from development is copied into the test database with no data in it. The framework imports version controlled sets of test data. Whenever a test is run, it is guaranteed that the database will be in this state. The tear down step cleans out your changes.
- You need multiple database schemas [and consequently - database instances] in order to unit test your db code. Pause and re-read that line. It’s not negotiable. Probably two per developer. One with sample data to use while you work on the user interface. The other, a temporary one for unit testing. A whole development team using the one schema does not work. Most projects do it, but that doesn’t mean that it’s a good idea.
- The full DDL for the database is kept in version control. After each change, the full database DDL is dumped and checked in. No UPDATE TABLE statements. Ever. This way you are guaranteed that if you ever want to get a baseline of your system, you can also rebuild the database as it existed at this time. I worked on a very large telecoms project with a huge development team, and this worked.
- The test data for your environments is stored in version control - at the very least, as dumps of insert statements.
- To test your database code, refresh your test schema [and test data] with the one from version control. Now run your test cases. Gorgeous! No tripping over other people, and your tests are guaranteed to work the same each time.
- So why is unit testing databases so difficult if it doesn’t have to be? Most of the time it involves process change and getting out of bad habits, not just a tool. And change means convincing people. Generally, managers do not understand what benefit there is in multiple database schemas, as it is seen to increase complexity and therefore risk, and DBAs like to have full control over what is going on on their servers. The topic of databases and processes is also a great one for religious zeal.
He did not address one issue that I hear a lot: how will we test the production bugs if we don't have the production data. The answer is quite simple - you connect to a snapshot of the production database and propose a fix. After bug is fixed, you throw the snapshot away and go back to mock databases. You can even capture the test case against the bug by creating new mock database that has just enough data to reproduce the bug.
Another common problem that is harder to solve is about the test cases with large number of database records. How do you test the speed of the SELECT
or JOIN statement in a table with 100 million rows? One can import up to 100,000 records using bulk insert reasonably fast, but not millions. This problem may be here to stay.
Comment by dbunittest — May 7, 2009 @ 9:35 pm
Good article!
I am glad to share one database unit testing tool. It is named as AnyDbTest (<a href=”http://www.anydbtest.com”>www.anydbtest.com</a>).
I know some guys are using DbUnit or other xUnit test framework to perform DB unit testing. I also tried to use them in my projects, but at last I had to give up these tools because I must keep focus on the database rather than switch to be as application developer.
AnyDbTest is declarative style testing tool. We will not need to program at all. What we do is to express what we want to test, rather than how to test. We only need to configure an Xml test file to tell AnyDbTest what we want to test. Rather than painstakingly writing test code for xUnit test framework. So AnyDbTest is the right choice for DBA or DB developers.
AnyDbTest also offers a visual dashboard. Success or failure of test is automatically computed and presented to us via an easy-to-understand red/green light display.
Features specific to AnyDbTest:
*Writing test case with Xml, rather than Java/C++/C#/VB test case code.
*Many kinds of assertion supported, such as StrictEqual, SetEqual, IsSupersetOf, Overlaps, and RecordCountEqual etc.
*Allows using Excel spreadsheet/Xml as the source of the data for the tests.
*Supports Sandbox test model, if test will be done in sandbox, all database operations will be rolled back meaning any changes will be undone.
*Unique cross-different-type-database testing, which means target and reference result set can come from two databases, even one is SQL Server, another is Oracle.