Running CouchDB against a RAMDisk

Struggling with a slow-running test suite, I’d previously experimented with one db per test. Although the approach was superficially elegant, pre-creating the dbs meant that tests that should have failed passed, and vice versa.

Unsatisfied, I started looking for a different solution. CouchDB allows dbs to be created under a sub-dir, so creating a db named foo/bar will result in the following structure (assuming a default install):

  • /usr/local/var/lib/couchdb/foo/bar.couch
  • /usr/local/var/lib/couchdb/.foo/bar_design/my_design_doc.view
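
As a rough illustration of that mapping, the Ruby sketch below derives the database file for a given name, plus the escaped form of the name needed when it is sent over HTTP (the slash has to travel as %2F). The helper names and the COUCH_ROOT constant are my own inventions for this example, and the root is simply the default install location from the paths above.

```ruby
require "cgi"

# Assumed data directory of a default install (taken from the paths above).
COUCH_ROOT = "/usr/local/var/lib/couchdb"

# Hypothetical helper: the file that should hold the database +name+.
def couch_db_file(name, root = COUCH_ROOT)
  File.join(root, "#{name}.couch")
end

# Hypothetical helper: the request path for the database.
# The "/" inside the name must be escaped so it is not read as a path separator.
def couch_db_request_path(name)
  "/" + CGI.escape(name)
end

puts couch_db_file("foo/bar")
puts couch_db_request_path("foo/bar")
```

The point of the sub-dir is that everything under foo/ lands in one directory, which is what makes it possible to put just the test databases on a separate volume.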

I mounted two RAM disks following the example in the man page for hdid (OS X).

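DB_MOUNT=/usr/local/var/lib/couchdb/sd_test_rd
NUM_SECTORS=128000 # 512-byte sectors: 2 * 1024 * size in MB (128000 is 62.5 MB)

# Create the RAM device without mounting it; hdid prints the device node
RAM_DEV=`hdid -nomount ram://$NUM_SECTORS`
newfs_hfs $RAM_DEV
mkdir -p $DB_MOUNT
mount -t hfs $RAM_DEV $DB_MOUNT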
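
The sector count is easy to get wrong, so here is the arithmetic as a small Ruby sketch. It assumes the 512-byte sectors that ram:// sizes are expressed in; the method names are mine.

```ruby
SECTOR_BYTES = 512 # ram:// sizes are given in 512-byte sectors

# Sectors needed for a RAM disk of the given size in MB.
def ram_disk_sectors(megabytes)
  (megabytes * 1024 * 1024 / SECTOR_BYTES).to_i
end

# Size in MB of a RAM disk with the given number of sectors.
def ram_disk_megabytes(sectors)
  sectors * SECTOR_BYTES / (1024.0 * 1024.0)
end

puts ram_disk_sectors(64)       # sectors for a 64 MB disk
puts ram_disk_megabytes(128000) # size of the disk created above
```

So the 128000 sectors used above give a disk of 62.5 MB, comfortably larger than the roughly 50 MB written by the dd test.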

A quick test with dd showed improved, if not stellar, performance. Running dd if=/dev/zero of=$DB_MOUNT/foo bs=1024 count=50000 took about 0.5s vs 3.5s against the HDD.

Unfortunately, this improvement didn’t translate to CouchDB. The run time for my application’s test suite was virtually unchanged. I haven’t yet had the time to study why but, at a guess, small frequent writes simply don’t see the same improvement on a RAM disk as large writes do.

I also tried a couple of alternatives, like the following, that resulted in no discernible difference.

  • diskutil erasevolume HFS+ "ram disk" `hdiutil attach -nomount ram://4629672`

I haven’t given up yet though. #couchdb’s manveru has been talking about an in-memory Ruby implementation of CouchDB. The nascent JS implementations of CouchDB, coupled with something like Helma or POW, could also be ideal for frequent quick runs of a test suite.

Update: Spotlight chose to rebuild its indexes after I rebooted, presumably as a result of my device / volume mangling.

Posted by Paul Mon, 27 Apr 2009 10:01:00 GMT


Quick and Effective - Per test databases with CouchDB

Testing is surely the aspect of software development that I find most irksome. If the goal is an efficacious, fast-running test suite, it’s not an easy target and as a result, I’m almost never happy with my tests.

A few years ago, tired of a ‘unit test’ suite that took twenty minutes to run, and a functional test suite that took almost five times as long, I jumped on the mock bandwagon as it went by. I’ve since jumped off. Mocks have their uses, but they don’t typically stand the acid test of tests - “Broken interface means failing tests; working interface means passing tests”. Almost axiomatic, but often ignored.

Which brings me on to model level tests and CouchDB. I’m only going to have test confidence when my models are operating on data retrieved from the database. That confidence comes at a price - my test suite is slow. But as I mentioned before, working with CouchDB encourages new ways of thinking.

Traditionally, tests that interact with a database clear it out before every test case, load it up with the required data and finally run the actual test. But a single CouchDB instance supports many databases, potentially many, many databases. What if I had a database per test, pre-loaded with the required data? Tests would still issue GET requests, but the costly setup stage would be obviated. As it turns out, this is fairly straightforward to do. The basic premise is as follows:

  • Specify the test setup code in a lambda (or any delayed execution construct)
  • When the test starts, query CouchDB for a known document id against a database whose name matches the current test
  • If the known document exists and if it contains a property whose contents match the test setup code exactly, you’re done. Just run the test.
  • If the condition above doesn’t hold, the test setup code has changed. Delete the database named by the test, create it again, run the test setup code against it and store a document containing the test setup code. Now run the test.
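
The steps above can be sketched in a few lines of Ruby. This is not the published code, just a minimal illustration under some loud assumptions: FakeCouch is an in-memory stand-in for the server (a real version would issue HTTP requests), the setup source is passed in as a plain string rather than extracted from the lambda, and the names SETUP_DOC_ID and ensure_test_db are made up for the example.

```ruby
# In-memory stand-in for a CouchDB server: db name => { doc id => doc }.
class FakeCouch
  def initialize
    @dbs = {}
  end

  # Returns the document, or nil if the db or the doc is missing.
  def get(db, id)
    @dbs.dig(db, id)
  end

  # Equivalent of deleting the database and creating it again.
  def recreate(db)
    @dbs[db] = {}
  end

  def put(db, id, doc)
    (@dbs[db] ||= {})[id] = doc
  end
end

SETUP_DOC_ID = "test_setup" # hypothetical well-known document id

# Runs the setup block against the test's database only when the stored
# copy of the setup source differs from +setup_source+.
# Returns true if the database was rebuilt, false if it was reused.
def ensure_test_db(couch, test_name, setup_source, &setup)
  marker = couch.get(test_name, SETUP_DOC_ID)
  return false if marker && marker["source"] == setup_source

  couch.recreate(test_name)
  setup.call(couch, test_name)
  couch.put(test_name, SETUP_DOC_ID, { "source" => setup_source })
  true
end

couch = FakeCouch.new
source = 'Player.stock(:name => "p1")'
runs = 0
2.times do
  ensure_test_db(couch, "should_know_one_another", source) do |c, db|
    runs += 1
    c.put(db, "p1", { "name" => "p1" })
  end
end
puts runs # the second pass finds a matching marker and skips the setup
```

Storing the marker document last matters: if the setup code blows up half way, no marker is written and the next run rebuilds the database from scratch.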

The following is extracted from a real spec. The code inside the cdb block is executed if and only if it hasn’t already been run.


it "should know one another" do
  cdb do
    p1, p2 = Player.stock(:name => "p1"), Player.stock(:name => "p2")
    p1.acquaint p2
  end
  # p1 and p2 are methods that load players by names p1 and p2
  p1.should know(p2)
  p2.should know(p1)
end

It’s worth pointing out that you’ll want to rerun the lambda if the object creation code changes, even if the lambda itself is unchanged. This is done easily with a command line switch.
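
One possible shape for such a switch is sketched below. It uses an environment variable rather than a real command line option, and the name CDB_FORCE is purely an assumption for the example, not what the published code uses.

```ruby
# Hypothetical switch: CDB_FORCE is an invented name for this sketch.
def force_rebuild?(env = ENV)
  env["CDB_FORCE"] == "1"
end

# Rebuild when forced, or when the stored setup source no longer matches.
def rebuild_needed?(stored_source, current_source, env = ENV)
  force_rebuild?(env) || stored_source != current_source
end

puts rebuild_needed?("same", "same", {})
puts rebuild_needed?("same", "same", { "CDB_FORCE" => "1" })
```

With something like this in place, prefixing the usual spec run with CDB_FORCE=1 would rebuild every test database regardless of what is stored.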

I’ve published the code I use for doing this with RSpec on GitHub. The speed-up is of course test dependent, but the specs that I’ve applied it to run almost an order of magnitude faster. Happy days!

Posted by Paul Sun, 18 Jan 2009 14:27:00 GMT


Visualizing inter doc relationships in CouchDB

One of the many enjoyable aspects of working with CouchDB is the scope it offers for exploration. Developing against CouchDB presents such a different paradigm for working with data that it really does stimulate thought.

I wanted to illustrate this at my talk at LRUG last week. I’d been thinking for some time how useful a graphical document browser would be for CouchDB, and that it would be fairly simple to write one. So, rather than preparing a talk, I spent last Monday writing fuschia. The idea was straightforward - a user enters a seed document id, fuschia displays it, all docs that it links to, and that link to it. Docs are represented by colored nodes, and labelled with their most descriptive attribute (as defined by the user). Click a node to repeat the process.

Given CouchDB’s HTTP interface, UUIDs as identifiers, and an ability to aggregate data with a map function, developing fuschia required just a few dozen lines of core code. Kudos also goes to prefuse, but my relationship with that library is more a love / hate one. Dragons lie on either side of its hidden golden path.
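
To give a feel for the core idea, here is a small in-memory Ruby sketch of finding a seed document's neighbours. It is not fuschia's code. The assumption, which is mine, is that a link is any value that equals the id of another document; against a real database the incoming direction would come from a view whose map function emits each referenced id. The sample data and method names are invented.

```ruby
# Hypothetical sample data: doc id => document. Real ids would be UUIDs.
DOCS = {
  "a" => { "name" => "Alice", "friend" => "b" },
  "b" => { "name" => "Bob", "team" => "c" },
  "c" => { "name" => "Crew", "members" => ["a"] }
}

# Ids referenced by +doc+: any string value (or array element) that is
# itself the id of a known document.
def outgoing_links(doc, docs)
  doc.values.flatten.select { |v| v.is_a?(String) && docs.key?(v) }.uniq
end

# Everything the seed links to, plus everything that links to the seed.
def neighbours(seed_id, docs)
  outgoing = outgoing_links(docs.fetch(seed_id), docs)
  incoming = docs.select do |id, doc|
    id != seed_id && outgoing_links(doc, docs).include?(seed_id)
  end.keys
  { outgoing: outgoing, incoming: incoming }
end

p neighbours("a", DOCS)
```

Clicking a node then amounts to calling neighbours again with that node's id as the new seed.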

Somewhat predictably, writing fuschia took longer than expected and I didn’t have time to prepare a talk. Which wouldn’t have been so bad had I not forgotten to demo fuschia. Not my finest hour.

While fuschia works well with sample data, using it with real world data didn’t offer the insights I’d hoped for. An ability to filter out data is needed and views offer a natural way to achieve this. Work for the future so…

Posted by Paul Sun, 18 Jan 2009 13:16:00 GMT