Running CouchDB against a RAMDisk
Struggling with a slow-running test suite, I’d previously experimented with one db per test. Although superficially elegant, the fact that the dbs were pre-created meant that tests that should fail, passed, and vice versa.
Unsatisfied, I started looking for a different solution. CouchDB allows dbs to be created under a sub-dir, so creating a db named foo/bar will result in the following structure (assuming a default install)
/usr/local/var/lib/couchdb/foo/bar.couch
/usr/local/var/lib/couchdb/.foo/bar_design/my_design_doc.view
I mounted two RAM disks following the example in the man page for hdid (OS X).
DB_MOUNT=/usr/local/var/lib/couchdb/sd_test_rd
NUM_SECTORS=128000 # 2 * 1024 * Size in MB
RAM_DEV=`hdid -nomount ram://$NUM_SECTORS`
newfs_hfs $RAM_DEV
mkdir -p $DB_MOUNT
mount -t hfs $RAM_DEV $DB_MOUNT
A quick test with dd showed improved, if not stellar, performance. Running dd if=/dev/zero of=$DB_MOUNT/foo bs=1024 count=50000 took about 0.5s vs 3.5s against the HDD.
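As a rough sanity check on those numbers, the arithmetic can be sketched in Ruby. This assumes the 512-byte sectors used for ram:// devices, and the timings are the approximate figures quoted above rather than fresh measurements. The helper names are made up for the example.

```ruby
# Assumption: ram:// devices are sized in 512-byte sectors.
def sectors_to_mib(sectors, sector_bytes = 512)
  sectors * sector_bytes / (1024.0 * 1024.0)
end

# Throughput in MiB/s for a dd run of `count` blocks of `block_bytes` each.
def dd_rate_mib(count, block_bytes, seconds)
  count * block_bytes / (1024.0 * 1024.0) / seconds
end

# Approximate timings taken from the text: 0.5s on the RAM disk, 3.5s on the HDD.
ram_rate = dd_rate_mib(50_000, 1024, 0.5)
hdd_rate = dd_rate_mib(50_000, 1024, 3.5)

puts format("RAM disk size: %.1f MiB", sectors_to_mib(128_000))
puts format("RAM disk: %.1f MiB/s, HDD: %.1f MiB/s", ram_rate, hdd_rate)
puts format("Speed-up: roughly %.0fx", ram_rate / hdd_rate)
```

So the RAM disk is about 62.5 MiB, and the sequential write is roughly seven times faster than on the HDD, which makes the unchanged test suite run time all the more striking.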
Unfortunately, this improvement wasn’t translated to CouchDB. Run time for my application’s test suite was virtually unchanged. I haven’t yet had the time to study why, but at a guess small frequent writes simply don’t see the same runtime improvement as large writes on a RAM disk.
I also tried a couple of alternatives, like the following, that resulted in no discernible difference.
diskutil erasevolume HFS+ "ram disk" `hdiutil attach -nomount ram://4629672`
I haven’t given up yet though. #couchdb’s manveru has been talking about an in-memory Ruby implementation of CouchDB. The nascent JS implementations of CouchDB, coupled with something like Helma or POW could also be ideal for frequent quick runs of a test suite.
Update: Spotlight chose to rebuild its indexes after I rebooted, presumably as a result of my device / volume mangling.
Evolving Social Games
Gamasutra recently reposted a Simple Lifeforms blog post on the different types of social game. It’s great stuff, but the sheer diversity of games listed highlights the fact that a generally accepted definition of what constitutes a social game continues to prove elusive.
The social gaming panel at the 2008 Graphing Social Patterns conference reached a consensus opinion that
if emotions like guilt, pride, reciprocity, gratitude or vengeance get evoked in the gameplay because of the combination of gameplay and player relationship, a game is social.
This sounds reasonable, but by that definition, the team based play of Call of Duty et al. is right on the button. Yet when I hear the phrase social games, Call of Duty doesn’t really spring to mind, even if it is a wonderful example of a game that invokes a desire for revenge. I’m increasingly inclined to agree with Jeremy Liew’s opinion that social gaming is a tactic not a category.
Yet, there’s great scope for exploring games where the combination of gameplay and player relationship doesn’t merely invoke emotions as a side effect, but where the explicit goal of every player action is to evoke an emotional response in other players.
This is exactly what I’m aiming for with Strawberry Diva. When I started developing it, I used to call it a casual MMORPG. But a little piece of me died every time I did that. These days I call Strawberry Diva a social strategy game as this description is much closer to its essence of game mechanics derived from social interactions.
Games that focus on social interactions remain comparatively rare. The Sims 2 was one of the first games to place an emphasis on relationships and emotional goals. But it took until 2005 and the release of Facade for a game to place an exclusive focus on a relationship. Facade tasks the player with saving the marriage of Trip and Grace, a materially successful couple whose dinner party - with the player as the sole guest - may play host to the demise of their relationship. A true interactive drama, it was critically acclaimed on release, but nothing quite like it has appeared in the last four years.
Maybe this shouldn’t surprise us, though. Facade relies on natural language processing, and while it does so well, NLP is going to remain an open problem for quite some time. A game placing real relationships formed by real people at its core has the potential for far greater depth and longevity.
Friends for Sale and Erepublik do just this. Friends for Sale wraps mercenary gameplay in a fun package. Erepublik offers an intriguing blend of social competition and politics. Its Spartan visuals and virtualpolitik setting surely limit the number of potential players, yet it boasts 100k active players.
Ultimately, I see social games evolving to place social interaction at their core, although we might give such games a different name. Games that zero in on our all-too-human nature - our social frailties, our need for validation and our desire for self-actualization - have the potential to elicit powerful responses.
Semi structuring CouchDB databases
A recurring question on the CouchDB mailing list and IRC channel is one of document structure. Most developers exploring CouchDB bring their SQL heritage along for the ride. Shifting your way of thinking doesn’t happen overnight, but using CouchDB effectively requires ditching the relational mindset.
Alex Lang’s recent Scotland on Rails talk about CouchDB and its Ruby libraries generated a little flutter on Twitter. At heart was his assertion that while you could take a relational approach when structuring your documents, to do so would be to miss the point. I concur. Given that the terminology around relational dbs and normalization is often abused, I’ll make what I mean explicit.
The relational in RDBMS has a well defined meaning that has nothing to do with multiple tables relating to one another. It means simply that a tuple has a value defined over it - the cells in a table row are related in some way and that relation has a meaning associated with it. That’s it.
A document oriented database has no real analog of a table and so is inherently non-relational. You could, of course, impose relational semantics by structuring your documents in a particular way. You could also normalize your structured documents, taking care not to assign an object or array to a document property, as doing so would preclude your database from being considered normalized (1NF). But why bother? Why not just use an RDBMS?
I’ll add, in case of ambiguity, that storing non-normalized data is completely orthogonal to storing doc ids as document properties, thereby allowing inter-document navigation.
So given that the relational model is effectively abandoned, how should document databases be structured? I’ll simply offer my own opinion, noting that many different approaches exist. (One of the more interesting deviations from the standard RDBMS model is the single db per user approach.)
If new data supersedes existing data, should the existing data be kept (e.g. for analysis), or may it be deleted?
In general, larger databases and views take longer to query. The difference is typically marginal, unless querying with a group_level > 0 in which case the repeated executions of the reduce function may become significant. A simple benchmark I carried out resulted in an avg 0.035s response time when querying against 1.6m docs with group_level=0. Querying with group_level=2 took 2.010s. It’s worth pointing out that I conducted the benchmarks in December; CouchDB improves rapidly.
Should writes be contention free?
Guaranteeing contention free writes means a new doc per write.
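To make the doc-per-write idea concrete, here is a minimal Ruby sketch that uses a plain Hash in place of a CouchDB database. The vote document shape and the helper names are hypothetical. In CouchDB itself the totals would normally come from a view with a reduce function rather than from client code.

```ruby
require "securerandom"

# A plain Hash stands in for a CouchDB database (doc id => doc).
# Hypothetical document shape: one small doc per vote.
def record_vote(db, item_id, value)
  id = SecureRandom.uuid # fresh id, so no writer ever updates an existing doc
  db[id] = { "type" => "vote", "item_id" => item_id, "value" => value }
  id
end

# Stand-in for a map/reduce view: sum the vote values for each item.
def vote_totals(db)
  totals = Hash.new(0)
  db.each_value do |doc|
    next unless doc["type"] == "vote"
    totals[doc["item_id"]] += doc["value"]
  end
  totals
end

db = {}
record_vote(db, "item-1", 1)
record_vote(db, "item-1", 1)
record_vote(db, "item-2", -1)
p vote_totals(db)
```

The trade-off is that the current value is never stored anywhere; it is always derived by aggregating the individual write docs.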
How do you want to retrieve the data?
While views offer great flexibility in aggregating data, an up-front understanding of how you’ll want to retrieve your data is beneficial. Issuing multiple requests to CouchDB and performing client side aggregation is a fine approach, but sometimes it’s simply easier to retrieve all the required data in a single request. This is particularly the case when paginating a result set. Trying to paginate across the combined results of multiple queries would be a real PITA.
Judicious denormalization
You can’t really denormalize something that wasn’t already normalized, but the idea is to embed unlikely-to-change properties in documents that aren’t the canonical source of that property.
For example, an Invite may contain a recipient name and sender name, but also contain the doc ids of the docs representing the recipient and sender. This approach allows invites and their relevant presentational information to be retrieved with just a single CouchDB query, but it also allows for simple navigation to the referenced documents (recipient and sender). I make heavy use of this approach myself and RelaxDB supports it explicitly.
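As an illustration, an invite shaped along these lines might look like the following Ruby sketch. The ids and property names are made up for the example, and this is plain Ruby rather than RelaxDB's API.

```ruby
require "json"

# Build an invite that stores both the canonical doc ids and a copy of the names.
# All property names here are illustrative assumptions.
def build_invite(sender, recipient)
  {
    "type"           => "invite",
    "sender_id"      => sender["_id"],    # follow this to the canonical sender doc
    "sender_name"    => sender["name"],   # embedded copy of an unlikely-to-change value
    "recipient_id"   => recipient["_id"],
    "recipient_name" => recipient["name"]
  }
end

sender    = { "_id" => "player-1", "type" => "player", "name" => "Alice" }
recipient = { "_id" => "player-2", "type" => "player", "name" => "Bob" }

puts JSON.pretty_generate(build_invite(sender, recipient))
```

Listing invites then needs only the invite docs themselves, while the ids remain available for navigating to the full player docs.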
Quick and Effective - Per test databases with CouchDB
Testing is surely the aspect of software development that I find most irksome. If the goal is an efficacious, fast-running test suite, it’s not an easy target and as a result, I’m almost never happy with my tests.
A few years ago, tired of a ‘unit test’ suite that took twenty minutes to run, and a functional test suite that took almost five times as long, I jumped on the mock bandwagon as it went by. I’ve since jumped off. Mocks have their uses, but they don’t typically stand the acid test of tests - “Broken interface means failing tests, Working interface means passing tests”. Almost axiomatic, but often ignored.
Which brings me onto model level tests and CouchDB. I’m only going to have test confidence when my models are operating on data retrieved from the database. That confidence comes at a price - my test suite is slow. But as I mentioned before, working with CouchDB encourages new ways of thinking.
Traditionally, tests that interact with a database clear it out before every test case, load it up with the required data and finally run the actual test. But a single CouchDB instance supports many databases, potentially many many databases. What if I had a database per test, pre-loaded with the required data? Tests would still issue GET requests, but the costly setup stage would be obviated. As it turns out, this is fairly straightforward to do. The basic premise is as follows:
- Specify the test setup code in a lambda (or any delayed execution construct)
- When the test starts, query CouchDB for a known document id against a database whose name matches the current test
- If the known document exists and if it contains a property whose contents match the test setup code exactly, you’re done. Just run the test.
- If the condition above doesn’t hold, the test setup code has changed. Delete the database named by the test, create it again, run the test setup code against it and store a document containing the test setup code. Now run the test.
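The steps above can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions, not the code I published: a nested Hash stands in for the CouchDB server, ensure_fixture and the marker doc id are hypothetical names, and the setup source is passed in as a string because extracting the text of a lambda is outside the scope of the sketch.

```ruby
# In-memory stand-in for CouchDB: db name => { doc id => doc }.
def ensure_fixture(server, db_name, setup_source)
  marker_id = "test_setup_marker" # hypothetical id for the known document
  marker = (server[db_name] || {})[marker_id]

  # Setup code unchanged since the last run: reuse the existing database.
  return :reused if marker && marker["source"] == setup_source

  # Otherwise rebuild from scratch and remember which setup code produced it.
  server.delete(db_name)
  db = server[db_name] = {}
  yield db
  db[marker_id] = { "source" => setup_source }
  :rebuilt
end

server = {}
source = 'db["p1"] = { "name" => "p1" }'
setup  = ->(db) { db["p1"] = { "name" => "p1" } }

p ensure_fixture(server, "should_know_one_another", source, &setup) # first run builds the db
p ensure_fixture(server, "should_know_one_another", source, &setup) # second run skips setup
```

Comparing the stored source with the current source is what makes a changed setup block invalidate the cached database automatically.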
The following is extracted from a real spec. The code inside the cdb block is executed if and only if it hasn’t already been run.
it "should know one another" do
  cdb do
    p1, p2 = Player.stock(:name => "p1"), Player.stock(:name => "p2")
    p1.acquaint p2
  end
  # p1 and p2 are methods that load players by names p1 and p2
  p1.should know(p2)
  p2.should know(p1)
end
It’s worth pointing out that you’ll want to rerun the lambda if the object creation code changes, even if the lambda itself is unchanged. This is done easily with a command line switch.
I’ve published the code I use for doing this with RSpec on github. The speed up is of course test dependent, but the specs that I’ve applied it to run almost an order of magnitude faster. Happy days!
Visualizing inter doc relationships in CouchDB
One of the many enjoyable aspects of working with CouchDB is the scope it offers for exploration. Developing against CouchDB presents such a different paradigm for working with data that it really does stimulate thought.
I wanted to illustrate this at my talk at LRUG last week. I’d been thinking for some time how useful a graphical document browser would be for CouchDB, and that it would be fairly simple to write one. So, rather than preparing a talk, I spent last Monday writing fuschia. The idea was straightforward - a user enters a seed document id, fuschia displays it, all docs that it links to, and that link to it. Docs are represented by colored nodes, and labelled with their most descriptive attribute (as defined by the user). Click a node to repeat the process.
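The lookup behind that idea can be sketched in Ruby against an in-memory Hash of docs. This is not fuschia's code, which queries CouchDB over HTTP; the helper names are hypothetical, and the rule that any property whose value is another doc's id counts as a link is an assumption made for the example.

```ruby
# docs is a Hash of doc id => doc.
# Assumption: a string property equal to another doc's id is a link to that doc.
def outbound_ids(docs, id)
  docs.fetch(id)
      .reject { |key, _| key == "_id" }
      .values
      .select { |value| value.is_a?(String) && value != id && docs.key?(value) }
      .uniq
end

# Docs that link to the given id.
def inbound_ids(docs, id)
  docs.keys.select { |other| other != id && outbound_ids(docs, other).include?(id) }
end

# Everything needed to draw one step of the graph around a seed doc.
def neighbourhood(docs, seed_id)
  {
    "seed"        => seed_id,
    "links_to"    => outbound_ids(docs, seed_id),
    "linked_from" => inbound_ids(docs, seed_id)
  }
end

docs = {
  "p1"   => { "_id" => "p1", "name" => "Alice" },
  "p2"   => { "_id" => "p2", "name" => "Bob" },
  "inv1" => { "_id" => "inv1", "sender_id" => "p1", "recipient_id" => "p2" }
}

p neighbourhood(docs, "inv1")
p neighbourhood(docs, "p1")
```

Clicking a node in the browser corresponds to calling neighbourhood again with that node's id as the new seed.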
Given CouchDB’s HTTP interface, UUIDs as identifiers, and an ability to aggregate data with a map function, developing fuschia required just a few dozen lines of core code. Kudos also goes to prefuse, but my relationship with that library is more a love / hate one. Dragons lie on either side of its hidden golden path.
Somewhat predictably, writing fuschia took longer than expected and I didn’t have time to prepare a talk. Which wouldn’t have been so bad had I not forgotten to demo fuschia. Not my finest hour.
While fuschia works well with sample data, using it with real world data didn’t offer the insights I’d hoped for. An ability to filter out data is needed and views offer a natural way to achieve this. Work for the future so…
Strawberry Diva - Opening Up
I founded http://strawberrydiva.com, a web-based casual MMO, eight months ago. My decision to quit my job and pursue it full time was based on a desire to create a game that was informed by social rather than spatial navigation.
I believe we’re getting close to reaching that goal. If you want to be one of the first to play, sign up.
Illegally embedding attributes for fun and profit
Imagine a web interface where you drag items into clusters. Dropping an item onto a cluster should make a request that persists the association. To do this, two ids are required - one for the item and one for the cluster. The most natural way to list these attributes would be on the HTML elements themselves. Maybe something like
<div class="item" item-id="478">idea</div>
<div class="cluster" cluster-id="112">group name</div>
(Let’s assume that we can’t simply use the id attribute - id collision or whatever). Unfortunately, the HTML above isn’t valid. Neither item-id nor cluster-id are recognized HTML attributes and the spec makes no allowance for adding arbitrary attributes. This is a real shame as any alternatives involve more markup, scripting or both.
Is knowingly generating invalid HTML such a bad thing? Well, the W3C validator states that
Validity is one of the quality criteria for a Web page, but there are many others. In other words, a valid Web page is not necessarily a good web page, but an invalid Web page has little chance of being a good web page.
Yikes! Sounds quite imperious. So how does this little axiom stand up in the wild?
- http://ajaxian.com/ is XHTML 1.0 Transitional with 682 Errors, 16 warning(s)
- http://www.google.com/ is HTML 4.01 transitional with 66 errors, 9 warning(s)
- http://www.facebook.com/index.php is XHTML 1.0 strict with 69 errors, 27 warning(s)
- http://groups.google.com/group/jquery-en is XHTML 1.0 Transitional with 752 Errors, 122 warning(s)
- http://developer.mozilla.org/ is XHTML 1.0 Transitional with 43 Errors, 36 warning(s)
- http://validator.w3.org/ was successfully checked as XHTML 1.0 Strict
Ok, so I no longer care about validation. Read that advisedly - I have no intention of producing tag soup and I’m fully aware that many of the errors above were caused by malformatted URLs, but I’m not going to worry if my page fails validation.
What happens when the page is parsed and invalid attributes are encountered? Well, the spec merely makes a recommendation
If a user agent encounters an attribute it does not recognize, it should ignore the entire attribute specification (i.e., the attribute and its value).
In practice, however, browsers do not do this. A webkit blog post tells us that
Many technically illegal constructs, like misnested tags or bad attribute names, are allowed or safely ignored. This error-handling is relatively consistent between browsers.
This bodes well, but what of the future? Will HTML 5 outlaw or support arbitrary attributes? The news is good, embedded attributes are explicitly supported by HTML 5. They take the form data-*="". Let’s rewrite the attributes above so we’re at least consistent with HTML 5.
<div class="item" data-item-id="478">idea</div>
<div class="cluster" data-cluster-id="112">group name</div>
So, I’m explicitly generating invalid HTML 4 and, DOCTYPE excepted, valid HTML 5. But the real question - does it work? I haven’t tested in IE6, because as we all know, it’s teh suck, but in other browsers all is well.
A test page consisting of 26 elements, each with two attributes, is consistently aggregated in less than 1ms by Safari. Firefox takes between 8-12ms and IE7 is a little slower, usually about 50ms.
The end result is simple markup that can be easily inspected with a minimum of fuss with a library like jQuery. For example, to establish the relationship between item and cluster, we’d merely write something like the following.
$.post("/relationship", {
  item_id: $draggedItem.attr("data-item-id"),
  cluster_id: $droppable.attr("data-cluster-id")
});
Update: Scott Byers had problems with the selector syntax above - jQuery merely returned an empty set. He got around it by escaping the - character e.g. $("[data\-foo='bar']").
Relax with Merb and CouchDB
I’ve posted a tutorial describing how to create a Merb app backed by CouchDB on strawberrydiva.com.
If you’re running Rails rather than Merb, you should still be able to follow along.
DNS to lose relevance around the edges?
As noted elsewhere, the trend for url free advertisements for web sites is growing. Relying on high search engine rankings for your product name is clearly a risky business, but let’s assume you have it nailed. Do you still need to line the pockets of a domainer?
Well, DNS is often employed as a load balancing technique for heavily trafficked sites, so having search engine results direct to a domain would typically be a good thing. But that load balancing service could potentially be replaced with the help of a chunky EC2 instance sitting up front directing requests, and an elastic IP address - an IP address which is dynamically reassigned e.g. in case of instance failure.
So, assuming that Google’s ranking algorithm doesn’t penalise domain free sites, how long before we see a website launch that simply ignores domains and DNS and relies exclusively on search engine ranking?
Developers' License
In October of last year, the cavernous Turbine Hall of London’s Tate Modern opened to a work of art by Colombian sculptor Doris Salcedo. The work of art is simply a crack in the floor. Salcedo states that the fracture symbolises the gap between white Europeans and the rest of humanity. When asked how deep the crack is, she replied "It’s bottomless. It’s as deep as humanity." Now, to be honest, I don’t think it is. In fact, I can’t imagine it’s more than 40 or 50cm deep. Not that that’s stopped some people falling into it. Nonetheless, her reply made me wonder if we, as software developers, shouldn’t be afforded the opportunity to use our own form of artistic license…
I’m not convinced. Note: Possessing little artistic ability of my own, the characters above were inspired by Nemi and Itachi.