Johan Sørensen - CouchDb and CouchObjects

CouchDb and CouchObjects

Friday September 07, 2007

I’ve been watching CouchDb for a while, but it wasn’t until it recently changed its transport format from XML to JSON that I got really interested in doing something with it, and apparently I wasn’t alone in that.

One of the things I’m doing with it is a library called CouchObject. Among other things, it lets you serialize arbitrary Ruby objects to and from CouchDb JSON documents by including a module and defining a couple of methods on your class:


class Bike
  include CouchObject::Persistable

  def initialize(wheels)
    @wheels = wheels
  end
  attr_accessor :wheels

  def to_couch
    {:wheels => @wheels}
  end

  def self.from_couch(attributes)
    new(attributes["wheels"])
  end
end

The #to_couch method describes how an instance’s attributes should be serialized into a document in the CouchDb database:


{ 
  "_id": "6FA2AFB684A93ECE77DEAAF52BB02565", 
  "_rev": 1745167971, 
  "attributes": {
    "wheels": 4
  }, 
  "class": "Bike"
}

The return value of #to_couch is stored under the attributes key, and the object’s class name is stored under the class key for querying purposes (_id and _rev are CouchDb’s own document attributes).
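
To make the envelope concrete, here is a minimal sketch of how a #to_couch hash could be wrapped into such a document. This is not CouchObject’s actual code: the to_couch_document helper is a hypothetical name used only for illustration, and the sketch relies on nothing but Ruby’s standard json library.

```ruby
require "json"

class Bike
  attr_accessor :wheels

  def initialize(wheels)
    @wheels = wheels
  end

  def to_couch
    {:wheels => @wheels}
  end
end

# Hypothetical helper (not part of CouchObject): wraps the result of
# #to_couch in an envelope with "class" and "attributes" keys.
# _id and _rev are left out, since CouchDb assigns those on save.
def to_couch_document(object)
  {
    "class" => object.class.name,
    "attributes" => object.to_couch
  }
end

document = to_couch_document(Bike.new(4))
puts JSON.generate(document)
```

Keeping the object data under its own key means the metadata keys can never collide with whatever an object chooses to return from #to_couch.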

The from_couch class method describes how to set up a new Bike object loaded from the database; its attributes parameter is the value of the attributes key in the CouchDb document. In this case we just instantiate a new Bike with a number of wheels:


>> bike_4wd = Bike.new(4)
=> #<Bike:0x6a0a68 @wheels=4>
>> bike_4wd.save("couchobject")
=> {"_rev"=>1745167971, "_id"=>"6FA2AFB623A93E0E77DEAAF59BB02565", "ok"=>true}
>> bike = Bike.get_by_id("couchobject", bike_4wd.id)
=> #<Bike:0x64846c @wheels=4>
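
The loading direction can be sketched in the same spirit. The following is an assumption about how a stored document might be turned back into an object, not the library’s real implementation: load_from_couch_document is a hypothetical helper, and it simply looks up the class named in the class key and hands the attributes key to that class’s from_couch method.

```ruby
require "json"

class Bike
  attr_accessor :wheels

  def initialize(wheels)
    @wheels = wheels
  end

  def self.from_couch(attributes)
    new(attributes["wheels"])
  end
end

# Hypothetical helper (not part of CouchObject): parses a JSON document,
# resolves the constant named in its "class" key and delegates the
# "attributes" hash to that class's from_couch method.
def load_from_couch_document(json)
  document = JSON.parse(json)
  klass = Object.const_get(document["class"])
  klass.from_couch(document["attributes"])
end

json = '{"_id": "6FA2AFB684A93ECE77DEAAF52BB02565", "_rev": 1745167971, ' \
       '"attributes": {"wheels": 4}, "class": "Bike"}'
bike = load_from_couch_document(json)
puts bike.wheels
```

Resolving a class name taken from stored data is only reasonable when the database contents are trusted; a real implementation would probably want to check the name against a list of known persistable classes first.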

As I only started on this last night, there are still lots of little things to add, like better server and database semantics (in the #save call above, the argument is the database name and the host is hardcoded for now; not pretty).

Another thing I’ve been thinking about doing is a more formal way to describe “models”, something along the lines of the DataMapper pattern perhaps, but we’ll see if I actually need it once the Persistable module has gained some more features.

Update: I’ve uploaded the Git repository here. I want to add a few things before I do a release.

Comments:

Jon Wood Says:
Sep 07 at 13:56
Excellent – I’ve been hoping for something a bit more robust for a while, but not got round to writing anything.

Is the source available anywhere? I’d love to have a play around with it – I’d especially like to see if the to_couch and from_couch methods could be dropped in standard use cases.
johan Says:
Sep 07 at 14:38
It’ll be up on RubyForge soon, hopefully sometime after the weekend.

My first approach was actually to just copy the instance variables in and out.
Dado Says:
Sep 07 at 18:44
Could the wheels attribute be put at the same level as the class, _id, and _rev attributes instead of in an “attributes” field? Is this a convention, something forced by CouchDb, or what? I believe it only adds unnecessary clutter to the data structure.
Dan Says:
Sep 09 at 02:27
Does it make sense to have an ActiveRecord adapter for CouchDB?
rabble Says:
Sep 09 at 20:22
I’ve been thinking about ActiveRecord and CouchDB, maybe ActiveCouch. :)

It’s perhaps not the best fit. A Rails-like storage model is a great idea, but because CouchDB doesn’t have set schemas, we need to define that in our model. Perhaps an AR-style definition of the fields would be a good addition to this lib.
Johan Says:
Sep 10 at 06:01
@Dado: not enforced by CouchDb at all, I just think it’s nice to separate the metadata from the actual object data.

@Dan/rabble: Yeah, after working a bit with the approach from the post here, I find that I need, or want rather, a more formal and descriptive model of my data, since my current wish isn’t really to store arbitrary Ruby objects in CouchDb, but rather a domain-specific set of objects.
I don’t think the ActiveRecord pattern at its core maps too well to CouchDb’s loose (schemaless) structure. But I’d certainly want to do something along these lines:
```
class Post < CouchObject::Model
  couch_attribute :title, :body
  # could be typecasted to JSON types too
  couch_attribute :created_on, :Date 
end
```
Crabbers Says:
Sep 10 at 09:24
I’m not fully up to speed with Couch, but I’ve been interested in the query side of it, which uses JavaScript constructs to declare the map functions. Could it be modelled in Ruby in the same way, with a block that is then serialised to a JavaScript construct? Does that make sense?
```
def find
   Couch.query do |doc|
      return doc unless doc.type == 'something'
   end 
end
```
johan Says:
Sep 10 at 09:32
There’s ruby2js, never used it though. But more interesting is the fact that it looks fairly easy to change the query engine in CouchDb (it’s essentially shelling out to SpiderMonkey right now).

Next on my list is obviously to try and make the query engine use Ruby instead of JavaScript :)
Dan Says:
Sep 10 at 15:43
@rabble: Could the schema be derived from db/schema.rb in the rails app rather than the models?
Maraby Says:
Sep 13 at 16:57
Or perhaps a CouchDB document for CouchDB documents including the schema description, using CouchDB to describe itself (sort of).
Kevin Teague Says:
Sep 14 at 10:50
If you are interested in existing implementations of formal schema definitions and in doing data modelling using pure dynamic objects, there have been a lot of different projects within the Python community.

In Django the models contain the schema definition directly – they’ve experimented with schema inheritance but I think that’s on-hold since they have an ORM to deal with:

http://code.djangoproject.com/wiki/ModelInheritance

In the Zope and Plone world we have been publishing persistent dynamic objects to the web for a long time using the Zope Object Database (ZODB). This is very similar to the method used by Gemstone – implementation details are of course quite different, but the core concept is the same. Plone developed Archetypes, which uses multiple inheritance to do schema inheritance, so mix-in style schemas are possible. Archetypes does a good job, but like Django, Archetypes tightly couples Widget objects by embedding them within the schema, making code reuse hard. It has its other warts too:

http://plone.org/documentation/tutorial/borg/to-archetype-or-not-to-archetype

When the core Zope developers did the whole let’s-start-over-from-scratch thing after they had been working on Zope 2 for a long time, it took them a lot more years to produce Zope 3. The zope.interface and zope.schema packages in Zope 3 provide a very formal way of specifying both APIs and schemas respectively. These are very well written packages. Schemas are considered an aspect of your API, since in the world of objects the two are tightly linked. Interfaces are just objects though, and your model declares that it implements specific schemas. This is a much more pleasant way of doing it, IMO.

http://pypi.python.org/pypi/zope.schema/

Except of course Zope 3 requires a great deal of explicit configuration in the form of XML, which isn’t always the most fun stuff to write. Recently there has been a movement to create a way of working with Zope 3 that uses a lot of the same ideas where Ruby on Rails did a lot of innovation, such as convention over configuration. This project is called Grok and it makes Zope 3 a heck of a lot more fun to play with. It can also give you a glimpse of what Ruby on Rails might be like if it used an OODB:

http://grok.zope.org
rubyruy Says:
Sep 15 at 06:52
@Maraby: That strikes me as throwing away the benefit of having schema-free storage. It makes much more sense (to me anyway) to define fields in one’s model – you know, close to the validation rules and other smartness that go with the object.

In fact, it seems to me that CouchDb makes it far easier to embrace Ruby’s dynamism, since neither Ruby nor Couch really cares what you store in your attributes. The default behavior should be to just store your data and get on with it. If you want specific behavior, Ruby already has many excellent ways of doing that, like actually defining the setter/getter methods with specific code, validation macros, type-casting macros, etc.
Justin Says:
Sep 21 at 15:21
Looks great, but why not use YAML instead of manually creating the to/from methods?
johan Says:
Sep 21 at 16:16
Well, the idea was that there’s no general way of knowing exactly how any particular object should map its attributes (could be into accessors, methods, class/instance/local variables, etc.), hence the mapping methods.