At Drupalcon 2008, one question I was asked a number of times was "how did you get involved with Drupal?" Here's one of my first letters to Dries, after discovering Drupal in 2004. Interesting to think about how far we've come since then.
Hi Dries,
thanks for your response to Matt's posting. Here's a little background.
A few years back I realized as I built websites that I was solving the same problem over and over. I did a lot of thinking about what the fundamental problems were and how to solve them. The most common problems were site organization, navigation, workflow, and permissions. Well, the most common problem was "how do I change the information on this page" but the CMS I was working in at the time, Userland Manila, had already solved that.
One day I was thinking about Plato's theory of forms and realized that if you combine Plato's forms with database theory you get the content management system that gets you 90% of the way there.
You start with the idea that everything is a form. There must be a foundational layer that treats all things the same -- a "thing framework." After this, you need attributes to distinguish one thing from another. A key understanding is that the attributes are global. Here's an example, describing a mug's required attributes:
MUG
---
can_hold_liquid (boolean)
has_handle (boolean)
A mug must have these attributes or it is not a mug. One can argue that a handleless mug is still a mug, but recognizing that reality is always fuzzier than Plato's forms, I would simply say that we are going for the 90% use case in which you want to drink hot liquid without burning your fingers. Likewise, if your mug can't hold liquid it is not very useful. We come, then, to this fundamental theory, which we have up on the wall in our office to remind ourselves of it:
"An object is defined by its required attributes."
Note that the attributes are still global. For example, a butterfly net shares the has_handle attribute:
BUTTERFLY NET
-------------
permeable_to_air (boolean)
has_handle (boolean)
In addition to these required attributes that define what an object is, there are optional attributes that give additional information about the object:
MUG
---
*can_hold_liquid (boolean)
*has_handle (boolean)
handle_length_in_cm (integer) 4
color (string) white
Note that again, various objects can share attributes from a central "attribute pool":
BUTTERFLY NET
-------------
*permeable_to_air (boolean)
*has_handle (boolean)
handle_length_in_cm (integer) 90
color (string) white
One more unifying theory is that each attribute is of a certain type. It's a string, or an integer, or a boolean, or a floating point number...but they are all treated alike. So this tells us more about our "thing framework." The framework must (1) treat all objects alike, because they are all things, (2) be able to keep a library of attributes, and (3) be able to assemble objects based on their attributes.
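The three framework requirements can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular CMS does it; the names ATTRIBUTE_POOL, ThingType, and assemble are hypothetical and invented for this example.

```python
from dataclasses import dataclass, field

# A global library of attributes: name -> type (hypothetical names).
ATTRIBUTE_POOL = {
    "can_hold_liquid": bool,
    "has_handle": bool,
    "permeable_to_air": bool,
    "handle_length_in_cm": int,
    "color": str,
}


@dataclass
class ThingType:
    """An object type, defined by its required attributes."""
    name: str
    required: frozenset
    optional: frozenset = field(default_factory=frozenset)

    def assemble(self, **values):
        """Build an instance, checking each value against the pool."""
        missing = self.required - set(values)
        if missing:
            raise ValueError(f"not a {self.name}: missing {sorted(missing)}")
        for attr, value in values.items():
            if attr not in self.required | self.optional:
                raise ValueError(f"{attr} is not an attribute of {self.name}")
            if not isinstance(value, ATTRIBUTE_POOL[attr]):
                raise TypeError(f"{attr} must be {ATTRIBUTE_POOL[attr].__name__}")
        return {"type": self.name, **values}


# Two different object types drawing on the same attribute pool.
MUG = ThingType("mug", frozenset({"can_hold_liquid", "has_handle"}),
                frozenset({"handle_length_in_cm", "color"}))
NET = ThingType("butterfly net", frozenset({"permeable_to_air", "has_handle"}),
                frozenset({"handle_length_in_cm", "color"}))

mug = MUG.assemble(can_hold_liquid=True, has_handle=True, color="white")
net = NET.assemble(permeable_to_air=True, has_handle=True, handle_length_in_cm=90)
```

Both types share has_handle, handle_length_in_cm, and color from the one pool, and leaving out a required attribute (for instance, a mug without can_hold_liquid) raises an error rather than producing a mug.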
The final piece of the puzzle is semantic. To the framework, all booleans are the same. We need some way to get from what objects are (i.e., their required attributes) to what they do. The boolean has_handle is different from explodes_on_contact and should be treated differently. That's where code comes in. The code attaches to the attribute and does different things depending on what the attribute is and what the attribute's value is (if explodes_on_contact is false, we can handle the object less gingerly, for example).
What I've been describing is, of course, the theory of object orientation. It's a very powerful theory which holds up most of the time. Keeping that in mind, let's move on to an examination of how the above affects websites.
What is a web page? What are its required attributes?
WEB PAGE
--------
*title (string)
*body (string)
*url (string)
For our purposes, Body can be broken down further as long as we assume that the end result will always be a valid HTML body. For example, here is a web page for an article in one of our newsletters:
NEWSLETTER ARTICLE
------------------
*title (string)
*url (string)
*article_text (string)
*author (string)
*publication_date (date)
*topic (string)
*original_open_office_document (file)
I just threw in an attribute of type file for fun.
Now, let me take a detour lest you become bored and fall asleep. Let's talk about one more unifying theory: single vs. multiple. In the case of each object above, you can do things to either a single instance of the object or a list of objects. For example, you can delete either a single newsletter article or many from a list. You can view either one post or a collection of posts. You can see one image or an image gallery. Each content type can be dealt with individually or in aggregate.
Moving on, each attribute may also have actions attached to it. For example, an email_address attribute would have a validator action attached to it that would make sure a proper e-mail address was submitted in that field. Or a background_image attribute could have an action that changes the background picture of the page to the image that matches the value of the attribute. Or an e-mail action might be triggered when the allowed_to_edit attribute is modified by adding a username.
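As a rough sketch of what "actions attached to attributes" could look like, here is a small Python example using the email_address validator case. The registry ATTRIBUTE_ACTIONS, the action decorator, and set_attribute are hypothetical names made up for illustration, and the e-mail pattern is deliberately loose.

```python
import re

# Registry of actions keyed by attribute name (hypothetical structure).
ATTRIBUTE_ACTIONS = {}


def action(attribute):
    """Decorator that attaches a function to an attribute name."""
    def register(func):
        ATTRIBUTE_ACTIONS.setdefault(attribute, []).append(func)
        return func
    return register


@action("email_address")
def validate_email(value):
    # Deliberately loose check: something@something.tld
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value):
        raise ValueError(f"invalid e-mail address: {value!r}")


def set_attribute(obj, attribute, value):
    """Run every action attached to the attribute, then store the value."""
    for func in ATTRIBUTE_ACTIONS.get(attribute, []):
        func(value)
    obj[attribute] = value


page = {}
set_attribute(page, "email_address", "john@example.com")
```

Because the action hangs off the attribute rather than the object, any object type that uses email_address gets the same validation without extra code.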
To summarize the grand unifying theory :) :
1. A "thing framework" must be present to deal with all objects in a standard way.
2. Objects are defined by their required attributes.
3. Objects are additionally described by their optional attributes.
4. The attributes are global.
5. When dealing with objects, there will always be either single or multiple objects.
6. Attributes may have actions, such as validation, optionally associated with them.
Now, to finally come around to talking about Drupal! When looking around for a CMS that understands the theory, we come down to a few:
1. Zope/Plone with Archetypes. Archetypes are the attributes described above.
http://plone.org/documentation/archetypes/
The problem with Plone is that (1) it's slow, (2) it's fragile, and (3) it stores content in its own catalog system, not in an SQL database.
2. Manila with the Metadata Plugin or with David Bayly's addedValues plugin.
http://www.addedvalues.org/
The problem with this is that (1) it runs on Frontier, which has an unstable kernel and no kernel developers, (2) it is slow, and (3) it stores content in its own catalog system, not in an SQL database. (Sound familiar?)
3. MFrameWork. MFrameWork is a CMS that Matt Westgate and I designed, written in Ruby using extensive object orientation and an SQL backend.
The problem with this is that (1) Ruby development seems to have slowed, and Ruby has some problems with load, and (2) it's hard to write a whole framework by yourself. We do have it in nearly working order.
Then there's Drupal. Drupal wins on many things: (1) it's fast, or fast enough, (2) it stores content in a standard SQL database, and (3) it uses PHP, which is widely available.
How does Drupal stack up against the grand unifying theory? One of the first things that attracted me to Drupal (other than Matt's enthusiasm :) ) was the fact that it has as its fundamental assumption that "everything is a node." That means that it has the "thing framework." Without this foundation, it's hard to move forward.
How about the second point, that objects are defined by their required attributes? Here Drupal needs a little work, but only a little. Because objects are separated into modules, the required attributes actually exist as assumptions in the code inside the modules. It would be simple enough for each module that is tied to a "node type" to declare what its required attributes are. For example, a node type of "page" has title, body, and teaser as its required attributes. Actually teaser could be viewed as an attribute of type "calculated"; more on that later.
In reality, we don't care that much about the required attributes because they are taken care of inside module code anyway. The only thing that might be handy would be to know what type they are (text, boolean, etc.) in order to display an alternate presentation form from the default. But I'm digressing.
In the next point, "objects are additionally described by their optional attributes," Drupal again has a big win because of the taxonomy module. The idea that you have a vocabulary with terms, and that the vocabulary can be associated with a node type, is synonymous with the idea that objects have global attributes with values, and that the attributes can be associated with an object instance. So this framework is already in place! However, as it currently exists, the taxonomy module has several limitations: (1) it only has one data type, "string"; (2) it has only one format, the select box; and (3) adding attribute values (i.e., vocabulary terms) is cumbersome because it cannot be done within the node editing form but must be done in the taxonomy (now category) admin module.
What about attributes being global? Yes, Drupal wins here too! The vocabularies in the taxonomy can be applied to multiple node types. Thus, the attributes are fundamentally global. I can search the taxonomy for "all blue things" if I have a vocabulary named "color" and several different node types with the attribute color and the value "blue".
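The "all blue things" search is easy to picture with a toy example. The Python below is only a sketch with made-up data: the node dictionaries and the find_by_term function are hypothetical and do not reflect Drupal's actual schema or API.

```python
# Hypothetical nodes of different types sharing one global "color" vocabulary.
nodes = [
    {"type": "page", "title": "Sky facts", "terms": {"color": "blue"}},
    {"type": "blog", "title": "My new mug", "terms": {"color": "white"}},
    {"type": "image", "title": "Ocean at dawn", "terms": {"color": "blue"}},
    {"type": "page", "title": "About us", "terms": {}},
]


def find_by_term(nodes, vocabulary, term):
    """Return every node tagged with the term, regardless of node type."""
    return [n for n in nodes if n["terms"].get(vocabulary) == term]


blue_things = find_by_term(nodes, "color", "blue")
print([(n["type"], n["title"]) for n in blue_things])
```

The search never looks at the node type, which is the point: because the vocabulary is global, one query covers pages, blog entries, and images alike.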
The next point is that "when dealing with objects, there will always be either single or multiple objects." Well, Drupal has both the individual node type edit tabs and the administer content page where all objects can be viewed. By adding filter fields to the content table there would be an easy way to select only pages, or only blog entries. The Filter options and Update options have selected attributes as their subjects (approved = false, promoted = false, promoted = true, sticky = true, published = false, etc.). This leads to a further distinction: there are system attributes, or internal vocabularies, that Drupal uses. Some examples of these:
node_type (string)
authored_by (string)
authored_on (date)
last_modified (date)
etc.
Note that these could be considered required attributes for all objects.
Also note that attributes can have access permissions. For example, a role or user may have write access to an attribute but another user may only have read access (or no access). What this means is that you have a fine-grained access control mechanism built in at two places: at the attribute level and at the object level. Practical example: an article is written, but its author does not have write access to the workflow_status attribute (a hypothetical attribute). However, the editor of the publication does, and after editing changes the workflow_status attribute's value from "awaiting editor" to "edited". Now the information czar who insists on approving all content before it is posted to the website changes the value to "published". An action associated with this attribute value then fires (if value = "published" then set system attribute Published to true). This is just an example.
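The workflow example can be sketched as follows. This is an assumption-laden illustration, not an existing Drupal feature: the role names, the WRITE_ACCESS table, and the write_attribute function are all hypothetical.

```python
# Which roles may write which attribute (hypothetical roles and attributes).
WRITE_ACCESS = {
    "workflow_status": {"editor", "czar"},
    "body": {"author", "editor"},
}


def on_workflow_status(node, value):
    """Action attached to workflow_status: publish when approved."""
    if value == "published":
        node["published"] = True


ACTIONS = {"workflow_status": on_workflow_status}


def write_attribute(node, role, attribute, value):
    """Set an attribute if the role has write access, then fire its action."""
    if role not in WRITE_ACCESS.get(attribute, set()):
        raise PermissionError(f"{role} may not write {attribute}")
    node[attribute] = value
    handler = ACTIONS.get(attribute)
    if handler:
        handler(node, value)


article = {"body": "", "workflow_status": "awaiting editor", "published": False}
write_attribute(article, "author", "body", "Draft text")
write_attribute(article, "editor", "workflow_status", "edited")
write_attribute(article, "czar", "workflow_status", "published")
```

After the last call the article's published flag is true. If the author tried to set workflow_status directly, write_attribute would raise PermissionError before any action fired.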
And lastly, "attributes may have actions, such as validation, optionally associated with them." Currently vocabularies have no actions associated with them.
So there's the background and a quick overview of how Drupal fits into the grand unifying theory of content management. On to your questions. First, about terminology: "metatype" is the term we have been using for "attribute" or "vocabulary." I am not tied to that term; in fact, I think it is probably somewhat confusing. Whatever is easiest for people to understand without totally losing what is communicated by the term. So when you ask, why not call schemas "extensions", I agree -- why not? Well, there are a couple of reasons. First, the word "schema", even though it is a scary word, means "a structured framework" which is exactly what we want. Secondly, I would say the schema includes both the required fields (defined by the node type module) and the optional/user-defined fields. Thus, it not only extends but encompasses. That said...whatever people can understand. I think good documentation and how-to's and clear terms are better than dumbed-down terms, but if people are not reading the documentation because they're scared of the terms, maybe the term needs to be changed.
You say that only 20% of the users understood the taxonomy module's power. That jibes with my experience, though I think 20% is a little high. :) However, in my experience it is not the users, but the site administrators that need to understand, and when they understand they get very enthusiastic, to the point of mailing checks! Anyway, maybe that's what you meant.
Lastly, you asked how this is different from Jonathan's flexinode module. I was very excited about flexinode when I saw it (and it's great, don't get me wrong); however, flexinode makes limiting assumptions. First, flexinode nodes are special. Rather than having a complete framework for making nodes based on any node type, flexinode has the story type as its base. Most importantly for us, flexinode's attributes are not global; it doesn't tie into Drupal's powerful taxonomy system to have a unified way of searching/filtering/working with values associated with nodes. Again, flexinode is great for what it does. But my goal is to have a framework to work in that has a unified way for treating content types (again, defined by their attributes) and the attributes/values themselves.
Moving forward, this is our plan. I am still in the process of getting Drupal's code into my head. You've seen the first commentary and flowchart from that; I've also done the menu and node systems and plan to do the theme system soon. Matt and I are heading out to the O'Reilly Open Source Conference where we will spend a precious week working on the taxonomy system as it pertains to metadata. We will also most likely have a Drupal BOF session for fun. Goals are to have the taxonomy module support data types (strings, integers, booleans, and dates) as well as selectable formats (select, radio, dropbox, etc.) for vocabularies. I would also like to distinguish between closed/controlled vocabularies and open vocabularies, where a user can add a term to a vocabulary right in the node edit form. Calculated vocabularies would be a powerful feature, off in the future.
I've been watching drupal-devel, and it seems like many of the problems discussed there for particular cases could be solved as a general case using the metadata approach. Matt and I have discussed individual issues as they arose and it was clear to us that a metadata approach solves many problems quickly.
I hope that this gives you a better idea of where we are coming from with this "metadata approach" and why it is worth doing. We're trying to solve the 90% general case once, and solve it well. Drupal is the closest thing to it that I've seen.
Best regards,
John VanDyk