Nutch and Drupal

For the past five weeks I've been working with Nutch, the open source search engine and web crawler. Nutch is at the point where Drupal was a year and a half ago. For Drupal, that meant "lots of potential," "poised for explosive growth," and "lacking in documentation." We held the first Drupal conference in February 2005, and though we expected only a handful of people, many more showed up! I hear that something similar happened at the recent Nutch meetup.

Doug Cutting, currently at Yahoo!, is the lead developer. The Nutch project also resulted in an interesting distributed system called Hadoop.

Here's a site running Drupal as the front end and Nutch as the back end. There is currently work being done to integrate Drupal vocabularies and terms with Nutch's search.


Book progress

I'm extremely pleased to note that Matt Westgate of Lullabot has agreed to coauthor the book. I'm pleased not only because Matt brings a wealth of Drupal experience to this endeavor, but also because I'll be having lunch with him more often.

Pubcookie for Drupal 4.7

I've just committed a version of the pubcookie module that's compatible with Drupal 4.7. I also added a bit to the README to explain how the pubcookie module works:

When you click the Log In link provided by the pubcookie block, it takes you to the directory you specified for "Login directory" under admin > settings > pubcookie (by default, 'login'). The pubcookie module takes this path, adds "pc" (an arbitrary string) to the end of it, and (here's the key) registers the result as a menu item in its menu hook. So that path is not a nonexistent file but a registered Drupal path that is "located" inside a directory protected by a .htaccess file, which restricts the directory's contents to users authenticated by the pubcookie server. Because Apache enforces that restriction before the request ever reaches Drupal, only authenticated users get through, and when they reach the path, Drupal calls the module's pubcookie_page() callback, which takes it from there.
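
As a rough illustration of the server side of this arrangement, here is a shell sketch that creates the protected login directory inside a Drupal root and prints the Drupal path the module would register inside it. It assumes the module joins the login directory and "pc" with a slash (which is what places the path inside the protected directory), and that your Apache configuration defines a pubcookie authentication type named NetID through PubcookieAuthTypeNames; substitute whatever name your server uses. The function name make_pubcookie_login_dir is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: create the pubcookie-protected login directory in a
# Drupal root and print the Drupal path that ends up "inside" it.
# Assumption: Apache's PubcookieAuthTypeNames defines an auth type called
# NetID; replace it with the name your server actually uses.
# Arguments: drupal_root [login_dir]
make_pubcookie_login_dir() {
  local drupal_root="$1"
  local login_dir="${2:-login}"   # the "Login directory" setting; defaults to login

  mkdir -p "$drupal_root/$login_dir" || return 1

  # Apache checks these directives before Drupal sees any request
  # for a path under this directory.
  cat > "$drupal_root/$login_dir/.htaccess" <<'EOF'
AuthType NetID
require valid-user
EOF

  # Assumed form of the registered menu path: login directory + "/pc".
  printf '%s/pc\n' "$login_dir"
}
```

For example, make_pubcookie_login_dir /path/to/drupal writes /path/to/drupal/login/.htaccess and prints login/pc.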

Script for automating Drupal installation

I do a lot of testing with Drupal, so I need a quick and easy way to create a new Drupal site, create the associated database, and get started. There are probably better solutions out there, but this is what I use. It's a bash script that I developed with the help of killes sometime last year. I just updated it for Drupal 4.7.

Now when I want to create a new Drupal site I just type

newdrupal47 foo bar

where foo is the name I'm giving the new site and bar is the name of the MySQL user (the MySQL user is optional; it defaults to root). It checks out the Drupal 4.7 branch from CVS, configures the settings file, creates the database, optionally runs some local SQL, and opens the new site in Safari. Here's the script:


#!/bin/bash
# newdrupal47: check out Drupal 4.7 from CVS and set up a new local site.
# Usage: newdrupal47 sitename [db_user]

if [ $# = "0" ]; then
  echo "newdrupal47: usage: newdrupal47 sitename [db_user]"
  exit 1
fi

# You may want to set HOST to be your box's domain name
HOST=localhost

# If a second argument was given, it is the db_user;
# use it instead of defaulting to root
if [ -n "$2" ]; then
  DB_USER=$2
else
  DB_USER=root
fi

# This is the location of an SQL file to run after the Drupal
# database has been given to MySQL. I use it to insert one
# line into the user roles table and one line into the user
# table, thus establishing the admin user.
LOCAL_SQL=/path/to/local.sql   # placeholder: point this at your own SQL file

# This is the location of your htdocs directory.
DIR=/path/to/htdocs            # placeholder: point this at your web root
echo "Changing directory to $DIR"
cd "$DIR" || exit 1

echo "Retrieving Drupal 4.7 branch..."
# Pull down drupal 4.7 (assumes CVSROOT points at the Drupal CVS repository)
cvs -z3 checkout -r DRUPAL-4-7 drupal

echo "Configuring..."
# Rename the drupal site from "drupal" to whatever the first parameter was,
# e.g. newdrupal47 drupaltest results in a directory named drupaltest
mv "$DIR/drupal" "$DIR/$1"

# Make a copy of the default settings folder and name it $HOST
# (localhost unless you changed HOST above)
cp -R "$DIR/$1/sites/default" "$DIR/$1/sites/$HOST"
SETTINGS="$DIR/$1/sites/$HOST/settings.php"

MYSQL_LOC=$(which mysql)
# Quoted so that an empty result (mysql not found) fails the test
if [ -x "$MYSQL_LOC" ]; then
  echo "Enter the database password for user '$DB_USER'"
  echo -n "Password: "
  read -s PASS
  echo

  # set local database connection
  # -i '' means edit the file in place with no backup (BSD sed on OS X;
  # with GNU sed, use plain -i)
  # we search only lines 85-90 in the settings file
  sed -i '' "85,90s#username:password@localhost/databasename#$DB_USER:$PASS@localhost/$1#" "$SETTINGS"

  # set the base url; assumes line 108 of settings.php holds the
  # $base_url line, so adjust the line number if yours differs
  sed -i '' "108s#^.*base_url = .*#\$base_url = 'http://$HOST/$1';#" "$SETTINGS"

  echo "Creating database..."
  mysqladmin -u"$DB_USER" -p"$PASS" create "$1"
  "$MYSQL_LOC" -u"$DB_USER" -p"$PASS" "$1" < "$DIR/$1/database/database.4.1.mysql"

  echo "Checking for local configuration..."
  if [ -r "$LOCAL_SQL" ]; then
    echo "Found local configuration; executing SQL"
    "$MYSQL_LOC" -u"$DB_USER" -p"$PASS" "$1" < "$LOCAL_SQL"
  fi
fi

echo "Done"
# opens the new site in the default browser (Safari, out of the box) on OS X
open "http://$HOST/$1"
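
Before running the in-place edits on a real settings.php, you can check what the two substitutions produce by feeding them sample lines on standard input. This sketch drops the line addresses (85-90 and 108) and the -i '' flag so it runs unchanged with either BSD or GNU sed and never touches a file; the sample lines are an assumption about the wording of the 4.7 settings.php, and preview_settings is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical dry run of the script's two settings.php substitutions.
# Reads settings lines on stdin and writes the edited lines to stdout.
# Unlike the real script, it applies both expressions to every line.
# Arguments: sitename db_user password [host]
preview_settings() {
  local site="$1" db_user="$2" pass="$3" host="${4:-localhost}"
  sed -e "s#username:password@localhost/databasename#$db_user:$pass@localhost/$site#" \
      -e "s#^.*base_url = .*#\$base_url = 'http://$host/$site';#"
}

# Sample input modeled on the lines the script targets (assumed wording).
printf '%s\n' \
  "\$db_url = 'mysql://username:password@localhost/databasename';" \
  "# \$base_url = 'http://www.example.com';  // NO trailing slash!" |
  preview_settings foo root secret

# Expected output:
# $db_url = 'mysql://root:secret@localhost/foo';
# $base_url = 'http://localhost/foo';
```

Note that the base_url expression also removes the leading "# ", so the line comes out active rather than commented.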

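The script's comments mention a local SQL file that creates the admin account, but that file isn't shown, so what follows is only a guess at what one might contain: a shell function that writes a file inserting user 1 into the users table and a matching row into users_roles. The column names (uid, name, pass, mail, status in users; uid, rid in users_roles), role ID 2 for "authenticated user", and MD5-hashed passwords are assumptions about the Drupal 4.7 schema that should be checked against database/database.4.1.mysql before use; write_local_sql and its arguments are hypothetical.

```shell
#!/bin/bash
# Hypothetical generator for the LOCAL_SQL file that newdrupal47 runs.
# Assumed Drupal 4.7 schema: users(uid, name, pass, mail, status, ...) with
# MD5-hashed passwords, and users_roles(uid, rid) where rid 2 is
# "authenticated user". Verify against database.4.1.mysql first.
# Values are not SQL-escaped, so stick to plain names and passwords;
# the password is stored in the file in clear text.
# Arguments: output_file admin_name admin_password admin_mail
write_local_sql() {
  local out="$1" name="$2" pass="$3" mail="$4"
  cat > "$out" <<EOF
INSERT INTO users (uid, name, pass, mail, status) VALUES (1, '$name', MD5('$pass'), '$mail', 1);
INSERT INTO users_roles (uid, rid) VALUES (1, 2);
EOF
}
```

After running something like write_local_sql /path/to/local.sql admin secret admin@example.com, point LOCAL_SQL at the generated file.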
