![]() |
John VanDyk has been innovating with information technology for more than 20 years. Read more... |
Deduplicating Drupal history table
Recently I had to do a migration of an old Drupal site with 4.6 million rows in the history table. The history table records when a user has viewed a node. A record looks like this:
nid | uid | timestamp |
---|
Because old versions of Drupal didn't enforce a unique index on the nid-uid combination, sometimes race conditions let two records in with the same nid-uid combo but different timestamps. I got rid of duplicate records with the following process.
1. Find out how many duplicates we have:
SELECT nid, uid, count( * ) AS n
FROM history
GROUP BY nid, uid
HAVING n > 1;
In this case I had 276 records.
2. Since I didn't really care which of the two duplicate records was kept, I could use this approach to quickly throw out duplicates:
ALTER IGNORE TABLE history ADD UNIQUE INDEX unique_index ( uid, nid );
3. Now I discarded the unique index, since I was only using it as a tool to remove duplicates. When I import the data into a modern Drupal, uniques will be enforced anyway.
ALTER TABLE history DROP INDEX unique_index;
4. Repeat step 1, with 0 records in the result set.
- Log in to post comments