Running Batch Processes in an Update Hook... in Bed

Sometimes you have to do a whole lot of stuff during an update. Sure, you could just write a standard hook_update_N() function and try processing 50,000 nodes in one go, but if you do, you're almost guaranteed to run out of memory, time out, or generally make your web servers unhappy. Luckily, you can use Drupal's Batch API within your update hook. Actually, that's a bit of a misnomer because the update process is already a batch process, but let's not get hung up on the details. FYI, this is often referred to as a multi-pass update.

The first step is to realize that hook_update_N() takes a little-known argument called $sandbox. The only thing you need to do to make your update hook run multiple times during a Drupal update is to set $sandbox['#finished'] to a number between 0 and 1. Drupal keeps calling the function until that value reaches 1. As an example, let's say we want our update to run five times. We could write code like this:

function mymodule_update_7001(&$sandbox) {
  // We want five passes in total.
  $sandbox['total'] = 5;
  // Count the passes: 1 on the first call, then one more on each later call.
  $sandbox['current'] = isset($sandbox['current']) ? $sandbox['current'] + 1 : 1;

  // Once #finished reaches 1, Drupal knows we have finished all the passes needed by this update.
  $sandbox['#finished'] = $sandbox['current'] / $sandbox['total'];
}
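
To see why setting #finished is enough, here is a rough stand-in for what the update runner does with your function. This is a simplified sketch in plain PHP, not Drupal's actual code: run_update_until_finished() and example_update() are hypothetical names made up for illustration.

```php
<?php
// Hypothetical update function following the five-pass pattern above.
function example_update(array &$sandbox) {
  $sandbox['total'] = 5;
  $sandbox['current'] = isset($sandbox['current']) ? $sandbox['current'] + 1 : 1;
  $sandbox['#finished'] = $sandbox['current'] / $sandbox['total'];
}

// Simplified stand-in for the update runner (NOT Drupal's real implementation):
// keep calling the function with the same $sandbox until #finished reaches 1.
function run_update_until_finished(callable $update) {
  $sandbox = array();
  $passes = 0;
  do {
    // $sandbox is passed by reference, so changes survive into the next pass.
    $update($sandbox);
    $passes++;
    // In this sketch, an update that never sets #finished counts as done.
    $finished = isset($sandbox['#finished']) ? $sandbox['#finished'] : 1;
  } while ($finished < 1);
  return $passes;
}

$passes = run_update_until_finished('example_update');
echo "Update ran $passes times\n";
```

The loop stops on the fifth call, when current / total reaches 1.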

Note that the $sandbox variable is passed by reference (that's what the "&" is for), so any change you make to it persists between passes. Now let's use a slightly (read: very slightly) more realistic example. Say we want to add the words "in bed" to the end of the title of every node of type "fortune_cookie":

function mymodule_update_7001(&$sandbox) {
  // If this is the first pass through this update function then set some variables.
  if (!isset($sandbox['total'])) {
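    // Count the nodes in SQL; rowCount() is not reliable for SELECT queries on every database.
    $result = db_query('SELECT COUNT(nid) FROM {node} WHERE type = :type', array(':type' => 'fortune_cookie'));
    $sandbox['total'] = (int) $result->fetchField();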
    $sandbox['current'] = 0;
  }

  // How many nodes should be processed per pass. The higher this number is, the faster your update will
  // complete, but the more likely your server is to run out of memory or time out.
  $nodes_per_pass = 10;

  // Get the nodes to process during this pass. Ordering by nid keeps the offset stable between passes.
  $result = db_query_range('SELECT nid FROM {node} WHERE type = :type ORDER BY nid', $sandbox['current'], $nodes_per_pass, array(':type' => 'fortune_cookie'));
  while ($row = $result->fetchAssoc()) {
    // Load the node, edit its title, and save the node.
    $node = node_load($row['nid']);
    $node->title = $node->title . ' in bed';
    node_save($node);

    // Let's tell the site admin what we are doing. You could write to a log here, or a watchdog message, or whatever...
    drupal_set_message(t('We processed node @nid', array('@nid' => $node->nid)));

    // Increment "current" by 1.
    $sandbox['current']++;
  }

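  // Set the value for finished. If current == total then finished will be 1, signifying we are done.
  // If there are no nodes at all, we are done right away (and we avoid dividing by zero).
  $sandbox['#finished'] = empty($sandbox['total']) ? 1 : ($sandbox['current'] / $sandbox['total']);

  if ($sandbox['#finished'] >= 1) {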
    drupal_set_message(t('We processed @nodes nodes. DONE!!!', array('@nodes' => $sandbox['total'])));
  }
}
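
If you want to get a feel for how the per-pass size plays out without touching a real site, the same bookkeeping can be tried on a plain array. This is only a sketch: the $titles array is a made-up stand-in for the node table, example_in_bed_pass() is a hypothetical name, and array_slice() plays the role of db_query_range().

```php
<?php
// Hypothetical in-memory stand-in for the node table: nid => title.
$titles = array();
for ($nid = 1; $nid <= 25; $nid++) {
  $titles[$nid] = "Fortune $nid";
}

// Same shape as the update hook above, but operating on the array.
function example_in_bed_pass(array &$titles, array &$sandbox, $per_pass) {
  // First pass: record how much work there is.
  if (!isset($sandbox['total'])) {
    $sandbox['total'] = count($titles);
    $sandbox['current'] = 0;
  }

  // Take the next chunk, starting at the offset stored in the sandbox.
  $batch = array_slice($titles, $sandbox['current'], $per_pass, TRUE);
  foreach ($batch as $nid => $title) {
    $titles[$nid] = $title . ' in bed';
    $sandbox['current']++;
  }

  // Guard against an empty set so we never divide by zero.
  $sandbox['#finished'] = empty($sandbox['total']) ? 1 : $sandbox['current'] / $sandbox['total'];
}

// Simplified driver loop: repeat until the pass function reports it is done.
$sandbox = array();
$passes = 0;
do {
  example_in_bed_pass($titles, $sandbox, 10);
  $passes++;
} while ($sandbox['#finished'] < 1);

echo "Processed {$sandbox['current']} titles in $passes passes\n";
```

With 25 titles and 10 per pass, the work takes three passes (10, 10, then 5). By the same arithmetic, 50,000 nodes at 10 per pass means 5,000 passes.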

And voila, we've processed 50,000 nodes ... in bed.