Benjamin Juang (ibneko) wrote,
Benjamin Juang

Rawr, phpbb2->phpbb3 conversion SUCKS.

I've been trying to convert a phpBB2 forum to a phpBB3 forum.

One of the biggest problems I keep running into is that posts get lost at a certain point. I'm not entirely sure why, or how.

But the issue I'm going to address here is that upgrading a LARGE forum database (ours is at 3 GB, with around 4 million posts) was really slow, especially at step 16 of 28, filling the phpbb_posts table.

Why? To ensure compatibility with other converters, other databases, different designs, the phpBB developers chose to use a SELECT query with LIMIT x, y. Where x = number of rows at which you want to start querying, and y = number of rows to return.

When x gets really big, the speed at which queries will come back will get really slow, in the tens of seconds range.

So instead, I hacked install_convert.php so that when it starts processing phpbb_posts, it'll disregard the skip_rows parameter (which establishes 'x') and instead set x to 0, while adding to the WHERE part of the query a condition stating that we only want to select posts with post_id greater than the largest post_id we have in our new phpBB3 phpbb_posts table.

Here's a diff.

tora:~ benjamin$ diff -w -u /Users/benjamin/phpBB3/install/install_convert.php ~/Library/Caches/BBEdit\ FTP\ Temp/install_convert-3.php 
--- /Users/benjamin/phpBB3/install/install_convert.php	2008-12-12 16:20:38.000000000 -0800
+++ /Users/benjamin/Library/Caches/BBEdit FTP Temp/install_convert-3.php	2009-02-01 19:06:31.000000000 -0800
@@ -58,10 +58,10 @@
 	// Batch size, can be adjusted by the conversion file
 	// For big boards a value of 6000 seems to be optimal
-	var $batch_size = 2000;
+	var $batch_size = 12000;
 	// Number of rows to be inserted at once (extended insert) if supported
 	// For installations having enough memory a value of 60 may be good.
-	var $num_wait_rows = 20;
+	var $num_wait_rows = 120;
 	// Mysqls internal recoding engine messing up with our (better) functions? We at least support more encodings than mysql so should use it in favor.
 	var $mysql_convert = false;
@@ -1171,6 +1171,12 @@
 			// Where
 			$sql .= (!empty($schema['where'])) ? "\nWHERE (" . $schema['where'] . ')' : '';
+            if ($current_table == 16){
+			    $hack_result = $src_db->sql_query("SELECT post_id FROM fluff_forumdb_phpbb3.phpbb_posts ORDER BY post_id DESC LIMIT 1");
+                $hack_row = $src_db->sql_fetchrow($hack_result);
+			    $last_post_id = ($hack_row['post_id'] ? $hack_row['post_id'] : 0);
+                $sql .= " AND posts.post_id > $last_post_id ";
+            }
 			// Group By
 			if (!empty($schema['group_by']))
@@ -1230,7 +1236,11 @@
 				// Take skip rows into account and only fetch batch_size amount of rows
+				if ($current_table == 16){
+                    $___result = $src_db->sql_query_limit($sql, $convert->batch_size, 0);
+                } else {
 				$___result = $src_db->sql_query_limit($sql, $convert->batch_size, $skip_rows);
+                }
 				if ($convert->mysql_convert && $same_db)
@@ -1403,6 +1413,10 @@
 			if (still_on_time() && $counting < $convert->batch_size/* && !defined('DEBUG_EXTRA')*/)
 				$skip_rows = 0;
+                if ($current_table == 16){
+                    $msg = sprintf("step 17 done  WAIT");
+                    return;
+                }

Mind, I'm still running this right now, so I don't know if it definitely works*, but estimated runtime for my posts table is down to 4 hours and it's holding a steady 330/s row processing rate (as opposed to starting at 330/s and dropping down to less than 100/s row and taking over 14 hours).

*as in, there may be errors down the road?

[edit] As far as I could tell, this worked perfectly.
So if the issue you're experiencing with upgrading phpBB 2.0 to phpBB 3.0 is due to the fact that the upgrade process is too slow at step 17 of 28 (phpbb_posts table), this patch should be safe to use. It will decrease processing time by a significant amount: I think processing step 17 was cut down from 14-16 hours down to a bit over 4 hours.
Tags: hacking, mysql, php, phpbb

Comments for this post were disabled by the author