Background: For a current project, I need to do a lot of number crunching. I'm scoring data for an experiment, and there are about 70 participants, with a full set of data each. To calculate everything for one person, it takes about 3 seconds. (There are a bunch of items, then it's all added up, etc.) Multiply that by 70 and it's over the 30 second limit for most servers, plus I don't want to stress the server that much/often anyway. And while that's going on, we're still actively scoring the data and it's always changing, but we want to preview the results as we go.
The second detail that is important is that a lot of these calculations are recursive (so I might look at one section, or an average across all sections, etc.).
Solution: I needed something that could cache these calculations rather than reprocessing them. But if the data changes, I need it to be up to date, so I can't use any caching that would have a delay. It also shouldn't be limited to just database-wide changes, but able to handle one part of the database changing and the rest staying the same. I didn't find anything obvious in PHP or MySQL that fit exactly, so I wrote the following.
Outline:
1) Track activity as the data is entered into the database. If you're running a car dealership, just track activity for whoever enters the new cars into the database.
2) Whenever you calculate the final result (for any significant calculation), store it in the database.
3) Before calculating it, though, check if it's already in the database and see if the time on that calculation is later than the last activity that modified the database.
So here's what I ended up with.
My function base:
PHP Code:
<?php
function safeimplode($a,$sepatator) { //implode only works on strings and numbers
if (!is_array($a)) { return false; }
foreach($a as $c=>$b) {
if (!(is_numeric($b)||gettype($b)=='string')) {
$a[$c] = gettype($b);
}
}
return implode($a,$sepatator);
}
function cache_functionid($function,$args) { //generate a unique and human-readable identifier for the calculation-- the function's name, plus its arguments
return $function.'('.safeimplode($args,',').');'.md5(serialize($args));
}
function cache_store($f,$v) {
$f = mysql_real_escape_string($f);
$v = mysql_real_escape_string($v);
mysql_query('DELETE FROM `mycache` WHERE `function`=\''.$f.'\'');
mysql_query('INSERT INTO `mycache` (`function`,`value`,`date`) VALUES (\''.$f.'\',\''.$v.'\',\''.time().'\')');
}
function cache_get($f,$after) { //get the value for a cached function, after the time $after
$f = mysql_real_escape_string($f);
$after = mysql_real_escape_string($after);
$q = mysql_query('SELECT `value` FROM `mycache` WHERE `function`=\''.$f.'\' AND `date`>\''.$after.'\'');
if ($r = mysql_fetch_assoc($q)) {
return $r['value'];
}
return false;
}
function lastactivity($id) { //this function must be adjusted to your system!
$id = mysql_real_escape_string($id);
$q = mysql_query('SELECT `date` FROM `myactivity` WHERE `id`=\''.$id.'\' ORDER BY `date` DESC');
if ($r = mysql_fetch_assoc($q)) {
return $r['date'];
}
return 0;
}
?>
One benefit you can see here is that this isn't just MySQL caching. It's also possible to cache all of the calculations done with PHP (including looping through many things, etc.).
Now, to its use in a calculation:
PHP Code:
function calculate_score($argument1,$arg2) {
$args = func_get_args(); //get the arguments; this must be done here, not in another function call due to PHP, at least PHP<5.3
$fid = cache_functionid(__FUNCTION__,$args); //get a unique name for this calculation
$cached = cache_get($fid,lastactivity($id)); //is the cached version available?
if ($cached!==FALSE) { return $cached; } //if so, just use that, stop
//DO SOME BIG CALCULATIONS HERE!
$score = 'example'; //calculating!...
cache_store($fid,$s); //store this in the cache for next time
return $s; //return the final result
}
The tricky part is getting the consistent name for the function and arguments, but what I have above works. It just needs to be done in a slightly weird way due to limitations in PHP.
This should be applied to all functions with complicated calculations, and even at multiple levels of layering-- if you're calculating several things recursively (as I am) then this really helps because it no longer needs to recalculate, for example, each individual score and the average, but it can actually reuse those calculations.
Now if I load a page once it may take a very long time (or even time out, and I need to restart it to keep going with all of the calculations), but then it will all be in place. In fact, once that's done, it shouldn't ever need to recalculate everything because most of it will remain constant (not all of the data changes all the time, just gradually as we enter more things).
So although that first load is slow, every time after that it's almost instantaneous.
Questions and comments are welcome. It's working for me at the moment, and I haven't had any problems.
Bookmarks