PDA

View Full Version : Custom Dynamic (up-to-date) Caching for Queries/Calculations with MySQL and PHP



djr33
09-21-2012, 06:35 AM
Background: For a current project, I need to do a lot of number crunching. I'm scoring data for an experiment, and there are about 70 participants, with a full set of data each. To calculate everything for one person, it takes about 3 seconds. (There are a bunch of items, then it's all added up, etc.) Multiply that by 70 and it's over the 30 second limit for most servers, plus I don't want to stress the server that much/often anyway. And while that's going on, we're still actively scoring the data and it's always changing, but we want to preview the results as we go.
The second detail that is important is that a lot of these calculations are recursive (so I might look at one section, or an average across all sections, etc.).

Solution: I needed something that could cache these calculations rather than reprocessing them. But if the data changes, I need it to be up to date, so I can't use any caching that would have a delay. It also shouldn't be limited to just database-wide changes, but able to handle one part of the database changing and the rest staying the same. I didn't find anything obvious in PHP or MySQL that fit exactly, so I wrote the following.

Outline:
1) Track activity as the data is entered into the database. If you're running a car dealership, just track activity for whoever enters the new cars into the database.
2) Whenever you calculate the final result (for any significant calculation), store it in the database.
3) Before calculating it, though, check if it's already in the database and see if the time on that calculation is later than the last activity that modified the database.




So here's what I ended up with.

My function base:

<?php

function safeimplode($a,$sepatator) { //implode only works on strings and numbers
if (!is_array($a)) { return false; }
foreach($a as $c=>$b) {
if (!(is_numeric($b)||gettype($b)=='string')) {
$a[$c] = gettype($b);
}
}
return implode($a,$sepatator);
}

function cache_functionid($function,$args) { //generate a unique and human-readable identifier for the calculation-- the function's name, plus its arguments
return $function.'('.safeimplode($args,',').');'.md5(serialize($args));
}

function cache_store($f,$v) {
$f = mysql_real_escape_string($f);
$v = mysql_real_escape_string($v);
mysql_query('DELETE FROM `mycache` WHERE `function`=\''.$f.'\'');
mysql_query('INSERT INTO `mycache` (`function`,`value`,`date`) VALUES (\''.$f.'\',\''.$v.'\',\''.time().'\')');
}
function cache_get($f,$after) { //get the value for a cached function, after the time $after
$f = mysql_real_escape_string($f);
$after = mysql_real_escape_string($after);
$q = mysql_query('SELECT `value` FROM `mycache` WHERE `function`=\''.$f.'\' AND `date`>\''.$after.'\'');
if ($r = mysql_fetch_assoc($q)) {
return $r['value'];
}
return false;
}

function lastactivity($id) { //this function must be adjusted to your system!
$id = mysql_real_escape_string($id);
$q = mysql_query('SELECT `date` FROM `myactivity` WHERE `id`=\''.$id.'\' ORDER BY `date` DESC');
if ($r = mysql_fetch_assoc($q)) {
return $r['date'];
}
return 0;
}
?>

One benefit you can see here is that this isn't just MySQL caching. It's also possible to cache all of the calculations done with PHP (including looping through many things, etc.).

Now, to its use in a calculation:

function calculate_score($argument1,$arg2) {
$args = func_get_args(); //get the arguments; this must be done here, not in another function call due to PHP, at least PHP<5.3
$fid = cache_functionid(__FUNCTION__,$args); //get a unique name for this calculation
$cached = cache_get($fid,lastactivity($id)); //is the cached version available?
if ($cached!==FALSE) { return $cached; } //if so, just use that, stop

//DO SOME BIG CALCULATIONS HERE!
$score = 'example'; //calculating!...

cache_store($fid,$s); //store this in the cache for next time
return $s; //return the final result
}
The tricky part is getting the consistent name for the function and arguments, but what I have above works. It just needs to be done in a slightly weird way due to limitations in PHP.

This should be applied to all functions with complicated calculations, and even at multiple levels of layering-- if you're calculating several things recursively (as I am) then this really helps because it no longer needs to recalculate, for example, each individual score and the average, but it can actually reuse those calculations.

Now if I load a page once it may take a very long time (or even time out, and I need to restart it to keep going with all of the calculations), but then it will all be in place. In fact, once that's done, it shouldn't ever need to recalculate everything because most of it will remain constant (not all of the data changes all the time, just gradually as we enter more things).
So although that first load is slow, every time after that it's almost instantaneous.


Questions and comments are welcome. It's working for me at the moment, and I haven't had any problems.

jscheuer1
09-29-2012, 06:53 AM
Just one quick observation. This is a hot thread and has no replies. Well one now. But it's hot because a lot of people have viewed it.

djr33
09-29-2012, 07:21 AM
Hm. I wonder if that means that it's useful for people or if they read it and find it unhelpful and move on without replying ;)
But I do hope it's helpful to some. I wouldn't mind some feedback if anyone has insight about this type of situation (I admit it's a little specific to what I'm doing).

bernie1227
09-29-2012, 07:30 AM
I only saw the thread when John bumped it, but I think that this would definitely be helpful to people who want to cache queries.

djr33
09-29-2012, 07:57 AM
Thanks. I hope so. And just for the record, it's been working very well without any bugs at all and my website/system is now much faster than it was. It hasn't done the 30-second time-out thing at all since the first time I ran it (to get some values setup for many of the potential queries) and now it pauses for a moment while calculating the new values but clearly not recalculating everything. And then after loading any page once, it loads almost instantaneously the next time around.

This project is pretty cool because there are so many calculations and it is so slow that I can really see the caching working in action-- it's exactly the right amount of delay to show that it's doing what I expected (and not over- or under-caching).