Offline Pages



bluewalrus
04-08-2011, 02:44 PM
Does anyone know of existing code that pulls a page's source and all of that page's resources (images, JavaScript, CSS, etc.), copies them to another directory, and rewrites the references so they are pulled from the new directory, or at least so the absolute ones become relative? If not, I'm planning on making this with file_get_contents, mkdir (create the directory to hold the file/files), preg_replace (find all original resources, make them relative), file_put_contents (put the source of the original page), and copy (copy resources to the new directory). Is there anything I'm missing here? This is all on the same server, so security won't be an issue (or can be corrected if it is). Thanks.
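The plan above could be sketched roughly as follows. This is only a minimal illustration, assuming every resource lives under one document root on the same server; make_offline_copy and its parameters ($docRoot, $pagePath, $destDir) are hypothetical names, and resources referenced from inside CSS or JavaScript files are not handled.

```php
<?php
// Sketch only: copy a page plus the local files its <img>, <script> and
// <link> tags point at into $destDir, rewriting the references on the way.
// All three parameters are hypothetical; adjust to the real layout.
function make_offline_copy($docRoot, $pagePath, $destDir) {
    $source = file_get_contents($docRoot . '/' . $pagePath);
    if (!is_dir($destDir)) {
        mkdir($destDir, 0755, true);
    }
    // Group 1: the tag up to the attribute's "=", group 2: the quote, group 3: the URL.
    $pattern = '/(<(?:img|script|link)\b[^>]*?\s(?:src|href)\s*=\s*)(["\'])(.*?)\2/i';
    $rewritten = preg_replace_callback($pattern, function ($m) use ($docRoot, $destDir) {
        // Assumption: the URL's path maps directly onto a file under $docRoot.
        $path = parse_url($m[3], PHP_URL_PATH);
        $file = $docRoot . '/' . ltrim((string) $path, '/');
        if (!is_file($file)) {
            return $m[0]; // leave anything that cannot be found untouched
        }
        // Note: two resources with the same file name would overwrite each other.
        $name = basename($file);
        copy($file, $destDir . '/' . $name);
        return $m[1] . $m[2] . $name . $m[2];
    }, $source);
    file_put_contents($destDir . '/' . basename($pagePath), $rewritten);
    return $rewritten;
}
```

Flattening everything into one directory keeps the rewrite simple; preserving the original sub-directories would avoid name clashes at the cost of more mkdir calls.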

djr33
04-08-2011, 03:24 PM
Nope, but that seems like a useful tool. Let me know if you need help conceptually.

As a general note, remember that HTML tags don't have a required order, so you could have <img src=....> or <img ....src=...> etc.
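One way to cope with the attribute order, sketched here with a regular expression: allow any other attributes between the tag name and src, and accept either quote style. find_img_sources is a hypothetical helper name, and unquoted attribute values are not covered.

```php
<?php
// Hypothetical helper: collect the src of every <img>, wherever the
// attribute sits inside the tag and whichever quote style is used.
function find_img_sources($html) {
    // [^>]*? skips over earlier attributes without leaving the tag.
    $pattern = '/<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1/i';
    preg_match_all($pattern, $html, $matches);
    return $matches[2];
}

$html = '<img src="a.png" alt="first"><img alt="second" width="10" src=\'b.png\'>';
print_r(find_img_sources($html)); // lists a.png and b.png
```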

It seems fairly straightforward to do this (not easy, but systematic), with <img> <script> <link>, etc. But there's a problem: some resources are referenced within external .js or .css files. And that would be more difficult. For example, many scripts on DD would be missing media if you attempted this.
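For the CSS half of that problem, a rough sketch: scan the stylesheet text for url(...) references so those files can be copied as well. find_css_urls is a hypothetical helper name; @import lines written without url() and anything built up at runtime by JavaScript would still be missed.

```php
<?php
// Hypothetical helper: list the targets of url(...) references in a stylesheet.
function find_css_urls($css) {
    // The quotes around the target are optional in CSS.
    preg_match_all('/url\(\s*(["\']?)(.*?)\1\s*\)/i', $css, $matches);
    $urls = array();
    foreach ($matches[2] as $url) {
        // Skip empty targets and inline data: URIs, which need no copying.
        if ($url !== '' && stripos($url, 'data:') !== 0) {
            $urls[] = $url;
        }
    }
    return $urls;
}
```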

And that's not even thinking about iframes.

Note that there are some other embedded items like flash (or other plugins) that you might not immediately think of but could be important.

bluewalrus
04-10-2011, 01:41 AM
Thanks, I hadn't thought of calls inside of the resources; I'll have to work that out as well. Do you know how I can send a variable defined elsewhere with the preg_replace_callback?


$css = preg_replace_callback("/<link.*?href=\"(.*?)\".*/i", "filtersource($domain)", $source);

function filtersource($location, $domain) {
}


This fails


Warning: preg_replace_callback() [function.preg-replace-callback]: Requires argument 2, 'filtersource(http://www.jove.com/)', to be a valid callback in /public_html/a.php on line 14

djr33
04-10-2011, 02:12 AM
I don't think you can do that. Maybe there is some way to use a class to make that work? Or a global variable...

Maybe this:

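$class = new stdClass;
$class->myvar = 'myval';

// Pass the callback by name only: no parentheses and no arguments.
$css = preg_replace_callback("/<link.*?href=\"(.*?)\".*/i", "filtersource", $source);

function filtersource($matches) {

// The callback only receives the match array, so read the extra value from the global scope:
$myvar = $GLOBALS['class']->myvar;

// $matches[1] is the captured href; return the text that should replace the match.
return $matches[0];
}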

There should be a better way, but I'm not sure what it is...
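The class idea can be made to work without a global, because preg_replace_callback also accepts an array(object, 'method') callback. A minimal sketch, assuming a hypothetical LinkFilter class and a pattern tightened to stop at the end of the tag (the original pattern's trailing .* swallows the rest of the line):

```php
<?php
// Hypothetical class: the extra value lives on the object, and the method
// is handed to preg_replace_callback() as an array(object, 'method') callback.
class LinkFilter {
    public $domain;

    public function __construct($domain) {
        $this->domain = $domain;
    }

    // Receives the match array; $matches[1] is the captured href.
    public function filtersource($matches) {
        $absolute = rtrim($this->domain, '/') . '/';
        // Strip the domain prefix so the reference becomes relative.
        if (strpos($matches[1], $absolute) === 0) {
            $relative = substr($matches[1], strlen($absolute));
            return str_replace($matches[1], $relative, $matches[0]);
        }
        return $matches[0];
    }
}

$source = '<link rel="stylesheet" href="http://www.jove.com/css/main.css">';
$filter = new LinkFilter('http://www.jove.com/');
$css = preg_replace_callback('/<link[^>]*?href="(.*?)"[^>]*>/i', array($filter, 'filtersource'), $source);
echo $css, "\n";
```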


By the way, if you're looking for more information on the topic, you're basically doing the same thing as a proxy, attempting to take an HTML file and retrieve all necessary secondary files and transmit (in this case, save) them.

jscheuer1
04-10-2011, 10:22 AM
There's a getRelativePath function submitted by a user to php.net, on the realpath page:

http://www.php.net/manual/en/function.realpath.php#97885

I've used it and it worked for what I wanted to do with it. It probably could be used/adapted to your purposes.
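For readers who cannot reach that note, here is a rough sketch of the general idea. It is not the code from php.net; relative_path is a hypothetical name, and both arguments are assumed to be absolute, already-normalised paths that use "/" as the separator, with $from being a directory.

```php
<?php
// Sketch only (not the php.net note's code): relative path from the
// directory $from to the file or directory $to.
function relative_path($from, $to) {
    // Split into segments, dropping the empty ones produced by leading or doubled slashes.
    $fromParts = array_values(array_filter(explode('/', $from), 'strlen'));
    $toParts   = array_values(array_filter(explode('/', $to), 'strlen'));

    // Drop the leading segments the two paths share.
    while ($fromParts && $toParts && $fromParts[0] === $toParts[0]) {
        array_shift($fromParts);
        array_shift($toParts);
    }

    // Climb out of what is left of $from, then descend into $to.
    $up = array_pad(array(), count($fromParts), '..');
    $parts = array_merge($up, $toParts);
    return $parts ? implode('/', $parts) : '.';
}
```

For example, going from /public_html/offline/page to /public_html/images/logo.png gives ../../images/logo.png.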

bluewalrus
04-14-2011, 04:53 PM
Do you know how I can send a variable defined elsewhere with the preg_replace_callback?


$css = preg_replace_callback("/<link.*?href=\"(.*?)\".*/i", "filtersource($domain)", $source);

function filtersource($location, $domain) {
}


This fails

Found it on php.net, "To access a local variable within a callback, use currying (delayed argument binding). For example"



<?php
// Note: create_function() was deprecated in PHP 7.2 and removed in PHP 8.0.
function curry($func, $arity) {
    return create_function('', "
        \$args = func_get_args();
        if(count(\$args) >= $arity)
            return call_user_func_array('$func', \$args);
        \$args = var_export(\$args, 1);
        return create_function('','
            \$a = func_get_args();
            \$z = ' . \$args . ';
            \$a = array_merge(\$z,\$a);
            return call_user_func_array(\'$func\', \$a);
        ');
    ");
}

function on_match($transformation, $matches)
{
    return $transformation[strtolower($matches[1])];
}

$transform = array('a' => 'Well,', 'd'=>'whatever', 'b'=>' ');

// Pass the function name as a string; a bare on_match is an undefined constant.
$callback = curry('on_match', 2);
echo preg_replace_callback('/([a-z])/i', $callback($transform), 'Abcd');

echo "\n";
?>

I hadn't found this when I was at that point, though, so you can also use the global as djr33 suggests.
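On PHP 5.3 or later, an anonymous function with a use clause gives the same effect as the currying above with far less machinery. A minimal sketch, assuming the $domain and $source values shown (they are illustrative) and a pattern split into three groups so the rest of the tag is kept:

```php
<?php
$domain = 'http://www.jove.com/';
$source = '<link rel="stylesheet" href="http://www.jove.com/css/main.css">';

// The closure "uses" $domain, so the callback sees it without a global.
$css = preg_replace_callback(
    '/(<link[^>]*?href=")(.*?)(")/i',
    function ($matches) use ($domain) {
        $href = $matches[2];
        // Make the reference relative when it starts with the domain.
        if (strpos($href, $domain) === 0) {
            $href = substr($href, strlen($domain));
        }
        // Rebuild the matched text around the rewritten href.
        return $matches[1] . $href . $matches[3];
    },
    $source
);
echo $css, "\n";
```

The value of $domain is copied into the closure when the closure is created, so later changes to $domain do not affect the callback.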