making a proxy



bluewalrus
07-07-2010, 12:06 AM
So I can get the source of most pages with this code, but I can't stop forms on the fetched page from submitting back to the originating site. I don't think there's an easy way to do this, but I'm just wondering.


<?php
if (!isset($_POST['URL'])) {
?>
<form action="#" method="post">
URL: <input type="text" name="URL"><br />
<input type="submit" />
</form>
<?php
} else {
    // Fetch the remote page and inject a <base> tag right after <head>
    // so relative links resolve against the original site.
    $url  = $_POST['URL'];
    $page = file_get_contents($url);
    $base = "<head>\n<base href=\"" . htmlspecialchars($url, ENT_QUOTES) . "\" />";
    $page = str_ireplace("<head>", $base, $page); // covers <head> and <HEAD> in one call
    echo $page;
}
?>

djr33
07-07-2010, 07:16 PM
That is only the most basic way to approach fixing it, and there's not going to be any easy way.

You'll need something much more complex than that, since there will be images and other concerns. For example, images with relative paths would be requested from your server instead of the original site.

Also, a base href only changes how relative URLs resolve (including "/" root-relative ones, which will resolve against the original site's root). Absolute links aren't touched at all, and in every case the user ends up back on the original site rather than going through your proxy.

I played around with this idea a while ago and got it working well with just the text... that was enough for me at the time. But creating a full proxy server would, I think, require a language other than PHP, at least if you want to do it on the scale of a real proxy site. PHP is theoretically capable, but something that just forwards requests would be a lot easier. Consider streaming, for example, or even just media files... that's complex.

If this is just for fun, keep playing with it. If not, I'd suggest a very different approach (even though I really like PHP).


As for parsing the page, what you have would work to some degree, but to make it really effective you'd need to loop through every href or src on the page and rewrite the paths (including absolute URLs). I suppose an XML parser might work for XHTML sites, or an HTML parser for everything else.
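A rough sketch of that loop using PHP's DOMDocument. The proxyRewrite() name and the "proxy.php?URL=" routing scheme are assumptions for illustration, not anything from the thread's code:

```php
<?php
// Illustrative sketch: rewrite every href/src so it routes back through
// a hypothetical proxy.php?URL= endpoint. Not hardened for production.
function proxyRewrite($html, $baseUrl) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from messy real-world HTML
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@href] | //*[@src]') as $el) {
        $attr = $el->hasAttribute('href') ? 'href' : 'src';
        $url  = $el->getAttribute($attr);
        // Absolutize relative paths against the original site first...
        if (!preg_match('#^https?://#i', $url)) {
            $url = rtrim($baseUrl, '/') . '/' . ltrim($url, '/');
        }
        // ...then send everything back through the proxy.
        $el->setAttribute($attr, 'proxy.php?URL=' . urlencode($url));
    }
    return $doc->saveHTML();
}
```

Since DOMDocument's HTML parser tolerates non-XHTML markup, this covers plain HTML sites too, which sidesteps the XML-vs-HTML question above.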


One possibility would be to inject Javascript (the same way you're adding the base tag above) and use it onUnload to check the outgoing URL against some algorithm and proceed from there. That's a decent option, though complex, and it wouldn't really help with loading images or similar things.
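If you go the injection route, intercepting clicks tends to be more reliable than onUnload, which can't redirect cleanly. A hedged sketch of splicing such a handler into the fetched page, again assuming an illustrative proxy.php?URL= endpoint (the stand-in $page is just sample markup):

```php
<?php
// Illustrative only: build a <script> that catches link clicks and routes
// them through a hypothetical proxy.php?URL= endpoint, then splice it into
// the fetched page the same way the <base> tag is injected above.
$script = <<<'JS'
<script>
document.onclick = function (e) {
    e = e || window.event;
    var el = e.target || e.srcElement;
    while (el && el.tagName !== 'A') { el = el.parentNode; }
    if (el && el.href) {
        location.href = 'proxy.php?URL=' + encodeURIComponent(el.href);
        return false; // cancel the normal navigation
    }
};
</script>
JS;

$page = '<html><body><a href="http://example.com/">x</a></body></html>'; // stand-in for the fetched page
$page = str_ireplace('</body>', $script . "\n</body>", $page);
```

As noted, this only reroutes navigation; images and other embedded resources still load straight from the original site.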

katierosy
07-08-2010, 12:51 PM
You are posting the URL from the originating page, as it appears in the variable $_POST['URL']. Maybe you don't want to post this URL from the client at all, because you already know the URL.

In that case, perhaps you want to keep all the URLs in an array or in a MySQL database, then read and process each URL from there.
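That known-URL idea can be as simple as a whitelist check before fetching anything. A minimal sketch; the $allowed list and the isAllowed() helper are illustrative assumptions, not from the thread:

```php
<?php
// Only proxy URLs that start with a known-good prefix.
// The $allowed list is an illustrative assumption.
$allowed = array(
    'http://example.com/',
    'http://example.org/news/',
);

function isAllowed($url, array $allowed) {
    foreach ($allowed as $prefix) {
        // strpos(...) === 0 means the URL begins with this prefix.
        if (strpos($url, $prefix) === 0) {
            return true;
        }
    }
    return false;
}
```

With a check like this in front of file_get_contents(), the script refuses arbitrary URLs instead of fetching whatever the client posts, which also closes the obvious open-proxy hole in the original snippet.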

Using "str_replace" is fine in this case. There are other methods available, such as regular expressions, and "strpos" is also seen in some web-scraping code, but one should go with whichever method is simple and easy in the present situation.
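For comparison, here is the same &lt;head&gt; injection done three ways: str_ireplace, a regular expression, and stripos. The sample $page markup is illustrative:

```php
<?php
$url  = 'http://example.com/';
$page = '<HEAD><title>t</title></HEAD><body></body>';
$base = "<head>\n<base href=\"$url\" />";

// 1. str_ireplace: simplest, case-insensitive.
$a = str_ireplace('<head>', $base, $page);

// 2. preg_replace: also handles variants like <head lang="en">.
$b = preg_replace('/<head(\s[^>]*)?>/i', $base, $page, 1);

// 3. stripos + substr_replace: manual, but shows exactly what happens.
$pos = stripos($page, '<head>');
$c = ($pos === false) ? $page : substr_replace($page, $base, $pos, strlen('<head>'));
```

All three produce the same result on plain markup; the regex only earns its extra complexity when the head tag carries attributes.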