Extract links from an external webpage



jacksont123
01-29-2007, 07:23 PM
How do I extract links from an external webpage and display them with php?

Example:
I would have php extract all the links from: http://www.php-mysql-tutorial.com/
and display the links in the php file.

mburt
01-29-2007, 07:42 PM
Sorry, but as far as I know, you can't extract/open anything from an external webpage.
This would be the logical way to do it:

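// Read the page into an array, one element per line.
$page = file("http://www.php-mysql-tutorial.com/");
foreach ($page as $num => $lines) {
    $links = strpos($lines, "<a");
    $linksend = strpos($lines, "</a>");
    // strpos() returns false when nothing matches, so skip lines without a full link.
    if ($links === false || $linksend === false || $linksend < $links) {
        continue;
    }
    // substr() takes a length, not an end position; +4 covers the closing "</a>".
    echo substr($lines, $links, $linksend - $links + 4);
}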
But you can't get contents from external pages... I'll keep looking, but I don't think it's possible.

jacksont123
01-29-2007, 08:10 PM
That's exactly what I wanted. Thanks.
Is it possible to make one that only extracts certain links in a folder?
i.e., one that would extract all links in the folder "jump"

mburt
01-29-2007, 08:15 PM
Sure. Edit the search query:

$links = strpos($lines, "<a href=\"subfolder here/");
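
For the "jump" folder case, one way to sketch the filter is to collect the href values first and then keep only those whose path starts with the folder. The regular expression is a quick approximation, not a real HTML parser, and the function name linksInFolder and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: returns the hrefs whose URL path begins with /$folder/.
function linksInFolder($html, $folder)
{
    // Rough pattern for quoted href values inside <a> tags; not a full HTML parser.
    preg_match_all('/<a\s[^>]*href\s*=\s*(["\'])(.*?)\1/i', $html, $matches);

    $prefix = '/' . trim($folder, '/') . '/';
    $found = array();
    foreach ($matches[2] as $href) {
        $path = parse_url($href, PHP_URL_PATH);
        if (!is_string($path)) {
            continue; // no path component, e.g. "#top"
        }
        // Assumption: relative links are resolved against the site root.
        $path = '/' . ltrim($path, '/');
        if (strpos($path, $prefix) === 0) {
            $found[] = $href;
        }
    }
    return $found;
}

// Sample markup standing in for a fetched page.
$sample = '<a href="/jump/a.html">A</a> <a href="/other/b.html">B</a>';
echo implode("\n", linksInFolder($sample, 'jump')), "\n";
```

Matching on the parsed path rather than on the raw line means absolute URLs that point into the same folder are picked up as well.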

Twey
01-29-2007, 08:35 PM
For a more robust solution, the PHP5 DOM module (http://www.php.net/dom) is capable of parsing HTML. You can then use getElementsByTagName() to find all the links in a document.
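
A minimal sketch of the DOM approach, assuming PHP 5.1 or later with the DOM extension enabled. The function name extractLinks and the sample markup are hypothetical; for a live page you would pass in the result of file_get_contents() instead.

```php
<?php
// Hypothetical helper: returns the href of every <a> element in an HTML string.
function extractLinks($html)
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = array();
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // Named anchors such as <a name="top"> have no href; skip them.
        if ($anchor->hasAttribute('href')) {
            $urls[] = $anchor->getAttribute('href');
        }
    }
    return $urls;
}

// Sample markup standing in for a fetched page.
$sample = '<html><body><a href="/jump/one.html">One</a> <a name="top">Top</a> '
        . '<a href="http://www.example.com/">Two</a></body></html>';
foreach (extractLinks($sample) as $url) {
    echo $url, "\n";
}
```

Unlike the line-by-line strpos() approach, this copes with several links on one line and with links that span multiple lines.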

If you have PHP4, you'll need to use php-html (http://php-html.sourceforge.net/). Then,
<?php
include('htmlparser.inc');

$urls = array();
// Fetch the remote page and hand its HTML to the parser.
$remotePage = new HtmlParser(file_get_contents('http://www.example.com/some/page.html'));

// Walk the nodes, collecting the href of every <a> element.
while ($remotePage->parse()) {
    if ($remotePage->iNodeType == NODE_TYPE_ELEMENT
        && strtolower($remotePage->iNodeName) == "a"
        && isset($remotePage->iNodeAttributes['href'])) {
        array_push($urls, $remotePage->iNodeAttributes['href']);
    }
}
?>
You'll then have a nice list of URLs in $urls.

"But you can't get contents from external pages..."
You can. You will, however, need to have allow_url_fopen enabled in your php.ini.
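
Whether the setting is on can be checked at runtime before calling file() or file_get_contents() on a URL. A minimal sketch; the helper name canFetchRemote is made up for illustration.

```php
<?php
// Hypothetical helper: reports whether allow_url_fopen is switched on.
function canFetchRemote()
{
    // ini_get() returns the setting as a string (or false), so normalise it to a boolean.
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

if (canFetchRemote()) {
    echo "allow_url_fopen is enabled; file_get_contents() can read URLs.\n";
} else {
    echo "allow_url_fopen is disabled; it has to be enabled in php.ini first.\n";
}
```

The setting cannot be changed from a script with ini_set(), so on shared hosting this is a question for the host.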

/EDIT: XML_HTMLSax (http://pear.php.net/package/XML_HTMLSax) is also an option. Probably a better one, since it's been accepted into PEAR.

mburt
01-29-2007, 09:30 PM
Wow... there are a lot of PHP functions there that are DOM-related :eek:

jacksont123
02-03-2007, 10:57 PM
whoops. wrong topic. sorry