Schmoopy
08-23-2009, 12:34 PM
Hey there, I'm trying to pull product information from Amazon using an HTML parser, but I don't understand why 2 different URLs come out with the same contents.
The HTML parser I'm using can be found here: http://simplehtmldom.sourceforge.net/index.htm
Here are the amazon URLs:
http://www.amazon.co.uk/s/ref=nb_ss?url=search-alias%3Dstripbooks&field-keywords=9780140311082&x=0&y=0
The number in that URL comes from a barcode scanner. The URL above is the search results page for it, whereas the URL below is what I get if I click through to the first link on the results page:
http://www.amazon.co.uk/Worst-Witch-Puffin-Books/dp/0140311084/ref=sr_1_1?ie=UTF8&s=books&qid=1251029691&sr=1-1
If you click those 2 URLs, you'll see the content is completely different...
However, when I fetch them with the HTML parser, both return the same data:
include 'simple_html_dom.php'; // Simple HTML DOM library file; adjust the path to wherever you saved it
// Fetch the search results page and the product page
$html = file_get_html('http://www.amazon.co.uk/s/ref=nb_ss?url=search-alias%3Dstripbooks&field-keywords=9780140311082&x=0&y=0');
$other = file_get_html('http://www.amazon.co.uk/Worst-Witch-Puffin-Books/dp/0140311084/ref=sr_1_1?ie=UTF8&s=books&qid=1251029691&sr=1-1');
echo $html->plaintext;
echo $other->plaintext;
// Both echo the same data, even though the URLs are different
It's as if the parser is automatically clicking through from the search results page and going straight to the product information. Can anyone explain this?
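One way to test that guess would be to look at the response headers for the search URL. This is only a rough sketch, not anything from Simple HTML DOM: summarize_redirects is a made-up helper name, and it assumes PHP's built-in get_headers(), which by default follows redirects and returns the header lines of every response along the way.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): keep only the status
// lines and Location headers from the flat list that get_headers() returns.
function summarize_redirects(array $headerLines)
{
    $summary = [];
    foreach ($headerLines as $line) {
        // Status lines look like "HTTP/1.1 302 Found".
        $isStatus = preg_match('#^HTTP/\S+\s+\d{3}#', $line) === 1;
        // A Location header shows where a redirect points.
        $isLocation = stripos($line, 'Location:') === 0;
        if ($isStatus || $isLocation) {
            $summary[] = $line;
        }
    }
    return $summary;
}

// Usage (makes a real request, so it is left commented out):
// $headers = get_headers($url); // $url is whichever page you want to check
// if ($headers !== false) {
//     print_r(summarize_redirects($headers));
// }
```

If the search URL shows a 301 or 302 status followed by a Location header pointing at the /dp/ page, that would suggest the server is redirecting the request. If it shows only a single 200, the cause is probably something else.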