I'm trying to use Simple HTML DOM to scrape a page. I'm very impressed with the design and the docs, but it doesn't seem to work for me.
Here is an HTML snippet from my target page. I am interested in getting the href attribute of the second <a> element.
Code:
<div class="column span-1">
<a href="/archive?m=s&i=37394">
<img width="30" height="30" src="/static/img/left_big_enabled.png" alt="previous item" style="margin-top: 70px;"/>
</a>
<a href="/archive?m=s&i=7837">
<img width="30" height="15" src="/static/img/left_ff_enabled.png" alt="first item" style="margin-top: 40px;"/>
</a>
</div>
This is the only div with class span-1, btw.
Here is my code. These two lines seem to work:
PHP Code:
$html = new simple_html_dom();
$html->load_file($url);
But drilling down to find what I want is where it goes wrong. First I tried:
PHP Code:
$ret = $html->find('img[src=/static/img/left_ff_enabled.png]')->parent();
but that fails with an error saying I was trying to call the parent() method on a non-object.
This code
PHP Code:
foreach ($html->find('div.span-1', 0) as $item) {
    print_r($item);
}
(or the same thing without 0 as the second argument of find) outputs pages and pages of repetitive stuff with lots of RECURSION in it, but it never outputs the element I really want, which is the second link, the one whose href is /archive?m=s&i=7837. I searched the output for 7837 and it doesn't show up. It seems to me this might be buggy: either it is recursing infinitely, or the output is just way too big and slow. I have read all the examples and the documentation.
I tried ->children(1) (zero-based, right?) and that prints nothing.
I tried ->children(0) and that prints that same huge amount of stuff.
It also seems to be backing UP the tree and pulling in things from completely different branches. This was supposed to make scraping easier, not harder, but maybe I am missing something. What am I doing wrong? Thank you.
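For reference, here is a minimal sketch of how the second link's href might be reached. It assumes the library file is available as simple_html_dom.php next to the script (that path is an assumption) and relies on find() returning an array when called without an index and a single element (or null) when called with one. The snippet is loaded from a string with str_get_html() instead of from $url, and the variable names are only illustrative.

```php
<?php
// Assumption: simple_html_dom.php sits in the same directory as this script.
require_once 'simple_html_dom.php';

// The snippet from the post, loaded from a string rather than a URL.
$markup = <<<'HTML'
<div class="column span-1">
<a href="/archive?m=s&i=37394">
<img width="30" height="30" src="/static/img/left_big_enabled.png" alt="previous item" style="margin-top: 70px;"/>
</a>
<a href="/archive?m=s&i=7837">
<img width="30" height="15" src="/static/img/left_ff_enabled.png" alt="first item" style="margin-top: 40px;"/>
</a>
</div>
HTML;

$html = str_get_html($markup);

// With an index, find() returns one element (or null), so properties
// and methods can be used on the result directly. Index 1 is the second match.
$second = $html->find('div.span-1 a', 1);
$href = $second ? $second->href : null;

// The same link reached from the image: take the second <img> as a single
// element first, then step up to its parent <a>.
$img = $html->find('div.span-1 img', 1);
$hrefViaImg = $img ? $img->parent()->href : null;

echo $href, "\n", $hrefViaImg, "\n";
```

Without the index, find() returns an array, which would explain the non-object error when parent() is called on the result. To inspect a single element, echoing $second->outertext is lighter than print_r(), because each node holds references to its parent and to the whole document, and print_r() follows all of them.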