Thread: Simple HTML DOM can't get children

  1. #1

    Simple HTML DOM can't get children

    I'm trying to use Simple HTML DOM to scrape a page. I'm very impressed with the design and the docs, but it doesn't seem to work for me.

    Here is an HTML snippet from my target page. I am interested in getting the href attribute of the second <a> element.
    HTML Code:
    <div class="column span-1">
      <a href="/archive?m=s&i=37394">
         <img width="30" height="30" src="/static/img/left_big_enabled.png" alt="previous item" style="margin-top: 70px;"/>
      </a>
      <a href="/archive?m=s&i=7837">
         <img width="30" height="15" src="/static/img/left_ff_enabled.png" alt="first item" style="margin-top: 40px;"/>
      </a>
    </div>
    This is the only div with class span-1, by the way.
    Here is my code. These two lines seem to work:

    PHP Code:
    $html = new simple_html_dom();
    $html->load_file($url); 
    But I get stuck when I try to drill down to what I want. First I tried:
    PHP Code:
    $ret = $html->find('img[src=/static/img/left_ff_enabled.png]')->parent();
    but that fails with an error saying I was trying to call the parent method on a non-object.

    This code
    PHP Code:
    foreach ($html->find('div.span-1', 0) as $item) {
        print_r($item);
    }

    (or the same thing without 0 as the second argument of find) outputs pages and pages of repetitive stuff with lots of RECURSION in it, but it never outputs the object I really want, which is the second one, with the href target /archive?m=s&i=7837. I searched the output for 7837 and it doesn't show up. It seems to me that this might be buggy and recursing infinitely, or that the output is simply too big and slow. I have read all the examples and the documentation.

    I tried ->children(1) (zero-based, right?) and that prints nothing.
    I tried ->children(0) and that prints that huge amount of stuff.
    It also seems to be backing UP the tree and pulling in things from completely different branches. This was supposed to make scraping easier, not harder, but maybe I am missing something. What am I doing wrong? Thank you.

  2. #2

    Well, it is solved

    I'm not quite sure how, but I got the original idea, ->parent()->href, to work.

    Maybe there was an invisible typo in the first try, who knows.

  3. #3

    When called without an index, $html->find() always returns an array, even if there is only one element in it. So just take the first element of the array, or pass 0 as the second argument to find() to get the element itself.

    Do something like this:
    PHP Code:
    $images = $html->find('img[src=/static/img/left_ff_enabled.png]');
    $image = $images[0];
    $ret = $image->parent();
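
    To tie this back to the original question, here is a minimal sketch of two ways to reach the second link's href. It assumes simple_html_dom.php is in the same directory as the script (the path is an assumption), and it loads a shortened copy of the snippet from post #1 from a string via str_get_html() rather than from a URL. The variable names are only illustrative. Note also that running foreach over a single element, as in the first post, walks that object's properties (including its parent and child links) instead of a list of matches, which is probably why print_r produced so much RECURSION output.

```php
<?php
// Assumption: the library file sits next to this script; adjust the path as needed.
require_once 'simple_html_dom.php';

// Shortened copy of the snippet from post #1 (style attributes dropped).
$markup = '<div class="column span-1">'
    . '<a href="/archive?m=s&i=37394">'
    . '<img width="30" height="30" src="/static/img/left_big_enabled.png" alt="previous item"/>'
    . '</a>'
    . '<a href="/archive?m=s&i=7837">'
    . '<img width="30" height="15" src="/static/img/left_ff_enabled.png" alt="first item"/>'
    . '</a>'
    . '</div>';

$html = str_get_html($markup);

// Route 1: find the image, then step up to its parent <a>.
// Passing 0 as the second argument returns a single element (or null), not an array.
$image = $html->find('img[src=/static/img/left_ff_enabled.png]', 0);
$hrefViaParent = $image ? $image->parent()->href : null;

// Route 2: find the div, then take its second child element (children() is zero-based).
$div = $html->find('div.span-1', 0);
$second = $div ? $div->children(1) : null;
$hrefViaChild = $second ? $second->href : null;

echo $hrefViaParent, "\n";
echo $hrefViaChild, "\n";

// Release the parsed tree.
$html->clear();
```

    With the snippet above, both routes should end up at /archive?m=s&i=7837. The null checks are there because find() with an index and children() with an index return null when nothing matches, which is another way to run into the "non-object" error.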
