Thread: Save a complete webpage *and the pages it links to*

  1. #1

    I'm a member of another forum, and my PM inbox is nearly full. It runs outdated phpBB software with no "export" option, so keeping these messages would mean saving every single one by hand, and the result would be an unorganized mess.

    I've been wondering if there's a way to save a webpage *and the pages it links to*, so that when I click File > Save it actually saves all of the linked pages too. Of course, followed far enough (each link, then each link on those pages, and so forth), it probably links to every page on the internet. But I could limit it to just one level, or to a single directory, or whatever.

    Another way of phrasing this is whether it's possible to save all files in a certain directory via HTTP rather than FTP, assuming all of the links are known/knowable. (I'm not trying to access hidden files.) Technically, this would include PHP-generated dynamic pages, such as those at a forum.
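
The "one level" or "limited directory" restriction described above can be pictured as a small filter that decides whether a link belongs to the set worth saving. A minimal sketch in PHP, assuming absolute URLs; the function name inScope and the example.com addresses are hypothetical and only there for illustration:

```php
<?php
// Hypothetical helper: returns true when $url is on the same host as
// $baseUrl and lives inside the directory that contains $baseUrl.
function inScope(string $url, string $baseUrl): bool
{
    $u = parse_url($url);
    $b = parse_url($baseUrl);
    if ($u === false || $b === false || !isset($u['host'], $b['host'])) {
        return false; // relative or malformed URLs are rejected in this sketch
    }
    if (strcasecmp($u['host'], $b['host']) !== 0) {
        return false; // different site
    }
    // Directory of the base URL, e.g. "/forum/" for "/forum/index.php".
    $basePath = $b['path'] ?? '/';
    $baseDir  = substr($basePath, 0, strrpos($basePath, '/') + 1);
    $path     = $u['path'] ?? '/';
    return strncmp($path, $baseDir, strlen($baseDir)) === 0;
}
```

Only links that pass this check would be followed, which keeps the crawl from wandering off to the rest of the internet.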

    Of course this could potentially be used for bad purposes (stealing an entire forum), but let's focus on the good uses.

    I know I could write this myself in PHP, but it would be a pain, and I'd probably be better off just saving all of the messages manually.

    Any ideas?
    Daniel - Freelance Web Design | <?php?> | <html>| Deutsch | italiano | español | português | català | un peu de français | Ninasoma Kiswahili | 日本語の学生でした。| درست العربية

  2. #2

    I think you can set something like that up on a server using PHP and an IE browser, but it's complicated.

    The usual approach: if your server is Windows-based, it should have IE installed (or you can install it). You can then use PHP to drop to the OS and run the browser, navigate to the links one by one by 'reading' them off the page, and capture each page to a file as you go. You could even concatenate everything into one file.

    But it occurs to me while typing this that it might be easier to get the links by running file_get_contents on the main index page for your mailbox and parsing the result for the links you're interested in. Then loop over those, calling file_get_contents on each and appending the output to a single file, something like this (for inside the loop):

    PHP Code:
    file_put_contents('mymail.txt', file_get_contents('protocol://domain.com/path/thispage.ext'), FILE_APPEND);
    You could replace file_get_contents('protocol://domain.com/path/thispage.ext') in the above with a subroutine to get just the text part of the message if you like.
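
The parse-then-loop idea above might be sketched roughly as follows. This is only one possible shape, not a tested mailbox exporter: the function names extractLinks and appendPages are made up for illustration, the 'mode=read' filter in the usage note is an assumption about how the message links look, and relative links would still need to be resolved against the mailbox URL before fetching.

```php
<?php
// Collect the href of every <a> element whose href contains $needle.
function extractLinks(string $html, string $needle = ''): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // real-world forum HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if ($href !== '' && ($needle === '' || strpos($href, $needle) !== false)) {
            $links[] = $href;
        }
    }
    return array_values(array_unique($links));
}

// Fetch each URL and append its contents to one file.
// $fetch is injectable so the loop can be tried without network access;
// by default it is plain file_get_contents.
function appendPages(array $urls, string $outFile, ?callable $fetch = null): int
{
    $fetch = $fetch ?? 'file_get_contents';
    $saved = 0;
    foreach ($urls as $url) {
        $page = $fetch($url);
        if ($page === false) {
            continue; // skip pages that could not be fetched
        }
        file_put_contents($outFile, $page . "\n", FILE_APPEND);
        $saved++;
    }
    return $saved;
}
```

Usage would be along the lines of calling extractLinks() on the mailbox index HTML with a filter such as 'mode=read', then passing the resulting list to appendPages() with 'mymail.txt' as the output file.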

    Just a general idea.
    - John

  4. #3

    I could write it myself (using file_get_contents(), or something along those lines), but this would be somewhat complicated because:
    1. I'd rather just have a program that does it for me, if such a program exists. I've found a couple on Google that claim to do things like that, although I'm not sure how well they work. I'm still looking around.
    2. This particular case involves cookies (I need to be logged in), so that makes using PHP a little trickier. Not impossible, just trickier.
    3. I'd like to have the links automatically set. It wouldn't be too hard to just loop through and save everything, but I'd like to have, for example, the timestamps from the private messages still useful for organizing them.
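
For what it's worth, points 2 and 3 could be approached in plain PHP roughly like this. It is a sketch under assumptions, not a finished tool: the function names are hypothetical, the cookie values are assumed to be copied as-is from a logged-in browser session, and setting each file's modification time is just one guess at what keeping the timestamps "useful" might mean.

```php
<?php
// Build a "Cookie:" request header from name => value pairs.
// Values are used verbatim (assumed to be copied from the browser).
function cookieHeader(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return 'Cookie: ' . implode('; ', $pairs) . "\r\n";
}

// Fetch a page while presenting the given cookies, i.e. as a logged-in user.
// Returns the page body, or false on failure (same as file_get_contents).
function fetchWithCookies(string $url, array $cookies)
{
    $context = stream_context_create([
        'http' => ['method' => 'GET', 'header' => cookieHeader($cookies)],
    ]);
    return file_get_contents($url, false, $context);
}

// Save one message and set the file's modification time to the message's
// own date, so a file manager can sort the saved messages by it.
// If the date text cannot be parsed, the file is still saved.
function saveMessage(string $path, string $content, string $dateText): bool
{
    if (file_put_contents($path, $content) === false) {
        return false;
    }
    $time = strtotime($dateText);
    return $time === false ? true : touch($path, $time);
}
```

The date text would have to be pulled out of each message page first, and forum date formats do not always parse cleanly with strtotime(), so that part may need adjusting.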

  5. #4

    The Firefox add-on DownThemAll could be something like what you want. It will download the links and images on a webpage...

    http://www.downthemall.net/

    I wonder if it could help you somehow Daniel?
    Thanks,

    Bud
