Thread: Save a complete webpage *and the pages it links to*

  1. #1
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,164
    Thanks
    265
    Thanked 690 Times in 678 Posts

    I'm a member of another forum, and my PM inbox is nearly full. It uses outdated phpBB software with no "export" option, and saving these messages would require manually saving every single one, and the result would be an unorganized mess.

    I've been wondering if there's a way to save a webpage *and the pages it links to*, so that when I click "file>save" it actually saves all of those links. Of course technically it probably links to all of the pages on the internet, if we follow each link, then each link at that link, and so forth. But I could either set it to just one-level, or to a limited directory, or whatever.

    Another way of phrasing this is whether it's possible to save all files in a certain directory via HTTP rather than FTP, assuming all of the links are known/knowable. (I'm not trying to access hidden files.) Technically, this would include PHP-generated dynamic pages, such as those at a forum.
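
    The "one level" or "limited directory" idea above can be sketched as a breadth-first walk with a depth limit and a URL prefix check. This is only an illustration: crawl(), its parameters, and the example.com URLs are hypothetical, and the link lookup is passed in as a callable so the walk itself stays independent of how pages are downloaded.

```php
<?php
// Sketch only: crawl() and its parameters are hypothetical names.

/**
 * Collect URLs reachable from $start, following links at most $maxDepth
 * levels deep and only when they begin with $prefix.
 *
 * $getLinks is any callable that takes a URL and returns the absolute
 * URLs linked from that page (for example by downloading and parsing it).
 */
function crawl(string $start, string $prefix, int $maxDepth, callable $getLinks): array
{
    $seen  = [$start => 0];   // URL => depth at which it was first found
    $queue = [$start];

    while ($queue) {
        $url   = array_shift($queue);
        $depth = $seen[$url];
        if ($depth >= $maxDepth) {
            continue;         // do not follow links beyond the depth limit
        }
        foreach ($getLinks($url) as $link) {
            if (strncmp($link, $prefix, strlen($prefix)) !== 0) {
                continue;     // outside the allowed directory
            }
            if (!isset($seen[$link])) {
                $seen[$link] = $depth + 1;
                $queue[] = $link;
            }
        }
    }
    return array_keys($seen);
}

// Offline demonstration with a made-up link graph instead of real downloads.
$graph = [
    'http://example.com/pm/'  => ['http://example.com/pm/1', 'http://example.com/other/'],
    'http://example.com/pm/1' => ['http://example.com/pm/2'],
];
$lookup = function ($url) use ($graph) {
    return $graph[$url] ?? [];
};
$oneLevel = crawl('http://example.com/pm/', 'http://example.com/pm/', 1, $lookup);
```

    With a depth of 1 only the start page and the pages it links to directly are collected, and the prefix check keeps the walk from wandering off to the rest of the internet.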

    Of course this could potentially be used for bad purposes (stealing an entire forum), but let's focus on the good uses.

    I know I could write this myself in PHP, but it would be a pain, and I'd probably be better off just saving all of the messages manually.

    Any ideas?
    Daniel - Freelance Web Design | <?php?> | <html>| español | Deutsch | italiano | português | català | un peu de français | some knowledge of several other languages: I can sometimes help translate here on DD | Linguistics Forum

  2. #2
    Join Date
    Mar 2005
    Location
    SE PA USA
    Posts
    29,130
    Thanks
    44
    Thanked 3,228 Times in 3,189 Posts
    Blog Entries
    12

    I think you can set something like that up on a server using PHP and an IE browser, but it's complicated.

    If your server is Windows based, it should have IE installed (or you can install it). You can then use PHP to drop to the OS and run the browser, navigating to the links one by one by 'reading' them off the page and capturing each page to a file as you go. You could even concatenate them into one file.

    But it occurs to me while typing this that it might be easier to run file_get_contents on the main index page of your mailbox and parse the result for the links you're interested in. Then loop over those links, calling file_get_contents on each and appending the output to a single file, something like this (inside the loop):

    PHP Code:
    file_put_contents('mymail.txt', file_get_contents('protocol://domain.com/path/thispage.ext'), FILE_APPEND);
    You could replace file_get_contents('protocol://domain.com/path/thispage.ext') in the above with a subroutine to get just the text part of the message if you like.
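
    The "parse the index page for links" step might look something like the sketch below. It is an assumption-laden illustration, not tested against any real forum: extractLinks() is a hypothetical helper, and the 'privmsg.php' pattern is only a guess at what a message link might contain.

```php
<?php
// Sketch only: extractLinks() is a hypothetical helper.

/**
 * Return the href of every <a> element in $html whose href contains $needle.
 * Relative URLs are returned as written; resolving them against the page
 * address is left out of this sketch.
 */
function extractLinks(string $html, string $needle): array
{
    if ($html === '') {
        return [];            // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Forum markup is rarely valid HTML, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if ($href !== '' && strpos($href, $needle) !== false) {
            $links[] = $href;
        }
    }
    return array_values(array_unique($links));   // drop repeated links
}

// Possible use with the loop described above (not run here; URL is a placeholder):
// $index = file_get_contents('protocol://domain.com/path/index.ext');
// foreach (extractLinks($index, 'privmsg.php') as $url) {
//     file_put_contents('mymail.txt', file_get_contents($url), FILE_APPEND);
// }
```

    Using DOMDocument rather than a regular expression is a design choice here: it copes better with attributes in odd orders or unusual quoting.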

    Just a general idea.
    - John

  4. #3
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,164
    Thanks
    265
    Thanked 690 Times in 678 Posts

    I could write it myself (using file_get_contents(), or something along those lines), but this would be somewhat complicated because:
    1. I'd rather just have a program that does it for me (if such a program exists-- I've found a couple that claim to do things like that on Google, although I'm not sure how well they work. I'm still looking around).
    2. This particular case involves cookies (I need to be logged in), so that makes using PHP a little trickier. Not impossible, just trickier.
    3. I'd like to have the links automatically set. It wouldn't be too hard to just loop through and save everything, but I'd like to have, for example, the timestamps from the private messages still useful for organizing them.
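
    On the cookie point (2), one possible approach is to copy the session cookie from a logged-in browser and send it along with each request through a stream context. This is a sketch under that assumption: cookieContext() and the cookie name are hypothetical, and whether a given forum accepts a copied session this way is not something the thread establishes.

```php
<?php
// Sketch only: cookieContext() is a hypothetical helper and the cookie
// name and value below are placeholders.

/**
 * Build a stream context that sends the given cookies with each HTTP
 * request, so file_get_contents() can ask for pages as a logged-in user.
 * Values are sent exactly as given (copy them from the browser as-is).
 */
function cookieContext(array $cookies)
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return stream_context_create([
        'http' => [
            'method' => 'GET',
            'header' => 'Cookie: ' . implode('; ', $pairs) . "\r\n",
        ],
    ]);
}

$context = cookieContext(['example_sid' => 'abc123']);

// The context is passed as the third argument (not run here):
// $page = file_get_contents($url, false, $context);
```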

  5. #4
    Join Date
    Jan 2011
    Location
    Southeastern CT
    Posts
    596
    Thanks
    43
    Thanked 28 Times in 28 Posts

    The Firefox add-on "DownThemAll!" could be something like what you want. It will download the links on a webpage, and images...

    http://www.downthemall.net/

    I wonder if it could help you somehow Daniel?
    Thanks,

    Bud

  7. #5
    Join Date
    Jan 2007
    Location
    Davenport, Iowa
    Posts
    1,740
    Thanks
    82
    Thanked 90 Times in 88 Posts

    djr33, are you referring to the addon Scrapbook for Firefox?

    My internet used to go out frequently at my last residence so I used it to download a website or two so that I could get my internet fix when the internet went out for a day or two. I have not used it for a while since the internet has been better at my current location.
    To choose the lesser of two evils is still to choose evil. My personal site

  9. #6
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,164
    Thanks
    265
    Thanked 690 Times in 678 Posts

    ajfmrf, that's almost perfect. I actually already had that for other reasons (to manage downloads), but I hadn't realized there was an option like it. Basically, it automates the saving-each-page process, which is close to what I want. The only problem is that it doesn't update all the hyperlinks so that the pages are linked together. That would be the most desirable feature, although for the moment that's sufficient to back up the information. (There's also a small problem that it only downloads the actual .html files, not external .css and so forth, so the formatting is a bit off. But in this case it doesn't matter too much for me.)
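
    The missing "update all the hyperlinks" step could in principle be done as a post-processing pass over the saved files. The sketch below assumes you already have a map from each original URL to the local file it was saved as; rewriteLinks() and the file names are hypothetical, and this is not a feature of the add-on.

```php
<?php
// Sketch only: rewriteLinks() and the file names are hypothetical.

/**
 * Replace each <a href> found in $map (original URL => local file name)
 * so that saved pages point at each other instead of at the live site.
 * Links that are not in the map are left untouched.
 */
function rewriteLinks(string $html, array $map): string
{
    if ($html === '') {
        return '';            // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);   // tolerate messy markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if (isset($map[$href])) {
            $a->setAttribute('href', $map[$href]);
        }
    }
    // saveHTML() returns the whole document, including <html> and <body>.
    return (string) $doc->saveHTML();
}

$saved = rewriteLinks(
    '<p><a href="privmsg.php?p=1">Message 1</a> <a href="index.php">Home</a></p>',
    ['privmsg.php?p=1' => 'msg_1.html']
);
```

    One caveat: loadHTML() guesses the character encoding when the page does not declare one, so non-ASCII text may need extra care.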

    James, I'll have to check that out. It sounds like a good option.

  10. #7
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,164
    Thanks
    265
    Thanked 690 Times in 678 Posts

    James, that's really cool. It works well. But in this case I must be logged in to view my PMs, and the repeated automated requests appear to cause the forum to log me out automatically (not sure why-- maybe to prevent what I'm trying to do, or maybe "security"). If I can work around that, I'll be happy with it. If not, it'll be useful for other (non-login) things.
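
    If the logout really is triggered by rapid automated requests, which is only a guess, one thing to try in a hand-written script is pausing between requests and stopping as soon as a response looks like the login page. The sketch below is untested against any real forum: fetchSlowly(), the delay, and the login marker text are all hypothetical.

```php
<?php
// Sketch only: fetchSlowly() is hypothetical, and it is an unverified guess
// that slower requests would avoid the logout described above.

/**
 * Fetch each URL with $fetch (a callable taking a URL and returning the
 * page body, or false on failure), pausing $delaySeconds between requests.
 * Stops early if a response contains $loginMarker, which suggests the
 * session has been dropped. Returns the pages fetched so far (URL => body).
 */
function fetchSlowly(array $urls, callable $fetch, float $delaySeconds, string $loginMarker): array
{
    $pages = [];
    $first = true;
    foreach ($urls as $url) {
        if (!$first && $delaySeconds > 0) {
            usleep((int) round($delaySeconds * 1000000));
        }
        $first = false;

        $body = $fetch($url);
        if ($body === false) {
            break;            // request failed
        }
        if ($loginMarker !== '' && strpos($body, $loginMarker) !== false) {
            break;            // looks like the login page: stop here
        }
        $pages[$url] = $body;
    }
    return $pages;
}

// Possible use (not run here):
// $pages = fetchSlowly($urls, 'file_get_contents', 2.0, 'Please log in');
```

    Returning the pages collected before the break means a partial run is still usable as a backup.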

  11. #8
    Join Date
    Jun 2013
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Please let me know how you get on with this.

    I was trying to download a site to my hard drive, but ran into the same problem you did, djr33. I used the DownThemAll! Firefox plugin; however, when I checked the saved .html files, it was obviously not keeping the session, i.e. it was logging me out.

  12. #9
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,164
    Thanks
    265
    Thanked 690 Times in 678 Posts

    No update. I didn't figure out any way around the login problem, but for the moment my PM box on the other forum isn't full any more, so it's not a concern. I'm still casually looking for a solution, but mostly working on other projects. I'll reply here if I happen to come across anything though.

  13. #10
    Join Date
    Jun 2013
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Quote: Originally posted by djr33
    No update. I didn't figure out any way around the login problem, but for the moment my PM box on the other forum isn't full any more, so it's not a concern. I'm still casually looking for a solution, but mostly working on other projects. I'll reply here if I happen to come across anything though.

    LOL yeah, I just figured what the hell as well. I think creating a script is a bit extreme, as there must be a simple way to do this that's already been done.
