
Thread: Rip text from website

  1. #11
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,164
    Thanks
    265
    Thanked 690 Times in 678 Posts

    That's a good start and there are a lot of options as John says.

    One thing I can add is that to use file_get_contents() on a remote page, or any other function that goes through PHP's URL wrappers (fopen(), file() and so on), your server has to be set to allow external URLs. If that setting is disabled, these functions will only work for local files. If you have problems with it, check this. If it works, then your server is already configured to handle it.

    The setting is called allow_url_fopen and lives in php.ini. There might be some more ways to configure all of it, but generally speaking that should be enough info for now.
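
    A minimal sketch of the check described above, assuming PHP 7.1 or later. The function names can_fetch_remote_urls() and fetch_page() are made up for illustration, and any URL passed in is up to you; nothing here is specific to the station's site.

```php
<?php
// Sketch only: check allow_url_fopen before trying to read a remote page.
// Both function names are hypothetical helpers, not part of PHP itself.

function can_fetch_remote_urls(): bool
{
    // ini_get() returns the setting as a string ("1", "0", "On", "") or false.
    $value = ini_get('allow_url_fopen');
    return filter_var($value, FILTER_VALIDATE_BOOLEAN);
}

function fetch_page(string $url): ?string
{
    if (!can_fetch_remote_urls()) {
        return null; // remote URLs are disabled in php.ini
    }
    // @ hides the warning on failure; the false return value is checked instead.
    $html = @file_get_contents($url);
    return $html === false ? null : $html;
}
```

    Calling fetch_page() with the page address returns the HTML as a string, or null if the setting is off or the request fails. Note that allow_url_fopen generally cannot be switched on from inside a script, so if it is off you would need to change php.ini or ask your host.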


    Also, some advice about the nature of your request: while your purposes may be entirely legal, the method you are using could potentially be against the website's terms of service. I don't personally care that much what you do, but the station you are taking this data from certainly will. They probably want to protect this data just as much as you want to keep it. The issue is that you aren't "listening to the radio" but are instead automating a process in an unusual way. It is normal to listen to the radio. It is not normal to create an automated script that just checks the playlist and skips actually listening to the music. This means that your behavior (or rather, your server's behavior) may get flagged by their server as an automated request, and they may block you. That depends entirely on how obviously your server's behavior differs from a regular visitor's, and on how closely they watch the activity on their server. Furthermore, they could simply check whether the IP of the request belongs to a server, in which case it is probably an automated request, rather than to an IP with no website attached to it, which is probably a regular user.
    Do what you'd like, but be aware that the legal situation is complex, not just in what you do, but also in how you do it. And regardless of legality, a host can block you if they feel like it, and if that happens there's nothing you can do about it (as far as I know, there are no legal rights associated with being able to access a domain).
    Of course, one simple option is to just ask them to work with you and give you access to their playlists (either through this method or another); perhaps you could offer something in trade. They might say no, but if they agree you'd at least know for sure that you aren't going to randomly get blocked, or even sued (you don't need to be doing anything illegal to get sued, only to be proven guilty). I imagine that without permission this could eventually lead to a cease and desist letter, which basically means "stop or we'll try something else". At that point, of course, you could look for another solution, but getting their cooperation after that is unlikely, I'd guess.
    Daniel - Freelance Web Design | <?php?> | <html>| español | Deutsch | italiano | português | català | un peu de français | some knowledge of several other languages: I can sometimes help translate here on DD | Linguistics Forum

  2. #12
    Join Date
    Mar 2005
    Location
    SE PA USA
    Posts
    30,495
    Thanks
    82
    Thanked 3,449 Times in 3,410 Posts
    Blog Entries
    12

    I didn't think of that:

    Quote Originally Posted by Marcymarc View Post
    it would need to be set to run every 5 minutes for example or any numeral that I specify.
    Even every 5 minutes could possibly be construed as an assault on their bandwidth. Do it much more frequently and it almost certainly would be. If they discovered it, and that wouldn't be too hard for them to do, they would have to block it.
    - John

  3. #13
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,164
    Thanks
    265
    Thanked 690 Times in 678 Posts

    Yes, and if you do it that often, that dramatically increases the odds that your particular connection will be noticed by the host and that they will block you. As I said, regardless of legality, they can block you if they want, so the only way to stop that from happening is to "play nice" (and, if you choose to, ask permission).
    Plus, does the playlist actually change every 5 minutes? There must be a more efficient way to approach this.
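
    One way to "play nice" is to keep a local copy and only go back to their server when that copy is old. The sketch below assumes PHP 7.1 or later; get_cached_page(), the cache file path and the $fetch callback are all invented for illustration, and $fetch stands in for whatever actually downloads the page.

```php
<?php
// Sketch only: re-fetch the page when the local copy is older than $maxAge seconds.
// $fetch is any callable returning the page body as a string, or null on failure.

function get_cached_page(string $cacheFile, int $maxAge, callable $fetch): ?string
{
    // Make sure the file checks below see the current state of the file.
    clearstatcache(true, $cacheFile);

    // Serve the local copy while it is still fresh.
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
        $cached = file_get_contents($cacheFile);
        if ($cached !== false) {
            return $cached;
        }
    }

    // Otherwise make a single request and store the result for next time.
    $body = $fetch();
    if ($body !== null) {
        file_put_contents($cacheFile, $body, LOCK_EX);
    }
    return $body;
}
```

    With $maxAge set to 300, no matter how often the script runs, the remote site sees at most one request every five minutes. A longer value makes sense if the playlist changes less often than that.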

  4. #14
    Join Date
    Sep 2010
    Posts
    12
    Thanks
    1
    Thanked 0 Times in 0 Posts

    I won't be running this script all the time, maybe for about 3 hours here and 2 hours there. It could be set to run every 5-10 minutes; I would need to be able to change this setting.

    If I run the script that John posted, it captures the whole page, all of it, but not the songs. Where the songs should be, it only captures the markup, such as id="LastPlayedArtist-1", id="LastPlayedArtist-2", id="LastPlayedArtist-3" and so on. It is the same as what you see when you open the page in a browser and view source.

    So all I want is just the actual songs, without all the other HTML code around them. The PHP would have to read the page in such a way that it outputs only the songs.

    If you can please put a sample script together, that would be very helpful.
    Last edited by Marcymarc; 10-11-2010 at 07:18 AM.

  5. #15
    Join Date
    Mar 2005
    Location
    SE PA USA
    Posts
    30,495
    Thanks
    82
    Thanked 3,449 Times in 3,410 Posts
    Blog Entries
    12

    That's because those values are generated by JavaScript. They're not part of the source code of the page, so a script that only downloads the page source never sees them. To capture something like that in an automated fashion, this (requires Excel and VBA):

    http://www.associatedcontent.com/art...ba.html?cat=55

    looks promising.
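
    For pages where the text really is in the downloaded markup, picking out elements by id can be sketched in PHP as below. This assumes the DOM extension is available; extract_by_id_prefix() is a made-up name, and the "LastPlayedArtist-" prefix is taken from the ids quoted earlier in the thread. On a page like this one, where JavaScript fills the values in later, the same code would only return empty strings.

```php
<?php
// Sketch only: collect the text of every element whose id starts with $prefix.

function extract_by_id_prefix(string $html, string $prefix): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // The prefix is inserted as-is, so it must not contain a double quote.
    $nodes = $xpath->query('//*[starts-with(@id, "' . $prefix . '")]');

    $results = [];
    foreach ($nodes as $node) {
        // Key by id so each value can be matched back to its element.
        $results[$node->getAttribute('id')] = trim($node->textContent);
    }
    return $results;
}
```

    Passing the fetched HTML and 'LastPlayedArtist-' returns an array keyed by element id, in the order the elements appear on the page.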
    - John
