RSS Display Box



siluet
03-26-2011, 01:19 PM
If your thumbnail is somewhere else in the xml structure of the feed (other than in the description tag), we need to know where and find a way to retrieve it. If it comes to that, you could give us a link to the feed so we could check it out.

Note: The current simplepie.inc (it's been updated several times since this script came out) allows you to retrieve the full description without needing to be edited. But to switch to it, both outputbody.php and main.php would need to be edited because some of the function names have changed, as has the way that cache times are calculated. And it might be a bit more complicated, as the updated simplepie.inc does a better job at filtering out images, so part of that might have to be dealt with in configuring the feed with it as well.

I want to retrieve and display a full article (full-text content, images). Which parts of the code need to be tweaked for this purpose?

jscheuer1
03-26-2011, 02:15 PM
Did you try the advice from that thread?

siluet
03-26-2011, 05:47 PM
No, I didn't, as that tweak is for retrieving a description, whereas I need to retrieve the full-text content (the full article). As far as I understand, the description is only one part of the XML structure, not the full content.

Thanks.

jscheuer1
03-26-2011, 07:10 PM
I'd say try it. If it doesn't get you what you want, it should at least be a jumping off point.

Feeds vary. Is that the feed you want to use?

siluet
03-26-2011, 08:59 PM
I have tried the tweak from that thread, but it's not working for me; I didn't see any changes in the feed.

I thought of it as a universal approach, but if it's feed-specific, then take these feeds, for example:
http://rss.kenwood.co.jp/f6761/rss.xml
www.dpreview.com/feeds/latest.xml

jscheuer1
03-27-2011, 02:51 AM
This script cannot pull anything from the feed that isn't there. The Kenwood feed is very minimalistic. Here's a typical item in its raw xml format:


<item>
<title>Information on the Kenwood Booth at IWCE 2011</title>
<link>http://rss.kenwood.co.jp/item_140140_2670610_6761.html</link>
<description></description>
<pubDate>Mon, 07 Mar 2011 11:33:24 +0900</pubDate>
</item>

Notice, no description. So all the script can get on that is the title, link and date.

The Digital Photography Review feed is similar:


<item>
<title>Nikon Coolpix L120</title>
<pubDate>Wed, 09 Feb 2011 01:00:00 GMT</pubDate>
<link>http://www.dpreview.com/news/1102/11020909nikonsuperzooms.asp</link>
<guid>http://www.dpreview.com/news/1102/11020909nikonsuperzooms.asp</guid>
<description>Also: &lt;a href="/reviews/specs/Nikon/nikon_cpl120.asp"&gt;Specifications&lt;/a&gt; and &lt;a href="/products/shop/nikon_cpl120"&gt;Prices&lt;/a&gt;.</description>
</item>

It contains just a little more information per item.
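To see for yourself what a feed item actually carries, a rough sketch like the one below can help. It runs PHP's SimpleXML over a copy of the Kenwood item shown above. The helper name available_fields() is made up for illustration and is not part of the RSS Display Box script or SimplePie.

```php
<?php
// Hypothetical helper: list which common RSS item fields are non-empty.
function available_fields(SimpleXMLElement $item)
{
    $found = [];
    foreach (['title', 'link', 'description', 'pubDate'] as $name) {
        // Casting a missing or empty element to string yields ''.
        if (trim((string) $item->$name) !== '') {
            $found[] = $name;
        }
    }
    return $found;
}

// Copy of the raw Kenwood item quoted in this thread.
$xml = <<<XML
<item>
<title>Information on the Kenwood Booth at IWCE 2011</title>
<link>http://rss.kenwood.co.jp/item_140140_2670610_6761.html</link>
<description></description>
<pubDate>Mon, 07 Mar 2011 11:33:24 +0900</pubDate>
</item>
XML;

$item = simplexml_load_string($xml);
echo implode(', ', available_fields($item)), "\n"; // prints: title, link, pubDate
```

The empty description element is skipped, which is why no script can show a description for this feed.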

Because I caught your second-to-last post before you edited it, I think I know what you're after: something like what feedex.net can give you. The feedex version of the same Kenwood item looks like this (portions wrapped):


<item><title>Information on the Kenwood Booth at IWCE 2011</title><link>http://www.kenwood.co.jp/en/news/2011/20110307_01.html</link><description>&lt;h1&gt;
Information on the Kenwood Booth at IWCE 2011&lt;br&gt;
Offering total wireless communications systems based on the theme of&lt;br&gt;
“Digital Systems &amp;amp; Multimedia Solutions.”
&lt;/h1&gt;
&lt;div&gt;
&lt;p&gt;
&lt;strong&gt;Kanagawa, Japan, March 7, 2011 —&lt;/strong&gt; Kenwood Corporation,
an operating company of the JVC Kenwood Group, will exhibit at this year’s International Wireless
Communications Expo (IWCE), the world’s largest exposition and trade fair for radio communication
equipment and systems, to be held March 9-11 in Las Vegas. Visitors to Kenwood’s booth will be able
to view total wireless communications systems optimized for a variety of applications ranging from the
business &amp;amp; industry market to the public safety market. . . .

And it goes on and on. The way they get that information is by sending a spider to crawl the link for each item and then retrieve the data from those links and place it into the description field for each item.

The RSS Display Box script doesn't do that. It could parse the feedex version of the feed though.

However, the feedex terms of service are such that it may only be used for personal/non-commercial purposes unless one is willing to pay a minimal fee. But your responsibility doesn't stop there: you also have to secure permission from the primary feed site to use its extended content.

It's possible that one could set up one's own service like feedex to crawl the link from each item and retrieve its data. That's beyond the scope of the RSS Display Box script, and probably of this forum. There would also be copyright considerations as to how much of the information from the feed site one can display. This would be true of a feedex feed as well if used for commercial or even just non-personal purposes.

If you want to try to develop a feedex type service with the help of folks in this forum, you can ask about it in the PHP section of this board. But I think that is, as I said, a bit beyond the scope of this forum.

If you're willing to do most of the work though, you might get some help with it in the PHP section.

siluet
03-27-2011, 11:48 AM
Yes, I currently use the feedex.net service to get expanded content (the free service is limited to 5 items per feed). As far as I know, some such services use the SimplePie engine to parse the feed. First of all, I have no intention of creating my own online service like feedex.net; it's only for my personal needs, on a personal web site.
I just wanted to adjust the SimplePie script to get full-length content without having to use an external service. There are no legal issues with using the primary feed site's extended content.

So, SimplePie can retrieve extended content only if it is present inside the <description></description> or <longdesc></longdesc> tags in the original xml feed? It neither searches nor parses the source URL inside the <link></link> tags?

jscheuer1
03-27-2011, 02:05 PM
What feedex is doing is something like (in PHP):


$requested_page = file_get_contents($url);

Where $url is the item link. They then take just the body, strip out any remaining scripts and style, detect and convert to the proper encoding if necessary, convert links and src attributes that would be broken into either absolute paths or (in the case of certain links) links back to the extended feed (which depends upon some tests of these links), parse the remaining code into xml friendly tags and append that to the <description> tag in the new extended feed. Some special attention may be paid to image tags to ensure the images aren't beyond certain dimension limits. Other tests and tweaks of the content so gathered are done. I noticed with Dynamic Drive's feed, feedex stripped out ads and headers from the linked pages.
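A few of the steps just described (keep only the body, drop scripts and styles, make root-relative links absolute) might be sketched with PHP's DOMDocument as follows. This is only a guess at the general approach, not feedex's actual code; extract_body() and the $base parameter are invented names, and a hard-coded HTML string stands in for the result of file_get_contents($url) so the sketch runs without a network connection.

```php
<?php
// Rough sketch: return the contents of <body> with scripts/styles removed
// and root-relative href/src values made absolute against $base.
function extract_body($html, $base)
{
    $doc = new DOMDocument();
    // Suppress warnings from real-world, not-quite-valid markup.
    @$doc->loadHTML($html);

    // Copy each live node list to an array first; removing nodes while
    // iterating the live list would skip elements.
    foreach (['script', 'style'] as $tag) {
        foreach (iterator_to_array($doc->getElementsByTagName($tag)) as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    // Prefix root-relative links and image sources with the site base.
    foreach (['a' => 'href', 'img' => 'src'] as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $el) {
            $value = $el->getAttribute($attr);
            if (strpos($value, '/') === 0 && strpos($value, '//') !== 0) {
                $el->setAttribute($attr, rtrim($base, '/') . $value);
            }
        }
    }

    // Serialize only the children of <body>.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}

// Stand-in for: $html = file_get_contents($url);
$html = '<html><head><style>p{color:red}</style></head><body>'
      . '<script>alert(1)</script><p><a href="/reviews/x.asp">Specs</a></p>'
      . '</body></html>';
$result = extract_body($html, 'http://www.dpreview.com');
echo $result, "\n";
```

Encoding detection, ad stripping and image size checks are left out entirely; those are where most of the real work would be.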

You cannot even do this if your server doesn't allow it; that's a security setting. It can be blocked by the remote site as well, if they so choose. Feedex might have a way around that.
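On the server side, the setting that governs whether file_get_contents() may open a URL is, as far as I know, PHP's allow_url_fopen directive. A small check along these lines would tell you where you stand; can_fetch_remote() is just an illustrative name.

```php
<?php
// Returns true when PHP is configured to let file_get_contents() open URLs.
function can_fetch_remote()
{
    // ini_get() returns the value as a string such as "1", "0", "On" or "",
    // so normalize it to a real boolean.
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

if (can_fetch_remote()) {
    echo "file_get_contents() may fetch remote URLs on this server.\n";
} else {
    echo "allow_url_fopen is off; remote fetching this way is blocked.\n";
}
```

This says nothing about blocking by the remote site, which can only be found out by trying the request.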

So, to make a long story short:

Yes. SimplePie can retrieve extended content only if it's present inside the feed.

And when you say it's a personal site, that doesn't mean that you may use another site's content without their permission. Are you the only person who views this personal site? If not, showing their content without their permission could be seen as a violation of their copyright.

By personal non-commercial use, feedex means only you get to see the results. And that you aren't using that information for any kind of commercial purpose, like if you were a competitor of the feed site and wanted to grab certain key bits of information from them in an automated fashion.

Also, you have no right to republish all or most of another site's page unless they give you permission. It doesn't matter how small your audience is. Unless it is very very small and you can guarantee that it will remain very very small, you could get into trouble.

The only exception to that I know of is for a critical review. Even that has limitations as to how much you can garner and present.

siluet
03-27-2011, 10:16 PM
1. It seems that, if the full content is present inside the feed, we can retrieve it without changing the simplepie.inc file, by adding a custom template in outputbody.php, which defines the body output?

else if ($template=="Custom"){
?>
<div class="rsscontainer">
<div class="rsstitle"><a href="<?php echo $item->get_permalink(); ?>"><?php echo $item->get_title(); ?></a></div>
<div class="rssdate"><?php echo $item->get_date('d M Y g:i a'); ?></div>
<div class="rssdescription"><?php echo $item->get_description(); ?></div>
<div class="rsscontent"><?php echo $item->get_content(); ?></div>
</div>
<?php
}
Correct? If the full content does not exist inside the feed, will the template just return all the data before the "rsscontent" section?

2. What's the correct syntax for the JavaScript code on the html page when we want to fetch all items and display 5? Should we use zero?
showbbc.set_items_shown(0, 5) or
showbbc.set_items_shown(5)?

3. Is there another way to display the feed on a page that makes it search-engine friendly, by using PHP to return the content directly rather than via JavaScript code?

jscheuer1
03-28-2011, 01:48 PM
1. No, there's no get_content() function for items in the version (1.0 b3.2) of simplepie.inc used by this script. If you get the date, title, link and description, you pretty much have it all. Many feeds only have these. Some don't even have all of them. If the feed has other stuff, you can get that if simplepie has a way to, or if you use another feed parser that does. Apparently all versions from 1.0 on have get_content(), so you could update. But as I said before, that would necessitate changes to other files used by this script. And get_content() still doesn't get you anything that isn't there: it will use the description if nothing more elaborate is available. In the case of the Kenwood feed, where there's nothing like that, not even a description, it would return nothing.


2. Use showbbc.set_items_shown(0, 5). For more info, see:

http://www.dynamicdrive.com/dynamicindex18/rssdisplaybox/rssdisplaybox_ref.htm


3. You can, but feeds are inherently not SEO friendly. They change too often. If you want to construct such a thing, see:

http://simplepie.org/wiki/setup/sample_page

and:

http://simplepie.org/wiki/

Both use the most recent version of simplepie.

Or, if you hunt around, there may be something like that already. But there wouldn't be the kind of real-time updates and pagination that JavaScript affords.
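To make two of the answers above concrete (the fallback from full content to description, and outputting the feed with PHP rather than JavaScript), here is a bare-bones sketch that uses plain SimpleXML instead of SimplePie. The functions item_body() and render_items() and the example.com feed are invented for illustration. It assumes the feed puts its full text in a content:encoded element, which many but not all feeds do, and it does no sanitizing of the feed's HTML, which a real page should not skip.

```php
<?php
// Prefer <content:encoded>, else <description>, else an empty string.
function item_body(SimpleXMLElement $item)
{
    $content = $item->children('http://purl.org/rss/1.0/modules/content/');
    $full = trim((string) $content->encoded);
    return $full !== '' ? $full : trim((string) $item->description);
}

// Build the HTML for at most $limit items, server-side.
function render_items(SimpleXMLElement $feed, $limit)
{
    $html = '';
    $shown = 0;
    foreach ($feed->channel->item as $item) {
        if ($shown++ >= $limit) {
            break;
        }
        $html .= '<div class="rsscontainer">'
               . '<div class="rsstitle"><a href="' . htmlspecialchars((string) $item->link) . '">'
               . htmlspecialchars((string) $item->title) . '</a></div>'
               . '<div class="rssdescription">' . item_body($item) . '</div>'
               . "</div>\n";
    }
    return $html;
}

// Made-up feed with one item of each kind discussed in this thread.
$xml = <<<XML
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<item><title>With full text</title><link>http://example.com/a</link>
<description>Short</description>
<content:encoded><![CDATA[<p>Long version</p>]]></content:encoded></item>
<item><title>Description only</title><link>http://example.com/b</link>
<description>Just a summary</description></item>
<item><title>Bare</title><link>http://example.com/c</link><description></description></item>
</channel>
</rss>
XML;

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$feed = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
echo render_items($feed, 5);
```

The third item comes out with an empty body, just as the Kenwood feed would: nothing can be shown that the feed doesn't contain.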

siluet
03-29-2011, 08:17 PM
Well, many feeds don't change too often.

The above SimplePie example for PHP pages:
http://simplepie.org/wiki/setup/sample_page
Is there a way to call a PHP script from an HTML page? My pages are dynamically generated HTML pages. I don't want to change the AddType line so that .html and .htm files are sent through the PHP interpreter.

For some reason, RSS Display Box does not show this feed:
http://www.unsum.com/fetch?id=1475009
Where is the problem?