text without html tags retrieved by php's fopen(http:// function.



kcmakwana
12-14-2008, 11:44 AM
I want to retrieve a page from a remote site using fopen("http://www.somesite/somepage.html", "r");
Next, I store the content of this page in the $content variable with the fgets function. The entire content is stored, but the original page has a table with 5 columns and 50 rows, and my $content does not contain the HTML tags such as <table>, <tr>, <td>, etc. I want the tags kept as they are. I haven't used any preg_match to strip HTML tags. Any solution?

hmsnacker123
12-14-2008, 12:28 PM
Hope This Helps:


function strip_html_tags( $text )
{
    $text = preg_replace(
        array(
            // Remove invisible content
            '@<head[^>]*?>.*?</head>@siu',
            '@<style[^>]*?>.*?</style>@siu',
            '@<script[^>]*?.*?</script>@siu',
            '@<object[^>]*?.*?</object>@siu',
            '@<embed[^>]*?.*?</embed>@siu',
            '@<applet[^>]*?.*?</applet>@siu',
            '@<noframes[^>]*?.*?</noframes>@siu',
            '@<noscript[^>]*?.*?</noscript>@siu',
            '@<noembed[^>]*?.*?</noembed>@siu',
            // Add line breaks before and after blocks
            '@</?((address)|(blockquote)|(center)|(del))@iu',
            '@</?((div)|(h[1-9])|(ins)|(isindex)|(p)|(pre))@iu',
            '@</?((dir)|(dl)|(dt)|(dd)|(li)|(menu)|(ol)|(ul))@iu',
            '@</?((table)|(th)|(td)|(caption))@iu',
            '@</?((form)|(button)|(fieldset)|(legend)|(input))@iu',
            '@</?((label)|(select)|(optgroup)|(option)|(textarea))@iu',
            '@</?((frameset)|(frame)|(iframe))@iu',
        ),
        array(
            // One replacement per pattern: 9 spaces, then 7 line-break prefixes
            ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ',
            "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0", "\n\$0",
            "\n\$0",
        ),
        $text );
    // Strip whatever tags remain
    return strip_tags( $text );
}


(Credit: http://nadeausoftware.com/articles/2007/09/php_tip_how_strip_html_tags_web_page)

kcmakwana
12-14-2008, 12:33 PM
I want the page as it is, meaning I want all the tags the page originally has. I think your code removes the tags.

hmsnacker123
12-14-2008, 07:27 PM
Oh, my apologies. Have you tried the file_get_contents (http://uk3.php.net/file_get_contents) function?
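
A minimal sketch of the file_get_contents approach, for reference. The helper name fetch_page and the sample markup are made up for illustration and are not from the thread. The demo reads a temporary local file so it runs without network access; for an http:// URL the same call should work as long as allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper: returns the raw contents of a URL or file path,
// markup included, or null if the resource cannot be read.
function fetch_page($location)
{
    // file_get_contents() reads the whole resource into one string.
    // The @ hides the warning on failure; the false check handles it instead.
    $content = @file_get_contents($location);
    return $content === false ? null : $content;
}

// Demo on a temporary local file (assumed sample markup).
$tmp = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($tmp, "<table><tr><td>subject</td></tr></table>");

$content = fetch_page($tmp);
unlink($tmp);

// Escape the markup so a browser would show the tags instead of rendering them.
echo htmlspecialchars($content), "\n";
```

For a remote page, the call would look like fetch_page('http://www.somesite/somepage.html'), with the URL from the first post standing in as a placeholder.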

kcmakwana
12-15-2008, 07:14 PM
Let me try; file_get_contents may work.

kcmakwana
12-23-2008, 06:35 PM
I don't know why, but fopen("http://...") returns the entire remote web page without any HTML tags; it replaces all tags with ^M, the carriage return character. For example, if there is <td>subject</td>, it retrieves ^Msubject^M. BTW, file_get_contents works fine. I'm using PHP 4.2 on Apache on a Linux server. I tried a remote page from a remote server as well as a local page on the same server. But now my purpose is solved with file_get_contents. Thanks.
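
One way to narrow down a symptom like this is to check what fgets itself returns. The sketch below is only an assumption-laden test harness: the helper name read_lines_with_fgets and the sample markup are invented, and it reads a temporary local file rather than a remote page. fgets returns each line with its line ending and does not alter markup, so if tags go missing, something else in the script is likely removing them.

```php
<?php
// Hypothetical helper: reads a stream line by line with fgets().
function read_lines_with_fgets($location)
{
    $fh = fopen($location, 'rb');
    if ($fh === false) {
        return null;
    }
    $content = '';
    // fgets() returns one line at a time, including its line ending,
    // and false at end of stream.
    while (($line = fgets($fh)) !== false) {
        $content .= $line;
    }
    fclose($fh);
    return $content;
}

// Assumed sample: markup with Windows-style (CR LF) line endings.
$sample = "<tr>\r\n<td>subject</td>\r\n</tr>\r\n";
$tmp = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($tmp, $sample);

$content = read_lines_with_fgets($tmp);
unlink($tmp);

var_dump($content === $sample);          // are the tags and line endings intact?
var_dump(substr_count($content, "\r"));  // number of carriage returns (shown as ^M)
```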

diltony
12-24-2008, 09:11 PM
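<?php
// Open the remote page in binary-safe read mode.
$handle = fopen("http://www.dynamicdrive.com/", "rb");
if ($handle === false) {
    die("Unable to open URL.");
}
$contents = '';
// Read in chunks of up to 8 KB until the end of the stream.
while (!feof($handle)) {
    $contents .= fread($handle, 8192);
}
fclose($handle);
// Escape the markup so the browser shows the tags instead of rendering them.
echo "<pre>" . htmlspecialchars($contents) . "</pre>";
?>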

To actually store the data, replace the last line with
$contents = urlencode($contents);
then save it to your database like that.
To retrieve it, just use urldecode to reverse the encoding.
I am using PHP 5 now, but I used this same code when I was on PHP 4.2, so it should work!
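
A small sketch of the encode/decode round trip described above. The sample string is made up, and the database step is left out since the thread does not say which database is in use.

```php
<?php
// Assumed sample: a fragment of fetched markup with a CR LF line ending.
$contents = "<td>subject & more</td>\r\n";

// Encode before storing: tags, spaces, and control characters become
// plain ASCII sequences such as %3C, + and %0D.
$encoded = urlencode($contents);

// Decode after retrieving to get the original markup back.
$decoded = urldecode($encoded);

echo $encoded, "\n";
var_dump($decoded === $contents);
```

Note that the encoded string is longer than the original, so the column it is stored in needs room for the expanded form.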

hmsnacker123
12-26-2008, 12:48 AM
OK, try this:


<?php
$page = '';
$fh = fopen('http://www.yourwebsitehere.com/','r') or die('Unable to open file.');
while(! feof($fh)){
    $page .= fread($fh, 1048576);
}
echo $page;
fclose($fh);
?>


Alternatively, you can use cURL:



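<?php
$c = curl_init('http://www.yourwebsitehere.com/');
// Return the response as a string instead of printing it directly.
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$page = curl_exec($c);
// curl_exec() returns false on failure.
if ($page === false) {
    die('cURL error: ' . curl_error($c));
}
curl_close($c);
?>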

kcmakwana
01-08-2009, 08:10 PM
Yes, that works too.