
Can I automatically save PHP generated pages as HTM?



wkenny
12-02-2005, 12:16 PM
I converted most of the pages on my site to PHP to make updating simpler using includes. Then I discovered that Google and Yahoo had dropped the pages from their indexes because the pages were being generated on the fly.

Is it possible to have a PHP program which would read .PHP filenames from a text file, build the pages from the PHP code and automatically save each built page on the server as HTM, with the same filename but a .HTM extension?

For example, if the text file has a line /mydir/mypage.php the program should automatically create /mydir/mypage.htm, overwriting an existing file if present.

Twey
12-02-2005, 12:23 PM
You want to use mod_rewrite to make the pages look like static HTML. Read http://www.fluidthoughts.com/howto/mod_rewrite/.

wkenny
12-02-2005, 02:36 PM
Hi Twey

Thanks for the reply. I do not think this is an option for a number of reasons.
1) I am using a hosting service, and I doubt if they will allow me to change any Apache related settings
2) Even if they did, I would not have a clue how to go about it (I do not understand most of what's in the page you referred me to)
3) The search engines would probably not index the pages because of the re-direct. Yahoo shows how they handle re-directs and, unless I've misinterpreted, when a re-direct is to a page in the same domain they index the referring page rather than the target page
4) When I converted to PHP, I browsed the site with Lynx and it displayed content as expected. But the engines apparently did not bother to examine the content, just dropped the page (even though they crawl the site daily). If they are clever enough to work out that the page is not actually static, I am left with the same problem.
5) Even if I re-direct, what am I re-directing from and to?

I have already converted back to HTM by browsing the PHP page and saving it as .htm manually (this has already got some of the pages back in the Google index), so I would prefer to stick with this method, but obviously doing it manually is time-consuming and tedious.

Text on one of the links from the page you referred me to says "It is easy to directly pre-generate a page to its static form" but does not say how. This is what I need.

Twey
12-02-2005, 02:58 PM
I see. You'll have a slight problem there, in that the pages won't be dynamic if you do so: they'll update on a schedule (if you run a cron job or similar) rather than when the user accesses them. However...

<?php
$files = file("filelist.txt");
for($i = 0; $i < count($files); $i++) {
    ob_start();
    include($files[$i]);
    $page = ob_get_contents();
    ob_end_clean();
    if(strpos($files[$i], ".php") > -1) {
        $file = fopen(substr($files[$i], -4) . ".html");
        fputs($file, $page);
        fclose($file);
    }
}
?>

This will read files from filelist.txt and, if each has a .php extension, generate a static page, the name of which will be the original filename with a .html extension instead of .php. You could, if you wanted, add in code to walk through your directories. By the way, if you don't have dynamic content in your pages (as it would seem, if you can replace them so easily with static pages), you shouldn't name them .php in the first place: having PHP parse static pages slows down the server's response for no benefit.

wkenny
12-02-2005, 03:59 PM
Thanks again Twey. That looks exactly like what I need.

The pages are in two languages and are semi-dynamic in that the data on many of them changes fairly regularly.

Prior to switching to PHP I had to edit the HTM directly, cutting and pasting the data and then uploading the new page (I didn't want to use scripts, to ensure accessibility). With PHP pages I was able to change data simply by modifying an included text file.

Also, new pages are added quite often with exactly the same layout as existing pages. I was able to automate the creation of new pages using PHP just by copying any existing page with a new filename. The routine includes the correct data and generates meta tags and other variable text based on the filename.

Finally I wanted every page in each language to have a link to all other pages in that language so I used includes for navigation. This saved me from having to edit all existing pages each time a new one is added.

Hopefully with the aid of your script, I will be able to retain the advantages of using PHP while still having static pages.

Thanks again.

wkenny
12-02-2005, 07:32 PM
Hi Twey

Got the script working but I needed to change the fopen slightly so I'm posting the change in case anybody else wants to use it.

$file = fopen(substr($files[$i], 0,strlen($files[$i])-4) . ".html",'w');

Once again many thanks.

Twey
12-02-2005, 07:44 PM
Whoops, yes. Sorry.
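Pulling the fixes from this exchange together (Twey's loop, wkenny's corrected fopen() call, and trimming each filename), the same job can be sketched as below. This is a hedged sketch assuming PHP 7.1 or later; the function names renderPhpFile(), htmlNameFor() and buildStaticPages() are hypothetical, not from the thread. One difference from Twey's version: the include runs inside a function here, so a page that relies on global variables may behave differently.

```php
<?php
// Sketch: render each listed PHP page and save its output next to it
// with a .html extension instead of .php.

function renderPhpFile(string $phpFile): string
{
    if (!is_file($phpFile)) {
        throw new RuntimeException("No such page: $phpFile");
    }
    ob_start();                  // capture everything the page echoes
    try {
        include $phpFile;        // run the page as it would run for a visitor
    } finally {
        $html = ob_get_clean();  // stop buffering even if the page throws
    }
    return $html;
}

function htmlNameFor(string $phpFile): ?string
{
    // Only names ending in ".php" are handled; anything else gives null.
    if (substr($phpFile, -4) !== '.php') {
        return null;
    }
    return substr($phpFile, 0, -4) . '.html';
}

function buildStaticPages(array $phpFiles): array
{
    $written = [];
    foreach ($phpFiles as $phpFile) {
        $phpFile = trim($phpFile);  // drop newlines and stray spaces from the list
        $target = htmlNameFor($phpFile);
        if ($phpFile === '' || $target === null) {
            continue;               // skip blank lines and non-PHP entries
        }
        if (file_put_contents($target, renderPhpFile($phpFile)) === false) {
            throw new RuntimeException("Could not write $target");
        }
        $written[] = $target;
    }
    return $written;  // the .html files that were (re)generated
}
```

For example, buildStaticPages(file('filelist.txt')) would regenerate every page listed in filelist.txt and return the names it wrote; it could be run by hand after an update or from a cron job, as Twey suggests.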

imi_99
12-27-2006, 05:07 PM
Hi,
I have a dynamic website in PHP. It returns images and content when a page is requested.

When I run this script it gives me this error:


Warning: main(index.php ): failed to open stream: No such file or directory in /mnt/w0401/d04/s45/b0225e10/www/test/dynamic.php on line 5

Warning: main(index.php ): failed to open stream: No such file or directory in /mnt/w0401/d04/s45/b0225e10/www/test/dynamic.php on line 5

Warning: main(): Failed opening 'index.php ' for inclusion (include_path='.:/usr/local/nf/lib/php') in /mnt/w0401/d04/s45/b0225e10/www/test/dynamic.php on line 5


It reports the error on line 5: include ($files[$i]);

Can you please help me get this script working? I want to run it ASAP.

thanks in advance

Twey
12-27-2006, 11:24 PM
You have a trailing space after one of the filenames in your list.
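Without flags, file() keeps the line ending on every element it returns, and any space typed after a name is kept too; either one makes include() look for a file that does not exist. A minimal sketch of reading the list defensively, assuming PHP 7.4 or later (for the arrow function); readFileList() is a hypothetical helper name:

```php
<?php
// Sketch: read a list of file names (one per line) without the line
// endings, stray spaces and blank lines that make include() fail.

function readFileList(string $listPath): array
{
    // FILE_IGNORE_NEW_LINES drops the "\n" ending each element;
    // FILE_SKIP_EMPTY_LINES then drops lines that were empty.
    $lines = file($listPath, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $listPath");
    }
    // trim() removes leftover spaces, tabs and any "\r" from Windows line endings.
    $names = array_map('trim', $lines);
    // Drop lines that contained only whitespace and renumber from 0.
    return array_values(array_filter($names, fn(string $name): bool => $name !== ''));
}
```

Twey's loop could then iterate over readFileList('filelist.txt') instead of file('filelist.txt').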

imi_99
12-27-2006, 11:43 PM

Thanks Twey,

It was really helpful and I fixed the problem. I had been trying to get this script to run for 3 days and could not find where the error was. I never noticed the space.

thanks again

imi

:D

mwinter
12-28-2006, 04:33 AM
wkenny wrote:
The search engines would probably not index the pages because of the re-direct. Yahoo shows how they handle re-directs and, unless I've misinterpreted, when a re-direct is to a page in the same domain they index the referring page rather than the target page

I should think you have misunderstood. The redirect will either be internal (the Web server itself chooses a different file) or an HTTP redirect. A search engine wouldn't even know about the former, and it would be completely broken if it ignored the latter.

Search engines will not process "redirects" implemented through client-side scripts, and they are wary of meta element "redirects". The latter can easily be abused.

Can you post a link to where you read this information?



wkenny wrote:
Text on one of the links from the page you referred me to says "It is easy to directly pre-generate a page to its static form" but does not say how. This is what I need.

What can be done is to check the modification time of each source that makes up a document, and compare it against the time of the generated document (if it exists). If the generated document is older than the newest source, it is regenerated. Finally, a temporary redirect is returned that refers to the generated version.

Something along the lines of:


<?php
/* This file name needs to be determined somehow. The name of the PHP file is sensible, but not
 * necessarily appropriate. Do not use an absolute path (without making the necessary changes,
 * below).
 */
$destination = '...';
/* A list of files used to construct the document. */
$sources = array('header.fragment', 'content.fragment', 'footer.fragment');

if (!file_exists($destination . '.lock')) {
    touch($destination . '.lock');
}
$lockFile = fopen($destination . '.lock', 'r');
flock($lockFile, LOCK_SH);
if (isStale($destination, $sources)) {
    /* Upgrade lock before writing. The upgrade may not be atomic, so another
     * request could have regenerated the document in the meantime: clear
     * PHP's stat cache and check again before writing.
     */
    flock($lockFile, LOCK_EX);
    clearstatcache();
    if (isStale($destination, $sources)) {
        $destinationFile = fopen($destination, 'w');
        ob_start();
        /* Output contents... */
        fwrite($destinationFile, ob_get_clean());
        fclose($destinationFile);
    }
}
flock($lockFile, LOCK_UN);
fclose($lockFile);

/* Change the domain name as necessary. */
$destinationUri = 'http://www.example.com' . rtrim(dirname($_SERVER['PHP_SELF']), '/')
    . '/' . $destination;
header('Location: ' . $destinationUri);
?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

<html>
<head>
<title>Continue to <?php echo htmlspecialchars($destinationUri); ?></title>
</head>

<body>
<p>Please continue to <a href="<?php echo htmlspecialchars($destinationUri); ?>"><?php echo htmlspecialchars($destination); ?></a>.</p>
</body>
</html>
<?php
function isStale($destination, $sources) {
    if (!file_exists($destination)) {
        return true;
    }

    $generationTime = filemtime($destination);
    foreach ($sources as $source) {
        if ($generationTime < filemtime($source)) {
            return true;
        }
    }
    return false;
}
?>
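The sketch above opens the destination with fopen($destination, 'w'), which truncates the file before the new contents are written, so a request for the static file that arrives in between could receive a partial page. A common alternative, assuming a POSIX filesystem where rename() within one filesystem replaces the target atomically, is to write to a temporary file in the same directory and rename it into place. writeFileAtomically() is a hypothetical name, and the 0644 permissions assume the web server runs as a different user that only needs read access:

```php
<?php
// Sketch: replace a generated file atomically. The new contents go into a
// temporary file in the same directory, which is then renamed over the
// target; on a POSIX filesystem rename() within one filesystem is atomic,
// so readers see either the old page or the new one, never a partial file.

function writeFileAtomically(string $destination, string $contents): void
{
    $dir = dirname($destination);
    // tempnam() creates a uniquely named, empty file (normally in $dir).
    $tmp = tempnam($dir, 'gen');
    if ($tmp === false) {
        throw new RuntimeException("Cannot create a temporary file in $dir");
    }
    // tempnam() files are created with 0600; make the page world-readable
    // so a web server running as another user can serve it.
    $ok = file_put_contents($tmp, $contents) !== false
        && chmod($tmp, 0644)
        && rename($tmp, $destination);
    if (!$ok) {
        @unlink($tmp);  // don't leave the temporary file behind
        throw new RuntimeException("Cannot write $destination");
    }
}
```

In the locking code, the fopen()/fwrite()/fclose() sequence could then be replaced by a single call to writeFileAtomically() with the buffered output.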



In principle it's simple, but it can get awkward with many sources without some abstraction. One also needs to check that testing the modification times is less resource-consuming than generating the document, and to handle cache-related headers such as ETag and Last-Modified, as well as Content-Length to facilitate persistent connections.
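For the cache-related headers mentioned here, a minimal sketch (assuming PHP 7.1 or later; cacheHeaders() and isNotModified() are hypothetical names) that derives Last-Modified, an ETag and Content-Length from the generated file and decides whether a conditional GET can be answered with 304 Not Modified. Building the ETag from size and modification time is one common convention, not a requirement:

```php
<?php
// Sketch: cache validators for a generated static file, plus the check
// that decides whether a conditional GET can get 304 Not Modified.

function cacheHeaders(string $path): array
{
    clearstatcache(true, $path);  // avoid stale results from PHP's stat cache
    $mtime = filemtime($path);
    $size = filesize($path);
    if ($mtime === false || $size === false) {
        throw new RuntimeException("Cannot stat $path");
    }
    return [
        // HTTP dates are always expressed in GMT.
        'Last-Modified' => gmdate('D, d M Y H:i:s', $mtime) . ' GMT',
        // ETag derived from size and mtime (in hex), quoted as HTTP requires.
        'ETag' => sprintf('"%x-%x"', $size, $mtime),
        'Content-Length' => (string) $size,
    ];
}

function isNotModified(array $headers, ?string $ifNoneMatch, ?string $ifModifiedSince): bool
{
    // When If-None-Match is present it takes precedence over If-Modified-Since.
    if ($ifNoneMatch !== null) {
        foreach (explode(',', $ifNoneMatch) as $tag) {
            // Weak comparison: ignore a "W/" prefix on the client's tag.
            $tag = preg_replace('/^W\//', '', trim($tag));
            if ($tag === '*' || $tag === $headers['ETag']) {
                return true;
            }
        }
        return false;
    }
    if ($ifModifiedSince !== null) {
        $since = strtotime($ifModifiedSince);
        // Not modified if the file is no newer than the client's copy.
        return $since !== false && strtotime($headers['Last-Modified']) <= $since;
    }
    return false;
}
```

A page serving the static file would send each entry with header("$name: $value"), then, if isNotModified() returns true for $_SERVER['HTTP_IF_NONE_MATCH'] and $_SERVER['HTTP_IF_MODIFIED_SINCE'] (passing null when either is absent), call http_response_code(304) and stop; otherwise it would readfile() the document.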

Mike

imi_99
12-29-2006, 12:23 AM
Hi again,

I came across another issue while solving it, so I need your suggestion again.

I can remove spaces using the trim function:

PHP Code:
include (trim($files[$i]));


But suppose the file name is index2.php: how can I remove the space and the number (or only the number) so that I can save the file as index.html?
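One way to do this, sketched with preg_replace() (a swapped-in choice: ereg_replace(), used later in the thread, was deprecated in PHP 5.3 and removed in PHP 7). staticNameFor() is a hypothetical helper. Be aware that it maps every name differing only by its trailing number to the same output, so index.php and index2.php would both write index.html and whichever is processed last wins; a name made only of digits, such as 2006.php, would become .html.

```php
<?php
// Sketch: map a PHP file name to its static name, dropping the ".php"
// extension and any digits directly before it ("index2.php" -> "index.html").

function staticNameFor(string $phpFile): string
{
    $name = trim($phpFile);  // remove stray spaces/newlines from the list
    // "\d*" matches an optional trailing number, "\.php$" the extension at
    // the end of the name; names not ending in ".php" come back unchanged.
    return preg_replace('/\d*\.php$/', '.html', $name);
}
```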

imi_99
12-29-2006, 10:38 PM
Hi everyone,

While running this script for the website I came across a new issue.
I found a solution, using ereg_replace to remove the number from the filename.

But another issue came up while testing the site over the last 2-3 days.
When I run this script, it gets all the data from the index.php files and saves it as index.html files. But 3-4 times I have seen all the data in index.html change after I created it, without the script being run again.
Also, the data I get is different from the data I usually get when running the script.

This is very bad news, because if I update the file and leave index.html as the site's default page, users on the website can't see what should be displayed.

So my question for all the programmers above: maybe you have run into this before. Can you please explain what is happening and how to solve it?

thanks in advance


imi