Modifying a search script to display full entry



fred11
04-15-2013, 02:25 AM
Hi
I'm using a script I found on the internet that searches a pre-defined text file and, if a keyword matches, displays an EXTRACT of the text file entry that contains the keyword.
It works well, but I need the script to display the full entry (or entries) in the text file that match the keyword(s). Each entry is on a separate line.
I've been on this for a week now, and I'm just a beginner with PHP.
Can anyone help me?
It seems the full text entry is not displayed because of an extract limiter in the code (I think).

I've removed the CSS from the code below because the code was too big to post here.


<?

// English Configuration
$my_server = "http://".getenv("SERVER_NAME").":".getenv("SERVER_PORT"); // Your Server (generally no changes needed)
$my_root = getenv("DOCUMENT_ROOT"); // Your document root (generally no changes needed)
$s_dirs = array("/dir1"); // Which directories should be searched ("/dir1","/dir2","/dir1/subdir2")? --> $s_dirs = array(""); searches the entire server
$s_skip = array("..",".","subdir2"); // Which files/dirs do you like to skip?
$s_files = "html|htm|HTM|HTML|php3|php4|php|txt|docx"; // Which file types should be searched? Example: "html$|htm$|php4$"
$min_chars = "2"; // Min. chars that must be entered to perform the search
$max_chars = "30"; // Max. chars that can be submitted to perform the search
$default_val = "Search Terms Here..."; // Default value in searchfield
$limit_hits = array("100"); // How many hits should be displayed, to suppress the select-menu simply use one value in the array --> array("100")
$message_1 = "Invalid Searchterm!"; // Invalid searchterm
$message_2 = "Please enter at least '$min_chars', highest '$max_chars' characters."; // Invalid searchterm long ($min_chars/$max_chars)
$message_3= "Your search results are listed below for:"; // Headline searchresults
$message_4 = "Sorry, no matches were found in the <b>Database</b>.<br />"; // No hits
$message_5 = "results"; // Hits
$message_6 = "Match case"; // Match case
$no_title = "From the Database:"; // This should be displayed if no title or empty title is found in file
$limit_extracts = ""; // How many extracts per file would you like to display? Default: "" --> every extract, alternative: 'integer' e.g. "3"
$byte_size = "999990"; // How many bytes per file should be searched? Reduce to increase speed

//ini_set("error_reporting", "2047"); // Debugger

// search_form(): Gibt das Suchformular aus
function search_form($HTTP_GET_VARS, $limit_hits, $default_val, $message_5, $message_6, $PHP_SELF) {
@$keyword=$HTTP_GET_VARS['keyword'];
@$case=$HTTP_GET_VARS['case'];
@$limit=$HTTP_GET_VARS['limit'];
echo
"<form action=\"$PHP_SELF\" method=\"GET\">\n",
"<input type=\"hidden\" value=\"SEARCH\" name=\"action\">\n",
"<input type=\"text\" name=\"keyword\" class=\"text\" size=\"10\" maxlength=\"150\" value=\"";
if(!$keyword)
echo "$default_val";
else
echo str_replace("&amp;","&",htmlentities($keyword));
echo "\" ";
echo "onFocus=\" if (value == '";
if(!$keyword)
echo "$default_val";
else
echo str_replace("&amp;","&",htmlentities($keyword));
echo "') {value=''}\" onBlur=\"if (value == '') {value='";
if(!$keyword)
echo "$default_val";
else
echo str_replace("&amp;","&",htmlentities($keyword));
echo "'}\"> ";
$j=count($limit_hits);
if($j==1)
echo "<input type=\"hidden\" value=\"".$limit_hits[0]."\" name=\"limit\">";
elseif($j>1) {
echo
"<select name=\"limit\" class=\"select\">\n";
for($i=0;$i<$j;$i++) {
echo "<option value=\"".$limit_hits[$i]."\"";
if($limit==$limit_hits[$i])
echo "SELECTED";
echo ">".$limit_hits[$i]." $message_5</option>\n";
}
echo "</select> ";
}
echo
"<input type=\"submit\" value=\"OK\" class=\"button\">\n",
"<br>\n",
"<span class=\"checkbox\">$message_6</span> <input type=\"checkbox\" name=\"case\" value=\"true\" class=\"checkbox\"";
if($case)
echo " CHECKED";
echo
">\n",
"<br>\n",
"<a href=\"http://www.xxxxx.com/\" class=\"ts\" target=\"_blank\">Powered by xxxxxxxx</a>",
"</form>\n";
}


// search_headline(): Ueberschrift Suchergebnisse
function search_headline($HTTP_GET_VARS, $message_3) {
@$keyword=$HTTP_GET_VARS['keyword'];
@$action=$HTTP_GET_VARS['action'];
if($action == "SEARCH") // Volltextsuche
echo "<h1 class=\"result\">$message_3 '<i>".htmlentities(stripslashes($keyword))."</i>'</h1>";
}


// search_error(): Auf Fehler testen und Suchfehler anzeigen
function search_error($HTTP_GET_VARS, $min_chars, $max_chars, $message_1, $message_2, $limit_hits) {
global $HTTP_GET_VARS;
@$keyword=$HTTP_GET_VARS['keyword'];
@$action=$HTTP_GET_VARS['action'];
@$limit=$HTTP_GET_VARS['limit'];
if($action == "SEARCH") { // Volltextsuche
if(strlen($keyword)<$min_chars||strlen($keyword)>$max_chars||!in_array ($limit, $limit_hits)) { // Ist die Anfrage in Ordnung (min. '$min_chars' Zeichen, max. '$max_chars' Zeichen)?
echo "<p class=\"result\"><b>$message_1</b><br>$message_2</p>";
$HTTP_GET_VARS['action'] = "ERROR"; // Suche abbrechen
}
}
}


// search_dir(): Volltextsuche in Verzeichnissen
function search_dir($my_server, $my_root, $s_dirs, $s_files, $s_skip, $message_1, $message_2, $no_title, $limit_extracts, $byte_size, $HTTP_GET_VARS) {
global $count_hits;
@$keyword=$HTTP_GET_VARS['keyword'];
@$action=$HTTP_GET_VARS['action'];
@$limit=$HTTP_GET_VARS['limit'];
@$case=$HTTP_GET_VARS['case'];
if($action == "SEARCH") { // Volltextsuche
foreach($s_dirs as $dir) { // Alle Verzeichnisse in $s_dirs durchsuchen
$handle = @opendir($my_root.$dir);
while($file = @readdir($handle)) {
if(in_array($file, $s_skip)) { // Alles in $skip auslassen
continue;
}
elseif($count_hits>=$limit) {
break; // Maximale Trefferzahl erreicht
}
elseif(is_dir($my_root.$dir."/".$file)) { // Unterverzeichnisse durchsuchen
$s_dirs = array("$dir/$file");
search_dir($my_server, $my_root, $s_dirs, $s_files, $s_skip, $message_1, $message_2, $no_title, $limit_extracts, $byte_size, $HTTP_GET_VARS); // search_dir() rekursiv auf alle Unterverzeichnisse aufrufen
}
elseif(preg_match("/($s_files)$/i", $file)) { // Alle Dateien gemaess Endungen $s_files
$fd=fopen($my_root.$dir."/".$file,"r");
$text=fread($fd, $byte_size); // 50 KB
$keyword_html = htmlentities($keyword);
if($case) { // Gross-/Kleinschreibung beruecksichtigen?
$do=strstr($text, $keyword)||strstr($text, $keyword_html);
}
else {
$do=stristr($text, $keyword)||stristr($text, $keyword_html);
}
if($do) {
$count_hits++; // Treffer zaehlen
if(preg_match_all("=<title[^>]*>(.*)</title>=siU", $text, $titel)) { // Generierung des Link-Textets aus <title>...</title>
if(!$titel[1][0]) // <title></title> ist leer...
$link_title=$no_title; // ...also $no_title
else
$link_title=$titel[1][0]; // <title>...</title> vorhanden...
}
else {
$link_title=$no_title; // ...ansonsten $no_title
}
echo "<a href=\"$my_server$dir/$file\" target=\"_self\" class=\"result\">$count_hits. $link_title</a><br>"; // Ausgabe des Links
$auszug = strip_tags($text);
$keyword = preg_quote($keyword); // unescapen
$keyword = str_replace("/","\/","$keyword");
$keyword_html = preg_quote($keyword_html); // unescapen
$keyword_html = str_replace("/","\/","$keyword_html");
echo "<span class=\"extract\">";
if(preg_match_all("/((\s\S*){0,3})($keyword|$keyword_html)((\s?\S*){0,5})/i", $auszug, $match, PREG_SET_ORDER)); {
if(!$limit_extracts)
$number=count($match);
else
$number=$limit_extracts;
for ($h=0;$h<$number;$h++) { // Kein Limit angegeben also alle Vorkommen ausgeben
if (!empty($match[$h][3]))
printf("<i><b>> </b> %s<b>%s</b>%s <b>..</b></i><br />", $match[$h][1], $match[$h][3], $match[$h][4]);
}
}
echo "</span><br><br>";
flush();
}
fclose($fd);
}
}
@closedir($handle);
}
}
}


// search_no_hits(): Ausgabe 'keine Treffer' bei der Suche
function search_no_hits($HTTP_GET_VARS, $count_hits, $message_4) {
@$action=$HTTP_GET_VARS['action'];
if($action == "SEARCH" && $count_hits<1) // Volltextsuche, kein Treffer
echo "<p class=\"result\">$message_4</p>";
}

?>

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>terraserver.de/search</title>
</head>
<body bgcolor="#FFFFFF" text="#000000" link="#03629C" vlink="#03629C" alink="#9D9D9D">
<table border="0" cellspacing="1" cellpadding="0" bgcolor="#03629C">
<tr align="left" valign="top">
<td>
<table border="0" cellspacing="0" cellpadding="3" bgcolor="#FFFFFF">
<tr align="left" valign="top">
<td>
<?
// search_form(): Gibt das Suchformular aus
search_form($HTTP_GET_VARS, $limit_hits, $default_val, $message_5, $message_6, $PHP_SELF);
?>
</td>
</tr>
</table>
</td>
</tr>
</table>
<?
// search_headline(): Ueberschrift Suchergebnisse
search_headline($HTTP_GET_VARS, $message_3);
// search_error(): Auf Fehler testen und Suchfehler anzeigen
search_error($HTTP_GET_VARS, $min_chars, $max_chars, $message_1, $message_2, $limit_hits);
// search_dir(): Volltextsuche in Verzeichnissen (siehe config.php4)
search_dir($my_server, $my_root, $s_dirs, $s_files, $s_skip, $message_1, $message_2, $no_title, $limit_extracts, $byte_size, $HTTP_GET_VARS);
// search_no_hits(): Ausgabe 'keine Treffer' bei der Suche
search_no_hits($HTTP_GET_VARS, $count_hits, $message_4);
?>
</body>
</html>

james438
04-15-2013, 04:35 AM
That is a lot of code to debug.

The code you posted is rather out of date. Error reporting is turned off and shouldn't be. The German combined with English is a little distracting too. To start, I suggest replacing $HTTP_GET_VARS with $_GET and turning error reporting back on by removing the @ symbol in front of the assignments and function calls. As far as I can see, @ is never used in this script for anything other than suppressing errors.
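
A minimal sketch of what that could look like, assuming a debugging setup where errors may be shown on screen. `get_param()` is a hypothetical helper, not part of the posted script; it reads a value from `$_GET` without needing `@` to hide "undefined index" notices.

```php
<?php
// Report everything while debugging; turn display_errors off again on a live site.
error_reporting(E_ALL);
ini_set('display_errors', '1');

// Hypothetical helper (not in the original script): return a query-string
// value as a string, or $default when the key is missing or not a string.
function get_param(array $source, $name, $default = '')
{
    return (isset($source[$name]) && is_string($source[$name]))
        ? $source[$name]
        : $default;
}

// Stands in for lines such as: @$keyword=$HTTP_GET_VARS['keyword'];
$keyword = get_param($_GET, 'keyword');
$action  = get_param($_GET, 'action');
```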

As far as I can tell you can display the full results by replacing:


if(preg_match_all("/((\s\S*){0,3})($keyword|$keyword_html)((\s?\S*){0,5})/i", $auszug, $match, PREG_SET_ORDER)); {

with:


if(preg_match_all("/((\s\S*){0,3})($keyword|$keyword_html)((\s?\S*){0,})/i", $auszug, $match, PREG_SET_ORDER)); {

There could be a number of other bugs in the script due to it being somewhat old. Still, it does look like this could be a fun script to play around with.

fred11
04-15-2013, 05:57 AM
Hi James438

Thanks for your help with this perplexing code!

Using the replacement you suggested throws an error:


if(preg_match_all("/((\s\S*){0,3})($keyword|$keyword_html)((\s?\S*){0,})/i", $auszug, $match, PREG_SET_ORDER)); {

Changing it to:


if(preg_match_all("/((\s\S*){0,3})($keyword|$keyword_html)((\s?\S*){0,0})/i", $auszug, $match, PREG_SET_ORDER)); {

...fixes the error/code but it doesn't help with displaying the full entry...


This is the section of the code that controls the word count before and after the keyword - I'm sure it would need to be changed in order for the full entry to be displayed...?


$auszug = strip_tags($text);
$keyword = preg_quote($keyword); // unescapen
$keyword = str_replace("/","\/","$keyword");
$keyword_html = preg_quote($keyword_html); // unescapen
$keyword_html = str_replace("/","\/","$keyword_html");
echo "<span class=\"extract\">";
if(preg_match_all("/((\s\S*){0,3})($keyword|$keyword_html)((\s?\S*){0,0})/i", $auszug, $match, PREG_SET_ORDER)); {
if(!$limit_extracts)
$number=count($match);
else
$number=$limit_extracts;
for ($h=0;$h<$number;$h++) { // Kein Limit angegeben also alle Vorkommen ausgeben
if (!empty($match[$h][3]))
printf("<i><b>> </b> %s<b>%s</b>%s <b>..</b></i><br />", $match[$h][1], $match[$h][3], $match[$h][4]);
}
}
echo "</span><br><br>";
flush();
}
fclose($fd);
}
}
@closedir($handle);
}
}
}

Any other clues..?
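
For reference, the two `{min,max}` quantifiers in that pattern are what limit the extract: `{0,3}` allows up to three whitespace-led chunks before the keyword and `{0,5}` up to five after it. The sketch below wraps the same pattern in a hypothetical `extract_with()` helper and runs it on a made-up sample string to show the effect of changing the second quantifier. It is an illustration only, not the script's own code.

```php
<?php
// Hypothetical helper: apply the script's extract pattern with a chosen
// quantifier for the words that follow the keyword, e.g. '{0,5}'.
function extract_with($text, $keyword, $after)
{
    $kw = preg_quote($keyword, '/'); // the second argument also escapes "/"
    $pattern = '/((\s\S*){0,3})(' . $kw . ')((\s?\S*)' . $after . ')/i';
    return preg_match($pattern, $text, $m) ? $m[0] : null;
}

$text = "Note: 3 granny apples are hanging from the tree today";

echo extract_with($text, 'apples', '{0,5}'), "\n"; // stops five words after the keyword
echo extract_with($text, 'apples', '{0,0}'), "\n"; // nothing after the keyword
echo extract_with($text, 'apples', '{0,}'), "\n";  // no upper limit
```

Note that `\s` also matches line breaks, so an unlimited `{0,}` does not stop at the end of the matching entry; it runs on into the lines that follow.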

james438
04-15-2013, 07:04 AM
What is the error code that you are getting? The following is what I am using and is a modified version of your code:

<?

// English Configuration
$my_server = "http://".getenv("SERVER_NAME").":".getenv("SERVER_PORT"); // Your Server (generally no changes needed)
$my_root = getenv("DOCUMENT_ROOT"); // Your document root (generally no changes needed)
$s_dirs = array("/"); // Which directories should be searched ("/dir1","/dir2","/dir1/subdir2")? --> $s_dirs = array(""); searches the entire server
$s_skip = array("..",".","subdir2"); // Which files/dirs do you like to skip?
$s_files = "html|htm|HTM|HTML|php3|php4|php|txt|docx"; // Which file types should be searched? Example: "html$|htm$|php4$"
$min_chars = "2"; // Min. chars that must be entered to perform the search
$max_chars = "30"; // Max. chars that can be submitted to perform the search
$default_val = "Search Terms Here..."; // Default value in searchfield
$limit_hits = array("100"); // How many hits should be displayed, to suppress the select-menu simply use one value in the array --> array("100")
$message_1 = "Invalid Searchterm!"; // Invalid searchterm
$message_2 = "Please enter at least '$min_chars', highest '$max_chars' characters."; // Invalid searchterm long ($min_chars/$max_chars)
$message_3= "Your search results are listed below for:"; // Headline searchresults
$message_4 = "Sorry, no matches were found in the <b>Database</b>.<br />"; // No hits
$message_5 = "results"; // Hits
$message_6 = "Match case"; // Match case
$no_title = "From the Database:"; // This should be displayed if no title or empty title is found in file
$limit_extracts = ""; // How many extracts per file would you like to display? Default: "" --> every extract, alternative: 'integer' e.g. "3"
$byte_size = "999990"; // How many bytes per file should be searched? Reduce to increase speed

//ini_set("error_reporting", "2047"); // Debugger

// search_form(): Gibt das Suchformular aus
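// The parameter is named $get because PHP 5.4 and later reject a parameter called $_GET with a fatal error
function search_form($get, $limit_hits, $default_val, $message_5, $message_6, $PHP_SELF) {
$keyword=isset($get['keyword']) ? $get['keyword'] : '';
$case=isset($get['case']) ? $get['case'] : '';
$limit=isset($get['limit']) ? $get['limit'] : '';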
echo
"<form action=\"$PHP_SELF\" method=\"GET\">\n",
"<input type=\"hidden\" value=\"SEARCH\" name=\"action\">\n",
"<input type=\"text\" name=\"keyword\" class=\"text\" size=\"10\" maxlength=\"150\" value=\"";
if(!$keyword)
echo "$default_val";
else
echo str_replace("&amp;","&",htmlentities($keyword));
echo "\" ";
echo "onFocus=\" if (value == '";
if(!$keyword)
echo "$default_val";
else
echo str_replace("&amp;","&",htmlentities($keyword));
echo "') {value=''}\" onBlur=\"if (value == '') {value='";
if(!$keyword)
echo "$default_val";
else
echo str_replace("&amp;","&",htmlentities($keyword));
echo "'}\"> ";
$j=count($limit_hits);
if($j==1)
echo "<input type=\"hidden\" value=\"".$limit_hits[0]."\" name=\"limit\">";
elseif($j>1) {
echo
"<select name=\"limit\" class=\"select\">\n";
for($i=0;$i<$j;$i++) {
echo "<option value=\"".$limit_hits[$i]."\"";
if($limit==$limit_hits[$i])
echo "SELECTED";
echo ">".$limit_hits[$i]." $message_5</option>\n";
}
echo "</select> ";
}
echo
"<input type=\"submit\" value=\"OK\" class=\"button\">\n",
"<br>\n",
"<span class=\"checkbox\">$message_6</span> <input type=\"checkbox\" name=\"case\" value=\"true\" class=\"checkbox\"";
if($case)
echo " CHECKED";
echo
">\n",
"<br>\n",
"<a href=\"http://www.xxxxx.com/\" class=\"ts\" target=\"_blank\">Powered by xxxxxxxx</a>",
"</form>\n";
}


// search_headline(): Ueberschrift Suchergebnisse
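function search_headline($get, $message_3) { // $get, not $_GET: superglobal names are not allowed as parameters from PHP 5.4 on
$keyword=isset($get['keyword']) ? $get['keyword'] : '';
$action=isset($get['action']) ? $get['action'] : '';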
if($action == "SEARCH") // Volltextsuche
echo "<h1 class=\"result\">$message_3 '<i>".htmlentities(stripslashes($keyword))."</i>'</h1>";
}


// search_error(): Auf Fehler testen und Suchfehler anzeigen
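function search_error($get, $min_chars, $max_chars, $message_1, $message_2, $limit_hits) { // $get, not $_GET: superglobal names are not allowed as parameters from PHP 5.4 on
// No "global $_GET;" needed: superglobals are visible in every scope
$keyword=isset($get['keyword']) ? $get['keyword'] : '';
$action=isset($get['action']) ? $get['action'] : '';
$limit=isset($get['limit']) ? $get['limit'] : '';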
if($action == "SEARCH") { // Volltextsuche
if(strlen($keyword)<$min_chars||strlen($keyword)>$max_chars||!in_array ($limit, $limit_hits)) { // Ist die Anfrage in Ordnung (min. '$min_chars' Zeichen, max. '$max_chars' Zeichen)?
echo "<p class=\"result\"><b>$message_1</b><br>$message_2</p>";
$_GET['action'] = "ERROR"; // Suche abbrechen
}
}
}


// search_dir(): Volltextsuche in Verzeichnissen
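function search_dir($my_server, $my_root, $s_dirs, $s_files, $s_skip, $message_1, $message_2, $no_title, $limit_extracts, $byte_size, $get) { // $get, not $_GET: superglobal names are not allowed as parameters from PHP 5.4 on
global $count_hits;
$keyword=isset($get['keyword']) ? $get['keyword'] : '';
$action=isset($get['action']) ? $get['action'] : '';
$limit=isset($get['limit']) ? $get['limit'] : '';
$case=isset($get['case']) ? $get['case'] : '';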
if($action == "SEARCH") { // Volltextsuche
foreach($s_dirs as $dir) { // Alle Verzeichnisse in $s_dirs durchsuchen
$handle = opendir($my_root.$dir);
while($file = readdir($handle)) {
if(in_array($file, $s_skip)) { // Alles in $skip auslassen
continue;
}
elseif($count_hits>=$limit) {
break; // Maximale Trefferzahl erreicht
}
elseif(is_dir($my_root.$dir."/".$file)) { // Unterverzeichnisse durchsuchen
$s_dirs = array("$dir/$file");
search_dir($my_server, $my_root, $s_dirs, $s_files, $s_skip, $message_1, $message_2, $no_title, $limit_extracts, $byte_size, $_GET); // search_dir() rekursiv auf alle Unterverzeichnisse aufrufen
}
elseif(preg_match("/($s_files)$/i", $file)) { // Alle Dateien gemaess Endungen $s_files
$fd=fopen($my_root.$dir."/".$file,"r");
$text=fread($fd, $byte_size); // 50 KB
$keyword_html = htmlentities($keyword);
if($case) { // Gross-/Kleinschreibung beruecksichtigen?
$do=strstr($text, $keyword)||strstr($text, $keyword_html);
}
else {
$do=stristr($text, $keyword)||stristr($text, $keyword_html);
}
if($do) {
$count_hits++; // Treffer zaehlen
if(preg_match_all("=<title[^>]*>(.*)</title>=siU", $text, $titel)) { // Generierung des Link-Textets aus <title>...</title>
if(!$titel[1][0]) // <title></title> ist leer...
$link_title=$no_title; // ...also $no_title
else
$link_title=$titel[1][0]; // <title>...</title> vorhanden...
}
else {
$link_title=$no_title; // ...ansonsten $no_title
}
echo "<a href=\"$my_server$dir/$file\" target=\"_self\" class=\"result\">$count_hits. $link_title</a><br>"; // Ausgabe des Links
$auszug = strip_tags($text);
$keyword = preg_quote($keyword); // unescapen
$keyword = str_replace("/","\/","$keyword");
$keyword_html = preg_quote($keyword_html); // unescapen
$keyword_html = str_replace("/","\/","$keyword_html");
echo "<span class=\"extract\">";
if(preg_match_all("/((\s\S*){0,3})($keyword|$keyword_html)((\s?\S*){0,5})/i", $auszug, $match, PREG_SET_ORDER)); {
if(!$limit_extracts)
$number=count($match);
else
$number=$limit_extracts;
for ($h=0;$h<$number;$h++) { // Kein Limit angegeben also alle Vorkommen ausgeben
if (!empty($match[$h][3]))
printf("<i><b>> </b> %s<b>%s</b>%s <b>..</b></i><br />", $match[$h][1], $match[$h][3], $match[$h][4]);
}
}
echo "</span><br><br>";
flush();
}
fclose($fd);
}
}
closedir($handle);
}
}
}


// search_no_hits(): Ausgabe 'keine Treffer' bei der Suche
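function search_no_hits($get, $count_hits, $message_4) { // $get, not $_GET: superglobal names are not allowed as parameters from PHP 5.4 on
$action=isset($get['action']) ? $get['action'] : '';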
if($action == "SEARCH" && $count_hits<1) // Volltextsuche, kein Treffer
echo "<p class=\"result\">$message_4</p>";
}

?>

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>terraserver.de/search</title>
</head>
<body bgcolor="#FFFFFF" text="#000000" link="#03629C" vlink="#03629C" alink="#9D9D9D">
<table border="0" cellspacing="1" cellpadding="0" bgcolor="#03629C">
<tr align="left" valign="top">
<td>
<table border="0" cellspacing="0" cellpadding="3" bgcolor="#FFFFFF">
<tr align="left" valign="top">
<td>
<?
// search_form(): Gibt das Suchformular aus
search_form($_GET, $limit_hits, $default_val, $message_5, $message_6, htmlspecialchars($_SERVER['PHP_SELF'])); // $PHP_SELF is only set when register_globals is on
?>
</td>
</tr>
</table>
</td>
</tr>
</table>
<?
// search_headline(): Ueberschrift Suchergebnisse
search_headline($_GET, $message_3);
// search_error(): Auf Fehler testen und Suchfehler anzeigen
search_error($_GET, $min_chars, $max_chars, $message_1, $message_2, $limit_hits);
// search_dir(): Volltextsuche in Verzeichnissen (siehe config.php4)
search_dir($my_server, $my_root, $s_dirs, $s_files, $s_skip, $message_1, $message_2, $no_title, $limit_extracts, $byte_size, $_GET);
// search_no_hits(): Ausgabe 'keine Treffer' bei der Suche
search_no_hits($_GET, $count_hits, $message_4);
?>
</body>
</html>

Part of the difficulty with this script is that we are using different versions of PHP. I am using version 5.3.21.

You can use the following to show which version of PHP you are using.

<?php
phpinfo();
?>
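
If the full phpinfo() page is more than you need, the version alone can be printed as well. A small sketch; the 5.4.0 threshold in the comparison is only an example value.

```php
<?php
// Print just the version string of the running interpreter.
echo 'PHP version: ', phpversion(), "\n";

// version_compare() allows branching on the version where behaviour differs.
if (version_compare(PHP_VERSION, '5.4.0', '<')) {
    echo "Running a PHP release older than 5.4\n";
}
```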

fred11
04-15-2013, 09:48 AM
Hi
The error is just a blank page with no results.
The results are never written to the HTML page.

I have tested the PHP script with a variety of PHP 5 versions.

This really isn't the problem though...

The problem is that the results will not display fully - only as an extract - based on the keyword searched.
For example, if I search the file for the word "apples" - the results are returned as such:

...3 granny apples... (with the keyword in bold) but the whole entry is:
...3 granny apples are hanging from the tree... (which is what I want the script to print out...)
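
Since each entry sits on its own line, one way to get that output is to skip the word-counting pattern and test each line as a whole. The following is a rough sketch, not the script's own code: `matching_lines()` and the sample text are invented for illustration, and `$text` stands in for the file contents the script reads with fread().

```php
<?php
// Hypothetical helper: return every complete line of $text that contains
// $keyword. Pass true for $matchCase to make the comparison case-sensitive.
function matching_lines($text, $keyword, $matchCase = false)
{
    $hits = array();
    if ($keyword === '') {
        return $hits; // an empty keyword would match nothing useful
    }
    // Split on Windows, old Mac and Unix line endings alike.
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        $pos = $matchCase ? strpos($line, $keyword) : stripos($line, $keyword);
        if ($pos !== false) {
            $hits[] = $line;
        }
    }
    return $hits;
}

$text = "1 pear on the table\n3 granny apples are hanging from the tree\n2 plums";

foreach (matching_lines($text, 'apples') as $line) {
    // Escape the line before printing it into the HTML page.
    echo htmlspecialchars($line), "<br />\n";
}
```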

james438
04-15-2013, 04:15 PM
Please use the code I showed you. The error results should now display. The corrected code I posted was tested and does work on my end and displays full results.

The problem is primarily, but not entirely, with your regular expressions here:


if(preg_match_all("/((\s\S*){0,3})($keyword|$keyword_html)((\s?\S*){0,5})/i", $auszug, $match, PREG_SET_ORDER)); {
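
A sketch of one way to adapt that line so it captures the whole entry: replace the two word-counting groups with `[^\r\n]*`, which matches everything up to the line break on either side of the keyword. The helper name `full_entries()` and the sample text are made up for illustration, and this is not the version tested above. Because the group numbering changes (the keyword becomes group 2 and the rest of the line group 3), the printf() call has to use the new indexes.

```php
<?php
// Hypothetical helper: find each line containing $keyword and split it into
// the text before the keyword, the keyword itself, and the text after it.
function full_entries($text, $keyword)
{
    $kw = preg_quote($keyword, '/'); // escapes regex characters and the "/" delimiter
    preg_match_all('/([^\r\n]*)(' . $kw . ')([^\r\n]*)/i', $text, $match, PREG_SET_ORDER);
    return is_array($match) ? $match : array();
}

$auszug = "1 pear on the table\n3 granny apples are hanging from the tree\n2 plums";

foreach (full_entries($auszug, 'apples') as $m) {
    // Same layout as the original printf(), minus the trailing "..".
    printf("<i><b>&gt; </b> %s<b>%s</b>%s</i><br />\n",
        htmlspecialchars($m[1]), htmlspecialchars($m[2]), htmlspecialchars($m[3]));
}
```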