Here's the standalone page as promised:
PHP Code:
<?php
$t1 = microtime(1); //start counting...
//connect to DB
mysql_connect('HOST','USER','PASS') or die ('db conn problem'); /////EDIT THIS
mysql_select_db('DBNAME'); /////EDIT THIS
//MultiByte String setup
//setup php for working with Unicode data
mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
mb_http_input('UTF-8');
mb_language('uni');
mb_regex_encoding('UTF-8');
ob_start('mb_output_handler');
function mbstring2chararray($s,$maxcharlen=4) {
if ($maxcharlen!=4) {
echo 'WARNING: mbstring2chararray() can only do [up to] 4byte chars now.';
//this should be later fixed, but the syntax is hard...
}
$o = array();
for($x=0;$x<strlen($s);$x++) {
$char = $s[$x];
$skipx = 0; //assume skipping no characters
//now go through each byte number:
//if the sequence of bytes is one character, then group them:
//a 2byte char
if (($x<(strlen($s)-1))&&mb_strlen($s[$x].$s[$x+1])===1) {
$char = $s[$x].$s[$x+1]; $skipx = 1;
}
//a 3byte char
if (($x<(strlen($s)-2))&&mb_strlen($s[$x].$s[$x+1].$s[$x+2])===1) {
$char = $s[$x].$s[$x+1].$s[$x+2]; $skipx = 2;
}
//a 4byte char
if (($x<(strlen($s)-3))&&mb_strlen($s[$x].$s[$x+1].$s[$x+2].$s[$x+3])===1) {
$char = $s[$x].$s[$x+1].$s[$x+2].$s[$x+3]; $skipx = 3;
}
///will not currently do more than 4byte chars!
//how many following letters [and loop cycles] do we skip?
$x = $x+$skipx;
$o[] = $char; //store it
}
return $o; //return final array
}
//transform a user search to a mysql code 'WHERE' string
function search2mysql($q,$column,$maxterms=10,$ifnull=1) {
//$q string to convert;
//$column [string or string array to concatenate] to be searched;
//$maxterms limits search length to prevent overloading the server; 0 is unlimited;
//$ifnull 0 or 1: compensate for possible null values? default is YES
if (!is_string($q)||$q=='') { return FALSE; } //must have a [nonblank] string...
$q = mbstring2chararray($q); //split query into [multibyte] character array
$qwords = array(); //setup an array to store the content 'words'
$ac=0; //setup array counter for search content array
$inquote = 0; //setup for check for being within quoted [literal] search data: 0 [false] or 1 [true]
$plevel = 0; //setup counter for levels of parenthetical embedding
$qstring = ''; //setup a blank term for the query syntax, later to get content too
$newword = 1; //status for whether we're at a word boundary
$negwords = array(); //setup blank array for tracking negative terms ("NOT")
$cswords = array(); //setup blank array for tracking case-sensitive terms [within quotes]
$qlen = count($q); //how many characters?
for($l=0;$l<$qlen;$l++) { //go through each character of the input and parse:
//open quote: not in quotes and at word boundary
if ($q[$l]=='"'&&$newword==1&&$inquote==0) {
$inquote=1;
}
//end quote: found while at the end of a word and while in quotes
elseif ($q[$l]=='"'&&$inquote==1&&($l==($qlen-1)||$q[$l+1]==' '||$q[$l+1]==')')) {
$inquote=0;
$cswords[] = $ac; //add this to a list of words to keep case sensitive
}
//start parentheses: at new word and not in quotes
elseif ($q[$l]=='('&&$newword==1&&$inquote==0) {
$qstring .= '#('; //add the parenthesis and marker [for later]
$plevel++;
if ($plevel<0) { //error: can't have close before open
return FALSE;
}
}
//end parentheses: at end of word and not in quotes
elseif ($q[$l]==')'&&($l==($qlen-1)||$q[$l+1]==' '||$q[$l+1]==')')&&$inquote==0) {
$qstring .= ')#'; //add the parenthesis and marker [for later]
$plevel--;
}
//a space is a break unless in a quote:
elseif ($q[$l]==' '&&$inquote==0) {
$newword=1;
$ac++;
}
//do OR which must be both the beginning and end of a word:
elseif ($newword==1&&$q[$l]=='O'&&($l<($qlen-1))&&$q[$l+1]=='R'&&($l==($qlen-2)||$q[$l+2]==' '||$q[$l+1]=='(')&&$inquote==0) {
$l++; //skip past the R
$l++; //skip past extra space to avoid issues
$qstring .= ' OR ';
}
//deal with - marker for "NOT" terms
elseif ($newword==1&&$q[$l]=='-'&&($l<($qlen-1))&&$q[$l+1]!='('&&$inquote==0&&($l==0||$q[$l-1]!='-')) {
$negwords[] = $ac;
}
//deal with - marker for "NOT" parentheticals
elseif ($newword==1&&$q[$l]=='-'&&($l<($qlen-1))&&$q[$l+1]=='('&&$inquote==0&&($l==0||$q[$l-1]!='-')) {
$qstring .= '#NOT ';
}
else { //not syntactic, must be content
if ($newword==1) { //begin a new word
$newword=0; //no longer a new word
$qwords[$ac] = ''; //setup blank string
$qstring .= '#*'.$ac.'*#'; //add the reference + markers [for later]
}
$qwords[$ac] .= $q[$l]; //add the letter
}
}
//errors:
if ($inquote==1) { //uneven quotes
return FALSE;
}
if ($plevel!=0) { //uneven parentheses
return FALSE;
}
if (in_array('',$qwords)) { //a blank word was found: input is badly formatted
return FALSE;
}
if (count($qwords)==0) { //no blank searches...
return FALSE;
}
if ($maxterms!=0&&count($qwords)>$maxterms) { //over max length: prevent overload
return FALSE;
}
if (strpos($qstring,'###')!==FALSE) { //weird error with markers
return FALSE;
}
//end errors
$qstring = str_replace('##',' AND ',$qstring); //adjacent terms need to be conjoined with AND
$qstring = str_replace('#','',$qstring); //no need for floating markers now
//format $column including concatenation as needed
if (!is_array($column)) { //must switch string to array
$skipconcat=1; //skip concatenation later
$column = array($column); //convert to array
}
foreach($column as $colk=>$col) { //format each column individually
$col = '`'.$col.'`'; //format as column
if ($ifnull==1) {
$col = 'IFNULL('.$col.',\'\')'; //avoid null errors
}
$column[$colk] = $col; //store
}
$column = implode(', ',$column); //implode them into a list
if (!isset($skipconcat)) { //concatenate them:
$column = 'concat('.$column.')';
}
//escape data, then format as a condition
foreach($qwords as $qwkey=>$qword) { //go through the words
//escaping:
$qword = mysql_real_escape_string($qword); //make safe!
$qword = str_replace('%','\\%',$qword); //escape % so it's not a wildcard
$qword = str_replace('_','\\_',$qword); //escape _ so it's not a wildcard
//format as condition:
$not = in_array($qwkey,$negwords)?' NOT ':' '; //is it negated?
//handle case sensitivity: lcase matching normal; in quotes is case sensitive
$lcase = ''; //set to blank by default
if (!in_array($qwkey,$cswords)) { //this is a non-case-sensitive word [otherwise do nothing special]
$lcase = 'lcase('; //set col prefix
$not = ')'.$not; //add col end to trailing $not var
$qword = mb_strtolower($qword); //make string lowercase
}
$qwords[$qwkey] = $lcase.$column.$not.'LIKE \'%'.$qword.'%\''; //format as a mysql query condition
}
//now that the terms and query are setup separately, time to put them together
$invar=0; //tracker value
$q = ''; //reset as blank string for output
for($x=0;$x<strlen($qstring);$x++) { //go through the whole query string
if ($qstring[$x]=='*'&&$invar==0) { //start compiling a key
$invar=1; //set as compiling key
$qwkey = ''; //setup blank string as key
}
elseif ($qstring[$x]=='*'&&$invar==1) { //key end reached
$q .= $qwords[$qwkey]; //add the word to the query
$invar=0; //reset
}
elseif($invar==1) { //continue compiling replace key
$qwkey .= $qstring[$x];
}
else { //it's just syntax
$q .= $qstring[$x];
}
}
return $q; //return the final CONDITION(s) for WHERE; use as required
}
///do the stuff and setup the output!
$out='';
if (isset($_POST['query'])) {
$condition = search2mysql($_POST['query'],array('COL1','COL2')); /////EDIT THIS
$r = mysql_query("SELECT * FROM `YOURTABLE` WHERE $condition ORDER BY `ORDERCOLUMN`;"); ////EDIT THIS
$out = '<hr><b>Results found: '.mysql_num_rows($r).'</b><hr>';
$count=0;
while (($result = mysql_fetch_assoc($r))&&$count<10) {
extract($result);
$out .= "<b>$COL1<b><hr>";
$count++;
}
$out .= microtime(1)-$t1;
}
?>
<html>
<head>
<title>search test</title>
</head>
<body>
<form action="?" method="post">
Query: <textarea name="query"><?php if (isset($_POST['query'])) { echo $_POST['query']; } ?></textarea> <input type="submit" value="submit">
</form>
<p style="font-family:courier;font-size:80%;"><?php echo isset($condition)?($condition===FALSE?'error':$condition):''; ?></p>
<p><?php echo $out; ?></p>
</body>
</html>
How to set it up:
1. At the top you must setup your database connection. Do that as you'd like. 2 lines to edit using my current syntax.
2. Near the bottom (in the last bit of PHP) you must edit two lines to reflect your DB's structure.
I'm well aware that the HTML there is lazy and that the PHP even (aside from the function) may be messy. But it works for testing.
This uses mysql_real_escape_string(), so I think it's reasonably safe, but I certainly wouldn't advise yet doing anything public with it. Select statements luckily aren't that dangerous, but be a bit careful...
Also, if you have any trouble reading it (it's well commented, but also very complex), feel free to ask. The overall logic of the situation is:
1. Go through character by character and split up the input into content bits and syntax bits.*
2. Toss the syntax bits into a pseudo query without any content, leaving some markers for later
3. Toss the content bits into an array of 'words' to search for later, keeping references [array keys] to the markers in the syntax
4. Parse all of this now individually: A) make sure the syntax of the query is not badly formed; B) parse all of the "words" into the syntax for a "LIKE" condition in MySQL
5. Now that the parts are ready, replace the markers with the corresponding "words", and put the whole query back together.
(6. Place this after 'WHERE' in a MySQL query and do the search.)
(*Regex might seem better here, but due to the immense recursive complexity here, I'm not even going to think about trying that.)
Bookmarks