Page 2 of 4 FirstFirst 1234 LastLast
Results 11 to 20 of 37

Thread: Theories of a search algorithm

  1. #11
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,164
    Thanks
    265
    Thanked 690 Times in 678 Posts

    Default

    Ok, that makes sense. From what you've both said, this sounds reasonable and probably much more useful than spending more endless hours trying to hack mediawiki to function within my site or something else. The rest of the functions should be easy enough, but I didn't want to get stuck with a search that doesn't work well after all that.

    I can deal with all of the issues mentioned above, mostly outside of MySQL. I'll remove common words like "a", "the", etc. But I don't want to search an index without them because sometimes you do want those words-- such as when you put them in quotes. There's a lot of user-input parsing to deal with, but the actual search should be easy enough.

    I also don't often expect to have a huge number of results, or at least not unless someone searches a word that is too relevant to be useful.
    Daniel - Freelance Web Design | <?php?> | <html>| espa˝ol | Deutsch | italiano | portuguŕs | catalÓ | un peu de franšais | some knowledge of several other languages: I can sometimes help translate here on DD | Linguistics Forum

  2. #12
    Join Date
    Aug 2005
    Location
    Other Side of My Monitor
    Posts
    3,494
    Thanks
    5
    Thanked 105 Times in 104 Posts
    Blog Entries
    1

    Default

    You could put a limit on the search results. Kind of like how Google does with their "Similar results hidden, click here to show results" thing. You could limit the results to show the first 5 or 10 or 20, then say "there were more but found to be less relevant, would you like to see them?" Or something.
    {CWoT - Riddle } {Freelance Copywriter} {Learn to Write}
    Follow Me on Twitter: @InkingHubris
    PHP Code:
    $result mysql_query("SELECT finger FROM hand WHERE id=3");
    echo 
    $result

  3. #13
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,164
    Thanks
    265
    Thanked 690 Times in 678 Posts

    Default

    I just did some testing to see what happens with all of this.

    I took the database from an old forum and search through the posts.
    There are 27000 posts.

    For each matching post, I echoed the username of the poster and the full text of the post. Obviously in a search this may not be needed, but it will give an idea of the higher end of resource-consumption.


    For a standard against which to compare, echoing ALL 27000 posts took 13 seconds.

    Now searching through using LIKE %term%:
    0 matches: 0 seconds.
    232 matches: 3 seconds.
    2500 matches: 5 seconds.



    While that's a bit worrying I don't really see much of a problem with that.


    And now echoing only the id of the post:
    2500 matches: .3 seconds.



    Finally, if I limit the DISPLAYED results to 10 and only echo the post IDs, then:
    232 matches (10 IDs displayed): .2 seconds
    2500 matches (10 IDs displayed): .25 seconds.

    (And if I remove all output and just loop through [doing nothing] for 10 entries, then it takes .12 seconds.)


    Does that sound like a reasonable time to you?

    If a few people are using the system at once, will it begin to slow down the server?

    It's certainly not looking terrible, but it's still showing a little lag on the server.



    Also, as an interesting note, I believe that I have now figured out that MySQL caches results: once the initial search takes .2 seconds (for the last round of testing) after that it's always about .001 seconds.
    Daniel - Freelance Web Design | <?php?> | <html>| espa˝ol | Deutsch | italiano | portuguŕs | catalÓ | un peu de franšais | some knowledge of several other languages: I can sometimes help translate here on DD | Linguistics Forum

  4. #14
    Join Date
    Jan 2007
    Location
    Davenport, Iowa
    Posts
    2,137
    Thanks
    96
    Thanked 98 Times in 96 Posts

    Default

    Also, as an interesting note, I believe that I have now figured out that MySQL caches results: once the initial search takes .2 seconds (for the last round of testing) after that it's always about .001 seconds.
    Good to know.
    To choose the lesser of two evils is still to choose evil. My personal site

  5. #15
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,164
    Thanks
    265
    Thanked 690 Times in 678 Posts

    Default Negation in MySQL: 'NOT'?

    This was originally in a new thread but I realized I should merge it here.


    I'm working on dynamically generating queries for a search engine and it's very complex. The good news is that I have gotten basically everything else working and it can do multiple levels of parentheticals and use OR. And is implied by spaces.
    This all works.

    But now I need to add negation.

    So:
    a OR (b c)

    That's easy enough.

    But how would I do:
    a OR -(b c)

    Meaning: "find something where X is like a or X is not like b and not like c"


    The basic solution here is to find a true negation operator in MySQL, but I am not sure how this works.

    Both ! and NOT are operators, but I can't find ANY reference on them aside from specific cases (like "NOT LIKE" and "NOT EXISTS").

    Is this a valid query:
    ...WHERE NOT (`field` LIKE '%term%')

    So that will find results that do not have 'term' anywhere in them?



    The problem is that if I can't find an operator that will work across parentheticals then I'll need to "multiply out" all parentheticals and that will just be awful...
    (remember, this is all dynamic)
    Last edited by djr33; 05-06-2010 at 02:51 AM.
    Daniel - Freelance Web Design | <?php?> | <html>| espa˝ol | Deutsch | italiano | portuguŕs | catalÓ | un peu de franšais | some knowledge of several other languages: I can sometimes help translate here on DD | Linguistics Forum

  6. #16
    Join Date
    Apr 2008
    Location
    So.Cal
    Posts
    3,643
    Thanks
    63
    Thanked 516 Times in 502 Posts
    Blog Entries
    5

    Default

    other than creating a really long query -maybe like WHERE (`a` LIKE '%b%' (NOT (`a` LIKE '%c%'))) - I'm not sure. Maybe you could create a VIEW and run a second query on it to filter out the NOT results. Or do it in php.

    Have you done much experimenting with either of those?

  7. #17
    Join Date
    Jul 2006
    Location
    just north of Boston, MA
    Posts
    1,806
    Thanks
    13
    Thanked 72 Times in 72 Posts

    Default

    Code:
    ... WHERE (field NOT LIKE 'x' AND field NOT LIKE 'y')

  8. #18
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,164
    Thanks
    265
    Thanked 690 Times in 678 Posts

    Default

    Is there no way at all to invert the value of a section? I understand that I could "multiply it out", but that would nearly impossible because this is generated automatically from any possible input including multiple levels:
    Search input: "a OR -(b -(c d OR -e)))"

    To create a dynamic system capable of multiplying out the negation, that would require tracking layers of parentheticals, multiple levels of negation and other factors. It's possible, but while I was thinking through how hard it would be, I decided it's best to find some other way because it's so hard...


    Is there some reason mysql can't do a higher-order negation? Is this part of its efficiency/processing?

    I really have no idea how mysql works at a technical level, so perhaps I'm missing something.


    ---
    traq, no, I have not done much experimenting yet. I thought it would be best to ask first because mysql confuses me especially with complex queries. I can handle all the normal single-level queries, but once it gets like this I wanted to check a few things first.

    Code:
    WHERE (`a` LIKE '%b%' (NOT (`a` LIKE '%c%')))
    Is this valid syntax? My parser is already generating this (from 'b -(c)').
    Actually, it would generate:
    Code:
    WHERE `a` LIKE '%b%' AND NOT (`a` LIKE '%c%')
    I could adjust the syntax slightly if the extra parentheses help or something.

    ---

    In short, I just need to know if "NOT" can act as a standalone operator negating what's inside a parenthetical section.

    Code:
    WHERE (NOT (term))
    Is that valid within a where statement?

    I have been searching google, but I can't find any reference to NOT as a standalone operator. It's only referenced within constructs such as "NOT LIKE" and "NOT EXISTS" etc.


    Here's the most "useful" page I've found, but it's not clear at all:
    http://dev.mysql.com/doc/refman/5.1/...l#operator_not
    Last edited by djr33; 05-05-2010 at 05:24 PM.
    Daniel - Freelance Web Design | <?php?> | <html>| espa˝ol | Deutsch | italiano | portuguŕs | catalÓ | un peu de franšais | some knowledge of several other languages: I can sometimes help translate here on DD | Linguistics Forum

  9. #19
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,164
    Thanks
    265
    Thanked 690 Times in 678 Posts

    Default

    Apparently this works:
    http://www.hscripts.com/tutorials/my...erator.php#not

    So you can negate a parenthetical section.

    I'll test this out later. I wish this was actually documented somewhere. In another tutorial it said you could only use NOT before IN, BETWEEN, and EXISTS....
    Daniel - Freelance Web Design | <?php?> | <html>| espa˝ol | Deutsch | italiano | portuguŕs | catalÓ | un peu de franšais | some knowledge of several other languages: I can sometimes help translate here on DD | Linguistics Forum

  10. #20
    Join Date
    Apr 2008
    Location
    So.Cal
    Posts
    3,643
    Thanks
    63
    Thanked 516 Times in 502 Posts
    Blog Entries
    5

    Default

    Code:
    WHERE (`a` LIKE '%b%' (NOT (`a` LIKE '%c%')))
    I've been experimenting with stuff like that lately; I think I did try that particular statement, but I'd have to check. I may have an extra set of parenthesis in there. I'll let you know.

    I think the big stumbling block here is that you can write WHERE(`a` LIKE '%b%' AND NOT `a` LIKE '%c%') but you can't just write WHERE(`a` LIKE '%b%' NOT '%c%')...

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •