Log in

View Full Version : mySQL LIKE



nikomou
12-30-2008, 01:38 AM
Hello,

I have a field in my table with a string of letters, each letter represents a "feature" of a product.. e.g. if a product has touch screen and has bluetooth the string would be tb.

This gets a little complicated when a product has lots of features like this
scbmrtl

Using mySQL LIKE, i would want to search for products with multiple features.

How can i construct search string in order for this to work?


Features LIKE '$searchstring'
i have tried %t%b% but this doesnt always work... like in the string above..

james438
12-30-2008, 08:44 AM
select column from table where column regexp '[scbmrtl].*[scbmrtl]'

Is the best I can think of. The above will match 'ss' and 'scb' and 'sss' and 'eppsppt' but not 's' or 'zvc'. It is designed to look for at least two letters (from your list) in your column even if the same one is listed twice.

I was wracking my brains to try and get LIKE to work, but couldn't get anything to work. The more I thought about it your problem was begging for a regexp to step in. It is best to avoid using regexp except when you need it, because regexp tends to be server heavy. This may not be the case with MySQL queries like it is with PHP, but I doubt it. To read up more on regexp and MySQL go here (http://dev.mysql.com/doc/refman/5.0/en/regexp.html).

EDIT: Actually I think I could write out the code where LIKE would be used instead of using regexp, but the query would be around 15 to 20 times longer and would probably be just as server heavy as the above code if not more so.

Twey
12-30-2008, 01:33 PM
This is a fine example of how not to design your tables.

There are two options, both of which are superior to this:Use a separate column boolean for each feature, or Use a bitset.Using separate columns may make your table quite wide, but is also the simplest (and probably most 'correct') way of doing it. The biggest disadvantage is that when adding a new feature you must add a whole new column. If you're not going to be adding new features to the table very often, that probably doesn't matter.

Using a bitset, on the other hand, involves only a single column, and you can search the table for (a) specific feature(s) with the built-in binary operators. Adding a feature involves only setting the appropriate value. However, this does make the data a little harder to decipher at a glance (documentation for the meaning of each position in the bitset must be stored elsewhere, probably in the code), and won't be quite as efficient as the separate columns (although still much more efficient than your original strings).

Note that % does match any number of characters, including none; your example above should work.

nikomou
12-30-2008, 03:37 PM
hello, i did the obvious and just used this for each of the features:

.. AND Features LIKE '%b%' AND Features LIKE '%t%' .. etc

thanks for your help :)

james438
12-31-2008, 08:24 AM
Twey, when I design tables, my goal is to try to keep it as simple and efficient as possible. Having a seperate column for each boolean feature just sounds needlessly complicated. The same is true with bitset, however I must add that I am not very familiar with bitset nor of any good reason to use bitset. I am sure there is a good reason, but I am simply not aware of one. I tried reading up on bitset over in dev.mysql and got a little more understanding of what and how bitset is used, but not why.


Note that % does match any number of characters, including none; your example above should work. The first part is true. the second part of your statement is not. What if 'b' and 't' were reversed in position? Then it would not work.

nikomou, your example will only work in certain circumstances. The way you have it written all of the features must be present in order for the statement to be true as opposed to two or more non matching features being present in a given field.

I am happy to see that people do read this part of the forum. MySQL seems like a rather neglected subject.

Twey
12-31-2008, 02:35 PM
What if 'b' and 't' were reversed in position? Then it would not work.Yes, but why not simply sort the string first, or insert new features in alphabetical order? That way you don't have to worry about them being in orders other than the ones you specified.

That's all academic anyway, though, because:
Twey, when I design tables, my goal is to try to keep it as simple and efficient as possible.... you've managed to find the least simple and least efficient manner of tackling this particular problem :)
Having a seperate column for each boolean feature just sounds needlessly complicated.When thinking of SQL, don't consider what the tables look like; they only have to be created once. Consider, instead, the queries. The separate columns are actually the simplest approach. I think that any layman could say with reasonable confidence that SELECT * FROM table WHERE has_bluetooth AND has_touchscreen is considerably less complicated than SELECT * FROM table WHERE features LIKE '%b%' AND features LIKE '%t%', but what they probably couldn't point out is the relative expense of performing those string operations: both the individual columns option and the bitset option are effectively O(1) on the match, but matching an arbitrary substring is O(m*n) in the worst case, with m being the length of the substring and n the length of the string, and you do it many many times in each query — at least once for each row in the table, making the whole thing O(m*n*o) (o being the number of rows), even if you're only doing a single match in the query. Including a number of matches p brings it up to O(m*n*o*p). Now, m (the length of the substring) will technically only ever be 1 in your case, so we can remove it from the equation, but n can be reasonably be expected to be around 10, p from 1 to 10, and o to be in the hundreds in even a small table — depending on the success of your site it might end up in the hundreds of thousands. On top of this there's overhead from parsing the LIKE pattern, which is not a simple substring but a more complicated domain-specific language, which gives us particularly large constants.

npsari
01-05-2009, 12:05 AM
Use explode()

Then you match every single letter alone (Using LIKE)

Then you display all the results together

This is the best way i can think of

Twey
01-05-2009, 03:31 AM
In the code, define (PHP):
define('BLUETOOTH', 1);
define('TOUCHSCREEN', 2);
define('OTHERFEATURE', 4);
define('NEWFEATURE', 8);Then:
mysql_query(sprintf('SELECT * FROM products WHERE features_bitset & %d',
BLUETOOTH + TOUCHSCREEN));Inserting them is done with +:
mysql_query(sprintf('INSERT INTO products (features_bitset) values (%d)',
BLUETOOTH + OTHERFEATURE + TOUCHSCREEN));And that's a bitset. Very simple, you just have to keep track of which value means what.

james438
01-05-2009, 09:16 AM
Sorry for the delay in response to your last statement. I just didn't think that there was much for me to ask or contribute further to this thread. My response to nikomou may have been the least efficient, but I was trying to work with the scenario that I was given without altering his table. Honestly the idea of using bit set or creating a potentially long table just didn't occur to me.

When thinking of SQL, don't consider what the tables look like; they only have to be created once.
I am more familiar with my own situation as opposed to others, but I find that I am often altering my tables as I find better and more efficient ways to organize, store, and retrieve my data. I also want my table to make as much sense to me as possible if I find that I need to go back and change or fix something.

This might scare you a bit, but I have one table that I designed which stores a rather large member timesheet within each row, because I did not want to create a new table for each day's worth of data generated. To make matters more complicated almost every column is a person's name, about 100, and since names are added, subtracted, and/or corrected frequently the table columns must be altered almost daily as well. Because of this I have modified my MySQL table creation to make my tables as understandable (mostly to me) as possible. Especially since I may need to go back and correct someone's name or improve something. I say that it may scare you, because you may know of a simpler method, but it was the best I could think of at the time ;). Yes, adding a column for each boolean feature does still sound needlessly complicated to me.


SELECT * FROM table WHERE has_bluetooth AND has_touchscreen
Sorry, could you explain this statement? I am not really seeing how this works. MySQL is an area where I feel I have only gotten my feet wet, so I have a great deal left to learn yet as opposed to PHP which I have worked much more with and am more comfortable with.


And that's a bitset. Very simple, you just have to keep track of which value means what. I am sure this is implied, but I would store the values in the database ;). I am currently reading up on bitset, by the way, as this is new to me. I only first heard about it in this thread. It sounds promising and I may end up redesigning some of my database search programs to incorporate it.

Twey
01-06-2009, 07:26 AM
I am sure this is implied, but I would store the values in the databaseThe overhead of the subquery would pretty much negate the speed benefits.
Yes, adding a column for each boolean feature does still sound needlessly complicated to me.Complicated? How could anything be simpler? It's certainly simpler than that convoluted mess of LIKE queries.
This might scare you a bit, but I have one table that I designed which stores a rather large member timesheet within each row, because I did not want to create a new table for each day's worth of data generated. To make matters more complicated almost every column is a person's name, about 100, and since names are added, subtracted, and/or corrected frequently the table columns must be altered almost daily as well.It's about time someone introduced you to the wonders of relationships (http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html). If your table changes frequently then there's something fundamentally wrong with your design. Specific, dynamic data like names should not be realised as columns. Nor should new tables be created with that frequency. If you want to post the relevant parts of your database layout, I'll normalise it for you.
MySQL is an area where I feel I have only gotten my feet wet, so I have a great deal left to learn yet as opposed to PHP which I have worked much more with and am more comfortable with.Clearly. I strongly suggest you read through a good tutorial before doing anything more with them. Bitsets are one thing, but you really should at least be familiar with relationships, and I disapprove of any teaching method which omits them from the curriculum.
Sorry, could you explain this statement?It's almost the simplest query imaginable — certainly the simplest using SELECT, WHERE, and AND. It says 'select every column from every row from the table "table" where the has_bluetooth column and the has_touchscreen column are both true.'