Hi,
How can I check if a PHP string includes English characters?
My site is in Hebrew, so I want to check whether a string is in Hebrew or English.
Thanks
There may be a function already made for this, but here's something that might work:
Not sure if this is what you want, as this will only tell you if the string contains at least one English character, so it's not foolproof. I'm also not sure if Hebrew uses any of the characters in the Greek alphabet at any point in the language.
PHP Code:
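<?php
// Lowercase letters a-z; the search below is case-insensitive, so capitals match too.
$letters = array('a','b','c','d','e','f','g','h',
                 'i','j','k','l','m','n','o','p',
                 'q','r','s','t','u','v','w','x','y','z');
$string = 'Atlas';
$has_english = false;
foreach ($letters as $letter) {
    // stripos() returns 0 for a match at the very start of the string,
    // so compare against false explicitly instead of relying on truthiness.
    if (stripos($string, $letter) !== false) {
        $has_english = true;
        break;
    }
}
echo 'The string ';
echo ($has_english) ? 'contains' : 'does not contain';
echo ' English characters.'; // Outputs "The string contains English characters."
?>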
Schmoopy, this isn't a major point, but note that the Greek alphabet is a separate system (alpha α, beta β, gamma γ, etc). The Latin alphabet or Roman alphabet is used for writing most of the European languages including English. (A third system, the Cyrillic alphabet, is used for Slavic languages like Russian. And as a bit of trivia, it's interesting to note that the Greek alphabet is actually the script on which the others are based; and it can be linked all the way back to the Phoenician script, which in turn is related to Egyptian hieroglyphs. In fact, a great many of the writing systems in the world, including Hebrew (and Arabic), are derived from Phoenician. Among the major writing systems, Chinese (and related systems) and the ancient Mayan system are notable exceptions. But that's off topic.)
-----------
d-machine, Schmoopy's answer will give you a basic check: if it has any English at all, call it English. That's not a guarantee, because I am guessing that a string might sometimes contain both. Checking for Hebrew characters instead might be slightly more accurate, but the same problem could occur. For that reason, the best answer might be to count all of the characters in the string and compare whether English or Hebrew has more. Even that might not be completely accurate, in a case such as:
The man said in Hebrew: ".......long Hebrew quote......."
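The counting idea could be sketched roughly like this. It assumes the string is UTF-8; `dominantScript` is a made-up helper name, and treating a tie as English is just an assumption for illustration.

```php
<?php
// Hypothetical helper: compares the number of Hebrew letters with the
// number of English (A-Z, a-z) letters in a UTF-8 string.
function dominantScript(string $text): string
{
    // The /u modifier makes PCRE treat the pattern and the subject as UTF-8.
    // U+05D0 (alef) to U+05EA (tav) covers the Hebrew letters, final forms included.
    $hebrew = preg_match_all('/[\x{05D0}-\x{05EA}]/u', $text);
    $latin  = preg_match_all('/[A-Za-z]/u', $text);

    if ($hebrew === false || $latin === false) {
        return 'unknown'; // e.g. the input was not valid UTF-8
    }
    if ($hebrew === 0 && $latin === 0) {
        return 'unknown'; // no letters from either script
    }
    // Assumption: a tie counts as English.
    return ($hebrew > $latin) ? 'hebrew' : 'english';
}

echo dominantScript('Hello שלום world'), "\n"; // prints "english" (10 letters vs. 4)
```

As noted above, a short English lead-in followed by a long Hebrew quote would still come out as Hebrew with this approach.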
What is the purpose of this code? I'm guessing you are taking user input and trying to format it correctly on the page using left to right or right to left directions for the HTML elements?
One option would be to split it up by paragraphs (line breaks) and check the first symbol* of each paragraph to see if it's English or Hebrew.
(*skipping past punctuation to find a real letter, perhaps)
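Here is a rough sketch of the per-paragraph idea, assuming UTF-8 input and assuming the goal really is to pick a text direction. `paragraphDirection` is a hypothetical name, and falling back to left-to-right when no letter is found is my own choice.

```php
<?php
// Hypothetical helper: guesses 'rtl' or 'ltr' from the first real letter
// of a paragraph, skipping digits, spaces and punctuation.
function paragraphDirection(string $paragraph): string
{
    // Find the first character that is either a Hebrew or an English letter.
    if (preg_match('/[\x{05D0}-\x{05EA}A-Za-z]/u', $paragraph, $m) === 1) {
        // Decide which of the two scripts that first letter belongs to.
        return preg_match('/[\x{05D0}-\x{05EA}]/u', $m[0]) === 1 ? 'rtl' : 'ltr';
    }
    return 'ltr'; // Assumption: default to left-to-right when no letter is found.
}

$text = "The man said in Hebrew:\n\"שלום\" he said.";
// \R matches any line-break sequence, so this splits on one or more line breaks.
foreach (preg_split('/\R+/u', $text) as $paragraph) {
    echo paragraphDirection($paragraph), "\n";
}
```

The first paragraph starts with an English letter and the second starts (after the quotation mark) with a Hebrew one, so they get different directions.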
There is a complication, at least when dealing with Hebrew. What character encoding are you using? Are you sure it's being used consistently? The best solution will be to use Unicode because it is standardized. This will then also allow you to know exactly what each character is. But if you are mixing encodings, or using a non-Unicode encoding, then any code designed to check for Hebrew Unicode characters won't work at all.
The most accurate way to check if there is Hebrew, assuming Unicode, will be to look to see if the string contains any characters in the Unicode range for Hebrew (the Hebrew block, U+0590 to U+05FF).
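Before running any Unicode-based check, it may be worth confirming that the input really is valid UTF-8. One possible sketch is below; `isValidUtf8` is a hypothetical helper, and it relies on the fact that `preg_match()` with the `/u` modifier fails on invalid UTF-8. If the mbstring extension is installed, `mb_check_encoding($text, 'UTF-8')` does a similar job.

```php
<?php
// Hypothetical helper: returns true only if $text is well-formed UTF-8.
function isValidUtf8(string $text): bool
{
    // With /u, preg_match() returns false when the subject is not valid UTF-8;
    // the empty pattern matches any valid string, returning 1.
    return preg_match('//u', $text) === 1;
}

$utf8   = 'שלום';     // assumes this source file itself is saved as UTF-8
$legacy = "\xF9\xE0"; // bytes in the style of a legacy single-byte Hebrew encoding

var_dump(isValidUtf8($utf8));   // bool(true)
var_dump(isValidUtf8($legacy)); // bool(false)
```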
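As a minimal sketch of that range check, assuming the string is UTF-8 (`containsHebrew` is just an illustrative name):

```php
<?php
// Hypothetical helper: true if the UTF-8 string has at least one character
// in the Unicode Hebrew block (U+0590 to U+05FF).
function containsHebrew(string $text): bool
{
    // /u switches PCRE to UTF-8 mode so \x{...} refers to code points.
    return preg_match('/[\x{0590}-\x{05FF}]/u', $text) === 1;
}

var_dump(containsHebrew('Atlas'));      // bool(false)
var_dump(containsHebrew('Atlas שלום')); // bool(true)
```

Note that invalid UTF-8 input makes `preg_match()` return false, so this function reports false in that case rather than raising an error.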
http://en.wikipedia.org/wiki/Unicode...ebrew_alphabet
You can also use the same method for English if you want, using the Unicode range for the letters rather than typing them out individually. Note that you might want to include capital letters in addition to lowercase.
However, maybe the best solution for all of this will be to use a third party language guesser. A good one is part of Google's language API (and Google Translate):
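For the English side, something along these lines should work in place of the letter-by-letter array. The ranges U+0041 to U+005A and U+0061 to U+007A are the capital and lowercase letters, equivalent to writing `[A-Za-z]`; `containsEnglishLetter` is a hypothetical name.

```php
<?php
// Hypothetical helper: true if the UTF-8 string has at least one letter A-Z or a-z.
function containsEnglishLetter(string $text): bool
{
    // Capital letters (U+0041-U+005A) and lowercase letters (U+0061-U+007A).
    return preg_match('/[\x{0041}-\x{005A}\x{0061}-\x{007A}]/u', $text) === 1;
}

var_dump(containsEnglishLetter('Atlas')); // bool(true)
var_dump(containsEnglishLetter('שלום'));  // bool(false)
```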
http://code.google.com/intl/en-US/apis/language/
There's some info in this discussion here:
http://stackoverflow.com/questions/1...-string-in-php
You can use JavaScript as that suggests or you can use PHP with the newer version of the API:
http://code.google.com/intl/en-US/ap...n_snippets_php
If you do use Google, remember that it might guess any language, not just Hebrew or English. So I think the best solution will be to check if the result is Hebrew. If not, assume English.
Last edited by djr33; 04-02-2011 at 07:14 PM.
Daniel - Freelance Web Design | <?php?> | <html>| español | Deutsch | italiano | português | català | un peu de français | some knowledge of several other languages: I can sometimes help translate here on DD | Linguistics Forum
Ah, sorry, I meant to say the (Latin?) alphabet... whichever one is A-Z. I thought it was Greek, but it seems I'm mistaken.
I think we all understood what you meant. I was just adding that for clarity. Yes, "Latin" is the most common term for it. (That's still actually a little misleading, though: Latin only had capital letters, and it lacked some of the 26 we use in English, like u and w. And other languages that use the "Latin" alphabet may not have all of those or may have some extras, like ç or ø or ß. Regardless, checking for those 26 will usually tell you that it's some European language, but not much more than that. I think it should be officially called the "English alphabet" but no one actually uses that term. The reason for this is that computer systems were designed with English. So in a broad sense, "Latin" is a better term, but specifically for computer encodings, "English" seems more accurate. That's why a symbol like ß might be somewhere completely different in the Unicode system, even though in German it's just as "normal" as any other letter, like 's' or 't'.)
[This is what happens when a linguist answers a programming question, haha. Sorry for being off topic. Some of this actually might be important if you're trying to separate European languages, but if you're just comparing to Hebrew as in this question it's probably mostly irrelevant.]
Last edited by djr33; 04-02-2011 at 10:26 PM.