Thread: Detect the language of a string

  1. #1

    Detect the language of a string

    I have a string coming from a form, and I need to identify, using PHP, the language in which it was written. I have seen that there are methods using the Google API, but I think there are two problems: first, after a certain number of lookups I have to pay; second, it seems to require additional Zend libraries that I cannot install on my hosting. Maybe I'm wrong?
    Is there a way that is 100% free and does not require libraries that are difficult to install on a hosting account?

  2. #2

    There are a few things to know about this. First, a short text won't give you a very good guess: the more text you have, the better the guess will be, but it will still be a guess. The writing system gives you a good clue (Chinese vs. English, for example), but it won't help you tell similar languages apart.

    So one thing to consider is why you want to do this. For example, if you need to detect the language for text formatting (left-to-right vs. right-to-left, etc.) or text encoding*, then you can probably work from the writing system alone, which isn't too hard to figure out. You could even do that yourself just by checking which Unicode ranges the input falls in.
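
    A minimal sketch of that Unicode-range check, assuming PHP 7 or newer and UTF-8 input. The function names and the short list of scripts are illustrative only. It counts the characters belonging to each script with PCRE's Unicode script properties and returns the most frequent one.

```php
<?php
// Rough guess at the dominant writing system of a UTF-8 string.
// The script list is an illustrative subset, not a complete inventory.
function guess_script(string $text): string
{
    $scripts = ['Latin', 'Cyrillic', 'Greek', 'Arabic', 'Hebrew', 'Han',
                'Hiragana', 'Katakana', 'Hangul', 'Devanagari', 'Thai'];
    $best = 'Unknown';
    $bestCount = 0;
    foreach ($scripts as $script) {
        // preg_match_all returns the number of matches, or false on error
        // (for example invalid UTF-8), which the cast turns into 0.
        $count = (int) preg_match_all('/\p{' . $script . '}/u', $text);
        if ($count > $bestCount) {
            $best = $script;
            $bestCount = $count;
        }
    }
    return $best;
}

// True for the right-to-left scripts in the list above.
function is_rtl_script(string $script): bool
{
    return in_array($script, ['Arabic', 'Hebrew'], true);
}
```

    For example, guess_script('Привет мир') gives 'Cyrillic', and is_rtl_script(guess_script('שלום עולם')) is true. Japanese text mixes Han, Hiragana and Katakana, so you would want to treat those three together.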

    [*However, in most cases the best solution for text encoding is going to be Unicode: it has no major disadvantages, and it will encode anything.]


    You will probably need to use an external service (as far as I know, there's nothing simple you can install on the server beyond an API client). Google's detection is quite good, but they've made some odd decisions with the Translate API, and I would strongly recommend against relying on it: they discontinued it with little notice to developers, which was somewhat shocking given expectations and drew a lot of negative feedback. So Google probably isn't the answer, even though the technology is good. At this point I'm not sure what they're offering (or how long they will continue to support it), but I wouldn't trust it.

    Other companies offer similar services. In general, language detection is easier than translation because a few key indicators (such as the writing system and details within it) can often separate languages. In difficult cases, a detector has to rely on a dictionary to guess from the words in the text. Expect a noticeable failure rate.
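
    To give an idea of the dictionary approach, here is a toy sketch, assuming PHP 7.1 or newer with the mbstring extension. The function name and the tiny word lists are made up for illustration; a real detector needs far larger lists (or character n-grams) and will still fail on short input.

```php
<?php
// Count how many words of the text appear in each language's stop-word
// list and return the language with the most hits, or null if none match.
function guess_language_by_stopwords(string $text): ?string
{
    // Illustrative lists only; far too small for real use.
    $stopwords = [
        'en' => ['the', 'and', 'of', 'is', 'to', 'in'],
        'es' => ['el', 'los', 'y', 'es', 'una', 'que'],
        'it' => ['il', 'gli', 'e', 'è', 'una', 'che'],
        'de' => ['der', 'die', 'und', 'ist', 'das', 'nicht'],
    ];
    // Lower-case, then split on every run of non-letter characters.
    $words = preg_split('/[^\p{L}]+/u', mb_strtolower($text, 'UTF-8'),
                        -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return null; // invalid UTF-8
    }
    $best = null;
    $bestScore = 0;
    foreach ($stopwords as $lang => $list) {
        // array_intersect keeps duplicates from $words, so repeated
        // stop words count once per occurrence.
        $score = count(array_intersect($words, $list));
        if ($score > $bestScore) {
            $best = $lang;
            $bestScore = $score;
        }
    }
    return $best;
}
```

    With these lists, 'El perro y el gato que viven en una casa' scores highest for 'es', while a string with no listed words returns null, which is your signal that the guess can't be trusted.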

    Many of these are offered as JavaScript APIs, so you might need to adapt one for PHP.

    As for Google, I'm not sure about those technical details. There's probably some workaround (such as adapting the JavaScript API). What you need is a way to send a request and get the response, and then a way to parse it. The PHP API is probably not technically required, just the part that requests the info. Whether you're able to adapt it will depend on your skill in PHP (and your time). You can probably do it if you want to put the time into it.
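
    The "send a request, get the response" part needs no extra libraries. A rough sketch using only built-in PHP functions (PHP 7.1 or newer), assuming your host has allow_url_fopen enabled (otherwise the cURL extension would be the usual alternative). The function names are mine, and the URL in the usage note is a placeholder, not a real service.

```php
<?php
// Build stream-context options for a form-encoded POST. Kept separate so
// the request can be inspected without touching the network.
function build_post_options(array $fields, array $headers = []): array
{
    $headers[] = 'Content-Type: application/x-www-form-urlencoded';
    return [
        'http' => [
            'method'  => 'POST',
            'header'  => implode("\r\n", $headers),
            'content' => http_build_query($fields),
            'timeout' => 5, // seconds
        ],
    ];
}

// Send the POST and decode a JSON reply; returns null on any failure.
function post_for_json(string $url, array $fields, array $headers = []): ?array
{
    $context = stream_context_create(build_post_options($fields, $headers));
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        return null; // network error, or allow_url_fopen is disabled
    }
    $decoded = json_decode($body, true);
    return is_array($decoded) ? $decoded : null;
}
```

    Usage would look something like post_for_json('https://api.example.com/detect', ['q' => $text]), with the real URL, field names and any key header taken from the documentation of whichever service you pick.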


    As a general answer, you will need to search for various services and look at what they offer.

    https://www.google.com/search?q=language+detection+api
    The first result, detectlanguage.com, looks relevant. I haven't tried it**, but they offer JSON responses (which is a common format and easy to use even if you get it raw) and information about using it in PHP.

    [**I just tested it out now, and I really like it. The response is clear and includes a confidence level so you know how reliable the guess is. And I tried a handful of languages, and the results were good.]
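
    Here is a rough sketch of how you might use such a confidence value in PHP 7.1 or newer. The JSON layout below is an assumption for illustration only (the field names "detections", "language" and "confidence" and the 0 to 1 scale are hypothetical), so check the service's documentation for the real format and adjust.

```php
<?php
// Return the language of the most confident detection, or null when the
// reply is unusable or nothing reaches the minimum confidence.
// ASSUMPTION: the reply looks like
//   {"detections":[{"language":"es","confidence":0.93}, ...]}
// These field names and the confidence scale are hypothetical.
function best_detection(string $json, float $minConfidence): ?string
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['detections'])
        || !is_array($data['detections'])) {
        return null;
    }
    $bestLang = null;
    $bestConf = $minConfidence;
    foreach ($data['detections'] as $d) {
        if (!is_array($d) || !isset($d['language'], $d['confidence'])) {
            continue; // skip malformed entries
        }
        if ((float) $d['confidence'] >= $bestConf) {
            $bestLang = (string) $d['language'];
            $bestConf = (float) $d['confidence'];
        }
    }
    return $bestLang;
}
```

    Getting null back means "no reliable guess", and your form handler can then fall back to a default language or ask the user.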

    There is a limit on the number of queries (5,000/day), but that's a lot, and you should realize that paying a fee is not unreasonable here: you're using an external service for something you can't do yourself. I realize that might not fit your budget, but that's the nature of this project.

    One option would be to try to write your own script, at least to get a general idea of the language. When your script is confident, use its result; when it isn't, query their service, so that you only use up queries when you need to.
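
    That "local first, service only when needed" idea can be sketched like this (PHP 7.1 or newer). Both detectors are passed in as callables, so this makes no assumption about which script or service you end up using; the names are hypothetical.

```php
<?php
// Try the cheap local detector first; call the remote one only when the
// local detector returns null (meaning "not sure").
function detect_with_fallback(string $text, callable $local, callable $remote): ?string
{
    $guess = $local($text);
    if ($guess !== null) {
        return $guess;
    }
    return $remote($text);
}

// Toy local detector: Hangul is used almost only for Korean, so any
// Hangul character is treated as a confident 'ko'; everything else is
// left to the remote service.
$localDetector = function (string $text): ?string {
    return preg_match('/\p{Hangul}/u', $text) === 1 ? 'ko' : null;
};
```

    The remote callable would wrap whatever HTTP request your chosen service needs. Because it only runs when the local guess fails, it uses up fewer of your daily queries.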


    If you have other questions about this, let me know. This is an area I'm interested in (an intersection of my interests in coding and linguistics).
    Daniel - Freelance Web Design | <?php?> | <html>| español | Deutsch | italiano | português | català | un peu de français | some knowledge of several other languages: I can sometimes help translate here on DD | Linguistics Forum
