Thread: Process text within a web page

  1. #1

    Hi,

    I'm trying to write a web site that 'translates' another web page. There is already a site that does this;

    http://www.rinkworks.com/dialect/

    but it just replaces text based on a database of words and phrases; I need to parse each sentence using algorithms.

    I've been trying to develop a PHP script that loads a web page, extracts each sentence from the page, processes it, then inserts the processed sentence back into the page while preserving the page formatting.

    I've been looking at the DOM, but I'm so new to web programming that it's still a bit confusing.

    If anyone has any pointers on how this could be done I'd be very grateful. I do realise it's not trivial!

    Cheers.

  2. #2

    Use the file_get_contents() function to read the page.

    Use an HTML parser to parse the page and get the relevant parts of it.

    For basic PHP tutorials see: PHP form processing
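
    The two steps above could be sketched roughly like this, assuming PHP 7 or later with the bundled DOM extension. The names processSentence(), processText() and translateHtml() are made up for illustration, and the upper-casing is only a placeholder for whatever sentence-level algorithm you end up writing. The sentence splitting is deliberately naive (it breaks on whitespace after ., ! or ?).

    ```php
    <?php
    // Hypothetical stand-in for the real per-sentence algorithm.
    // strtoupper() only touches ASCII letters in recent PHP versions.
    function processSentence(string $sentence): string
    {
        return strtoupper($sentence);
    }

    // Split a run of text into sentences, process each one, and glue the
    // pieces back together with the original whitespace untouched.
    function processText(string $text): string
    {
        // Split on whitespace that follows ., ! or ? and keep that
        // whitespace as its own piece (PREG_SPLIT_DELIM_CAPTURE).
        $pieces = preg_split('/(?<=[.!?])(\s+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
        if ($pieces === false) {
            return $text; // regex failure: leave the text as it was
        }
        foreach ($pieces as $i => $piece) {
            if (trim($piece) !== '') {
                $pieces[$i] = processSentence($piece);
            }
        }
        return implode('', $pieces);
    }

    // Parse the page, rewrite only its text nodes, and serialise it again,
    // so the tags and attributes around the text are left alone.
    function translateHtml(string $html): string
    {
        $doc = new DOMDocument();
        // Real pages are rarely valid HTML; collect parser warnings quietly.
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $xpath = new DOMXPath($doc);
        // All text nodes except the contents of <script> and <style>.
        $nodes = $xpath->query('//text()[not(ancestor::script) and not(ancestor::style)]');
        foreach ($nodes as $node) {
            $node->data = processText($node->data);
        }
        return $doc->saveHTML();
    }

    // Usage (placeholder URL; needs allow_url_fopen to be enabled):
    // $html = file_get_contents('http://example.com/');
    // echo translateHtml($html);
    ```

    Note that this works one text node at a time, so a sentence that runs across inline tags, as in <p>Hello <b>world</b>. How are you?</p>, reaches processSentence() in fragments ("Hello ", "world", ".", "How are you?"). A real sentence-level algorithm would need to join those fragments first and then map the result back onto the nodes, which is the hard part of keeping the formatting. As far as I know, loadHTML() also assumes ISO-8859-1 unless the page declares its charset, so non-ASCII pages may need extra care.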

  3. #3

    DOM is the least of your worries. Machine translation is a big field, and very incomplete. You could perhaps look at the source of Apertium for some ideas about one way of doing it, but I doubt you'll get much out of it without a solid theoretical grounding in linguistics and natural-language processing. I hate to be negative, but if you're struggling with DOM then you have a couple of years' study ahead of you in computer science alone before you'll be ready to tackle this sort of problem with any confidence.
    Twey
