Process text within a web page



raplh
01-17-2009, 11:13 AM
Hi,

I'm trying to write a web site that 'translates' another web page. There is already a site that does this:

http://www.rinkworks.com/dialect/

but it just replaces text based on a database of words and phrases; I need to parse each sentence using algorithms.

I've been trying to develop a PHP script that loads a web page, extracts each sentence from the page, processes it, then inserts the processed sentence back into the page while preserving the page formatting.

I've been looking at DOM but am so new to web programming that it's still a bit confusing.

If anyone has any pointers on how this could be done I'd be very grateful. I do realise it's not trivial!

Cheers.

prasanthmj
01-20-2009, 06:40 PM
Use the file_get_contents() (http://in.php.net/file_get_contents) function to read the page.

Use an HTML parser (http://php-html.sourceforge.net/) to parse and get the relevant parts of the page

For basic PHP tutorials see: PHP form processing (http://www.html-form-guide.com/php-form/php-form-processing.html)
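
The steps above can be sketched roughly as follows. This is only a minimal sketch, and it swaps in PHP's built-in DOMDocument and DOMXPath classes instead of the parser linked above. The function name transform_page_text() and the strtoupper() callback are hypothetical stand-ins for the real sentence processing. It assumes each sentence sits inside a single text node, so a sentence broken up by inline tags such as <b> is handled piece by piece, and the sentence splitting is a naive regular expression.

```php
<?php
// Hypothetical helper: rewrite the text of an HTML page sentence by sentence,
// leaving the tags (and therefore the page formatting) untouched.
function transform_page_text(string $html, callable $processSentence): string
{
    $doc = new DOMDocument();
    // Real-world markup is rarely valid; collect parser warnings quietly.
    libxml_use_internal_errors(true);
    // Note: loadHTML() assumes ISO-8859-1 unless the page declares a charset.
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Every text node except those inside <script> or <style>.
    $textNodes = $xpath->query(
        '//text()[not(ancestor::script) and not(ancestor::style)]'
    );

    foreach ($textNodes as $node) {
        if (trim($node->nodeValue) === '') {
            continue; // whitespace between tags, nothing to process
        }
        // Naive split after ., ! or ? followed by whitespace; the captured
        // whitespace is kept so the original spacing survives.
        $parts = preg_split(
            '/(?<=[.!?])(\s+)/',
            $node->nodeValue,
            -1,
            PREG_SPLIT_DELIM_CAPTURE
        );
        foreach ($parts as $i => $part) {
            if (trim($part) !== '') {
                $parts[$i] = $processSentence($part);
            }
        }
        $node->nodeValue = implode('', $parts);
    }

    return $doc->saveHTML();
}

// Demo on an inline string; strtoupper() stands in for the real processing.
$page = '<p>Hello world. How are <b>you</b>?</p>';
echo transform_page_text($page, 'strtoupper');
```

For a live page, the inline string would be replaced by something like $page = file_get_contents($url); which needs allow_url_fopen enabled. Links, images and stylesheets with relative URLs would still have to be fixed up separately.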

Twey
01-21-2009, 11:29 AM
DOM is the least of your worries :) Machine translation is a big field, and very incomplete. You could perhaps look at the source of Apertium (http://www.apertium.org/) for some ideas about one way of doing it, but I doubt you'll get much out of it without a solid theoretical grounding in linguistics and natural-language processing. I hate to be negative, but if you're struggling with DOM then you have a couple of years' study ahead of you in computer science alone before you'll be ready to tackle this sort of problem with any confidence.