Replacing foreign chars in a string with regexp?



Luterin
05-09-2008, 10:40 AM
I would like to replace a bunch of foreign characters with HTML entities, but I cannot use the htmlentities function, since the string consists of a full HTML page and I don't want any tags, or anything inside a tag, to be replaced. Just the text in between the tags.

An Example:

Input:
<a html='första_sidan.html'>Detta är en länk till en sida.<br>
<img src='färger.gif'>Och en lite färg bild...Förslag, återge...

Output:
<a html='första_sidan.html'>Detta &auml;r en l&auml;nk till en sida.<br>
<img src='färger.gif'>Och en lite f&auml;rg bild...F&ouml;rslag, &aring;terge...

The characters involved are:
å - &aring;
ä - &auml;
ö - &ouml;

I was thinking that the best solution would be to use preg_replace and a regular expression, since it would save a lot of code and processing power compared with writing a specific function to do it.

But I can't get it to work, so any help and ideas are very welcome.

Thanks in advance!

(Hope the "special" chars show up, otherwise I hope you can understand what it is that I want to achieve anyway. :) )

boogyman
05-09-2008, 01:10 PM
I think the best solution would be to create a function, or you could use the str_replace function, which does pretty much the same thing as preg_replace for plain strings but takes up less memory.



$haystack = "Dynamic Drive is the best site ever created";
$needle = "Dynamic";
$replacement = "cimanyD";

$haystack = str_replace($needle, $replacement, $haystack);
// output cimanyD Drive is the best site ever created


If you want, you can create an array of needles to check for and replace each one accordingly:




$haystack = "Detta är en länk till en sida.";
// map each character to its HTML entity
$needles = array(
'ö' => '&ouml;',
'ä' => '&auml;',
'å' => '&aring;'
);

foreach($needles as $needle => $replace)
{
$haystack = str_replace($needle, $replace, $haystack);
}
// output Detta &auml;r en l&auml;nk till en sida.

Luterin
05-09-2008, 01:18 PM
Thanks, but this will replace the chars everywhere they exist in that string, which isn't what I want, since I need the special chars within the HTML tags (for filename reference etc) to be intact. I just want the actual output text to be modified.

Thanks anyway, though. :)
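
One possible way to get what Luterin describes (entities in the visible text, tags left alone) is to split the page into tags and text first, then run the replacement only on the text pieces. The following is a minimal sketch, not a tested production solution. The function name encode_text_outside_tags is made up for this example, and it assumes that the script file and the HTML string use the same character encoding and that no attribute value contains a literal ">". The uppercase entries in the map are an addition to cover "Förslag"-style capitals if they ever need encoding.

```php
<?php
// Hypothetical helper: encode å, ä, ö only in the text between tags.
function encode_text_outside_tags($html)
{
    $map = array(
        'å' => '&aring;', 'ä' => '&auml;', 'ö' => '&ouml;',
        'Å' => '&Aring;', 'Ä' => '&Auml;', 'Ö' => '&Ouml;',
    );

    // Split on anything that looks like a tag. PREG_SPLIT_DELIM_CAPTURE
    // keeps the tags in the result, so the pieces alternate:
    // even index = text, odd index = tag.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Text piece: replace the characters with their entities.
            $parts[$i] = strtr($part, $map);
        }
        // Tag piece: left untouched, so file names in attributes survive.
    }

    return implode('', $parts);
}

$input = "<a html='första_sidan.html'>Detta är en länk till en sida.<br>";
echo encode_text_outside_tags($input);
// prints: <a html='första_sidan.html'>Detta &auml;r en l&auml;nk till en sida.<br>
```

This simple tag pattern will also treat the contents of script blocks and comments as ordinary text, so pages that contain those would need extra handling or a real HTML parser.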