multi-browser javascript word replace



nhoss2
06-13-2008, 10:34 AM
Can someone please write a script that replaces all "5" with "five" and "4" with "four" in a webpage? Please note that "4" and "5" are just examples. I had a script before, but when I viewed my site in Internet Explorer it wasn't changing the words. It only worked in Firefox. This is the script I had before:


var replacements, regex, key, textnodes, node, s;

replacements = {

"\u201cv": "c",
"\u201c": '"',
"\u201d": '"',
"\u2026": "...",
"\u2002": " ",
"\u2003": " ",
"\u2009": " ",
"\u2013": "-",
"\u2014": "--",
"\u2122": "(tm)"};
regex = {};
for (key in replacements) {
regex[key] = new RegExp(key, 'g');
}

textnodes = document.evaluate(
"//text()",
document,
null,
XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE,
null);
for (var i = 0; i < textnodes.snapshotLength; i++) {
node = textnodes.snapshotItem(i);
s = node.data;
for (key in replacements) {
s = s.replace(regex[key], replacements[key]);
}
node.data = s;
}
(script from Dive Into Greasemonkey)

jscheuer1
06-15-2008, 02:26 PM
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Version 5 Browser Compatible Text Conversion Script</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<script type="text/javascript">
if (document.documentElement && document.documentElement.childNodes)
window.onload = function() {
var tn = [], grabTextNodes = function(n) {
for (var i = n.length - 1; i > -1; --i)
if (n[i].nodeName != '#text' && n[i].childNodes)
grabTextNodes(n[i].childNodes);
else
tn[tn.length] = n[i];
}, replacements = {

"\u00a9": "(c)",
"\u201c": '"',
"\u201d": '"',
"\u2026": "...",
"\u2002": " ",
"\u2003": " ",
"\u2009": " ",
"\u2013": "-",
"\u2014": "--",
"\u2018": "'",
"\u2019": "'",
"\u2122": "(tm)",
"5" : "five"

}, regex = {};

grabTextNodes(document.body.childNodes);

for (var key in replacements) {
regex[key] = new RegExp(key, 'g');
for (var i = tn.length - 1; i > -1; --i)
tn[i].nodeValue = tn[i].nodeValue.replace(regex[key], replacements[key]);
};
};
</script>
</head>
<body>
“Hi Bob” ™ © 5
<div>
‘The Last Title’
</div>
</body>
</html>

nhoss2
06-16-2008, 07:54 AM
Oh my god, thank you sooooooo much!! I thought no one was going to reply.

EDIT: One last request, which should just be one more line of code: replacing . (full stop) with a space. I tried putting the . between the two quotation marks, but that ended up replacing everything on the page.

jscheuer1
06-16-2008, 09:22 AM
I'm not sure exactly what you mean, but a period (.) in a regular expression matches any character (other than a line break). If you want it to match only a period, you need to escape it:


\.
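To illustrate the difference, here is a small sketch using regex literals. The sample string and variable names are made up for the example:

```javascript
var sample = "a.b.c";

// Unescaped: the dot matches every character, so the whole string is replaced
var unescaped = sample.replace(/./g, " ");

// Escaped: only literal periods are replaced
var escaped = sample.replace(/\./g, " ");

console.log(JSON.stringify(unescaped)); // "     "
console.log(JSON.stringify(escaped));   // "a b c"
```

This is most likely what happened on the page: every character matched, so everything was replaced with spaces.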

nhoss2
06-16-2008, 11:41 PM
Yeah, I'm a bit crap at explaining things, but I think you just solved my problem. Thanks, you're a legend.

nhoss2
06-17-2008, 08:39 AM
Ugh, it didn't work. What I wanted to do was replace a full stop with a space. I put
"\.": " ", but that didn't work.

DimX
06-17-2008, 09:01 AM
Because your regexps are constructed from a string you need to escape the backslash itself as well:


"\\." : " "

nhoss2
06-18-2008, 01:09 AM
Oh, right. Thanks a lot, you guys are really helpful.

EDIT: It worked, but it only replaced one full stop. How can I replace all full stops on the page?

DimX
06-18-2008, 09:45 PM
It replaces all fullstops for me.
Make sure that 'g' is there: regex[key] = new RegExp(key, 'g'); and that you put a comma after the replacement which precedes "\\." : " ":


...,
"5" : "five",
"\\." : " "

geo100x
11-20-2008, 05:09 PM
I am trying to use the code above to replace ordinary words in the body with bold ones, but instead of the word appearing in bold, the literal text <b>example</b> appears.

Can someone tell me how to solve this, so that I get a bold example instead of <b>example</b>?

Thanks.

jscheuer1
11-21-2008, 06:15 AM
That can really be worked out a number of ways, depending upon how you want your document affected. Here's one way (case insensitive, whole words):


<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<script type="text/javascript">
function toBold(str){
str = str || document.getElementById('whatstring').value;
var r = new RegExp('(\\b' + str + '\\b)', 'gi'),
bb = function(a,b){return b.bold()},
wa = document.getElementById('workarea');
wa.innerHTML = wa.innerHTML.replace(r, bb);
}
function toNorm(str){
str = str || document.getElementById('whatstring').value;
var r = new RegExp('(<b>)+( *)(' + str + ')( *)(<\\/b>)+', 'gi'),
nn = function(a,b,c,d,e){return c + d + e;},
wa = document.getElementById('workarea');
wa.innerHTML = wa.innerHTML.replace(r, nn);
}
</script>
</head>
<body>
<div id="workarea">
<b>This </b> is an example with other examples for this.
</div>
<div>
<input type="text" id="whatstring"><br>
<input onclick="toBold()" type="button" value="Bold">
<input onclick="toNorm()" type="button" value="Normal">
</div>
</body>
</html>
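The matching part of toBold can also be tried without a page. The sketch below is not the code above: boldWord and escapeRegExp are hypothetical helper names, it works on a plain string rather than on innerHTML, and it adds an escaping step (an assumption, not part of the original) so that characters such as . or ( in the search word are matched literally.

```javascript
// Hypothetical helper: escape regex metacharacters so the word is matched literally
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap whole-word, case-insensitive matches in <b> tags
function boldWord(html, word) {
  var r = new RegExp("\\b" + escapeRegExp(word) + "\\b", "gi");
  return html.replace(r, function (match) {
    return "<b>" + match + "</b>";
  });
}

console.log(boldWord("This is an example with other examples for this.", "this"));
// <b>This</b> is an example with other examples for <b>this</b>.
```

The resulting string only shows up as bold text when it is assigned to innerHTML. Assigning it to a text node's nodeValue displays the tags literally.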

bfinoradin
11-29-2010, 03:03 AM
Hi John – how would you implement \b (whole word) in your first example?

thanks as always!




regex[key] = new RegExp(key, 'g');

jscheuer1
11-29-2010, 05:33 AM
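Well, \b doesn't mean 'whole word'. It means 'word boundary'. That is, it matches the position between a character that can be part of a word and one that cannot (or the start or end of the text), without matching any character itself.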

In a new RegExp constructor, the \ must be escaped - I think it's twice, so you would have (you haven't specified how you want to use the \b) something like:


regex[key] = new RegExp('\\\b' + key, 'g');

If you want more help:

Please post a link to a page on your site that contains the problematic code so we can check it out.

bfinoradin
11-29-2010, 05:57 AM
Ah, I see. The problem I am having is (for instance) if I want to replace "a", currently the script will replace the "a" within "about", or if I chose to replace the word "locate" it will affect "relocate" as well. For my application of the script it would ideally replace words only as typed, albeit case insensitive.

As you can see here (http://www.benfinoradin.info/polar/) it is currently removing all occurrences of "a".

Here (http://www.benfinoradin.info/polar/list2.js) is what I am using:

if (document.documentElement && document.documentElement.childNodes)
window.onload = function() {
var tn = [], grabTextNodes = function(n) {
for (var i = n.length - 1; i > -1; --i)
if (n[i].nodeName != '#text' && n[i].childNodes)
grabTextNodes(n[i].childNodes);
else
tn[tn.length] = n[i];
}, replacements = {


'a' : "_",


}, regex = {};

grabTextNodes(document.body.childNodes);

for (var key in replacements) {
regex[key] = new RegExp(key, 'g');
for (var i = tn.length - 1; i > -1; --i)
tn[i].nodeValue = tn[i].nodeValue.replace(regex[key], replacements[key]);
};
};

jscheuer1
11-29-2010, 06:46 AM
Some things you might like to correct - On fox.php the style section comes before the opening <HTML> tag. It should be external and come in the head section. On the same page the script tag comes after the closing </HTML> tag. It should come before the closing </BODY> tag.

On to the script. No comma is allowed after the last entry in an object literal. As far as I know this affects only IE and some older browsers. Most if not all other modern browsers now error correct for this. The comma after "_" here is the problem:



replacements = {


'a' : "_",


}

That's only one entry, so it takes no comma. If you had more than one, only the last entry goes without a comma:


replacements = {


'a' : "_",
'b' : "+"

}

OK, I tried this out. The backslash in \b needs to be escaped only once, so it is written '\\b' in the string:


if (document.documentElement && document.documentElement.childNodes)
window.onload = function() {
var tn = [], grabTextNodes = function(n) {
for (var i = n.length - 1; i > -1; --i)
if (n[i].nodeName != '#text' && n[i].childNodes)
grabTextNodes(n[i].childNodes);
else
tn[tn.length] = n[i];
}, replacements = {


'a' : "_"


}, regex = {};

grabTextNodes(document.body.childNodes);

for (var key in replacements) {
regex[key] = new RegExp('\\b' + key + '\\b', 'g');
for (var i = tn.length - 1; i > -1; --i)
tn[i].nodeValue = tn[i].nodeValue.replace(regex[key], replacements[key]);
};
};
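The effect of the added \\b's can be checked on a plain string. This is a small sketch with made-up sample text. The 'gi' flags in the last case are an assumption, added to show the case-insensitive matching asked about earlier:

```javascript
var phrase = "a word about a cat";

// No boundaries: every "a" is replaced, including those inside words
var plain = phrase.replace(new RegExp("a", "g"), "_");

// With word boundaries: only the standalone "a" is replaced
var bounded = phrase.replace(new RegExp("\\b" + "a" + "\\b", "g"), "_");

// Whole word, case insensitive: "relocate" is left alone
var located = "Locate or relocate".replace(new RegExp("\\blocate\\b", "gi"), "_");

console.log(plain);   // _ word _bout _ c_t
console.log(bounded); // _ word about _ cat
console.log(located); // _ or relocate
```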

bfinoradin
11-29-2010, 01:58 PM
Awesome – thanks John, a huge help as usual.

One more question for you… I think this would fundamentally change the structure of the script, but how would I implement replacement where each character of a filtered word is replaced individually, as they have done here (http://www.java2s.com/Tutorial/JavaScript/0520__Regular-Expressions/Useregularexpressiontoremovebadwords.htm)? In other words… I define "*" as the replacement, and "this gets filtered" would become "**** **** ********".

jscheuer1
11-29-2010, 04:40 PM
Well, you don't want to have the bad words in one place and then see them *'ed out in another place, as that example code does. But its method can be incorporated. However, its method (passing a function as the replacement) isn't supported in some legacy browsers, and I'm not sure how to test for and recover from that, or if it's even worth bothering to. It's fine in IE 5.5+ and virtually any other browser's "version 5+". That includes any browser written after Netscape 5 hit the scene many, many years ago, such as all versions of Firefox and Safari.

I also notice that you seem willing to insert this script at the end of the page. In that case we can gain both a significant speed-up and more flexibility for other scripts on the same pages by using an anonymous function rather than window.onload:


if (document.documentElement && document.documentElement.childNodes)
(function(){ // anonymous function instead of onload
var tn = [], grabTextNodes = function(n) {
for (var i = n.length - 1; i > -1; --i)
if (n[i].nodeName != '#text' && n[i].childNodes)
grabTextNodes(n[i].childNodes);
else
tn[tn.length] = n[i];
}, replacements = {


'\\ba\\b' : "_", // \\b added individually for those that we want it for
'(former director|nuclear watchdog)' : null // set the replacement value to null for those items to get the * treatment


}, regex = {};

grabTextNodes(document.body.childNodes);

for (var key in replacements) {
regex[key] = new RegExp(key, 'gi'); // i added to g for case insensitive regex, removed \\b - now added above
for (var i = tn.length - 1; i > -1; --i)
if(replacements[key]) // if a value was specified, use the old method
tn[i].nodeValue = tn[i].nodeValue.replace(regex[key], replacements[key]);
// otherwise use the new method of asterisk replacement:
else tn[i].nodeValue = tn[i].nodeValue.replace(regex[key], function(a){return a.replace(/[^ ]/g, "*");});
};
})(); // proper syntax for closing the anonymous function

Notes:


The * replacement regex is changed from . (as used in the link you gave) to [^ ], as this preserves spaces (if any) in the replaced text. It applies to bad words only (entries with a null value).
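The difference between the two can be seen on a plain string. This is only a sketch: maskKeepSpaces and maskEverything are hypothetical names and the sample sentence is made up.

```javascript
var badWords = new RegExp("(former director|nuclear watchdog)", "gi");

// [^ ] replaces every character except a space, so word gaps survive
function maskKeepSpaces(text) {
  return text.replace(badWords, function (match) {
    return match.replace(/[^ ]/g, "*");
  });
}

// . replaces every character, spaces included
function maskEverything(text) {
  return text.replace(badWords, function (match) {
    return match.replace(/./g, "*");
  });
}

console.log(maskKeepSpaces("The former director spoke."));
// The ****** ******** spoke.
console.log(maskEverything("The former director spoke."));
// The *************** spoke.
```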



Notice the use of the () and | tokens in the bad words syntax. This grouping may also be used with the other syntax:


'\\b(a|and)\\b' : "_",

But it needs to go inside the \\b's.
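The reason is that | has the lowest precedence in a regular expression. Without the parentheses, the pattern splits into "\ba" or "and\b", so each boundary applies to one alternative only. A short sketch on made-up strings:

```javascript
// Ungrouped: read as (\ba) OR (and\b)
var ungrouped = new RegExp("\\ba|and\\b", "g");

// Grouped: both boundaries apply to whichever alternative matches
var grouped = new RegExp("\\b(a|and)\\b", "g");

var looseResult = "about band".replace(ungrouped, "_");
var strictResult = "about band".replace(grouped, "_");
var wholeWords = "a cat and about".replace(grouped, "_");

console.log(looseResult);  // _bout b_
console.log(strictResult); // about band
console.log(wholeWords);   // _ cat _ about
```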


I used two-word phrases for each bad word entry. Single words or longer phrases may be used. There's no limit to the number of | symbols:


'(former director|nuclear watchdog|another bad phrase|singlebadword)' : null

or:


'former director' : null,
'nuclear watchdog' : null,
'(another bad phrase|singlebadword)' : null