
View Full Version : Replacing document.getElement



Falkon303
02-21-2009, 12:01 AM
Ok, so I had an idea....

Firstly, I don't know jack about the DOM, but secondly, I like the methods of jQuery, and the $('#whatever').html() style of programming.

But here is the deal.



function showme(value)
{
    document.getElementById(value) = document.form1.value;
    alert(document.getElementById(value).innerHTML);
}


Is something like this possible? It doesn't work as written, but the concept of replacing "document.getElementById" with a valid DOM method of getting an element would enable people to use "document.getElementById" without it farking up.

If a library could be made to convert the "document.getElementById" method to the proper DOM method, it would be convenient for keeping old scripts working.

- Best

- Ben

jscheuer1
02-21-2009, 03:41 PM
First off, the keyword value is semi-reserved, especially as concerns form inputs and textareas, so is a poor choice here.

Second, if you want backward compatibility and are dealing with forms anyway, just use the document.forms methods.

Third, in jQuery the $('#someId') function will not work if the browser doesn't support the document.getElementById() method. It is not a backward compatibility method, it is a function to extend element objects in modern browsers that already support the document.getElementById() method.

If you want a function that would be as equivalent as possible to document.getElementById() and that would be backward compatible and still allow the use of document.getElementById() as the name of the function/method, you can do:


if (!document.getElementById)
    document.getElementById = function(id){
        return document.layers ? document.layers[id] : document.all[id];
    };

Or, if you want to make a shortcut that is backward compatible and always use that:


function myGet(id){
    return document.layers ? document.layers[id] :
        document.getElementById ? document.getElementById(id) : document.all[id];
}

If you want both a shortcut, and to be able to use document.getElementById() in your code as well and have it be backward compatible:


function myGet(id){
    if (myGet.backward)
        return document.layers ? document.layers[id] : document.all[id];
    return document.getElementById(id);
}
myGet.backward = !document.getElementById;
if (myGet.backward)
    document.getElementById = function(id){ return myGet(id); };

But these aren't really all that useful. One would need a version 4 browser to not have document.getElementById() support (NS 4, or IE 4).

Those browsers will have difficulty rendering valid HTML, let alone running any modern javascript, regardless of whether or not something like this is used.

Twey
02-21-2009, 04:31 PM
That's not good, because it will also return name-indexed elements (which may be collections).

You should check:
if (!document.getElementById)
    document.getElementById = (function(coll) {
        function getElementById(id) {
            var els = coll[id];

            // No such element.
            if (!els)
                return null;

            // Element was the desired element.
            if (els.id === id)
                return els;

            // Element is presumably a NodeList (if it isn't, --i >= 0 will fail).
            // Grab an element that looks right (it might not be what's intended
            // if there's more than one element with the same ID, but that's
            // invalid anyway).
            for (var i = els.length; --i >= 0; )
                if (els[i].id === id)
                    return els[i];

            // Move along, boys, nothing to see here.
            return null;
        }

        return getElementById;
    })(document.layers || document.all);

jscheuer1
02-21-2009, 05:05 PM
That's not good, because it will also return name-indexed elements (which may be collections).

Correct, but a fine point. If the HTML that the code is to run against is well organized, it will be fine. However, none of this is terribly useful, as already mentioned.

The fact is that the HTML used with older methods, in order for it to be accessible in those browsers, must be of a special breed: a subset of valid modern code, with perhaps some invalid but error-correctable stuff thrown in for NS 4, or even IE 4. If one is to go to all that trouble, one could as easily also make sure one didn't create named elements (or whatever) in conflict with elements of a given id.
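As a hypothetical illustration of that kind of conflict (markup and names invented for this example, not taken from any script under discussion): an old-style name that happens to match another element's id makes the legacy collections hand back a collection instead of the element, which is exactly the case the id check in Twey's version guards against.

// Hypothetical markup:
//   <img name="banner" src="banner.gif">
//   <div id="banner">target</div>
// In IE's document.all (and similarly for lookups keyed by name), the shared
// "banner" key matches both elements, so the lookup returns a collection:
var hit = document.all['banner'];
alert(!hit.tagName && typeof hit.length === 'number'); // true - a collection, not the div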

And as I mentioned before, the javascript itself would be severely limited. Or, to elaborate, at least the branching of it would have to fall back to dumbed-down versions for version 4 browsers. In the latter case, great care must often be exercised in these branchings, as even code not meant for version 4 browsers, and effectively branched away from via normal branching procedures, can still cause those browsers to barf simply by virtue of appearing during the first pass of the script parser.

Twey
02-21-2009, 05:13 PM
Yes, of course it's not really worth supporting browsers that old any more.

I feel the original poster was missing something, though — document.getElementById() is the 'proper DOM method'. The problem is that older browsers don't always support newer standards like DOM, but the ones that don't even have getElementById() support really are very broken nowadays.

Falkon303
02-21-2009, 07:26 PM
Yes, of course it's not really worth supporting browsers that old any more.

I feel the original poster was missing something, though — document.getElementById() is the 'proper DOM method'. The problem is that older browsers don't always support newer standards like DOM, but the ones that don't even have getElementById() support really are very broken nowadays.


Thanks for clearing that up, Twey.

I think what happened was a misunderstanding at one point. Most of my scripting would use "<a onclick="function(this.id);">test</a>", instead of attaching the function to the a tag beforehand (which I've learned is actually a lot more organized... no more "tag hunting"). When I did this, my function also used "document.getElementById()", and someone had mentioned that my method was improper usage of the DOM.

Not knowing about pre-tagging elements, I thought that comment was about my "document.getElementById" method. Since I was aware of the document.forms methods and some others, it didn't click, and I thought document.getElementById itself was improper.

However, with the ability to pre-tag elements with functions, I am wondering if there is a simple way to avoid the onclick="function(value);" altogether.

I think if I used



<div id="test" name="test">blah!</div>
<script type="text/javascript">
$ = document.getElementById;
var alrtdata = "this is my alert";
var htmldata = "this is my innerhtml data";
function alrt()
{alert(alrtdata);}
function inhtml()
{$("test").innerHTML = htmldata;}
$("test").attachEvent("onclick", alrt);
$("test").attachEvent("onmouseover", inhtml);
</script>


It would be a *better* approach than tagging the element, correct?

jscheuer1
02-21-2009, 08:42 PM
Browsers will not allow you to farm out document.getElementById like that:



$ = document.getElementById;

If you are going to use it, just use it. The $ really shouldn't be used as a name in javascript; the fact that jQuery and other libraries do so doesn't make it right.
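To illustrate (a quick sketch; the byId name is just an example, not anything standard): detached from document, the method loses its context, so most browsers throw an error or behave unpredictably when you call it. If you want a short alias, wrap the call instead.

var $ = document.getElementById;
// $('test');   // error in most browsers - the method has lost its document context

// A wrapper keeps the call on document, so it behaves as expected:
function byId(id){
    return document.getElementById(id);
}
byId('test');   // works (returns the element, or null if no such id exists)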

There are various ways to get the element clicked on when you've attached/added the event.

As long as there is no nesting of elements, this works well:


function alertText(e){
    e = e || window.event;
    var t = e.target || e.srcElement;
    alert(t.firstChild.nodeValue);
}

A working demo:


<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<script type="text/javascript">
function alertText(e){
    e = e || window.event;
    var t = e.target || e.srcElement;
    alert(t.firstChild.nodeValue);
    if(e.preventDefault) e.preventDefault();
    return false;
}

function myInit(){
    var a = document.getElementById('test');
    if (window.addEventListener)
        a.addEventListener('click', alertText, false);
    else if (window.attachEvent)
        a.attachEvent('onclick', alertText);
}

if (window.addEventListener)
    window.addEventListener('load', myInit, false);
else if (window.attachEvent)
    window.attachEvent('onload', myInit);
</script>
</head>
<body>
<a href="#" id="test">What's This?</a>
</body>
</html>

Twey
02-22-2009, 12:11 AM
$("test").attachEvent("onclick", alrt);
$("test").attachEvent("onmouseover", inhtml);One of the hazards of giving a function like that such a short name is that you start to feel like you're accessing a variable or something. You aren't: retrieving elements like that is expensive, and should always be cached. A long name helps you remember to do that.

There's nothing wrong with attaching little bits of JS in your document. These hooks are often much more elegant than the equivalents, and the browser can always ignore them if it doesn't want to work with your script.

While document.getElementById() is perfectly valid DOM, innerHTML is not part of any standard, and should be avoided wherever possible (and fallback provided when not). Treating code as strings is rarely a good idea.
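For example (a rough sketch of the DOM-method alternative, reusing the htmldata variable from your snippet), the same update can be done with nodes rather than an HTML string:

function inhtml(){
    var el = document.getElementById("test");
    // Clear the existing contents, then append a text node -
    // no string-to-markup parsing involved.
    while (el.firstChild)
        el.removeChild(el.firstChild);
    el.appendChild(document.createTextNode(htmldata));
}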

John: Why did you choose Transitional and Western-1 over Strict and UTF-8? Nowadays there shouldn't be a reason to use either; they have both been thoroughly relegated to the realms of legacy.

jscheuer1
02-22-2009, 01:30 AM
John: Why did you choose Transitional and Western-1 over Strict and UTF-8? Nowadays there shouldn't be a reason to use either; they have both been thoroughly relegated to the realms of legacy.

I use HTML 4.01 Transitional for all of my impromptu projects/demos. Things like this. You will notice:



<body>
<a href="#" id="test">What's This?</a>
</body>

Definitely invalid for HTML 4.01 Strict. I do validate strict for anything I work on for any length of time though, that is if it is/can be supported under that DTD.

UTF-8 versus iso-8859-1? More just a habit. I'm open to persuasion here. One thing I have noticed though (at least I think this is true) - certain text characters are not supported in UTF-8. Most importantly for me, the © char. I like my on page script comments to sport it where warranted/merited. And, as far as I know, UTF-8 turns it into a 'garbage' (varies, in Fx here it's �) char. I know I can use (c), but I just like ©. Does the charset matter all that much? Isn't that one of those 'at the discretion of the designer' sort of things? I mean, as long as the content is valid in the charset, what's the problem? Other folks need to (well actually not need, but it can be so much more convenient) use other charsets for various languages, is that also wrong in your opinion?

Twey
02-22-2009, 11:41 AM
Definitely invalid for HTML 4.01 Strict.

For Transitional too. It's obviously a code fragment, and the reader can be expected to understand as much. They might be newbies, but they're not stupid :p If you're talking about nesting the <a> directly within the <body>, then while that's true, you should really consider just making the code correct instead. Transitional should not be used for new pages, and I don't think we should be encouraging the habit. It's not difficult to write valid code once you get used to it.

One thing I have noticed though (at least I think this is true) - certain text characters are not supported in UTF-8. Most importantly for me, the © char.

UTF-8 is a Unicode encoding. It can represent most of Unicode, which, in practice, means every character you'll ever need (there are some really obscure ones in higher realms of Unicode that it can't support; for these you should use one of the larger Unicode encodings such as UTF-32, but like I say, you'll almost certainly never need them). It certainly has a © character. This is in contrast to Western-1, which supports only a very limited set of characters — Unicode contains a vast superset of the characters of Western-1. For Western (mostly ASCII) text, your default encoding should be UTF-8, which is ASCII-compatible and capable of representing ASCII characters in one byte. The only time you would want to use another encoding is for backwards compatibility with some old (non-Unicode-capable) software, or when encoding a lot of text with a high ratio of characters with high Unicode codepoints, such as Chinese text: in order to represent ASCII characters in one byte, UTF-8 requires three bytes per Han character, whereas UTF-16 requires two bytes for any character.

And, as far as I know, UTF-8 turns it into a 'garbage' (varies, in Fx here it's �) char.

This sounds like either your editor is encoding it incorrectly or your server is telling the browser that it ought to be in another charset. Remember that a server header is the preferred method of setting an encoding, and (as with any server header) will override the equivalent <meta> tag if found in the document. The <meta> you've used here should be useful only in case the user decides to save a local copy of the page, in which case server headers will obviously be unavailable.

Other folks need to (well actually not need, but it can be so much more convenient) use other charsets for various languages, is that also wrong in your opinion?

Legacy software notwithstanding, they will need to use a charset other than Western-1, which can only represent the Latin alphabet and variations on it, but UTF-8 should suffice for pretty much anyone unless they speak Martian. The only consideration is efficiency: as I said above, in certain cases UTF-16 may be more efficient. Either way, a Unicode encoding is the way to go.

jscheuer1
02-22-2009, 02:45 PM
I'm talking about the literal hex a9 text character copyright symbol. I see it turned to garbage all the time in 'view source' of UTF-8 documents where it appears in comments, and if it is used instead of its entity in the content of the document, the same thing happens. It's not even valid under UTF-8. Any important work I do using the iso-8859-1 charset is validated, so I'm not using any characters that it doesn't support. However, if UTF-32 is better, I may actually switch, it's only a clip in my editor. Nope, just checked, apparently a Windows EOF or something quite ordinary is invalid in UTF-32. The error I get is:


Unrecognised BOM 3c2f6874

Whatever that means. It comes at the last line of the file:


</html>

Now that's just stupid. The page is just fine in iso-8859-1 or in UTF-8 if I take out the literal © char.

There is nothing invalid I know of about the code from my previous post that has the demo using the transitional DOCTYPE.

I still see HTML 4.01 Strict as too tight or at least tighter than I want to be at times. Other times I want to be strict. I'm much more comfortable working in it than I used to be. It handles images poorly (I know it's valid, but it ticks me off - seems like unnecessary slavishness to minutia). When I'm answering many questions I really don't want to stop and take the time to work out code as strict. At least my use of Transitional is generally valid, unlike the way XHTML DOCTYPES are bandied about.

Twey
02-22-2009, 03:25 PM
I'm talking about the literal hex a9 text character copyright symbol. I see it turned to garbage all the time in 'view source' of UTF-8 documents where it appears in comments, and if it is used instead of its entity in the content of the document, the same thing happens.

As I said, it sounds like your server and/or editor are using the wrong encoding. Check them. The copyright symbol is definitely supported in Unicode — in fact, the 0xA9 to which you refer is actually the Unicode codepoint, U+00A9.

However, if UTF-32 is better, I may actually switch, it's only a clip in my editor.

Woah Nelly. UTF-32 is four times the size of UTF-8 or ASCII for Latin text. I said it might be necessary if you needed access to some really obscure characters with very high codepoints. The copyright symbol is not one of them — any decent Unicode encoding will be able to represent that.

Unrecognised BOM 3c2f6874

Hm, odd error — the BOM should be optional. Oh, Windows does like it, though. You may have to set your editor to manually insert the BOM.

There is nothing invalid I know of about the code from my previous post that has the demo using the transitional DOCTYPE.

No, I meant the fragment to which you referred. Markup fragments generally will be invalid on their own, and the readers should be aware of this.

It handles images poorly (I know it's valid, but it ticks me off - seems like unnecessary slavishness to minutia).

How so?

When I'm answering many questions I really don't want to stop and take the time to work out code as strict.

Well, as I said, once you get used to it it comes naturally. Adjusting to something new is always a struggle when you first start, but it does get easier :)

At least my use of Transitional is generally valid, unlike the way XHTML DOCTYPES are bandied about.

Well, in a manner of speaking — but then so is HTML2. It is preferable to leading people into unknowingly using an XHTML DOCTYPE, certainly, but I think it would be better all around if you used Strict, even if your code weren't valid. At least that way when the user comes to validate their page, they can easily see the errors and fix them, rather than being fooled into thinking that their page is satisfactory.

jscheuer1
02-22-2009, 05:19 PM
Sorry, I am unable to validate this document because on line 16 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.

The error was: utf8 "\xA9" does not map to Unicode

Here's line 16:


How does a literal © char (&copy;) look in UTF-8? Does it even validate?

Here's the page:

http://home.comcast.net/~jscheuer1/side/utf_8.htm

Here's the validation link:

http://validator.w3.org/check?verbose=1&uri=http%3A%2F%2Fhome.comcast.net%2F~jscheuer1%2Fside%2Futf_8.htm

Snookerman
02-22-2009, 05:36 PM
Mine validates just fine.
I don't get the � char in Fx either. I wrote and saved it in Dw.

Twey
02-22-2009, 06:21 PM
I can't explain the validator error, but reading that stream through the console, the character that displays badly is not U+00A9 — the byte seems to contain the value 251? Your editor doesn't seem to be saving it correctly.

jscheuer1
02-22-2009, 06:48 PM
I just discovered that my editor (Edit Pad Pro, it is an older version, that may explain it) cannot 'save directly' as UTF-8! I must convert the file and then re-save it. It adds a BOM at the beginning of the file though this may be easily deleted by backspacing over it. Not the best situation, I may have to switch editors, a real drag, as I'm so used to and have so heavily configured this one. Even upgrading to a newer version of it would be disruptive.

Thanks though guys for the heads up.

Twey
02-22-2009, 07:57 PM
Personally I've come to love a little disruption — it does so add excitement to one's life, and has the benefit of allowing one to discover better alternatives (*cough*emacs (http://www.gnu.org/software/emacs/)*cough*)! :p

jscheuer1
02-23-2009, 05:11 AM
emacs looks arcane to me. Anyways, I think I will stick to what I know. I've been playing with EPP's feature for UTF-8 conversion. It doesn't appear to be too hard. I'll need to play around with it a bit more, but it also looks as though once you've saved a file as UTF-8 encoded, as long as you don't add any characters which when encoded as ANSI will be a problem in UTF-8 (like that \xa9 char), it will continue to be valid UTF-8. All the characters available on the keyboard (without doing anything special) appear to be OK. It's just a shame that it cannot be set to save by default in that encoding, and that this situation gives rise to possible gaffes when a file is converted, then altered, then reconverted in one direction or the other or both, etc.

jscheuer1
02-23-2009, 06:12 AM
Ah, I've been looking into this a bit more, and it appears that the ANSI encoding and the UTF-8 encoding (without the BOM, which is not recommended anyway) are virtually identical. It's just that when you use certain characters like \xa9 or \xae, you must precede them in ANSI encoding with a \xc2; then they look fine and validate in UTF-8. "Normal" characters are fine in both encodings. But this gives rise to odd situations - say, in an external javascript, there is no way to be certain how that will be viewed. I guess the server can determine it, but one doesn't always have control over one's host's server, certainly not over the servers on which others use code that you've written. This all points to the sad but true fact that with UTF-8 encoding on your page, your external script may or may not be seen in UTF-8. So I would think perhaps iso-8859-1 isn't so bad after all, or that regardless, one should simply avoid, what would you call it, high ASCII?

Twey
02-23-2009, 05:24 PM
emacs looks arcane to me.

Not really — it's very popular. It's one of the oldest editors still in use today, and for good reason. It takes maybe half an hour to learn the basics (there's an introduction built in) and it's certainly worth it in my opinion.

it also looks as though once you've saved a file as UTF-8 encoded, as long as you don't add any characters which when encoded as ANSI will be a problem in UTF-8 (like that \xa9 char), it will continue to be valid UTF-8.

Hardly surprising, since UTF-8 (like most other ASCII-based encodings) is ASCII-compatible — every character below 128 is the same as in ASCII. Only above that do they tend to differ.

It's just that when you use certain characters like \xa9 or \xae, you must precede them in ANSI encoding with a \xc2,

If they happen to have the right offset in the current code-page, then yes, that will work. It won't work for more esoteric characters, though. Note that there's no single 'ANSI encoding' — this group of encodings are just specifications of how the byte values from 128 upward should be interpreted. There are a large variety of them, and they are not compatible (above 127).
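A quick way to see those byte values from the browser itself (just an illustration; the URI functions happen to expose the UTF-8 bytes as %XX escapes):

encodeURIComponent("\u00A9");   // "%C2%A9" - the two-byte UTF-8 sequence for ©
decodeURIComponent("%C2%A9");   // "©"
decodeURIComponent("%A9");      // throws a URIError: a lone 0xA9 byte is not valid UTF-8,
                                // which is essentially what the validator was objecting to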
But this gives rise to odd situations - say, in an external javascript, there is no way to be certain how that will be viewed.

I'm fairly sure, though I haven't tested, that an external Javascript will be interpreted according to the encoding of the page on which it runs. Internally, it's handled as UTF-16. If one inserts the BOM, it's almost guaranteed that the encoding will be guessed correctly, unless the server has been horrendously misconfigured to force some lesser encoding, an act of gross stupidity which, if no means of overriding it were to exist, I would consider tantamount to racism.

I guess the server can determine it, but one doesn't always have control over one's host's server, certainly not over the servers on which others use code that you've written.

Really, one should — I don't think I would recommend any server that didn't at least provide .htaccess control or the like. If the server doesn't specify, one can just insert the BOM and have it guessed sanely.

So I would think perhaps iso-8859-1 isn't so bad after all

Well, everything you've said above also applies to ISO-8859-1, with the additional limitation that it is incapable of representing any non-Western characters.

or that regardless, one should simply avoid, what would you call it, high ASCII?

I don't know about high ASCII, but UTF-8 should be widely supported by all modern software. ASCII-compatible encodings degrade gracefully, at least for primarily ASCII-compatible text, so I don't intend to limit myself to one of the old language-specific character sets.

I've been playing with EPP's feature for UTF-8 conversion. It doesn't appear to be too hard. I'll need to play around with it a bit more, but it also looks as though once you've saved a file as UTF-8 encoded, as long as you don't add any characters which when encoded as ANSI will be a problem in UTF-8 (like that \xa9 char), it will continue to be valid UTF-8.

So it goes through the document and converts all characters to UTF-8, then saves? That is indeed incredibly hackish. Have you tried a new version? Lacking decent UTF-8 support in this day and age is not something I'd accept from my editor.

jscheuer1
02-23-2009, 06:20 PM
The main point I'm making about avoiding what I'm referring to as high ASCII is that I've seen this, and it is just as you say. If the script is served with the page, you get the page's encoding. If you view the script source, it depends upon the browser and/or how the browser fetched that source. This generally isn't too big a deal - we really only worry about how the script performs - but this can come into play as well. I'm thinking that, yes, the server will play a role, depending upon how it is configured; however, the BOM is not recommended according to the w3c. I'm sure you have no qualms about the BOM though, as it is only discouraged by the w3c out of deference to older browsers and servers, which may have a problem with it. That's unless I misunderstand your apparent stance that all older software and equipment should be thrown away.

Now, I'm not saying you should avoid any characters. It just seems to be a prudent practice depending upon the target environment. Like if you are running on known servers and/or can configure them as need be, do as you wish. But if you are releasing to the public for use on who knows what equipment, it just seems prudent to avoid characters that could pose a problem, that is unless they are essential to the script, or whatever it is.

Twey
02-23-2009, 07:52 PM
what I'm referring to as high ASCII

Yes, but that's a misnomer. ASCII uses the high bit of the byte as a parity check, and as such only goes up to 127.


If you view the script source, it depends upon the browser and/or how the browser fetched that source.

Filesystems have no innate way of storing encodings or even filetypes. With the advent of MIME-types, browsers are actually better-suited (if properly informed by the server) to get the correct MIME-type than any other piece of software, which must all rely on the same sort of heuristic algorithms the browser applies when the server does not specify an encoding (Windows infamously gets this horribly wrong; saving 'Bush hid the facts' to a text file in Notepad and then reöpening it will cause the software to incorrectly guess the contents to be Unicode and garble them).

You're half-right in that ASCII is still the highest standard of compatibility — described as 'possibly the most successful computing standard ever', it's implemented and thoroughly supported by just about all software since its inception. However, up to a certain level, the same can be said of Unicode. If you're targetting Windows 3.1 then it might be worthwhile to stick to ASCII only, but any system built within the past decade will have good support for Unicode.


That's unless I misunderstand your apparent stance that all older software and equipment should be thrown away.

You make me sound like a radical :p The fact of the matter is that UTF-8 is even older than IE6. As a browser with IE5's level of standards support would not be accepted today and generally isn't even worth supporting, so software that isn't capable of Unicode is not acceptable and shouldn't be considered — including Web hosts that offer the user no power, and especially those that have ridiculous default configurations that prevent proper encoding detection.

Then, too, it's important to remember that ASCII is incredibly limiting: it doesn't allow even all the characters used in English, let alone other languages, such as the em dash (—), en dash (–), proper quotation marks (‘’, “”), ligatures (œsophagus, diæresis) and indeed the diæreses themselves (coöperate with naïve Chloë), as well as assorted other diacritics (fiancée, cursèd).

jscheuer1
02-24-2009, 01:02 AM
It seems we are, once again - basically in agreement. Our differences on this issue appear (to me at least) to primarily only revolve around circumstances of a particular implementation.

So let's get back to my stupid editor. I'm so spoiled. With a dblclick I can choose from any number of 'come with' or custom clips that can either be inserted into existing text or wrapped around a highlighted section, or inserted with the cursor brought to the point of optimal customization of the clip. Does emacs have that?

Falkon303
02-24-2009, 05:32 AM
It's interesting you bring this up (the updated/valid code issue).

All my mom could afford was dreamweaver 8, and chances are all of the code it's so brilliantly autotagging is possibly outdated....

:{

My mom got me it for X-Mas because I was plum broke (a lot of people are...go usa economy!). Now that I have my web dev job back, looks like I'll either be using notepad, or saving for a better editor.

What do you gents recommend for a good all-around editor?

Twey
02-24-2009, 03:39 PM
With a dblclick I can choose from any number of 'come with' or custom clips that can either be inserted into existing text or wrapped around a highlighted section, or inserted with the cursor brought to the point of optimal customization of the clip. Does emacs have that?

It has an equivalent feature, called skeletons. Skeletons can also be bound to a key or executed by name, are (optionally) formed interactively so you can insert variables as you go (or derive them automatically with elisp), and allow specification of a 'place where stuff would go into', as Tom Breton describes it, which is where the cursor will end up if nothing's selected, or the current selection will end up if something is.


All my mom could afford was dreamweaver 8, and chances are all of the code it's so brilliantly autotagging is possibly outdated....

If it's valid HTML 4.01 Strict, you're probably OK. I doubt it will be, though — even the updated versions of DreamWeaver like to create ridiculously outdated and presentational HTML and redundant Javascript. djr33 assures me it's a decent editor if you don't mind battling it every step of the way to get what you want instead of what it wants, but somehow I don't find that very convincing.


My mom got me it for X-Mas because I was plum broke (a lot of people are...go usa economy!). Now that I have my web dev job back, looks like I'll either be using notepad, or saving for a better editor.

I've yet to find a proprietary editor worth its salt, although I do hear good things about TextMate for Mac. Outside that, I recommend emacs (obviously). There's a discussion on the merits of various editors in this thread (http://dynamicdrive.com/forums/showthread.php?t=41731).

jscheuer1
03-02-2009, 06:53 AM
Back to this UTF-8 business for a little bit. I just uploaded an advance PR index page for one of the sites I manage. It was drafted (not by me) in Word. I cleaned it up a bit and validated it as UTF-8. It looked fine locally. Once I uploaded it to the server though, there were a bunch of garbage characters showing - the very ones required to convert it to UTF-8. I changed it back to ANSI and iso-8859-1, and it looked fine. I'm pretty sure that this is an issue with the server, because the page, as I say, validated as UTF-8 and looked fine locally that way. It doesn't validate as iso-8859-1 because of all of the "non SGML" characters Word used in it. But I cannot be messing around with the server to fix this without dealing with the owner or one of his minions, and this space is donated for commercial consideration, so I'd rather not, as long as the page is viewable and looks as intended, which it does. Go figure.

On the server where my personal web pages are, this doesn't appear to be a problem. At least I've not noticed anything like that yet.

Twey
03-02-2009, 11:10 PM
It sounds like you've got a server that's forcing the charset to Western-1. That's incredibly bad practice. If you telnet in and do a HEAD request for that page, you can see what headers are being generated for it. I'd guess the Content-Type one probably reads Content-Type: text/html; charset=iso-8859-1.
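If telnet isn't handy, a rough alternative (just a sketch, and the URL is a placeholder for the page in question, which has to be on the same host because of the same-origin restriction) is to ask the browser for the header it received:

// Check the served Content-Type header from a script on the same site
// (browsers with native XMLHttpRequest; older IE would need the ActiveX equivalent)
var xhr = new XMLHttpRequest();
xhr.open("HEAD", "/press/index.htm", true);
xhr.onreadystatechange = function(){
    if (xhr.readyState === 4){
        alert(xhr.getResponseHeader("Content-Type"));
        // e.g. "text/html; charset=iso-8859-1" if the server is forcing the charset
    }
};
xhr.send(null);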

jscheuer1
03-03-2009, 01:12 AM
Well, I don't think that will be necessary. Both Fx and Opera report that the page is being served incorrectly (iso-8859-1). The funny thing is, I believe it is an Apache server. I would expect something like this from a Windows server, but I suppose, as with driving, the problem is often not the car, rather the nut behind the wheel.

Needless to say, I've changed things around so that the page (though not valid) works. It's only fairly temporary anyway. I will be making up an entire suite of pages for the event in a week or so that will replace this press release.

However, this is one of those occasions I hinted at earlier in our discussion, where one has no control over the server.

What I've always done in the past (for this site once I've replaced the press releases) is simply validate to iso-8859-1, either using Unicode entities for any characters not directly valid in the charset, or changing them to their nearest ANSI equivalents.

The page (press release) displays just fine on my personal website as UTF-8, another indication that it is the server.

Twey
03-03-2009, 01:34 AM
The funny thing is, I believe it is an Apache server. I would expect something like this from a Windows server, but I suppose, as with driving, the problem is often not the car, rather the nut behind the wheel.

Quite so; Apache's default configuration does not specify a charset at all, which has raised some eyebrows, but if the host decides to override that with a limiting charset, there's not much one can do without a .htaccess.