How to utf-8 encode a Javascript file



kuau
12-07-2011, 03:30 PM
I am trying to help a Japanese translator translate some webpages I did. I was able to get the Japanese characters to show up on the page but the original page uses an external js file. I tried putting @charset "utf-8"; as the first line in the js file, but that immediately threw an error in Dreamweaver. When I put the Japanese characters in the js file and saved it, DW converted them all to S's which then were displayed on the webpage.

The only way I could get it to work was to embed the js in the html file. But I don't want to do it this way as it would be a maintenance nightmare. I tried googling this but did not find a clear answer.

What is the proper way to utf-8 encode an external js file? I put this in the html file: <script type="text/javascript" src="/js/maui.js" charset="UTF-8"> but the maui.js file is where the jp characters are and I can't save the file without losing them.

Thanks for any help. e :)

djr33
12-07-2011, 03:58 PM
The file itself has a "physical" format: the way the actual data of that file is stored on disk, and that should be UTF8 too. Any encoding information written inside the file is just an instruction to the browser; it is the stored format that keeps the characters from getting corrupted. It's possible that Dreamweaver is causing problems with this, although I believe you can use DW to switch the character encoding of the document.
You can also change the character encoding of a text document in Notepad. Choose Save As, and select a new encoding.

kuau
12-07-2011, 05:21 PM
Dear Daniel: I don't like to sound like a total dummy but, what do you mean by a "physical format?" I am trying to store the data in UTF8 format. That is my question... how do I do that? Everything I have tried doesn't work.

I went into Notepad and saw no reference to encoding anywhere, even in the Help. I've never worked with another language on a webpage before, so this is all new to me. Please be more explicit. Mahalo, e :)

jscheuer1
12-07-2011, 06:23 PM
Where are you getting the Japanese from? How are you making the .js file?

If you have the Japanese text somewhere you can copy it from, open the .js file in Notepad and choose 'Save As'. Down near the bottom of the dialog is the encoding select box. Choose UTF-8 and save. Now you can paste the Japanese into the file and save it again. Don't put it into DW after that. Upload it to the site via ftp.
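
If you'd rather not depend on an editor's Save As dialog at all, the same thing can be done from a script. This is only a rough sketch, assuming Node.js is installed; the temp folder, the file name maui.js and the variable greeting are made up for illustration, and the Japanese is written as \u escapes so the example itself stays plain ASCII.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical script content; the escapes spell "こんにちは".
const source = 'var greeting = "\u3053\u3093\u306b\u3061\u306f";\n';

// Write into a fresh temp folder so nothing real gets overwritten.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "utf8-demo-"));
const file = path.join(dir, "maui.js");

// The encoding option controls the bytes that land on disk.
fs.writeFileSync(file, source, { encoding: "utf8" });

// Read the raw bytes back and decode them with the same encoding.
const bytes = fs.readFileSync(file);
const roundTrip = bytes.toString("utf8");

console.log(bytes.length);          // number of bytes on disk
console.log(roundTrip === source);  // true when nothing was lost
```

Node writes the file without a byte order mark. Some versions of Notepad add one when you pick UTF-8, which is usually harmless for a .js file but worth knowing about.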

djr33
12-07-2011, 07:43 PM
Yes, in the "save as" dialog as John said.

What I meant was that there are two kinds of encodings-- the actual format of the data and the way it is read. If you read UTF8 as UTF8 then both the format and the usage are UTF8. But you can also read UTF8 as Latin1 (for example) and you'll get really weird characters. Basically any time you have mismatched encodings you get errors.
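
To make the "UTF8 read as Latin1" case concrete, here is a small sketch, assuming Node.js. The sample word and the variable names are just for illustration.

```javascript
// "日本語" written with escapes so this snippet itself stays pure ASCII.
const original = "\u65e5\u672c\u8a9e";

// The "physical" format: the text stored as UTF-8 bytes.
const utf8Bytes = Buffer.from(original, "utf8");

// Reading those bytes with the matching encoding gives the text back.
const readAsUtf8 = utf8Bytes.toString("utf8");

// Reading the very same bytes as Latin-1 turns every byte into its own character.
const readAsLatin1 = utf8Bytes.toString("latin1");

console.log(utf8Bytes.length);          // 9 bytes for 3 characters
console.log(readAsUtf8 === original);   // true
console.log(readAsLatin1.length);       // 9 characters of garbage
```

The bytes never changed between the two reads. Only the interpretation did, which is why the data and the label have to agree.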

Here's a short list of ways you can set an encoding:
1. File format encoding (data)
2. Server settings (what format is it processing in?), such as with PHP sometimes.
3. Database encodings, both data and labels.
4. Labels in files suggesting an interpretation such as a meta tag in your HTML.
5. The format that your server uses to send the data. It could get messed up if your server sends the data in a different format, and that isn't always the same as the file format or the encoding suggested by the file's own meta info.
6. Any program that you use to edit the files must be compatible with and respect the original encoding. I've used some text editors that try to change an encoding (eg, UTF8 to Latin1) and that's horrible to work with-- find a new program.
7. Cutting and pasting can mess with things too. Be careful, but most of the time this is not a major problem with newer computers. (If you've ever tried to use foreign characters in MS Word, though, you've likely had issues with this.)

In theory, all of those can be problems. In general, only (1) and (4) are often problematic. Sometimes (6) as well.
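
For (5), the label the server sends is the charset part of the Content-Type header. Below is a minimal sketch of a server that labels its .js files as UTF-8, assuming Node.js. The helper contentTypeFor, the port number and the response body are all hypothetical, and a real host (Apache, IIS and so on) would be configured through its own settings instead.

```javascript
const http = require("http");

// Hypothetical helper: choose a Content-Type that includes the charset label.
function contentTypeFor(urlPath) {
  if (urlPath.endsWith(".js")) return "text/javascript; charset=utf-8";
  if (urlPath.endsWith(".html")) return "text/html; charset=utf-8";
  return "application/octet-stream";
}

const server = http.createServer((req, res) => {
  // Stand-in for reading the real file; the escapes spell "こんにちは".
  const body = Buffer.from('var greeting = "\u3053\u3093\u306b\u3061\u306f";\n', "utf8");
  res.writeHead(200, {
    "Content-Type": contentTypeFor(req.url),
    "Content-Length": body.length,
  });
  res.end(body);
});

// server.listen(8080); // uncomment to try it locally
```

The header only helps when it matches the bytes actually sent, so this is (1) and (5) agreeing with each other rather than a fix on its own.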


In short, think of a character encoding as a real "code". There are three things you need to decrypt a code: the encoded message, the method of encryption, and the key.
In the case of character encodings, it works like this:
Message: the raw data
Method: the 'physical' data format it's stored in
Key: the meta data note that you're using UTF8 or whatever. Without this, the final program just has to guess.

So, a good setup will look like the following:
1. You will have a file.txt that contains data that represents text that is already encoded in UTF8.
2. That file will be stored in the data ("physical") format of UTF8 as well.
3. Inside the file, there is a note (eg, meta tag) telling the browser to interpret it as UTF8. This is true with HTML files. I don't think it's needed with JS.

If you have any problems, it will be for one of two reasons:
1. One of the steps has an encoding that does not match the others (or a default non-matching encoding is being used because none was specified).
2. The original data is corrupt, such as from cutting and pasting from another format of encoding in the first place.

(1) is fixed just by changing the settings.
(2) is much harder to fix. Since you can't look up what the problem is (unless you're lucky enough to remember all of the encodings-- that would never actually happen, though), then the only real option is trial and error until you decode it by changing the formats. Do everything you can to avoid getting in that situation, though.
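
As one example of that trial and error, here is a sketch that tests a single guess: that the text was UTF-8 which got read as Latin-1 somewhere along the way. It assumes Node.js, repairMojibake is a made-up name, and it covers only this one pairing of encodings. If the guess is wrong it returns null instead of making things worse.

```javascript
// Hypothetical helper: undo "UTF-8 bytes read as Latin-1", if that is what happened.
function repairMojibake(garbled) {
  // Latin-1 can only hold code points up to 0xFF; anything higher rules the guess out.
  for (const ch of garbled) {
    if (ch.codePointAt(0) > 0xff) return null;
  }
  // Turn each character back into the single byte it came from.
  const bytes = Buffer.from(garbled, "latin1");
  try {
    // fatal: true makes the decoder throw on invalid UTF-8 instead of guessing.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    return null;
  }
}

// Simulate the damage: "日本語" stored as UTF-8, then read as Latin-1.
const damaged = Buffer.from("\u65e5\u672c\u8a9e", "utf8").toString("latin1");
console.log(repairMojibake(damaged) === "\u65e5\u672c\u8a9e"); // true
```

Plain ASCII text passes through unchanged, since it is the same bytes in both encodings, so a non-null result is a hint and not proof.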

kuau
12-07-2011, 11:35 PM
Dear John & Daniel:

Once again I am beholden to you both. I am embarrassed to admit that I didn't notice the encoding drop-down in Notepad. But as soon as I saved it as UTF-8, I pasted it back into Dreamweaver CS4 (sorry, had already done it before I re-read the directions), re-created it as an external js file, saved it no problem, and ftp'd it to the server. And it worked perfectly!! You guys are the greatest! Thank you.

I was surprised to find that the encoding is invisible. Is there a way to tell what encoding is on a file if you didn't create it yourself?

djr33
12-07-2011, 11:55 PM
Some editors will tell you. Dreamweaver lets you set a default somewhere, and there might be an option to edit it.
But the simple answer is to open the file in Notepad and choose "Save As", then look at what comes up in the encoding box. I think that will show the current encoding, although I'm not positive.

In fact, I'm not positive it actually is stored anywhere. It's the way that the data is encoded, not a setting. So maybe it's impossible to know what the encoding is except to know that the characters are appearing correctly when viewing it as UTF8 (for example). But I might be wrong.
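
A rough sketch of what that guessing can look like, assuming Node.js; guessEncoding is a made-up name and the labels it returns are just descriptive strings. The only thing that may literally be stored in the file is a byte order mark at the very start. Everything else is inferred from whether the bytes happen to be valid in a given encoding.

```javascript
// Hypothetical helper: make a best guess at how a file's bytes are encoded.
function guessEncoding(bytes) {
  // A byte order mark, when present, is the one explicit hint inside the data.
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "utf-8 with BOM";
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) return "utf-16le";
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) return "utf-16be";

  // Pure 7-bit data reads the same in ASCII, Latin-1 and UTF-8.
  if (bytes.every((b) => b < 0x80)) return "ascii (also valid utf-8)";

  // Otherwise check whether the bytes form valid UTF-8 sequences.
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return "utf-8";
  } catch {
    return "unknown (not valid utf-8)";
  }
}

console.log(guessEncoding(Buffer.from("\u65e5\u672c\u8a9e", "utf8"))); // "utf-8"
```

In practice you would pass it the result of fs.readFileSync(path) with no encoding argument, which gives the raw bytes. A result of "utf-8" means the bytes are consistent with UTF-8, not that anyone declared them to be.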

jscheuer1
12-08-2011, 12:50 AM
Yes, I still struggle with encoding at times.

It's much like Daniel says, the editors that deal with it generally have both default settings and attempt to take a best guess at what the encoding of a file might be. The editor settings may sometimes make it override the encoding of the opened file. Or, if the editor can't guess the encoding, it will open it in the default.

If you're starting a file from scratch in one of those editors, its encoding will be set to the editor's default. If you open a file - well that's where it gets tricky.

But if the existing file is one encoding and you paste another encoding into it, if the pasted characters aren't the same or at least supported in both encodings, you will get garbage, which is likely what was happening to you.

djr33
12-08-2011, 02:02 AM
One final point to add, since we've already gone into all the details and I may as well explain this too--
Just because your website works, that does NOT mean it's necessarily configured properly, or at least not in the best way. True, it works, and that's usually fine, but there may still be encoding complications.

Here's an example situation:
One time I had a website running from a database (forum posts and so forth). The website worked fine. The contents of the database were edited through forms and displayed on webpages. So everything was in UTF8 because all of those HTML pages were in UTF8. Then I realized a while later that the database itself was in some other encoding. I couldn't edit the raw database information because it was just garbage there. But due to using UTF8 in, then UTF8 out, it actually worked out so that it looked like everything was fine on the website until I noticed that. If you're lucky, it won't cause any problems. But then imagine if I had to change servers and set up a new database. Suddenly I'd get those garbage characters instead of the UTF8 because there was no raw data anywhere in the right format.
(That's one time where I had to go in and find the right un-encoding sequence and rescue the data. A lot of time, no fun.)

In short, it's always good to check that everything is in order because character encoding problems are very subtle, but very bad when they go wrong. Honestly, it's rarely an issue, but it's still worth putting a little extra effort in now to save yourself the huge amount of effort it would take to fix things in the worst-case scenario.

So, this isn't just for databases. You could in theory have a file that's being read as UTF8, but actually stored in something else, and depending on how your computer/server is doing it, you might never know the difference. It's very unlikely, but checking the default settings in your editor will solve that (forever).

//end tangent

Unless you are absolutely sure you'll only use English (without any fancy extra characters like accented letters, hearts, stars or smiley faces), I recommend always using Unicode (UTF8). In theory it's a little slower (or at least it used to be), but it's a good standard and it will work for any characters you have. And if everything is set to it by default, then you'll never again have to worry about encoding problems.
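
On the cost side, what UTF8 mostly changes is how many bytes each character takes, and plain English text stays at one byte per character. A quick sketch to see it, assuming Node.js; the sample characters are arbitrary.

```javascript
// "A", "é", "日" and a smiley face emoji, written as escapes to keep this snippet ASCII.
const samples = ["A", "\u00e9", "\u65e5", "\u{1F600}"];

// Count how many bytes each one needs when stored as UTF-8.
const sizes = samples.map((s) => Buffer.byteLength(s, "utf8"));

console.log(sizes); // [ 1, 2, 3, 4 ]
```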