MS WORD processed text
12-15-2009, 12:14 AM
I recently had trouble cleaning up some text that someone sent me after writing it in MS Word. (I even saved that .doc as rich text and as an HTML page from Word so I could see the MS coding.)
Now someone else is asking me to help with a Web site containing a lot of text. I'm sure they are typing in a word processor such as MS Word; I don't believe Notepad has spell check and the similar features a writer depends on. Some time ago, I used QuarkXPress, and I don't recall problems importing text from a word processor into that kind of typography program. Would importing the text from a word processor into InDesign and then exporting it to HTML help clean up the word processor's unwanted markup?
Anyone have suggestions about cleaning up MS Word coding so text can be properly marked up and styled for a Web page?
12-15-2009, 12:56 AM
Define classes and use them. If you post an example of the text, I'll give you an example of classes and the markup using them.
(BTW, Notepad++ (http://notepad-plus.sourceforge.net/uk/site.htm) does have a spellchecker plug-in.)
12-15-2009, 04:27 PM
MS Word is just a mess when it comes to web pages. Generally, using the UTF-8 encoding will solve most of the problems with stylized quotes and similar special characters.
The mess it makes when saving to HTML, however, is a nightmare to fix, although depending on the lifespan of the page it can often just be lived with.
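On the encoding point, here is a rough Python sketch of what goes wrong and what the UTF-8 fix amounts to. It assumes the usual failure mode, where the page's bytes are UTF-8 but the browser reads them as Windows-1252; the function names and the sample page skeleton are made up for illustration.

```python
def misread_as_cp1252(text: str) -> str:
    """Simulate the mismatch: UTF-8 bytes decoded as Windows-1252."""
    # errors="replace" because a few byte values are undefined in cp1252.
    return text.encode("utf-8").decode("cp1252", errors="replace")


def wrap_in_utf8_page(body_html: str) -> bytes:
    """Build a minimal page whose declared charset matches its actual bytes."""
    page = (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>Sample</title></head>\n'
        f"<body><p>{body_html}</p></body></html>\n"
    )
    # Encode with the same encoding the <meta> tag declares.
    return page.encode("utf-8")
```

Feeding Word's curly apostrophe (U+2019) through `misread_as_cp1252` gives the familiar three-character garbage, while the bytes from `wrap_in_utf8_page` decode back to the original characters as long as they are read as UTF-8.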
12-15-2009, 06:40 PM
Cut and paste from Word into another program. This removes most of the formatting, but it still lets you use spell check and keeps the basic structure, such as paragraphs.
Beyond that, you are stuck relying on Word to output a usable HTML page, which is very unlikely.
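If all you have is the HTML that Word saved, the same "keep the paragraphs, drop the formatting" idea can be roughly approximated with a script. This is only a sketch using Python's standard `html.parser`; the class and function names are made up, and it assumes the text you care about sits inside `<p>` elements.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the visible text of each <p>, ignoring all other markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._buffer = None  # None while we are outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            # Collapse runs of whitespace (including non-breaking spaces).
            text = " ".join("".join(self._buffer).split())
            if text:  # skip Word's empty spacer paragraphs
                self.paragraphs.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def word_html_to_paragraphs(html: str) -> list:
    """Return the plain text of every non-empty paragraph in the markup."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

Each returned string can then be wrapped in a clean `<p>` with whatever class you have defined. Inline emphasis such as bold or italics is lost, just as it is with a plain-text paste.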
Here's an alternative, actually:
Google Docs allows you to import .doc files, so you can carry the formatting over into an online Google document. There is then an export-to-HTML feature, and from a brief attempt the code looks a lot cleaner than the nonsense Word generates. Google is likely better than Microsoft here because Microsoft only worries about Windows and Internet Explorer, while Google tries to make its software work in all browsers on all systems.
It certainly still won't be perfect (and probably not pretty code), but it should be a lot better than what Word gives you.
Other programs can also open .doc files, such as OpenOffice, which may generate slightly better output than Word. I think Google is the best option, though, because Google Docs is web-based in the first place.
One remaining problem is that Word inserts strange characters (see the post above), and I think Google Docs would preserve them. But they *should* be OK, since Google Docs uses UTF-8 encoding (the best encoding for strange characters across various browsers).
Realistically, though, the only time you should be using Word is for word processing. If content must be available on the web, a web design application should be used. The only exception is when you specifically want to have a "document" on your webpage, for example an Excel spreadsheet of something. That would be a lot of work to transfer to HTML by hand, and in that case there is no huge problem with posting it as HTML, though I would suggest also posting the file for download in case it does not work in a certain browser.
12-16-2009, 09:22 PM
Thank you all. I fancy myself as a "professional" HTML / CSS developer. The situation that causes problems is when somebody submits text to you that they typed in MS Word.
Such as ... http://www.josephdenaro.com/joe/index.html -- which thankfully was not too long.
I was not familiar with Google Docs; I'll try to check that out.
I would have to check whether the person has Notepad++ with a spell check, etc.
I am partial to Dreamweaver, which has some sort of spell checking.
12-16-2009, 11:12 PM
You could then copy the text from Word:
Open Dreamweaver in HTML, XHTML, or PHP mode.
In Dreamweaver, open your preferences, go to copy and paste, and check off what applies: probably text with structure plus full formatting, plus retain line breaks and clean up Word paragraph spacing (the last two are the default, I think).
Go to split view.
Paste the copied Word text into the design section.
Do a find and replace in the source code for Word's curly “ and ” marks, replacing them with the generic " mark.
Also replace the curly ‘ and ’ marks with the generic '.
If you've made the classes, use them here as well.
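For a lot of text, the quote find-and-replace can also be scripted instead of done by hand. A minimal Python sketch, assuming the only characters you want to change are the four curly quote marks (the names here are made up for illustration):

```python
# Map Word's curly quotes to their plain ASCII equivalents.
WORD_PUNCTUATION = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also Word's apostrophe)
}
_TABLE = str.maketrans(WORD_PUNCTUATION)


def straighten_quotes(text: str) -> str:
    """Replace curly quotes with straight ones; leave everything else alone."""
    return text.translate(_TABLE)
```

Running `straighten_quotes` over the pasted text before it goes into the page does the same job as the two find-and-replace passes. If the page is served as UTF-8 you may prefer to keep the curly quotes, since they are legitimate typography.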
01-06-2010, 01:33 AM
I usually only go to Dreamweaver Preferences to make the fonts bigger. I overlooked this option.