Thread: MS WORD processed text

  1. #1
    Join Date
    Oct 2006
    Location
    New York, NY, USA
    Posts
    262
    Thanks
    42
    Thanked 24 Times in 24 Posts

    Default MS WORD processed text

    I recently had trouble cleaning up a bit of text someone sent me that they had written in MS Word. (I even saved that .doc as rich text and as an HTML page from Word so I could see the MS coding.)

    Now someone else is asking me to help with a Web site containing a lot of text. I'm sure they are typing in a word processor such as MS Word. I don't believe Notepad has spell check and the similar features a writer would depend on. Some time ago, I used QuarkXPress, and I don't recall problems importing text from a word processor into that kind of typography program. Would importing text from a word processor into InDesign and then exporting it to HTML help clean up the word processor's unwanted markup?

    Anyone have suggestions about cleaning up MS Word coding so text can be properly marked up and styled for a Web page?

  2. #2
    Join Date
    May 2007
    Location
    Boston,ma
    Posts
    2,127
    Thanks
    173
    Thanked 207 Times in 205 Posts

    Default

    Define classes and use them. If you post an example of the text, I'll give you an example of classes and the markup using them.

  3. The Following User Says Thank You to bluewalrus For This Useful Post:

    auntnini (12-16-2009)

  4. #3
    Join Date
    Apr 2008
    Location
    So.Cal
    Posts
    3,643
    Thanks
    63
    Thanked 516 Times in 502 Posts
    Blog Entries
    5

    Default

    (BTW, Notepad++ does have a spell-checker plug-in.)

  5. The Following User Says Thank You to traq For This Useful Post:

    auntnini (12-16-2009)

  6. #4
    Join Date
    Mar 2005
    Location
    SE PA USA
    Posts
    30,495
    Thanks
    82
    Thanked 3,449 Times in 3,410 Posts
    Blog Entries
    12

    Default

    MS Word is just a mess when it comes to web pages. Generally, using the UTF-8 encoding will solve most of the problems with stylized quotes and other characters like:



    But the horrible mess it makes when saving to HTML is a nightmare to fix, though depending upon the lifespan of the page, it can often just be lived with.
    - John
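
    To illustrate the encoding point above, here is a minimal Python sketch. It assumes the Word text arrives as Windows-1252 bytes (a common legacy case, but an assumption here), and the sample string is made up for the example. It shows the same curly quotes turning into replacement characters when the bytes are read as UTF-8, and surviving intact when the text is actually saved as UTF-8.

```python
# Word's "smart" punctuation, written as escapes: curly double quotes,
# a curly apostrophe, and a long dash. The sentence itself is made up.
text = "\u201cIt\u2019s done\u201d \u2014 almost"

# Assumption: the file was saved in Windows-1252, where each of these
# characters is a single byte in the 0x80-0x9F range.
cp1252_bytes = text.encode("cp1252")

# A page declared as UTF-8 cannot decode those bytes; with errors="replace"
# every bad byte becomes U+FFFD, the familiar question-mark diamond.
garbled = cp1252_bytes.decode("utf-8", errors="replace")

# Saving and serving the same text as UTF-8 round-trips it unchanged.
utf8_bytes = text.encode("utf-8")
restored = utf8_bytes.decode("utf-8")

print(garbled)
print(restored == text)
```

    The fix is therefore less about removing the characters than about making sure the file's real encoding and the encoding the page declares agree.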

  7. The Following User Says Thank You to jscheuer1 For This Useful Post:

    auntnini (12-16-2009)

  8. #5
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,164
    Thanks
    265
    Thanked 690 Times in 678 Posts

    Default

    Cut and paste from Word into another program. This will remove most of the formatting, but it will let you use spell check and keep the basic structure, such as paragraphs.
    Beyond that, you are stuck relying on Word to output a usable HTML page, which is very unlikely.
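
    If you already have Word's HTML export and only want the paragraphs back, something like the following Python sketch could do the stripping. It is an illustration under assumptions, not a full cleaner: the names `ParagraphExtractor` and `word_html_to_paragraphs` and the sample markup are invented for the example, only `<p>` elements are kept, and inline formatting such as bold is deliberately thrown away.

```python
from html import escape
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of each <p> element, ignoring every tag and attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._buffer = None  # None while we are outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            # Join the pieces and collapse runs of whitespace (including
            # non-breaking spaces) into single spaces.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def word_html_to_paragraphs(html):
    """Return the plain text of each paragraph found in the given HTML."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Made-up sample in the style of Word's "Save as Web Page" output.
sample = (
    '<p class=MsoNormal style="margin-bottom:0in">'
    '<span style="font-family:Calibri">First <b>paragraph</b>.<o:p></o:p></span></p>'
    '<p class=MsoNormal><span>Second&nbsp;paragraph.</span></p>'
)
clean = word_html_to_paragraphs(sample)
print("\n".join(f"<p>{escape(p)}</p>" for p in clean))
```

    The result is bare `<p>` elements with no Word classes or inline styles, ready to be marked up by hand.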

    Here's an alternative, actually:
    Google Docs lets you import .doc files, so you can carry the formatting over into an online Google document. It then has an export-to-HTML feature, and from a brief attempt it looks like the code is a lot cleaner than the nonsense Word generates. Google is likely better than Microsoft in this respect because Microsoft only worries about Windows and Internet Explorer, while Google tries to make its software work in all browsers on all systems.
    It certainly still won't be perfect (and probably not pretty code), but it should be a lot better than what Word gives you.
    There are also other programs that can open .doc files, such as OpenOffice, which may generate slightly better output than Word. I think Google Docs is the best option, though, because it is web-based in the first place.

    One remaining problem is that Word inserts strange characters (see the post above), and I think Google Docs would preserve them. But they *should* be OK, since Google Docs uses UTF-8 encoding (the best encoding for strange characters across various browsers).

    Realistically, though, the only time you should be using Word is for word processing. If you must have content available on the web, then a web design application should be used. The only exception is when you specifically want to have a "document" on your webpage, for example posting an Excel spreadsheet of something. That would be a lot of work to transfer to HTML, and in that case there is no huge problem with posting it as HTML, though I would suggest you also post the file for download in case it does not work in a certain browser.

  9. The Following User Says Thank You to djr33 For This Useful Post:

    auntnini (12-16-2009)

  10. #6
    Join Date
    Oct 2006
    Location
    New York, NY, USA
    Posts
    262
    Thanks
    42
    Thanked 24 Times in 24 Posts

    Default text from authors / writers

    Thank you all. I fancy myself as a "professional" HTML / CSS developer. The situation that causes problems is when somebody submits text to you that they typed in MS Word.

    Such as ... http://www.josephdenaro.com/joe/index.html -- which thankfully was not too long.

    I was not familiar with Google Docs; I'll try to check that out.

    I would have to check whether the person has Notepad++ with a spell checker, etc.

    I am partial to Dreamweaver, which has some sort of spell checking.

  11. #7
    Join Date
    May 2007
    Location
    Boston,ma
    Posts
    2,127
    Thanks
    173
    Thanked 207 Times in 205 Posts

    Default

    1. You could then copy the text from Word.
    2. Open Dreamweaver in HTML, XHTML, or PHP mode.
    3. In Dreamweaver, open Preferences, go to Copy/Paste, and check off what applies: probably "Text with structure plus full formatting", plus "Retain line breaks" and "Clean up Word paragraph spacing" (the last two are on by default, I think).
    4. Go to Split view.
    5. Paste the copied Word text into the Design section.
    6. Do a find and replace in the source code for Word's curly “ and ” marks and replace them with the generic " mark.
    7. Do the same for the curly ‘ and ’ marks, replacing them with the generic '.
    8. If you've made the classes, use them here as well.
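
    The find-and-replace in steps 6 and 7 can also be done outside Dreamweaver. Here is a minimal Python sketch of that replacement; the names `SMART_TO_PLAIN` and `straighten_quotes` and the sample sentence are made up for the example, and only the four quote characters are covered.

```python
# Map Word's typographic quote characters to their plain ASCII equivalents.
SMART_TO_PLAIN = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark, also used as an apostrophe
})


def straighten_quotes(text):
    """Return text with curly quotes replaced by straight ones."""
    return text.translate(SMART_TO_PLAIN)


# Sample text as it might arrive from a Word document.
pasted = "\u201cDon\u2019t panic,\u201d she said."
print(straighten_quotes(pasted))
```

    Anything the table does not list is left untouched, so other Word characters (dashes, ellipses) would need their own entries if you want them replaced too.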

  12. The Following User Says Thank You to bluewalrus For This Useful Post:

    auntnini (01-06-2010)

  13. #8
    Join Date
    Oct 2006
    Location
    New York, NY, USA
    Posts
    262
    Thanks
    42
    Thanked 24 Times in 24 Posts

    Default Dreamweaver Preferences

    I usually only go to Dreamweaver Preferences to make the fonts bigger. I had overlooked this option.
