Archive of UserLand's first discussion group, started October 5, 1998.
Re: Processing Word as Input?
Author: Paul Howson Posted: 4/19/1999; 6:45:27 PM Topic: Processing Word as Input? Msg #: 5125 (In response to 5112)
This is a need for publishers especially. Word is the de facto standard authoring tool (like it or not). People who author material for print publishing mostly use Word; as a publishing person, my experience is that material comes to me in Word format.
A route that I've investigated is to use RTF. RTF is a text-only format. Its design is terrible, but it does operate according to defined rules. You can parse RTF and construct your own representation (according to your needs) of a Word document.
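To illustrate why RTF is parseable despite its design, here is a minimal Python sketch of a tokenizer. It is not the Frontier code discussed in this thread; the names `TOKEN_RE` and `tokenize` are hypothetical, and it covers only the basic lexical rules (groups, control words with an optional numeric parameter, control symbols, `\'hh` escapes, and plain text). It assumes Windows-1252 for escaped bytes and ignores binary data (`\bin`) and Unicode escapes.

```python
import re

# One alternative per RTF lexical element. Order matters: control words
# first, then \'hh hex escapes, then other control symbols.
TOKEN_RE = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"   # control word, optional number, optional delimiter space
    r"|\\'([0-9a-fA-F]{2})"      # hex-escaped byte, e.g. \'e9
    r"|\\([^a-zA-Z])"            # control symbol, e.g. \\ \{ \} \~
    r"|([{}])"                   # group open / close
    r"|([^\\{}\r\n]+)"           # run of plain text (raw line breaks are skipped)
)

def tokenize(rtf):
    """Yield (kind, value, param) tuples for an RTF string."""
    for m in TOKEN_RE.finditer(rtf):
        word, param, hexbyte, symbol, brace, text = m.groups()
        if word is not None:
            yield ("word", word, int(param) if param is not None else None)
        elif hexbyte is not None:
            # Assumption: the document uses the Windows-1252 code page.
            char = bytes([int(hexbyte, 16)]).decode("cp1252", errors="replace")
            yield ("text", char, None)
        elif symbol is not None:
            if symbol in "\\{}":
                yield ("text", symbol, None)    # escaped literal character
            else:
                yield ("symbol", symbol, None)  # other control symbol
        elif brace is not None:
            yield ("open" if brace == "{" else "close", brace, None)
        else:
            yield ("text", text, None)

sample = r"{\rtf1\ansi {\b Hello} world\par}"
tokens = list(tokenize(sample))
```

A real parser would sit on top of this token stream and keep a stack of formatting states, pushing on each "open" token and popping on each "close", since RTF formatting is scoped to groups.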
As a publisher, I want to retain information about paragraph styles and convert this to XML markup. I want to retain some of the direct formatting which users apply (e.g. bold and italic) and convert this to XML markup. There's also a lot of garbage formatting (usually) that I want to throw away.
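The filtering described above could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the input shape (a list of `(style_name, runs)` pairs, where each run is `(text, formatting_dict)`), the `STYLE_TO_TAG` mapping, and the element names `document`, `h1`, `h2`, `p`, `b` and `i`. Only bold and italic survive; any other direct formatting is dropped.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Word paragraph style names to XML element names.
STYLE_TO_TAG = {"Heading 1": "h1", "Heading 2": "h2", "Normal": "p"}

# The only direct formatting we keep, and the element used for each.
KEPT_FORMATTING = (("bold", "b"), ("italic", "i"))

def paragraphs_to_xml(paragraphs):
    """Convert [(style_name, [(text, formatting_dict), ...]), ...] to an XML string."""
    root = ET.Element("document")
    for style, runs in paragraphs:
        para = ET.SubElement(root, STYLE_TO_TAG.get(style, "p"))
        last_child = None  # most recent inline element directly under para
        for text, fmt in runs:
            tags = [tag for key, tag in KEPT_FORMATTING if fmt.get(key)]
            if not tags:
                # Plain run: append to the paragraph text or to the tail
                # of the preceding inline element.
                if last_child is None:
                    para.text = (para.text or "") + text
                else:
                    last_child.tail = (last_child.tail or "") + text
                continue
            # Formatted run: nest the kept elements, e.g. <b><i>text</i></b>.
            parent = para
            for tag in tags:
                element = ET.SubElement(parent, tag)
                if parent is para:
                    last_child = element
                parent = element
            parent.text = text
    return ET.tostring(root, encoding="unicode")

example = [
    ("Heading 1", [("Intro", {"font": "Arial", "size": 14})]),
    ("Normal", [("Some ", {}), ("bold", {"bold": True, "color": 3}), (" text", {})]),
]
result = paragraphs_to_xml(example)
# result == "<document><h1>Intro</h1><p>Some <b>bold</b> text</p></document>"
```

Note how the font, size and color entries in the example never reach the output: the whitelist in `KEPT_FORMATTING` is what throws the garbage formatting away.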
A couple of years back I had a prototype in Frontier to parse RTF. The plan was to deduce the document structure from the way heading styles had been applied and then to re-encode the document as XML, so that it could be cross-media published with Frontier and xmltr. I never had the time to follow it through.
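The idea of deducing structure from heading styles can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reconstruction of the Frontier prototype: the input is assumed to be a flat list of `(level, text)` pairs, where level 0 means a body paragraph and levels 1, 2, ... correspond to "Heading 1", "Heading 2", and so on. The function name `nest_by_headings` and the element names `document`, `section`, `title` and `p` are hypothetical.

```python
import xml.etree.ElementTree as ET

def nest_by_headings(paragraphs):
    """Turn a flat [(level, text), ...] list into nested <section> elements."""
    root = ET.Element("document")
    # Stack of (heading level, element); the root counts as level 0.
    stack = [(0, root)]
    for level, text in paragraphs:
        if level == 0:
            # Body paragraph: belongs to the innermost open section.
            ET.SubElement(stack[-1][1], "p").text = text
            continue
        # A heading closes every open section at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        section = ET.SubElement(stack[-1][1], "section")
        ET.SubElement(section, "title").text = text
        stack.append((level, section))
    return root

flat = [
    (1, "A"),
    (0, "a text"),
    (2, "A.1"),
    (0, "sub"),
    (1, "B"),
]
nested = ET.tostring(nest_by_headings(flat), encoding="unicode")
# nested == ("<document><section><title>A</title><p>a text</p>"
#            "<section><title>A.1</title><p>sub</p></section></section>"
#            "<section><title>B</title></section></document>")
```

This only works as well as the authors applied their heading styles. A document that jumps from "Heading 1" straight to "Heading 3" still nests correctly here, but headings faked with direct formatting (large bold text in "Normal") would need extra heuristics.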
I still think this idea has merit.
There are responses to this message:
- Re: Processing Word as Input?, Oliver Wrede, 4/20/1999; 3:41:20 AM
- Re: Processing Word as Input?, Tommy Sundström, 4/20/1999; 4:53:58 AM
- Re: Processing Word as Input?, Brent Simmons, 4/20/1999; 9:44:58 AM
This page was archived on 6/13/2001; 4:49:25 PM.
© Copyright 1998-2001 UserLand Software, Inc.