Archive of UserLand's first discussion group, started October 5, 1998.

Re: Frontier-XML Tutorial

Author:Deke Smith
Posted:1/12/1999; 11:06:22 AM
Topic:Frontier-XML Tutorial
Msg #:2036 (In response to 2023)
Prev/Next:2035 / 2037

How is the DOM implementation going? I did not work on it for most of December for the sake of staying married. ;->

I have implemented the core methods of Level 1 as far as I could without dealing with text encoding. I have not done the extended XML methods or the HTML methods. The HTML methods would be very useful, but to support them I would have to roll my own HTML parser, so they are low on the list of priorities; Blox and xml.* have indigestion with HTML. As soon as I complete the XML core methods I will finish the extended methods.

I am now working on text encoding issues. One issue is tracking text encoding so the output is predictable. Another issue is text manipulation. For example, a special script will have to be written to concatenate strings: UTF-16 string concatenation is not as easy as newString=string1+string2. In addition, getting character 3 of a UTF-8 string can be a huge challenge, because UTF-8 characters vary in length.
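To make the two string problems concrete, here is a minimal Python sketch (Python stands in for UserTalk; this is not the author's script, and the function names are hypothetical). Finding character 3 of a UTF-8 string means counting lead bytes from the start, since byte offsets and character offsets differ. For concatenation, one plausible complication is that each UTF-16 string may begin with its own byte-order mark, possibly in a different byte order; the sketch normalizes both to big-endian with a single BOM. Treating BOM-less input as big-endian is an assumption (it is the RFC 2781 default).

```python
import codecs


def utf8_char_at(data: bytes, index: int) -> bytes:
    """Return the UTF-8 bytes of the character at 0-based `index`.

    A UTF-8 character is one lead byte plus zero or more continuation
    bytes (bit pattern 10xxxxxx), so we must count characters from the
    start instead of jumping to byte offset `index`.
    """
    if index < 0:
        raise IndexError("character index out of range")
    count = -1
    start = None
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # a lead byte starts a new character
            count += 1
            if count == index:
                start = pos
            elif count == index + 1:
                return data[start:pos]
    if start is None:
        raise IndexError("character index out of range")
    return data[start:]  # the requested character is the last one


def _as_big_endian(data: bytes) -> bytes:
    """Strip any BOM and return big-endian UTF-16 code units.

    Assumption: input without a BOM is already big-endian.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:]
    if data.startswith(codecs.BOM_UTF16_LE):
        body = data[2:]
        # Swap each two-byte code unit from little-endian to big-endian.
        return bytes(b for i in range(0, len(body) - 1, 2) for b in (body[i + 1], body[i]))
    return data


def utf16_concat(first: bytes, second: bytes) -> bytes:
    """Join two UTF-16 strings so the result carries exactly one big-endian BOM."""
    return codecs.BOM_UTF16_BE + _as_big_endian(first) + _as_big_endian(second)
```

For example, `utf8_char_at("aé€".encode("utf-8"), 2)` (character 3, counting from 1) returns `b'\xe2\x82\xac'`, the three bytes of "€", even though the string is six bytes long.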

I have written some script-based text encoding translators. So far, I have MacRoman<->ISO8859-1, MacRoman<->Win1252, MacRoman<->UTF-8, MacRoman<->UTF-16, and Win1252<->UTF-8. I have started tackling the fun ones like Big5<->UTF-8. Encoding conversion will also help with string manipulation: strings can be converted to UTF-16 so that they can be indexed by character. UTF-16 characters are double-byte, except for characters outside the Basic Multilingual Plane, which take a four-byte surrogate pair; the double-byte range contains, I think, every character needed for the major languages of the world.
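As a rough stand-in for a script-based translator between two single-byte encodings, this Python sketch builds a 256-entry byte table (here from Python's built-in `mac_roman` and `latin-1` codecs) and translates byte by byte. Substituting "?" for characters the target set lacks is an assumed policy, not necessarily what the author's translators do, and `build_table` and `utf16_char_at` are hypothetical names. The last function shows the fixed-width indexing that converting to UTF-16 makes possible, valid only while the text contains no surrogate pairs.

```python
def build_table(src: str, dst: str, fallback: int = ord("?")) -> bytes:
    """Map every byte of single-byte encoding `src` to single-byte encoding `dst`.

    Bytes that `src` leaves undefined, or whose character `dst` lacks,
    map to `fallback` (an assumed policy).
    """
    table = bytearray()
    for code in range(256):
        try:
            encoded = bytes([code]).decode(src).encode(dst)
        except UnicodeError:  # undefined in src or unrepresentable in dst
            encoded = bytes([fallback])
        table.append(encoded[0])
    return bytes(table)


MAC_TO_LATIN1 = build_table("mac_roman", "latin-1")
LATIN1_TO_MAC = build_table("latin-1", "mac_roman")


def utf16_char_at(data: bytes, index: int) -> str:
    """Fixed-width indexing into BOM-less UTF-16BE text.

    Valid only when every character is one 2-byte code unit (no
    surrogate pairs); otherwise the byte offsets would drift.
    """
    if index < 0:
        raise IndexError("character index out of range")
    unit = data[2 * index : 2 * index + 2]
    if len(unit) != 2:
        raise IndexError("character index out of range")
    return unit.decode("utf-16-be")
```

With the table in place, `b"caf\x8e".translate(MAC_TO_LATIN1)` turns MacRoman "café" into its ISO 8859-1 bytes `b"caf\xe9"`. A table lookup per byte keeps each single-byte conversion cheap; multibyte sets such as Big5 need more than a 256-entry table.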

The text encoding methods weren't directly part of the DOM spec and should probably be separate anyway, so I placed those in their own extension. There is quite a bit of overlap between that extension and the internationalization suite I wrote called Lingua. I have decided to merge some of the capabilities of Lingua with the text encoding methods to form what I am calling the i18n Extension.

Part of that extension is a TMX engine. TMX (Translation Memory eXchange) is an XML DTD for exchanging the stored phrase translations that translation programs use to convert phrases from one language to another. I have been using DOM as the framework for the TMX engine, which, by the way, is what I have been trying to accomplish with DOM all along. I have been putting my DOM scripts through their paces with the TMX engine and have been finding bugs here and there.
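As an illustration of a DOM-based TMX lookup, here is a minimal sketch using Python's `xml.dom.minidom` (a DOM implementation) in place of the author's Frontier DOM scripts. The TMX document is an abbreviated sample, `lookup` and `text_of` are hypothetical helpers, and reading either `lang` or `xml:lang` on `<tuv>` is an assumption meant to cover older and newer TMX versions.

```python
from xml.dom import minidom

# Abbreviated TMX sample: one translation unit (<tu>) holding an English
# and a French variant (<tuv>), each with its text segment (<seg>).
TMX_SAMPLE = """<?xml version="1.0"?>
<tmx version="1.1">
  <header creationtool="example" creationtoolversion="1" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv lang="en"><seg>Hello</seg></tuv>
      <tuv lang="fr"><seg>Bonjour</seg></tuv>
    </tu>
  </body>
</tmx>"""


def text_of(node):
    """Concatenate the direct text-node children of a DOM node."""
    return "".join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE)


def lookup(doc, phrase, src, dst):
    """Return the `dst`-language segment paired with `phrase` in `src`, or None."""
    for tu in doc.getElementsByTagName("tu"):
        segs = {}
        for tuv in tu.getElementsByTagName("tuv"):
            # Assumption: older TMX uses "lang", newer uses "xml:lang".
            lang = tuv.getAttribute("lang") or tuv.getAttribute("xml:lang")
            seg = tuv.getElementsByTagName("seg")[0]
            segs[lang] = text_of(seg)
        if segs.get(src) == phrase and dst in segs:
            return segs[dst]
    return None


doc = minidom.parseString(TMX_SAMPLE)
print(lookup(doc, "Hello", "en", "fr"))  # prints Bonjour
```

Only standard DOM calls (`getElementsByTagName`, `getAttribute`, `childNodes`) are used, which is the point of building the engine on DOM: the same logic should work over any conforming implementation.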

I have built DOM around the concept of drivers. That way the DOM scripts can be used to script any XML parser that can be connected to Frontier in some way; it would even be possible to script a Java-based DOM implementation if someone wrote the drivers. The initial driver I am creating during development is the "generic" driver, which is supposed to work with either Blox or xml.*. It has some inefficiencies that could be eliminated by using the specialized scripts in Blox or xml.*, so once the generic driver is done I plan to customize it for each parser to take advantage of that parser's own scripts.
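The driver idea can be sketched as follows. This is a hypothetical Python analogue, not the author's UserTalk code: Python's `xml.dom.minidom` and `xml.etree.ElementTree` stand in for Blox and xml.*, and the `Driver`, `Document`, and method names are invented for illustration. The base class supplies a generic `elements_by_tag` built only from a few primitives, and one driver overrides it with its parser's native iterator, the kind of per-parser customization described above.

```python
from abc import ABC, abstractmethod
from xml.dom import minidom
import xml.etree.ElementTree as ET


class Driver(ABC):
    """Hypothetical: the primitives the DOM layer needs from any parser."""

    @abstractmethod
    def parse(self, text): ...

    @abstractmethod
    def tag(self, node): ...

    @abstractmethod
    def child_elements(self, node): ...

    def elements_by_tag(self, root, name):
        """Generic version, built only from the primitives above."""
        result, stack = [], [root]
        while stack:
            node = stack.pop()
            if self.tag(node) == name:
                result.append(node)
            # Push children reversed so they pop in document order.
            stack.extend(reversed(self.child_elements(node)))
        return result


class MinidomDriver(Driver):
    """Relies on the generic walk inherited from Driver."""

    def parse(self, text):
        return minidom.parseString(text).documentElement

    def tag(self, node):
        return node.tagName

    def child_elements(self, node):
        return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


class ElementTreeDriver(Driver):
    """Specialized: replaces the generic walk with the parser's own iterator."""

    def parse(self, text):
        return ET.fromstring(text)

    def tag(self, node):
        return node.tag

    def child_elements(self, node):
        return list(node)

    def elements_by_tag(self, root, name):
        return list(root.iter(name))  # includes root; document order


class Document:
    """Parser-independent DOM-style layer; it talks only to its driver."""

    def __init__(self, driver, text):
        self.driver = driver
        self.root = driver.parse(text)

    def get_elements_by_tag_name(self, name):
        return self.driver.elements_by_tag(self.root, name)
```

Because `Document` calls only the driver, supporting another parser means writing a new driver rather than touching the DOM layer, and a driver can start with the generic methods and override only the slow ones later.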

I haven't included licensing information with the present distribution of the DOM or i18n extensions; I am searching for a good, "non-infecting" open source license. Help from anyone who would like to work on DOM or internationalization issues would be appreciated. If you wish to work with me on this, contact me at deke@tallent.com. I have segregated both DOM and the i18n extension onto a guest database so I can easily synchronize what I do at home with the version at work, which also makes them easy to share.

I hope that answers your question, "How's that going?" Whenever my kids ask me a question I ask them back whether they want the long version or the short version. I guess you all have been subjected to the long version.




This page was archived on 6/13/2001; 4:47:13 PM.

© Copyright 1998-2001 UserLand Software, Inc.