Archive of UserLand's first discussion group, started October 5, 1998.

Re: Should-I-Use-XML

Author: Paul Snively
Posted: 12/21/1999; 9:22:18 AM
Topic: Today's scriptingNews Outline
Msg #: 13791 (In response to 13785)

I think XML is about as useful as the tools are, and how useful that is depends to a huge degree upon what you wish to do.

Recently I've had to do a lot of third-party site wrapping in Java, i.e. be an HTTP client and extract some information from a site for further use.

Thanks to some pretty spiffy tools, here's a Java function to compute the shipping cost for UPS service to a residential address in the United States, assuming that you provide your own packaging:

public static double getShippingCost(String method, int srcZip, String destCity, int destZip, int weight) throws WeaselAppException
{
  StringBuffer query = new StringBuffer ("http://www.ups.com/using/services/rave/qcosthtm.cgi?accept_UPS_license_agreement=yes&10_action=3&14_origCountry=US&22_destCountry=US&47_rate_chart=One+Time+Pickup&48_container=00&49_residential=Yes&");
  double result = 0.0;

  query.append("13_product=" + method + "&");
  query.append("15_origPostal=" + new Integer(srcZip).toString() + "&");
  query.append("20_destCity=" + URLEncoder.encode(destCity) + "&");
  query.append("19_destPostal=" + new Integer(destZip).toString() + "&");
  query.append("23_weight=" + new Integer(weight).toString());

  Path path = Pwd.lookup(query.toString());
  Document doc;
  Node node;
  try
  {
    doc = LooseHtml.parseFile(path);
  }
  catch (IOException ex)
  {
    throw new WeaselAppException(ex.getMessage());
  }
  try
  {
    // Both inner quotes must be escaped, or the Java string literal ends early.
    node = XPath.find("//text()[contains(., \"TOTAL RATE:\")]/ancestor::td/following-sibling::td/text()", doc);
  }
  catch (XPathParseException ex)
  {
    throw new WeaselAppException(ex.getMessage());
  }

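  // Guard against a page that lacks the expected rate cell.
  if (node == null || node.getNodeValue().length() < 2)
    throw new WeaselAppException("TOTAL RATE not found in response");
  String resultString = node.getNodeValue().substring(1); // Drop leading $
  result = Double.parseDouble(resultString);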

  return result;
}

It's just a few lines, and the only interesting ones are the Pwd.lookup, LooseHtml.parseFile, and XPath.find ones, because they implement opening an HTTP stream, parsing HTML into a DOM-compliant structure, and doing an XPath query, respectively. The rest is just parameterization and exception handling.
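The parameterization step can be sketched on its own using only the JDK. This is a minimal illustration, not the original code: the class name RateQuerySketch, the base URL, and the "GND" product code are placeholders. It uses the two-argument URLEncoder.encode, because the one-argument form used above was deprecated in later Java releases, and it encodes every string parameter, not just the city.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the variable part of the rate query.
final class RateQuerySketch {
    static String buildQuery(String base, String method, int srcZip,
                             String destCity, int destZip, int weight) {
        try {
            // Same parameter names and order as the listing above.
            return base
                + "13_product=" + URLEncoder.encode(method, "UTF-8")
                + "&15_origPostal=" + srcZip
                + "&20_destCity=" + URLEncoder.encode(destCity, "UTF-8")
                + "&19_destPostal=" + destZip
                + "&23_weight=" + weight;
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 support is mandatory on every JVM, so this should not happen.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // Placeholder base URL and product code, for illustration only.
        System.out.println(buildQuery("http://example.invalid/rate?",
                                      "GND", 94103, "New York", 10001, 5));
    }
}
```

Like the original, the sketch passes ZIP codes as int, which drops a leading zero; a String parameter would be safer for those.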

The assumption about the page that comes back is that there's a text element somewhere that contains "TOTAL RATE:", that there's a <td> above it, that the <td> has a following <td> at the same level, and that the text element within that <td> contains a $ followed by a string that can be converted to a double.
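To make that assumption concrete, here is a minimal sketch that runs the same XPath expression against a made-up table row. It uses the JDK's javax.xml.parsers and javax.xml.xpath packages rather than Resin, so it needs well-formed markup, where LooseHtml tolerates sloppy HTML. The class name TotalRateXPathSketch and the sample markup are invented for illustration; the markup of the real page is not shown here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class TotalRateXPathSketch {
    // Hypothetical, well-formed stand-in for the returned page.
    static final String SAMPLE =
        "<html><body><table>"
      + "<tr><td><b>TOTAL RATE:</b></td><td>$12.50</td></tr>"
      + "</table></body></html>";

    // Same expression as in the listing above.
    static final String QUERY =
        "//text()[contains(., \"TOTAL RATE:\")]/ancestor::td/following-sibling::td/text()";

    static double totalRate(String xhtml) {
        String text;
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // evaluate() returns the string value of the first matching node,
            // or an empty string when nothing matches.
            text = xpath.evaluate(QUERY, doc).trim();
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse or query page", ex);
        }
        if (!text.startsWith("$")) {
            throw new IllegalStateException("no rate cell found");
        }
        return Double.parseDouble(text.substring(1)); // drop leading $
    }

    public static void main(String[] args) {
        System.out.println(totalRate(SAMPLE));
    }
}
```

If the label moves out of a <td>, or the amount lands in a different cell, the query matches nothing and the sketch throws instead of returning a wrong number.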

It took me a lot longer to write this message than it took to write the above code. I can't think of any better explanation as to why "XML" is cool, but I put "XML" in quotes because XML per se didn't make the above code that easy to write; the ready availability of XML/HTML parsers, DOM implementations, and XPath implementations did. Incidentally, the above code was written using Resin, from <http://www.caucho.com>, as the XML/HTML/DOM/XPath implementation. As I've posted here before, you could do something extremely similar using Frontier by "importing" the HTML into a table and using UserTalk's ODB navigation commands to search for/extract the desired information.

XML is cool because there are so many big brains who think it's cool enough to write cool tools for. No better reason, no worse reason.



This page was archived on 6/13/2001; 4:53:49 PM.

© Copyright 1998-2001 UserLand Software, Inc.