Archive of UserLand's first discussion group, started October 5, 1998.

Re: Meta tag theft

Author: Paul Snively
Posted: 8/20/1999; 8:11:26 AM
Topic: Today's scriptingNews Outline
Msg #: 9717 (In response to 9715)

"OK, why should the search engines pay attention to the meta tag, or give it more weight than other content in the page?"

Because search engines haven't solved the natural language understanding problem. Instead, they generally use a set of heuristics about how to interpret HTML in such a way as to get at the "what the content is about" rather than just to search the content itself. For example, matches that show up in

blocks rate higher than matches that don't.

The META tag can be thought of as the earliest HTML progenitor of the current XML metadata projects such as RDF: an attempt to allow an explicit indication of the "aboutness" of the content, in addition to codifying the content itself.
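To make the weighting idea concrete, here is a minimal sketch of field-weighted keyword scoring in Python. It is not how any particular search engine works: the choice of fields, the FIELD_WEIGHTS values, and the names tokenize, FieldExtractor, and score are all assumptions made up for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical weights, chosen only for illustration; real engines do not publish theirs.
FIELD_WEIGHTS = {"meta": 3.0, "title": 2.0, "body": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FieldExtractor(HTMLParser):
    """Sorts a page's words into three buckets: META content, title, and body."""

    def __init__(self):
        super().__init__()
        self.fields = {"meta": [], "title": [], "body": []}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.fields["meta"].extend(tokenize(attrs.get("content") or ""))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        field = "title" if self._in_title else "body"
        self.fields[field].extend(tokenize(data))


def score(html, query):
    """Sum the weighted counts of each query word across the three buckets."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    terms = tokenize(query)
    return sum(
        FIELD_WEIGHTS[field] * words.count(term)
        for field, words in parser.fields.items()
        for term in terms
    )


PAGE = (
    '<html><head><title>Ozone layer</title>'
    '<meta name="keywords" content="aerosol, ozone"></head>'
    '<body><p>CFCs thin the ozone layer.</p></body></html>'
)
```

With these made-up weights, score(PAGE, "aerosol") is nonzero even though the word never appears in the visible text, which is exactly what makes the META tag both useful and easy to abuse.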

The problem for the search engines is that they've rightly ascertained that the queries people wish to issue are essentially semantic in nature (show me all the pages that are about aerosol effects on the ozone layer--even ones that might not, for example, explicitly contain the word "aerosol"). HTML, however, does an extremely poor job of codifying semantics, to say the least, which is why XML exists in the first place. Note that having a traditional database-driven website doesn't solve the problem, either: SQL doesn't do any better at finding "content about aerosol effects on the ozone layer" in the absence of the word "aerosol" than the search engines do.

To be fair, my example presumes that some bright soul is going to add the word "aerosol" to a META tag for a page that doesn't contain the word... or that someone is going to code up some RDF that associates some content, presumably about environmental issues, with the concept of "aerosol." At this point we have to recognize that the solution to the problem consists of a) a lot of human beings going nuts creating metadata to correlate every reasonably conceivable connection to the content, b) coming up with some awfully impressive automated analogy-building tools to do the same, or both. Before anyone starts snorting at the latter, see the work of Douglas Hofstadter's "Fluid Analogies Research Group" at Indiana University (and read his "Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought," ISBN 0465024750) and also see the FramerD project at <http://www.framerd.org>.
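The SQL point, and option a) above, can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the pages and page_keywords tables, the example.com URLs, and the two search functions are hypothetical, and the single 'aerosol' row stands in for metadata that some human had to supply by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT, body TEXT);
    CREATE TABLE page_keywords (page_id INTEGER REFERENCES pages(id), keyword TEXT);
""")
conn.executemany(
    "INSERT INTO pages (id, url, body) VALUES (?, ?, ?)",
    [
        (1, "http://example.com/cfc", "Spray-can propellants such as CFCs thin the ozone layer."),
        (2, "http://example.com/bread", "A recipe for sourdough bread."),
    ],
)
# The hand-made metadata: page 1 is "about" aerosols although it never uses the word.
conn.execute("INSERT INTO page_keywords (page_id, keyword) VALUES (1, 'aerosol')")


def search_text(term):
    """Plain substring search over the page text only."""
    rows = conn.execute(
        "SELECT id, url FROM pages WHERE body LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return [url for _id, url in rows]


def search_with_metadata(term):
    """Substring search over the text, plus an exact match on supplied keywords."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.id, p.url
        FROM pages p
        LEFT JOIN page_keywords k ON k.page_id = p.id
        WHERE p.body LIKE ? OR k.keyword = ?
        ORDER BY p.id
        """,
        (f"%{term}%", term),
    )
    return [url for _id, url in rows]
```

Here search_text("aerosol") comes back empty, while search_with_metadata("aerosol") finds the CFC page, but only because someone typed in the association first. The query language contributes nothing to the "aboutness"; the metadata does all the work.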

So that's the representation problem. But the META tag theft problem goes much deeper, because it touches on the fundamental question of who has the rights to information and, perhaps more importantly, to the information about the information. As the linked article points out, it's ultimately all about money--in this case, in the form of "ad impressions." It's one reason (among many) I don't believe in ad-revenue-supported web sites. But what do we do about it?

For some provocative attempts to address the question of "electronic rights" from a technical perspective, see <http://www.erights.org>. For some of the social and economic motivations behind the technology, follow the link to Marc Stiegler's page and look for his "Final Exam."

Paul Snively
<mailto:psnively@earthlink.net>




This page was archived on 6/13/2001; 4:52:01 PM.

© Copyright 1998-2001 UserLand Software, Inc.