Archive of UserLand's first discussion group, started October 5, 1998.

Category Generation

Author: Paul Snively
Posted: 8/11/2000; 11:50:59 AM
Topic: Category Generation
Msg #: 19651

Dave Winer: Also last night I finally read Chris Locke's interview with Ian Clarke on Feed. Everyone's been pointing to it, and to hear the comments, you'd think Clarke was some kind of raving lunatic. Nothing could be further from the truth. I was struck by a question at the end of the piece. "So it's something more like Yahoo than Altavista," Locke said. "Yes. That's exactly what it'll do. It'll be like a Yahoo directory. But it'll actually figure out the categories for itself, rather than rely on anyone to tell it what the categories are." Personally, I have a lot of trouble believing that. I think the secret to routing around Yahoo and DMOZ is to give a good hierarchy editing tool to millions of Internet users, and let the cream rise to the top.

It's not clear to me that this isn't exactly what Clarke is describing as well; it's just that FreeNet might help automate the process by factoring in people's actions in relating categories. This is far from a weak idea: it shows up explicitly in The Open GRiD Project, for example, and in Group Method of Data Handling approaches to neural networks.

Lots of people, especially in the symbolic AI tradition, have attempted to come up with collaborative approaches to categorization. This has almost always meant first deciding on the appropriate starting ontology, a problem that dates back at least to Aristotle. Cyc is probably the best-known such effort, thanks in no small part to its developers having made their Upper Cyc Ontology publicly available; that ontology has evolved over a span of roughly a decade and a half. One of the lessons from Cyc (as expounded upon in various of Doug Lenat's books and papers) is that without computational assistance of some kind, large-scale human editing of categorical hierarchies doesn't work: you end up with too many redundancies, not enough dispersion, sloppy reasoning about kind-of and is-a relations, and the like.

This is probably less of an issue when the number of applicable categories is relatively small, so it might be tempting to try it with, say, musical genres. But even so, I think an automated statistical approach would work better than having, say, 100 people figure out which genre(s) Peter Gabriel or the Dave Matthews Band belongs in.
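
To make "automated statistical approach" a little more concrete, here is a minimal sketch in Python. It is an illustration only, not FreeNet's mechanism or anything Clarke describes: it assumes we can observe which listeners play which artists, measures audience overlap with Jaccard similarity, and groups artists whose overlap clears a threshold. The listener data, the extra artists, the function names, and the 0.5 threshold are all made up for the example.

```python
from itertools import combinations

# Hypothetical observations: the set of listeners who play each artist.
listeners = {
    "Peter Gabriel": {"ann", "bob", "cat", "dan"},
    "Genesis": {"ann", "bob", "cat"},
    "Dave Matthews Band": {"eve", "fay", "gus", "dan"},
    "Phish": {"eve", "fay", "gus"},
}


def jaccard(a, b):
    """Fraction of the combined audience that two artists share (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def derive_categories(audiences, threshold=0.5):
    """Group artists whose audiences overlap by at least `threshold`.

    Uses single-link grouping with a small union-find: any pair of artists
    that is similar enough ends up in the same group, and groups chain
    together through shared members. Nobody names the groups in advance.
    """
    parent = {name: name for name in audiences}

    def find(name):
        # Walk up to the group's representative, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for x, y in combinations(audiences, 2):
        if jaccard(audiences[x], audiences[y]) >= threshold:
            parent[find(x)] = find(y)

    groups = {}
    for name in audiences:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(group) for group in groups.values())


categories = derive_categories(listeners)
for group in categories:
    print(group)
```

With this toy data the two groups fall out of listening behavior alone, even though one listener plays both Peter Gabriel and the Dave Matthews Band. Lowering the threshold far enough lets that single shared listener merge everything into one group, which is the kind of tuning decision a real system would have to make from much larger samples.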




This page was archived on 6/13/2001; 4:56:03 PM.

© Copyright 1998-2001 UserLand Software, Inc.