Archive of UserLand's first discussion group, started October 5, 1998.

Re: An RSS categorization proposal

Author: David A. Mundie
Posted: 9/10/1999; 1:32:16 AM
Topic: rss channels via email
Msg #: 10856 (In response to 10850)

You write: "The problem with those controlled vocabularies produced by library science is that it more-or-less takes library scientists to accurately classify a given item." I understand where you’re coming from. The OCLC, which maintains DDC, views its customers as professional librarians, and the available materials are geared towards them. This is a recognized problem, and there are projects underway to make the materials more accessible.

That said, it’s my personal opinion that this problem is more apparent than real. I am not a "library scientist" by any stretch of the imagination, but I was happily cataloguing away with the abridged version of the schedules within an hour. *Any* systematic classification scheme is going to need rules that have to be followed.

"LOC & DDS work great, but only when the sorters are intelligent (in this limited definition of the term :-) ). I think the systems will crash and burn when the general public has to do it."

If you mean that the general public will make more classification errors than professional librarians, I’d certainly agree, but I don’t think that’s the issue. The only issue is whether the "general public" will do a better job *with* the DDC guidelines than without them, and to that I strongly feel the answer is yes. Your "crash and burn" metaphor suggests that somehow misuse of the DDC will produce disastrously incorrect classifications, and I just don’t think that’s the case.

"...libraries may move a lot of material, but it tends towards the static..." I’m sorry, I don’t understand what frequency of updating has to do with subject classification. For a hundred years librarians have been *required* to come up with a classification for every document that has been thrown at them, and have succeeded. I don’t understand why the answer to "What is this document about?" has a different answer if the document has a lifetime of ten minutes rather than five hundred years.

"This would produce some sort of computer classification system (probably expressed as "degree of relationship" to some expert-human chosen baselines (large numbers of them)"

Let me make sure I understand you here. The criterion for the success of this program will be how closely we manage to re-create the existing Dewey decimal system?

"... a more human classification system would need to be projected, but that's mostly sweat-of-the-brow work, not an insoluble problem."

Have you actually used any of the "automatic classification" systems out there? In my experience the results are uniformly discouraging, to put it mildly. The problem is AI-complete: when we have true artificial intelligence, then we will be able to teach machines to use Dewey, but not before.

In any event, I really don’t understand why one would want to embark on this vast project of bottom-up, empirical, synthetic research, only to end up with what we already have today: an eminently usable "human classification system", as you put it - i.e., the DDC.




This page was archived on 6/13/2001; 4:52:32 PM.

© Copyright 1998-2001 UserLand Software, Inc.