Archive of UserLand's first discussion group, started October 5, 1998.
Re: Junk in URLs and link-rot
Author: Jorn Barger Posted: 3/9/1999; 5:12:42 PM Topic: Unique vs. Generic URLs Msg #: 3857 (In response to 3835)
"So, maybe you could explain more what you have in mind for the web-based database, and maybe someone around here could build it."
Happy to!
First, let's limit ourselves to sites that regularly publish new material (which I'll call webzines, for now). Each of these has its own approach to when and where it announces new stories, how it archives them, how it uses frames, how it constructs URLs, and how its search function works.
So adding new zines to your daily surfing routine is a pain in the butt! You have to painstakingly work out all these pieces, often by trial and error. So most people end up limiting their surfing to only a few sites they visit routinely...
But if there was a central web database that spelled out all these details in a machine-readable format, and if your browser could read that format, then ideally you could:
- view a checklist of webzines and 'subscribe' to the ones that you want to check out.
- the browser would know when and where to look for new material
- it would know how to create a custom headlines-aggregation of all the new material in the zines you like best, even extracting the lead paragraph if you prefer this format
- if URLs are hidden inside frames, it would know how to extract them, so you could have 'smart bookmarking' that ignored the 'volatile' URLs.
- in cases where the URLs change when the articles are moved to the archive, this is usually according to some rule that could be encoded with a system like regexps. 'Smart bookmarks' might even remember both forms, and understand when they'll change.
- if you get a 404 because a link wasn't 'smart' (or because a site decides to relocate everything), the same regexp-like rules could be applied to find where it's gone.
- if it's just lost, the database should know how to format a search that will uncover it, by filling in whatever bits you know about its content, date, author, etc.
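To make the list above a bit more concrete, here is a minimal sketch in Python of what one database entry and its regexp-like rules might look like. Everything in it is an assumption for illustration: the ZineRecord fields, the zine.example.com URLs, and the date-based archive layout are hypothetical, not any real site's scheme.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass
class ZineRecord:
    """One hypothetical database entry describing a webzine."""
    name: str
    live_pattern: str      # regexp matching the 'volatile' URL of a new story
    archive_template: str  # template giving the URL after the story is archived
    search_url: str        # base URL of the zine's own search form (assumed)


def archived_url(zine: ZineRecord, url: str) -> Optional[str]:
    """Apply the zine's rewrite rule; return None if the URL doesn't fit it."""
    match = re.fullmatch(zine.live_pattern, url)
    if match is None:
        return None
    # expand() fills \1, \2, ... in the template from the matched groups
    return match.expand(zine.archive_template)


def search_link(zine: ZineRecord, **known: str) -> str:
    """Build a search URL from whatever bits are known (author, date, ...)."""
    return zine.search_url + "?" + urlencode(known)


# A made-up zine whose stories move from /today/YYYYMMDD/ to /archive/YYYY/MM/
EXAMPLE = ZineRecord(
    name="Example Zine",
    live_pattern=r"http://zine\.example\.com/today/(\d{4})(\d{2})(\d{2})/(\w+)\.html",
    archive_template=r"http://zine.example.com/archive/\1/\2/\4.html",
    search_url="http://zine.example.com/search",
)

if __name__ == "__main__":
    live = "http://zine.example.com/today/19990309/linkrot.html"
    print(archived_url(EXAMPLE, live))
    print(search_link(EXAMPLE, author="Barger", date="1999-03-09"))
```

A 'smart bookmark' could store the live URL together with the record's name, and fall back first to archived_url() and then to search_link() when it hits a 404.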
In my weblog I've been experimenting with "More" links that are generated by a Frontier script when I link an article. I just have to pick a few key phrases from the article, and my "More" links currently can only go to AltaVista, but I'm learning to choose phrases that produce pure gold, even via clumsy old AV. If the 'smart bookmark' was smart enough to help pick these phrases, and also to know how to send them to the zine's own search engine if there was one, then this would add another layer of 'linkrot insurance'.
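The "More" link idea can be sketched roughly like this, in Python rather than Frontier. This is not the actual script: the search.example.com endpoint and the q parameter are placeholders, and a real version would need the query format of whichever search engine it targets.

```python
from typing import Iterable
from urllib.parse import urlencode

# Hypothetical search endpoint; swap in the real engine's query URL.
SEARCH_BASE = "http://search.example.com/query"


def more_url(phrases: Iterable[str], base: str = SEARCH_BASE) -> str:
    """Build a search URL that looks for all the key phrases as exact phrases."""
    # Quote each phrase so the engine treats it as a unit, not separate words.
    query = " ".join('"{}"'.format(p.strip()) for p in phrases if p.strip())
    return base + "?" + urlencode({"q": query})


if __name__ == "__main__":
    print(more_url(["link rot", "smart bookmark"]))
```

The only hand work left is picking the phrases; the better they identify the article, the better the link survives the article being moved.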
In my parsing browsers essay I compare all this to a netnews "newsrc" (which specifies which topical newsgroups you've subscribed to) plus killfile, where you subscribe, and say what you like and dislike, and all the other scutwork is then handled automatically.
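As a rough illustration of the newsrc-plus-killfile comparison, here is a small Python sketch. The Preferences structure, the zine names, and the headlines are all invented for the example; a real killfile would likely have richer rules than plain regexps on headlines.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Story = Tuple[str, str]  # (zine name, headline)


@dataclass
class Preferences:
    subscribed: Set[str] = field(default_factory=set)        # plays the role of newsrc
    kill_patterns: List[str] = field(default_factory=list)   # plays the role of the killfile


def wanted(prefs: Preferences, zine: str, headline: str) -> bool:
    """Keep a story only if its zine is subscribed and no kill pattern matches."""
    if zine not in prefs.subscribed:
        return False
    return not any(re.search(p, headline, re.IGNORECASE) for p in prefs.kill_patterns)


def aggregate(prefs: Preferences, stories: List[Story]) -> List[Story]:
    """Return the custom headline list, preserving the incoming order."""
    return [(zine, headline) for zine, headline in stories if wanted(prefs, zine, headline)]


if __name__ == "__main__":
    prefs = Preferences(subscribed={"Example Zine"}, kill_patterns=[r"\bhoroscope\b"])
    stories = [
        ("Example Zine", "Link rot and how to fight it"),
        ("Example Zine", "Your weekly horoscope"),
        ("Other Zine", "A story from an unsubscribed zine"),
    ]
    for zine, headline in aggregate(prefs, stories):
        print(zine + ": " + headline)
```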
There are responses to this message:
- Re: Junk in URLs and link-rot, Dave Winer, 3/9/1999; 7:28:55 PM
This page was archived on 6/13/2001; 4:48:35 PM.
© Copyright 1998-2001 UserLand Software, Inc.