Archive of UserLand's first discussion group, started October 5, 1998.
Some advise for newschannel tool writers...
Author: Fredrik Lundh
Posted: 7/13/1999; 1:13:15 PM
Topic: Some advise for newschannel tool writers...
Msg #: 8454
After looking closely at our server stats, I'm beginning to suspect that weblogs should be distributed via NNTP instead of HTTP... In the meantime, here are some things to consider when you design your own newschannel robot:
- Download the channel files at most once per hour. Don't assume that you're the only one polling for files... (In our case, one robot grabs all the files every ten minutes -- that's several megabytes per week, and it doesn't really scale...)
- Consider downloading either the RDF or the ScriptingNews (XML) version, but not both.
- If you have an HTTP/1.0 client, consider sending a HEAD request first, and only download the body if the Last-Modified header has changed.
- If you have an HTTP/1.1 client, consider sending an If-Modified-Since header (a conditional GET). And if you're grabbing multiple channels from the same site, consider using a persistent connection.
- Don't download files that don't exist -- if you keep getting 404s, stop looking for that file. At the very least, wait an hour or two before retrying (not a second or two, like one certain robot...). And if you get a 301 (Moved Permanently), update your channel list.
thanks /F (just another newschannel provider)
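For the HTTP/1.1 advice, here is a minimal sketch in Python of conditional GETs sent over one persistent connection. It is not /F's code: `fetch_channels` and the `cache` layout (path mapped to a saved Last-Modified value and body) are hypothetical. It relies on the standard library's `http.client`, which speaks HTTP/1.1 and keeps the connection open between requests unless the server closes it. The saved Last-Modified value is echoed back verbatim in If-Modified-Since, and a 304 reply means no body was transferred:

```python
import http.client


def fetch_channels(host, paths, cache):
    """Fetch several channel files from one host over a single persistent
    HTTP/1.1 connection, using conditional GETs (hypothetical helper).

    cache maps path -> (last_modified, body) from earlier runs and is
    updated in place. Returns the list of paths whose content changed.
    """
    conn = http.client.HTTPConnection(host, timeout=30)  # speaks HTTP/1.1
    changed = []
    try:
        for path in paths:
            headers = {}
            last_modified, _ = cache.get(path, (None, None))
            if last_modified:
                # Echo the server's own Last-Modified value back unchanged.
                headers["If-Modified-Since"] = last_modified
            conn.request("GET", path, headers=headers)
            resp = conn.getresponse()
            body = resp.read()  # drain the response so the connection can be reused
            if resp.status == 304:
                continue  # Not Modified: nothing new was transferred
            if resp.status == 200:
                cache[path] = (resp.getheader("Last-Modified"), body)
                changed.append(path)
            # Other statuses (301, 404, ...) are left to the polling policy.
    finally:
        conn.close()
    return changed
```

A robot would keep `cache` between runs and call something like `fetch_channels("www.example.com", ["/channel1.xml", "/channel2.xml"], cache)`; the host and paths here are placeholders.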
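The HEAD-then-GET pattern for HTTP/1.0 clients can be sketched the same way, again as an illustration rather than /F's code (`fetch_if_changed` and `_request` are hypothetical names). Python's `http.client` actually speaks HTTP/1.1, so this sketch only mimics an HTTP/1.0 client by opening a fresh connection per request and sending `Connection: close`:

```python
import http.client


def _request(host, method, path):
    """Send one request on a fresh connection (no keep-alive, as an
    HTTP/1.0 client would) and return (status, last_modified, body)."""
    conn = http.client.HTTPConnection(host, timeout=30)
    try:
        conn.request(method, path, headers={"Connection": "close"})
        resp = conn.getresponse()
        body = resp.read()  # always empty for HEAD
        return resp.status, resp.getheader("Last-Modified"), body
    finally:
        conn.close()


def fetch_if_changed(host, path, last_seen):
    """HEAD first; GET the body only if Last-Modified differs from last_seen.

    Returns (last_modified, body); body is None when the file is unchanged.
    """
    status, last_modified, _ = _request(host, "HEAD", path)
    if status == 200 and last_modified is not None and last_modified == last_seen:
        return last_modified, None  # unchanged: skip the download entirely
    status, last_modified, body = _request(host, "GET", path)
    if status != 200:
        raise RuntimeError("GET %s failed with status %d" % (path, status))
    return last_modified, body
```

When the file has changed, this costs one extra round trip compared to a single conditional GET, which is why If-Modified-Since is the better choice when the client supports it.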
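The polling rules in the post (at most once an hour, back off and eventually give up on 404s, rewrite the channel list on a 301) can also live in a small scheduler that does no network I/O itself. This is a hypothetical sketch: the class name, the doubling backoff, the 24-hour cap and the limit of five consecutive 404s are assumptions chosen for illustration, not values from the post. The clock is injectable so the policy can be tested without waiting:

```python
import time

ONE_HOUR = 3600.0


class ChannelPoller:
    """Hypothetical polling policy for a newschannel robot.

    - polls each channel URL at most once per min_interval seconds
    - after a 404, doubles the wait (capped at max_backoff) and drops
      the URL after max_misses consecutive 404s
    - after a 301, replaces the URL with the new location
    """

    def __init__(self, urls, min_interval=ONE_HOUR, max_backoff=24 * ONE_HOUR,
                 max_misses=5, clock=time.time):
        self.min_interval = min_interval
        self.max_backoff = max_backoff
        self.max_misses = max_misses
        self.clock = clock
        self.next_due = {url: 0.0 for url in urls}  # earliest next poll time
        self.misses = {url: 0 for url in urls}      # consecutive 404s

    def due(self):
        """Return the URLs that may be polled right now."""
        now = self.clock()
        return [url for url, t in self.next_due.items() if t <= now]

    def record(self, url, status, location=None):
        """Update the schedule after polling url and getting status."""
        now = self.clock()
        if status == 301 and location and location != url:
            # Moved Permanently: update the channel list.
            self._forget(url)
            self.next_due.setdefault(location, now)
            self.misses.setdefault(location, 0)
        elif status == 404:
            self.misses[url] += 1
            if self.misses[url] >= self.max_misses:
                self._forget(url)  # stop looking for that file
            else:
                delay = self.min_interval * 2 ** self.misses[url]
                self.next_due[url] = now + min(delay, self.max_backoff)
        else:
            self.misses[url] = 0
            self.next_due[url] = now + self.min_interval

    def _forget(self, url):
        self.next_due.pop(url, None)
        self.misses.pop(url, None)
```

With the defaults, the first 404 pushes the next attempt two hours out, in line with "wait an hour or two"; each further 404 doubles the wait until the URL is dropped.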
There are responses to this message:
- Re: Some advise for newschannel tool writers..., Ian Davis, 7/14/1999; 4:29:58 AM
This page was archived on 6/13/2001; 4:51:23 PM.
© Copyright 1998-2001 UserLand Software, Inc.