Archive of UserLand's first discussion group, started October 5, 1998.

Re: I Require Permission

Author:Jonathan Eisenzopf
Posted:9/2/1999; 12:29:24 PM
Topic:Automated deep linking
Msg #:10471 (In response to 10468)
Prev/Next:10470 / 10472

You know, this is a really interesting line of reasoning. As I read it, I was immediately reminded of the same issues Network Solutions has had to deal with.

As most of you already know, Network Solutions handles registrations for all .com, .net, and .org Internet domains. They also offer a WHOIS database that lists information on registered domains. So if I do a whois lookup for scripting.com, I'm able to get UserLand Software's snail mail address, Dave's email address @well.com, and his phone number.
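For the curious, a whois lookup is a trivially simple protocol: open a TCP connection to the registry's server on port 43, send the domain name, and read back the reply. Here's a minimal sketch in Python; the server name is an assumption (the registry's WHOIS host has changed hands over the years), so substitute whatever server is authoritative for the TLD you're querying.

```python
import socket

def build_query(domain):
    # The WHOIS protocol: the query is just the domain name followed by CRLF.
    return (domain + "\r\n").encode("ascii")

def whois(domain, server="whois.verisign-grs.com", port=43):
    # NOTE: server name is an assumption, not part of the original post;
    # pick the WHOIS host that is authoritative for the domain's TLD.
    with socket.create_connection((server, port), timeout=10) as s:
        s.sendall(build_query(domain))
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", "replace")

# Example (requires network access):
# print(whois("scripting.com"))
```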

This is valuable information for marketers and Internet software products. For example, some Web log analysis tools are able to break down hits by location, e.g. city, state, ZIP code. How do you think they do that? That kind of information is not in the Web log. The answer is that the software company probably trolls the WHOIS database to collect this contact information, then matches the IP addresses in the Web log against the domain information from Network Solutions.
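The matching step itself is straightforward: resolve each IP in the log back to a hostname, boil the hostname down to a registered domain, then look that domain up in the harvested WHOIS records. A rough sketch, assuming reverse DNS is available for the IP (the two-label domain extraction is a deliberate simplification; real code would need a public-suffix list for names like example.co.uk):

```python
import socket

def ip_to_hostname(ip):
    # Reverse DNS (PTR) lookup; returns None when no record exists.
    try:
        host, _aliases, _addrs = socket.gethostbyaddr(ip)
        return host
    except OSError:
        return None

def registrable_domain(hostname):
    # Naive: keep the last two labels ("dialup-42.well.com" -> "well.com").
    # This is the key that would be matched against the WHOIS records.
    parts = hostname.rstrip(".").split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else hostname
```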

So you see, in the example above, someone is profiting from the WHOIS information. While NS has no problem with offering a public WHOIS database, they do have a problem with companies using this information for profit. From their perspective, NS should be paid some sort of licensing fee for use of the information, and they're within their rights to ask for one. In fact, they now display a notice that prohibits companies from "trolling" the WHOIS database.

The RSS issue is similar, though it is a distributed kind of trolling. Furthermore, most sites that you scrape will receive ad revenue every time a user follows the link. So when we say "scrape", we really mean "meta-scrape", because you're not actually grabbing the story, just the headline and a link to the story. So most Web sites should be thrilled that other companies are doing things that will bring them more revenue. But not always.
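To make "meta-scrape" concrete: all an aggregator takes from an RSS file is each item's title and link, never the story body. A minimal sketch using Python's standard XML parser (the feed structure assumed here is plain RSS 0.91-style <item> elements):

```python
import xml.etree.ElementTree as ET

def headlines(rss_text):
    # The "meta-scrape": pull only (title, link) pairs from an RSS document.
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", "").strip()
        link = item.findtext("link", "").strip()
        items.append((title, link))
    return items
```

Feed those pairs into your page template and the publisher still serves, and gets ad revenue from, every actual story view.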

What happens if bignakedbacksides.com places a CNN newsfeed right next to miss lolita's February spread? People who visit bignakedbacksides.com and see the CNN news feed may assume that CNN has given the site permission to display its headlines. Mr. Ted Turner may not be a pillar of moral fortitude, but he's probably not keen on the idea of people associating CNN with Joe's Mega Pron Palace.

So, from a publisher's perspective, allowing anyone to scrape their headlines is not a good idea. Not only must they worry about who's scraping the headlines, but they also have to be concerned about where those headlines are going (like Joe's pron site). As an example, I recently developed a product for Internet.com that allows a Web site to display news headlines from internetnews.com. The Web site is clear about acceptable usage: http://webreference.com/headlines/nh/.

So where do we go from here? First of all, don't take this issue lightly. Most publishers are dead serious about protecting their content, and for some very good reasons, including the example above. There are several ways to go about it, but the bottom line is: if a business is grown out of scraping headlines without concern for permissions, not only will the aggregators get shut down, but we may start to see publishers become more restrictive with their content. And there is precedent for it; remember sidewalk.com and Ticketmaster?

Secondly, whether you are scraping a site or grabbing its RSS file, unless the site has already given explicit permission to do so, you should contact the Webmaster, explain what you would like to do, and ask for permission to use the headlines. If you want to be really sure, send them a memorandum of understanding reiterating the permissions that have been granted to you.

This page was archived on 6/13/2001; 4:52:22 PM.

© Copyright 1998-2001 UserLand Software, Inc.