Archive of UserLand's first discussion group, started October 5, 1998.

Attempted clear statement of D.Linking/Scraping controversy

Author:Jeremy Bowers
Posted:9/2/1999; 7:53:27 AM
Topic:Automated deep linking
Msg #:10450 (In response to 10444)
Prev/Next:10449 / 10451

The nice thing about copyright law is that once permission is granted, all the issues about "scraping" go away. If you permit it, it's not a problem.

By publishing a public RSS file, you authorize (more or less) that amount of content to be scraped, at least by those you grant access to, which is often everybody. Additionally, it gives you control over the process (you might "neglect" to include an article of purely local interest, like a local intranet global password rotation), which probably benefits everyone, since nobody has to wade through information of no interest to them.

Same goes for everything... annotation, scraping, deep linking, wholesale copying and distribution (many Unix man pages are on the web in several hundred places, mostly schools). If you've got permission, you can do it.

So, while "scraping" may produce the same information as a hypothetical RSS file, it cannot produce the same permission, and that is the heart of the controversy. Is it my right to scrape your site, or do I require your permission? If the answer is somewhere in between, where is the socially accepted line? (There needs to be one, so my lawyers can be satisfied I'm not crossing it.)

A lot of the power of RSS is that not everybody wants to completely control stuff. It provides a wonderful, open distribution technique for anyone to mirror information in a timely manner, with permission. Great for news, announcements, etc.

The issue comes down, as it so often does in these cases, to quantity. Again, I like to think of things as lying between two extremes:

Where is the line drawn? A link to everything I consider important? A link to everything I consider important, with a brief, hand-written summary? ... with a computer-generated summary? ... with the first sentence? ... with the first paragraph? ... with the entire article available from my server and not theirs?

Deep linking issues can be treated as a subset of scraping, where deep linking is the third "instance" of scraping:

  1. No links.
  2. Linking to the home page.
  3. Linking deeply.
  4. Linking deeply, repeatedly.
  5. Linking deeply, catching an entire class of links (such as "today's news stories").
  6. Linking as above, with summaries of an entire class.
  7. Linking as above, with large summaries of an entire class.
  8. Simply mirroring all content of interest.

As I understand RSS (correct me if I'm wrong), RSS facilitates #6. With permission, nothing is wrong, not even #8. How far can you go without permission?
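To make #6 concrete, here is a sketch of a channel file in the RSS 0.91 format: one link per story in a class, each with a short publisher-chosen summary. The site name, URLs, and descriptions below are made up purely for illustration.

```xml
<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example News</title>
    <link>http://news.example.com/</link>
    <description>Hypothetical channel, for illustration only.</description>
    <language>en-us</language>
    <!-- One item per story the publisher chooses to syndicate. -->
    <item>
      <title>Widget prices fall</title>
      <link>http://news.example.com/stories/widgets.html</link>
      <description>A one-sentence summary: exactly the amount of
      content the publisher has decided to permit.</description>
    </item>
  </channel>
</rss>
```

Note that the publisher controls both which items appear and how large each description is, which is the permission question in miniature.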

(Interestingly, despite the fact that I constructed this list out of my head, I can actually name a case in which you are banned from #1: you are not allowed to not link to Geocities, if you use their free web page service. Kinda interesting, huh?)

Anyhow, under more normal circumstances, I think (without posted justification) that you can go as far as #4 without a problem. (Note that "repeatedly" means "without any apparent system", where "I like these articles" [the way Slashdot and Scripting News work] does not count as an apparent system.) After that, the debate becomes two-sided, and that's where I see the issue. In the end, all law on this matter is meant to benefit the public/society, but what set of protections and shared rights enhances this goal?

The old story of "If I haven't got a motivation, I won't do it" still holds. Do we need to protect those who create content that bundles nicely? If people are allowed simply to link, even systematically, I doubt it will really demotivate them. But as the summary (a continuous value, which lawyers in many cases don't deal with well) gets larger and larger, the case for banning it gets better and better, so that there still exists a motivation for creating this content in the first place.

In the end, an arbitrary line needs to be drawn, satisfying as many people as possible and benefitting the public as much as possible (which saves us a lot of lawsuits). There is no "right" line, just as there is no "right" line for differentiating fair use. Society just settles on something that works, and we all live by it, since it's cheaper than litigating every dispute.

So many lines have been blurred in these last few years... when does MoreOver.com cease being a search engine on steroids, and become a mirroring service of some sort? Beats me, but it matters a lot.

(Sorry if this is overly verbose, but I find this stuff very interesting, and clearly stating the issue is always a help to me [if not anybody else].)




This page was archived on 6/13/2001; 4:52:21 PM.

© Copyright 1998-2001 UserLand Software, Inc.