Archive of UserLand's first discussion group, started October 5, 1998.

Re: Scripts to produce Channel files?

Author: Jamie Scheinblum
Posted: 6/17/1999; 9:00:50 AM
Topic: Scripts to produce Channel files?
Msg #: 7510 (in response to 7486)

Well, here's a preliminary version. I'll have a better one after I see Dave's example; then we can get our features and syntax in sync. Don't use this version: it's not feature-complete, and it will probably choke on some syntax... A better, real version is forthcoming.

Dave: how much of the scriptingNews header do I need to implement to conform to the spec?

Here's the input file:

--
Hello this is a text document Hello Do you like text documents?

hello this is another text document script stuff
--

Here's the output:

--
$ perl parse.pl input.txt
Hello this is a text document Do you like text documents?
http://www.cnn.com
Hello
hello this is another text document stuff
http://www.scripting.com
script
--

And here's the source so far...

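--
use strict;
use warnings;
use HTML::Parser;

### Copyright 1999 Jamie Scheinblum
### Jamie@networked.org
### 6/17/99
### Working source-code, not for re-distribution

### What do we use to mark the end of a paragraph? Make this a regular expression to match.
### Assumed here: a blank line, or a line holding only a <p> tag.
my $article_mark = qr/^\s*(?:<p>)?\s*$/i;

{
    package Parse;
    our @ISA = ('HTML::Parser');
    my %link;               # url => link text, for the current item
    my $cur_url;            # href of the <a> we are currently inside
    my $look_for_text = 0;  # true between <a href=...> and </a>
    my $doc_text = "";      # non-link text of the current item

    sub get_links    { return \%link; }   # a reference, so item() can use %{$hash}
    sub get_doc_text { return $doc_text; }

    sub clear {
        $doc_text      = "";
        %link          = ();   # not {}, which would store a hashref as a key
        $look_for_text = 0;
    }

    # HTML::Parser calls these version-2-style methods when new() gets no arguments
    sub start {
        my ($this, $tag, $attr, $attrseq, $origtext) = @_;
        if ($tag eq "a" && defined $attr->{href}) {
            $cur_url       = $attr->{href};
            $look_for_text = 1;
        }
    }

    sub text {
        my ($this, $text) = @_;
        if ($look_for_text) { $link{$cur_url} .= $text; }
        else                { $doc_text .= "$text "; }
    }

    sub end {
        my ($this, $tag, $origtext) = @_;
        $look_for_text = 0 if $tag eq "a";
    }
}
my $parser = Parse->new;

### XML prologue and root element (element names assumed from the scriptingNews DTD)
print qq{<?xml version="1.0"?>\n};
print qq{<!DOCTYPE scriptingNews SYSTEM "http://www.scripting.com/dtd/scriptingNews.dtd">\n};
print "<scriptingNews>\n";

foreach my $input_file (@ARGV) {    ### For each file on the command line, process the file
    open(my $in, '<', $input_file) or die "$!: $input_file\n";
    while (my $line = <$in>) {      ### Read the file line by line
        $line =~ s/[\r\n]+$//;      ### Strip returns
        if ($line =~ $article_mark) {
            item();
        } else {
            $parser->parse("$line ");   # trailing space keeps words on adjacent lines apart
        }
    }
    close($in);
    item();
}
print "</scriptingNews>\n";

sub squish {    # collapse whitespace runs and trim the ends
    my $s = defined $_[0] ? $_[0] : "";
    $s =~ s/\s+/ /g;
    $s =~ s/^ | $//g;
    return $s;
}

sub esc {       # href values arrive entity-decoded, so re-escape & and < for XML
    (my $s = shift) =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    return $s;
}

sub item {      ### New article time
    $parser->eof;    # flush any text HTML::Parser is still holding back
    my $text = squish(Parse->get_doc_text());
    my $hash = Parse->get_links();
    return Parse->clear() if $text eq "" && !%{$hash};   # skip empty items
    ### Now print out the xml tags (element names assumed from the scriptingNews DTD)
    print "\t<item>\n";
    print "\t\t<text>$text</text>\n";
    foreach my $key (sort keys %{$hash}) {
        print "\t\t<link>\n";
        print "\t\t\t<url>", esc($key), "</url>\n";
        print "\t\t\t<linetext>", squish($hash->{$key}), "</linetext>\n";
        print "\t\t</link>\n";
    }
    print "\t</item>\n";
    Parse->clear();
}
--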
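If HTML::Parser isn't installed, the same pipeline (split the input into items, pull out the links, print one item element per chunk) can be roughed out in core Perl. This is a minimal sketch under stated assumptions, not Jamie's script: the helper names `split_items`, `parse_item`, `strip_tags`, `xml_escape`, `item_xml` and `to_scripting_news` are hypothetical; links are found with a regular expression rather than a real HTML parser, so it assumes double-quoted `href` values and no nested anchors; and the element names `item`, `text`, `link`, `url` and `linetext` are assumed from the scriptingNews DTD rather than taken from the post. Here a blank line ends an item and, as in the script, link text is kept out of the item text.

```perl
use strict;
use warnings;

# Hypothetical helper: split raw input into items at blank lines.
sub split_items {
    my ($raw) = @_;
    return grep { /\S/ } split /\r?\n[ \t]*\r?\n/, $raw;
}

# Drop leftover tags, collapse whitespace runs and trim the ends.
sub strip_tags {
    my ($s) = @_;
    $s =~ s/<[^>]*>//g;
    $s =~ s/\s+/ /g;
    $s =~ s/^ | $//g;
    return $s;
}

# Hypothetical helper: returns (item text, list of { url, linetext } links).
# Regex approximation of <a href="...">...</a>; NOT a real HTML parser.
sub parse_item {
    my ($html) = @_;
    my $anchor = qr{<a\s[^>]*?href\s*=\s*"([^"]*)"[^>]*>(.*?)</a\s*>}is;
    my @links;
    while ($html =~ /$anchor/g) {
        my ($url, $inner) = ($1, $2);
        push @links, { url => $url, linetext => strip_tags($inner) };
    }
    (my $text = $html) =~ s/$anchor//g;   # link text stays out of the item text
    return (strip_tags($text), @links);
}

# Escape &, < and > for XML element content.
# Assumes the input holds no entities of its own (an existing &amp; would be escaped twice).
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Hypothetical helper: one scriptingNews-style <item> element (assumed element names).
sub item_xml {
    my ($text, @links) = @_;
    my $xml = "\t<item>\n\t\t<text>" . xml_escape($text) . "</text>\n";
    for my $link (@links) {
        $xml .= "\t\t<link>\n"
              . "\t\t\t<url>" . xml_escape($link->{url}) . "</url>\n"
              . "\t\t\t<linetext>" . xml_escape($link->{linetext}) . "</linetext>\n"
              . "\t\t</link>\n";
    }
    return $xml . "\t</item>\n";
}

# Hypothetical helper: the whole document, built from raw input text.
sub to_scripting_news {
    my ($raw) = @_;
    my $xml = qq{<?xml version="1.0"?>\n}
            . qq{<!DOCTYPE scriptingNews SYSTEM "http://www.scripting.com/dtd/scriptingNews.dtd">\n}
            . "<scriptingNews>\n";
    $xml .= item_xml(parse_item($_)) for split_items($raw);
    return $xml . "</scriptingNews>\n";
}
```

For example, `print to_scripting_news(join '', <>);` reads every file named on the command line, roughly like `perl parse.pl input.txt` in the post. The regex shortcut keeps the sketch dependency-free, but it will miss single-quoted or unquoted `href` values that HTML::Parser handles.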




This page was archived on 6/13/2001; 4:50:53 PM.

© Copyright 1998-2001 UserLand Software, Inc.