To: WSFAlist at keithlynch.net
Date: Sat, 11 Jan 2003 06:33:11 -0500
Subject: [WSFA] Re: John(s) Hopkin(s), newsgroups, and Google
From: ronkean at juno.com
Reply-To: WSFA members <WSFAlist at keithlynch.net>

On Tue, 7 Jan 2003 23:32:37 -0500 (EST) "Keith F. Lynch"
<kfl at keithlynch.net> writes:
> ronkean at juno.com wrote:
> > I get 705,000 hits for John Hopkins University (not in quotes), and
> > 80,800 for "John Hopkins University" (phrase in quotes).  "Johns
> > Hopkins University" (in quotes) gives 764,000, and Johns Hopkins
> > University (not in quotes) yields 931,000 hits.
>
> You're obviously doing a web search, not a newsgroup search.
>

Yes, it was a web search, because that was (I think) the type of search
which was originally being discussed re 'John Hopkins University'.
Earlier discussion had concerned newsgroup searches.

> Without the quotes, it finds any page containing all of the words
> in any order, which isn't very interesting here.

Yes, that is how it is supposed to work.  But since the original comment
did not specify whether the search was done using quotes, I did the
search both ways.

>
> For web searches (with quotes), I get:
>
> "Johns Hopkins University"  715,000
> "John Hopkins University"    79,700

That's close to what I got.

> Whether something is considered spam has nothing
> whatsoever to do with whether it's commercial or off-topic or
> fraudulent, but only with how much of it there is.
>

'How much' there is of a given mailing, in the context of point-to-point
email, would be a function of how many individual addresses the mailing
is sent to.  'How much' from the point of view of an email user being
bothered by spam in general would be how many spam messages per day they
receive.  'How much' from the point of view of an email user complaining
about a particular mailing or family of mailings would be related to how
many copies that user receives of a given message, or of very similar
messages, or perhaps how many unwanted messages that user gets from the
same sender.  With newsgroups, posting a spam to a newsgroup is sending
it to just one address, in the hope that it will be seen by many people
who use the newsgroup.  So a cancelbot which uses 'how much' as a
criterion would be looking at multiple posting and cross posting as an
indicator.
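For illustration only: one widely used Usenet measure of this kind of
'how much' is the Breidbart Index, which this thread does not name.  For
a set of substantially identical postings, it adds up the square root of
the number of newsgroups each copy went to, so cross-posting one article
to nine groups scores 3, while multi-posting it separately to nine groups
scores 9.  The sketch assumes the copies have already been grouped as
identical, and uses the commonly cited threshold of 20 within a 45-day
window; the Posting class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Posting:
    # Hypothetical record of one copy of a message as it appeared on Usenet.
    body: str
    newsgroups: list[str]


def breidbart_index(copies: list[Posting]) -> float:
    """Sum sqrt(number of groups) over every copy of one identical message."""
    return sum(math.sqrt(len(p.newsgroups)) for p in copies)


# Commonly cited cancel threshold (assumed here: 20 or more within 45 days).
SPAM_THRESHOLD = 20


def is_spam(copies: list[Posting], threshold: float = SPAM_THRESHOLD) -> bool:
    return breidbart_index(copies) >= threshold


# One article cross-posted to 9 groups: a single copy, index sqrt(9) = 3.
crosspost = [Posting("ad", [f"group{i}" for i in range(9)])]

# The same article multi-posted separately to 25 groups: 25 * sqrt(1) = 25.
multipost = [Posting("ad", [f"group{i}"]) for i in range(25)]
```

The square root is what makes a single cross-post cheaper than sending
the same text separately to each group.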

But there seems to be a pitfall with that approach.  If the cancelbot is
not sensitive to blocks of text repeated across postings, then spammers
could simply make slight alterations to each copy sent out as a cross
posting or a multiple posting, and so evade the cancelbot.  But if the
cancelbot is sensitive to such repeated blocks, then when a legitimate
user quotes a passage from an earlier message and dozens of other
legitimate users quote some of the same passages within the 45-day
period, the cancelbot would be triggered to cancel the later, legitimate
messages quoting those passages.
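Both horns of that dilemma can be seen in a small sketch of a
hypothetical block-matching rule (not any real cancelbot's code).  It
hashes every run of three consecutive lines, after stripping '>' quote
markers, and flags a posting once any of its blocks has already appeared
in 20 earlier postings.  The block size, the limit, and the names
block_fingerprints and RepeatedBlockFilter are assumptions, and the
45-day window is left out for brevity.

```python
import hashlib
from collections import Counter


def block_fingerprints(text: str, block_lines: int = 3) -> set[str]:
    """Hash each run of `block_lines` consecutive non-blank lines.

    Leading '>' quote markers, case, and extra whitespace are ignored,
    so a quoted passage matches the original text.
    """
    lines = []
    for raw in text.splitlines():
        line = " ".join(raw.lstrip("> ").lower().split())
        if line:
            lines.append(line)
    prints = set()
    for i in range(len(lines) - block_lines + 1):
        block = "\n".join(lines[i:i + block_lines])
        prints.add(hashlib.sha1(block.encode()).hexdigest())
    return prints


class RepeatedBlockFilter:
    """Hypothetical rule: flag a posting if any of its blocks was seen `limit` times."""

    def __init__(self, limit: int = 20):
        self.limit = limit
        self.seen = Counter()  # fingerprint -> number of earlier postings containing it

    def check(self, text: str) -> bool:
        prints = block_fingerprints(text)
        flagged = any(self.seen[p] >= self.limit for p in prints)
        self.seen.update(prints)  # each posting counts once per fingerprint
        return flagged


# False positive: 25 honest replies all quoting the same three-line passage.
passage = "Line one of a good post\nLine two of it\nLine three here"
quoted = "\n".join("> " + line for line in passage.splitlines())
honest = RepeatedBlockFilter()
honest_results = [honest.check(f"{quoted}\nMy own comment number {i}") for i in range(25)]

# Evasion: 50 spam copies, each with one slightly altered line.
spam = RepeatedBlockFilter()
spam_results = [spam.check(f"Cheap widgets\nVisit our site, ref {i}\nOrder today") for i in range(50)]
```

In this sketch the 21st honest reply and every one after it are flagged,
while none of the lightly altered spam copies are.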

> > But my own web page has been up for nearly a year, and google has
> > apparently not indexed it.
>
> Do any other pages have pointers to it?  Google has to be able to
> find it somehow.
>

As far as I know, there are no links to it anywhere.  So that would
explain it.

Ron Kean

