Date: Tue, 7 Jan 2003 23:32:37 -0500 (EST)
From: "Keith F. Lynch" <kfl at keithlynch.net>
Subject: [WSFA] John(s) Hopkin(s), newsgroups, and Google
To: WSFAlist at keithlynch.net
Reply-To: WSFA members <WSFAlist at keithlynch.net>

ronkean at juno.com wrote:
> I get 705,000 hits for John Hopkins University (not in quotes), and
> 80,800 for "John Hopkins University" (phrase in quotes).  "Johns
> Hopkins University" (in quotes) gives 764,000, and Johns Hopkins
> University (not in quotes) yields 931,000 hits.

You're obviously doing a web search, not a newsgroup search.

Without the quotes, it finds any page containing all of the words in
any order, which isn't very interesting here.

For web searches (with quotes), I get:

"Johns Hopkins University"  715,000
"John Hopkins University"    79,700
"Johns Hopkin University"       229
"John Hopkin University"        305

>> ... New York fan Seth Breidbart is the person who formally defined
>> newsgroup spam for purposes of determining what should be canceled.
>> Look up "Breidbart Index".)

> That seems to be an index of 'excessive multiple posting' within a
> 45 day period.

It's a weighted combination of multiple posting and cross-posting,
the latter being considered not as bad.  The definition is completely
content-neutral.  Whether something is considered spam has nothing
whatsoever to do with whether it's commercial or off-topic or
fraudulent, but only with how much of it there is.
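
As a rough sketch of that weighting: the Breidbart Index is usually
stated as the sum, over every copy of a posting, of the square root of
the number of newsgroups that copy was cross-posted to.  The function
name and the sample numbers below are illustrative, and the threshold
of 20 is the commonly cited figure rather than anything stated above.

```python
import math

# Commonly cited cancellation threshold (an assumption here; not stated
# in the discussion above): a BI of 20 or more within a 45 day window.
BI_THRESHOLD = 20


def breidbart_index(groups_per_copy):
    """Sum of sqrt(n) over all copies, where n is the number of
    newsgroups each copy was cross-posted to.  Only counts go in;
    the content of the posting is never examined."""
    return sum(math.sqrt(n) for n in groups_per_copy)


# Hypothetical example: the same 16 newsgroups reached two ways.
separate = breidbart_index([1] * 16)   # 16 copies, one group each
crossposted = breidbart_index([16])    # 1 copy cross-posted to 16 groups

print(separate, crossposted)
print(separate >= BI_THRESHOLD, crossposted >= BI_THRESHOLD)
```

Sixteen separate copies score 16, while one copy cross-posted to the
same sixteen groups scores only 4, which is the sense in which
cross-posting is considered not as bad.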

> But my own web page has been up for nearly a year, and google has
> apparently not indexed it.

Do any other pages have pointers to it?  Google has to be able to
find it somehow.

As I've mentioned, Google has never indexed the web pages which
contain the archives of this list.  That's because there are no
pointers to those pages.  The URL was mentioned in a WSFA Journal,
but I snipped it out of the online version of that Journal, since the
consensus was to keep the archives private, and that's the only way
to do so.

> I had thought that google searches for phrases in quotes would search
> for the exact phrase only, case insensitive, but it seems that in
> this case the elves at google have set it up so a search for "John
> Hopkins University" also includes results for "Johns Hopkins
> University".

Not always, since otherwise the former would have had the same or more
hits than the latter.  But Google is definitely doing something, since
all four searches find http://www.jhu.edu/, which definitely does not
contain the singular John or Hopkin, not even in hidden comments or
meta tags.

Following up on the questions of newsgroup volume, I decided to track
mentions of "Worldcon" by year.  There's one every year, and it's been
of roughly constant size for the past quarter century, so variations
in how much it's mentioned ought to track total newsgroup volume.  Or
rather, the fannish subset of it.  Or at least that part which has
been archived.  Or something.  Anyhow, here are the numbers:

2002   4240
2001   6720
2000   4720
1999   4220
1998  42400
1997   4340
1996   5630
1995   2790
1994   1880
1993   2210
1992    576
1991    441
1990    279
1989    159
1988      2
1987      3
1986     22
1985     26
1984     25
1983     35
1982     21
1981      4

It would appear that 1987 and 1988 were poorly archived, since the
volume of discussion certainly didn't decrease.

1998's number is wildly out of proportion.  When I tried to pin
it down, it shrank.  Searched by halves, 1998 had 2440 and 8170
postings containing "Worldcon", about 10,600 in all, not 42,400.  By
quarters, 1070, 1370, 7510, and 664.  I guess I ran into one of the
rare bugs in Google.  (Much of the
third quarter excess is due to a rec.arts.sf.fandom thread with 2940
posts, whose subject line contained "Weapons at Worldcon".  That's not
what was actually talked about, except during the first few postings.
People reply and don't bother to change the subject line, even long
after the topic has changed, and branched, and changed again.)
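
The consistency check on the 1998 figures can be sketched as follows.
The counts are the ones quoted above.  The rounding helper is an
assumption: the reported counts all look rounded to three significant
figures, so the sketch compares on that basis.

```python
# Counts of postings containing "Worldcon" in 1998, as quoted above.
quarters = [1070, 1370, 7510, 664]
halves = [2440, 8170]
reported_year = 42400


def round_sig3(x):
    """Round to three significant figures (assumed to be how the
    reported counts were rounded)."""
    return int(float(f"{x:.3g}"))


first_half = quarters[0] + quarters[1]
second_half = quarters[2] + quarters[3]

# The quarters agree with the halves once rounding is allowed for.
print(round_sig3(first_half) == halves[0])
print(round_sig3(second_half) == halves[1])

# The whole-year figure is far larger than either breakdown.
print(sum(quarters), sum(halves), reported_year)
```

Both breakdowns come to roughly 10,600 postings, so it is the
whole-year figure of 42,400 that is out of line, not the narrower
searches.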

The earliest online fandom wasn't in the newsgroups, but in the
SF-LOVERS email list, whose archives have survived intact since
September 1979.  On that list Worldcon was mentioned 47 times in 1980,
25 times in 1981, and 22 times in 1982.  The first mention of Worldcon
was on April 25th 1980.  Curiously, the first mention of Disclave was
the previous day, April 24th 1980!
--
Keith F. Lynch - kfl at keithlynch.net - http://keithlynch.net/
I always welcome replies to my e-mail, postings, and web pages, but
unsolicited bulk e-mail (spam) is not acceptable.  Please do not send me
HTML, "rich text," or attachments, as all such email is discarded unread.