To: WSFAlist at keithlynch.net
Date: Sat, 11 Jan 2003 06:33:11 -0500
Subject: [WSFA] Re: John(s) Hopkin(s), newsgroups, and Google
From: ronkean at juno.com
Reply-To: WSFA members <WSFAlist at keithlynch.net>

On Tue, 7 Jan 2003 23:32:37 -0500 (EST) "Keith F. Lynch"
<kfl at keithlynch.net> writes:
> ronkean at juno.com wrote:
> > I get 705,000 hits for John Hopkins University (not in quotes), and
> > 80,800 for "John Hopkins University" (phrase in quotes). "Johns
> > Hopkins University" (in quotes) gives 764,000, and Johns Hopkins
> > University (not in quotes) yields 931,000 hits.
>
> You're obviously doing a web search, not a newsgroup search.
>

Yes, it was a web search, because that was (I think) the type of
search which was originally being discussed re 'John Hopkins
University'. Earlier discussion had concerned newsgroup searches.

> Without the quotes, it finds any page containing all of the words
> in any order, which isn't very interesting here.

Yes, that is how it is supposed to work. But since the original
comment did not specify whether the search was done using quotes, I
did the search both ways.

>
> For web searches (with quotes), I get:
>
> "Johns Hopkins University". 715,000
> "John Hopkins University". 79,700

That's close to what I got.

> Whether something is considered spam has nothing
> whatsoever to do with whether it's commercial or off-topic or
> fraudulent, but only with how much of it there is.
>

'How much' there is of a given mailing, in the context of point to
point email, would be a function of how many individual addresses the
mailing is sent to. 'How much' from the point of view of an email user
being bothered by spam in general, would be how many spam messages per
day they receive.
'How much' from the point of view of an email user complaining about a
particular mailing or family of mailings would be related to how many
copies that user receives of a given message, or very similar
messages, or perhaps how many unwanted messages that user gets from
the same sender.

With newsgroups, posting a spam to a newsgroup is sending it to just
one address, in the hope that it will be seen by many people who use
the newsgroup. So a cancelbot which uses 'how much' as a criterion
would be looking at multiple posting and cross posting as an
indicator. But there seems to be a pitfall with that approach. If the
cancelbot is not sensitive to blocks of text within messages being
repeated across postings, then spammers could simply make slight
alterations in each message which is sent out as a cross posting or a
multiple posting, to evade the cancelbot. But if the cancelbot is
sensitive to such repeated blocks, then legitimate users of the
newsgroups who quote passages from earlier messages, accompanied by
dozens of other legitimate users who quote some of the same passages,
within the 45-day period, would trigger the cancelbot to cancel
subsequent legitimate messages quoting those same passages.

> > But my own web page has been up for nearly a year, and google has
> > apparently not indexed it.
>
> Do any other pages have pointers to it? Google has to be able to
> find it somehow.
>

As far as I know, there are no links to it anywhere. So that would
explain it.

Ron Kean