Date: Mon, 6 Jan 2003 23:22:00 -0500 (EST)
From: "Keith F. Lynch" <kfl at KeithLynch.net>
To: WSFAList at KeithLynch.net
Subject: [WSFA] Re: baseballs, priests, and newsgroups
Reply-To: WSFA members <WSFAlist at keithlynch.net>

ronkean at juno.com wrote:
> Can anyone calculate terminal velocity? I suppose the question is:
> how fast a wind would it take to exert 1.42 newtons of force on a 43
> cm^2 cross section?

Not enough information. The surface makes a big difference, since the force a given wind exerts on a 43 cm^2 cross section also depends on the object's shape and surface texture (its drag coefficient).

> Apparently, molestation within the Catholic Church has been going on
> for many years, but it has only been in the past year or so that the
> resulting scandal has come to the forefront of public attention.

Yes, it was the public attention I was interested in. I had thought it had had that attention for years, certainly long before Joe's death, but apparently there was a big increase in that attention about a year ago.

> A big drawback, though, is that the results will be highly dependent
> on how the search is formulated, and properly formulating the search
> terms might require much trial and error and reading of sampled
> messages to verify the relevancy of chosen search words.

Very true. For instance, there was the journalist who reported that there were many thousands of satanic cults on the Internet. He had done a search for all web pages containing either "satanic" or "cult," and assumed that all such pages were home pages for distinct groups of Satan worshippers.

One thing I didn't mention about the 107 messages containing both "Disclave" and "flood" is that the first such message, with the quite plausible subject "Re: ASB in Fandom (was Re: Report: Disclave)", was dated 1992, five years before the flood! (ASB refers to the group which caused the flood.)

> Another drawback is that while that research tool may show to what
> extent people are discussing a given topic (and especially how that
> has changed over time), it does not show what they are saying about
> the topic.
> You would have to sample the messages to characterize
> what is being said.

Right. Fortunately, anyone can do so. Want to see what those 107 messages said? Go look!

> Perhaps Keith can readily answer these questions: About how many
> distinct newsgroups are there which are reasonably active (at
> least a hundred messages per month), how far back do the newsgroup
> archives go, and has the use of newsgroups generally, as measured
> by the number of messages posted, been increasing, stagnating, or
> falling off, in recent years?

Those are very difficult questions. There are tens of thousands of newsgroups, but it's not clear how many are "reasonably active" by your definition. I could have answered this a few years ago, but ISPs now all have their newsfeed on one machine and their shell on another, meaning I can't use standard Unix tools to get an answer. I would have to write and install a variant newsreader program.

Also, many newsgroup messages are spam. Many others are spam cancels. Users normally see neither, as they cancel each other out. The newsgroup spam problem, unlike the email spam problem, has been largely solved thanks to these cancels. (It's a problem for ISPs, who have to bear all the excess traffic, but not for Google, which does not archive them, nor for users, who usually never see them.)

(Aside: New York fan Seth Breidbart is the person who formally defined newsgroup spam for purposes of determining what should be canceled. Look up "Breidbart Index".)

Many newsgroup messages are crossposted, i.e., they exist in two or more newsgroups at once. Only one copy of the message is stored by each ISP or by Google, and only one copy of the message is seen by any user, even if he reads all messages in all the groups it's crossposted to.

Google says they archive (and fully index) more than 700 million newsgroup postings. DejaNews started archiving in early 1995. In early 2001, DejaNews (by then Deja.com) went out of business and sold their archive to Google.
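For readers who follow up the "Breidbart Index" pointer: as it is usually stated, the index for a set of copies of one message is the sum, over each separately posted copy, of the square root of the number of newsgroups that copy was crossposted to. That is why a single crossposted copy (which, as noted, is stored and seen only once) counts for less than the same text posted separately to each group. A minimal Python sketch; the function names are hypothetical, and the cancel threshold of 20, usually applied to copies posted within a 45-day window, is the commonly cited figure, taken here as an assumption rather than from this message.

```python
import math

# Commonly cited cancel threshold (an assumption in this sketch);
# it is usually applied to copies posted within a 45-day window.
BI_THRESHOLD = 20


def breidbart_index(newsgroups_per_copy):
    """Return the Breidbart Index for a set of copies of one message.

    newsgroups_per_copy holds, for each separately posted copy, the number
    of newsgroups that copy was crossposted to. Each copy contributes sqrt(n).
    """
    return sum(math.sqrt(n) for n in newsgroups_per_copy)


def exceeds_threshold(newsgroups_per_copy, threshold=BI_THRESHOLD):
    """True if the copies reach the (assumed) cancel threshold."""
    return breidbart_index(newsgroups_per_copy) >= threshold


# One copy crossposted to 9 groups scores 3.0; nine separate single-group
# copies of the same text score 9.0.
print(breidbart_index([9]))         # 3.0
print(breidbart_index([1] * 9))     # 9.0
print(exceeds_threshold([16] * 5))  # True: 5 copies x sqrt(16) = 20.0
```

The square root is what makes crossposting cheaper than multiposting: reaching the same 16 groups costs 4 as one crossposted copy but 16 as separate copies.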
In late 2001 Google extended the archive back to mid-1981 thanks to Toronto fan Henry Spencer, who had archived much of the early material on tape, and also thanks to Usenet feeds which were available via CD-ROM for a while in the late 80s and early 90s.

The message numbers on Panix, one of the first ISPs, are probably a good estimate of how many postings each newsgroup has had since it began. Rec.arts.sf.fandom has had over 600,000 postings, of which Google has archived about 375,000, and rec.arts.sf.written, of which Google has archived 567,000, should hit a million in a few months. For comparison, the old SF-LOVERS email list had about 500,000 messages from 1979 through the end of 2000, and this list has had about 3700 in just under 11 months.

> It might be interesting to compare the annual message volume of all
> newsgroups with the corresponding data for some of the large email
> list services, such as yahoo and topica.

The S.M. Stirling Yahoo list has had 68,000 postings, of which about 3000 were from Stirling himself. I haven't checked the other Yahoo groups. The Lois Bujold list (not a Yahoo or Topica list) doesn't have message numbers, but currently seems to average about 3000 messages per month.

In addition to the more than 700 million newsgroup postings, Google also indexes over 3 billion web pages, something it's much better known for. And they do it *quickly*, too. They've already indexed the November 1992 WSFA Journal, which I just put up yesterday and didn't tell anyone about. (Googling for "drunken badgers" will find it.) (They don't, however, index the archives of this list, since I carefully make sure there are no links they can follow.)

If I had more time, it might be interesting to compare the numbers of web page hits for various stfnal terms to the numbers of newsgroup posting hits. Would the cross-plot fall on a straight line on log-log paper? What would be the significance of terms which diverged strongly from that line, i.e.
which were mentioned a lot in newsgroups but hardly at all on the web, or vice versa?

Sometimes I wish I had a copy of all these archives, immortality, and a time machine. I'd go back to one of the warmer and less carnivore-infested periods of Earth's prehistory, and spend a few eons just reading it all.

On second thought, immortality may be all I need. Even if I were to read only one message per year, and even if the volume continues to increase exponentially forever, I will eventually catch up. Every message, after all, can be given a unique message number N, and I'll read message N in year N. So by the year infinity, I'll be fully caught up, despite having fallen further and further behind up until then. That's math for you.

-- 
Keith F. Lynch - kfl at keithlynch.net - http://keithlynch.net/
I always welcome replies to my e-mail, postings, and web pages, but
unsolicited bulk e-mail (spam) is not acceptable. Please do not send
me HTML, "rich text," or attachments, as all such email is discarded
unread.
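The log-log cross-plot idea raised in this message can be sketched in Python: fit a straight line to log10(newsgroup hits) against log10(web hits) by ordinary least squares, then flag terms whose residuals are unusually large. A straight line on log-log paper corresponds to a power law, news ≈ 10^a · web^b, so "diverging strongly from that line" becomes "large residual". The function name, the k = 2 cutoff (in units of the RMS residual), and the sample counts are all hypothetical; no real search counts are used.

```python
import math


def loglog_outliers(hits, k=2.0):
    """Fit log10(news) = a + b * log10(web) by ordinary least squares.

    hits maps a term to (web_hits, newsgroup_hits); both must be positive.
    Returns (a, b, outliers), where outliers maps each term whose residual
    exceeds k times the RMS residual to that residual. A positive residual
    means the term appears in newsgroups more often than the fitted line
    predicts from its web count; a negative one means the reverse.
    """
    terms = list(hits)
    if len(terms) < 2:
        raise ValueError("need at least two terms")
    xs = [math.log10(hits[t][0]) for t in terms]
    ys = [math.log10(hits[t][1]) for t in terms]
    n = len(terms)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all web counts are equal; slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    resid = {t: y - (a + b * x) for t, x, y in zip(terms, xs, ys)}
    rms = math.sqrt(sum(r * r for r in resid.values()) / n)
    outliers = {t: r for t, r in resid.items() if abs(r) > k * rms}
    return a, b, outliers


# Made-up counts: five terms on the line news = web / 10, plus one term
# that is far more common in newsgroups than its web count suggests.
sample = {
    "term_a": (100, 10),
    "term_b": (1_000, 100),
    "term_c": (10_000, 1_000),
    "term_d": (100_000, 10_000),
    "term_e": (1_000_000, 100_000),
    "odd_term": (10_000, 100_000),
}
a, b, outliers = loglog_outliers(sample)
print(f"slope {b:.2f}, intercept {a:.2f}, outliers {sorted(outliers)}")
```

With real data, the slope b would answer whether newsgroup mentions grow in proportion to web mentions (b near 1) or faster or slower, and the sign of each outlier's residual says which medium the term is over-represented in.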