Date: Sat, 15 Mar 2003 14:54:27 -0500 (EST)
From: "Keith F. Lynch" <kfl at KeithLynch.net>
To: agarcia at starbaseunix.org
Cc: WSFAlist at keithlynch.net
Subject: [WSFA] Predicting who will be at WSFA meetings
Reply-To: WSFA members <WSFAlist at keithlynch.net>

Thanks again, Tony, for helping Sam and me scan in graphics from back issues of the WSFA Journal when you were in town last Thursday, and for helping place them online. I'm sorry you're not local, since I know you'd enjoy WSFA.

When we were discussing the topic, you expressed skepticism that it made any sense to try to predict who will attend which WSFA meetings. Obviously it isn't possible to do a perfect job of it. The question is how good a prediction *is* possible.

Why bother? Mostly because the main reason anyone goes to a meeting is the other people expected to be there. It's useful to have as accurate an idea as possible of who those people are likely to be.

Also, as you saw, our secretary circulates a sign-up sheet at each meeting containing names with boxes for check marks next to each one, and space for write-ins. There isn't room to list all 339 people who have been to at least one of the 192 meetings we have attendance records for. About half of those people have been to only one meeting, and most of that half will probably never come to another. If the sheet has room for 40 names, they should presumably be the names of the 40 people most likely to show up.

Also, we have access to immense amounts of computer power. Why not put it to use? If it makes sense to use a ton of steel and millions of calories of energy that have been sequestered underground since the Paleozoic just to get to a WSFA meeting, then why not use more CPU power than it took to put a man on the moon to guess at who is most likely to be there? Why? Just because we can.

How to do it? The simplest approach would be to look at what proportion of the meetings in the past year each person attended.
If she attended 70% of them, then her chances of attending the next one are probably close to 70%. But why the past year? If we look at the past two years, that will give us more meetings, and hence a more accurate percentage. On the other hand, if we look at just the past six months, that will more accurately reflect current conditions, since the proportion of meetings a person attends often varies with time. I wonder which single span of time would be most accurate? It would be easy enough to find out.

But a better approach, I'm sure, would be a weighted average. Every meeting should count, but meetings further in the past should count less, asymptotically approaching zero weight for meetings long ago. A negative exponential is easily computed by keeping a running total of meetings attended for each person, but multiplying the total by some constant k before each addition, and then normalizing by dividing the final score by the highest possible score.

This sort of scaling law appears in many physical systems. For instance, the voltage on a capacitor connected to a resistor, as a function of charge placed on the capacitor at various times in the past. Or the air pressure in a tire with a slow leak, as a function of the quantity of air pumped into the tire at various times in the past. Or a person's weight, as a function of how much they ate and exercised at various times in the past. Each of these systems has some characteristic time constant. What happened long ago has less relevance to the current state than what happened recently.

I wonder what the characteristic time constant of WSFA attendance is? I could simply try different values on the first N meetings, and see which "k" comes closest to "predicting" who was at the N+1st meeting for all values of N.

More complicated systems behave similarly, but with two or more time constants, with some arbitrary weighting for each.
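Both ideas can be sketched in a few lines of Python. This is a hypothetical illustration, not anything WSFA actually runs; the attendance flags and the candidate values of k are invented for the example:

```python
def decayed_score(attendance, k):
    """Exponentially weighted attendance score.

    attendance: list of 0/1 flags, one per meeting, oldest first.
    k: decay constant, 0 < k < 1.  The running total is multiplied
       by k before each meeting is added in, so older meetings count
       less, asymptotically approaching zero weight.
    The result is normalized by the highest possible score (that of
    someone who attended every meeting), so it lies between 0 and 1.
    """
    total = 0.0
    best = 0.0  # running score of a perfect attendee, for normalizing
    for went in attendance:
        total = total * k + went
        best = best * k + 1
    return total / best


def best_k(histories, candidates):
    """Pick the candidate k that best "predicts" past attendance.

    histories: dict of person -> attendance flags, oldest first.
    For each person and each N, the score over the first N meetings
    is treated as the predicted probability of attending the N+1st;
    the k with the smallest total squared error wins.
    """
    def error(k):
        err = 0.0
        for flags in histories.values():
            for n in range(1, len(flags)):
                predicted = decayed_score(flags[:n], k)
                err += (predicted - flags[n]) ** 2
        return err
    return min(candidates, key=error)


# With k = 0.9, someone who came to the last three meetings after a
# long absence outscores someone whose attendance faded out:
recent = decayed_score([0, 0, 0, 0, 1, 1, 1], k=0.9)
faded = decayed_score([1, 1, 1, 0, 0, 0, 0], k=0.9)
```

Since the weights fall off as powers of k, the characteristic time constant works out to roughly 1/(1-k) meetings, so trying candidates like 0.8, 0.9, and 0.95 would span memories of a few meetings up to a couple of dozen.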
For instance, the concentration of a medicine in one's blood as a function of how much was taken and how long ago. Or the total amount of air in two tires with different-sized leaks. Generalize this to a large number of time constants, each with its own weighting, and you've reinvented the Laplace transform. Let those numbers be complex rather than real, and you've reinvented the Fourier transform, and are well on your way to Kalman filtering and wavelet transforms.

Things are complicated by the fact that we alternate between meetings at two locations, and many people are more likely to show up to meetings at one location than the other. (See the cross-plot at http://www.wsfa.org/journal/j02/4/#aar)

I also have (or can easily get) weather information for all meeting dates. Perhaps I should check to see whether there are any fen who are less likely (or more likely) to show up during bad weather.

This is where I usually bog down: coming up with more and more elegant approaches for modeling and analyzing WSFA attendance, but never actually *doing* anything much beyond collating and presenting the raw numbers. Since it's approaching the time of year at which I do that collating and presenting, maybe I should try to find the time to start programming and do the analysis.

With 143 people who have attended three or more of the 192 meetings I have data for, a total of 5149 data points, I finally have a fair amount of data to play with. And since I've finally pushed back the online Journals to before attendance was recorded, I won't have significantly more data if I wait another year.

(Any WSFAn who replies: Please note that Tony is not on the list, so please be sure to CC him, agarcia at starbaseunix.org. Thanks.)
-- 
Keith F. Lynch - kfl at keithlynch.net - http://keithlynch.net/
I always welcome replies to my e-mail, postings, and web pages, but unsolicited bulk e-mail (spam) is not acceptable.
Please do not send me HTML, "rich text," or attachments, as all such email is discarded unread.