Date: Sat, 15 Mar 2003 14:54:27 -0500 (EST)
From: "Keith F. Lynch" <kfl at KeithLynch.net>
To: agarcia at starbaseunix.org
Cc: WSFAlist at keithlynch.net
Subject: [WSFA] Predicting who will be at WSFA meetings
Reply-To: WSFA members <WSFAlist at keithlynch.net>

Thanks again, Tony, for helping Sam and me scan in graphics from
back issues of the WSFA Journal when you were in town last Thursday.
And for helping place them online.  I'm sorry you're not local, since
I know you'd enjoy WSFA.

When we were discussing the topic, you expressed skepticism that it
makes any sense to try to predict who will attend which WSFA meetings.

Obviously it isn't possible to do a perfect job of it.  The question
is how good a prediction *is* possible?

Why bother?  Mostly because the main reason anyone goes to a meeting
is the other people expected to be there.  It's useful to have as
accurate an idea as possible of who those people are likely to be.
Also, as you saw, our secretary circulates a sign-up sheet at
each meeting containing names with boxes for check marks next to each
one, and space for write-ins.  There isn't room to list all 339 people
who have been to at least one of the 192 meetings we have attendance
records for.  About half of those people have only been to one
meeting, and most of that half will probably never attend another.
If the sheet has room for 40 names, they should presumably be the
names of the 40 people most likely to show up.

Also, we have access to immense amounts of computer power.  Why not
put it to use?  If it makes sense to use a ton of steel and millions
of calories of energy that have been sequestered underground since the
Paleozoic, just to get to a WSFA meeting, then why not use more CPU
power than it took to put a man on the moon to guess at who is most
likely to be there?  Why?  Just because we can.

How to do it?

The simplest approach would be to look at what proportion of the
meetings in the past year each person attended.  If she attended 70%
of them, then her chances of attending the next one are probably close
to 70%.
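In Python, that naive predictor is nearly a one-liner.  Here's a
sketch; the names and the one-flag-per-meeting attendance lists are
made up for illustration:

```python
# Made-up attendance records: one 1/0 flag per meeting in the past
# year, oldest meeting first.
history = {
    "Alice": [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1],
    "Bob":   [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1],
}

def attendance_rate(flags):
    """Plain proportion of meetings attended -- the naive predictor."""
    return sum(flags) / len(flags)

for name, flags in history.items():
    print(f"{name}: {attendance_rate(flags):.0%}")
```

Alice, having attended 9 of 12 meetings, would be predicted to show
up with probability 75%.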

But why exactly one year?  If we look at the past two years, that
will give us more meetings, and hence a more accurate percentage.
On the other hand, if we look at just the past six months, that will
more accurately reflect current conditions, since the proportion of
meetings a person attends often varies with time.

I wonder which single span of time would be most accurate?  It would
be easy enough to find out.

But a better approach, I'm sure, would be a weighted average.  Every
meeting should count, but meetings further in the past should count
less, asymptotically approaching zero weighting for meetings long ago.
A negative exponential is easily computed by keeping a running total
of meetings attended for each person, multiplying the total by some
constant k (between zero and one) before each addition, and then
normalizing by dividing the final score by the highest possible score.
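That running-total scheme takes only a few lines.  A sketch, where
the default k = 0.9 is just an illustrative value, not a fitted one:

```python
def weighted_score(flags, k=0.9):
    """Exponentially weighted attendance score.

    Multiply the running total by k before each meeting, add 1 if
    the person attended, then normalize by the score of a
    hypothetical perfect attendee.  flags is a list of 1/0 values,
    oldest meeting first.
    """
    total = 0.0
    best = 0.0
    for attended in flags:
        total = total * k + attended
        best = best * k + 1
    return total / best
```

Someone who attended every meeting scores exactly 1.0, and a recent
absence costs more than an old one: with k = 0.5, the history
[0, 1] (missed the old meeting, made the recent one) scores 2/3,
while [1, 0] scores only 1/3.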

This sort of scaling law appears in many physical systems.  For
instance the voltage on a capacitor connected to a resistor, as a
function of charge placed on the capacitor at various times in the
past.  Or the air pressure in a tire with a slow leak as a function
of quantity of air pumped into the tire at various times in the past.
Or a person's weight as a function of how much they ate and exercised
at various times in the past.  Each of these systems has some
characteristic time constant.  What happened in the past has less
relevance to the current state than what happened more recently.

I wonder what the characteristic time constant of WSFA attendance is?
I could simply try different values on the first N meetings, and see
which "k" comes closest to "predicting" who was at the N+1st meeting
for all values of N.
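That search could be brute-forced.  A sketch of one way to do it; the
candidate k values and the "score of at least 0.5 means expected to
attend" threshold are arbitrary assumptions, not anything fitted to
real WSFA data:

```python
def best_k(history, candidates=(0.5, 0.7, 0.8, 0.9, 0.95, 0.99)):
    """For each candidate k, 'predict' meeting N+1 from the weighted
    score over meetings 1..N, and return the k that gets the largest
    fraction of predictions right.  history maps each name to a list
    of 1/0 attendance flags, oldest meeting first."""
    def score(flags, k):
        total = best = 0.0
        for attended in flags:
            total = total * k + attended
            best = best * k + 1
        return total / best if best else 0.0

    accuracy = {}
    for k in candidates:
        correct = trials = 0
        for flags in history.values():
            for n in range(1, len(flags)):
                predicted = score(flags[:n], k) >= 0.5
                correct += predicted == bool(flags[n])
                trials += 1
        accuracy[k] = correct / trials
    return max(accuracy, key=accuracy.get)
```

The winning k corresponds directly to a characteristic time
constant: the smaller k is, the faster old meetings fade from the
prediction.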

More complicated systems behave similarly, but with two or more time
constants, with some arbitrary weighting for each.  For instance the
concentration of a medicine in one's blood as a function of how much
was taken and how long ago.  Or the total amount of air in two tires
with different-sized leaks.
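A two-time-constant version is just a weighted sum of two of the
single-constant scores.  A sketch, with the constants and weights
chosen arbitrarily for illustration:

```python
def two_constant_score(flags, k1=0.5, k2=0.95, w1=0.5, w2=0.5):
    """Weighted sum of two exponential attendance scores with
    different time constants -- one that forgets quickly (k1) and
    one that forgets slowly (k2).  Weights should sum to 1."""
    def score(k):
        total = best = 0.0
        for attended in flags:
            total = total * k + attended
            best = best * k + 1
        return total / best
    return w1 * score(k1) + w2 * score(k2)
```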

Generalize this to a large number of time constants, each with its
own weighting, and you've reinvented the Laplace transform.  If you
let these numbers be complex rather than real, you've reinvented the
Fourier transform, and are well on your way to Kalman filtering and
wavelet transforms.

Things are complicated by the fact that we alternate between meetings
at two locations, and many people are more likely to show up to
meetings at one location than the other.  (See the cross-plot at
http://www.wsfa.org/journal/j02/4/#aar)

I also have (or can easily get) weather information for all meeting
dates.  Perhaps I should check to see if there are any fen who are
less likely (or more likely) to show up during bad weather.

This is where I usually bog down, coming up with more and more elegant
approaches for modeling and analyzing WSFA attendance, but never
actually *doing* anything much beyond collating and presenting the
raw numbers.

Since it's approaching the time of year at which I do that collating
and presenting, maybe I should try and find the time to start
programming, and do the analysis.  With 143 people who have attended
three or more of 192 meetings I have data for, a total of 5149 data
points, I finally have a fair amount of data to play with.  And since
I've finally pushed back the online Journals to before attendance was
recorded, I won't have significantly more data if I wait another year.

(Any WSFAn who replies:  Please note that Tony is not on the list, so
please be sure to CC him, agarcia at starbaseunix.org.  Thanks.)
--
Keith F. Lynch - kfl at keithlynch.net - http://keithlynch.net/
I always welcome replies to my e-mail, postings, and web pages, but
unsolicited bulk e-mail (spam) is not acceptable.  Please do not send me
HTML, "rich text," or attachments, as all such email is discarded unread.