Newsgroups: comp.lang.lisp, sci.math.num-analysis
Followup-To: sci.math.num-analysis
From: hoey@zogwarg.etl.army.mil (Dan Hoey)
Date: 29 Jul 91 22:32:44 GMT
Subject: Re: random selection w/o replacement

The key point here is to manage the non-replacement by exchanging the
selected elements with elements at one end of the sequence.  That way the
unselected elements form a contiguous subsequence, which you can index
into with (random number-remaining).  That way, if your sequence is a
vector, you take time linear in the number selected.

If you have to do this a lot of times, you can use the vector over and
over again.  It just gets more and more randomized.

And of course if you select all the elements, you've got a shuffle.

(defun select-without-replacement (n eseq &aux (len (length eseq)))
  (dotimes (used n (subseq eseq (- len n)))
    (rotatef (elt eseq (random (- len used)))
             (elt eseq (- len used 1)))))

>if the number of items you're selecting is << than the length of the
>sequence, then the following is simple and probably okay:
>(defun randum[sic]-elements (n eseq)
>  (do ((len (length eseq))
>       (result nil))
>      ((= n (length result)) result)
>      (pushnew (elt eseq (random len)) result)))

The I'th element into result will take Theta(I + I^2/LEN) time on
average, so the total time will be Theta(N^2 + N^3/LEN).  Provided, of
course, that ESEQ is a vector.

>... This can be improved by doing the member test explic[i]tly,
>thereby avoiding taking the length of result on each iteration....

Pretty minimal speedup, since pushnew (or member) has to look at the
whole list anyway.

>A better algorithm is to copy the sequence into an array (fast, no
>garbage), randomize the array (slow?, no garbage), and then take the
>first N values out of the array (fast, no garbage).

As I've shown, you only need to randomize the part of the array you
are going to use.

>I tend to swap the 0th element with a random element a large number
>of times....

Now that is just wrong.  Worse, that is a common beginner's mistake
for shuffling a deck of cards, and you should get over it before you
start giving advice on the subject.  Your method will randomize the
first element of the array, but you will never randomize the rest of
the array (unless LEN<=2).  It's easy to see you can't get a random
shuffle, because the the probability of each outcome is a terminating
fraction in base LEN arithmetic.

The way to shuffle a deck of card is to--guess what--select without
replacement (until you reach the end then stop).  The above function
will do it.  In some languages it's easier to collect the selected
items at the beginning of the array, and that's a reader's exercise.
But remember: if you keep calling random with the same argument, you
should start thinking about how that has got to lose.

Followups belong in an algorithms group: this is not language-specific.
I guess sci.math.num-analysis is about it.

Dan Hoey
Hoey@AIC.NRL.Navy.Mil