Date: Thu, 20 Jun 2002 22:39:18 -0400 (EDT)
From: "Keith F. Lynch" <kfl at keithlynch.net>
To: WSFAlist at keithlynch.net
Subject: [WSFA] Re: Speel checkers?
Reply-To: WSFA members <WSFAlist at keithlynch.net>

Ted White <tedwhite at compusnet.com> wrote:
> Actual content filtering is *dangerous*.  It implies content
> monitoring.

That depends on who is doing the filtering.  I have written my own
content filters, which nobody else has seen.  Any message containing
certain character strings which are very common in spam and very
uncommon otherwise is discarded unread.

This is in addition to discarding anything from Argentina, Korea, or
Taiwan, anything from a site in the RBL blacklist, anything in HTML,
and anything with an attachment.
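The last two rules, "anything in HTML" and "anything with an attachment", are easy to check mechanically. Here is a minimal sketch that uses Python's standard-library `email` package on the raw message text. The name `should_discard` is hypothetical, and the author's own filters were never published, so this only illustrates the idea.

```python
import email
from email import policy


def should_discard(raw_message: str) -> bool:
    """Return True if any MIME part is HTML or is marked as an attachment.

    Hypothetical helper illustrating the rule, not the author's actual filter.
    """
    msg = email.message_from_string(raw_message, policy=policy.default)
    # walk() visits the message itself and, for multipart mail, every sub-part.
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            return True
        if part.get_content_disposition() == "attachment":
            return True
    return False
```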

Steve Smith <sgs at aginc.net> wrote:
> If you don't want people reading your e-mail, you need to encrypt
> it.  Both Netscape and Microsoft mail programs have encryption
> built in.

But do you trust them?  Better to use PGP.  And how do you know you
can trust your copy of PGP?  While I've never used it, I have a copy
handed to me on diskette by Phil Zimmermann, which is as good a chain
of custody as you can get.

PGP is dependent on yet another major unsolved math problem:  There
is no known way to quickly factor a large number.  It's easy to find
two large prime numbers, and even easier to calculate their product.
But given only the product, it's completely infeasible to find what
the primes were, unless you have a billion supercomputers all running
for geological ages.  I hope if anyone ever finds a way to quickly
factor large numbers, word of this breakthrough gets out, and isn't
known only to the government.
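The asymmetry shows up even at toy scale. In the minimal Python sketch below, multiplying two primes takes one operation, but recovering them by naive trial division takes on the order of the square root of their product in steps. The two primes are small, well-known ones chosen for illustration. Real keys use primes well over a hundred digits long. Real attackers also use far better methods than trial division, such as the number field sieve, but those are still much too slow for numbers of that size.

```python
def trial_factor(n: int) -> tuple[int, int]:
    """Split a composite n into two factors by naive trial division,
    returning its smallest factor first.

    Work grows roughly with sqrt(n), so each extra pair of digits in n
    multiplies the worst-case effort by about 10.
    """
    if n % 2 == 0:
        return 2, n // 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 2
    raise ValueError("n has no nontrivial factor (it is prime or 1)")


p, q = 7919, 104729          # the 1,000th and 10,000th primes
n = p * q                    # the easy direction: one multiplication
print(n, trial_factor(n))    # the hard direction: search for a divisor
```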

Ted White <tedwhite at compusnet.com> wrote:
> But it was my impression that spam was detected based not on its
> content but on its mode of dispersal.

There have been many attempts at filtering spam.  All of them have
problems with false positives and false negatives.  The ideal solution
would be on the supply side:  Send spam, go to jail.

Common filtering methods include:

* Based on the apparent origin.  Almost useless, since spammers almost
  always forge this, and change it sometimes several times per *second*.

* Based on the IP addresses in the headers.  This works pretty well,
  and is how I'm blocking three nations and the sites in the RBL.
  But plenty of spams still slip through.

* The "Brightmail" approach, of salting the net with addresses which
  spammers then harvest, and discarding any message which is CCd to
  one of those addresses.  It requires that you redirect all your
  email to pass through Brightmail, however.  And spammers often make
  tiny automatic changes in their spam from instant to instant to
  evade this.  Which Brightmail and their competitors attempt to
  compensate for with "digital fingerprinting" or commonality indices.
  (The newsgroups do something similar -- some newsgroups exist only
  as spam traps, and anything cross-posted into them is automatically
  cancelled from all newsgroups.)

* Based on keywords in the body of the message.  There's a particular
  (imaginary) federal law which spammers often cite, for instance.
  And certain P.O. boxes and phone numbers which belong to certain
  persistent spammers.  Spammers often evade this by  s p a c i n g
  or d.o.t.t.i.n.g. key words or numbers in their text.

* Based on the message being HTML or having attachments.
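For the keyword approach, one countermeasure to s p a c i n g and d.o.t.t.i.n.g is to normalize the text before matching. Here is a minimal sketch in Python. The phrases in `SPAM_PHRASES` are hypothetical placeholders, not strings anyone actually filtered on.

```python
import re

# Hypothetical example phrases; a real list would be tuned to local spam.
SPAM_PHRASES = ["this is not spam", "act now", "100% guaranteed"]


def normalize(text: str) -> str:
    """Lowercase and keep only ASCII letters and digits, so that
    "s p a c e d" and "d.o.t.t.e.d" text collapse to "spaced" and "dotted"."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def contains_spam_phrase(body: str, phrases=SPAM_PHRASES) -> bool:
    """True if any normalized phrase occurs in the normalized body."""
    flat = normalize(body)
    return any(normalize(p) in flat for p in phrases)
```

Normalizing this way makes the false-positive problem worse. Once word boundaries are gone, "act now" also matches inside "exact nowhere".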

Another approach, instead of (or in addition to) filtering, is hiding.
Keeping one's address secret from automatic harvesters, preferably
in a way that doesn't keep it from real people.  For instance
instead of using my real email address in postings, I could use
kfl at keithlynchgiraffe.com, and add "remove the animal to reply".
Disposable addresses are another variation on this.  But they have
the same disadvantage as frequent changes of street address.  Plenty
of legitimate correspondents will lose track of you.

Another approach is to require all legitimate messages to contain some
particular phrase.  Or to require them to have the same subject line
as a message you sent from the address being written to.
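Both of these rules are easy to automate. Here is a minimal Python sketch, assuming a hypothetical agreed phrase and a record of the subject lines you have sent. It strips leading "Re:" prefixes so that replies still match. All the names are illustrative.

```python
import re

# Matches one or more leading "Re:" prefixes, e.g. "Re: Re: ".
_REPLY_PREFIX = re.compile(r"^(?:\s*re\s*:\s*)+", re.IGNORECASE)


def base_subject(subject: str) -> str:
    """Strip reply prefixes and normalize case and surrounding spaces."""
    return _REPLY_PREFIX.sub("", subject).strip().lower()


def is_expected(subject: str, body: str, required_phrase: str,
                sent_subjects: set[str]) -> bool:
    """Accept mail that contains the agreed phrase, or that answers one of
    our own messages (same subject line, ignoring "Re:" prefixes)."""
    if required_phrase.lower() in body.lower():
        return True
    return base_subject(subject) in {base_subject(s) for s in sent_subjects}
```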

Since it takes a certain number of complaints to get a spammer shut
down, filters which block 99% of the spammer's messages just mean he
gets to send 100 times as many.  The same number are seen, albeit by
different people.  Filtering and hiding are not a solution to spamming
any more than tight clothes are a solution to obesity.  It just bulges
out somewhere else.
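Here is the arithmetic behind the "100 times" figure. If a filter blocks a fraction f of a spammer's N messages, N(1 - f) get through. To keep the number seen unchanged, he only has to send

```latex
N' = \frac{N}{1-f} = \frac{N}{1-0.99} = 100\,N
```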

One often-made suggestion is to require some kind of postage on email.
Even one cent would completely wipe out spam, since that one positive
response in ten million would cost the spammer $100,000.  Unfortunately,
this isn't possible with email as it's currently constituted.  I expect
we'll see some sort of postage-based system replace email as email
becomes increasingly useless.  I just hope it isn't proprietary, and
doesn't belong to Microsoft or AOL.
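Spelled out: at one cent of postage per message, and one positive response per ten million messages sent, each response costs

```latex
10^{7}\ \text{messages} \times \$0.01 = \$100{,}000
```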

> Any program that monitors content (I wouldn't assume any humans are
> actually *reading* anything) borders on the FBI's Carnivore -- at
> which I look askance.

Even if you wrote it and are running it?

Of course whether you're filtering or not has nothing to do with
whether Carnivore or some similar system is scanning your email.

It's been suggested that all sorts of deniable secret messages can be
hidden in spam.  A particular phrasing might mean "ATTACK AT DAWN".
And the fact that you received that message would cause no suspicion,
since everyone gets tons of junk from people they don't know.  Even
traffic analysis would be useless, since fake spam would be lost in
the noise of real spam.

Some people who are annoyed at Carnivore put Carnivore-bait in their
email to waste the government's time and energy.  Something like
"Hijack Al Qaeda Nuclear Heroin Bomb Smuggle Khaddafi Libya Opium".

ronkean at juno.com wrote:
> I would think that just about any ISP would want to close the
> account of anyone who spams via their service,

They do.  Some of them don't have 24 x 7 abuse desks, however.  And
some ISPs aren't real ISPs at all, but are just fronts for spammers.
It takes time for their upstream site to figure this out and pull the
plug on them.

> because of the large number of bounced messages coming in as a
> consequence of the spam.

Not a concern, since nearly all spammers forge their address.  The
bounces go to the forged reply address.

> 500 MB is way larger than what ISPs usually provide for an inbox, so
> it seems that the spammer's inbox would quickly overflow, and the
> extra data would create an annoyance for the ISP, requiring the ISP
> to take some action to clear the data.

It would be a waste of their bandwidth, but not of their disk space.
Once the inbox is full, additional emails are discarded unread, and
take up no disk space.

> Also, much spam these days is html, with graphics, so the average
> spam message is really far larger than 1 KB.

I suspect spammers are steering away from HTML and attachments.  For
one thing, it's possible to send a larger number of smaller messages
in a given amount of time.  For another, increasing numbers of people
discard all HTML and attachment-mail unread.

> how do spammers access the internet?  Do they open a new account
> for each spamming session, even if the account will work for only a
> few hours?

Sometimes, yes.  Other times, they open accounts at clueless or malign
foreign ISPs.  Or simply automatically break into other sites and
reprogram them to send millions of spams.

When they do open accounts, it's usually using credit card numbers
given by clueless would-be customers.  And using the name and street
address that go with the credit card.  Identity theft on a massive
scale.

> Juno offers free email accounts, but Juno is not suitable for
> spamming ...

True.  Nobody spams from Juno, Yahoo, or Hotmail.

> So, how do spammers access the internet in a way which allows
> hundreds of thousands of messages per day to be broadcast?

Using a traditional ISP, or a corporate system, which allow full
control by a user or his programs.  I could easily write a program
which would email this message to every email address I can find.
Of course my account would very quickly be cancelled, but probably
not before I was able to send a few million copies.

> ... it sounds like some of the complaining you have done has been
> to the spammers themselves, as distinct from the business being
> advertised (if different from the spammer), or the originating ISP.
> I would think that complaining to spammers would be a waste of
> effort 99% of the time, based on the hostile or ignorant attitudes
> evidenced by the spammers' responses, as well as the common sense
> observation that spammers are not ashamed of what they do.

Yes, I always CCd my complaints to the spammers.  Usually, the address
didn't work.  Or if it did, my message was probably discarded unread.
But some were seen.

Why?  Because I believe everyone has the right to confront their
accusers.  And to show that I'm not afraid of them.  And in case some
of them really *are* that clueless.  And so that they can't honestly
claim that nobody ever complained.  And so that they know I'm on to
them (I often include details of their "business" they don't want
known).

And because I like to collect angry replies such as the ones I quoted
from.  They showed I was having an effect.  And they took TIME to
write.  If everyone who hates spam could cause each spammer to waste
just ONE SECOND, they'd have no time left for spamming.

Sadly, I didn't have enough of an effect.  You can't get rid of a roach
infestation no matter how much you stomp them, or how gory a mess you
make when you do.

> Also, complaining to spammers might result in them taking some
> malevolent retaliatory action against you, e.g. making false
> complaints to your ISP that you are spamming or harassing, hitting
> you with an overload of email,

Yes, they've done all of these things.  I'm still here.

> or simply adding your address to as many spamming lists as they can.

So?  They'd do that ANYWAY.

Some of them really do have remove lists.  If not for everyone who
asks to be removed, then at least for the real troublemakers like me.
The ones who get a spam at 2 am, and have their account nuked by 2:03.

> I would think that it would be much more effective to complain
> directly to the originating ISP, and possibly to the business being
> advertised (if it seems to be a legitimate business), and just
> ignore the spammer.

I CCd all of them.  Also the FTC, the appropriate state attorney
general, the appropriate Better Business Bureau, and (where
appropriate) also the FBI, SEC, FDA, IRS, etc.  Not that any
government agency has ever done anything, as far as I can tell.
Some of them asked me to fill out various forms.  Right.  I was
burning up pretty much *ALL* of my free time devoting less than
one minute to each spam.  To spend an hour on each one, I'd have
to hire a hundred full-time assistants.

> I would think that even sending a 'remove' request to the spammer
> would most of the time result in your address being added to a
> 'live' list, rather than being removed.

This is probably one of the main reasons I get so much spam.  I added
my name to several hundred remove lists, including over a dozen
"universal" or "global" remove lists.  Not because I thought they
would work, but to prove that spammers operate in bad faith.

I no longer feel I have to prove anything.

> It would seem to be well within the capability of the technology to
> put email traversing the internet backbones through a parser which
> could be programmed to check for messages sent to more than, say,
> 100 addresses, or messages of identical content from the same origin
> which number more than, say, 100 within a day.  Those messages would
> then be simply deleted from the data stream.

That would destroy all legitimate email lists, such as this one.
(Well, ok, we don't have a hundred subscribers.  But plenty of
lists do.)  Also, email simply doesn't work like that.  There's
no central hub.

Besides, spammers often vary their message slightly, if only by
sticking random letters and numbers on the end.  They do this to
evade such filtering.
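The minimal Python sketch below shows why a random tail defeats an exact-duplicate count, and what fuzzier "digital fingerprinting" has to do instead. The similarity measure here is word shingles compared by Jaccard similarity. It is a generic technique chosen for illustration, not a claim about what Brightmail or anyone else actually used.

```python
import hashlib


def exact_fingerprint(text: str) -> str:
    """Hash of the exact text: any change at all gives a different value."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 4) -> set[tuple[str, ...]]:
    """All runs of k consecutive words (the whole text if it is shorter)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def similarity(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


base = "Earn thousands working from home with our proven system call today for details"
a, b = base + " xq7f", base + " m2kz"                 # same spam, different random tails
print(exact_fingerprint(a) == exact_fingerprint(b))  # False
print(round(similarity(a, b), 2))                    # 0.83
```

A filter built on this would treat messages above some similarity threshold as copies of one another. The threshold is an arbitrary tuning choice, and tails longer than a word or two push the score down.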

> A more sophisticated protocol might keep running track of the
> cumulative number of 'To:' addresses within any messages sent by a
> given sender that day, and simply delete from the stream any further
> messages from that sender that day, once the cumulative 'To:'
> address count reaches some number, say 500.

It's hard to figure out who the real sender is.  Besides, there are
legitimate needs to send to more than 500 addresses.
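Both objections show up in a minimal sketch of the proposed per-sender counter. The class name is hypothetical, and the 500-recipient limit is the one suggested in the quoted message. The sender key is simply whatever address the message claims to come from.

```python
from collections import defaultdict


class DailyRecipientLimiter:
    """Hypothetical sketch of the proposed rule: once a sender's recipients
    for the day reach the limit, drop that sender's further messages."""

    def __init__(self, limit: int = 500):
        self.limit = limit
        # (claimed sender, day) -> recipients counted so far
        self.counts: defaultdict[tuple[str, str], int] = defaultdict(int)

    def allow(self, sender: str, day: str, n_recipients: int) -> bool:
        key = (sender, day)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += n_recipients
        return True


limiter = DailyRecipientLimiter(limit=500)
# A legitimate list sending ten 100-recipient messages is cut off after five.
print([limiter.allow("list@example.org", "2002-06-20", 100) for _ in range(10)])
# A spammer forging a new sender address each time is never counted twice.
print(all(limiter.allow(f"user{i}@example.com", "2002-06-20", 100) for i in range(10)))
```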

> Since there is a legitimate need for some businesses to send out
> newsletters to their customers, those businesses could be issued
> a permissive code which would be appended to the headers of their
> outgoing messages.

To be issued by whom?  What would keep spammers from getting such
codes?  How could an ordinary individual get one if he had a
legitimate need for one?

> One would think that the market would richly reward ISPs who are
> among the first to implement such a spam elimination system,

How?  Who would sign up on an ISP that treats its customers like
potential criminals?  Especially when doing so wouldn't put a dent
into spam unless all other ISPs did the same.

It would be like your trying to make a high crime neighborhood safer
by moving there and removing all weapons from your own house.

"Michael Walsh" <MJW at mail.press.jhu.edu> wrote:
> A list I'm on - small presses, but not skiffy - one of the folks
> brought up the offer he/she had received for 14,000,000 email
> addresses on a CD.  Did not have clue as to what sort of firestorm
> they would be hit with should they make use of the offer.

How could they not?  Did they think all those people had volunteered
to receive random junk from everyone?  Or did they think that even
people who didn't volunteer didn't mind being repeatedly spammed?

> Yes, there are *some* well-meaning - but without internet savvy -
> innocents out there.

I don't think it requires any knowledge of the Internet to know that
spam is universally unwelcome.

Ted White <tedwhite at compusnet.com> wrote:
> Some of the spam I get has, in its list of recipients, variations
> on my name and e-address.  Makes me wonder if addresses are being
> computer-generated, rather like its equivalent, telemarketer phone
> calls, uses computer-generated phone numbers.

Yes, at least on major ISPs.  Every possible combination of <initial>
<common last name> at aol.com and <common first name>.<common last
name> at aol.com gets spam.  AOL is not pleased by this.
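The combinatorics explain why this is cheap. With F common first names and L common last names, the two patterns give 26L + FL guesses per domain. Here is a minimal Python sketch. The name lists and the `example.com` domain are placeholders.

```python
import string
from itertools import product


def guessed_addresses(first_names, last_names, domain):
    """Yield <initial><last>@domain and <first>.<last>@domain guesses."""
    for initial, last in product(string.ascii_lowercase, last_names):
        yield f"{initial}{last}@{domain}"
    for first, last in product(first_names, last_names):
        yield f"{first}.{last}@{domain}"


guesses = list(guessed_addresses(["ted", "keith"], ["white", "lynch"], "example.com"))
print(len(guesses))  # 26*2 + 2*2 = 56
```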
--
Keith F. Lynch - kfl at keithlynch.net - http://keithlynch.net/
I always welcome replies to my e-mail, postings, and web pages, but
unsolicited bulk e-mail (spam) is not acceptable.  Please do not send me
HTML, "rich text," or attachments, as all such email is discarded unread.