Newsgroups: sprouts-theory
From: Dan Hoey <Hoey@AIC.NRL.Navy.Mil>
Date: Wed, 24 Dec 2008 15:00:38 -0500
Subject: Re: Sprouts notation

Yper Cube sent me a note requesting input on Sprouts notation suitable
for a position library as in GLOP.  With his permission, I'm
responding to sprouts-theory generally.

Yper Cube wrote:
> I liked the parenthesis idea. It doesn't work in positions arising from
> tori and proj. planes surfaces but maybe it can still be used sometimes.
> (Example: we can't use it in  .abcabc.
> but we can write   .abacbc.
> as  .(b)(b).   , can't we?

I think this is a good idea when possible.

> I would also prefer GLOP to use "{" to mark the start of a region and not
> only to mark its ending and "." to mark for boundary starting as well. Like,
> instead of:
> 0.0.AB.}AB.}]!
> use something like
> {.0.0.AB.}{.AB.}]!
> or maybe
> [.0.0.AB.}{.AB.}]

> In a recent sprouts theory post, I mentioned another possible
> technique to compact notation further.
> Use of * or ^ when we have multiple identical boundaries.
> Example. instead of    0.0.0.0.}], use:
> 0*4.}]
> (GLOP uses the above now)
> or
> [{(.0.)^4}]
> or
> [{(.0.)*4}]

I strongly disagree with this.  We should save our matching brackets
for things that need to be grouped recursively, whereas boundaries
within regions within lands are handled by hierarchy.  Boundaries
within a region should be _separated_ (as with "."), regions should be
separated (say, with "/"), and lands should be separated (say, with
"%").  The use of a trailing separator is superfluous and should be
avoided, so the above would be 0.0.AB/AB .  The "!" for
end-of-position should only be needed if there's some other thing
following the position.  A trailing separator should be permitted on
input, but ignored.
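
To make the separator hierarchy concrete, here is a minimal Python
sketch of how a reader might split a position into lands, regions and
boundaries.  The names parse_position and _split are made up for this
illustration and are not part of GLOP.  It assumes the "<>" repetition
has already been expanded, and it does not interpret a +n or -n suffix
on a region (that stays attached to the last boundary).

```python
def _split(text, sep):
    """Split on sep, ignoring one trailing separator if present."""
    if text.endswith(sep):
        text = text[:-1]
    return text.split(sep)


def parse_position(text):
    """Return a list of lands; each land is a list of regions;
    each region is a list of boundary strings."""
    # "!" only marks end-of-position, so drop it if present.
    if text.endswith("!"):
        text = text[:-1]
    lands = []
    for land in _split(text, "%"):
        regions = [_split(region, ".") for region in _split(land, "/")]
        lands.append(regions)
    return lands


print(parse_position("0.0.AB/AB"))
print(parse_position("0.0.AB./AB./%!"))  # trailing separators tolerated
```

Both calls yield the same nested list, which is the point of
permitting trailing separators on input while never writing them.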

For nonplanar regions, I suggest a region should be suffixed with +n
for T^n or -n for P^n.  Since the region will always be followed by
"/", "%", "!", or end-of-string, there is no restriction on multiple
digits for n.
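
As a rough sketch of how the suffix could be read, the following
Python peels a trailing +n or -n off a region string.  The function
name split_genus and the "T"/"P" tags in the return value are
assumptions made for this example only.

```python
import re

# A sign followed by one or more digits at the very end of the region.
_SUFFIX = re.compile(r"([+-])(\d+)$")


def split_genus(region):
    """Return (region_without_suffix, surface_tag, n).

    surface_tag is "T" for +n, "P" for -n, and None (with n == 0)
    when the region carries no suffix, i.e. is planar.
    """
    m = _SUFFIX.search(region)
    if m is None:
        return region, None, 0
    surface = "T" if m.group(1) == "+" else "P"
    return region[:m.start()], surface, int(m.group(2))


print(split_genus("AB.CD+2"))
print(split_genus("AB.AB-12"))
print(split_genus("0.0.AB"))
```

Because "+" and "-" occur nowhere else inside a region, anchoring the
match at the end of the string is enough, and n may have any number of
digits.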

> Using parenthesis, we can extend this to include more complex
> objects (than boundaries)
> Like, instead of    0.0.0.0.2.2.2.2.2.AB.CD.EF.}AB.}CD.}EF.}  ,use:
> {(.0.)*4(.2.)*5(.AB.{.AB.})*3}

This can be done in several ways without the leading or trailing
separators.  Perhaps the simplest is to always use a kind of brackets
(say "<>"), so that <0.>4 means 0.0.0.0
                 <1.2A/>2 means 1.2A/1.2A
                   <12>32 means 1212122
             <(>3<(<)>2>3 means (((())())()) .
The <...> is always followed by a single digit specifying the repeat.
A position canonicalizer may not want to use all of these constructs,
but they should be recognized on input.
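
Here is one way the textual expansion might be written, as a small
recursive Python sketch.  The name expand_repeats is hypothetical.  It
assumes exactly one decimal digit follows each ">" and that "<" and
">" are used for nothing else.

```python
def _expand(text, pos, nested):
    """Expand from pos; return (expanded_text, next_pos)."""
    parts = []
    while pos < len(text):
        ch = text[pos]
        if ch == "<":
            inner, pos = _expand(text, pos + 1, True)
            # pos now points at the repeat digit after the ">".
            if pos >= len(text) or text[pos] not in "0123456789":
                raise ValueError("'>' must be followed by a digit")
            parts.append(inner * int(text[pos]))
            pos += 1
        elif ch == ">":
            if not nested:
                raise ValueError("unmatched '>'")
            return "".join(parts), pos + 1
        else:
            parts.append(ch)
            pos += 1
    if nested:
        raise ValueError("unmatched '<'")
    return "".join(parts), pos


def expand_repeats(text):
    """Textually expand every <...>d group, innermost first."""
    return _expand(text, 0, False)[0]


print(expand_repeats("<12>32"))        # 1212122
print(expand_repeats("<(>3<(<)>2>3"))  # (((())())())
print(expand_repeats("<0.>4"))         # 0.0.0.0.
```

Note that a purely textual expansion of <0.>4 leaves a trailing ".",
which is harmless given that a trailing separator is ignored on input.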

Your last example in which the parentheses imply a change of variables
is somewhat problematic.  I think it should be done with a different
approach, which I call the "partial position" representation.  I've
talked about partial positions before--I think they will eventually
appear in the database by themselves, along with simplifications such
as the ^x:x.0 = ^x:x2 that has been observed before.

A partial position is a land that has some unmatched pivots called
"parameters".  The parameters are listed in the front of the partial
position in a sort of lambda notation, like ^xyz:1xy2z.2 .  This
refers to a part of a position whose pivots are to be matched up with
another part of the position in a way to be determined.  Copies of the
partial position may be matched up in several different places.

When there is only one parameter, the presence of that parameter
outside the partial position denotes a pivot attached to a copy of the
partial position.  So ^x:xA/A1%xx.x means AB.C/AD/D1/BE/E1/CF/F1 .
Partial positions can instantiate other partial positions:
^x:x((1))%^z:x.xz%zzx means
ABC/D.EA/D((1))/E((1))/F.GB/F((1))/G((1))/C((1)) .
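
The single-parameter case can be sketched in a few lines of Python.
This is only an illustration under strong assumptions: one parameter,
one partial position, no nested instantiation, no letters in the host
land other than the parameter, and at most 26 letters in the result.
The name instantiate and the order in which fresh letters are handed
out are choices made for this example.

```python
from string import ascii_uppercase


def instantiate(param, body, host):
    """Expand ^param:body%host into an ordinary land."""
    fresh = iter(ascii_uppercase)

    # Each occurrence of the parameter in the host becomes a new pivot.
    pivots = []
    host_out = []
    for ch in host:
        if ch == param:
            ch = next(fresh)
            pivots.append(ch)
        host_out.append(ch)
    regions = ["".join(host_out)]

    # Attach one renamed copy of the partial position to each pivot.
    for pivot in pivots:
        rename = {param: pivot}
        copy = []
        for ch in body:
            if ch.isalpha():
                if ch not in rename:
                    rename[ch] = next(fresh)  # letter private to this copy
                ch = rename[ch]
            copy.append(ch)
        regions.append("".join(copy))
    return "/".join(regions)


print(instantiate("x", "xA/A1", "xx.x"))  # AB.C/AD/D1/BE/E1/CF/F1
```

Each copy gets its own renaming table, so the letter A inside the
partial position becomes D, E and F in the three copies.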

With a multiple-parameter partial position instantiated more than
once, we must specify which parameters go with which instantiation.
In the simplest case, on a plane where the matching pivots appear in
the same region (and therefore on a single boundary), the parameters
parenthesize, so we can specify parameters as [x...y...z] without
ambiguity.  For instance, ^xyz:1xy2z.2%xy[z1[xyz]yx]z would mean
ABF1GHJEDC/1AB2C.2/1DE2F.2/1GH2J.2 .  In more complicated situations,
we can specify multiple parameter sets for a partial position as in
^xyz^tuv:x(y(z))%xtuyvz for ADEBFC/A(B(C))/D(E(F)) .  The primary
parameter set can still be parenthesized if necessary, but the others
must be unambiguously matched with each other.

Finally, I would like a construct for specifying the join of two
partial positions.  I propose using "=xyz=tuv" , where the first "="
replaces the preceding "%" character.  So ^xyzw:xyzw1=xyzw=xywz refers
to the (nonplanar) position ABCD1/ABDC1 .

On alphabets: GLOP introduced the idea of lower-case characters for
pier spots, which can appear on multiple boundaries within a position.
Upper-case characters were reserved for pivots.  The problem is that
we sometimes run out of one kind or another.  With the addition of
parameter letters, the problem is worse, so I propose a more general
approach.  For readability, a program is encouraged to use ABC...  for
pivots, abc...  for labeled pier spots, and xyz... for parameters so
far as possible. The rules for recognizing which is which should be
loosened so that any letter can be used for any of these purposes.

A letter that appears in a lambda expression "^xyz:" is always a
parameter, and those are the only parameters.  For multiple parameters
in brackets, each parameter must appear just once except within inner
brackets.  Other multiple parameter sets appearing on a boundary are
assumed to match each other, so ^xy%xy.xy means AB.CD/AB/CD; we need
^xy^zw:xy%xz.yw to refer to AC.BD/AB/CD .  No other unbracketed
parameters of the set may appear on that boundary.  Other sets of
unbracketed parameters that appear within a region are assumed to
match each other, so ^xy%Axy/A1xy means ABC/A1CD/BC/CD ; again we
can't have other stray members of the set in the region.  If none of
these is possible, each set of parameters can appear only once in a
land: ^xy^zw^tu:xz.yt/uz for AD.BE/FC/AB/CD/EF .

Excluding the parameters, a letter that appears twice on a boundary is
a pier spot, and may appear only twice on that boundary.  Excluding
these, a letter that appears on different boundaries in the same
region may appear only twice in the region, and the two letters match
each other.  (This can only happen on nonplanar boards).  After all of
these are taken into account, remaining letters must appear exactly
twice per land, and match each other.
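
The three rules in the preceding paragraph could be applied in order
roughly as follows.  This Python sketch is an interpretation, not a
specification: the name classify_letters and the tag "region-pair"
are invented here, &word; letters are not handled, and it does not
catch every misuse (for instance a letter used as a pier spot on one
boundary and as something else on another).

```python
from collections import Counter


def classify_letters(land, params=""):
    """Map each non-parameter letter of a land to its role."""

    def letters(text):
        return [ch for ch in text if ch.isalpha() and ch not in params]

    kinds = {}
    regions = land.split("/")

    # Rule 1: twice on one boundary means a pier spot.
    for region in regions:
        for boundary in region.split("."):
            for letter, n in Counter(letters(boundary)).items():
                if n > 2:
                    raise ValueError(letter + ": over twice on a boundary")
                if n == 2:
                    kinds[letter] = "pier"

    # Rule 2: otherwise, twice within one region (on different
    # boundaries) means the two occurrences match inside the region.
    for region in regions:
        rest = [ch for ch in letters(region) if kinds.get(ch) != "pier"]
        for letter, n in Counter(rest).items():
            if n > 2:
                raise ValueError(letter + ": over twice in a region")
            if n == 2:
                kinds[letter] = "region-pair"

    # Rule 3: everything left must occur exactly twice in the land.
    rest = [ch for ch in letters(land) if ch not in kinds]
    for letter, n in Counter(rest).items():
        if n != 2:
            raise ValueError(letter + ": must appear exactly twice")
        kinds[letter] = "pivot"
    return kinds


print(classify_letters("aBa.C/BC"))
print(classify_letters("AB.AB-1"))
```

The first call reports a as a pier spot and B, C as pivots; the second
reports A and B as matched within a single nonplanar region.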

Just in case we end up with a need for more than 52 letters, I believe
we should allow a sequence like &word; to be used for a letter, where
"word" is composed entirely of letters.
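
A reader can treat &word; as a single symbol with a small tokenizer.
The following Python sketch shows one possible approach; the name
split_letters is hypothetical, and it assumes "&" and ";" have no
other use inside a boundary.

```python
import re

# Either an extended letter &word; or any single ordinary character.
_SYMBOL = re.compile(r"&[A-Za-z]+;|[^&;]")


def split_letters(boundary):
    """Split a boundary string into symbols, keeping &word; whole."""
    tokens = _SYMBOL.findall(boundary)
    # findall silently skips text it cannot match, so check coverage.
    if "".join(tokens) != boundary:
        raise ValueError("malformed &word; sequence")
    return tokens


print(split_letters("A&alpha;1&beta;A"))
```

After this step the rest of the reader can compare symbols for
equality without caring whether a letter is ordinary or extended.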

So the grammar we have is

      <position> := <gamelabel> <spec> | <spec> | <position> "!"
     <gamelabel> := <number> ( "+" | "-" ) ":"
          <spec> := <regular>
                    | <partial> "=" <letters> "=" <letters>
                    | <partial> "%" <spec>
       <partial> := "^" <letters> ":" <land>
                    | "^" <letters> ":" <partial>
       <regular> := <land>
                    | <land> "%" <regular>
                    | <regular> "%"
          <land> := <toporegion>
                    | <toporegion> "/" <land>
                    | <land> "/"
    <toporegion> := <region>
                    | <region> ( "+" | "-" ) <number>
        <region> := <boundary>
                    | <boundary> "." <region>
                    | <region> "."
      <boundary> := <subboundary>
                    | <subboundary> <boundary>
   <subboundary> := <site>
                    | "(" <boundary> ")"
                    | "[" <letter> <boundary> <letter> "]"
          <site> := <letter> | 1 | 2 | "&" <letters> ";"
       <letters> := <letter> | <letter> <letters>

This grammar accidentally prohibits extended letters as parameters,
but maybe that's a good thing.  There are other requirements such as
that parameter lists can't repeat a letter, but that is for semantic
analysis.
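
To show that the <boundary> part of the grammar is easy to recognize,
here is a small recursive-descent sketch in Python.  It follows the
productions exactly as written above, so a site is only a letter, 1,
2, or &word;.  The names parse_boundary and is_boundary are made up
for this example, and the bracket rule is checked after the fact
rather than by backtracking.

```python
import re

_SITE = re.compile(r"[A-Za-z12]|&[A-Za-z]+;")


def _is_letter(item):
    return isinstance(item, str) and len(item) == 1 and item.isalpha()


def parse_boundary(text, pos=0):
    """Parse one <boundary> at pos; return (items, next_pos)."""
    items = []
    while pos < len(text):
        ch = text[pos]
        if ch in "([":
            close = ")" if ch == "(" else "]"
            inner, pos = parse_boundary(text, pos + 1)
            if text[pos:pos + 1] != close:
                raise ValueError("missing " + close)
            # "[" <letter> <boundary> <letter> "]": at least three
            # items, the first and last being plain letters.
            if ch == "[" and not (len(inner) >= 3
                                  and _is_letter(inner[0])
                                  and _is_letter(inner[-1])):
                raise ValueError("bad bracketed parameter list")
            items.append((ch, inner))
            pos += 1
        else:
            m = _SITE.match(text, pos)
            if m is None:
                break  # leave ")", "]", "." etc. for the caller
            items.append(m.group())
            pos = m.end()
    if not items:
        raise ValueError("empty boundary")
    return items, pos


def is_boundary(text):
    """True if the whole string is a single <boundary>."""
    try:
        _, end = parse_boundary(text)
    except ValueError:
        return False
    return end == len(text)


print(is_boundary("xy[z1[xyz]yx]z"))  # True
print(is_boundary("[xy]"))            # False
```

The same pattern extends upward to <region>, <land> and so on, since
each level is just a separated list of the level below.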

The gamelabel is in case someone needs to mark the position as normal
or misere or specify an initial number of spots.  I suppose I should
allow the number or sign to be omitted.

The use of "<>" for repetition is a metalanguage used for abbreviating
this language.  It is expanded textually first before applying this
grammar.

We still have a set of brackets "{}" that aren't used, but I find they
tend to look too much like parentheses to be very useful.  Other
symbols such as @~#$*_\,;?'"` may turn out to have some use as well.

Finally, I should note that GLOP databases need to have a version
number at the top so that older versions can be recognized and
converted to newer ones.

Dan