A bestiary of cores

The Hoon core is just not a gadget you're familiar with from your Earth programming career. A core is not a lambda, a closure, a library, an object, etc. Or perhaps we should say: when we translate any of these into Hoon, we get a core. So we cannot translate "core" into any one of them.

But clearly in practice there are lots of different kinds of practical cores. For example, we have already met Hoon's equivalent of the lambda, the gate. Every gate is a core, but not every core is a gate. And the gate is by no means the only subspecies of core. So describing cores becomes a big part of the Hoonist's alien vocabulary.

Let's review briefly what we know about cores. A core is a noun of the form

[battery payload]

where the battery is a tree of formulas, each invoked with the entire core as the subject, and the payload is any noun.
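As a rough mental model (a Python sketch, not Hoon - every name here is illustrative), a core can be pictured as a pair whose battery is a collection of arms, each of which takes the entire core as its argument:

```python
# Toy model of a core: a pair (battery, payload). The battery is a
# mapping from arm names to formulas, and each formula is evaluated
# with the WHOLE core as its subject. Illustrative names only.

def make_core(battery, payload):
    return (battery, payload)

def pull(core, name):
    """Evaluate the arm `name` against the entire core."""
    battery, payload = core
    return battery[name](core)

# A core with one arm that reads data out of its own payload:
counter = make_core({'next': lambda core: core[1] + 1}, 41)
print(pull(counter, 'next'))  # -> 42
```

Because an arm receives the whole core, it can see both the payload and its sibling arms - which is what separates a core from a mere record of functions.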

There are three main ways we can talk about a core: its variance model (%gold, %iron, %zinc or %lead; %gold by default); the inference approach of each of its feet (%ash or %elm, informally dry or wet; dry, ie %ash, by default); and its payload pattern, which makes it a gate, reef, book, trap or tray. There are also several ways of using cores: you can pull, kick, slam or slug them.

For an explanation of variance and inference, see chapter 10. Here, let's go over the payload patterns. Unlike most concepts in Hoon, these are strictly informal conventions. A core is a core - there is no code in hoon.hoon that deals with a gate per se. Nonetheless, these definitions are precise and unambiguous.

Book

A core whose payload is either another core, or a constant (%cube), is a book.

Reef

A book whose payload is either another reef, or a cube, is a reef. Effectively, a reef is an entirely constant core.

Trap

Any core with one arm named %$ is a trap. "It's a trap!"

Gate

A gate is a trap whose payload has the form [sample context], sample being dynamic data and context being anything.

Tray

A tray is any core whose payload has the form [sample context].

Categories

Every reef is a book. Every gate is a trap. Every gate is a tray.

Using cores

As we already know, a name or axis (which of course applies to all nouns, not just cores) is a limb. A list of limbs is a wing.

To access a wing or limb in a core is to pull it. When we write a.b.x, we pull the wing a.b (ie, a within b) from x. If the result is defined as a subtree of x, the wing is a leg. If it is computed from x by a formula, it's an arm.

To pull %$ - the empty name - on a core is to kick it. To replace the sample with a in gate g then kick the gate, is to slam g with a.

(The archaic usage of "call" for slam is discouraged, but common. Please at least do not use "call" to mean pull.)

Slamming is a special case of slug, which pulls any limb on a tray after replacing the sample. To replace the sample with a on tray t, then pull m, is to slug t to m with a.

Thus when we write

(~(get by foo) bar)

++by produces a tray. We slug by to get with foo, producing a gate that searches the map foo, and slam this gate with bar. This is the most normal possible thing in the universe, though it must seem strange to you now.
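In the same spirit as before, slug and slam can be sketched in Python, modeling a tray as a pair whose payload is (sample, context). This is a sketch of the vocabulary, not Urbit's actual machinery; all names are invented:

```python
# Toy sketch (not real Hoon): a core is (battery, payload), a tray's
# payload is (sample, context), and arms take the whole core.

def slug(tray, name, sample):
    """Replace the tray's sample with `sample`, then pull `name`."""
    battery, (_, context) = tray
    return battery[name]((battery, (sample, context)))

def slam(gate, sample):
    """Slamming is slugging the empty name, here written '$'."""
    return slug(gate, '$', sample)

# A gate whose single arm '$' doubles its sample:
double = ({'$': lambda core: core[1][0] * 2}, (0, None))
print(slam(double, 21))  # -> 42
```

In these terms, (~(get by foo) bar) is two steps: a slug that yields a gate, then a slam of that gate.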

On jet propulsion

Consider the problem of optimizing a Nock interpreter. Let's assume that this process must not and will not contaminate the pristine semantics of Nock itself.

But by definition, the interpreter does not need to be optimized to compute all possible functions as fast as possible. It only needs to be optimized for the functions it computes.

Moreover, any given function (to use the word in its abstract mathematical sense), can be computed by an indefinite, and generally infinite, number of Nock formulas. Again, we do not need to optimize all possible formulas - just the ones our interpreter actually executes.

A crude approach to this problem would be to compute the hash of every formula we interpret, and hardcode these hashes into the interpreter. For formulas both commonly executed and known to match some known algorithm, a manual implementation of that algorithm would be run.

Not quite what we do, but it's close enough. Nock's conventional jet model is more or less this hash-matching with the addition of a reasonably convenient and effective mechanism for matching formulas and their manual implementations ("jets").
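The crude hash-matching scheme can be sketched as a toy in Python. Here `fingerprint`, `jets`, and `compute` are invented names, and hashing Python bytecode merely stands in for hashing a Nock formula:

```python
# Toy jet dispatch: hash the "formula"; if a trusted hand-written
# implementation is registered under that hash, run it instead.
import hashlib

def fingerprint(formula):
    # A real jet system would hash a noun; Python bytecode stands in.
    return hashlib.sha256(formula.__code__.co_code).hexdigest()

def slow_decrement(n):
    # O(n) decrement, like the classic Nock example: count up from 0.
    i = 0
    while i + 1 != n:
        i += 1
    return i

def interpret(subject, formula):
    # Stand-in for the pure, unoptimized interpreter.
    return formula(subject)

jets = {fingerprint(slow_decrement): lambda n: n - 1}

def compute(subject, formula):
    # Run the jet if this formula's hash is registered, else interpret.
    jet = jets.get(fingerprint(formula))
    return jet(subject) if jet is not None else interpret(subject, formula)

print(compute(42, slow_decrement))  # -> 41, via the fast jet
```

The jet and the slow formula must agree exactly; the next paragraphs explain why matching the formula alone is not actually good enough.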

In general it is not correct to match the formula when detecting jet invocations. Rather, we need to match the entire core and the arm being pulled. In other words, when instrumenting a Nock interpreter, the correct way to match jets is to sit on top of Nock 9 and match both subject and arm.

When matching the subject to see if we can use a jet, we need a jet description that (a) expects the subject to be a core, (b) uses a hash to make sure it has the battery, and (c) imposes some other complex constraint on the payload of the core.

(Who writes this jet description? Perhaps the programmer who wrote the core. But this requires the programmer to either be the developer who runs the core, or to have some parallel channel for delivering her optimizations to the consumer.)

In any case, our task is matching the core to the jet. We divide this into two parts: one, matching the battery (and arm, since different arms in a core may or may not be jet-propelled); two, checking the payload constraint.

It is not practical to compute a hash, especially a secure hash, of every formula we reduce in Nock - or even of every battery. We also do not want to make this match heuristically. Finally, we do not want to give the developer - who may well be creating the battery and the jet at the same time - the job of computing hashes by hand.

Battery matching is still a difficult and potentially ugly problem. Worst of all, it seems potentially slow. Some things that save us:

One, at least when not co-developing core and jet, it is not common for new batteries to be created. Therefore, caching is very worthwhile and pointer equality is often useful.

Two, we do not have room for a full secure hash on every noun, and we do not want to distinguish batteries from ordinary nouns when it comes to storage. However, every indirect noun does have a 31-bit slot for a conventional FNV hash, ++mug. This is computed lazily, relying on the fact that 0 is not in its range. (The mug slot is very useful for jet dispatch - but not used only for jet dispatch.)
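A minimal sketch of such a lazily cached, never-zero 31-bit hash slot, in the spirit of ++mug (the constants below are the standard 32-bit FNV-1 parameters; the real ++mug algorithm differs in detail):

```python
# Sketch of a lazily computed, nonzero 31-bit FNV-style hash slot.
# Constants are standard 32-bit FNV-1; details are illustrative.

def fnv(data, hash_=2166136261):
    for byte in data:
        hash_ = ((hash_ ^ byte) * 16777619) & 0xffffffff
    return hash_

class Noun:
    def __init__(self, value):
        self.value = value
        self._mug = 0           # 0 means "not yet computed"

    def mug(self):
        if self._mug == 0:      # lazy: computed at most once
            h = fnv(repr(self.value).encode()) & 0x7fffffff
            self._mug = h or 1  # 0 is reserved, so never store it
        return self._mug
```

Because 0 means "absent", no extra flag bit is needed, and nouns that are never hashed never pay for hashing.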

Broadly speaking, while ++mug is by no means infallible, it reduces the associative process enough for linear search to be perfectly practical.

We then need to check the payload. Broadly speaking, the payload consists of some mosaic of code, static data, and dynamic data.

Consider a giant reef, like the one hoon.hoon constructs. A reef is a stack of books, in which each book is the payload of the one above it. To make sure our jet matches an arm in any of these books, we need to check every battery in the stack.

The trick to checking a payload is to simplify the payload into a single parent core and a surrounding bundle of data. To recognize its core, it is sufficient to (a) recognize its parent - matching, once again, by the battery - and (b) validate all other data in the core.

The most common structure of this form is, of course, the tray [battery [sample context]]. (Again, iff battery is a single formula which implements an arm %$, our tray is a gate.) Trays and other core conventions are simply design patterns, which may be violated with complete impunity. And there of course may be multiple cores within the payload. One must be selected and treated as the parent; the rest are static data.

As for the sample and/or other dynamic data, it may or may not prove useful and/or efficient for a jet dispatch mechanism to enforce basic structure constraints on it. Remember that, at the Nock level, we have no guarantee that this core was constructed in accordance with Hoon type rules.

However, we can ask the programmer to at least do us the favor of declaring logical cores. Every time we construct a core - which, for a core defined by an arm in a book, ie, a typical library function, is every time we use it - we identify, using the three legs of the %sgcn hint, three aspects of this core:

(p) the axis to its parent core
(q) the logical version of the algorithm implemented
(r) a list of bound arms, as [name axis] pairs

Thus there are two sources of information for a match: the Hoon programmer, who specifies the logical name of her core; and the jet programmer, whose declaration matches that logical name, verifies the match by computing an actual hash, and also describes any other constraints on the core her jet requires for it to accurately simulate Nock.

In addition, it is very advisable for a jet framework to include tools that let the jet programmer bail out trivially with a refusal to handle any given case of the function, returning control to the pure Nock interpreter.

Another advantage of this model is that an interpreter which matches a core against jets it doesn't have still has a declaration, from the programmer's hand, of the set of inner loops that need to be optimized. This completely eliminates any excuse for a complex tracing JIT. We know exactly where to JIT: wherever we don't find a jet.

Gonads

It's common in functional programming to create pipelines of functions which accept and produce the same type. If we have two, three, or 47 gates which both accept and produce an atom, it's easy to string them together into one gate with two, three, or 47 pipeline steps.

Suppose we think of one of these homogeneous parts as a block:

   ----
  |    |
->| F  |->
  |    |
   ----

one little function

It's clear that these blocks fit neatly together into a pipeline:

   ----  ----  ----  ----
  |    ||    ||    ||    |
->| F  || G  || H  || I  |->
  |    ||    ||    ||    |
   ----  ----  ----  ----

one big function

But suppose we have a function whose product isn't the same type as its sample?

   ----
  |    \
->| F   )->
  |    /
   ----

one funky function

It would seem difficult to use this as a building block. But in fact, all we need is a little piece of glue that accepts the product type of F, and produces its sample type:

  ----
 \    |
->) U |->
 /    |
  ----

one gluey function

With this, we can construct pipelines using U as the "mortar":

   ----  ----  ----  ----  ----  ----  ---- 
  |    \\    ||    \\    ||    \\    ||    \
->| F   )) U || G   )) U || H   )) U || I   )->
  |    //    ||    //    ||    //    ||    /
   ----  ----  ----  ----  ----  ----  ---- 

one big funky, gluey function

Note that just as FGHI in the simple model has the same interface as F, G, H or I, FUGUHUI in the complex model has the same interface as F, G, H, or I.

Let's say the sample of F, the straight line, is X, and the product of F, the bump, is Y. So FUGUHUI turns an X into a Y, just the way F does.

We can make this system even more powerful if, instead of using the simple tool U, a gate whose sample is Y and whose product is X, we use a more complex tool V. V is a gate whose sample is a cell [m n], where m is a Y and n is another gate matching F. V's product remains a Y.

Here's what this looks like in ASCII art:

   ----  ---- 
  |    \\    \
->| F   )m V  )->
  |    //    /
   ----  ---- 
          |n
         ----  ----  
        |    \\    \
      ->| G   )m V  )->
        |    //    /
         ----  ---- 
                |n
               ----  ----  
              |    \\    \
            ->| H   )m V  )->
              |    //    /
               ----  ---- 
                      |n
                     ----  ----  
                    |    \\    \
                  ->| I   )m V  )->
                    |    //    /
                     ----  ---- 

one big funky, gluey, complicated function

What's the advantage? A V-function is more powerful than a U-function, because the U-function has no choice but to turn its X into a Y and use n, the rest of the pipeline, on it. The V function can do anything with its input - it could even ignore the rest of the pipeline and just produce the m it got in.
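The difference between U-glue and V-glue can be sketched in Python. The names F, G, U, V follow the diagrams; using None as the failure value is an assumption of this sketch, not part of the model above:

```python
# U-glue vs V-glue, sketched. X and Y are stand-in types (here, ints
# and ints-or-None). U can only convert and continue; V also receives
# the rest of the pipeline, so it may short-circuit it.

def u(y):              # Y -> X: naive glue, has no choice but to go on
    return y

def v(m, n):           # m is a Y; n is the rest of the pipeline
    if m is None:      # e.g. a failure: ignore the rest entirely
        return None
    return n(m)

def f(x): return x + 1          # X -> Y
def g(x): return x * 2          # X -> Y

def pipeline(x):
    # F, then hand the rest of the pipeline (here just G) to V:
    return v(f(x), g)

print(pipeline(3))  # -> 8
```

With U the whole chain always runs; with V, a single early failure can skip everything downstream.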

In Hoon, we call one of these funky, gluey, complicated pipelines a gonad. The gonad rune is ;~, [%smsg p q]. A gonad has one go, the V part - p in [%smsg p q] - and one or more nads, the F, G, H or I parts (q).

There's one other complicated trick we play to make our gonads work better.

If you look at the expansion, you'll see that before we pass n to V, the go, we reset its sample to the sample of the gate that %smsg produces. Since neither the go nor n would otherwise see this value, it allows for more hackery.

For instance, gonads are often used for parsing. A common go for parsing is ++pose, whose nads are parsing rules tried in order - the first that succeeds wins. Because of this trick, when the first nad returns a parsing failure, the second nad is sent to the go pre-configured with the original problem.

The go doesn't have to do a proper slam on n, replacing its sample. It can just use n as a trap, leaving the sample default - in the case of ++pose, the original parsing problem. Which is exactly what we want.
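A rough analogue of this ++pose behavior, sketched in Python with rules that return None on failure (illustrative only - real Hoon parsing rules carry position and continuation data):

```python
# Sketch of ++pose-style alternation: each rule takes an input and
# returns a result, or None for failure; pose tries rules in order,
# always re-presenting the ORIGINAL input to the next rule.

def pose(*rules):
    def combined(inp):
        for rule in rules:
            out = rule(inp)
            if out is not None:
                return out
        return None
    return combined

digit = lambda s: s[0] if s and s[0].isdigit() else None
alpha = lambda s: s[0] if s and s[0].isalpha() else None

parse = pose(digit, alpha)
print(parse("7x"))  # -> '7'
print(parse("ab"))  # -> 'a'
```

The key property mirrored here is that each fallback rule sees the original problem, not the leftovers of the failed attempt.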

On debugging your urbits

Okay so you tried to do something and it didn't work.

What happened? Hopefully, nothing bad. But coding is a vale of tears. Here are some bad things that can happen to you in Hoon.

Parse error

You could be 'avin a parse error. The Hoon parser reports the location - line and column - of errors quite accurately.

Let's generate an erroneous file. Open up try/bin/foo.hoon, an empty file, and put a # at line 8, column 4. Then type

~waclux-tomwyc/try=> :foo
! /~waclux-tomwyc/try/~2014.2.9..06.58.40..e352/bin/foo/hoon
! {8 4}
! 'syntax-error'
! exit

That means you got a syntax error at line 8, column 4, when trying to load /~waclux-tomwyc/try=/bin/foo/hoon. (You may also see traces for /~waclux-tomwyc/arvo=/batz/hoon, which simply means Arvo was compiled with debugging.)

This is always exactly right, for certain values of "right." As always, you have to understand what Hoon is thinking.

The position of the error is the position at which the parser stopped being able to parse - it has no way to interpret the byte at this line and column. Often this is actually where you screwed up. It is at least no later than you screwed up.

But one of the prices we pay for a tall form without terminators is that the parser relies on knowing that after a :+, for instance, it expects to parse exactly three twigs. If you accidentally leave one of these three out, the parser at :+ will steal whatever twig should come next.

Generally at some point the parser will collide with reality and reality will win. At this position, you'll see a syntax error. But bad luck can move this point some distance from the mistake.

This is never a serious problem once you get used to Hoon, but it can be a pain in the butt while you're learning. To attack any mysterious syntax error (in any language, not just Hoon), work by binary search reduction - pull out giant chunks of code to see if the error is there, or elsewhere.

Type failure

Okay, you parsed. But you're still failing. Plenty of fail.

By far the most common semantic failure from the Hoon compiler is the cryptic type-fail. What failed exactly? We have a lot more information, but we'd rather not give it to you up front.

The second most common error is missing limb. Here we actually tell you the name of the limb. But look, if you write b.a, you'd better have a b in your a. You probably have the wrong idea of what type an a is. So it comes back to type again.

Essentially all Hoon compiler errors are in a sense type failure. Generally, the compiler wants to shield you from the direct details, because most ways of presenting the direct details automatically would, at least in certain cases, just belch all over the screen.

Instead, the most important information about any Hoon error is the location stack. Where did this happen? The general process of troubleshooting disregards the error details completely, and starts from the place your program failed.

Unlike the parser location, the type location trace is always correct. But you still have to know how to interpret it. And first, you need to know how to generate it.

Generate your trace

Every twig within a !: generates trace information, both at compile time and runtime. Normally you will see !: at the start of a file, and when writing simple programs it is a good habit just to put it in at the start.

Why wouldn't you always use !:? Because runtime tracing isn't free, and especially because it kills tail-call optimization. Also, because sometimes code just works. For all these reasons, there is no !: on hoon.hoon or zuse.hoon.

To experiment with tracing, copy try/bin/goodbye.hoon into try/bin/error.hoon:

  + /~waclux-tomwyc/try/4/bin/error/hoon
  ~waclux-tomwyc/try=> :error "world"
  "hasta la vista, world."

The simplest kind of error is on the command line itself:

  ~waclux-tomwyc/try=> :error
  ! type-fail
  ! exit

But this only happens if the file actually compiles. Let's make it not compile, by changing it from

|=  *
|=  [planet=tape ~]
^-  bowl
:_  ~  :_  ~
[%$ !>("hasta la vista, {planet}.")]

to

|=  *
|=  [planet=tape ~]
=+  foo=(add 2)
^-  bowl
:_  ~  :_  ~
[%$ !>("hasta la vista, {planet}.")]

We've introduced a subtle semantic error into our Folger's coffee. How might the discerning compiler respond?

~waclux-tomwyc/try=> :error "world"
! type-fail
! exit

It's mischievously identical to the argument error. But if we add !: to the start of the file, we get:

!:
|=  *
|=  [planet=tape ~]
=+  foo=(add 2)
^-  bowl
:_  ~  :_  ~
[%$ !>("hasta la vista, {planet}.")]

and running it:

! /~waclux-tomwyc/try/~2014.2.9..22.36.30..2052/bin/error:<[2 1].[7 37]>
! /~waclux-tomwyc/try/~2014.2.9..22.36.30..2052/bin/error:<[3 1].[7 37]>
! /~waclux-tomwyc/try/~2014.2.9..22.36.30..2052/bin/error:<[4 1].[7 37]>
! /~waclux-tomwyc/try/~2014.2.9..22.36.30..2052/bin/error:<[5 1].[7 37]>
! /~waclux-tomwyc/try/~2014.2.9..22.36.30..2052/bin/error:<[5 5].[5 16]>
! /~waclux-tomwyc/try/~2014.2.9..22.36.30..2052/bin/error:<[5 9].[5 16]>
! type-fail
! exit

We see that the problem is on line 5 between columns 9 and 16. Ie, (add 2). Well, duh.

But suppose we can't see what's wrong with (add 2). Then, it's time for us to print some types, which we do with ~!:

!:
|=  *
|=  [planet=tape ~]
^-  bowl
~!  2
~!  +<:add
=+  foo=(add 2)
:_  ~  :_  ~
[%$ !>("hasta la vista, {planet}.")]

We see:

: /~waclux-tomwyc/try/10/bin/error/hoon
~waclux-tomwyc/try=> :error "world"
! /~waclux-tomwyc/try/~2014.2.9..22.41.38..5b4d/bin/error:<[2 1].[9 37]>
! /~waclux-tomwyc/try/~2014.2.9..22.41.38..5b4d/bin/error:<[3 1].[9 37]>
! /~waclux-tomwyc/try/~2014.2.9..22.41.38..5b4d/bin/error:<[4 1].[9 37]>
! /~waclux-tomwyc/try/~2014.2.9..22.41.38..5b4d/bin/error:<[5 1].[9 37]>
! @ud
! /~waclux-tomwyc/try/~2014.2.9..22.41.38..5b4d/bin/error:<[6 1].[9 37]>
! [a=@ b=@]
! /~waclux-tomwyc/try/~2014.2.9..22.41.38..5b4d/bin/error:<[7 1].[9 37]>
! /~waclux-tomwyc/try/~2014.2.9..22.41.38..5b4d/bin/error:<[7 5].[7 16]>
! /~waclux-tomwyc/try/~2014.2.9..22.41.38..5b4d/bin/error:<[7 9].[7 16]>
! type-fail
! exit

In other words: our sample, 2, is a @ud; and the default sample of add, +<:add, is a [a=@ b=@]. So we'd expect some problems here.

Hoon believes strongly in modifying the program itself, not some external metadata, to debug it. While a ~! has no runtime cost and negligible compile-time cost, take it out when you're done.

Printf, %sgpm

It's well known that real programmers don't use debuggers. Real programmers use printf.

~&, informally "printf," is a true printf that sends its operand, printed by its type (a bit of biblical magic), directly and magically to the console whenever it executes. It's okay to have side effects if they are write-only.

Let's try a ~& in a working program:

!:
|=  *
|=  [planet=tape ~]
^-  bowl
~&  [%planet planet]
:_  ~  :_  ~
[%$ !>("hasta la vista, {planet}.")]

It's as easy as this:

: /~waclux-tomwyc/try/11/bin/error/hoon
~waclux-tomwyc/try=> :error "world"
[%planet "world"]
"hasta la vista, world."

Please note the very important difference between these outputs. Both result in a line of text on the console - but "hasta la vista, world." is the legitimate product of the application bowl, whereas [%planet "world"] went into the interpreter as a hint and came out as a Unix printf.

Trace

We also can crash, and get a reasonably useful and accurate trace. With a very misconceived assert:

!:
|=  *
|=  [planet=tape ~]
^-  bowl
~&  [%planet planet]
?>  =(5 (add 2 2))
:_  ~  :_  ~
[%$ !>("hasta la vista, {planet}.")]

we get:

: /~waclux-tomwyc/try/12/bin/error/hoon
~waclux-tomwyc/try=> :error "world"
[%planet "world"]
! /~waclux-tomwyc/try/~2014.2.10..00.18.08..16ea/bin/error/:<[4 1].[8 37]>
! /~waclux-tomwyc/try/~2014.2.10..00.18.08..16ea/bin/error/:<[5 1].[8 37]>
! /~waclux-tomwyc/try/~2014.2.10..00.18.08..16ea/bin/error/:<[6 1].[8 37]>
! exit

Note that the printf is delivered as soon as it executes. Note also that we accurately report the location of the failing twig. And note finally that without !:, we get none of this info.

Stackf, %sgbr

~| has the same form as ~&, but its operand is computed only if there is actually a runtime crash within the body of the ~|. In that case, it prints its data as part of the stack trace.

It's important to note that ~| does no work except at crash time - its hint is not a tank, but a trap producing a tank. Let's try it:

: /~waclux-tomwyc/try/12/bin/error/hoon
~waclux-tomwyc/try=> :error "world"
[%planet "world"]
! /~waclux-tomwyc/try/~2014.2.10..00.34.28..8f89/bin/error/:<[4 1].[9 37]>
! /~waclux-tomwyc/try/~2014.2.10..00.34.28..8f89/bin/error/:<[5 1].[9 37]>
! /~waclux-tomwyc/try/~2014.2.10..00.34.28..8f89/bin/error/:<[6 1].[9 37]>
! [%asteroid "world"]
! /~waclux-tomwyc/try/~2014.2.10..00.34.28..8f89/bin/error/:<[7 1].[9 37]>
! exit

And if there's no crash, ~| doesn't run:

!:
|=  *
|=  [planet=tape ~]
^-  bowl
~|  [%asteroid planet]
:_  ~  :_  ~
[%$ !>("hasta la vista, {planet}.")]

just gives us:

: /~waclux-tomwyc/try/15/bin/error/hoon
~waclux-tomwyc/try=> :error "world"
"hasta la vista, world."

Probably someone invented this before, but I haven't seen it.

Finally, if you want to do ~| but your problem is important enough that you have to write your own custom tank trap, ~_ is the rune for you.

The urban bull

There is actually another %type stem we forgot to mention. Finally, the full and final %type:

++  type  $|  ?(%noun %void)
          $%  [%atom p=term]
              [%bull p=twin q=type]
              [%cell p=type q=type]
              [%core p=type q=coil]
              [%cube p=* q=type]
              [%face p=term q=type]
              [%fork p=type q=type]
              [%hold p=(list ,[p=type q=twig])]
          ==
++  twin  ,[p=term q=wing r=axis s=type]

What is this %bull?

%bull is a feature of "middle Hoon" (164K), created by the new hoon =*. Most of Urbit is written in the older "classic Hoon" (191K), and does not use =*.

A common complaint in classic Hoon is that we have to use long, opaque paths, like p.i.t.p.q.u.q, when pulling deep wings from a large, complex subject. Of course, we could use =+ to push a copy. But that would be inelegant.

After studying this problem, our engineers invented %bull, the alias. Let's copy goodbye.hoon into bull.hoon, and add a =*:

!:
|=  *
|=  [planet=tape ~]
^-  bowl
=*  mars  planet
:_  ~  :_  ~
[%$ !>("hasta la vista, {mars}.")]

=* just makes a synonym. We can use this synonym anywhere we used the original wing, and the compiler will produce the same code as if we used the original wing:

~waclux-tomwyc/try=> :bull "world"
"hasta la vista, world."

Let's make this slightly fancier:

!:
|=  *
|=  [planet=tape ~]
^-  bowl
?>  ?=(^ planet)
=*  mars  t.planet
:_  ~  :_  ~
[%$ !>("hasta la vista, {mars}.")]

What would you expect to see here? Do you know already?

~waclux-tomwyc/try=> :bull "world"
"hasta la vista, orld."

and, of course:

~waclux-tomwyc/try=> :bull ""
! /~waclux-tomwyc/try/~2014.2.11..18.29.54..4a1a/bin/bull/:<[4 1].[8 35]>
! /~waclux-tomwyc/try/~2014.2.11..18.29.54..4a1a/bin/bull/:<[5 1].[8 35]>
! exit

Note that <[5 1].[8 35]> is exactly the span of line and column that contains the assertion twig ?>. Don't be frightened by broad error ranges - they are normal, as are big twigs.

And finally, there is no difficulty in both reading and writing through %bull. For example, purely to amuse ourselves, we can bullify decrement:

!:
|=  *
|=  [a=@ ~]
^-  bowl  :_  ~  :_  ~  :-  %$  !>
=+  b=0
=*  c  b
|-  ^-  @
?:  =(a +(c))
  c
$(c +(b))

Too much information

It would be vain to deny that Hoon is not a perfect language. Especially at this early stage, it has its quirks and rough spots, and some oddities will probably always remain. Every language has some.

One of the worst is the infamous TMI. What is TMI? Let's make tmi.hoon:

|=  *
|=  nos=(list ,@)
^-  bowl  :_  ~  :_  ~  :-  %$  !>
(snag 2 nos)

Let's see what this does:

~waclux-tomwyc/try=> :tmi 56 71 12 19
12
~waclux-tomwyc/try=> :tmi 56 71 12 19 22
12
~waclux-tomwyc/try=> :tmi 56 71
! exit

Tracing is not needed to figure out why. But suppose, for no good reason at all, we get some TMI:

|=  *
|=  nos=(list ,@)
?>  ?=(^ nos)
^-  bowl  :_  ~  :_  ~  :-  %$  !>
(snag 2 nos)

Now we have a problem:

~waclux-tomwyc/try=> :tmi 56 71 12 19 22
! type-fail
! exit

Why does this happen? Let's look at ++snag:

++  snag
  ~/  %snag
  |*  [a=@ b=(list)]
  |-
  ?~  b
    ~|('snag-fail' !!)
  ?:  =(0 a)
    i.b
  $(b t.b, a (dec a))

Obviously, ++snag produces entry a of list b. Less than obviously, ++snag only works for homogeneous lists.

You might think b, being (in this instance) a (list ,@), is a homogeneous list. But actually, we got b from nos, and we refined nos with ?=. It's now not a mere (list ,@). It's actually a [i=@ t=(list ,@)].

So when we recurse in ++snag, we are pulling $ from the trap produced by |-; and when we mutate this core with b=t.b, we are replacing [i=@ t=(list ,@)] (ie, a list we know is non-empty) with a mere (list ,@) (which could be empty). So, the compiler gets mad and has every right to.

Of course, you could say: the compiler should be smarter. No, it shouldn't be smarter. The right way of understanding a compiler's semantics is to understand what it does. If Hoon were smart enough to figure out that breaking a local rule here was, in some sense, globally ok, it would need a kind of mammalian global forebrain sitting on top of its reptilian walnut brain. Then you, the programmer, would need to grok this global forebrain as well. And for what? So you don't have to cast?

Mitigating TMI

Fortunately, there are an endless number of ways to hack around TMI - in this case, both on the caller and callee side.

In ++snag, if we instead recurse with (snag (dec a) t.b) in classic 20th-century style (normally a real solecism in Hoon), ++snag does not have a problem with nos. (This hack will probably be committed at some point, but ++snag is such a good example.)

You probably don't have commit rights on hoon.hoon, so you have to do your hacking on the caller side.

One thing you can do is:

|=  *
|=  nos=(list ,@)
?>  ?=(^ nos)
^-  bowl  :_  ~  :_  ~  :-  %$  !>
(snag 2 `(list ,@)`nos)

This fix has zero cost in runtime and if anything makes your code compile faster. On the other hand, it is slightly ugly and it needs to be done every time you use the value. You can also cast it back in place:

|=  *
|=  nos=(list ,@)
?>  ?=(^ nos)
=>  .(nos `(list ,@)`nos)
^-  bowl  :_  ~  :_  ~  :-  %$  !>
(snag 2 nos)

Even a very weak optimizer will optimize this to zero-cost. It is even uglier, however.

You can also cleverly avoid refining the subject at all. For instance:

|=  *
|=  nos=(list ,@)
?>  !=(~ nos)
^-  bowl  :_  ~  :_  ~  :-  %$  !>
(snag 2 nos)

Equality testing does not produce type information. This is a feature, not a bug. If you want to refine the subject, you really do have to use ?= or some synthetic hoon that resolves to it.

TMI and other bulls

Middle Hoon, however, added bulls not just because long terse wings are ugly, but also because they help with TMI.

When we ?= through a bull, the type refinement sticks to the bull. That is, we can write

|=  *
|=  nos=(list ,@)
=*  cud  nos
?>  ?=(^ cud)
^-  bowl  :_  ~  :_  ~  :-  %$  !>
(snag 2 nos)

and it works just fine, because nos remains (list ,@). However, we are back to type-fail with:

|=  *
|=  nos=(list ,@)
=*  cud  nos
?>  ?=(^ cud)
^-  bowl  :_  ~  :_  ~  :-  %$  !>
(snag 2 cud)

In many complex computations, because the type system isn't perfect, we often want access to both original and refined types. By interrogating a %bull, we get exactly that. It's straightforward to make the alias carry the refined type, and the underlying wing the original type - another subtle design touch that chimes smoothly with Hoon's rich Corinthian leather.

Another convenient convenience

Another common complaint about classic Hoon is that you very often want to fish in something that isn't a wing. For instance, put this in as try/bin/fish.hoon:

!:
|=  *
|=  [x=@ ~]
^-  bowl  :_  ~  :_  ~  :-  %$  !>
=+  foo=(mod x 2)
?+  foo  !!
  0  "even"
  1  "odd"
==

~waclux-tomwyc/try=> :fish 3
"odd"
~waclux-tomwyc/try=> :fish 4
"even"

We need the ?+ and !! because the type system, not being infinitely smart or anywhere close, cannot show that foo is limited to 0 and 1. Try it with ?-:

!:
|=  *
|=  [x=@ ~]
^-  bowl  :_  ~  :_  ~  :-  %$  !>
=+  foo=(mod x 2)
?-  foo
  0  "even"
  1  "odd"
==

! /~waclux-tomwyc/try/~2014.2.12..00.06.42..8e4a/bin/fish:<[2 1].[9 3]>
! /~waclux-tomwyc/try/~2014.2.12..00.06.42..8e4a/bin/fish:<[3 1].[9 3]>
! /~waclux-tomwyc/try/~2014.2.12..00.06.42..8e4a/bin/fish:<[4 1].[9 3]>
! /~waclux-tomwyc/try/~2014.2.12..00.06.42..8e4a/bin/fish:<[4 11].[9 3]>
! /~waclux-tomwyc/try/~2014.2.12..00.06.42..8e4a/bin/fish:<[4 18].[9 3]>
! /~waclux-tomwyc/try/~2014.2.12..00.06.42..8e4a/bin/fish:<[4 25].[9 3]>
! /~waclux-tomwyc/try/~2014.2.12..00.06.42..8e4a/bin/fish:<[4 33].[9 3]>
! /~waclux-tomwyc/try/~2014.2.12..00.06.42..8e4a/bin/fish:<[5 1].[9 3]>
! /~waclux-tomwyc/try/~2014.2.12..00.06.42..8e4a/bin/fish:<[6 1].[9 3]>
! -lost.@
! mint-lost
! exit

The compiler is telling us that the rest of @ is unhandled in our ?-.

But what we can't do is what we'd actually like to do:

!:
|=  *
|=  [x=@ ~]
^-  bowl  :_  ~  :_  ~  :-  %$  !>
?+  (mod x 2)  !!
  0  "even"
  1  "odd"
==

We know that we can't do this, because we see the type:

++  twig  $%  [%wtls p=wing q=twig r=tine]
          ==

and, obviously, (mod x 2) is a twig, not a wing. But let's try it anyway:

!:
|=  *
|=  [x=@ ~]
^-  bowl  :_  ~  :_  ~  :-  %$  !>
?+  (mod x 2)  !!
  0  "even"
  1  "odd"
==

~zod/try=> :fish 4
"even"

WTF? It works?

It turns out that when we parse ?+ - and ?-, ?@, ?^ and ?~ - we produce not an ordinary twig matching the hoon, but a bizarre precursor:

++  twig  $%  [%wthp p=wing q=tine]
              [%wthz p=tiki q=tine]
              [%wtkt p=wing q=twig r=twig]
              [%wtkz p=tiki q=twig r=twig]
              [%wtls p=wing q=twig r=tine]
              [%wtlz p=tiki q=twig r=tine]
              [%wtpt p=wing q=twig r=twig]
              [%wtpz p=tiki q=twig r=twig]
              [%wtsg p=wing q=twig r=twig]
              [%wtsz p=tiki q=twig r=twig]
          ==
++  tiki  $%  [& p=(unit term) q=wing]
              [| p=(unit term) q=twig]
          ==

With the uncommon power of ++tiki, our parser figures out if you're using these pattern tests to fish for (a) a nameless wing, (b) a named wing, (c) a nameless twig, (d) a named twig.

For (a), wtlz is equivalent to wtls. For (b), we automagically insert a =*:

!:
|=  *
|=  [x=@ ~]
^-  bowl  :_  ~  :_  ~  :-  %$  !>
=+  moo=(mod x 2)
?+  foo=moo  !!
  0  ["even" foo]
  1  ["odd" foo]
==

~zod/try=> :fish 4 
["even" %0]
~zod/try=> :fish 5
["odd" %1]

(c) we've seen already; and (d) inserts a =+:

!:
|=  *
|=  [x=@ ~]
^-  bowl  :_  ~  :_  ~  :-  %$  !>
?+  foo=(mod x 2)  !!
  0  ["even" foo]
  1  ["odd" foo]
==

~zod/try=> :fish 5
["odd" %1]
~zod/try=> :fish 4
["even" %0]

To call this feature essential would be going too far. But it's often quite convenient.

Vulcanization

As you're probably aware by now, generic polymorphism is funky. It spares us from having a whole monstrous abstraction layer in typeclasses. That's cool. But still, it's funky.

One of the funky questions that's perhaps occurred to you is: how do argument names work in a wet (generic) gate?

Consider a normal dry gate, like |=([a=* b=*] [b a]). When you slam this gate, %.([%foo %bar] |=([a=* b=*] [b a])), you know what the compiler does - it replaces [a=* b=*] with [%foo %bar], checks that [%foo %bar] nests inside [a=* b=*], and executes the formula compiled against [a=* b=*] - losing all your type information:

~zod/try=> %.([%foo %bar] |=([a=* b=*] [b a]))
[7.496.034 7.303.014]

But with |*([a=* b=*] [b a]), we get:

~zod/try=> %.([%foo %bar] |*([a=* b=*] [b a]))
[%bar %foo]

The types persist. Okay, we know that. We also know that, to put it crudely, |* works by recompiling the twig with the default sample replaced by the caller's sample, then checking that this produces the same code as originally compiled.

But - this leaves us compiling

|*([%foo %bar] [b a])

How do the limbs b and a get pulled? The default sample had these names, but the caller's sample doesn't.

The answer is simple - |* doesn't work the way you think it does. |* is synthetic, of course; and you might think it resolves to

=+  [a=* b=*]
|%
+-  %$
  [b a]
--

just as |=([a=* b=*] [b a]) is

=+  [a=* b=*]
|%
++  %$
  [b a]
--

But no! If you try the top twig as a replacement for |*, you'll find it doesn't work at all, unless you slam it not with [%foo %bar], but rather with [a=%foo b=%bar].

The solution is vulcanization. |* becomes |\, which gets a funky little thing done to it:

++  open
  ^-  twig
  ?-    gen
      [%brfs *]  =+  lyg=~(cloq al p.gen)
                 :+  %brcb  q.lyg
                 %-  ~(run by q.gen)
                 |=  a=foot  ^-  foot
                 ?.  ?=(%elm -.a)  a
                 :-  -.a
                 :+  %tsgl  p.a
                 :+  %cnts  ~
                 :~  [[[%& 6] ~] [%ktts p.lyg [~ 6]]]
                 ==
  == 

Huh? The long and the short of it is that, thanks to this little piece of wizardry, |* in our example instead generates:

=+  [* *]
|%
+-  %$
  =>  .(+< [a b]=+<)
  [b a]
--

Ie, we pull the names out of the default sample and reimpose them in the actual arm. This is vulcanization. If you don't like it, don't use |* or |\, and feel free to roll your own!
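That name-reimposing step can be mimicked in Python: the sample arrives nameless, and the arm rebinds it to the names pulled from the default sample. The helper below (vulcanize is an invented name; this is a loose sketch of the idea, not how the compiler works) shows the shape:

```python
# Loose analogy for vulcanization: the gate's sample arrives nameless,
# and the arm itself re-imposes the names taken from the default sample.
def vulcanize(names, body):
    # names: the face names pulled from the default sample, e.g. ("a", "b")
    # body: the arm, written against those names
    def gate(sample):
        # the "=>  .(+< [a b]=+<)" step: rebind the nameless sample to names
        env = dict(zip(names, sample))
        return body(env)
    return gate

swap = vulcanize(("a", "b"), lambda env: (env["b"], env["a"]))
```

Here swap(("foo", "bar")) yields ("bar", "foo"), just as the wet gate did - and because the names live in the arm rather than in the sample, the caller never has to supply them.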

Naming conventions

There are three tasks in translating an algorithm into Hoon.

First, catch your algorithm.

Second, arrange it as a Hoon twig, in a way that looks good on the page. Hoon arrangement, because it requires adjusting both the runes and the formatting, becomes a pleasant sort of minor art form, perhaps at the level of knitting. Perhaps an editor could be taught to do it, but not necessarily to do it well. And do we really need more robots taking jobs from programmers?

Third, decide what to name things. We actually have three styles of naming, corresponding to durability of code.

Hoon's theory is that durability of code should correspond to austerity of naming. The more austere and impersonal its names, the more like a specification your twig looks. It seems written in stone - and in another sense it is: the more austere your names, the harder they are to change, as you know if you've ever tried to rename a one-letter variable.

The ideal of austere code is that anyone who tried to rewrite it, in the same austere style, would produce exactly the same bytes. There is nothing in the code but what it does. But an austere style, applied to code not itself austere, is a disaster.

There are three naming conventions: hyperlapidary, lapidary, and freehand.

In hyperlapidary naming (H-Hoon), variables and arguments are named alphabetically with one letter - a, b, c, etc - in strict order of appearance in the text. This scheme is only useful for extremely regular and straightforward namespaces: very short functions, for instance.

Hyperlapidary arms must be gates or trays. Gate arms are three letters and try to carry some mnemonic significance - for instance, ++dec. Tray arms are two letters and try to resemble pronouns - for instance, ++by.

Hyperlapidary structures must be short tuples, no wider than 5. The legs are named p, q, r, s and/or t.

Conventional recursive structures use other standard names. The head of a list is always i, the tail is always t. In a binary tree of nodes, the node is n, the children l and r.

When in doubt, do not use H-mode. In an ordinary context - not least because hyperlapidary gates are easily mistaken for lapidary variables - there should be as few hyperlapidary arms as possible. And always remind yourself that H-mode makes it as hard as possible to refactor your code.

Lapidary mode (L-Hoon) is the ordinary style of most of Hoon and Arvo. In lapidary mode, variables, arguments, attributes, etc, are three-letter strings, usually consonant-vowel-consonant, generally meaningless. If the same string is used more than once in the same file, it should be used for the same concept in some sense, as often happens spontaneously in cutting and pasting. It would be nice to have an editor with a macro that generated random unique TLV strings automatically.

Lapidary arms are always four letters. They may or may not be English words, which may or may not mean anything relevant.

In freehand mode (F-Hoon), do whatever the heck you want. Note that while uppercase is not permitted in a symbol, - is, suggesting a generally Lisp-like state of gross hyphenated disorder. F-mode is best used for top-layer software which nothing else is based on; prototyping and casual coding; etc.