2019-05-25: Introing Trxns, a self-modifying lang

I'm trying to get back into the habit of doing & making, after some years of accumulated fear around this. So this will be much rougher than my usual efforts. Read at your peril.

I read somewhere, maybe it was an ESR essay, that a programmer ought to learn Lisp not because they will ever use it, but because it will change the way they think. This idea has stuck with me ever since, and has proven true over many years.

Every new language I spend time with feels like falling in love: it changes you, sometimes a little and sometimes a lot. I can't think of better advice for learning functional programming than "learn Haskell", and Java surprised me by how much I learned from it about OO and refactoring.

So I started to wonder — if these (more or less) pragmatic real-world languages have things to teach me, what might I have to learn from weirder examples? The language might not even have to exist or have an interpreter to be interesting.

So I've decided to create and learn a language that makes heavy use of self-modification. I'll start by writing a few simple programs to see how it feels, and then write an interpreter, and see what I learn.

I'll call it Trxns.

I decided to start with two constraints to keep me honest:

  1. No state outside the code
  2. It should be visualizable (which should be easy given constraint 1)

Hello world

1
print("Hello, world!")
append(#2, #3)

Of course this had to come first, but I wanted also to think about loops.

On line 1 is our first oddity. The most taken-for-granted piece of program state is the line of code that gets executed next — also known as the instruction pointer. If you've not learned much about processors you might not even think about this as state. But since it is, we have to put it in the code and the interpreter will have to look at the first line of the program to decide what line to execute next. Our interpreter will increment it after each execution cycle.

If you've watched enough novice programmers try to think about programs you'll know that reasoning about execution order can be tricky. Hopefully Trxns will keep it fairly simple, and in any case it will be visible.

Line 2 should be fairly self-explanatory. Each line should be a statement — a command that optionally changes the state of the program. I've used () for statements here which I think might be a bit naughty of me because I suppose parentheses reference lambda calculus which should refer to functions, which have return values and therefore are expressions not statements? But I'm just a fucking badass so I'm doing it anyway.

It occurs to me that actually line 2 secretly introduces some state! Standard out, AKA the terminal! If I were a true Haskeller I would have recognised that printing is a side-effect but I guess I'm not. Might think more about this.

append on line 3 is our first self-modifying instruction. It seems to take a variable number of arguments referring to lines, which it then appends to the end of the program, so stepping through we get:

1
print("Hello, world!")
append(#2, #3)

2
print("Hello, world!") // now printed someplace
append(#2, #3)

3
print("Hello, world!")
append(#2, #3)
print("Hello, world!")
append(#2, #3)

And so on. An infinite loop of Hello, world!s!
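
To make the stepping concrete, here is a rough Python sketch of that execution cycle. It is not the real interpreter (that doesn't exist yet). It assumes each cycle bumps the number on line 1 and then executes the line it now names, which is one reading that reproduces the snapshots above. The function name run and the max_cycles cut-off are made up for illustration, and output still goes to an ordinary Python list rather than into the program.

```python
import re

def run(program, max_cycles):
    """Run a Trxns-style program (list of strings) for a bounded number of cycles."""
    lines = list(program)            # lines[0] is line 1: the instruction pointer
    output = []                      # stand-in for the terminal
    for _ in range(max_cycles):
        ip = int(lines[0]) + 1       # assumption: bump the pointer, then execute
        if ip > len(lines):
            break                    # ran off the end of the program
        lines[0] = str(ip)
        stmt = lines[ip - 1]
        printed = re.fullmatch(r'print\("(.*)"\)', stmt)
        appended = re.fullmatch(r'append\((.*)\)', stmt)
        if printed:
            output.append(printed.group(1))
        elif appended:
            # "#2, #3" -> [2, 3], then copy those lines onto the end of the program
            refs = [int(ref.strip().lstrip('#')) for ref in appended.group(1).split(',')]
            lines.extend([lines[n - 1] for n in refs])
    return lines, output

hello = ['1', 'print("Hello, world!")', 'append(#2, #3)']
lines, output = run(hello, max_cycles=2)
for line in lines:
    print(line)
```

Two cycles in, the program text is the five-line snapshot shown last above, and the loop keeps growing for as long as max_cycles allows.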

I definitely feel less sure about having standard out as a side-effect now. Given that this whole program is prompting recollections of assembler, we might pick an assembler-like solution: designating a range of lines as 'standard out' and writing stuff there. Awkward yes, but instructive.
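
A tiny Python sketch of what a reserved range might look like. Everything here is an assumption made for illustration: the range (lines 2 to 4), the helper name print_to_lines, and the choice to refuse the write when the region is full.

```python
STDOUT_START, STDOUT_END = 2, 4      # hypothetical: lines 2-4 are reserved as 'standard out'

def print_to_lines(lines, text):
    """Write text into the first empty reserved line; return False if the region is full."""
    for number in range(STDOUT_START, STDOUT_END + 1):
        if lines[number - 1] == '':
            lines[number - 1] = text
            return True
    return False

# Line 1 is the instruction pointer, lines 2-4 are the output region, code starts on line 5.
program = ['4', '', '', '', 'print("Hello, world!")']
print_to_lines(program, 'Hello, world!')
```

The printed text now lives in the program itself, so constraint 1 holds, at the cost of deciding up front how much output a program is allowed to produce.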

Fizzbuzz

next: 4
counter: 1
condition:
write_if(#counter % 3 == 0 && #counter % 5 == 0, next, print_fizzbuzz)
write_if(#counter % 3 == 0, next, print_fizz)
write_if(#counter % 5 == 0, next, print_buzz)
print(#counter)
write(counter, #counter + 1)
write(next, condition)
print_fizzbuzz:
print("FizzBuzz")
write(next, condition)
print_fizz:
print("Fizz")
write(next, condition)
print_buzz:
print("Buzz")
write(next, condition)

I wanted to play around with conditionals and so fizzbuzz was the natural choice. I made a few interesting decisions here that I'll probably row back on.

The obvious change here is the addition of labels. When I was learning assembler I sort of figured labels defeated my purposes, as they weren't ~*~ genuine machine code ~*~ and got me further away from the computer I was trying to learn about.

So it was interesting to observe myself reinventing them! To illustrate, let's consider this program without labels:

3
1
write_if(#2 % 3 == 0 && #2 % 5 == 0, 1, 9)
write_if(#2 % 3 == 0, 1, 11)
write_if(#2 % 5 == 0, 1, 13)
print(#2)
write(2, #2 + 1)
write(1, 3)
print("FizzBuzz")
write(1, 3)
print("Fizz")
write(1, 3)
print("Buzz")
write(1, 3)

Rather unpleasant to read — but it gets worse. If I want to add some code anywhere other than the end — say to add some logic to the first chunk, I need to change all of the line references that refer to lines after wherever I added a line! Truly impractical for any programmer.

Hence, she discovers, one of the reasons assemblers are around — so that you can have labels that the assembler then translates to the right numbers.
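
That translation step can be sketched in a few lines of Python. This is a toy two-pass resolver, not a real assembler: the name assemble, the cut-down counting program, the limit line, and the rule that a bare "name:" line takes up no line of its own are all assumptions for the sake of the example.

```python
import re

def assemble(source):
    """Two passes: record where each label lands, then swap names for numbers."""
    labels, body = {}, []
    for line in source:
        m = re.fullmatch(r'(\w+):\s*(.*)', line)
        if m:
            name, value = m.groups()
            labels[name] = len(body) + 1   # 1-based line the label refers to
            if value:                      # "name: value" keeps its value as a data line
                body.append(value)
        else:
            body.append(line)

    def resolve(match):
        word = match.group(0)
        return str(labels.get(word, word))  # unknown words (print, write) pass through

    # Naive on purpose: this would also rewrite label names inside string literals.
    return [re.sub(r'[A-Za-z_]\w*', resolve, line) for line in body]

source = [
    'next: loop',
    'counter: 1',
    'loop:',
    'print(#counter)',
    'write(counter, #counter + 1)',
    'write(next, loop)',
]
plain = assemble(source)

# Add one data line near the top and every reference after it is renumbered for free.
shifted = assemble(source[:2] + ['limit: 100'] + source[2:])
```

Here plain ends in write(1, 3) and shifted ends in write(1, 4), without anyone editing the jump by hand.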

I am already learning something from my made-up language, which feels very fun.

You may be able to spot a few control flow concepts you're familiar with reduced to their most basic form. To spot them, it might be instructive to consider the historical lineage of the if statement:

  1. Writing directly to the instruction pointer (line 1) allows us to change which instruction is going to execute next, meaning we can jump to a different line in the program.
  2. This was codified into the JMP (jump) instruction in assembly, also known as goto in higher-level languages (yes, the goto of "Go To Statement Considered Harmful" fame!).
  3. Along with this came the conditional jump, e.g. JE which would Jump if something was Equal to something else, and otherwise just let the program continue.
  4. And from there it's a short leap to the if statement, as you may be able to see above.
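
The first and third steps of that lineage fit in a few lines of Python. This is only a sketch of how write and write_if might behave, assuming the program is a list of strings with the instruction pointer on line 1. The jump targets 20 and 30 are arbitrary illustrative numbers.

```python
def write(lines, target, value):
    """Overwrite line `target` (1-based). With target=1 this is an unconditional jump."""
    lines[target - 1] = str(value)

def write_if(lines, condition, target, value):
    """Overwrite only when the condition holds: a conditional jump when target=1."""
    if condition:
        write(lines, target, value)

lines = ['4', '9']                          # line 1: pointer, line 2: counter
counter = int(lines[1])
write_if(lines, counter % 5 == 0, 1, 20)    # 9 is not a multiple of 5: pointer untouched
write_if(lines, counter % 3 == 0, 1, 30)    # 9 is a multiple of 3: pointer now names line 30
```

Nothing in write knows about control flow. Jumping falls out of the pointer being an ordinary line that any statement can overwrite.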

You can also spot subroutines, and you might even get an idea of why return might be named that way!

However, we do have a big problem here. Labels cannot actually meet my constraints and still be useful! Let's consider this variation on our hello world program:

next: 3
loop:
write(#next, print)
insert_after(#next, #loop + 1, #loop + 2)
print:
print("Hello, world!")
write(#next, loop)

In this program we start by running a loop, which jumps to a print subroutine, which then loops by duplicating the body of the loop after the current instruction.

The problem: how do we implement the labels?

We could extract all the labels at the start of the program and then say that #print is a constant that means 7, but then when we add those lines to the end of the loop we'll break it, because line 7 isn't actually line 7 anymore. It also entails either state (the table is kept somewhere) or difficulty visualizing (the interpreter does a global find-and-replace, which removes the utility of the label names).

Or we could consider the labels to be dynamic, looked up in the code whenever they are referenced. Actually, now I think of it, that doesn't seem like an obviously bad idea, except for the performance implications. We'll see how that works out...
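
A minimal Python sketch of the dynamic option, assuming a label is any line that starts with "name:" and that a lookup is a plain top-to-bottom scan of the current program text. The function name lookup is made up, and the slice assignment stands in for whatever insert_after ends up doing.

```python
def lookup(lines, label):
    """Scan the current program text for `label:` and return its 1-based line number."""
    for number, line in enumerate(lines, start=1):
        if line.startswith(label + ':'):
            return number
    raise KeyError(label)

program = [
    'next: 3',
    'loop:',
    'write(#next, print)',
    'insert_after(#next, #loop + 1, #loop + 2)',
    'print:',
    'print("Hello, world!")',
    'write(#next, loop)',
]

before = lookup(program, 'print')     # where the label sits right now
program[4:4] = program[2:4]           # duplicate the loop body after line 4
after = lookup(program, 'print')      # the same name, found again in its new place
```

The label moves with the code and no table is kept anywhere, so both constraints survive. The price is a scan of the whole program on every reference.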

I'm starting to get the idea that macros will be a big part of my explorations with this language. Since this language is defined largely by self-modification, they seem like a natural technique. You can sort of see how ifs might be implemented. I've not played around with macros much so that will be fun!

Next I think I'm going to build a basic interpreter.