“Create Your Own Compiler” is an interactive tutorial that shows, step by step, how to write your own simple compiler that transforms Lisp into JavaScript. Along with that, we take a look at what a compiler actually is and at Roslyn, the state of the art in compiler platforms.


Compilers are important, but most people go day by day using their favorite programming language and tools without thinking too much about them, ignoring what happens under the covers.

However, peeking into that black box and learning to write a compiler gives you superpowers. It allows you to write custom tools and mini languages/DSLs, make your own fully fledged language or, as in “Create Your Own Compiler”, transform one language into another!

A prime example of why the latter, in other words transpilation, has proved indispensable is Babel. Since not all browsers can cope with the latest JavaScript language features, Babel translates code that uses them into backwards-compatible versions of JavaScript that run in current and older browsers or environments.

Yet another example is TypeScript, which adds optional typing to JavaScript (on that matter, make sure to also check Sorbet – Making Ruby Statically Typed), acting as a statically typed superset of it. The TypeScript compiler analyzes the TypeScript code and compiles it into JavaScript so that it can run in any browser. Since the VM engine that runs JavaScript is already there, why not reuse it instead of building one from scratch to support our own language? It’s easier to convert!

Fable is yet another X-to-JavaScript transpiler. It transpiles F# to ES2015 JavaScript so that code written in F# can run anywhere JavaScript runs – the browser, Node.js, Electron, React Native or generally V8.

But a compiler’s most popular application is translating programs from a higher-level language to a lower-level language in order to create an executable program; see C.

Each compiler works by executing several well-defined phases, each phase taking as input the output of the previous one, until it finally produces runnable code.

The first phase is tokenization, performed by the part called the lexer. It takes a stream of characters and, typically using regular expressions, groups them according to the language’s syntax into what are called tokens – keywords, function names, operators, etc.
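To make the lexer concrete, here is a minimal sketch in Python. It assumes a tiny Lisp-like input made only of parentheses, integers and names; the token kinds (`PAREN`, `NUMBER`, `NAME`) and the `tokenize` function are hypothetical names chosen for illustration, not the tutorial's own code.

```python
import re

# Hypothetical token set for a tiny Lisp-like language: parentheses,
# integer literals and names. Whitespace is matched but discarded.
TOKEN_SPEC = [
    ("PAREN", r"[()]"),
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]+"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC)
)


def tokenize(source):
    """Turn a string of characters into a list of (kind, text) tokens."""
    tokens = []
    position = 0
    while position < len(source):
        match = TOKEN_RE.match(source, position)
        if match is None:
            bad = source[position]
            raise SyntaxError(f"unexpected character {bad!r} at offset {position}")
        if match.lastgroup != "SKIP":
            # lastgroup is the name of the alternative that matched.
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens
```

Calling `tokenize("(add 2 (subtract 4 2))")` yields nine tokens, starting with `("PAREN", "(")` and `("NAME", "add")`. One named group per token kind keeps the whole lexer in a single regular expression, which is adequate for a language this small.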

The next phase is parsing. The parser takes the stream of tokens produced by the lexer and arranges them in a structure, the abstract syntax tree (AST), which is much easier to work with.
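As an illustration of what the parser does, the following Python sketch turns a flat token list into a nested AST by recursive descent. It assumes tokens are `(kind, text)` tuples with the hypothetical kinds `PAREN`, `NUMBER` and `NAME`; the node type names `Program`, `CallExpression` and `NumberLiteral` are likewise assumptions made for this example.

```python
def parse(tokens):
    """Build a nested-dict AST from a flat list of (kind, text) tokens."""
    position = 0

    def walk():
        nonlocal position
        if position >= len(tokens):
            raise SyntaxError("unexpected end of input")
        kind, text = tokens[position]
        position += 1
        if kind == "NUMBER":
            return {"type": "NumberLiteral", "value": text}
        if kind == "PAREN" and text == "(":
            # A call looks like: ( name argument* )
            if position >= len(tokens) or tokens[position][0] != "NAME":
                raise SyntaxError("expected a function name after '('")
            node = {
                "type": "CallExpression",
                "name": tokens[position][1],
                "params": [],
            }
            position += 1
            while position < len(tokens) and tokens[position] != ("PAREN", ")"):
                node["params"].append(walk())  # arguments may nest
            if position >= len(tokens):
                raise SyntaxError("missing ')'")
            position += 1  # skip the closing parenthesis
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    body = []
    while position < len(tokens):
        body.append(walk())
    return {"type": "Program", "body": body}
```

Each opening parenthesis starts a new `CallExpression` node and each nested call becomes a child in `params`, so the nesting of the parentheses in the source becomes the nesting of the tree.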

The next phase is semantic analysis, where the compiler checks the language’s semantic constraints and the data types. It makes sure that the code is well-formed and well-typed.
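A very small semantic check can be sketched as a walk over the AST. The example below assumes nested-dict nodes of type `Program`, `CallExpression` and `NumberLiteral`, and a hypothetical signature table `KNOWN_FUNCTIONS`; it only verifies that every called function exists and receives the right number of arguments, which is far less than a real type checker does.

```python
# Hypothetical signature table: function name -> expected argument count.
KNOWN_FUNCTIONS = {"add": 2, "subtract": 2}


def check(node, errors=None):
    """Collect semantic errors found in the AST rooted at node."""
    errors = [] if errors is None else errors
    if node["type"] == "Program":
        for child in node["body"]:
            check(child, errors)
    elif node["type"] == "CallExpression":
        name = node["name"]
        expected = KNOWN_FUNCTIONS.get(name)
        actual = len(node["params"])
        if expected is None:
            errors.append(f"unknown function '{name}'")
        elif expected != actual:
            errors.append(f"'{name}' expects {expected} arguments, got {actual}")
        for param in node["params"]:
            check(param, errors)  # arguments can be calls themselves
    return errors
```

Returning a list of messages instead of raising on the first problem lets the compiler report every error in one pass, which is how most compilers behave in practice.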

The next phase is optimizing the AST, for example by eliminating dead code with techniques like tree shaking. The result of this phase is the Intermediate Representation or IR. The IR itself then undergoes further optimizations, including ones specific to the target CPU architecture, before machine code is produced.
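To give a feel for an AST-level optimization, the sketch below implements constant folding, a simpler technique than the dead-code elimination mentioned above: calls whose arguments are all literals are evaluated at compile time. The nested-dict node shapes and the `FOLDABLE` table of built-ins are assumptions made for this example.

```python
import operator

# Hypothetical built-ins that are safe to evaluate at compile time.
FOLDABLE = {"add": operator.add, "subtract": operator.sub}


def fold_constants(node):
    """Return a new AST in which constant calls are replaced by their value."""
    if node["type"] == "Program":
        return {
            "type": "Program",
            "body": [fold_constants(child) for child in node["body"]],
        }
    if node["type"] == "CallExpression":
        # Fold the arguments first so that nested constant calls collapse.
        params = [fold_constants(param) for param in node["params"]]
        all_numbers = all(param["type"] == "NumberLiteral" for param in params)
        if node["name"] in FOLDABLE and len(params) == 2 and all_numbers:
            left, right = (int(param["value"]) for param in params)
            result = FOLDABLE[node["name"]](left, right)
            return {"type": "NumberLiteral", "value": str(result)}
        return {"type": "CallExpression", "name": node["name"], "params": params}
    return node  # literals are already as simple as they can be
```

Under these assumptions the tree for `(add 2 (subtract 4 2))` collapses to the single literal `4`, while a call to a function that is not in `FOLDABLE` is kept and only its arguments are simplified.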

The last step is to produce a standalone executable (runtimes that work with bytecode, like the JVM, work with IR instead of creating an executable). This is usual in C programming, but with the tools now available even high-level languages like Java can, under GraalVM, compile to native executables.

The above list is simplified, of course, but in general the steps you have to take in order to transform an input source into the desired output are:

  • Lexing
  • Parsing
  • Building up an Abstract Syntax Tree (AST)
  • Generating IR code for the given AST
  • Optimizations on the generated IR code
  • Generating machine code

Add to those the steps of defining the syntax of your new programming language, if you want to go that way.
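The step of generating IR from an AST can be illustrated with a toy stack-machine IR. This is only a sketch under stated assumptions: the instruction names `PUSH` and `CALL`, the `BUILTINS` table and the nested-dict AST shape are all hypothetical, and real IRs such as JVM bytecode are considerably richer.

```python
import operator

# Hypothetical runtime functions available to the toy stack machine.
BUILTINS = {"add": operator.add, "subtract": operator.sub}


def lower(node, instructions=None):
    """Flatten an expression tree into postfix stack-machine instructions."""
    instructions = [] if instructions is None else instructions
    if node["type"] == "NumberLiteral":
        instructions.append(("PUSH", int(node["value"])))
    elif node["type"] == "CallExpression":
        for param in node["params"]:
            lower(param, instructions)  # arguments are evaluated first
        instructions.append(("CALL", node["name"], len(node["params"])))
    else:
        raise TypeError(f"cannot lower node of type {node['type']!r}")
    return instructions


def run(instructions):
    """Interpret the IR, the way a bytecode runtime would."""
    stack = []
    for instruction in instructions:
        if instruction[0] == "PUSH":
            stack.append(instruction[1])
        else:
            _, name, argument_count = instruction
            start = len(stack) - argument_count
            arguments = stack[start:]
            del stack[start:]
            stack.append(BUILTINS[name](*arguments))
    return stack.pop()
```

The tree shape disappears in the IR: what remains is a linear instruction sequence that is easy to optimize further, interpret directly, or translate into machine code.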

The “Create Your Own Compiler” playground makes that complicated process easy to go through. It is actually an annotated walkthrough of Jamie Kyle’s “The Super Tiny Compiler”, a simple compiler written in JavaScript. The goal of the tutorial is to compile a Lisp statement into JavaScript. Along the way we go through the different stages of Lexical Analysis, Syntactic Analysis, Transformation, and Code Generation.
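The final Code Generation stage can be sketched as a recursive walk that prints each AST node as JavaScript source text. The Python example below is an illustration only, not the tutorial's code: it assumes nested-dict nodes of type `Program`, `CallExpression` and `NumberLiteral`, and for brevity it folds the Transformation stage into the generator instead of building a separate JavaScript-style AST first.

```python
def generate(node):
    """Emit JavaScript source text for a Lisp-style AST."""
    if node["type"] == "Program":
        # One JavaScript statement per top-level expression.
        return "\n".join(generate(child) + ";" for child in node["body"])
    if node["type"] == "CallExpression":
        arguments = ", ".join(generate(param) for param in node["params"])
        return f"{node['name']}({arguments})"
    if node["type"] == "NumberLiteral":
        return node["value"]
    raise TypeError(f"unknown node type {node['type']!r}")
```

Given the tree for the Lisp statement `(add 2 (subtract 4 2))`, `generate` returns the JavaScript statement `add(2, subtract(4, 2));`. The prefix notation of Lisp maps directly onto call syntax, which is why such a small generator is enough for this pair of languages.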

Each stage is broken into multiple steps, and each step presents its annotated code interactively. It’s a great way to get your feet wet and to grasp the bare concepts.

The other, post-modern way of building compilers is going the Roslyn way: writing the compiler for a language in that very language. Microsoft has done that with its state-of-the-art compiler platform, Roslyn.

As for the question of what Roslyn actually is, what could be better than an authoritative answer from a member of the Roslyn team, the renowned C# guru himself, Eric Lippert? The opportunity came about in the form of an interview that he gave us back in 2014:

NV: Roslyn’s official definition states that it is a “project to fully rewrite the Visual Basic and C# compilers and language services in their own respective managed code language; Visual Basic is being rewritten in Visual Basic and C# is being rewritten in C#.”
How is C# being rewritten in C#?

EL: When I was at Microsoft I saw so many people write their own little C# parsers or IDEs or little mini compilers or whatever, for their own purposes. That’s very difficult, it’s time-consuming, it’s expensive, and it’s almost impossible to do right. Roslyn changes all that, by giving everyone a library of analysis tools for C# and VB which is correct, very fast, and designed specifically to make tool builder’s lives better. I’m so excited that it’s almost done! I worked on it for many years and can’t wait to get my hands on the released version.

The rest of Eric’s comments can be read in the full interview, listed under Related Articles below.

Building your compiler using Roslyn gives you distinct advantages:

  • Massive performance improvement and a built-in mechanism for handling dynamic objects, plus crucial functionality for emitting code, parsing assemblies and structuring the compiler itself. This results in portable assemblies and the possibility of integrating with tools available only for C# (code analysis, VS extensions).
  • Cross-platform capability, since Roslyn produces portable class libraries compatible with Mono and .NET Core.
  • Visual Studio integration and other functionality, including code colourization, syntax highlighting and IntelliSense.

Magic!

More Information

Create your own compiler

Related Articles

C# Guru – An Interview With Eric Lippert

Fable – Write Front-End Apps For The Web In F#

Sorbet – Making Ruby Statically Typed

How To Create Pragmatic, Lightweight Languages

Take Cornell’s CS 6120 Advanced Compilers For Free
