How To Build a (Janky) Web Browser
2018-2-23
Ever wondered how web browsers work internally? I have, so I wrote my own hilariously janky one in Go, which can load and render a small SVG-like dialect of XML, demo'd above. It doesn’t support scripting or CSS (yet…).
Source here: https://github.com/vilterp/janky-browser (Currently ~1k lines of Go)
It uses nothing but the Go standard library and Pixel, a library which provides windowing, a low-level drawing API, and mouse and keyboard events.
Why make a browser?
In short, if you use or develop for the browser, it might help you understand your work better. The best way to understand how something works is to implement your own (simplified) version.
More broadly, as programmers, we’re continually using and creating abstractions. These can range in scale from small (a function which wraps up a few lines of code) to large (an operating system, a database engine, a browser, a web API), to enormous (AWS services, Google Search).
Understanding how these abstractions work allows us to use them effectively. More importantly, once an abstraction is no longer magic, we can imagine alternate versions of it which have different boundaries or exist in different contexts. (We might also need to rebuild the abstraction after an apocalypse in which all source code is lost!) For instance, GUIs have been around since Xerox PARC, but have evolved into many different forms: from desktop to browser, smartphone, and recently VR and in-room.
Browsers are one of the most important abstractions many developers use and program against every day, which makes them especially important to understand deeply. Also, the debate about how exactly to organize UI code seems to be never-ending (React? Vue? Elm? Redux?). Going below the browser abstraction can give us another perspective on this debate. For instance, we can see the similarities between browsers and game engines like Unity more clearly.
My Story
I got my start by programming in Flash, and then for the web, both server and client side. In college I took classes in which I implemented substantial parts of a TCP stack, a database engine, and an operating system. I also got a handle on how programming languages work by implementing a couple of interpreters and a compiler.
However, despite having gotten my start in interactive graphics programming, I still didn’t feel I had a solid enough grasp on how the abstractions I worked with in those environments (the DOM tree, etc) were implemented. That nagging curiosity has finally gotten the better of me, and here I am!
This project is my attempt to satisfy my own curiosity; I hope it will be useful to others who are curious. I believe that a project like this deserves to be part of college-level CS curricula alongside networks, databases, and OSs, since browsers are just as essential to modern computer use.
How much of a real browser is it?
This project will never approach parity with real browsers — they’re some of the largest and most insanely complicated codebases out there! Its aim is to be a minimal working example of the basic structure of a browser: what the pieces are, and how they fit together.
The project currently makes these simplifications:
- Render SVG, not HTML, so we don’t have to implement a layout algorithm.
- Don’t comply with the SVG spec exactly, so we don’t have to fight the parser too much or worry about details.
Ok, so how do you build a browser?
The Git history is pretty messy, but you can roughly correlate it with these steps.
Step 0: Choose PL and graphics library
I settled on Go out of familiarity (I use it at work), and because it’s easy to program in (garbage collected) while being fairly close to the OS (compiles to machine code, has libraries with C bindings). A similar exercise could be done in C, Python, or some other language and have much of the same pedagogical value.
Pixel was the first library I found for Go which allowed me to pop up a window and draw graphics. It’s based on GLFW, a C library which has bindings for a lot of other languages.
Step 1: Get a window to draw on
Goal: Run program; see window with a simple shape on it.
The Pixel library allows us to write a Go program which pops up a window, containing nothing but an empty canvas we can draw on. I have only tested it on a Mac, but it should work wherever OpenGL is supported.
The mechanics of how programs interact with the OS to pop up a window and draw to it are fairly complicated, and could be their own blog post. In short, to pop up a window, our Go program must contact the OS (specifically the window server process) and ask it for a window. Our program then draws to a graphics buffer, and coordinates with the window server to draw that buffer to the screen. I believe this is usually done with shared memory.
Step 2: Define DOM types; render from them
Goal: Define DOM types and render graphics to the window from a hardcoded example.
The central data structure of a browser is the DOM. HTML is parsed into it, and then the browser renders from it, re-rendering when anything changes. We’ll now define our own DOM by defining some Go types. We’ll then instantiate an example DOM and render it.
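Here’s a minimal sketch of what such DOM types might look like. The names (Node, Rect, Circle, Group, Draw) are made up for illustration and aren’t taken from JankyBrowser’s source. To keep the example runnable without a graphics library, “rendering” here just collects drawing commands as strings; in the real thing, each Draw would call into Pixel instead.

```go
package main

import "fmt"

// Node is anything that can appear in the DOM tree.
// All type and method names here are hypothetical.
type Node interface {
	// Draw appends this node's drawing commands to out and returns the result.
	Draw(out []string) []string
}

// Rect is an axis-aligned rectangle, loosely modeled on SVG's <rect>.
type Rect struct {
	X, Y, Width, Height float64
	Fill                string
}

// Circle is loosely modeled on SVG's <circle>.
type Circle struct {
	CX, CY, R float64
	Fill      string
}

// Group holds child nodes in document order, like SVG's <g>.
type Group struct {
	Children []Node
}

func (r Rect) Draw(out []string) []string {
	return append(out, fmt.Sprintf("rect %g,%g %gx%g %s", r.X, r.Y, r.Width, r.Height, r.Fill))
}

func (c Circle) Draw(out []string) []string {
	return append(out, fmt.Sprintf("circle %g,%g r=%g %s", c.CX, c.CY, c.R, c.Fill))
}

// Draw renders children in order, so later children paint on top of earlier ones.
func (g Group) Draw(out []string) []string {
	for _, child := range g.Children {
		out = child.Draw(out)
	}
	return out
}

func main() {
	// A hardcoded example DOM, as in this step's goal.
	doc := Group{Children: []Node{
		Rect{X: 10, Y: 10, Width: 100, Height: 50, Fill: "blue"},
		Circle{CX: 60, CY: 35, R: 20, Fill: "red"},
	}}
	for _, cmd := range doc.Draw(nil) {
		fmt.Println(cmd)
	}
}
```

Using an interface means the renderer never needs to know which element types exist; adding a new element is just a new type with a Draw method.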
Step 3: Parse DOM from XML
Goal: Parse a hardcoded string of XML into our DOM structure and render it.
I did this using Go’s encoding/xml library, which was a little awkward but gets the job done.
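As a rough illustration, here’s one way encoding/xml can map an SVG-like document onto Go structs using struct tags. The Document, Rect, and Circle types and the Parse function are assumptions for this sketch, not JankyBrowser’s actual code.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Rect and Circle are hypothetical element types; the struct tags tell
// encoding/xml which XML attribute fills each field.
type Rect struct {
	X      float64 `xml:"x,attr"`
	Y      float64 `xml:"y,attr"`
	Width  float64 `xml:"width,attr"`
	Height float64 `xml:"height,attr"`
	Fill   string  `xml:"fill,attr"`
	Href   string  `xml:"href,attr"` // empty if the attribute is absent
}

type Circle struct {
	CX   float64 `xml:"cx,attr"`
	CY   float64 `xml:"cy,attr"`
	R    float64 `xml:"r,attr"`
	Fill string  `xml:"fill,attr"`
}

// Document is the root <svg> element. Children are collected per type.
type Document struct {
	XMLName xml.Name `xml:"svg"`
	Rects   []Rect   `xml:"rect"`
	Circles []Circle `xml:"circle"`
}

// Parse turns an XML string into a Document, or returns the parser's error.
func Parse(src string) (*Document, error) {
	var doc Document
	if err := xml.Unmarshal([]byte(src), &doc); err != nil {
		return nil, err
	}
	return &doc, nil
}

func main() {
	src := `<svg>
  <rect x="10" y="10" width="100" height="50" fill="blue" href="/next"/>
  <circle cx="60" cy="35" r="20" fill="red"/>
</svg>`
	doc, err := Parse(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(len(doc.Rects), "rect(s),", len(doc.Circles), "circle(s)")
}
```

One source of awkwardness with this style: collecting children into one slice per element type loses the order in which different element types were interleaved in the document. A loop over xml.Decoder tokens would preserve that order, at the cost of more code.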
Step 4: Load XML over the network
Goal: Load XML from a hardcoded URL, parse it, and render it.
We simply import net/http from the Go standard library and call it, updating our browser’s state along the way.
Step 5: Render browser chrome
Goal: See a back button, the current URL, a loading indicator, and error text.
Use the DOM elements we have defined to render a basic UI for our browser: a back button, the current URL, a loading state indicator, and error text.
Using DOM elements to define browser chrome is a little meta, like XUL, the Mozilla technology formerly used in Firefox. It’s convenient here.
Step 6: Implement clickable links!
Goal: When a DOM element with an href attribute is clicked, go to the link.
Of course, any browser has to be able to follow links. The main challenge here is figuring out what was clicked on (this algorithm is usually called “picking”); once we know that, we just load and render that URL as before.
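A simple version of picking can be sketched as follows, assuming every element has been flattened into a list of bounding boxes in paint order. The Shape type and Pick function are hypothetical, and real shapes like circles would need their own hit tests rather than a bounding box.

```go
package main

import "fmt"

// Shape is a hypothetical flattened DOM element with a bounding box.
type Shape struct {
	X, Y, W, H float64
	Href       string
}

// Contains reports whether the point (px, py) lies inside the bounding box.
func (s Shape) Contains(px, py float64) bool {
	return px >= s.X && px < s.X+s.W && py >= s.Y && py < s.Y+s.H
}

// Pick returns the topmost shape under (px, py). Shapes are assumed to be
// in paint order (later ones drawn on top), so we scan from the end.
func Pick(shapes []Shape, px, py float64) (Shape, bool) {
	for i := len(shapes) - 1; i >= 0; i-- {
		if shapes[i].Contains(px, py) {
			return shapes[i], true
		}
	}
	return Shape{}, false
}

func main() {
	shapes := []Shape{
		{X: 0, Y: 0, W: 100, H: 100, Href: "/background"},
		{X: 50, Y: 50, W: 100, H: 100, Href: "/on-top"},
	}
	// The click lands where both shapes overlap; the later one wins.
	if s, ok := Pick(shapes, 75, 75); ok && s.Href != "" {
		fmt.Println("navigate to", s.Href)
	}
}
```

Scanning in reverse paint order matters: the element the user sees under the cursor is the last one drawn there, so it should be the one that receives the click.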
Step 7: Make URL bar editable
Goal: Be able to type in a URL, hit enter, and go to it.
Clicking links is nice, but we should be able to type in a URL and go to it. Since Pixel doesn’t provide a text input widget for us to use, we’ll have to roll our own! This involves rendering the text, background rectangle, cursor, and selection, and responding to keyboard events.
Currently, you focus JankyBrowser’s URL bar by hitting Cmd+L; it’s not clickable.
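The editing state behind a hand-rolled text input can be sketched as below. This is a simplified, hypothetical model (the TextInput type and its methods are not JankyBrowser’s): it tracks only the text and a cursor position, and leaves out selection and rendering. Keyboard events from the windowing library would be translated into calls to these methods.

```go
package main

import "fmt"

// TextInput is a hypothetical model of a single-line text field.
type TextInput struct {
	Text   []rune
	Cursor int // index in Text where the next rune will be inserted
}

// Insert adds r at the cursor and moves the cursor past it.
func (t *TextInput) Insert(r rune) {
	t.Text = append(t.Text, 0)                   // grow by one
	copy(t.Text[t.Cursor+1:], t.Text[t.Cursor:]) // shift the tail right
	t.Text[t.Cursor] = r
	t.Cursor++
}

// Backspace deletes the rune before the cursor, if there is one.
func (t *TextInput) Backspace() {
	if t.Cursor == 0 {
		return
	}
	t.Text = append(t.Text[:t.Cursor-1], t.Text[t.Cursor:]...)
	t.Cursor--
}

// Move shifts the cursor by delta, clamped to the bounds of the text.
func (t *TextInput) Move(delta int) {
	t.Cursor += delta
	if t.Cursor < 0 {
		t.Cursor = 0
	}
	if t.Cursor > len(t.Text) {
		t.Cursor = len(t.Text)
	}
}

func (t *TextInput) String() string { return string(t.Text) }

func main() {
	input := &TextInput{}
	for _, r := range "htp://example.com" {
		input.Insert(r)
	}
	input.Move(-len("p://example.com")) // arrow-left back to just after "ht"
	input.Insert('t')                   // fix the typo
	fmt.Println(input.String())
}
```

Storing the text as a slice of runes rather than bytes keeps the cursor index meaningful for non-ASCII characters.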
Known Issues
- The coordinate system provided by Pixel is a little crazy: (0, 0) seems to be in the bottom-left corner of the window. Changing this would make the code substantially simpler.
- <g>s don’t respect the ordering of their children, due to XML parser weirdness.
- It uses 15% CPU when idle, since it constantly draws the scene at 60FPS, whether anything has changed or not.
- There are barely any unit tests.
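On the coordinate system issue: if documents use SVG-style coordinates (origin at the top left, y growing downward) and the window’s origin is at the bottom left, one option is to convert at a single boundary instead of throughout the code. The helpers below are a hypothetical sketch of that conversion; they are not part of Pixel or JankyBrowser.

```go
package main

import "fmt"

// Point is a 2D position.
type Point struct{ X, Y float64 }

// FlipY converts a point between a top-left origin (y grows downward) and
// a bottom-left origin (y grows upward) for a window of the given height.
// The same function converts in both directions.
func FlipY(p Point, windowHeight float64) Point {
	return Point{X: p.X, Y: windowHeight - p.Y}
}

// RectOrigin takes a rectangle given by its top-left corner and height in
// top-left coordinates and returns its bottom-left corner in bottom-left
// coordinates, which is the corner a y-up drawing API would typically want.
func RectOrigin(topLeft Point, height, windowHeight float64) Point {
	return Point{X: topLeft.X, Y: windowHeight - topLeft.Y - height}
}

func main() {
	const windowHeight = 600
	fmt.Println(FlipY(Point{X: 10, Y: 20}, windowHeight))
	fmt.Println(RectOrigin(Point{X: 10, Y: 20}, 50, windowHeight))
}
```

Mouse positions reported by the window would go through the same flip in the other direction before being used for picking.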
Next Steps
Embed a JavaScript interpreter: I’ve written a language interpreter before, but never embedded it in a context like this. This will involve importing a JS engine (probably Otto, since it’s written in pure Go and has a nice API), figuring out how to expose the Go DOM structs to JS, and figuring out how to fit JS execution in with JankyBrowser’s rendering and event handling loop, including a basic event bubbling system.
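To give a sense of what a basic event bubbling system involves, here’s a speculative sketch in plain Go, with no JS engine involved. Element, Event, On, and Dispatch are all hypothetical names. An event starts at the clicked element and walks up through its ancestors, running handlers at each level until one of them stops propagation.

```go
package main

import "fmt"

// Event is a hypothetical DOM event.
type Event struct {
	Type    string
	Target  *Element // the element the event was originally dispatched on
	stopped bool
}

// StopPropagation prevents the event from reaching further ancestors.
func (e *Event) StopPropagation() { e.stopped = true }

// Handler is a callback registered for one event type.
type Handler func(e *Event)

// Element is a hypothetical DOM node with a parent pointer and handlers.
type Element struct {
	Name     string
	Parent   *Element
	handlers map[string][]Handler
}

// On registers a handler for the given event type.
func (el *Element) On(eventType string, h Handler) {
	if el.handlers == nil {
		el.handlers = make(map[string][]Handler)
	}
	el.handlers[eventType] = append(el.handlers[eventType], h)
}

// Dispatch runs handlers on the target, then on each ancestor in turn,
// stopping the upward walk once a handler calls StopPropagation.
func Dispatch(target *Element, eventType string) {
	e := &Event{Type: eventType, Target: target}
	for el := target; el != nil && !e.stopped; el = el.Parent {
		for _, h := range el.handlers[eventType] {
			h(e)
		}
	}
}

func main() {
	root := &Element{Name: "svg"}
	group := &Element{Name: "g", Parent: root}
	rect := &Element{Name: "rect", Parent: group}

	root.On("click", func(e *Event) { fmt.Println("svg saw click on", e.Target.Name) })
	rect.On("click", func(e *Event) { fmt.Println("rect clicked") })

	Dispatch(rect, "click")
}
```

With a JS engine embedded, the handlers would presumably be JS functions registered from script rather than Go closures, but the walk up the parent chain would look much the same.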
Maybe: HTML and layout. JankyBrowser currently goes straight from the DOM tree to the canvas; HTML adds another layer of complexity: the DOM tree must be laid out to an SVG-like “render tree”, in which everything is absolutely positioned and can be rendered. Implementing this might help me finally understand tricky HTML/CSS layout concepts like how to center things.
Maybe: Devtools. Web development would be practically impossible without the ability to introspect the DOM tree and see how it maps onto what shows up on the screen. It would be fun to implement this basic functionality, and maybe also show JS console.log output.