This is part two of a series on WebGL and creating an iris indicator from scratch. Read the first article here.
In this article, I will talk about setting up a new project to use WebGL. This includes state management, and an initial understanding of the OpenGL rendering pipeline. Next week, I will finally talk about actually drawing things.
A Primer on WebGL
With the background out of the way, I sat down to create my contraption from hell. What I already knew going into this project is that I would be needing an HTML canvas element, some WebGL context, and a lot of circle math. I also knew that WebGL is essentially just a browser-based version of OpenGL, so I knew I’d be dealing with shaders. But there were still a lot of unknowns that I will introduce along the way.
Let’s first start with the basics. Displaying things on a website is, today, among the easiest things you can do. Many children1 could code up a very simple HTML website that can be opened in a browser. Once they add an image, they are already drawing something (even though most of the work is being done by the browser). Displaying rendered graphics is comparable to displaying an image on a website, but with the added complexity that you have to create the image first.2
But how do you actually render something? Well, for that you need a graphics rendering engine. No surprise here. Today, there are a handful of big players in this game: Microsoft has DirectX, which effectively is what almost all games use to deliver next-gen photorealism yadda-yadda. Because DirectX is proprietary Windows-stuff, there is an alternative called Vulkan that can also render things, but which is also supported on Linux. Apple, meanwhile, has invested heavily in its own Apple-only solution called “Metal,” and has pushed it especially hard since introducing its ARM chips (“Apple Silicon”) in 2020. And, finally, there is also OpenGL, the “Open Graphics Library.” OpenGL is even older than DirectX and ensures that there’s always an open alternative to it.
A little more than a decade ago, the big browser vendors decided that they also wanted to enable web developers to do some graphics rendering on their websites. Was this a smart decision? I don’t know, but the end result is what we know as WebGL: a JavaScript implementation of OpenGL so that we can render silly little boxes in our favorite data-stealing browser. In 2017, it received its version 2.0, which is based on the newer OpenGL ES 3.0. And ever since, if you want to do some fancy 3D rendering in the browser, WebGL is what you use.
Using WebGL is actually quite simple: You first have to create a canvas:
<canvas id="webgl"></canvas>
Then, you access it in JavaScript, and retrieve the WebGL context from it:
const canvas = document.querySelector('#webgl')
const gl = canvas.getContext("webgl2") // Or "webgl" if you want the old API.
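By the way, both querySelector and getContext can return null — for example when the element is missing or the browser doesn’t support WebGL 2 — so in practice a guarded setup along these lines is a good idea (a minimal sketch; how you report the error is up to you):

// A minimal sketch of a guarded setup.
const canvas = document.querySelector<HTMLCanvasElement>('#webgl')
if (canvas === null) {
  throw new Error('Could not find the canvas element')
}

const gl = canvas.getContext('webgl2')
if (gl === null) {
  throw new Error('This browser does not support WebGL 2')
}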
Now you can use this gl object to draw something! So simple!
… is what I would say if it was. But initializing WebGL is actually the simplest and most straightforward step of them all. And at this point, you haven’t even thought about drawing anything onto the canvas.
Setting up WebGL
Once you have the WebGL context, the fun only just begins. Now you have to configure it, set everything up, and perform quite a lot of work before you can even draw anything. However … what should you even configure? Because I had no prior experience with WebGL or OpenGL at all, I heavily consulted WebGLFundamentals.org and, later on, WebGL2Fundamentals.org. For the bloom filter and MSAA, I consulted LearnOpenGL.com. Big kudos to all contributors to these pages, because without those comprehensive “from zero to hero” guides, it would have taken me much, much longer to make my dream a reality.
So, how should we start? Before setting up WebGL directly, we need to think about the project structure more generally. There is a lot of work to be performed, and you will need quite a bit of state management to achieve the desired result. So before writing a single line of code, let’s talk a bit about the fundamentals of any dynamic rendering engine and, by extension, how to organize the different parts.
State Management
Let’s start with understanding how to actually render something to the screen, and let’s use a video game analogy, because that is easier to understand. In every video game, you have two things at the same time: the state of your game world, and the rendering pipeline. For example, in Civilization, you have the current camera position, a zoom level, and the various cities and units of the players. This is the game state, and it is, at first, independent of the rendering state. It only tells us where things are, but not yet how to display them. In a first-person shooter, you’d likewise have a camera position (which equals the player position), the same for a bunch of other players, the positions of all the objects on the map, and so on.
All of this is game state, because it has an influence on the game itself. How it’s rendered is a question detached from this. Whether, say, an AI player defeats some unit of yours in Civilization is not a question the renderer has to handle. The renderer should just display that state. The renderer of course also has some state of its own, but that only includes settings such as the rendering resolution, texture sizes, etc.
Now, what does this have to do with rendering a simple indicator? Quite a lot, it turns out. Whenever you have any form of dynamic rendering, you’ll actually want to maintain two states: one for whatever it is you are rendering, and one for the actual rendering process. This principle applies both to entire games and to simple animations like the indicator we want to implement. How do I know? Well, because at first I thought “Hah, it’s just a simple animation, how difficult can it be?,” and, as you already know, I was very, very wrong.
So we first create a class to manage the non-rendering state, and then a second class for all the WebGL work:
class WebGLEngine {
  constructor (private readonly gl: WebGL2RenderingContext) {}
}

class IrisIndicator {
  private engine: WebGLEngine

  constructor (gl: WebGL2RenderingContext) {
    this.engine = new WebGLEngine(gl)
  }
}
Note that I’ll be using TypeScript throughout the article series. I did start writing everything in JavaScript (because I don’t like having to set up a build pipeline for small projects), but it really turned out to be a big mistake as I was writing the code.
This looks like a very simple setup. We have one class, IrisIndicator, that will contain all the code for managing our state, and a class WebGLEngine, where I want to centralize all the nitty-gritty of the WebGL code. During rendering, the indicator will derive from its state whatever data is necessary to produce a frame and pass it to the engine, which in turn maintains whatever objects, buffers, and what not else are needed to actually do the rendering.
One thing to note is that there is an explicit hierarchy between these two classes: Each IrisIndicator contains one WebGLEngine. The reason is that the WebGL engine should simply draw whatever state the indicator has, but never vice versa. The rendering engine is thus dependent on whatever the Iris indicator contains. By encapsulating the renderer inside the indicator, we ensure that anyone actually instantiating an indicator will, in the future, only have to tell it how many segments to render. Everything will be abstracted away in the rendering logic that we are going to write across this series.
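To make that hierarchy concrete, here is a rough sketch of how the indicator is meant to be used from the outside. It assumes the gl context from the setup above; a render() method doesn’t exist yet and is purely hypothetical:

// The caller only ever talks to the indicator; the WebGLEngine stays an
// internal implementation detail that is never handed out.
const indicator = new IrisIndicator(gl)

// Hypothetical future API: rendering a frame would internally delegate all
// actual WebGL work to the encapsulated engine.
// indicator.render()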
Further, note that all of what I am presenting you in these articles is the end result. I’m skipping over all my unlucky experiments and bad decisions.
The Minimal WebGL Setup
With this first architectural decision out of the way, we can do the minimal setup we need for WebGL. And that minimal setup is actually quite … minimal. There is only one requirement for any WebGL rendering process: It requires a program, and such a program consists of one vertex shader and one fragment shader. That’s it. Everything else is optional (at least as long as you don’t need to produce any actual output).
So let’s do so.
First, we need a vertex shader. An (almost) minimal version of such a vertex shader could look like this:
#version 300 es

in vec2 a_position;
uniform vec2 u_resolution;
out vec2 v_texcoord;

void main () {
  vec2 normalized = a_position / u_resolution; // [0; 1]
  vec2 scaled = normalized * 2.0;              // [0; 2]
  vec2 centered = scaled - 1.0;                // [-1; +1]
  vec2 clipSpacePx = centered * vec2(1, -1);   // Flip y-coordinates
  gl_Position = vec4(clipSpacePx, 0, 1);
  v_texcoord = clipSpacePx * 0.5 + 0.5;
}
Next, a fragment shader. A very minimal version of that can look like this:
#version 300 es
precision highp float;

in vec2 v_texcoord;
uniform sampler2D u_texture;
out vec4 fragColor;

void main () {
  fragColor = texture(u_texture, v_texcoord);
}
Finally, we have to, quite literally, compile these two shaders onto the GPU. This can be done in a few steps:
function compileShader (gl: WebGL2RenderingContext, type: 'vertex'|'fragment', source: string): WebGLShader {
  const shader = gl.createShader(type === 'vertex' ? gl.VERTEX_SHADER : gl.FRAGMENT_SHADER)

  if (shader === null) {
    throw new Error('Could not create shader from WebGL Context!')
  }

  gl.shaderSource(shader, source)
  gl.compileShader(shader)

  const success = gl.getShaderParameter(shader, gl.COMPILE_STATUS)
  if (success) {
    return shader
  }

  const msg = `Error compiling "${type}" shader: ${gl.getShaderInfoLog(shader)}`
  gl.deleteShader(shader)
  throw new Error(msg)
}
This is a utility function that I’ve adapted from WebGLFundamentals. Essentially, what it does is compile one of the two shaders and return it. Some things to note:
- You can only create two types of shaders: vertex and fragment shaders. I’m passing in a string literal for that, just because it is easier to handle. WebGL inherits from OpenGL its extensive use of flags and constants, and the types that TypeScript provides for WebGL are a bit lacking. You could absolutely pass the WebGL constant directly instead.
- Next, the operations with WebGL can sometimes seem a bit redundant. Why can’t we directly create a fully compiled shader by calling a single function, passing it both the type and the source code? Well, because you can actually provide new source code to a shader on the fly and then re-compile it (see the short sketch after this list). Do we need this for such a simple application? Absolutely not. But as soon as we enter complex game territory, this ability of WebGL to re-use shader objects may come in handy. I don’t know, because I’m not keen on developing an entire engine.
- Because WebGL heavily inherits from OpenGL, it doesn’t make use of the JavaScript way™ of throwing errors. Rather, you have to implement the error handling yourself: call a function that checks some result, and throw an error yourself if it indicates failure. Also, you have to call getShaderInfoLog before you delete the shader, because otherwise the error log gets deleted along with it, which is why you can’t throw the error in a single line.
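For completeness, here is a hedged sketch of what that shader re-use could look like — we never actually need it in this project:

// Sketch only: WebGL lets you swap new GLSL source into an existing shader
// object and compile it again in place.
function recompileShader (gl: WebGL2RenderingContext, shader: WebGLShader, newSource: string): void {
  gl.shaderSource(shader, newSource) // replace the source on the same shader object
  gl.compileShader(shader)           // compile it again
}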
Because there is a bit of logic involved, it makes sense to create a dedicated utility function for that. Once we have the shaders at hand, we can create a program out of them. This looks almost identical to the shader compiling, with the small difference that, while you compile a shader, you link a program. That’s simply some system programming terminology and not terribly relevant for us here. Again, the function is courtesy of WebGLFundamentals and I adapted it slightly.
function compileProgram (gl: WebGL2RenderingContext, vertexShader: WebGLShader, fragmentShader: WebGLShader): WebGLProgram {
  const program = gl.createProgram()

  gl.attachShader(program, vertexShader)
  gl.attachShader(program, fragmentShader)
  gl.linkProgram(program)

  const success = gl.getProgramParameter(program, gl.LINK_STATUS)
  if (success) {
    return program
  }

  const msg = `Could not link program: ${gl.getProgramInfoLog(program)}`
  gl.deleteProgram(program)
  throw new Error(msg)
}
Now, we can save that program for reference, and tell WebGL to use this little program of ours:
const vertexShader = compileShader(this.gl, 'vertex', vertexShaderSource)
const fragmentShader = compileShader(this.gl, 'fragment', fragmentShaderSource)

this.program = compileProgram(this.gl, vertexShader, fragmentShader)
this.gl.useProgram(this.program)
And that’s it! At this point, we have told WebGL that, whenever we want to draw something, it should use our program, which consists of our two shaders. However, as you can see, this is already quite a lot of code, and we still haven’t drawn anything onto the screen. Also, these shaders aren’t written in JavaScript (or TypeScript, for that matter), but in GLSL, the OpenGL Shading Language. And they do quite a bit of the work. But what makes them even harder to understand is that OpenGL itself also does quite a lot of work in between the two.
So next, let’s talk a bit about the rendering pipeline and what happens when you actually draw something onto the screen. This knowledge will come in handy as we continue.
Understanding OpenGL’s Rendering Pipeline, Part One
In order to draw something, the following steps have to happen:
- Your engine needs to calculate the positions of all your objects in the 3D-world. Then, it passes those positions to the rendering engine.
- The rendering engine then passes these positions to OpenGL, and tells it to render these objects.
- OpenGL then calls the vertex shader for each element in the positions that you have passed. The vertex shader is responsible for taking the position of a thing in the world and transforming it into what is known as “clip space” (which is essentially a coordinate system that ranges from $-1$ to $+1$ in the $x$ and $y$ direction). The vertex shader must return these clip positions. In our case, the shader performs a relatively simple normalization: It expects absolute pixel positions, transforms them into the domain $[0; 1]$, scales them to $[0; 2]$, subtracts $1.0$ to convert them into $[-1; +1]$, and finally multiplies the coordinate with vec2(1, -1), which flips the $y$-axis (see the sketch after this list). The last step is only required for WebGL, because OpenGL treats the $y$-axis as incrementing from bottom to top, while HTML canvas elements treat it as incrementing from top to bottom. If you’re not writing OpenGL code for the web, you won’t have to do that.
- Then, OpenGL does a trick behind the scenes: It takes the vertex positions produced by the vertex shader and looks at the output you want to draw onto. It then calculates all the pixels that are covered by the shape those vertices describe, and runs the fragment shader on each of them. The fragment shader then has the task of calculating the color of the provided pixel, which it can do in the simplest case by looking up a position on a texture. (We will actually be computing the colors later.)
- This color is then what gets applied to the correct pixel in whatever you’re drawing onto.
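To make the vertex shader’s math from step three tangible, here is a CPU-side TypeScript sketch of the very same transformation. This is purely illustrative — it is not code the project needs:

// A CPU-side sketch of the pixel-to-clip-space math the vertex shader performs.
function pixelToClipSpace (x: number, y: number, width: number, height: number): [number, number] {
  const normalizedX = x / width                // [0; 1]
  const normalizedY = y / height               // [0; 1]
  const clipX = normalizedX * 2.0 - 1.0        // [-1; +1]
  const clipY = (normalizedY * 2.0 - 1.0) * -1 // [-1; +1], y-axis flipped for WebGL
  return [clipX, clipY]
}

// The center of an 800×600 canvas ends up at the origin of clip space:
console.log(pixelToClipSpace(400, 300, 800, 600)) // [0, -0], i.e. effectively [0, 0]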
This is quite something to unpack.
First, why do we even need a vertex shader if all it does is transform some position into a coordinate system of $-1$ to $+1$? Can’t you simply provide all the positions in the correct coordinate space to begin with? Well, yes, you certainly can. But you shouldn’t. Why? Well, here we can return to our distinction between the general engine and the rendering engine. Recall that the general engine keeps track of all the positions of objects in your world. But you usually can move the camera around. And that means that all object positions will move accordingly, but only from the perspective of the camera. They don’t actually move in the “game world.”
Now, you could absolutely re-calculate all object positions relative to the camera in JavaScript and only pass the final positions to the rendering engine, making the vertex shader kind of redundant. However, this is a computationally heavy task, since every single vertex has to be moved. What you commonly do in graphics rendering instead is to have a set of matrices. One matrix is used to translate every vertex uniformly. When you move the camera left or right, you only update the translation, which (in 2D) is just two numbers. Then you provide that matrix to the vertex shader and make use of a convenient property of GPUs, or graphics processing units: They can heavily parallelize tasks. Which means: If you have thousands of positions to translate, letting the vertex shader do the translation of each vertex is much more efficient. In JavaScript, you can only translate one vertex at a time, but the vertex shader can translate as many vertices as you have compute cores on your GPU. Usually, these are … quite many.
If you have ever seen the technical details of a GPU, you’ll probably have noticed that these graphics cards nowadays have many, many more computing cores than normal CPUs.3 Which means: If you have a GPU with 1,000 cores, and you have a thousand vertices to translate, your GPU can essentially run the vertex shader once on each core, and it will be done in a single iteration. In addition, you won’t have to update your object positions if only your camera position changes. You only have to provide a single changed matrix to your vertex shader and let your GPU do the work.
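To make the difference tangible, here is a hedged TypeScript sketch of both approaches. The u_translation uniform is hypothetical — the shaders above don’t declare one — it only illustrates the idea:

// CPU-side: JavaScript has to touch every single vertex whenever anything moves.
function translateAllVerticesOnCpu (positions: Float32Array, dx: number, dy: number): void {
  for (let i = 0; i < positions.length; i += 2) {
    positions[i] += dx     // x-component
    positions[i + 1] += dy // y-component
  }
}

// GPU-side: upload just two numbers; the vertex shader then applies them to
// every vertex in parallel.
function setTranslationOnGpu (gl: WebGL2RenderingContext, program: WebGLProgram, dx: number, dy: number): void {
  const location = gl.getUniformLocation(program, 'u_translation') // hypothetical uniform
  gl.uniform2f(location, dx, dy)
}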
There is no strict boundary as to which computations you should do in JavaScript and which on the GPU. But if you ever run into a bottleneck, you can probably do some performance benchmarking to figure out what the GPU should be doing, and what your JavaScript code should be doing. But again, we’re only rendering a few simple shapes, so I’m not going to do that.
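Just for reference, a crude first measurement of the JavaScript side could look something like this. It is purely illustrative; transformAllVertices stands in for whatever per-vertex work your code would do:

// Hypothetical placeholder for the per-vertex work done in JavaScript.
declare function transformAllVertices (positions: Float32Array): void

const positions = new Float32Array(2_000_000) // one million 2D vertices
const start = performance.now()
transformAllVertices(positions)
console.log(`CPU-side transform took ${(performance.now() - start).toFixed(2)} ms`)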
The other thing to unpack is what happens in between the vertex and fragment shaders, because it is difficult to understand — you can’t really “see” it in the code. It’s important to understand that the vertex shader still operates in the world of vector graphics. What comes out of the vertex shader is still a set of vertices connected by lines, and these lines can be described by mathematical formulas. What OpenGL then does with these vertices is rasterize them.
Rasterization is the process of taking resolution-independent vector graphics and turning them into … well, pixels. For that, OpenGL will check each individual pixel on whatever you’re drawing onto and ask: “Does the shape described by these vertices touch this pixel?” If it does, it will remember that position. Once OpenGL has checked every single pixel, it will then — and only then — start up the fragment shader. The fragment shader gets a pixel and has the task of calculating a color for it. In other words, the fragment shader will never run for a pixel that is not covered by any shape. What you see is the vertex shader and the fragment shader, and you see that there is some data being passed around, but the entire work that OpenGL does in between these two shaders is hidden from view. I found that very difficult to understand.
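To make that hidden step a little more concrete, here is a purely conceptual TypeScript sketch of what happens between the two shaders. This is emphatically not how OpenGL is implemented — coversPixel, runFragmentShader, and writePixel are hypothetical placeholders — it only illustrates that the fragment shader runs once per covered pixel:

// Hypothetical placeholders for the rasterizer's coverage test, the compiled
// fragment shader, and the write into the output.
declare function coversPixel (x: number, y: number): boolean
declare function runFragmentShader (x: number, y: number): [number, number, number, number]
declare function writePixel (x: number, y: number, color: [number, number, number, number]): void

function rasterizeConceptually (width: number, height: number): void {
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (!coversPixel(x, y)) {
        continue // the fragment shader never runs for pixels that aren't covered
      }
      const color = runFragmentShader(x, y) // RGBA for exactly this pixel
      writePixel(x, y, color)               // store it in the output
    }
  }
}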
Anyways, that is part 1 of understanding the rendering pipeline. (There will be a second and a third part, which involve understanding read and draw framebuffers, the back buffer and front buffer, and renderbuffers, but we’ll get to that later.)
Final Thoughts
At this point, you should have an understanding of the basic state management and setup of WebGL, so that we can turn to finally drawing things onto the screen. Since we’re already 3,000 words into this single article, I’ll keep the suspense up for the next article, where we will actually draw things. So stay tuned!
1 I’m not saying “every child” because I want to avoid the hated discussions about “today’s youth” and what they all can’t do anymore. I could do this as a child, and so could you have.
2 Insert some cue to “To make apple pie, you first have to invent the universe” here.
3 As a reference, the GeForce RTX 5090 has, according to its technical specification, 21,760 shader units. This means that it can run 21,760 shader calculations in parallel.