<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Personal Research Blog | Hendrik Erz</title>
<generator>WinterCMS Winter Blog Plugin</generator>

<!-- NON-STANDARD ADDITIONS BY MYSELF -->
<icon>https://www.hendrik-erz.de/themes/main/assets/images/apple-touch-icon.png</icon>
<subtitle>By Hendrik Erz</subtitle>
<logo>https://www.hendrik-erz.de/storage/app/media/hendrikerzde_socialmedia.png</logo>


<link rel="alternate" type="text/html" href="https://www.hendrik-erz.de/blog" />
<link rel="self" type="application/atom+xml" href="https://www.hendrik-erz.de/feed.xml" />
<id>https://www.hendrik-erz.de/blog</id>

<updated>2026-06-14T15:51:23+00:00</updated>

<entry>
  <title>Four Uncomfortable Truths About the Impending Collapse of AI</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2026, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/four-uncomfortable-truths-about-the-impeding-collapse-of-ai" />
  <id>https://www.hendrik-erz.de/post/four-uncomfortable-truths-about-the-impeding-collapse-of-ai</id>
  <published>2026-06-12T07:00:00+00:00</published>
  <updated>2026-06-12T16:07:16+00:00</updated>
  <summary type="html"><![CDATA[The &quot;AI Bubble&quot; will collapse. That is not a question of &quot;if,&quot; but a question of &quot;when.&quot; However, I have seen several people arguing that, once the AI bubble bursts, everything will go back to normal. But this is very unlikely to happen. In this article, I outline four uncomfortable truths which everyone – apologists and critics – will have to come to terms with.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/four-uncomfortable-truths-about-the-impeding-collapse-of-ai">
    <![CDATA[<p>The cracks are starting to form, and they’re visible. After GitHub introduced metered pricing to its Copilot subscription models, developers across the board realized just how much text that model generated for each request. And soon, they had to realize that the golden times of vibe coding were over. No more letting LLMs run rampant on a code base for hours on end to implement features or chase bugs. Now it’s all about being efficient, ensuring that the LLM doesn’t burn too many tokens so that the allowance lasts until the end of the month.</p>
<p>It’s kind of mindboggling to think about that: You write a bunch of text, submit it to some machine learning model, and the provider bills you based on how many words it generated. But there is no way for you – or the provider, for that matter! – to really control how many tokens it will generate. Instead, you have to think hard about your prompt, and pray to the gods of probabilistic computing that the model does not think for too long. You can control what you provide to the model, but you have no way of knowing whether the model will accidentally enter an infinite loop and eat up your credit without doing anything of what you’ve asked it to.</p>
<p>But I digress. This is not about the insanity that is connected with the widespread adoption of LLMs. Instead, in this article, I want to provide a prognosis of what is going to happen going forward. And regardless of whether you’re a vibe-coding apologist or an “LLM-Luddite,”<sup id="fnref:1"><a class="footnote-ref" href="#fn:1" role="doc-noteref">1</a></sup> you are not going to like it.</p>
<h2>The AI Bubble is Popping</h2>
<p>First things first: The AI bubble is currently starting to crack. You don’t have to be a fervent reader of Ed Zitron to know that: AI companies are slowly running out of VC money; data center builds are postponed; all companies are tightening the thumb screws around their usage plans. In a year, give or take, the AI bubble will fully burst, and it will take down a lot of startups with it. But not all of them, and that’s the first uncomfortable truth.</p>
<p>None of the “big names” will go out of business. OpenAI will probably stay around due to market penetration. Anthropic, too, but for developers. Microsoft, Google, and Meta will similarly stick around, because they have business models that also include non-AI services (read: advertisements). Even if the AI bubble exterminates 90 % of the AI market, none of the companies that we all know and love/hate will go down.</p>
<p>It helps to compare the AI bubble with the Dot-com bubble in the early 2000s. Microsoft was very entrenched in the Dot-com market, and it still survived. Just because something turns from free money printer to a normal business activity, this does not mean that the cause of it will go away. Or did we stop living in houses after the 2007 subprime crisis?</p>
<p>Besides this big, primary point, there are a few other trends that will continue to flourish even post-bubble. And these are not positive trends. Instead, they are deeply worrisome. These are: (1) Software quality will stay at a degraded, lower standard; (2) There will not be any meaningful post-crisis re-hiring of laid off software developers; and (3) Atrophy of both skills and knowledge will continue to stay an issue for decades.</p>
<h2>Atrophy of Skills and Knowledge will Continue</h2>
<p>There have been debates that LLM usage deteriorates skills and knowledge. If you externalize any thought process to an LLM, you won’t be keeping your mind sharp. The more we rely on LLMs for basic tasks, the less we will be capable of doing things by ourselves. And even if the bubble finally pops, this won’t suddenly make people crawl out of their caves and see the light. There will still be some LLM providers out there, even if there aren’t as many. And people will continue to rely on them for the tasks they have forgotten how to do.</p>
<p>This is the same principle as with Uber, Airbnb, Netflix, Instagram, or Spotify: No matter the enshittification of a service, once it is foundational to our social lives, it is impossible to fully get rid of it. The same will happen with LLM providers — once they have become muscle memory, people will continue to use them instead of (re)learning the necessary skills. Think of OpenAI as the next Spotify.</p>
<p>Skill atrophy is not going to go away. As long as there is a handy button for people to compose emails for them, many really won’t feel the need to write an email by themselves. And even if their favorite email client suddenly no longer has a button, I feel that many will reach for another provider rather than going through the — quite uncomfortable — experience of having to write an email with the correct tone and choice of words.</p>
<p>Because that’s the thing: Having to decide what to do, and rolling with that decision can be scary, depending on the stakes involved. And if there is something, like an LLM, that can do some of the work for you, it doesn’t feel as if you have to make the decision yourself. And this externalization can be quite comfortable. The main difference between LLMs and other technical breakthroughs in the past is that in the past technology has primarily made things easier, but still required knowledge to operate. Now, technology – for many people – automates the thought process itself.</p>
<h2>No Meaningful Re-Hiring of Software Developers</h2>
<p>And this atrophy is already re-shaping society. More specifically, if everyone can use an LLM for a task, this will gradually become the new norm. If you think, say, about an advertisement text, rather than letting an AI come up with a catchy slogan, you’ll possibly be more witty, but also slower than your competitors. And once AI-generated advertising has become the norm, there won’t be much use for people who can think about the problem anymore. Exactly the same will also happen to software developers, many of whom are currently being laid off.</p>
<p>Here’s the next uncomfortable truth: Even if all AI providers went bankrupt tomorrow, there would be no complete re-hiring of the software developers who have been laid off since the start of the AI bubble. There are two reasons for this. First, there had been a lot of over-hiring during the pandemic, because everyone suddenly relied on software for everything. Those “surplus” positions are gone for good.</p>
<p>But second, and more importantly, GPT models won’t suddenly stop working. As such, companies will continue to attempt to make their small core-team of developers achieve inhuman feats by pairing them with a school of agents to work on their software. Middle management has a reputation for good reason: They aren’t trained in any of the actual productive tasks that need to happen in a company. They are trained to keep the numbers looking good. And as long as one developer can work through the same amount of tasks as two with the help of some LLM, management will happily keep only one developer. Because green-lighting a new server to run some big model locally looks cheaper on balance sheets than a developer, who is a human being and has to be paid every year.</p>
<p>Both an AI server and a developer might cost $100,000. However, the developer will need to be paid every year, while the AI server <em>looks</em> as if it’s just a one-off price. I’m not going to elaborate on all the levels on which this is wrong, but you hopefully get the point.</p>
<p>But there is more. The balance sheets and the inhumane appeal of fixed capital over human capital for middle managers are just one part of the equation. What will really convince managers that they can continue to operate with a reduced team is lowered standards. Which leads me to the next point.</p>
<h3>Software Quality Will Likely Never Recover</h3>
<p>On social media, in personal chats, or in blog articles, many have already voiced concerns about a degradation in software quality. Vibe coding has objectively degraded the software quality of many products, and it has supercharged the principle “quantity over quality” for software, judging by the number of new Markdown editors/readers/writers in r/Markdown over the past twelve months.<sup id="fnref:2"><a class="footnote-ref" href="#fn:2" role="doc-noteref">2</a></sup></p>
<p>Unfortunately, this will be a persistent trend. Again, GPT models won’t stop working overnight, and you can do a surprising amount of coding with local models.<sup id="fnref:3"><a class="footnote-ref" href="#fn:3" role="doc-noteref">3</a></sup> The “tech enthusiast-to-vibe coder” pipeline will continue. As John Gruber <a href="https://daringfireball.net/linked/2026/06/04/the-ai-driven-resurgence-of-mac-app-development">recently quoted Jason Snell</a>, “if you can dream an app, you can probably build it.” And this will remain true even after the AI bubble bursts.</p>
<p>But just because some randos on the internet decided they want to become indie developers now doesn’t suffice as a reason for a general degradation in software quality. What is needed is a <em>general</em>, widespread degradation of software quality. Luckily, Microsoft is doing the lord’s work in this regard. Microsoft is really at the forefront of depressing the expectations of quality for software products. And it doesn’t matter where. Windows, VS Code, Active Directory, corporate licenses of Office — all of them are either stagnant, or actively degrading in software quality to a degree I have not witnessed in my entire life.</p>
<p>“What does this have to do with permanently lowered standards?” you may ask now. “Won’t users be able to tell good from bad?”</p>
<p>Let me tell you: We all <em>have</em> to interact with Microsoft products. We cannot avoid it. I am a Mac user, and have been for the past decade. But even I interact with at least one Microsoft product multiple times a day. Because I have no choice. Many people will just shrug and continue as they were. If you’re not a tech-savvy person, what are you going to do about it? Nothing. You can’t possibly know whether some bug is a stupidity that no sane developer would have let run past Quality Control, or whether you’re just doing it wrong.</p>
<p>This, gradually, leads to a lowering of standards that we, on a societal level, have towards software. And this will stick. Once people are “trained” to accept buggy or sluggish software, there is no incentive for anyone to improve their software. Because that doesn’t make money. Software standards are driven by large companies, not indie developers with a devotion to their craft. In turn, these lowered standards will empower companies to continue mandating AI coding agents for their developers.</p>
<p>The basic problem with the argument that software standards degraded is the notion of what a “standard” even is. “Standard” is whatever people see as the norm. And there is no objective baseline to determine what a “good” or “bad” standard is. A hundred years ago, the norm was that, if you were badly ill, you would just die. Nobody had an issue with that because there was no way to prevent that from happening. And vaccine acceptance shows that, even if it’s possible to prevent death, if people stop seeing the benefits of access to vaccines, they won’t care, until dying before the age of 18 suddenly becomes the norm again. The same phenomenon holds true for software.</p>
<p>Unless software degrades to a state where it quite literally prevents users from doing their job at all, this trend will continue. Because the hard truth is that, as long as people can somewhat do what they need to do with software, they won’t rebel. If a piece of software does its job, then — even if it is the most horrible experience you can have on this planet — people will accept it. Society does not operate on the principle of “I won’t use something that’s inferior.” It operates on the principle of “As long as it works, don’t touch it.”</p>
<h2>Final Thoughts</h2>
<p>When I talk with people or read discussions online about what people expect after the bubble pops, I see a lot of misconceptions about how the economy works. And I believe it is important to stay as realistic as possible. The AI frenzy has already altered society and the tech we use profoundly, and anything that has become culturally ingrained is much more likely to stick, even if the original cause of it vanishes. There is this famous notion in causal analysis that the cause that leads a phenomenon to emerge in the first place can be entirely disconnected from the cause that perpetuates this phenomenon. That’s why social changes remain “sticky.”</p>
<p>Being honest about the changes that AI has already introduced to society and which are likely to remain persistent even after the economy goes back to “normal” is crucial to manage expectations. Atrophy, no rehiring of software developers and a persistent quality degradation of software are not the only things that the AI hype has brought. But they are the most visible ones.</p>
<p>With all of that being said, I’m not an oracle, and as such all of what I wrote here might turn out wrong. I would wish for that. But I have lived through two global economic crises already, and when the AI bubble pops, it’s going to be somewhere in the ballpark of the Dot-com or subprime crisis. And I have seen which changes have stuck around. I sincerely do hope that I’m wrong, but I’m not optimistic.</p>
<div class="footnotes" role="doc-endnotes"><hr /><ol><li class="footnote" id="fn:1" role="doc-endnote"><p>Quick aside: What an odd accusation. First, Luddites were never against technology per se, they were against the devaluation of their work. And second, even the fiercest anti-LLM-people I’ve met — and believe me, the Zettlr user base is full of them — are enthusiastic about tech. This is not the burn you think it is.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:1" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:2" role="doc-endnote"><p>I have been playing with the thought of just leaving that Subreddit multiple times in the past six weeks, because it’s <em>that</em> bad.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:2" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:3" role="doc-endnote"><p>I’m currently test-driving OpenCode and will report back once I have some more stable numbers and usage examples.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:3" role="doc-backlink">↩</a></p></li></ol></div>]]>
  </content>
</entry>
<entry>
  <title>The End of General and Free Public Education</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2026, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/the-end-of-general-and-free-public-education" />
  <id>https://www.hendrik-erz.de/post/the-end-of-general-and-free-public-education</id>
  <published>2026-06-05T07:00:00+00:00</published>
  <updated>2026-06-04T22:25:17+00:00</updated>
  <summary type="html"><![CDATA[Recently, the German education minister publicly stated that German Student Assistance, known as BAföG, does not need to be increased to counter the current cost-of-living crisis. This has drawn vocal criticism from across the political spectrum. I take this incident as a chance to focus on a bigger trend: Educational Retrenchment across all of Europe is rising, not just in Germany. Higher education is slowly turning from a public good to a privilege for wealthy families, reversing a trend started in the 1960s.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/the-end-of-general-and-free-public-education">
    <![CDATA[<p>The fact that education is publicly accessible, (mostly) free, and generally available to all members of society is younger than you probably think. If you are a U.S. resident, you may even have raised an eyebrow at the title. In the memory of my entire generation, the millennials, education in the U.S. was <em>always</em> connected with loads of student debt. But in Europe, we had this crazy time when education was practically free. But that time was very short. And we are currently witnessing the beginning of the end of general and free public education (in Europe).</p>
<p>Now, I’m not a scholar of education, so I’m not going to present a detailed history of that.<sup id="fnref:1"><a class="footnote-ref" href="#fn:1" role="doc-noteref">1</a></sup> But a few key points need to be in place for what is to follow. First, I’m talking about <em>higher</em> education specifically, that is, university education. Second, I’m only going to focus on Europe, because it is here where higher education went from privilege to general admission, and where it is currently turning back to a privilege. And third, I’ll mostly focus on Germany, because (a) I know the system; and (b) because <em>of course</em> some German politician has said something stupid again.</p>
<h2>The Origins of German Student Assistance (“BAföG”)</h2>
<p>In the mid-twentieth century, Germany had a brilliant idea: Education is a public good, and we can gain economic strength by focusing not on manufacturing, but instead on high-tech. This also coincided with the deindustrialization of the <a href="https://en.wikipedia.org/wiki/Ruhr">Ruhr valley</a>, which saw millions of former mine worker families in economic distress.<sup id="fnref:2"><a class="footnote-ref" href="#fn:2" role="doc-noteref">2</a></sup> To pull this off, however, German citizens needed to be much more educated than they were after World War II.</p>
<p>For most of European history, education, and university education in particular, was a privilege of the rich. First, tuition is no joke, and European universities, just like U.S. universities, charged quite a lot for the privilege to be educated beyond the basics.<sup id="fnref:3"><a class="footnote-ref" href="#fn:3" role="doc-noteref">3</a></sup> But second, and more importantly, you couldn’t really earn money and enter the workforce while studying. While most workers typically enter the labor market at around 15–17 years of age, university graduates don’t start earning money until their mid-twenties. This places economic strain on the families of university students, who have to support them while they cannot yet earn money. And that’s when “BAföG” was born. If you know German colleagues, they may have mentioned this term before. BAföG<sup id="fnref:4"><a class="footnote-ref" href="#fn:4" role="doc-noteref">4</a></sup> is a law that essentially made public education free for anyone who didn’t have the means to support themselves — read: working class people. For the transformation Germany was planning, just letting the privileged educate themselves wasn’t going to cut it.</p>
<p>If you want to educate large parts of a population, you also need the capacity. You require a broad net of universities that could absorb many more students than the existing ones. Luckily for Germany, a third trend coincided with the push for higher education and deindustrialization — a university reform movement. If you take a look at <a href="https://en.wikipedia.org/wiki/List_of_universities_in_Germany#List">the list of German universities by founding date</a>, you will see that the mode of founding dates is around the 1960s and 1970s. That’s when many universities — such as Bielefeld, Bochum, Dortmund, or Düsseldorf — were founded in an attempt to unlock higher education for the masses. And this movement wasn’t restricted to Germany — my current employer, Linköping University, was also founded during that time, albeit for slightly different reasons.</p>
<p>Back to the student loans. Initially, BAföG was both a complete gift by the government and given equally to everybody (including those without a need).<sup id="fnref:5"><a class="footnote-ref" href="#fn:5" role="doc-noteref">5</a></sup> Lawmakers quickly adjusted the law to decrease its big economic footprint, however. They added requirements where the applicant for BAföG had to demonstrate the inability of their parents to support them while studying. And then, the government required recipients to pay back parts of the grant, turning it from a gift into a loan — albeit without any interest, and capped at a maximum amount.</p>
<p>This is essentially the setup that has remained until today. There is generally a willingness to increase the amount of BAföG each student can receive in loose coupling with the rate of inflation, but the perceived quantity of BAföG seems to have declined. Today, there are two big critiques of BAföG: First, there is a gap between the point at which your parents earn enough that you are no longer eligible to receive BAföG assistance, and the point at which your parents would actually be able to assist you. And second, any increases in the BAföG allowance have been dwarfed by the current crises from cost-of-living to rent to food to inflation to your average Netflix subscription.</p>
<h2>Working While Studying</h2>
<p>And this is the primary point that I want to make today: Despite all the support European governments have provided to lift working class people into the educated academic class, doing so no longer appears to be a priority. If education becomes so expensive that only wealthy families can afford to send their kids to university, we are back to a system of privilege. Recently, the German minister for education, Dorothee Bär, made a statement that left my German colleagues and me enraged. Essentially, she said that working alongside attending university is a mere “nice to have,” not a necessity.</p>
<p><em>Au contraire.</em></p>
<p>Now, I can’t speak for the current state of BAföG, since I haven’t received any since 2017, but I can certainly say that, even ten years ago, it was a <em>requirement</em> that I worked on the side. Not a nice-to-have.</p>
<p>You see, I come from a very poor family. My mother had been on-and-off employed for as long as I can remember, my father was in perpetual debt; my grandmother was the classical housewife supported by my grandfather who was an electrician. The fact that I could even attend school until the highest school degree (Abitur) was expensive.<sup id="fnref:8"><a class="footnote-ref" href="#fn:8" role="doc-noteref">6</a></sup> But my grandparents were fierce believers in social upward mobility and the <a href="https://en.wikipedia.org/wiki/Humboldtian_model_of_higher_education">Humboldtian ideal for education</a>, and so they made it happen.</p>
<p>When I started studying history in 2011, it was a basic fact of life that I would need to apply for BAföG, and it was considered a given that I would get the maximum amount. The first two or three months at university were dire — I had no savings, and the application needed processing time. After I received my first BAföG payment, the situation was more stable. But even with the maximum amount of BAföG – a little over €600 – it barely covered rent and necessities. And, mind you, that was at a time when my rent for rooms in shared flats<sup id="fnref:9"><a class="footnote-ref" href="#fn:9" role="doc-noteref">7</a></sup> didn’t exceed €250–€300. I’ve heard that nowadays rent is closer to €600 (roughly what I got <em>in total</em> back then).<sup id="fnref:10"><a class="footnote-ref" href="#fn:10" role="doc-noteref">8</a></sup></p>
<p>So I used the ability to work up to 20 hours a week to get a bit more money and be able to have some free time expenses (you know, like going out once in a while). I was very lucky with the type of jobs I got, because all of them were comparatively highly paid. This meant that instead of 20 hours I typically worked only 10 to 12 each week. Also, my jobs were all close to my studies, which means that I didn’t have to deal with internal alienation from the job. But again, I couldn’t have participated in most activities considered crucial during my studies<sup id="fnref:11"><a class="footnote-ref" href="#fn:11" role="doc-noteref">9</a></sup> if I didn’t work on the side.</p>
<h2>The Bigger Picture: Educational Retrenchment</h2>
<p>However, I believe that this is merely part of a larger trend. I don’t think that the current stress in ensuring educational opportunities for all is caused just by high prices. Rather, I think that high prices right now serve as a good backdrop to reduce funding without <em>nominally</em> reducing it. This all fits into a broader trend where universities are silently being turned back from general civic educational institutions to privileged clubs.</p>
<p>You see, you can defund government programs in two ways: First, you can cut them outright. That is typically the most straight-forward option, but it’s also very unpopular. Second, you can simply shift money around. This is less unpopular because you can always argue that you’re funding something else important. (Paul Pierson has done <a href="https://search.worldcat.org/title/698653999">great work in analyzing this phenomenon called “retrenchment”</a> in the U.S. context.) With university funding in general, it appears to me like governments across Europe are neither cutting nor shifting money around. Instead, they essentially let the economy do the work for them. By doing nothing during a time of high inflation and a cost-of-living crisis, they effectively cut, or retrench the programs.</p>
<p>The fact that BAföG is considered less and less viable for ensuring university access for all is just one part of a broader trend. Last year, when I finally was able to start applying for research grants myself, the acceptance rates across Swedish science funding bodies collapsed from sometimes up to 20 % to below 10 %. A program that I was really hoping to get into and which had a decent acceptance rate, suddenly plummeted to 8 %. In addition, the European Union has effectively <a href="https://erc.europa.eu/news-events/news/erc-scientific-council-readjusts-rules-reapplication">“rate limited” access to grants</a> to reduce pressure on its review boards. Money is tight everywhere, and at every position in the academic hierarchy.</p>
<p>Students struggle more with less BAföG (or other forms of government support in different European countries); prospective PhD students struggle with much more competition for a declining number of positions (I heard of a PhD position with 600 applicants); and above that, it feels more like the Hunger Games than trying to compete for ideas.</p>
<p>Academia has entered a race to the bottom. Full-time contracts haven’t been the norm for years, people fight for scraps, and we see a general contraction of university employment.</p>
<p>Culturally, this appears to be caused by a dwindling sense of the usefulness of academia among the powers that be. When I was in undergrad, there were large-scale cuts in university funding. Two fundamental trends were set in motion back then. First, a dismantlement of what the German public called “orchid subjects” — that is, subjects with little perceived value. One of these “orchid subjects” that got cut at my university was Ukrainian cultural history.<sup id="fnref:12"><a class="footnote-ref" href="#fn:12" role="doc-noteref">10</a></sup> But second, and more importantly, the basic funding of universities (that is, money that is not tied to specific projects) got reduced just enough so that universities were only able to fund their administrative staff from the money. Research positions had to be funded almost exclusively via third-party projects.</p>
<p>This is the connective tissue between dwindling funding for faculty positions and the silent cuts for students. German politicians have figured out that cutting or restricting funding for universities is hugely unpopular among the still large academic community in Germany. The most recent of these backlashes came to be known under the hashtag <code>#IchBinHanna</code>. So when it comes to BAföG, they don’t do any cutting, and instead let the worsening economic situation take over.</p>
<p>The effect is clear: the numbers of university admissions will go down, as those who are most dependent on external funding decide against a university degree first. The number of faculty members goes down as well because there is simply no money to employ all of those who deserve it. This is a vicious circle. Fewer students imply less funding for teaching from the state, which means tighter budgets, which means less ability to hire staff … you get the point. There are some complicating factors such as high pressure from the United States, which essentially goes through the same transition, albeit with much more force. In the end, university education will slowly but steadily go back to a privilege that only wealthy families will enjoy.</p>
<h2>The End of General and Free Public Education</h2>
<p>There are three reasons why I wrote all of this. First, because that statement which caused so much critique on social media is an epitome of the dismissive view of many policymakers towards higher education in general. Academia does not produce immediate value, and as such is seen more as a burden than an investment in the future. In times of high economic uncertainty, anything that was once considered self-evident gets thrown overboard (see also <a href="https://osf.io/preprints/socarxiv/fxrzk_v1">my preprint which essentially shows just that</a>). Especially those who come from underprivileged contexts will take issue with such statements, because they have experienced first-hand that this is incorrect. To those, it feels like a slap in the face; akin to gaslighting the next generation of BAföG recipients: “If money is tight, it’s your spending habits, not the amount of BAföG.”</p>
<p>But second, I see evidence of this declining priority of science as a public good in and of itself everywhere. While the United States government makes abundantly clear that they don’t like educated people, things don’t look bright in Europe either. But because the funding cuts in Europe aren’t mandated via executive order, they aren’t as visible. European academic institutions are slowly – <em>very</em> slowly – bleeding dry. That’s why I took that – in and of itself not too dramatical statement – as a cause to write these lines. Because if we don’t make these issues visible, nobody will.</p>
<p>And third, I think that it is dangerous to defund academic institutions in light of economic hardship. Asian states are just ramping up their academic systems. I know many people who left Europe for Singapore, Hong Kong, Beijing, and other Asian cities which have better job prospects for academics. By reducing funding and trying to pressure people into other industries, hoping for a manufacturing wonder, Europe is harming its own future. And this is one of the few instances where the noble ideal of general education meets hard economic facts: Even though academic institutions don’t produce direct economic value, if we don’t educate our students and let go of our faculty staff, Europe’s competitors will run past us. And, moreover, now that the United States has started to drop out of the excellence game, it’s either Europe or no one to keep up with Asia.<sup id="fnref:13"><a class="footnote-ref" href="#fn:13" role="doc-noteref">11</a></sup></p>
<h2>Final Thoughts</h2>
<p>Academia is a game of attrition, and without a firm belief in the purpose of one’s own research, it is hard to stay here. But there is a difference between choosing a hard job, and getting sticks thrown into the bike’s wheel. As hard as academia is for faculty, we know the system. Students who just graduated from high school don’t. Don’t make students pay for something they had nothing to do with.</p>
<p>Working alongside your studies is perfectly fine. Even many of our students in the Master’s program here in Sweden are working on the side. It’s not a German phenomenon. Almost everyone I know did it, and if you do it, you’re doing it right. It is not your spending habits that cause the BAföG money to be insufficient at the end of the month.</p>
<p>Finally, dear policymakers, heed the calls: academia needs enough money to function. I know so many excellent scientists who decided to drop out of academia because it was untenable, and it is sad to see such bright people leave and not contribute to everyone’s benefit. Even if higher education looks like a pure cost factor on balance sheets, there is a reason European policy and trade negotiations typically work well, and that Europe is still not trailing too much behind other global powers. It’s because of the highly educated policy advisors and research staff, engineers and industrial designers that keep Brussels and each individual state afloat.</p>
<div class="footnotes" role="doc-endnotes"><hr /><ol><li class="footnote" id="fn:1" role="doc-endnote"><p>Also, that would involve so many nuances I could write a book. I believe there are others who already did that.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:1" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:2" role="doc-endnote"><p>We learned <em>a lot</em> in school about this transition, “Strukturwandel.”&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:2" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:3" role="doc-endnote"><p>Incidentally, they still do — if you don’t have European citizenship. For European citizens, tuition sits mostly at the level of administrative costs (maybe €1,000 a year, but that heavily depends on where you are). But if you want to study at a European university without European citizenship, tuition can climb up to €10,000 or even €20,000 easily.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:3" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:4" role="doc-endnote"><p>Beautiful bureaucratese for “Bundesausbildungsförderungsgesetz” (Governmental Educational Assistance Law). It even has <a href="https://en.wikipedia.org/wiki/BAf%C3%B6G">its own Wikipedia page</a> in case you ever wanted to learn how it’s pronounced (?).&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:4" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:5" role="doc-endnote"><p>Of course, not all, and I’m glossing over a lot of nuance here.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:5" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:8" role="doc-endnote"><p>In Germany, you can drop out a bit earlier and start earning money sooner.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:8" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:9" role="doc-endnote"><p>I later learned that sharing flats appears to be a uniquely German phenomenon.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:9" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:10" role="doc-endnote"><p>My colleague Sebastian Gießler has mentioned an important point here. He said that a common response from German conservatives to the fact that BAföG barely covers rent is to “simply study at rural universities where rents are lower.” But this is still an argument for privilege: You’ll have much higher chances of high-paying jobs and of escaping the working class when your CV mentions the University of Cologne, the Ludwig-Maximilians-University Munich, or the Humboldt University in Berlin. If recipients of BAföG should go study in, say, Kaiserslautern rather than Berlin, you’d get back to a system of privilege where the student body of universities in expensive cities like Cologne, Berlin, or Munich would consist of a very homogeneous wealthy stratum of society, perpetuating the class divide in German society.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:10" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:11" role="doc-endnote"><p>I am not going to argue with you if you don’t believe that networking with your peers, building friendships through shared activities, and participating in the university traditions, are crucial parts of the university experience. Doing that, and getting to know academic culture is just as important as the actual material you learn. Just sitting inside and never leaving the house is not a viable alternative.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:11" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:12" role="doc-endnote"><p>Fast-forward ten years and policymakers were struggling to find experts on Ukrainian history to help them understand the context of the Russian invasion.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:12" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:13" role="doc-endnote"><p>It feels a tad weird writing these lines. To be clear: In an ideal world, <em>everyone anywhere</em> should have the ability to go to university. And I don’t like nation-state competition, because that typically ends in war. But the reality is that the biggest Asian superpower is governed by a non-democratic regime, and they heavily invest in university education. I’m just pointing out the obvious to European policymakers who are afraid of China.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:13" role="doc-backlink">↩</a></p></li></ol></div>]]>
  </content>
</entry>
<entry>
  <title>This App Should’ve Been A Website</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2026, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/this-app-should-have-been-a-website" />
  <id>https://www.hendrik-erz.de/post/this-app-should-have-been-a-website</id>
  <published>2026-05-29T07:00:00+00:00</published>
  <updated>2026-05-28T21:14:07+00:00</updated>
  <summary type="html"><![CDATA[Every once in a while I go on a trip with friends. And, while I can keep data brokers (mostly) off of my phone with a bit of research, this is less possible when a group needs to coordinate. Because someone must suggest how, e.g., budget splitting is being done. And typically, this ends with me having another single-use app on my phone. So today I will be yelling at the clouds about this app that should really have been a website.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/this-app-should-have-been-a-website">
    <![CDATA[<p>Technology is amazing. Hardware has brought us to the moon, and software made sure nothing went wrong on the way. But especially software is also going to be our downfall. There are a lot of issues with software, but today I won’t be talking about Microslop, brittle IT infrastructure, or malware spreading through Open Source repositories.</p>
<p>Today I want to talk about the bane of my interactions with others: that everything nowadays has to be an app. While websites seem to have replaced actual pieces of software that you’d install on computers, it appears that the opposite trend has a tight grip over the phone ecosystem. Here, instead of moving everything from To-Do lists to Excel into a website, every silly little website is being turned into an app.<sup id="fnref:1"><a class="footnote-ref" href="#fn:1" role="doc-noteref">1</a></sup> And I am sick of pretending that I am okay with this.</p>
<p>To be fair, I am relatively privileged in that I know how software works inside and out, and <em>know</em> what functionality actually requires apps. But I do think that everybody else should at least get the gist of the problem. Because it’s not that I refuse to connect and interact with my friends; quite the opposite. Things such as splitting bills, coordinating and booking trips, and other things are important, and I wholeheartedly commit to this.</p>
<p>But whenever I point out to friends that something for which they want to use an app should not be an app, but instead a website, I feel like I’m a downer, possibly with due cause.<sup id="fnref:2"><a class="footnote-ref" href="#fn:2" role="doc-noteref">2</a></sup> When it comes to planning and coordinating as a group, if you’re the one suggesting we search for something that does not come as an app, you will inevitably get the mark of a troublemaker. You might even run the risk of not being invited anymore, because you cause too much of an issue with seemingly simple things. Because that one app works, it’s convenient, and nobody else complains, so why would you even have a point?</p>
<p>But I do not think that I should feel this way. Taking the extra 5 minutes to research alternatives that offer the same functionality, but without forcing your group to install yet another app, is something we all (me included!) should strive to do. In the next few paragraphs, I want to outline what my issue with using apps for everything is, when I would prefer an app instead, and what the risks of defaulting to phone apps are.</p>
<p>So, what could possibly warrant an entire article about issues with apps? We all use them. Most of us use Instagram, some messaging app (WhatsApp, Signal, Telegram), many of us have Spotify, possibly Duolingo, maybe Zoom, Discord, and LinkedIn. Besides that, every phone nowadays ships with a metric ton of built-in apps; some of which are essential (Settings or the camera app), some of which are optional and can be uninstalled (especially on iPhones), some are unremovable bloatware (mostly on Android phones). In short: apps are not inherently bad.</p>
<p>My issue is a specific category of apps: those whose only reason to exist is to extort money from people in some way. The <em>casus belli</em> for this article was that I recently went on a trip with a few friends; and to split up the bills, the organizer asked us to use an app for that. Indeed, he shared a link, but that link just went to the homepage of the company that offers the app. There was a button that seemed to indicate I could see the shared budget, but it made my Safari throw an error that it was an invalid link. After asking around and realizing nobody else had the issue, I saw that the only functional thing on the website was to “download an app.” An app? For something as simple as splitting expenses?</p>
<p>Hesitantly, I downloaded the app (to not be seen as the idiot), and could quickly confirm my fear: The app is literally something that should have been a website. The app did not require me to create an account, which I hold in its favor. But the moment I clicked my friend’s link again to open the app with the correct account, a notification popped up where the app tried to upsell me a credit card with a meager 10% interest rate.</p>
<p>Man, what the fuck, dude.</p>
<p>The task of splitting a budget requires a few form inputs that allow people to punch in their expenses. Then it uses some very, very simple arithmetic to ensure everyone receives/pays money in such a way that after paying each other, everything is balanced. It is a mere tool for a task that we could almost as easily do by hand if we wanted to. Again, nothing about this task is revolutionary, and there are hundreds of competing apps and regular websites out there that do this. It took me quite literally 10 seconds to dig up <a href="https://spliit.app/">this alternative</a> which offers exactly the same, but without the need to download an app.</p>
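<p>To illustrate just how simple that arithmetic is, here is a minimal Python sketch. It is not how that app or any particular service works; the function name <code>settle</code>, the equal split among everyone, and the example names and amounts are all assumptions made up for illustration. It works out what each person owes, then greedily pairs people who owe money with people who are owed money.</p>

```python
def settle(paid_cents):
    """Return (debtor, creditor, cents) transfers that balance an equal split.

    paid_cents maps each person to what they paid in total, in integer cents.
    Hypothetical helper for illustration only; assumes at least one person.
    """
    people = sorted(paid_cents)
    share, remainder = divmod(sum(paid_cents.values()), len(people))

    # Everyone owes the same share; leftover cents go to the first few people
    # so that the shares add up exactly to the total.
    owed = {person: share for person in people}
    for person in people[:remainder]:
        owed[person] += 1

    # Positive balance: is owed money. Negative balance: owes money.
    balance = {person: paid_cents[person] - owed[person] for person in people}
    creditors = [[p, b] for p, b in balance.items() if b > 0]
    debtors = [[p, -b] for p, b in balance.items() if -b > 0]

    # Pair debtors with creditors until every balance is zero.
    transfers = []
    i = j = 0
    while i != len(debtors) and j != len(creditors):
        amount = min(debtors[i][1], creditors[j][1])
        transfers.append((debtors[i][0], creditors[j][0], amount))
        debtors[i][1] -= amount
        creditors[j][1] -= amount
        if debtors[i][1] == 0:
            i += 1
        if creditors[j][1] == 0:
            j += 1
    return transfers


# Example: Ada paid 90.00, Ben paid 30.00, Cleo paid nothing.
print(settle({"Ada": 9000, "Ben": 3000, "Cleo": 0}))
```

<p>With these made-up numbers, Ben sends Ada 10.00 and Cleo sends Ada 40.00, and everyone has paid 40.00 in the end. Working in integer cents avoids rounding surprises; uneven splits or multiple currencies would need more than this sketch covers.</p>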
<p>“But wouldn’t an app be more convenient?” No, because you will in any case need a link to share with everyone else. There is no difference between having a link that redirects you to a web form, and a link that asks you to download an app. In fact, if that had been a simple website instead, a single click on the link would’ve directly taken me to the correct account. Instead, I had to figure out that I was supposed to download an app; then click on a button; get asked to scan a QR code with my phone; see the same website; click a button; be redirected to the app store; download something to my phone; open it; then click the same link again, but now on my phone to have the app open to the correct account. A simple website would have done all of this more conveniently.</p>
<p>Except, the creators could not as easily make money off of a website.</p>
<p>The sole reason why so many simple tasks are transformed into downloadable apps is that apps are closer to the user. Instead of being served through a browser which (at least for now) puts some very heavy guardrails onto what a website can do, apps are literally compiled programs that run directly on your phone. The only convenience of such apps is for their developers, because they can make money off of you.</p>
<p>Most importantly: Websites cannot send you unsolicited notifications without you explicitly allowing that, while app notifications are essentially on by default. And even if the app does not use a single phone API, that pivotal difference to serve you notifications (read: ads) immediately is so lucrative that developers jump through hoops to pay Apple and Google for access to the app stores; develop two different apps (for iOS and Android); and make you click a bunch of buttons just to download a glorified, single-use spreadsheet calculator to your phone. Also, once an app is on someone’s phone, it is very likely that this app will just stay there. And, at some point, the user who downloaded it may be in need of that functionality, and, conveniently, it is still on the phone. Thus, competitors are locked out for good. There’s much more to this, and entire books have been written on how to hook customers to extort them for money later on with free or “convenient” offerings.</p>
<p>But my point has become clear. There is a category of tools that simply should not be an app. And if you’re organizing a trip or a gathering with some friends, be nice and don’t force them to download a single-use app if the same functionality could be a website. Just ask yourself: Does the app you want to suggest to your friends require location services, access to the camera, or some live activities? Then please, go ahead, because those things cannot be properly done via a website. But for anything else: please take 30 seconds of your day to search for a website alternative and use that one instead. Your own friends should be worth this to you. Recommending apps that could be websites quite literally puts your friends’ data at risk. Remember when Facebook asked for full, unrestricted access to your address book which is why most of our contact details are now used for scam calls and spam mails?</p>
<p>I’m relatively sure that there should be a website that lists alternatives for apps, although I couldn’t find it yet. So if you know of such a website, let me know so I can share it here.</p>
<p>Because nobody should need to download an app for <em>that</em>.</p>
<div class="footnotes" role="doc-endnotes"><hr /><ol><li class="footnote" id="fn:1" role="doc-endnote"><p>Fun fact: Most apps are also actually just websites. Instagram and Spotify are the two prime examples. In other words, the “app” is a software program whose only purpose is literally to show you a website. Many apps do not make use at all of any of the additional features that phones offer, such as live location services, camera access, or machine learning frameworks.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:1" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:2" role="doc-endnote"><p>Or, as a friend has said: “Das ist so richtig deutsch von dir.” (“That is very German of you.”)&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:2" role="doc-backlink">↩</a></p></li></ol></div>]]>
  </content>
</entry>
<entry>
  <title>TrueNAS, or, How I Rediscovered The Joy of Owning My Media</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2026, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/truenas-or-how-i-rediscovered-the-joy-of-owning-my-media" />
  <id>https://www.hendrik-erz.de/post/truenas-or-how-i-rediscovered-the-joy-of-owning-my-media</id>
  <published>2026-05-15T07:00:00+00:00</published>
  <updated>2026-05-01T22:02:44+00:00</updated>
  <summary type="html"><![CDATA[A few weeks ago I received a free computer, and turned it into a NAS. I subsequently discovered my old music collection, and decided to turn my NAS into a streaming service. Now I am rediscovering the joy of having no perfect choice, and supporting artists I enjoy directly.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/truenas-or-how-i-rediscovered-the-joy-of-owning-my-media">
    <![CDATA[<p>A few weeks ago, a relative called me up and asked me if I had a use-case for an old computer of hers. It was one that I had built around ten years ago for her work. We are talking pre-pre-last-gen components. But how could I refuse? It was free, and in perfectly working condition. So I agreed, and a few days after that, I owned another computer. However, I did not intend to use it as a computer. Again, the components were all relatively old, so running any modern operating system on that, while possible, would not be the best use-case.</p>
<p>Instead, I decided to finally go ahead and do what I had been planning for about a year at this point: Turn it into a NAS, or Network Attached Storage. I actually needed one of these little network-storages for quite some time. Ever since I started my PhD and suddenly had to deal with large datasets in the tens of Gigabytes in size, my computer’s storage turned out to be a really scarce resource. Furthermore, all I ever had to back up data to was an old, crappy 1 Terabyte HDD. I had replaced that with a more modern 1 TB NVMe SSD with USB 3 support, but even that was quickly outgrown by the necessity to have more space available. For the past several years, I had no space left to back up anything. All the photos on my phone were essentially unique, and I didn’t have any meaningful computer backup either. This led to quite a few sweaty trips outside of Europe, where I was plagued by the fear of getting robbed. Luckily, that fear didn’t materialize. But running on luck should not be a strategy for maintaining access to our data — especially in a time when everything is born-digital and there are no physical copies left.</p>
<p>So in that case, it was a godsend that I suddenly got a working computer for free. And even though it consisted exclusively of old components, it was perfectly suitable for use as … a server!</p>
<h2>Building a NAS During a Storage Crisis</h2>
<p>Now, the one thing it decisively did <em>not</em> have was a lot of storage. All there was in the computer was a 128 GB SSD for the operating system. Given that my existing 1 TB of “backup” didn’t suffice, a tenth of that would be even less sufficient. So the first order of business was to actually spend some money to get storage. But, in case you forgot, we’re still in the center of the AI Bubble, and storage prices have surged in lockstep with memory pricing. Not as drastically, but they are still very expensive. But there was no way around it: the prospect of finally having a safe haven for my data was important, and since I do earn some money, I can afford even heavily inflated prices.</p>
<p>So I went shopping. After some research, I figured out that I wanted to get a set of Seagate IronWolf HDDs. Since I want to have redundancy built-in, I opted for 3×4 TB of HDDs. But, understandably, these hard drives were either only available via dubious third-party sellers on Amazon, or plain sold out. So no luck. In the end, I received my three disks with the right capacity and from the right company, only a slightly different model. I threw them into the computer and installed TrueNAS (a very popular operating system for running NAS systems). I was pleased when quite literally everything worked out of the box. While typically, something goes wrong, here, everything went flawlessly. So I set up all my storage and the system, and spent a day not just transferring the one Terabyte of existing data onto the NAS, but also setting up a backup for my laptop. So now I <em>actually</em> have a 3-2-1 backup strategy, and can be much less afraid of losing my phone or laptop. Don’t get me wrong, that would still suck hard, but at least I would still have 100 % of all my data available. Which is a big relief.</p>
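<p>As a rough sanity check on what three 4 TB disks with redundancy yield, here is a small Python sketch. The post does not say which pool layout is in use, so the single-parity layout (one disk’s worth of space reserved for redundancy) is an assumption, and <code>usable_tb</code> is a hypothetical helper, not anything TrueNAS provides.</p>

```python
def usable_tb(disks, size_tb, parity_disks=1):
    """Rough usable capacity when `parity_disks` worth of space holds redundancy.

    Ignores filesystem overhead and the TB/TiB difference, so real numbers
    will come out somewhat lower.
    """
    if parity_disks >= disks:
        raise ValueError("need more disks than parity disks")
    return (disks - parity_disks) * size_tb


# Three 4 TB disks, assuming one disk's worth of parity.
print(usable_tb(3, 4))
```

<p>Under that assumption, the result is 8 TB of usable space out of 12 TB raw, which is the price of being able to lose one disk without losing data.</p>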
<h2>Rediscovering my Music</h2>
<p>But when I checked the used storage (now I had 8 TB), I saw that, after cleaning everything up, the storage was only used up to about 10 %. It turns out that 8 TB of space is <em>much</em> more than I needed at the moment. So I asked myself: Do I have anything else that I could just collect centrally here?</p>
<p>Then it hit me: I still have my entire music library somewhere! … at least that’s what I hoped. But I couldn’t find it; neither on my old 1 TB HDD, nor anywhere else. After a day of searching, I did fortunately find it in a hidden place on my other computer. And so I began moving that over to my NAS.</p>
<p>Some background: When I first discovered the joy of listening to good music as a teenager, I quickly made a habit of buying albums I quite enjoyed. Now, back then, we went around with hard drives and shared them to get more music. But I quickly decided for myself that I should rather actually pay the artists. And so I bought quite a few CDs over the years. In the end, I believe I had over 100 different albums on CD. Then, I also got a vinyl player (shortly <em>before</em> that was trendy again, so I was for once ahead of the curve!) including the corresponding records.</p>
<p>Because back then we still had CD drives, I ensured I kept all of my music also digitally backed up. At some point, my CDs vanished into some basement, but the music was still on my computer. Some time in the early 2010s, the website Bandcamp started to gain traction. The idea was simple: We all don’t have too much space for CDs, and CD isn’t the best medium anyhow, so why not just offer people FLAC-files in high quality directly? This skips a middle-man we all were happy to leave behind (the CD-ROM drive), and possibly some logistics, too. It was a win-win for artists and fans alike.</p>
<p>Except in 2007 or so when I first discovered it, most labels were still firmly in the hands of the traditional distribution systems, and did not consider Bandcamp. What was on there was mostly niche bands without record labels who just uploaded their own music. (By the way, to this day, there’s also an album I produced entirely myself available for free download. Let me know if you believe you found it!)</p>
<p>So while I did purchase a couple albums from Bandcamp, that remained a rare occurrence, not least because I was a poor high school/university student.</p>
<h2>Spotify and the Death of Choice</h2>
<p>Also, a few years after that, streaming became a common occurrence. When it became popular, I switched to Spotify, and a few years ago, when the enshittification of that service became unbearable, I switched to Apple Music. The latter service at least pays their artists somewhat decently, and is not half as ugly as the Swedish start-up.<sup id="fnref:1"><a class="footnote-ref" href="#fn:1" role="doc-noteref">1</a></sup></p>
<p>And I can tell you <em>exactly</em> when I started to use Spotify: 2017. Because that’s when my music library stops. I stopped paying artists, and paid a multinational conglomerate of international corporations instead. A few weeks ago, the last album in my library was from 2017. It was “You’re Not as ___ as You Think” from Sorority Noise. After that, I just stopped buying albums directly.</p>
<p>At first, Spotify seemed like the promised land: Just listen to music when you want to, and what you want. No need to either endure YouTube advertisements or pay for an entire album, even though you don’t yet know if you really like it enough. But soon, it turned out more and more to be … exhausting.</p>
<p>You see, there’s this phenomenon called “choice overload.” When you are faced with too many options to choose from, it becomes harder and harder to decide. And Spotify causes this, too. Since essentially all the music on planet earth was part of their catalogue, it became harder and harder to decide.</p>
<p>When we want to listen to music, we usually only have a vague idea. Now, if we are on Spotify, we <em>could</em> find the <em>perfect</em> song for the moment. But what if we don’t find it? We start to skip through song after song because all of them have <em>something</em> that we don’t like. In the end, we just don’t listen to anything really, tune in to one of their “mood” playlists, or start the auto-DJ. So in the end, we let an algorithm decide.</p>
<p>But now I had my music back. And that felt like breaking free. Because I suddenly had scarcity again. There <em>is no perfect song for every moment</em>, and that’s okay. Instead, I now have my very limited library of music, which is heavily curated by the only person who knows what I really enjoy: me. I know that I will never find the perfect song, but at the same time I know that I don’t have any weird albums in my library. And it felt so relieving, finally being able to <em>decide</em> what to listen to again. Once it is impossible to <em>maybe</em> make the perfect decision, it becomes possible to make a <em>fine</em> decision again.</p>
<h2>Open Source has Come a Long Way</h2>
<p>Another pleasant discovery I made after I transferred all my music to my NAS was that Open Source really has come a long way. I still vividly remember my uphill battle against Ubuntu 6.06 back in the day. But this was almost exactly 20 years ago. And in those two decades, a lot has happened. Here are the pleasant discoveries I made in terms of being able to access my own music again with modern software:</p>
<ul>
<li>TrueNAS just works out of the box. No weird issues. I just had my storage, and could transfer data. Period.</li>
<li>It offers FOSS apps to enhance the NAS, and turn it from a simple storage device to a media server. Among them, a streaming service on top of your own music.</li>
<li>Tailscale, a weirdly satisfying, “It just works™” VPN with which I can always remotely access my data if I need to.</li>
<li>And, to my pleasant surprise, a variety of iOS-ready apps, from Immich for my photo backup to Amperfy, which <em>even has a macOS client</em>!</li>
</ul>
<p>Essentially, I now have a full replication of Apple Music, but on top of my existing music library. Everything works literally as well as with Apple Music. Only with one difference: Now, I can finally pay artists (almost) directly again.</p>
<h2>Bandcamp and Why Owning Music is the Ethical Choice</h2>
<p>This brings me back to the title of this post. Now that I have the space and now that my music is accessible again and no longer on some forgotten SSD somewhere in my flat — why not do the deed and buy all the albums I only streamed for the past ten years?</p>
<p>I did some quick napkin math recently. The average album on Bandcamp (and, before that, CDs) costs about $10, sometimes more, sometimes less. It typically includes somewhere around 10 songs. A single stream gives an artist about $0.01. (Remember: With Spotify it’s much worse, Apple Music is one of the better services.) So you’d have to listen to the average album 100 times before an artist probably gets this amount of money. Also, I bet that the $0.01 is possibly an optimistic estimate, so I might have to stream an album more often to give an artist the equivalent they would get if I directly bought their music.</p>
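<p>The napkin math above can be written down as a tiny Python sketch. The function name and parameters are made up for illustration, the amounts are in cents, and the second call uses a purely hypothetical lower payout to show how quickly the number grows. It also ignores whatever cut the store takes from a purchase.</p>

```python
import math


def album_plays_to_match(album_price, tracks, payout_per_stream):
    """Full album plays needed before streaming pays out as much as one purchase.

    All amounts are in cents; every input here is an illustrative estimate.
    """
    payout_per_album_play = tracks * payout_per_stream
    return math.ceil(album_price / payout_per_album_play)


print(album_plays_to_match(1000, 10, 1))    # the figures from the text
print(album_plays_to_match(1000, 10, 0.5))  # hypothetical lower payout
```

<p>With the figures from the text this comes out at 100 full plays of the album; halving the assumed payout per stream doubles that to 200.</p>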
<p>And, let’s be frank: How long are you ready to listen to an album on Apple Music (or, Spotify, which takes much, much longer) before you can’t stand it anymore? I bet that threshold is reached far before you actually gave your favorite artist the money they deserve. That’s why I find it <em>also</em> relieving to have gotten back to just owning my music again.</p>
<p>First, if I pay for it upfront, it’s only $10, but I <em>own</em> the music. And then I can stream it, maybe even more than 100 times. But I don’t have the pressure to do so. And even if I <em>never</em> listen to that album ever again — I still supported an artist I deemed worthy of support, and did not defer the decision whom to pay to a multinational corporation.</p>
<p>Now, one caveat though: When I was a student, Spotify was a great boon. I could discover and listen to artists as much as I liked for a price that I could afford back then. Paying for every album is not entirely cheap, and so I get that these music streaming services enable low-SES groups to enjoy music they like without going into debt. But I believe those of us who can — and I certainly can — afford to buy music directly should do so. Because this way, we not only support artists much better than making them throw their music out for crumbs, but also enable poorer people to enjoy music.</p>
<h2>Final Thoughts</h2>
<p>When I got the opportunity to receive this old computer and turn it into a NAS, I expected just a way to finally offload some of my data to a more secure space. I did not expect the revelations that came with it. I neither expected to suddenly feel so much more relieved because I know my data is securely backed up, nor that I would start to rediscover the joy of enjoying music.</p>
<p>It really seems to be true: We’ve played through the internet, and now it is time for a “return to monke,” or however that meme goes. With proper broadband access wherever we are, and with cheap electricity to run a computer 24/7, we can stop relying on SaaS-businesses more and more. Now, having a NAS is a privilege that not everyone has. And I strongly believe that we are a far cry from when we can finally stop relying on enshittified services. But I see that there is a path.</p>
<p>So, if you have the disposable income and the time to do so, I invite you: Set up a NAS, stop paying for music streaming, and just set up your own streaming service. The feeling is incredible.</p>
<p>In the past two weeks, I have spent, I believe, $100 on some of the albums I came to enjoy in the past ten years but never owned. And now I finally do. And it feels great. I think I may even concoct a few music recommendations some time this year, so stay tuned!</p>
<div class="footnotes" role="doc-endnotes"><hr /><ol><li class="footnote" id="fn:1" role="doc-endnote"><p>To be clear: I really like Sweden and Swedes more generally. I’m a big fan of Swedish Metal (of course), I really enjoy Valheim, and love the people in Norrköping. But man, Sweden also has some of the worst offenders of late-stage capitalist start-ups in stock, including Klarna that drives people into debt, and Spotify that drives artists into debt.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:1" role="doc-backlink">↩</a></p></li></ol></div>]]>
  </content>
</entry>
<entry>
  <title>Why 8 GB of Memory Might Still Be Enough</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2026, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/why-8-gb-memory-might-still-be-enough" />
  <id>https://www.hendrik-erz.de/post/why-8-gb-memory-might-still-be-enough</id>
  <published>2026-05-01T07:00:00+00:00</published>
  <updated>2026-04-30T16:51:43+00:00</updated>
  <summary type="html"><![CDATA[Whenever I visit online discussions and someone is about to buy a new computer, one of the first and most fiercely discussed questions is always: &quot;How much memory do I need?&quot; This is typically answered with &quot;More&quot; or &quot;More than you think.&quot; But I think that this is silly, especially in times when memory is priced closer to gold than to consumer electronics. In this article, I want to provide some suggestions for you to determine how much memory you might actually need. Spoiler: Depending on what you do, 8 GB might still be sufficient.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/why-8-gb-memory-might-still-be-enough">
    <![CDATA[<p>Today, I’d like to talk about something that is a bit off for this place: the endless discussions and uncertainty connected to how much computer memory one needs. Whenever someone wants to buy a new computer, one of the first questions they typically ask online is: “How much RAM do I need?” And, especially on Reddit (which, typically, is being Reddit), the answer is almost always “You need more.” Since I’m mostly browsing Mac forums, this question pops up a bit more since Apple tends to price its RAM closer to the price of gold than to other electronic supplies. But, thanks to Sam Altman and his co-conspirators, this question has become a frequent occurrence in the Windows space, too.</p>
<p>Now, this article is not going to be ground-breaking. If you want or need the fastest and latest, go ahead and get a 192 GB set of DDR5 memory for … let me check … ah, the price of an entire M5 Pro MacBook. This article is about guiding you towards a good estimate for how much memory you <em>actually</em> need, not for saying you should “just get more.”</p>
<p>So, today I want to give a layman’s guide to “How much memory do I <em>actually</em> need?” I’ll walk through what memory is, why it’s hard to judge how much you will need, and how you can learn how to estimate how much memory you effectively use day-to-day.</p>
<h2>The Basics of RAM</h2>
<p>First, some basics. Memory, or RAM (short for “Random Access Memory”), is a piece in every computer that holds data just like your regular computer storage, but with four distinct differences: it is faster, closer to your CPU, has much higher bandwidth, and can look up data quicker.</p>
<p>Let’s compare it a bit with the regular storage that you have in your computer; typically a SATA SSD or an NVMe drive. Those have seen huge increases in capacity over the past decade. When I bought my first SSD, I believe it had 64 GB of capacity. Now 2–4 TB of capacity are quite common. On them, we store all our data – primarily files and applications. And the operating system, of course.</p>
<p>Storage is usually connected to your computer using a SATA cable or, in the case of NVMe drives, dedicated PCIe lanes. However, both SATA and NVMe connections are relatively slow. Even though modern computers read data quite fast, the speed limitations of your storage mean that, if you open a very large application, it can take a few seconds until you see something on screen. The smaller an app is, the faster it typically starts. That’s the core reason why we have RAM: it uses a completely separate set of connections to your CPU and is also physically located close to your CPU. It works similarly to your storage, but because of its physical position and dedicated connection, it can read and write data much faster. You can imagine the difference in connection speed between regular storage and memory like the difference between a school-zone street and a twenty-lane highway.</p>
<p>When you start an application – especially one you rarely use (this is important for later) – it needs to be read from your computer’s storage, and placed into the computer’s memory. That’s why it sometimes takes a while to open a program: It quite literally needs to be moved from your storage to your RAM. Once it’s in there, it actually starts. The same happens with most regular files when you open them: The Word document you just clicked also has to be moved into your system memory before it will be displayed. (Some files can be streamed, but that is a different discussion.)</p>
<p>Once an application or file is in memory, your computer can productively work with it. Don’t get me wrong: theoretically it is possible to work with files directly from a storage device, but in almost all cases (i.e., what you do in your own personal home) that won’t work nearly as well as we would like. That is the fourth difference between storage and memory: Storage is meant to hold large amounts of data for a long period of time. Memory is meant to hold a bit of data momentarily, and make it accessible in an instant. Storage is optimized to read or write large chunks of data (such as files or an entire program) at once. But as your program does its thing, it will frequently jump between many different instructions. That requires — you guessed it — <em>random</em> access. And computer memory is optimized for that random access.</p>
<p>That’s why programs need to be kept in memory: If we didn’t, it would take an eternity for your program to do anything. For example, when I press a key in the editor that I write this blog post in, quite a lot has to happen. The key press needs to be registered, then handed off to the program. That program will then handle the key by, say, adding it to a file buffer, recording a “change” event in my editor that I can undo, and more. All of this means that the program needs to access several parts of its own instructions to do so. If all of that were to happen on your storage, you would feel it, because storage is orders of magnitude slower and less optimized for this kind of task than your memory.</p>
<h2>Why Your Memory is Always Full</h2>
<p>Now with a basic understanding of memory, let’s tackle a common comment we can see on the internet: our memory is always full. When you open the Task Manager (Windows) or Activity Monitor (macOS), you’ll likely see that almost all of your available memory is taken. For me, for example, it currently shows that 14 GB of my 16 GB are taken.</p>
<p><em>Oh no! I need more RAM!</em> But do I, really? I myself personally (!) actually do need more memory, yes. But not because I am running out of it. You see, when you go and buy some storage, let’s say an external SSD, that is meant to hold data persistently. You move some photo backup from your phone onto it and then <em>it stays there</em>. The only way to make it go away is to actively delete that data. Also, because of physics™, your storage has a limited lifetime. It’s made up of small memory cells that simply wear out over time. After enough write accesses, it will just straight up break. Now, the rated lifetime of your storage is not a hard number, and there are ways to destroy it faster or slower. But it is something to consider.</p>
<p>This is why we oftentimes tend to keep some space on our storage free. This both gives us the ability to store more data if needed, and the safety that, if some sectors on our storage die, there is leeway for our computer to use the unused space to avoid the damaged memory cells.</p>
<p>Computer memory does not have this limitation. It does not care how many times you read or write data to or from it. And as such, any byte of unused memory is a waste of energy. You could literally write all the data from your storage into memory many, many times over, and your storage would fail much earlier than your memory.</p>
<p>That is why your computer will try to keep as much of the data from your storage in your memory as possible. Any application that is already in memory does not have to be read from storage first. This reduces load times (you have to wait less until the app opens), and reduces wear and tear on your storage. A win-win situation. However, if your laptop’s battery runs out, all the data in your memory will be gone. That’s its Achilles heel.</p>
<p><em>How</em> your computer decides what data to keep available in memory differs, but it will likely decide based on how often you use an application. Your browser is likely always in memory, even if you quit the app. And it is probably the first thing your computer loads into memory, even before your wallpaper appears after a reboot.</p>
<p>The more applications your computer can keep in memory even if you don’t use them, the better. These apps will start blazingly fast and allow you to do more in less time. But of course, it will only keep applications in memory if there is space. Any application that you actively use must be in memory because otherwise it would be unusably slow, as mentioned earlier. So when you start an app you rarely use, your computer must first load it into memory. And if there is not enough space, it will start dropping apps that it thought you might use, but didn’t.</p>
<p>That’s why your memory will in many cases always be (almost) filled to the brim: It is inconsequential, but has many benefits for you. (Nota bene: you may see, especially on new computers or on Linux servers, that memory isn’t actually filled. This is usually because you rarely open and close apps on a server. There, the programs the server runs are in memory, as is their data, but that doesn’t change often. But this is different on your personal computer, where you constantly open and close apps.)</p>
<h2>Swap, or, When A Lack of Memory Actually Becomes an Issue</h2>
<p>Now, with this knowledge at hand, it’s easy to identify a situation where all this neat “keep memory used all the time” can break down. And that is if you keep open so many applications at the same time, and work on so much data, that the amount of data your computer considers to be “important” is so large that it exceeds your available memory. If you then open that one large Excel spreadsheet or that one additional application, you will suddenly start noticing your computer become sluggish.</p>
<p>This is because now you have presented your computer with a challenge: You told it that you need that one big spreadsheet, but there is no more memory marked as “optional.” But because it needs to load it into memory, it has to make a decision. And so it will take a look at all your applications and documents, and identify one that you haven’t touched in a while. It then takes that block of memory, and moves it out of memory and onto your storage. That is known as “swapping,” because it swaps a part of your current memory with a new app or file so that you can continue your work. As soon as you pull the application that the computer just swapped to disk into the foreground, your computer has to quickly move that data back into memory, and swap it with another block of data. You may be able to notice that when it takes a perceptibly longer time to bring an app to the foreground. That’s typically a sign of your computer having to swap memory.</p>
<p>So that’s bad and a clear sign that we need to upgrade our memory, right? Here’s the thing: It might not be. Because how often does this really happen to you? All the time? Probably not. Typically, you can make it easier for your computer by actively closing apps you don’t use. That will then mark that block of memory as “optional” and your computer can remove it if necessary. If you immediately restart the same app, it will be fast because it’s still in memory, but if you have opened another one in between, it can always re-load it from your storage. If you want to avoid upgrading your memory, the first step is to simply reduce the amount of memory that your computer deems important.</p>
<p>That is, coincidentally, what Apple refers to as “memory pressure.” If you check the memory tab of the Activity Monitor, you can see it at the bottom of the window. When memory pressure is low, that means that your computer thinks that there should be plenty of memory available for any unexpected move you might make. Then the memory graph in the Activity Monitor will be green. As you fill up your memory with applications and data, it will turn yellow, indicating that your computer still thinks it can fulfill all your requests, but might have to think a bit harder about how to do so. But once your memory pressure reaches red, that tells you that you have so many applications open that your computer simply cannot cope with what you’re doing. In that case, switching applications will almost certainly involve swapping. That is, while the one app you are currently using is fast and in memory, no other app is, and so switching apps <em>always</em> means you’ll have to wait a moment.</p>
<p>And that’s when we can actually start to think about how much memory you may <em>actually</em> need.</p>
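<p>To make the idea a bit more tangible, here is a minimal toy sketch in Python. It assumes that the computer swaps out whole apps and always picks the least recently used one; real operating systems work on much smaller blocks of memory and use more elaborate heuristics. The class <code>ToyMemory</code>, the app names, and all sizes are made up for illustration.</p>

```python
from collections import OrderedDict


class ToyMemory:
    """Toy model: whole apps live in RAM; the least recently used one gets swapped out."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.resident = OrderedDict()  # app name -> size in GB, least recently used first
        self.swapped = {}              # apps that were moved out to storage

    def used_gb(self):
        return sum(self.resident.values())

    def touch(self, app, size_gb):
        """Bring an app to the foreground; return the apps swapped out to make room."""
        evicted = []
        if app in self.resident:
            self.resident.move_to_end(app)  # already in memory: mark as most recently used
            return evicted
        self.swapped.pop(app, None)  # if it was swapped out, it comes back now
        # Swap out least recently used apps until the new one fits.
        while self.resident and self.used_gb() + size_gb > self.capacity_gb:
            victim, victim_size = self.resident.popitem(last=False)
            self.swapped[victim] = victim_size
            evicted.append(victim)
        self.resident[app] = size_gb
        return evicted


ram = ToyMemory(capacity_gb=8)
ram.touch("browser", 3)
ram.touch("mail", 1)
ram.touch("editor", 2)
ram.touch("browser", 3)             # the browser is now the most recently used app
print(ram.touch("spreadsheet", 4))  # the two apps untouched the longest have to go
print(ram.touch("editor", 2))       # bringing the editor back forces yet another swap
```

<p>In this toy run, opening the spreadsheet pushes out mail and the editor, and returning to the editor then pushes out the browser. That back-and-forth is the sluggishness you feel when too much “important” data competes for too little memory.</p>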
<h2>A Note on Browsers</h2>
<p>Before we do so, however, I have to say a word about your browser. Until now, I have just talked about applications and data. But browsers are something special. The reason is that, nowadays, most websites are less a website and more full-blown software packages. And that means that your browser works more like its own mini-operating system inside your actual operating system.</p>
<p>Right now, I have two tabs open that take up more memory than the actual, main, Firefox process. That is because these are tabs that run an entire web application. Have you ever wondered why you can edit spreadsheets both with Excel on your computer and in Google Drive at the same time? Well, you certainly can, but it’s important to realize that Google Drive will consume just as much memory and processing power as Excel. The only difference is that this application is downloaded on demand when you open the website, rather than when you install the app.</p>
<p>That’s why browsers nowadays typically come with their own little task manager. Effectively, they have to reproduce the memory management of your operating system. Sometimes, you may see that clicking on a tab takes a noticeable amount of time until it actually opens. That is because your browser, likewise, takes a look at all your tabs and starts to unload unused tabs while you’re not looking. Then, when you focus the tab again, the browser has to load it once more. That’s the reason web browsers are typically the biggest memory consumers. Try to remember: Most tabs you have open are less boring websites, and more like all the apps you <em>additionally</em> keep running in the background.</p>
<h2>How Much Memory do You Really Need?</h2>
<p>Now we can finally talk about ways to determine how much memory you need. Of course, the simple answer is that “more is always better.” But we’re not here to discuss the maximum. Instead, we’re searching for the minimum. To figure out how much memory you actually require, there are two important numbers to consider. First, how many apps do you typically use at the same time? Their size is a good starting point for figuring out the amount of “mandatory” memory your computer needs. This includes figuring out the number of browser tabs you typically need. Second, once you have those identified, you’ll need to understand how much data you are working with. Every spreadsheet you open in Excel will count towards Excel’s memory usage. This is not a clear 1:1 relationship because every document has some overhead in terms of memory usage. So while the file sizes will give you an estimate, it’s not going to be exact.</p>
<p>Armed with that information, the minimum amount of required memory should be the memory consumption of your largest app with the largest amount of data that you can find. Because that needs to fit entirely into your memory. If it doesn’t, parts of it will be swapped back and forth, and that will make your experience worse.</p>
<p>Then, you can start adding more apps and data, because that number only tells you how much memory you need to just run that one app. But you oftentimes run more than that. So start adding the memory footprint of other apps you use very often. For me, this would be Firefox, Zettlr, and my mail program. Most other apps I typically keep closed until I need them. And I tend to keep the number of tabs in Firefox small, so I don’t differentiate between open tabs. But I also don’t regularly use big web apps. I try to keep actual work out of the browser. Your use case may differ, so take a second look at how much your browser actually consumes.</p>
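<p>If you prefer to see this as arithmetic, the steps above can be sketched roughly as follows. Everything in this snippet is an assumption for illustration: the function name <code>estimate_min_memory_gb</code>, the example numbers, and especially the <code>overhead_factor</code>, which is a made-up multiplier for the fact that an open document usually needs more memory than its file size suggests. The operating system’s own share is left out.</p>

```python
def estimate_min_memory_gb(largest_app_gb, largest_data_gb, other_apps_gb, overhead_factor=1.5):
    """Rough lower bound: the biggest app plus its data must fit, then add the everyday apps.

    overhead_factor is a made-up multiplier, not a measured value: an open
    document usually needs more memory than its size on disk.
    """
    must_fit = largest_app_gb + largest_data_gb * overhead_factor
    everyday = sum(other_apps_gb.values())
    return must_fit + everyday


# Hypothetical numbers, as you might read them off your task manager on a typical day.
everyday_apps = {"browser": 2.0, "mail": 0.5, "editor": 0.5}
needed = estimate_min_memory_gb(
    largest_app_gb=1.0,   # e.g. a spreadsheet program
    largest_data_gb=0.5,  # the biggest file you regularly open in it
    other_apps_gb=everyday_apps,
)
print(f"Estimated minimum: {needed:.2f} GB")
```

<p>With these invented numbers, the estimate lands at 4.75 GB. Plug in what your own task manager shows to get a first rough idea; it is a starting point, not a precise measurement.</p>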
<p>Once you have a set of your “most common apps” and “most-used data,” you’ll have a good understanding of how much memory will be sufficient for you. And I’d argue that for most people, 8 GB of memory will be plenty. If all you <em>actually</em> have to use for work are two programs at any single time, but you always run out of system memory, try not keeping every program installed on your computer open at all times ;). Quitting programs from time to time is a good habit to foster.</p>
<h2>When you Actually Need More Memory</h2>
<p>Now, that gives a good way to estimate the amount of memory for the average person. But there are indeed groups that need more memory, and here I want to shed some light on why I believe I belong to one of these groups, and how to identify if you belong to one such group, too.</p>
<p>Because while checking which apps you typically use is a good first indicator, it will be a misleading one for many people. Let me use myself as the guinea pig. Sometimes I’m very promiscuous with keeping apps open, but for most of my day, I would actually be quite fine with just 8 GB of system memory. Right now, looking at my memory consumption, the only obscene number is Java, which currently uses 5 GB of memory. This primarily means that I have to complain to LanguageTool (my spell checker) that their app is quite hungry, but aside from that the actual “important” memory on my machine right now sits at about 4–5 GB. Which means, 8 GB gives me plenty of headroom to open a few more apps.</p>
<p>However, the guide I provided above is a red herring in my case. The reason is that multiple times a week I work with large amounts of data. “Big data” as we called it 20 years ago. My datasets are typically about 60–200 GB in size. Does this mean I need to buy the MBP worth of DDR5 RAM? I mean, technically it would make everything faster. But no, typically it is possible to stream data, which means that, of these 200 GB of my data, I keep only ~10 GB in memory at any time, closing files again if I don’t need them anymore. This requires a bit of dabbling with the memory management, but it’s easy once I write code that actually keeps the memory requirement low. It’s just another constraint to keep in mind.</p>
<p>But in any case, this does add to the minimum RAM requirement. Because there’s a balance to be had between memory efficiency and speed. I could write code that requires 100 MB of memory at all times, but that would be awfully slow, because my computer would have to read and write data from and to storage much more often. So it’s beneficial to keep larger amounts of my data in memory at all times. (Again, remember that your storage is good at reading large chunks of data, but often struggles if you load many tiny portions.)</p>
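<p>Here is a small sketch of what that trade-off can look like in Python. It is not my actual pipeline, just an illustration: the helper names <code>stream_chunks</code> and <code>count_newlines</code> are made up, and a temporary file with 100,000 short lines stands in for a real dataset. The only knob is <code>chunk_size</code>: a small value keeps the memory footprint tiny but needs many reads from storage, a large value needs more memory but far fewer reads.</p>

```python
import os
import tempfile


def stream_chunks(path, chunk_size):
    """Yield a file piece by piece so that only one chunk is held in memory at a time."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def count_newlines(path, chunk_size):
    """Example analysis: count the lines and record how many reads were needed."""
    lines = 0
    reads = 0
    for chunk in stream_chunks(path, chunk_size):
        lines += chunk.count(b"\n")
        reads += 1
    return lines, reads


# Stand-in for a large dataset: 100,000 short lines in a temporary file.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"some,row,of,data\n" * 100_000)
    dataset_path = tmp.name

small = count_newlines(dataset_path, chunk_size=1024)         # tiny footprint, many reads
large = count_newlines(dataset_path, chunk_size=1024 * 1024)  # more memory, far fewer reads
os.remove(dataset_path)

print(small, large)
```

<p>Both calls arrive at the same line count; they only differ in how many trips to storage they need. Finding a chunk size that your memory can afford is the balance I am talking about.</p>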
<p>That’s why the minimum requirements for my memory are at about 16 GB, which is exactly the amount of memory I have.</p>
<p>But I mentioned that I wanted more, and the next computer I will get will have at least 64 GB of memory. Why the quadrupling? Well, there is a cheesy reason, and a pragmatic reason. The cheesy reason is that I really want to run bigger LLMs on my computer. An LLM, just like an app, really should be loaded completely into memory at once. So if you want to run a model of 16 GB size, you will really need to have at the very least 24 GB of memory. But, more pragmatically, while I can make every data analysis work with 16 GB of memory, and I have done so for the past six years, it is slowly starting to become a bottleneck. As my analysis skills improve and the data I handle gets more complex, the benefit of having more fast memory available to do more becomes more appealing. Because, at the end of the day, time is money, and being able to more quickly churn through my data will have a huge impact on my ability to deliver research results.</p>
<p>To be absolutely clear: For more than half of my average work week, I won’t need more than 10–14 GB of system memory. But for those days when I actually have to run my entire pipeline to fix a problem in some paper? I absolutely will be thankful for having those 64 GB.</p>
<h2>Final Thoughts</h2>
<p>This article deliberately gave no fixed numbers, because there are many variables at play. Rather, what I wanted to do is give a how-to guide for determining how much memory you actually need without either falling for the “you always need more!” trap or accidentally kneecapping yourself. I strongly believe that if you don’t do much fancy stuff on your computer and 8 GB of memory is insufficient, this is something you can fix yourself, and I’m sure you have more apps running than you realistically need. But I also believe that, if you feel like your computer is slow because you have only 16 GB of memory but deal with 200 GB of data, you won’t need 200 GB of memory.</p>
<p>One can say what one will about the memory pricing of Apple, but the 8 GB of the MacBook Neo are more than sufficient <em>for their use-case</em>. Nobody who needs to run data analysis should buy one, but data analysts also aren’t the target audience for the Neo. I feel like a lot of people tend to get anxious with choosing memory because it’s so weird to think about it. It’s not like some storage where the formula is literally “the size of my current data + 20 % overhead.” It is a dynamic number that will always fluctuate. And I do see why that makes it difficult to make a confident decision as to what you might actually go for.</p>
<p>But please, folks, especially on Reddit, stop suggesting computers with 64 GB of memory to high school students. That is a waste of money. Especially in this economy.</p>]]>
  </content>
</entry>
<entry>
  <title>Security Advisories and Cognitive Overload</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2026, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/security-advisories-and-cognitive-overload" />
  <id>https://www.hendrik-erz.de/post/security-advisories-and-cognitive-overload</id>
  <published>2026-04-03T10:00:00+00:00</published>
  <updated>2026-03-30T08:20:55+00:00</updated>
  <summary type="html"><![CDATA[Security advisories are a mechanism by the open source community to distribute potential software vulnerabilities to their developers confidentially. It is a vital mechanism to ensure software remains safe to use. However, in recent years, there has been an increase in low to medium severity reports which tend to drown out critical reports that need much faster response times. A rant on cognitive overload that decreases the security of software.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/security-advisories-and-cognitive-overload">
    <![CDATA[<p>A few weeks ago, <a href="https://www.hendrik-erz.de/post/can-we-still-trust-our-software">I wrote about some issues I have experienced with software recently</a>, prompted by issues with Zotero after their 8.0 update. But there is a second dimension to security and trust in software: Software – especially Open Source – is currently bombarded with “security advisories” from left and right. I think that’s bad. Not because I want software to suddenly become less secure, but because this has – at least for me – led to some serious issues that I believe are crucial to address.</p>
<h2>Trusting Software</h2>
<p>To quickly recap the primary point I made in my previous article: Yes, we still can trust software. But we should be cautious, because the various recent issues are likely aggravated by the liberal use of GPT models in writing code. In a few years, software might be less stable than we would like. The main point is that some software developers appear to be less concerned with stability than with the mindless addition of AI features out of fear of being “left behind.” This includes some very large software companies upon which millions of people rely.</p>
<p>Today’s article is about the other side: the developers. Because not just users need to trust their software. Developers need to trust the software they use, too. They need to trust in their ecosystem. And I have the gut feeling that this trust is beginning to crumble.</p>
<p>Nota bene: I am not talking about the constant issues with package managers that get bombarded with malware. That has been discussed at length elsewhere. This is a related phenomenon, but not the primary focus of this article.</p>
<h2>Security Advisories</h2>
<p>The point of departure for this argument is an observation I made in late March. After a few days off from developing in order to finish an R&amp;R and do other research-related activities, I took a few hours on a weekend to clean up some of the piled-up work in the Zettlr repository. While doing that, I noticed a disturbing number of security advisories on the app.</p>
<p>If you don’t know what I’m talking about: A security advisory is basically a notification which developers (especially on GitHub) get that some of the libraries they depend on have critical vulnerabilities. If you’re not developing software, you have likely never seen one, because they get distributed only via private channels to developers directly. The basic idea: if they are being treated as confidential, developers have a bit more time to patch these vulnerabilities before they become public knowledge. Because as soon as they are, malicious actors will be able to exploit them.</p>
<p>So in principle, these advisories are great. And usually, they are easy to fix. For most of them, GitHub’s own management system, “dependabot,” is able to produce automated changes that can be included in the code base with a single click. Sometimes, however, it’s more difficult. And then you have security advisories open on your own app that you can’t really close.</p>
<p>I mean, you could (by marking them as “not applicable”), but I’m convinced that getting into the habit of dismissing advisories can set a bad precedent. Instead, I prefer to keep them open until the corresponding developers have fixed them. Which leaves a bad taste. Right now there are about a dozen security advisories open on Zettlr, none of which I am able to close without wreaking havoc on the entire build process.</p>
<h2>Cognitive Overload</h2>
<p>But that is not even the main issue. There were always advisories I couldn’t close. There was always something security-critical happening. Such is life. The main issue is the <em>frequency</em> with which this happens right now. A few months ago, <a href="https://techcrunch.com/2025/08/04/google-says-its-ai-based-bug-hunter-found-20-security-vulnerabilities/">Google made the news</a> with a GPT-model that allegedly had found twenty security vulnerabilities completely autonomously. That was in August of last year. By now, such pipelines are likely capable of much higher speeds. They automatically sift through repositories and flag anything that’s not as tight as an airlock on the ISS.</p>
<p>This also applies to security researchers: Even if security researchers don’t deploy a fully automated system to find security vulnerabilities, they likely employ the help of AI to find new issues. All in all, it appears to me that the speed of new security vulnerabilities being reported outpaces the speed of developers fixing them.</p>
<p>And this is a problem. Because developers – especially those that do it in their spare time – have limited capacity to react to this quantity of reports. It feels a bit like fixing a breaking dam.</p>
<p>There is a good point to be made here: malicious actors likewise employ AI systems to find vulnerabilities faster. So the argument goes: if a security AI finds an exploit, we can expect that a malicious AI will find the same one, and possibly already has. So we better open up reports to ensure that those are at least known and the developers can work through them.</p>
<p>However, that’s not how humans work. I have noticed a significant reduction in the alertness I have towards security vulnerabilities. When I see a new report that can’t be fixed right away, I am becoming more and more likely to dismiss this. Then I forget about it until the report closes itself because someone on the other side of the globe has finally updated their dependencies. And this is bad, because with the influx of non-critical security advisories, the actually severe ones are more likely to slip through, because there is nothing inherent to distinguish them from the less severe.</p>
<h2>Loss of Context</h2>
<p>This is compounded by a significant loss of context in security reports. There are security advisories, and there are security advisories. I have not yet seen a security advisory that didn’t point out an actual issue. But, and this is the crucial difference: I have seen plenty of advisories that are simply not critical in a certain context.</p>
<p>When I started writing software, I was very afraid of receiving such advisories, because I was afraid that someone could do bad stuff with it. It took a few years until a good friend mentioned an instance where there was a security vulnerability open on some software package that hadn’t been fixed <em>for over a decade</em>. And that with good reason: It complained about a piece of code being insecure that was working <em>as intended</em>. I don’t remember which software it was, but this was eye-opening.</p>
<p>A truly secure piece of software is one that doesn’t do anything. <em>Every</em> piece of software is inherently insecure. Because if it didn’t do things that <em>may</em> become problematic, it would not be of much use to us. There are nuances here, so let me use an example.</p>
<p>Zettlr allows remote code execution. That’s usually a red flag for every security-minded person. But not for Zettlr. Why? Because that scary-sounding “remote code execution” essentially just means that Zettlr uses Pandoc to bind together some HTML file with MathTeX downloaded from a CDN server. The MathTeX library is effectively code that comes from somewhere else and that could do anything on your computer. And yes, Zettlr could block that. But that would also mean depriving its users of much of the perks during export.</p>
<p>It is this context that is hugely important. Most still-open security advisories on the Zettlr repository are related to software that is specified as a <em>development</em> dependency. Which means, it never makes it into the final app. Rather, it is only used while making the app. Which, in turn, means that this software will be executed primarily by people with strong knowledge about the pitfalls of JavaScript code and who know what could go wrong.</p>
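<p>As a rough illustration of that distinction, here is a minimal Python sketch that sorts advisories by whether the vulnerable package is only reachable through development dependencies. Everything in it is hypothetical: the <code>package.json</code> excerpt, the package names, and the advisory list are invented, and the <code>triage</code> function is not how GitHub classifies anything. It also assumes you already know which of your direct dependencies pull the vulnerable package in, whereas a real dependency tree is much deeper.</p>

```python
import json

# Hypothetical excerpt of a package.json; the package names are illustrative only.
PACKAGE_JSON = """
{
  "dependencies": {"some-runtime-lib": "^1.0.0"},
  "devDependencies": {"some-build-tool": "^7.0.0", "some-test-runner": "^3.0.0"}
}
"""

# Hypothetical advisories: vulnerable package -> direct dependencies that pull it in.
ADVISORIES = {
    "vulnerable-archive-lib": ["some-build-tool"],
    "vulnerable-parser": ["some-runtime-lib", "some-test-runner"],
}


def triage(package_json_text, advisories):
    """Label each advisory as 'dev-only' or 'ships-to-users'."""
    manifest = json.loads(package_json_text)
    runtime = set(manifest.get("dependencies", {}))
    result = {}
    for vulnerable, pulled_in_by in advisories.items():
        # A single runtime path is enough for the code to end up in the final app.
        reaches_users = any(parent in runtime for parent in pulled_in_by)
        result[vulnerable] = "ships-to-users" if reaches_users else "dev-only"
    return result


print(triage(PACKAGE_JSON, ADVISORIES))
```

<p>A “dev-only” label does not mean the advisory is irrelevant. It only tells you who is exposed: the people building the app, not the people using it.</p>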
<h2>An Example</h2>
<p>Let’s look at one such advisory together, shall we?</p>
<p><img src="https://www.hendrik-erz.de/storage/app/media/blog/security_advisory_tar.png" alt="security_advisory_tar.png" /></p>
<p>The screenshot shows one currently open security advisory on the Zettlr repository: “node-tar Vulnerable to Arbitrary File Creation/Overwrite via Hardlink Path Traversal.” A few things to note here:</p>
<ul>
<li>GitHub mentions that it cannot automatically update the dependency to a non-vulnerable version.</li>
<li>The package is pulled in via a lot of different packages.</li>
<li>If exploited, attackers can supposedly create or overwrite essentially any file on a server.</li>
</ul>
<p>Sounds scary, right? Well, certainly. There’s just one issue: the chances of this actually happening are closer to zero than to one percent. Now, don’t get me wrong: This could still be exploited. But in the particular context in which the vulnerable code is used here, it’s negligible. Let me tell you why:</p>
<ol>
<li>The good news: Users of Zettlr aren’t affected, because that code never ends up in the binary. It is only used during development.</li>
<li>More specifically, <code>tar</code> is used during development to download assets necessary to build the application.</li>
<li>The archives that are being fetched are known, and a malicious actor cannot supply a link to their own, malicious archive.</li>
<li>Last but not least: This advisory assumes that the code is executed with a user that can do <em>anything</em>. Further below in the advisory is a table with potential effects of this. One line is instructive: “User Creation: <code>/etc/passwd</code> (if running as root) → Add new privileged user.” Think about this: only on a server where the node user is root could this become a problem. Essentially, the advisory therefore assumes that one node-package must offset all the bad decisions of someone setting up a server.</li>
</ol>
<p>That is what I mean by “context matters.” It is absolutely an exploitable vulnerability. But only in very narrow and specific contexts. That is the reason this hasn’t yet been fixed by the maintainers of <code>@electron-forge</code>.<sup id="fnref:1"><a class="footnote-ref" href="#fn:1" role="doc-noteref">1</a></sup> But: it sows some doubts about how safe the software really is that I use to build software.</p>
<p>This is a pattern I observe frequently: Someone reports a potential issue, but when thinking about this for more than ten minutes, it’s clear that it’s not actually an issue. Many security reports – especially in recent months – have started to look like someone opened a 101 book on secure software design and wrote a dumb script that reports these patterns whenever it finds them. To make matters worse, this approach then can even miss things that malicious actors can <em>actually</em> exploit, because it only looks for “exploitable-looking” patterns. And this noise can drown out real issues, where assumptions are violated and which can actually hurt users.</p>
<h2>Missing the Obvious</h2>
<p>One final example. A few weeks ago, I received an email about a potential security vulnerability in Zettlr. I was very happy, because the email looked thorough and verbose. It included code snippets showing the vulnerable code, explained how it could be exploited, and proposed a fix. Unfortunately, none of that was relevant.</p>
<p>First, while the mentioned code had a weakness in the past, a few days before I received the email I actually fixed what they reported. Moreover, that fix to the vulnerability was even contained <em>in the code snippets the email included to show supposedly vulnerable code</em>! So someone just copied the code from somewhere without double-checking it.</p>
<p>But secondly, and much more importantly: They completely overlooked another place in which I have made <em>the same</em> mistake. After reviewing the report, I found that other place, and fixed that as well. This report broke something. Because it showed me that I cannot rely as much on security researchers as I possibly wanted to.</p>
<p>I am still fighting with reports of possibly vulnerable code that isn’t a vulnerability in our context, because it is <em>expected</em> that it can do certain things that – in other contexts – would indeed constitute a vulnerability. This lack of consideration for context drives me mad.<sup id="fnref:2"><a class="footnote-ref" href="#fn:2" role="doc-noteref">2</a></sup> Because I take every report seriously – because there could be a severe issue hidden under a pile of nonsense.</p>
<h2>Final Thoughts</h2>
<p>Possibly severe vulnerability reports are currently being buried under mountains of semi-relevant reports that require so much context that being hit by a piano on the street sounds more likely. And this has real consequences. cURL has stopped their bug bounty program. And I have found it necessary to add the following clause to Zettlr’s security notes:</p>
<blockquote>
<p>We take every security-related notification seriously, will read through them, and respond to them. If we determine that you have reported expected behavior, we will indicate that in our response to you. In this case, you may not open a CVE (<a href="https://github.com/Zettlr/Zettlr/blob/develop/SECURITY.md#what-happens-after-a-report-has-been-sent">see our security protocol below</a>). If we further have strong reason to believe that your notification has been made in bad faith, we take the liberty to fully ignore your report or even take action depending on the situation. In such or similar cases, we affirm our legal right(s). Zettlr is a collaborative effort that only works if everyone works together, and we will defend this.</p>
</blockquote>
<p>So, long story short: With security vulnerabilities it’s like with everything else in the world; too much of it can also kill you. We need <em>far fewer</em> reported security vulnerabilities, and far fewer people trying to become security researchers by reporting semi-relevant stuff just to get a CVE under their belt. We need to critically re-evaluate what constitutes a security vulnerability, and under which circumstances, and we need to increase the number of “low” and “medium” advisories. Because no, just because you found unsanitized HTML code doesn’t mean it’s “critical.”</p>
<p>Y’all need to chill the f*** down.</p>
<div class="footnotes" role="doc-endnotes"><hr /><ol><li class="footnote" id="fn:1" role="doc-endnote"><p>If you want to hear a silly joke: yes, Electron forge is maintained by Microsoft.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:1" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:2" role="doc-endnote"><p>What also drives me mad is that every such advisory has a severity attached to it, which can be low, medium, high, or critical. 99% of all reports I receive are marked as “high” or “critical.” I’m sorry, but no. If every security problem is high or critical, then none is high or critical.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:2" role="doc-backlink">↩</a></p></li></ol></div>]]>
  </content>
</entry>
<entry>
  <title>Heuristics and Assumptions</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2026, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/heuristics-and-assumptions" />
  <id>https://www.hendrik-erz.de/post/heuristics-and-assumptions</id>
  <published>2026-03-28T17:00:00+00:00</published>
  <updated>2026-03-28T19:55:50+00:00</updated>
  <summary type="html"><![CDATA[Heuristics are everywhere. But every heuristic is always also just a good assumption. And assumptions can be violated. In this post, I share a story about when my data suddenly turned foul, and the cause of this was anything but obvious. The lesson? The road to hell is paved with good assumptions.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/heuristics-and-assumptions">
    <![CDATA[<p>Humans operate by heuristics. Much of our daily lives is driven by heuristics, where we can infer from past experience how a current situation will most likely play out. Instead of having to consciously think about, say, calling the elevator, we already know what that button does, so we waste no conscious thought on this little action. Instead, we can reserve our mental capacities for new and unseen problems. In the social sciences, this has been referred to as the “System I/System II” approach. You have likely already heard of Daniel Kahneman’s book “Thinking, Fast and Slow.”</p>
<p>One thing we rarely think about, however, is that every heuristic is always also an assumption. And in science, assumptions are risky decisions. The reason I am writing all of this – in case you wondered (which you probably did, right…?) – is because recently I had the unfortunate experience of a heuristic turning into a wrong assumption.</p>
<h2>Background: Reproduction</h2>
<p>First, some background. For an R&amp;R, I recently wanted to reproduce my main results using a different data set. As a reminder, I work with U.S. Congress, and so the data was obviously additional congressional floor speeches. I essentially had to download some new data for more recent congresses, and mold it such that it looked somewhat like my existing data. Then, all it took was copying over my old code, checking that it worked, and using it to run the same analysis on the new data.</p>
<p>In this context, I experienced both a heuristic (which was clever), and an assumption (which of course did not hold true). So, let me share my experience.</p>
<h2>Heuristic</h2>
<p>First the heuristic. One issue I had was that the new data did not have any references to Congresses, but instead actual dates. My old data had references to the congress in which a speech was held, so I somehow needed to get that for the new data, too. I could have indeed extracted the exact start and end dates of each Congress, written some logic to check if a speech’s date lies between a congress’s start and end dates, and called it a day.</p>
<p>That would have been perfectly precise. But also a headache to implement, because dates can be messy. So instead I decided to go the lazy route and use a heuristic. To check which congress a speech was held in, I came up with this genius contraption:</p>
<pre><code class="language-python">from math import ceil

def year_to_cong(year: int) -&gt; int:
    return ceil((year - 1788) / 2)
</code></pre>
<p><em>What?!</em> Well, this function is a heuristic turned into code. Each congress is two years long, and the first congress convened in 1789. So, if we want to know which congress a speech on November 11, 2011, was held in, we just take the year (2011), subtract 1789 − 1 = 1788 from it (223), divide it by 2 (111.5) and round it up (112). A quick check tells us that a speech held on Nov. 11 of that year would have indeed fallen in the 112th U.S. Congress. Why minus 1, you ask? Well, if we wanted to know the congress of the year 1789, just using 1789 would give us $\frac{1789-1789}{2} = \frac{0}{2} = 0$, which is incorrect. However, using one less gives us $\frac{1789 - 1788}{2} = \frac{1}{2} = 0.5$, which, when rounded up, is $1$. A neat side effect of this is that the second year of that congress, 1790, will give us $\frac{1790-1788}{2} = \frac{2}{2} = 1$, which remains $1$ even after rounding up.</p>
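<p>As a quick sanity check, the heuristic can be run against the years mentioned above and against the range the new data covers. This is only a sketch; the year range in the loop is chosen for illustration and is not part of the analysis code.</p>
<pre><code class="language-python">from math import ceil

def year_to_cong(year):
    # Heuristic from the text: congress 1 began in 1789, each lasts two years.
    return ceil((year - 1788) / 2)

# The three years worked through in the paragraph above.
for year in (1789, 1790, 2011):
    print(year, year_to_cong(year))

# Both calendar years of one congress should map to the same number.
for year in range(2009, 2027):
    print(year, year_to_cong(year))
</code></pre>
<p>Every pair of consecutive years, starting with an odd one, lands on the same congress number, which is exactly what rounding up is supposed to achieve.</p>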
<p>Now, if you know anything about U.S. Congress, you may now have thought to yourself “Wait a moment, that doesn’t work out! Congress doesn’t convene on January 1st! Also, this has changed over time! Until the early 20th century, congresses started in March, a whole quarter year in!”</p>
<p>And yeah, that’s correct! And that’s what makes this a heuristic: I can rely on two additional pieces of context. First, the new data only encompasses congresses 111 until 119 (the current one). So these all convened on a January 3rd. This means that at most two days will be “wrong.” And second, this is only for an interesting tidbit, and not for the main results. Which means that it does not have to be perfect.</p>
<p>In this particular instance, this is a perfectly reasonable assumption to make.</p>
<p>Which brings me to an important insight: Every heuristic is also an assumption. If we were talking about the main results of this paper, the reviewers would be absolutely correct to slap my wrist for doing something like this. But since I knew I wasn’t going to lose a quarter of a year with this assumption, but rather only 2 days of an entire congress (which is $\frac{2}{365 \cdot 2} \approx 0.003 = 0.3\%$), it felt like a reasonable tradeoff to make.</p>
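<p>To see where the “two days” come from, the heuristic can be compared with a date-based rule. The sketch below assumes that every congress in scope starts on January 3 of an odd year; <code>date_to_cong</code> is a hypothetical helper written for this comparison, not the precise implementation described earlier, and it would be wrong for the older congresses that started in March.</p>
<pre><code class="language-python">from datetime import date, timedelta
from math import ceil

def year_to_cong(year):
    return ceil((year - 1788) / 2)

def date_to_cong(day):
    # Assumption: a congress starts on January 3 of an odd year, so
    # January 1 and 2 of an odd year still belong to the previous congress.
    cong = year_to_cong(day.year)
    if day.year % 2 == 1 and (day.month, day.day) in ((1, 1), (1, 2)):
        cong -= 1
    return cong

# Walk through 2009 and 2010 (730 days) and collect every disagreement.
start = date(2009, 1, 1)
days = [start + timedelta(days=i) for i in range(730)]
mismatches = [d for d in days if year_to_cong(d.year) != date_to_cong(d)]
print(len(mismatches), len(mismatches) / len(days))
</code></pre>
<p>Under that assumption, only January 1 and 2 of the odd year disagree, which matches the fraction in the paragraph above.</p>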
<p>But as with any assumption, these can go wrong.</p>
<h2>Assumptions</h2>
<p>After I verified several times that the shape of my new data was the same as my old data, I copied over my old analysis code, only adjusting it for the newer dates I was dealing with. By keeping the code as close to the original as possible, I wanted to ensure that the steps performed on the new data are the same as the old one. And it all worked perfectly!</p>
<p>…until it didn’t.</p>
<p>You see, I was relying on the <code>numpy</code> library for a lot of numerical calculations in the analysis. And <code>numpy</code> is nice enough to let you know if there is any division by zero.<sup id="fnref:1"><a class="footnote-ref" href="#fn:1" role="doc-noteref">1</a></sup> And this happened in my code. But there was also a different warning, one that I hadn’t seen before: “overflow encountered in scalar.” Wait, <em>what</em>?</p>
<p>I dug a bit deeper, but could not find any scalar multiplication, so I dismissed it, because – again – the code itself just worked fine. But I checked the results, and I had impossible numbers that <em>could not have been right</em>. So I had to investigate.</p>
<p>After several hours of checking every line of my code, I was almost ready to give up, until I tried something that made these warnings disappear, and that fixed the results.</p>
<p>But I wanted to understand what happened. So let’s unwrap this.</p>
<h3>The Malicious Code</h3>
<p>I had a series of vectors with whole numbers (integers), and the central operation was to calculate cosine similarity on them. So my code looked essentially like this:</p>
<pre><code class="language-python">import numpy as np
from scipy.spatial import distance  # assumed source of distance.cosine

vec1 = np.array([0, 1, 1, 0, 2], dtype=np.uint16)
vec2 = np.array([1, 0, 1, 2, 1], dtype=np.uint16)
cos = 1 - distance.cosine(vec1, vec2)
</code></pre>
<p>If you have encountered this same issue already, you may already see the problem. To give you a hint, if you want to try to solve it yourself, here’s the solution that fixed the issue for me:</p>
<pre><code class="language-python">import numpy as np
from scipy.spatial import distance  # assumed source of distance.cosine

vec1 = np.array([0, 1, 1, 0, 2], dtype=np.uint16)
vec2 = np.array([1, 0, 1, 2, 1], dtype=np.uint16)
cos = 1 - distance.cosine(vec1.tolist(), vec2.tolist())
</code></pre>
<p>If you still wonder what is wrong with this, let me explain:</p>
<p>I instantiate the vectors with the data type <code>uint16</code>, that is, an unsigned 16-bit integer. I do so because I don’t need more. 16 bits are sufficient to store numbers from 0 to 65,535. This choice rested on the assumption that no element of these vectors could become negative (hence the “unsigned”), and that they would stay well below the 65,535 limit.</p>
<p>And these two assumptions were, indeed, right. So nothing wrong here.</p>
<p>Next, these vectors get passed to a cosine distance calculation. This function calculates the distance between two vectors, and for vectors without negative entries, like mine, it ranges from 0 (they point in the same direction) to 1 (the vectors are orthogonal to each other). Since I needed the cosine <em>similarity</em> instead, I simply subtracted the distance from 1, which turns the distance into a similarity.</p>
<p>And this code is also correct. This is why it took me so long to find the error: There is nothing wrong with this code.</p>
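<p>For reference, this is what the calculation boils down to, written with plain Python numbers only. It is a minimal sketch of the textbook formula, not the library’s implementation; <code>cosine_similarity</code> is a name chosen for this example.</p>
<pre><code class="language-python">from math import sqrt

def cosine_similarity(u, v):
    # Dot product divided by the product of the two vector lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

vec1 = [0, 1, 1, 0, 2]
vec2 = [1, 0, 1, 2, 1]
similarity = cosine_similarity(vec1, vec2)
distance = 1 - similarity
print(similarity, distance)
</code></pre>
<p>For the two example vectors, the dot product is 3 and the squared lengths are 6 and 7, so the similarity is $\frac{3}{\sqrt{42}}$, or roughly 0.46. Note that the formula contains both multiplications and a division.</p>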
<h3>Understanding Memory Layouts</h3>
<p>Instead, the issue lies somewhere else. To understand this, we have to properly understand how <code>numpy</code> works, because it differs quite tremendously from how Python works. Python does not have many types of numbers. Essentially, it knows the difference between integers and floating point numbers, but that’s mostly it. (There are some nuances, but these are not important here.) <code>numpy</code>, however, knows a lot more types of numbers. It knows signed and unsigned numbers, and various <em>widths</em> of these numbers (that is, how large they can be), ranging from 8 to 64 bits for integers.</p>
<p>This is not because it would be <em>mathematically</em> necessary, but rather due to technical reasons. You see, math has infinite precision, but computers can only work in binary – zero or one. Furthermore, <code>numpy</code> aims to be a very fast library, because it is intended to work with large amounts of data, including LLMs. And once you need to ensure that some generative pretrained transformer (that is, ChatGPT or Gemini) runs reasonably fast, you need to start thinking heavily about optimization.</p>
<p>One of the most fundamental optimizations you can do is to write your code in a language that compiles directly to machine code. This is why <code>numpy</code> is actually not written in Python, but mostly in C and Fortran. The central building piece of <code>numpy</code> is a numerical library called <a href="https://github.com/OpenMathLib/OpenBLAS">OpenBLAS</a>. This makes the code <em>much</em> much faster than if it just used regular Python code.</p>
<p>A second optimization you can do is perform calculations not one by one, but instead by performing many calculations in parallel. This is one of the central reasons why <code>numpy</code> has made LLMs so fast on personal computers. But how does this work in detail? Well, I’m glad you asked!</p>
<p>Here is (a simplified version of) how <code>numpy</code> will store the array <code>[0, 1, 1, 0, 2]</code> in memory:</p>
<pre><code>00000000 00000000
00000000 00000001
00000000 00000001
00000000 00000000
00000000 00000010
</code></pre>
<p>You will notice that each row is one number and each row has exactly 16 bits. That’s what happens if you tell <code>numpy</code> to store some data as a <code>uint16</code> (“unsigned 16-bit integer”). Here’s the same numbers, but stored as <a href="https://en.wikipedia.org/wiki/IEEE_754">IEEE 754</a> 64-bit (double) floating point values, the default <code>numpy</code> type:</p>
<pre><code>00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00111111 11110000 00000000 00000000 00000000 00000000 00000000 00000000
00111111 11110000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
01000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
</code></pre>
<p>Quite a bit different! First, you can clearly see that this representation of the same numbers uses four times as much space (320 bits instead of 80). And secondly, <em>how</em> the numbers are represented is wholly different. It does not matter too much how exactly this representation works, but if you are inclined, <a href="https://www.h-schmidt.net/FloatConverter/IEEE754.html">here is a demo</a> that allows you to play around with some numbers.</p>
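<p>Both listings can be reproduced with the standard library’s <code>struct</code> module. This is a sketch for illustration only: <code>bits</code> is a hypothetical helper, and the big-endian byte order is requested explicitly so that the output reads from left to right like the listings above, whereas the byte order actually used in memory depends on the machine.</p>
<pre><code class="language-python">import struct

def bits(raw):
    # Render a bytes object as groups of eight binary digits.
    return " ".join(format(byte, "08b") for byte in raw)

values = [0, 1, 1, 0, 2]

# "!H" packs an unsigned 16-bit integer, "!d" a 64-bit IEEE 754 double.
# The "!" asks for big-endian byte order.
as_uint16 = [bits(struct.pack("!H", v)) for v in values]
as_float64 = [bits(struct.pack("!d", float(v))) for v in values]

for row in as_uint16 + as_float64:
    print(row)
</code></pre>
<p>The same five values take 2 bytes each in the first layout and 8 bytes each in the second, and the bit patterns for 1 and 2 have nothing in common between the two.</p>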
<p>What matters here is primarily that the <code>uint16</code> type requires less memory to store data than <code>float64</code>.</p>
<h3>The False Assumption: <code>numpy</code> performs type checking</h3>
<p>With that context, here is finally what goes wrong with the code above:</p>
<p>When we pass two <code>uint16</code> arrays into the cosine similarity calculation, at some point, the function will perform a dot-product between the two. And this is where things go wrong: Because the dot-product involves multiplying scalar values with each other. For example, $3 * 4 = 12$. After that step follows a division.</p>
<p>And what can happen when you multiply or divide values? Exactly: you may end up with fractional values. And that’s why <code>numpy</code> will operate on these two arrays <em>as if they were following the default floating point layout</em>, and that is a double-precision 64-bit floating point number. Essentially, since <code>numpy</code> just operates on the raw memory, it was trying to stuff a 64-bit floating point number into the memory space of a 16-bit integer, which it <em>then</em> tried to <em>read</em> as a 16-bit integer again. Or rather, it did not try, it actually did succeed – at least for the first result. All other results led to what is known as a buffer overflow (that is, the second number did not fit entirely, so the remainder got written into the void). This caused the first warning: scalar overflow.</p>
<p>Then, when the library tried to <em>divide</em> two such Frankennumbers, it could very well be that the denominator had only zeros and that is the textbook definition of a division by zero, which is undefined. That led to the second runtime warning.</p>
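<p>How a 16-bit unsigned value behaves when a result no longer fits can be illustrated without <code>numpy</code>. The following is a minimal sketch using the standard library’s <code>ctypes</code>; it is not a trace of what <code>numpy</code> or the cosine function did internally, and <code>mul_uint16</code> is a hypothetical helper introduced only for this illustration.</p>
<pre><code class="language-python">import ctypes

def mul_uint16(a, b):
    # Multiply, then keep only the lowest 16 bits of the result.
    # ctypes.c_uint16 does no overflow checking, it silently truncates.
    return ctypes.c_uint16(a * b).value

small = mul_uint16(200, 300)    # 60000 still fits into 16 bits
wrapped = mul_uint16(300, 300)  # 90000 does not fit and wraps around
zero = mul_uint16(256, 256)     # 65536 wraps around to exactly zero
print(small, wrapped, zero)
</code></pre>
<p>The last case is the interesting one: a value that wraps around to exactly zero is one way for a denominator to end up as zero further down the line.</p>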
<p>Now the next question is: why did all of this get solved once I called the <code>tolist</code> function of each <code>numpy</code> array before passing them to the function? Well, because that function turns the <code>numpy</code> array into a regular Python list. You will rarely see this, because for any number that needs more than 64-bit, this will be a lossy procedure (native floats in Python usually use 64-bit double precision, so 128-bit or 256-bit numbers will simply be cut off). But in this case it worked, because it <em>removed an assumption</em> from my code.</p>
<p>Essentially, <code>numpy</code> tries to be fast, and as such it relies on the user to ensure everything is correct.<sup id="fnref:2"><a class="footnote-ref" href="#fn:2" role="doc-noteref">2</a></sup> <code>numpy</code> itself assumes quite a lot of things, and if you – the user – trust <code>numpy</code> a bit too much (which I did), it will fail you. If <code>numpy</code> actually checked every individual array, this would severely slow down the operations, and would make things such as locally running AI almost impossible. Thus, it forgoes the type checking and just assumes that, if you give it a <code>numpy</code> array, it will “just work,” because it hopes that you have done the thinking, and it can just start churning numbers.</p>
<p>But in my case, that was wrong. Once I gave <code>numpy</code> a regular Python list, I essentially <em>forced</em> it to convert that list to a <code>numpy</code>-array so that it could do its magic. And by default, <code>numpy</code> uses – you guessed it – a 64-bit floating point double! That’s what fixed that code.</p>
<h3>Investigating Further</h3>
<p>Finally, why did this become a problem for my data reproduction step, but was not an issue while I was calculating my main results? Well, for the main analysis I actually saved down the data to CSV files and then loaded them in after the fact. And, since at one point I also did try to work with floating point arrays, not simple integers, here’s the function that loads in the arrays from disk before passing them to the cosine function:</p>
<pre><code class="language-python">vector = np.array([float(l) for l in loadings], dtype=np.float64)
</code></pre>
<p>Do you spot it? Right: I loaded the data from disk and cast them directly to float 64. This works with both integers (because there’s enough space for them), and floating point values. If you pass an array of these types of numbers into the cosine function, the memory layout of the numbers is as expected and nothing goes wrong.</p>
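<p>The effect of the round trip through a file can be sketched with the standard library alone. The one-line CSV below is made up for this example and stands in for a file saved in an earlier step; the point is only that whatever type the numbers had before saving is gone once they are read back as text.</p>
<pre><code class="language-python">import csv
import io

# Hypothetical one-line CSV standing in for a file saved in an earlier step.
saved = io.StringIO("0,1,1,0,2\n")
loadings = next(csv.reader(saved))

# CSV fields come back as strings; float() turns each one into a
# Python float, which is a double-precision value.
vector = [float(l) for l in loadings]
print(loadings, vector)
</code></pre>
<p>After loading, every element is a float, no matter how narrow the original integer type was.</p>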
<p>Here’s my mistake: For the new data, since I already had the code, I decided never to save down the data in between steps, because it’s a much smaller dataset. And as such, I decided in my infinite wisdom to use <code>uint16</code> because it saves on memory. Well, now you see what I got from trying to be smart!</p>
<h2>Lessons Learned</h2>
<p>So, what do we learn from this? I personally have learned quite a few things:</p>
<ul>
<li>Optimizing code too heavily is usually dangerous. In my example, using 64-bit floats would not have dramatically increased my memory consumption (after all, the data isn’t all that big), but because I was afraid it would, I violated an assumption of <code>numpy</code>.</li>
<li>If you are working with an interpreted language such as Python or R, it is easy to forget about the intricacies of how everything is stored in memory. Since your programming language typically does the heavy lifting for you and will decide whether to store something as a float or an integer, it is easy to forget about this. But <code>numpy</code> isn’t Python, it is just very well hidden C code. Because it wasn’t obvious, I did not consider memory layout (even though the explicit usage of the <code>dtype</code> argument would have you assume I did).</li>
<li>You <em>will</em> overlook the fine nuances between what you did with your old code and what the changes you do to your new code will actually imply. That’s why it is always important to (a) check your results, and (b) trust your gut. If your gut tells you that some numbers can’t be, do not give up until you find the issue.</li>
<li>Pay attention to errors. If there is an error – or even just a warning – and you’re dealing with scientific code, do not dismiss it. It <em>will</em> come back to bite you.</li>
<li><strong>Raise errors where errors are due!</strong> This one goes out to the developer who decided that buffer overflows and divisions by zero should not raise an exception by default (forcing you to fix them before your code runs). What were you thinking? It is very easy for humans to ignore warnings, but very hard to ignore errors. A division by zero is by definition undefined. It is impossible to return a value in that case, not even <code>inf</code>. If you still do, you violate mathematical rules. By making this a mere warning and returning a nonsensical number, you endanger the trust users have in your library. Indeed, the first thing I did after understanding the issue was go back to my <em>old</em> analysis code to see if that would now also raise these warnings. Fortunately, it didn’t, and I found the reason for it. But I actually did continue to use my wrong data until the next step in the analysis in R raised an exception that forced me to finally go back and fix these warnings that I ignored at first. Ignoring warnings is indeed convenient.</li>
<li>And, lastly: data is invisible. When you write code, you have the delicate task to write explicit statements that deal with data that will flow through these statements, without you usually observing that happening. You will have to trust your abilities and do a lot of debugging to make your code work. And this can lead to wrong assumptions, as in this case. So always try to remember: your data is invisible in your code.</li>
</ul>
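<p>On the point about warnings versus errors: Python’s own <code>warnings</code> module can turn warnings into exceptions, so that they can no longer be ignored. The sketch below uses a made-up <code>risky_step</code> function as a stand-in for a library call that merely warns; <code>run_strict</code> is likewise a hypothetical helper. <code>numpy</code> additionally offers <code>np.errstate</code> and <code>np.seterr</code> for its floating point error handling, which are not shown here.</p>
<pre><code class="language-python">import warnings

def risky_step():
    # Stand-in for a library call that only warns when something goes wrong.
    warnings.warn("overflow encountered", RuntimeWarning)
    return 0.0

def run_strict(step):
    # Inside this block every RuntimeWarning is raised as an exception.
    with warnings.catch_warnings():
        warnings.simplefilter("error", RuntimeWarning)
        try:
            return step(), None
        except RuntimeWarning as exc:
            return None, str(exc)

result, problem = run_strict(risky_step)
print(result, problem)
</code></pre>
<p>With the filter in place, the nonsensical return value never reaches the rest of the analysis; the warning text arrives as an exception instead.</p>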
<p>Let’s see how many weeks that insight lasts until I make this mistake again ;)</p>
<p>Thanks for reading!</p>
<div class="footnotes" role="doc-endnotes"><hr /><ol><li class="footnote" id="fn:1" role="doc-endnote"><p>Nota bene: Who decided that this should only print a warning and return some default value, instead of raising an exception?! In what world does a division by zero warrant a little wave with the finger and returning anything other than <code>undefined</code> (or, <code>None</code>, in the case of Python)?&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:1" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:2" role="doc-endnote"><p>This is sometimes referred to as a “footgun.” Because it allows you to conveniently shoot yourself in the foot.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:2" role="doc-backlink">↩</a></p></li></ol></div>]]>
  </content>
</entry>
<entry>
  <title>Can We Still Trust Our Software?</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2026, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/can-we-still-trust-our-software" />
  <id>https://www.hendrik-erz.de/post/can-we-still-trust-our-software</id>
  <published>2026-03-13T11:00:00+00:00</published>
  <updated>2026-03-12T08:04:42+00:00</updated>
  <summary type="html"><![CDATA[In the past months, the software I use daily has started to get less and less reliable. While mostly anecdotal evidence, I believe this to be a potential canary for deeper problems that plague software. For now, we may still be able to trust our software. But what about in ten years?]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/can-we-still-trust-our-software">
    <![CDATA[<p>A few hours before starting to write these lines, I experienced something I hadn’t seen in almost a decade. Visual Studio Code – probably the most used code editor globally, and what I had been using for almost 10 years at this point – crashed. The entire window just died shortly after an update with a catastrophic error, and everything had to be reloaded.</p>
<p>Now, a simple crash should not cause such a hefty judgment as is the title of this article. But if it just was this error, I would hardly be writing these lines. No, it is just the most recent instantiation of an issue that I observe with increasing frequency, and possibly a pattern that we may witness more frequently going forward.</p>
<h2>Crashing Software</h2>
<p>Software crashes are neither new nor uncommon. Software crashes every day, everywhere, and it has done so since the day software was born. The bugs of today may be more metaphorical than literal, but the impact on the functionality of the tools we use every day has stayed the same for decades.</p>
<p>I remember when I first started working on computers, I experienced software crashes multiple times a day. Be it Microsoft Works, video games, or the operating system (Windows 95) itself. Nothing seemed stable. But back when I started playing around with computers, this was acceptable. Everything important was still written down on physical paper. Software crashes were annoying, but even if the computer was so broken it took days to get it back up, this was nothing to worry about, since we (back then school children) had plenty of non-computer activities to choose from.</p>
<p>Later, I also experimented with Linux. In 2005 or 2006, I decided to try out the brand-new Ubuntu distribution for myself. It took me about two weeks (!) to get all the necessary drivers installed so that the OS would properly detect the basic hardware that was present. This was not a nice decision, and, being a teenager, I quickly went back to Windows so that I could get my games to work. Windows wasn’t great either (until Windows 7 came around), but it worked better than Linux.</p>
<p>There is a certain stickiness to the trust we place in technology. I have been burned so much by malfunctioning software that I even hesitated to get smartphones when they came out. When Apple released the first iPhone in 2007, I could not have cared less. I remember vividly the day I got my first own smartphone, an HTC. I believe it was some time in 2014. Until then, I used a Nokia 3310 (of course), and after that a Sony Walkman phone. All of them had physical buttons. First, it was a necessity (because there were no “smartphones”), then a habit.</p>
<p>It took me years to buy an all-touchscreen phone because I feared what would happen if the touchscreen broke. I really wanted to have hardware buttons because I believed that they would be more reliable than virtual (software) buttons. But, alas, we all have to go with the times, and as such I got an all-screen phone once they were relatively cheap to get, and I had the necessary funds.</p>
<h2>Trusting Software</h2>
<p>From then on, something changed. I started to trust software to do its thing. Just as I came to realize that, no, touchscreens “just work” unless something so bad happened that would also wreck hardware buttons (like a car running over it), I also started to rely on software more. Besides the fact that term papers and presentations just couldn’t reasonably be made with just pen and paper, software wasn’t as unpolished anymore.</p>
<p>The times of having to pray to the gods of C# that your software let you at least save your work before it died in a fulminant blue screen of death seemed to be over. And it just went better. A year after getting my first all-touchscreen phone, I decided to get a used 2011 MacBook Pro and try out the Apple ecosystem. The switch from Windows to macOS (I’m sorry, <em>Mac OS X</em> back then) was eye-opening. Suddenly, I was actually able to use a Unix operating system <em>without</em> having to spend weeks installing drivers. And, moreover, it quite literally “just worked.” Software crashes seemed to be a thing of the past. While I did retain my habit of shutting down my laptop every night until a few short years ago, I don’t think that this was a necessary precaution to prevent computer crashes.</p>
<p>And every piece of software that came afterward just got better and more stable. By 2017, I had even forgotten that software could crash. Yes, it could still have bugs, and I wasn’t as naïve as to think that I shouldn’t keep any backups anymore. But the frequency with which I had to resort to my backups went to zero very quickly. Aside from a few instances of data loss that, for the most part, could be attributed to my own doing, there was nothing to fear. Software was perfectly stable. I could <em>trust</em> software. Not superficially, but deeply.</p>
<h2>The Watershed</h2>
<p>And this golden age of software stability would continue for years. Even – or <em>especially</em> – during the pandemic, software was reliable and stable. Whatever I did to my computer, I was unable to make the software budge. I started getting more adventurous with installing untested and highly unstable software on my computer (in early 2021, that was <a href="https://www.hendrik-erz.de/post/setting-up-python-numpy-and-pytorch-natively-on-apple-m1">PyTorch with its more than experimental Metal backend support</a>), but alas – it just continued to work.</p>
<p>The same goes for my cloud setup. I run quite a few servers, the oldest of which has been running almost uninterrupted since 2015. Both the server and the applications on it have worked almost without any hiccups for more than a decade at this point. No matter where I ran software – my computer, my phone, my server – everything just worked; incredibly stable and without many notable issues.</p>
<p>And it continued to work up until 2025. We had a glorious decade of incredible software stability.</p>
<p>But then, something changed.</p>
<p>At first, I didn’t pay too much attention because it just affected the Windows ecosystem.<sup id="fnref:1"><a class="footnote-ref" href="#fn:1" role="doc-noteref">1</a></sup> Windows – especially Windows 11 – got worse again, after a phase in which Microsoft did a good job at improving the OS. The watershed for Windows 11 came probably some time around late 2024. Reports started to mount that the quality of Windows was beginning to deteriorate. Users complained about being forced to link their Microsoft accounts to their computers; advertisements in the start menu; more frequent crashes and blue screens.</p>
<h2>Microslop</h2>
<p>And then, Copilot happened. After Microsoft acquired a large share in OpenAI and thus got exclusive access to its Codex model, they started incorporating Copilot functionality into every corner of the operating system. At least for the general consumer. My computer, which runs on Windows 11 Education (thanks, LiU), has neither seen any ads in the start menu nor Copilot popups. But that may change at any time. In any case, Microsoft has clearly doubled down on pushing generative AI into every corner.</p>
<p>And this has affected every piece of Microsoft software. And I (have to) use a lot of Microsoft software in my day-to-day work.</p>
<p>For example, GitHub. I use both their mobile app, and the platform itself. The platform is still relatively robust. There are no major issues that I could identify, and I use GitHub <em>heavily</em>. But there weren’t any meaningful improvements either. Their mobile app is pretty much in the same shape as a few years ago, too, when Microsoft acquired them. Many of the issues that plagued the app back then still plague the app today.</p>
<p>But what has changed in the meantime is the Copilot integration. And boy, were they aggressive.</p>
<p>The first sign of Copilot being shoved down our throats was when, all of a sudden, a big, fat round button appeared in the GitHub app after an update a few months ago. I certainly did not activate this myself. Instead, it was an on-by-default feature. Since I don’t use Copilot and would like to decide when I do, I quickly turned this off in the settings. The wording for that setting is instructive: It is called “Hide Copilot.” As if it was an integral feature of the app that one could hide if <em>absolutely necessary</em>. Also, one is always enrolled in what Microsoft calls “Copilot Free” – insinuating a subscription that I never signed up for. The wording itself is maddening.</p>
<p>What was worse happened when I was authoring a quick change to one of my README files via the website some time ago. When I clicked on “Commit…” all of a sudden, something was filling in the commit message for me. I hadn’t even touched the keyboard, but some text suddenly appeared in the text field. It turns out that Microsoft decided to add a new on-by-default setting that would let their GPT model auto-fill commit messages without being asked. And you know the best part? When I immediately headed into the settings to turn this nonsense off, the website told me that I had allegedly “used up” some amount of my “free tokens” for the month. All by the magic of some unquestioned management decision that I sincerely believe the developers implemented regretfully.</p>
<p>Since then, it has been an uphill battle to keep Copilot out of every corner of Microsoft products that I have to use. And I have to use a lot of Microsoft products:</p>
<ul>
<li>My university has fully bought into the Microsoft ecosystem and as such, I have to deal with the eternal bugs of Sharepoint every single day.</li>
<li>This also means having to deal with the insane brittleness of the Active Directory SSO system. Every three to four days I have to fully reset all website data from all Microsoft domains in Firefox to be able to log into Outlook, because of a bug that has been unfixed for 3 years at this point that, when I log in to Outlook, will trigger a log<em>out</em> loop that I cannot escape otherwise.</li>
<li>On top of that, the usability issues of the GitHub mobile app and the constant new Copilot buttons in every free space on the GitHub website.</li>
<li>The shenanigans Microsoft pulls with its <a href="https://www.hendrik-erz.de/post/code-signing-with-azure-trusted-signing-on-github-actions">code signing system</a>.</li>
</ul>
<p>At this point, I’m actually quite <em>happy</em> that the bugs of its Office suite – Word, PowerPoint, and Excel – have remained the same for the past ten years. Indeed, I have started to appreciate the reliability of Microsoft Office. Yes, it has bugs, but it has <em>predictable</em> bugs, so once you encounter them, you develop a muscle memory to avoid them.</p>
<p>Which brings me back on track.</p>
<h2>The Downfall of Software</h2>
<p>While my little Copilot rant was somewhat off-topic (I <em>really</em> just needed to vent, apologies), it is also very much on-topic for the broader phenomenon I seek to describe. I don’t want to insinuate that software should be bug-free, both because that’s completely unrealistic, and because that’s not what’s important. Instead, I believe that the big issue plaguing software these days is that previously stable features started to break.</p>
<p>When software has shortcomings, we can account for that. We learn quick ways to work around bugs. A shortcut doesn’t work as intended? We’ll be able to learn a workaround using the mouse that is, while slower, reliable and does what we need. Pasting text directly into an app doesn’t work? Simply leave a text editor window open and paste text in there first. Certainly not as quick as directly pasting into the app we want to, but it’s workable.</p>
<p>But there is also a <em>regression</em> in software behavior that is starting to become noticeable. The biggest example of this is macOS Tahoe. I will never forget how Apple purposefully cut their keynote in such a way that at some point we had, I believe Tim Cook, stand in such a way in front of the logo that it just read “hoe.” And indeed, while I personally think that there are many good reasons for “Liquid Glass,” the issue is that during this revamp of the user interface, a bunch of stuff broke in other places.</p>
<p>One bug that is entirely new to Tahoe 26.3.1 is that keyboard layout switching is broken. As a refresher: I still use a MacBook with a German keyboard layout (a mistake I will fix with the next purchase), while all my keyboards use the American ANSI layout.<sup id="fnref:2"><a class="footnote-ref" href="#fn:2" role="doc-noteref">2</a></sup> This means that I have to frequently switch layouts between <code>en-US</code> and <code>de-DE</code>. Thankfully, Apple has made that <em>as painless as it can get</em>: Simply press <code>Ctrl+Space</code>. This will toggle between the two keyboard layouts I have configured. Or rather, it <em>should</em>. Since the last update, I have noticed that this isn’t as straightforward anymore. Today, when I need to switch between layouts, I first have to hold <code>Ctrl+Space</code> for a second or more, then release the keys, and then press them again. If I’m unlucky, I have to repeat this process a few times until the operating system actually switches to the other keyboard layout.</p>
<p>I am not going to bug you with any additional macOS bugs, since these have been documented <em>plentifully</em> elsewhere. Instead, I want to draw a bigger picture here. The big issue with Tahoe was not that it had a somewhat bumpy launch, and that the new design system faced issues. This is to be expected and not dramatic. Every new feature has a bunch of bugs that need to be ironed out. No, the problem with Tahoe is that it has <em>broken things that previously worked well</em>. If you introduce a <em>new</em> feature and that has a bug, users can adapt and develop workarounds. But if you have a feature <em>that has worked for years</em> prior, and suddenly that once reliable feature breaks, this is an issue. Because users have developed muscle memory, and if that muscle memory betrays them, this is a big deal. Because this will have a halo effect on the rest of the software for years to come. Users still haven’t forgotten the regression in the iOS virtual keyboard that led to it <a href="https://youtu.be/hksVvXONrIo?si=QtY6MYB8dH5plQ-T">quite literally typing the wrong keys</a>.</p>
<p>This is what I believe to be the biggest issue right now: That once stable features start to deteriorate. What was once reliable and safe suddenly is brittle and could fail at any time. It seems impossible to trust your software.</p>
<h2>Betraying Trust</h2>
<p>And this is a bigger trend I wish to draw attention to in this article. I may be salty about Microsoft sprinkling Copilot everywhere you don’t look, but Copilot is usually a new feature that wasn’t there before, so we have no expectations towards it. Some may like it, some may not, but we all see it initially as a new feature that could be buggy, and so we don’t trust it. If it at some point starts working perfectly reliably, we will eventually start to trust it.</p>
<p>But the opposite is what is more concerning: When once stable features suddenly, and out of the blue, stop working.</p>
<p>Such a thing happened a few weeks ago, when Zotero 8 launched, and I am still surprised by the fallout it had. As is customary with every new major version, I spent some time studying the changelog before updating, and carved out an hour or so to be able to mitigate any issues. So I pulled the trigger and updated Zotero. And it worked well.</p>
<p>But a few days after the official release, I started seeing reports on the Zettlr forum that the <a href="https://forum.zettlr.com/d/120-csl-json-citation-file-issue">citation key autocomplete</a> was broken. For me, it wasn’t, so I was very puzzled why that may happen. Was it a bug I overlooked? No. A few hours later, a colleague sent me his also broken citation library. I loaded it into Zettlr and realized what was wrong: Somehow, citation keys were missing. I have <a href="https://zettlr.com/post/on-the-recent-issues-with-zotero-8-and-better-bibtex">written about the issue elsewhere</a>, so here is the gist: With Zotero 8, the developers finally added a dedicated field for storing citation keys; a feature that for the past decade has been provided by one of the staples of Zotero – a plugin called Better BibTex (BBT). However, apparently this change was not at all communicated beforehand. Indeed, if you study <a href="https://www.zotero.org/support/changelog">the changelog for Zotero 8.0.0</a>, you will not even find an entry announcing this change. It was silently done.</p>
<p>Since BBT was providing its own citation key logic, the plugin and Zotero were suddenly competing for attention. And this was an issue, because BBT is part of thousands of Zotero installations globally, and the conflict therefore broke <em>a lot of users’ setups</em>. On the one hand, plugins always come secondary, and will need to adapt to whatever the main software decides. But BBT is one of the core plugins for Zotero. And apparently, Zotero didn’t communicate this change at all. Cue three weeks of frantic back-to-back updates for both Zotero and BBT in an attempt to re-harmonize how the plugin works with Zotero. Zotero itself even released a patch that <em>explicitly disabled unstable BBT versions</em>. That’s how bad that got.</p>
<p>And users were left wondering: Why was Zettlr’s citation key autocomplete broken? Or, worse (this happened to me), <em>why are all my cite keys gone</em>? Indeed, in my case, after restoring the cite keys, some (where authors and years of multiple items were the same) even got swapped, meaning I will have to carefully double-check <em>every citation</em> for those papers not yet published to ensure I actually cite the right stuff at the right time.</p>
<p>In short: It was a mess. And a significant breach of trust. Because the citation key feature worked <em>so incredibly reliably</em> that I didn’t realize that Zettlr, for the entirety of its existence of almost 10 years at this point, simply could assume that whatever CSL files Zotero would spit out with BBT, they would be properly formatted. I never heard any complaint about that in the past decade. And now, suddenly, my users were all over the place because stuff started to inexplicably break. And my software had to stop assuming, or <em>trusting</em>, that Zotero’s outputs were all well-formed.</p>
<p>This is not confined to this one anecdotal episode with Zotero. The <em>casus belli</em> for writing this article has been another piece of software, the one I started the article with. Visual Studio Code, or VS Code, is the default development IDE today, having supplanted most of its competition over the past ten years. I myself use it, too, exclusively. It works for my PHP workflows, for my Node.js workflows, and my Python workflows. Hell, it works almost as well as RStudio in R workflows.</p>
<p>But for the past year, VS Code has seen a serious downgrade in terms of support. Previously, every monthly upgrade would have a bunch of fixes and features scattered over the user interface. But ever since Microsoft decided to go Microslop, the changelog of VS Code every month was a disappointment: 95 % “Copilot improvements,” and 5 % “Miscellaneous bug fixes.” Now, don’t get me wrong: Please, if you want to, add a bunch of AI-related features to your apps. I don’t care too much. And in the case of VS Code, I am able to disable all of it with a single setting.<sup id="fnref:3"><a class="footnote-ref" href="#fn:3" role="doc-noteref">3</a></sup></p>
<p>No, the real problem with VS Code is that the energy of the developers is turned <em>so much</em> towards Copilot improvements that existing and previously reliable features just break. Within the past 48 hours, I experienced (a) <a href="https://bsky.app/profile/hendrik-erz.de/post/3mgmmh64hv226">the built-in terminal disintegrating</a>, and (b) <a href="https://bsky.app/profile/hendrik-erz.de/post/3mgroz56dr226">the app outright crashing</a>. What makes these two issues so detrimental is not that software is not allowed to have bugs. No, it’s because neither of these issues has happened in the past 7 years of me using the software <em>daily</em>. I have come to trust VS Code to just work. It has never crashed, and the terminal has never malfunctioned before. Again, mind you: I use the software for <em>hours every day</em>.</p>
<p>If you allow me (in this very long article), one final example: Nextcloud. I have been using Nextcloud for more than a decade at this point to keep my important documents available both on my computer and on my phone. And it has always worked pretty flawlessly. One setting the app has is that you can tell it to not automatically sync folders over 500 MB. This setting has worked well for the past eleven or twelve years. But a few days ago, I noticed something weird: A file that I <em>knew</em> I had placed in the correct folder was nowhere to be seen when searching for it on my phone. It was a PDF of a new research paper that I added to Zotero. So once I was back home, I double-checked: the file was indeed where it should be. It just wasn’t synchronized. And that’s when I noticed a yellow exclamation mark next to the folder. Hovering over it with my mouse told me what was going on: “Ignored folder.” Why would it suddenly ignore a folder that has synchronized for years prior?</p>
<p>What I assume must have happened is that a recent Nextcloud update has messed with the setting of which folders to not sync. And this has led to quite a few folders <em>never</em> synchronizing. Without telling me. After fixing these settings in the app, the next problem occurred: Because some files have been heavily edited (*cough* papers in R&amp;R *cough*) after Nextcloud decided to stop synchronizing them, I now of course had merge conflicts. And of course, it came at the worst time: Shortly before a deadline. Thanks to meticulous backups, I only lost about an hour of work. But it was still an hour of work that nobody could ever give back to me. Nextcloud, too, betrayed me.</p>
<p>When VS Code crashed this morning, I didn’t lose any data. At least the data retention of VS Code is remarkably reliable, so that feature is still not lost. But it was the one instance too many that finally made me stop and think. Our software is disintegrating, and if things go really awry, data loss in VS Code may become a thing in the future.</p>
<h2>Final Thoughts</h2>
<p>What do we make of this? Can we still trust our software? For now, we can. In this article, I barely enumerated instances of where software went <em>wrong</em>. I left out the much more plentiful instances of where software just works. But I think these few instances are a canary in the coal mine. We are not dead yet, but the bird is telling us to better get out. Yes, we still can trust software. But if these instances aren’t some spurious blips, software is starting to deteriorate — primarily consumer software. And once consumer software deteriorates further, what shall we do? Go back to pen and paper? Mistrust touchscreens? I highly doubt it.</p>
<p>I also do not think that these signs of deterioration are all caused by the advent of AI. That would be a wrong conclusion. Yes, AI certainly has made sloppiness part of the daily developer business, but the causes of the issues I have outlined here go far beyond this. There are middle managers forcing developers to implement features nobody has asked for, diverting scarce resources away from maintaining core functionality. There are communication issues that could be avoided by understanding the user base better. And there is what developers frequently call “software rot,” when software issues just don’t get fixed because the application is too large to have anyone keep an overview over it. There are simple hiccups that cause deterioration, and there is cohort change, where old senior developers leave with a bunch of intrinsic knowledge that the next generation has to start building up first.</p>
<p>The causes of software deterioration are as plentiful as there are applications on the market. But one thing is clear: Something <em>has</em> changed, and that is causing more and more issues across apps. The question is: what will we do? Will we just continue as we were, hoping that <em>our</em> data at least won’t be lost in a catastrophic crash? And what will developers do? Will they be forced to work on a new shiny feature that some focus group has shown may increase user retention, instead of finally fixing that one absurdly outdated dependency with ten CVEs?</p>
<p>I honestly don’t know, and I’d argue that this article is more a rant than an analysis. I don’t like the developments I see, but I also lack the data to put a proper causal mechanism behind this. Is it just a fluke? Possibly, but given the number of errors that the many pieces of software I use every day exhibit, I doubt it to be a simple coincidence. However, I also strongly believe that there is no monocausal explanation for what is happening.</p>
<p>We will have to see where all of this leads, but I hope that the software we all trust gets back on track soon. Yes, we can still trust our software. The question is: will we be able to trust it ten years from now?</p>
<div class="footnotes" role="doc-endnotes"><hr /><ol><li class="footnote" id="fn:1" role="doc-endnote"><p>Insert a flaccid joke à la “First they came for the communists” here.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:1" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:2" role="doc-endnote"><p>If you want to know why ANSI layouts can sometimes be better than ISO-layouts, <a href="https://www.hendrik-erz.de/post/coding-and-keyboards">read this</a>.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:2" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:3" role="doc-endnote"><p>Although it’s a setting that is deeply hidden in the preferences, and I failed several times to find it by searching for it. If you’re curious: It’s called “Chat: Disable AI Features.”&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:3" role="doc-backlink">↩</a></p></li></ol></div>]]>
  </content>
</entry>
<entry>
  <title>Where Zettlr Failed: How I Wrote My Entire Thesis Using (Almost) Only One Program</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2026, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/where-zettlr-failed-how-i-wrote-my-entire-thesis-using-almost-only-one-program" />
  <id>https://www.hendrik-erz.de/post/where-zettlr-failed-how-i-wrote-my-entire-thesis-using-almost-only-one-program</id>
  <published>2026-03-06T09:00:00+00:00</published>
  <updated>2026-02-10T13:50:33+00:00</updated>
  <summary type="html"><![CDATA[This is a (late) extension to both my PhD series and my “How I work” series. In this article, I explain the technical setup of my PhD thesis — how I integrated my data analysis pipeline into my writing, and how I enabled exports for the various journals I had to submit my work to.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/where-zettlr-failed-how-i-wrote-my-entire-thesis-using-almost-only-one-program">
    <![CDATA[<p>It has been a few days since I <a href="https://liu.se/en/news-item/research-reveals-the-link-between-language-and-lawmaking">successfully defended</a> my dissertation “On the Record: Understanding a Century of Congressional Lawmaking Through Speech and Vote Behavior” at the Institute for Analytical Sociology in Sweden. In the weeks preceding my dissertation, I have written a couple of reflection articles on <a href="https://www.hendrik-erz.de/post/five-years-of-studying-us-congress-what-remains">the case</a>, <a href="https://www.hendrik-erz.de/post/between-theory-and-methods">the theory</a>, <a href="https://www.hendrik-erz.de/post/what-is-analytical-sociology">and AS background</a> of my thesis. I explained lessons learned, how I view questions from before my thesis now, and so on.</p>
<p>But there’s one article which I knew I had to write, and I just realized I never did. Recently, someone asked on the Zettlr Discord whether they should fully commit to Zettlr for writing their dissertation. I wanted to link to this article, only to realize that it doesn’t yet exist. I had confused it with a lecture I gave shortly after defending at the <em>other</em> IAS (the Institute for Application Security in Brunswick, Germany) which I titled exactly like this article.</p>
<p>So, with a few months of (hopefully understandable) delay due to mental confusion, here it comes!</p>
<p>In this article I want to talk about how I wrote my entire thesis using only one (writing) application. Bear in mind that I have already written about the organizing principles like folder structure in another article, so I won’t focus on that here. Instead, I want to explain how the setup worked <em>technically</em>, with an emphasis on the part that I’m a <em>quantitative</em> sociologist and as such have to write a lot of code. If you’re interested in getting some inspiration for how to organize or structure your setup content-wise, <a href="https://www.hendrik-erz.de/post/how-i-work-part-v-zettlr-and-academic-markdown">please read this article</a>.</p>
<p>What I want you to take away from this article is two things: First, Zettlr is absolutely up to the task of writing an entire thesis, and so I can only recommend it to anyone out there who doesn’t want to write everything in LaTeX. And second, even if Zettlr ultimately fails (because no program is perfect), if you commit to Zettlr, you can use a wide range of resources to fix whatever issues you may face on short notice.</p>
<h2>Writing a Dissertation: How I Used Zettlr</h2>
<p>In the following, I want to introduce you to the technical setup I used to facilitate writing an entire 200-page dissertation using Zettlr. This section focuses on three parts: First the integration of an entire data analysis pipeline into my writing workflow; second the integration of export templates and custom layouts into the exporter; and lastly a bird’s eye view over the entire structure.</p>
<h3>Integrating Data Analysis</h3>
<p>The main issue I had to solve when it came to setting up my PhD workflow was that I wanted to keep Zettlr strictly as a writing app. However, as a quantitative sociologist, I also needed to write code and run data analyses, and I didn’t want to do that in Zettlr. So how could I integrate that with the task of writing up my papers?</p>
<p>To get started, I first created a set of folders – one per paper – in which I could place all my text. If you’re writing a monograph, I still recommend a set of folders – one per chapter. Each of these folders I turned into a project so that I could export them whenever necessary. (In Sweden, you have a series of intermediate examination seminars for which you need to provide the drafts as PDFs, so having a way to export them alone in one click was a real time-saver.)</p>
<p>Within these folders, I set up one file per (sub-)chapter – in my case the classical “intro-background-methods-results-discussion.”</p>
<p>At the same time, I created a folder elsewhere that I didn’t load as a workspace into Zettlr where all my code needed to go. Effectively, it was a huge <code>.git</code>-repository<sup id="fnref:1"><a class="footnote-ref" href="#fn:1" role="doc-noteref">1</a></sup> that I had constantly open in VS Code and RStudio. Within it, I created consecutive folders for each task, in the format <code>ddd_description</code>, where <code>ddd</code> was just an increasing number. The first digit was always <code>0</code> for purely exploratory analyses, <code>1</code> for analyses relating to my first paper, <code>2</code> for my second, and so on. The other two digits were simply incremental. The <code>description</code> was always one or more very short keyword(s) that helped me figure out which code was in what folder.</p>
<p><img src="https://www.hendrik-erz.de/storage/app/media/blog/phdata_folder_structure_annotated.png" alt="The folder structure of my data analysis code, annotated." title="The folder structure of my data analysis code, annotated." /></p>
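<p>A minimal sketch of how such folder names could be parsed, for example to list all analyses that belong to one paper. The helper name <code>parse_task_folder</code> and the example folder name are hypothetical and only illustrate the <code>ddd_description</code> convention described above; they are not taken from the actual repository.</p>
<pre><code class="language-python">import re

# One digit for the paper (0 = exploratory), two digits for the running
# number, then an underscore and a short description.
FOLDER_PATTERN = re.compile(r'^(\d)(\d{2})_(.+)$')


def parse_task_folder(name):
    """Split a folder name such as '103_vote_matrix' into its parts.

    Returns None if the name does not follow the ddd_description scheme.
    """
    match = FOLDER_PATTERN.match(name)
    if match is None:
        return None
    paper, counter, description = match.groups()
    return {
        'paper': int(paper),      # 0 = exploratory, 1 = first paper, ...
        'counter': int(counter),  # incremental number within that paper
        'description': description,
    }


# Hypothetical folder name, used only for illustration
print(parse_task_folder('103_vote_matrix'))
</code></pre>
<p>Because the number comes first, a plain alphabetical sort of the folder names already groups them by paper and orders them by creation.</p>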
<p>Next, I needed to somehow link my writing in Zettlr with the results of my data analyses. What helps is to recognize that the only link between analysis and writing are plots and tables. Your data analysis should produce some plots or tables, and those you need to include in your writing.</p>
<p>How I did that was relatively straightforward: Whenever I produced a plot or table that would end up in a paper, I saved it to a file. Specifically, I saved it to the <code>assets</code>-folder within the corresponding paper project. So, for example, the path to my first paper folder that was loaded in Zettlr was <code>/Users/hendrik/Nextcloud/PhD/Paper 1</code>, and so I dropped the files into <code>/Users/hendrik/Nextcloud/PhD/Paper 1/assets</code>. Then, I dragged and dropped them into the text at the appropriate place.</p>
<p>One example is the following code that exports a plot I referenced in my third paper:</p>
<pre><code class="language-python">import matplotlib.pyplot as plt

# dem_xy, rep_xy, speaker_xy, and speakers are computed earlier in the analysis
fig, ax = plt.subplots()
fig.set_dpi(1200.0)
ax.scatter(dem_xy[0], dem_xy[1], color = (0, 0, 1))  # plotted in blue
ax.scatter(rep_xy[0], rep_xy[1], color = (1, 0, 0))  # plotted in red

# Annotate the points, offsetting each label slightly from its marker
for i, speech in enumerate(speakers):
  speaker = speech[0]
  x = speaker_xy[0][i] + 0.02
  y = speaker_xy[1][i] + 0.02
  ax.annotate(speaker, (x, y))

title = &quot;&quot;&quot;Semantic Speech Centroids on July 2, 2025
House discussion, &quot;One Big Beautiful Bill Act&quot; (H.R. 1)
Dimensionality-reduction via MDS&quot;&quot;&quot;
ax.set_title(title)

# Save into the assets folder of the Zettlr project, then show for inspection
plt.savefig(&quot;PhD/Paper 3 - Vote Defection/assets/fig_1_speech_centroids.png&quot;, dpi = 1200.0)
plt.show()
</code></pre>
<p>See how the code at the very end both saves the plot to a file in my Zettlr workspace, and shows it for quick inspection. Within my actual text, I could then easily drag and drop the figure into the text.</p>
<p>The benefit of this setup was huge:</p>
<ol>
<li>All my outputs from the analyses were readily available to reference in my writing.</li>
<li>Whenever some part of my code changed, I could just re-run the exporting code to overwrite the file. Each plot only had a single file, and so whenever I exported my papers for sending them out, they would always include the correct (=latest) plot (or table).</li>
<li>It avoided context-switching. Either I had VS Code or RStudio open and would think about data structure, or how to improve an analysis. My brain could completely forget all theory or the framing of my paper and only focus on the data. Or I had Zettlr open and could exclusively focus on the theory or framing of my paper, and completely ignore how I ran the analysis.</li>
</ol>
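<p>The same export step can be sketched for tables. This is only an illustration under stated assumptions: the helper names <code>asset_path</code> and <code>save_table</code> are hypothetical, and CSV is just one possible file format, not necessarily the one used for the thesis. The point it shows is that each table maps to exactly one file in the <code>assets</code>-folder, so re-running the export overwrites the previous version.</p>
<pre><code class="language-python">import csv
from pathlib import Path


def asset_path(project_dir, filename):
    """Return the path of a file inside the project's assets folder."""
    assets = Path(project_dir) / 'assets'
    # Create the folder on first use; do nothing if it already exists
    assets.mkdir(parents=True, exist_ok=True)
    return assets / filename


def save_table(rows, header, project_dir, filename):
    """Write a table to the assets folder, replacing any earlier export."""
    target = asset_path(project_dir, filename)
    # Mode 'w' truncates the file, so there is only ever one (latest) version
    with target.open('w', newline='', encoding='utf-8') as fp:
        writer = csv.writer(fp)
        writer.writerow(header)
        writer.writerows(rows)
    return target
</code></pre>
<p>Calling <code>save_table</code> twice with the same <code>filename</code> leaves a single file containing only the rows of the second call, which mirrors the one-file-per-plot rule above.</p>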
<h3>Integrating Templates</h3>
<p>The next big issue started to appear when I had to submit my work to journals. Usually, journals have their own style guide and expect any submissions to follow it closely. If you’re lucky, you get a LaTeX source file. If you’re somewhat unlucky, you get a correctly-formatted Word file. And if you’re really unlucky, you only get some vague instructions.</p>
<p>Luckily, the journal I submitted to (Network Science) offered a LaTeX template. The process applies to any kind of template, really.</p>
<p>I wanted to make the template available to Zettlr in a way that I could export individual files or entire projects with it (because I may very well submit another paper to the journal in the future). I also wanted to ensure that the template is generic so that I could share it with others. For that, I created a “template” folder and placed the template in there. The way I set everything up led to the <a href="https://docs.zettlr.com/en/guides/journal-latex-template/">comprehensive guide on submitting to a journal that you can find in the Zettlr documentation</a>.</p>
<p>With a prepared template at hand, I could point a defaults file to it, and then export whatever I wanted from Zettlr using this specific template. So, whenever I chose “Network Science PDF” as an export target, the PDF would follow the Network Science style guide.</p>
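<p>Roughly, choosing such an export target boils down to calling Pandoc with a defaults file. The following is a minimal sketch of that call and not Zettlr’s actual implementation: the function names and all file names are hypothetical, and only Pandoc’s <code>--defaults</code> and <code>--output</code> options are real.</p>
<pre><code class="language-python">import shutil
import subprocess


def build_pandoc_command(source_file, defaults_file, output_file):
    """Assemble the Pandoc call for one export target.

    The defaults file is where the template and further options are
    configured, so the command line itself stays short.
    """
    return [
        'pandoc', source_file,
        '--defaults', defaults_file,
        '--output', output_file,
    ]


def export(source_file, defaults_file, output_file):
    """Run the export; requires Pandoc to be installed."""
    if shutil.which('pandoc') is None:
        raise RuntimeError('pandoc was not found on the PATH')
    command = build_pandoc_command(source_file, defaults_file, output_file)
    subprocess.run(command, check=True)


# Hypothetical file names, used only for illustration
print(build_pandoc_command('paper.md', 'network-science.yaml', 'paper.pdf'))
</code></pre>
<p>Keeping the template reference inside the defaults file means that switching journals only requires pointing to a different defaults file, while the source files stay untouched.</p>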
<p>If you are interested in exploring how I modified the Network Science template, <a href="https://github.com/Zettlr/pandoc-templates/tree/main/templates/cup-journal">have a look at the repository here</a>. If you plan on submitting to any Cambridge University Press journal (Network Science, Political Analysis, Political Science Research and Methods, Evolutionary Human Sciences, or Natural Language Processing), the template will work for you.</p>
<h3>The Overall Structure</h3>
<p>There were a few more things to integrate into Zettlr as a hub that I am skipping over here. This is because I already wrote extensively on them. If you’re interested in how I create and reference reading notes, <a href="https://www.hendrik-erz.de/post/how-i-work-part-iv-reference-management-reading-literature">read this article</a>. And if you want to learn how to couple your Zotero library with Zettlr, <a href="https://docs.zettlr.com/en/editor/citations/">read this documentation page</a>.</p>
<p>To show you what the <em>entire</em> setup looked like at the end, here is a comprehensive diagram:</p>
<pre><code class="language-mermaid">flowchart TD
    A[&quot;Literature (Books, papers)&quot;]
    B[Reading Notes Folder]
    C[Paper Files]
    D[Statistical Code]
    E[Paper Assets Folder]
    F[Pandoc]
    G[PDF Export]
    H[Zotero]
    I[Library File]
    J[LaTeX Templates]
    K[Pandoc Defaults]

    A --&gt; B
    B --&gt; C
    D --&gt; E
    E --&gt; C
    C --&gt; F
    F --&gt; G
    H --&gt; I
    I --&gt; C
    I --&gt; F
    J --&gt; K
    K --&gt; F
</code></pre>
<p>As you can see, whenever I read something, I created a reading note (within Zettlr) that I would then reference in my paper files (i.e., I looked at the reading notes to see what I could cite, and find arguments). At the same time, any output from my statistical code would end up in the corresponding <code>assets</code>-folders, also to be referenced in my paper files. Finally, all my citations are stored in a single, large CSL JSON file (currently ca. 1.6 MB large).</p>
<p>These three parts all went directly into the paper source code (read: the Markdown documents I wrote). Finally, whenever I exported a paper, this would actually run Pandoc under the hood, which itself referenced the various templates and defaults files I had to produce a PDF export.</p>
<p>The setup is quite complex, but it only requires a one-time effort to set up and proved to be a huge time saver. In the end, it followed a few good principles:</p>
<ol>
<li><strong>DRY (Don’t repeat yourself)</strong>: This means that every file was unique. Instead of mindlessly copying files back and forth, I kept exactly one copy of each so that I never got confused about which figure was the most recent one that I needed to include in my writing. If I <em>ever</em> needed to reference a previous version of a file, I would have had the option to search the <code>git</code> history. But this never happened even once in those five years.</li>
<li><strong>Modularization and separation</strong>. I kept everything logically separated from each other. My writing was entirely contained within Zettlr, my data analysis in VS Code and RStudio in a different folder, my library file is located at yet another place, and all my templates were also in a different folder. This way, I always knew where to head when I had to adjust something. It also ensured that I didn’t have frequent context-switches while working. When I was writing, it was impossible for me to accidentally stumble upon a template and thus get mentally distracted. Instead, when I wrote, I could focus only on that. When I improved my analyses, I wasn’t getting confused by my writing. And so on.</li>
<li><strong>Responsiveness</strong>: We all know this issue. We’re done revising a paper, and because we’re good students, we are finished a few days before the deadline. <em>However</em>, we suddenly get an email from a colleague who did indeed manage to provide feedback on it after all. Unfortunately, they found a major flaw in our analysis. What do you do? Well, I’ve had plenty of such situations, but the way this setup worked, it was extremely efficient to go back to the analysis, find and fix the error, re-run the code, and simply re-export the paper. The setup was incredibly fast and responsive in this sense. It maximized the time I could spend working on the contents of the paper, and minimized the time to, e.g., export it.</li>
</ol>
<p>In fact, this setup was so tuned at the end that fixing various parts across all my papers took me mere hours instead of weeks. Towards the end of the dissertation, it worked extremely well, and I even started to generalize this setup for my Postdoc time.</p>
<h2>Where Zettlr Failed: Last-Minute Thesis Export</h2>
<p>As I approached the defense date, suddenly my calendar filled up quite a bit with tasks I had to do. I had to get in touch with our library to request an ISBN for my dissertation, I had to contact my university’s printing office to set up a timeline, retrieve the test-print, and submit the final proof of the dissertation. This means that my weekends suddenly also filled up, and I lacked time to work on Zettlr itself.</p>
<p>Initially, I thought that what worked with individual papers of my dissertation should also work for the entire dissertation: Instead of exporting single papers, just turn their containing folder into a project, and export <em>all</em> of them at once.</p>
<p>However, there were a few issues with this:</p>
<ol>
<li>Since my dissertation was cumulative, the papers needed to be standalone, even if they ended up being bound into a single book. This means that things like bibliographies couldn’t be shared between them, and footnotes needed to start at <code>1</code> every time.</li>
<li>I needed to add a few additional things into the mix, such as an abstract in both English and Swedish, or an acknowledgements section, which was a separate file. This also included additional metadata such as AI and funding statements.</li>
<li>Each paper/essay/chapter needed to start with a divider page featuring a large ornament.</li>
<li>Despite all four essays (introduction + my three papers) being standalone, the page numbers needed to be consecutive from start to finish.</li>
</ol>
<p>Some of these issues only require a bunch of Pandoc settings, so they weren’t insurmountable. A dedicated reference section for each essay was more tricky. And these ornament dividers had to be implemented in LaTeX, so I couldn’t think of a clever way to include them.</p>
<p>Add to all this the fact that the original LaTeX template of my university, while comprehensive, was in no way compatible with Pandoc. Due to the extensive amount of custom elements that had been implemented, it was also very difficult to “template-ize” it to use it with Pandoc.</p>
<p>So I faced a choice: Either spend the majority of my – at that point precious – time forcing this beast of a template into Zettlr’s structure, or just stick with LaTeX entirely and have more time to proofread my thesis before sending it to the printing office. I opted for the latter option.</p>
<p>First, I adjusted the LaTeX template a bit to see that it would work. I then proceeded to insert all metadata and information that wouldn’t change anyhow (such as the ISBN). Next, I prepared TeX files for the acknowledgements (which essentially consisted of mostly plain text, so there isn’t really a difference between Markdown and LaTeX, and I could just copy and paste my acknowledgements from the Markdown document), and the abstracts.</p>
<p>After that work had been done, I needed to somehow get my papers into the mix. However, I knew that I would have to adjust some of the text, so if I were to simply export all my papers to LaTeX once, it would be a pain to, e.g., adjust the citations. Also, it would violate the DRY principle, since then I suddenly would have two copies of the same text lying around, making adjusting both even more tedious.</p>
<p>So I opted for a hybrid approach. I kept my papers in Zettlr. However, I sidestepped Zettlr’s internal exporter, and instead created a <code>makefile</code> that would just run Pandoc directly on the files. To do so, I went with the dumbest approach that works: verbosely writing everything out. Here’s how it looked:</p>
<pre><code class="language-bash">@echo Compiling kappa ...
pandoc -f $(PANDOC_READER) -t $(PANDOC_WRITER) -o $(KAPPANAME) \
$(PANDOC_OPTS) --resource-path=&quot;$(KAPPADIR)&quot; \
&quot;$(KAPPADIR)/01 - Introduction.md&quot; \
&quot;$(KAPPADIR)/02 - Policymaking and Sociology.md&quot; \
&quot;$(KAPPADIR)/03 - AS and CSS.md&quot; \
&quot;$(KAPPADIR)/04 - Politics in US Congress.md&quot; \
&quot;$(KAPPADIR)/05 - Policymaking in US Congress.md&quot; \
&quot;$(KAPPADIR)/06 - Computational Text Analysis.md&quot; \
&quot;$(KAPPADIR)/07 - Ethical Concerns.md&quot; \
&quot;$(KAPPADIR)/08 - Summaries.md&quot; \
&quot;$(KAPPADIR)/09 - Conclusion.md&quot; \
&quot;$(KAPPADIR)/10 - Outlook and Future Research.md&quot;

# Repeat this for all four papers
</code></pre>
<p>The “name” and “dir” variables simply held the desired output filename and the source folder. Configuring Pandoc required a bit more code, so I collected all arguments I would need at the top of the file:</p>
<pre><code class="language-bash"># Bibliography file for Pandoc
BIBFILE = /path/to/my/library.csl.json

# Base dir for all papers
BASEDIR = /path/to/paper_dir

# Pandoc reader and writer properties
PANDOC_READER = markdown+mark
PANDOC_WRITER = latex
# Options for Pandoc (split up into multiple variables)
CITEPROC = --citeproc --bibliography &quot;$(BIBFILE)&quot; --csl apa.csl
# DEBUG: I want to use --file-scope=true, but can't because that will mess with the placement of the `#refs` special references div.
PANDOC_ARGS = --top-level-division=section --table-caption-position=above
PANDOC_OPTS = -F pandoc-crossref $(CITEPROC) $(PANDOC_ARGS) --template=chapter.shim.tpl.tex
</code></pre>
<p>As you can see, instead of creating a defaults file (which I could have done, now that I think about it), I provided everything directly as CLI flags.</p>
<p>Also, that “Debug” statement you see there? Well, that was <a href="https://github.com/jgm/pandoc/issues/11072">an issue with Pandoc</a> I couldn’t solve ad-hoc. I could’ve actually reduced the exporting code by quite a bit if that bug didn’t appear. But I had to just repeat the same exporting code four times over, and in the end, it wasn’t too big of a deal, so I stuck with it.</p>
<blockquote>
<p>Note also the Pandoc reader <code>markdown+mark</code>. This extension has been enabled by default for a few versions now when you export with Zettlr, because it turns out it’s quite good to highlight sections you need to look at again in yellow, even in a final proof. Also, note that this command enabled <code>pandoc-crossref</code>, which I needed to cross-reference my plots and tables. Again, this is a feature I implemented in Zettlr before finalizing my thesis because it turns out to be quite useful. This is how most features have made it into Zettlr in the past five years: Because I literally faced this issue and decided to solve it.</p>
</blockquote>
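<p>For illustration, the repeated export rule could also be generated instead of being written out four times. The following is a minimal sketch and not what I used for the thesis: the names <code>PaperJob</code>, <code>PandocSettings</code>, and <code>buildPandocArgs</code> are hypothetical, the paths are placeholders, and it only uses the flags that already appear in the makefile above.</p>
<pre><code class="language-typescript">// Hypothetical helper: assembles the Pandoc CLI arguments for one essay.
// Only flags that appear in the makefile above are used.
interface PaperJob {
  sourceDir: string // Folder containing the Markdown files
  files: string[]   // Markdown files, in reading order
  output: string    // Name of the resulting LaTeX file
}

interface PandocSettings {
  reader: string       // e.g., 'markdown+mark'
  writer: string       // e.g., 'latex'
  bibliography: string // Path to the CSL JSON library
  csl: string          // Citation style file
  template: string     // The shim template
}

function buildPandocArgs (job: PaperJob, settings: PandocSettings): string[] {
  // Resolve every input file relative to the essay's folder
  const inputs = job.files.map((file) => job.sourceDir + '/' + file)

  return [
    '-f', settings.reader,
    '-t', settings.writer,
    '-o', job.output,
    '-F', 'pandoc-crossref',
    '--citeproc',
    '--bibliography', settings.bibliography,
    '--csl', settings.csl,
    '--top-level-division=section',
    '--table-caption-position=above',
    '--template=' + settings.template,
    '--resource-path=' + job.sourceDir,
    ...inputs
  ]
}

// Placeholder values, mirroring the variables from the makefile
const thesisSettings: PandocSettings = {
  reader: 'markdown+mark',
  writer: 'latex',
  bibliography: '/path/to/my/library.csl.json',
  csl: 'apa.csl',
  template: 'chapter.shim.tpl.tex'
}

const kappa: PaperJob = {
  sourceDir: '/path/to/paper_dir/kappa',
  files: ['01 - Introduction.md', '09 - Conclusion.md'],
  output: 'kappa.tex'
}

const kappaArgs = buildPandocArgs(kappa, thesisSettings)
</code></pre>
<p>Because the result is an array rather than one long string, it could be handed to something like Node’s <code>child_process.execFile</code> without having to worry about quoting the spaces in the file names.</p>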
<p>At this point, I had a one-line command that would trigger an export of all my essays into LaTeX files. Great! To make the papers fit stylistically into the entire template, I created a new LaTeX template, <code>chapter.shim.tpl.tex</code>. It was very basic:</p>
<pre><code class="language-tex">% Minimal Pandoc-compatible template to render a single paper into a chapter
% using the YAML frontmatter title property as chapter title.
$if(essayno)$
\orndivider{$essayno$}
$endif$

% From the memoir manual (p. 77):
% \chapter[⟨toc-title⟩][⟨head-title⟩]{⟨title⟩}
% ...where
% * toc-title: Table of Contents-Title
% * head-title: The title displayed in the head of the section
% * title: The displayed actual title.
$if(subtitle)$
% ToC: &quot;Title. Subtitle&quot;
% Head: Just the (short) title
% Chapter page: Title, and then small just the subtitle
% NOTE: title_separator is a quick hack because I need a separator in one case.
\chapter[$title$$if(title_separator)$$title_separator$ $else$ $endif$$subtitle$][$title$]{$title$ \\%
\vspace{10mm}%
\fontseries{m}\selectfont\small{\textsl{$subtitle$}}}
$else$
\chapter{$title$}\label{chapter:$title$}
$endif$

$if(author)$
\begin{center}
  $author$
\end{center}
$endif$

$if(abstract)$
\begin{abstract}
$abstract$
\end{abstract}
\newpage % Enforce a new page if the abstract goes too long.
$endif$

$body$
</code></pre>
<p>Again, a few things to note:</p>
<ol>
<li>The divider page was only required for my three essays, but not the introduction, and so I opted again for a somewhat dumb solution: Simply set the YAML front matter property <code>essayno</code> which was simply 1, 2, or 3 depending on the essay, and unset for my introduction. The command <code>\orndivider</code> was a completely different kind of beast and hides about 100 lines of LaTeX code (that were fortunately shipped with the template, so I didn’t have to write that myself).</li>
<li>The “NOTE” comment hints at another issue: Theses at my institute usually have the form “Title. Subtitle,” but Pandoc’s template syntax cannot tell me whether the title might already end with a punctuation mark. It did so in two cases, but not in a third case. So, again a very dumb solution: Another YAML front matter property that allowed me to specify a title separator if needed.</li>
<li>The template is merely a “<a href="https://en.wikipedia.org/wiki/Shim_(computing)">shim</a>,” and leaves out most of what LaTeX actually needs to produce a standalone file. This is because those files were not intended to work standalone, but to be integrated into the larger template itself.</li>
</ol>
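<p>The title separator could, in principle, also be computed in a small pre-processing step instead of being set by hand. Here is a minimal sketch of that logic. It is not part of my actual pipeline: the function names and the example titles are made up, and the list of punctuation marks is an assumption.</p>
<pre><code class="language-typescript">// Hypothetical pre-processing step: decide which separator belongs between
// title and subtitle so that the ToC entry reads 'Title. Subtitle'.
// Assumption: these are the punctuation marks a title may end with.
const CLOSING_PUNCTUATION = ['.', '!', '?', ':']

function titleSeparator (title: string): string {
  const trimmed = title.trim()
  const lastChar = trimmed.charAt(trimmed.length - 1)
  // A title that already ends in punctuation needs no extra separator
  return CLOSING_PUNCTUATION.indexOf(lastChar) !== -1 ? '' : '.'
}

// Mirrors what the template does for the ToC title: title, optional
// separator, a space, and then the subtitle.
function tocTitle (title: string, subtitle: string): string {
  return title.trim() + titleSeparator(title) + ' ' + subtitle.trim()
}

// Made-up example titles
const withSeparator = tocTitle('Counting Votes', 'A Case Study')
const withoutSeparator = tocTitle('What Is a Bill?', 'A Case Study')
</code></pre>
<p>The return value of <code>titleSeparator</code> is what would end up in the <code>title_separator</code> property. If it is empty, the template falls back to a plain space.</p>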
<p>In the end, including all four works into the thesis PDF was as simple as telling LaTeX to load them:</p>
<pre><code class="language-tex">% Include your chapters here
\include{kappa}
\include{paper1}
\include{paper2}
\include{paper3}
</code></pre>
<p>Finally, I just needed one more command to “knit” everything together:</p>
<pre><code class="language-bash">xelatex -interaction nonstopmode -halt-on-error -file-line-error liuthesis.tex
</code></pre>
<p>From then on, everything else was just touching up the template where necessary, and improving the actual text. Whenever I changed anything in my text, I just had to run two commands to turn my papers into LaTeX files, and to produce the thesis PDF.</p>
<p>Shortly before the deadline to send the thesis to the printing office, I also realized that it would be my duty to produce what the Swedes call “spikblad.” The spikblad needs to include the thesis title, abstract, name of the opponent, and the location of the public defense. It turns out that the thesis template actually came with that thing, and whoever created the thesis template already thought ahead: The spikblad would simply pull in the information that I already had provided in the first step to the thesis template. So all it took was to run another command:</p>
<pre><code class="language-bash">xelatex -interaction nonstopmode -halt-on-error -file-line-error exhibit-page_spikblad.tex
</code></pre>
<p>Et voilà! It did indeed take quite some time to get this pipeline going, but what I wanted to show you with all of this is that if all things break apart, the open nature and interoperability of Zettlr ensures that anyone with some technical knowledge (or who knows someone with technical knowledge) can quickly break out of Zettlr to fix things Zettlr can’t handle by itself.</p>
<p>And you won’t even have to use LaTeX. The process to export everything to Word and then adjust a Word document would look very similar to this one.</p>
<h2>Final Thoughts</h2>
<p>Lastly, a few words on a fact that I never really reflected upon: One big benefit I had throughout this entire time was that I quite literally would “invent the universe to make apple pie.” Instead of choosing some existing piece of software and adjusting my workflow so that it works for me, I could quite literally adjust my software so that it fits into my workflow.</p>
<p>Over the past five years, I have tremendously changed how Zettlr works in an attempt to make it fit snugly into my workflow. Whenever I encountered issues where I would need some functionality to work without friction, I would sit down on the weekend and make that feature happen.</p>
<p>I realize that this is a big privilege that only a few people enjoy. Now that I am in my Postdoc-phase, you can expect me to further refine the app to work more generally with academic workflows beyond the dissertation. However, this also means that I won’t be as attuned to potential friction in a PhD-student’s workflow as before.</p>
<p>I want to rephrase this as a call to action: If you’re an undergrad, graduate, or doctoral student, and encounter an issue where you need to do something, but Zettlr does not possess this feature, suggest it! I (and all the newly arrived contributors) can only implement what we know of. There is only one requirement: Your issue must be applicable to a few more people than just yourself. But I am very confident that most of your problems aren’t unique to you – at least not when it comes to writing a dissertation. So please help us make Zettlr even better for the next generation of students!</p>
<div class="footnotes" role="doc-endnotes"><hr /><ol><li class="footnote" id="fn:1" role="doc-endnote"><p>I was curious and checked: even without the datasets (circa 60 GB in size), the <code>.git</code>-repository alone – that is: mostly just text and plots – clocks in at 667 MB. A real chonker I got there.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:1" role="doc-backlink">↩</a></p></li></ol></div>]]>
  </content>
</entry>
<entry>
  <title>WebGL Series, Part 8: Implementing Multi-sample Antialiasing (MSAA)</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2026, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/webgl-series-part-8-implementing-multi-sample-antialiasing-msaa" />
  <id>https://www.hendrik-erz.de/post/webgl-series-part-8-implementing-multi-sample-antialiasing-msaa</id>
  <published>2026-02-27T11:00:00+00:00</published>
  <updated>2026-02-28T19:02:15+00:00</updated>
  <summary type="html"><![CDATA[In this last article in my series on WebGL, I re-implement antialiasing to make the rendered graphic look more crisp. This step concludes the full setup of the iris indicator that you can see on the demo page.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/webgl-series-part-8-implementing-multi-sample-antialiasing-msaa">
    <![CDATA[<p>Welcome to the very last article of this series on WebGL. Here, I’m going to explain how to complete the animation with the addition of MSAA. I will also explain a final piece to understanding the rendering pipeline of OpenGL. <a href="https://www.hendrik-erz.de/post/webgl-series-part-7-adding-a-bloom-filter">To read last week’s article, click here</a>.</p>
<p><a href="https://nathanlesage.github.io/iris-indicator/">View the demo-page</a></p>
<hr />
<p>When we just drew the rays directly onto the canvas, they looked sharp and crisp, but now with the post-processing, they don’t anymore. The reason is that, while the WebGL canvas has antialiasing enabled by default, we lose this ability once we draw to textures instead of the canvas directly. To understand why that happens, we have to better understand the hidden part of the OpenGL pipeline that runs between our vertex and fragment shaders.</p>
<h2>Understanding OpenGL’s Rendering Pipeline, Part Three</h2>
<p>To recap, our vertex shader transforms screen coordinates into clip coordinates. And our fragment shader receives all pixels that are touched by one of the vertices, and provides a color for each of them. To understand why we suddenly have ragged edges, we need to understand how OpenGL decides whether a vertex actually touches a pixel. To do so, it simply checks whether half of a pixel is covered by the vertex (more precisely, whether the center of the pixel lies inside the shape). If it is, the pixel is considered “completely covered,” and if it isn’t, the pixel is considered “completely outside” the vertex.</p>
<p>The reason this happens is that a texture only provides a single <em>sample</em> to check this. To add antialiasing, we actually have to introduce more samples. Adding more samples simply means that, instead of checking whether a vertex covers at least half of a pixel, we need to tell OpenGL to test each pixel <em>four times</em> (or more). It then checks first the top-right quadrant of a pixel, then the top-left quadrant, then bottom-right, then bottom-left. If a vertex covers two of these quadrants, the pixel is considered “half-covered,” and if it covers three, it is considered 75% covered. In that case, the fragment shader will receive the pixel and should calculate a color as if the pixel was <em>completely</em> covered by the vertex. However, instead of just using that color as-is, OpenGL will now add <em>another</em> check <em>after</em> our fragment shader that mixes the color with the background color to the degree that the pixel is <em>actually</em> covered by the vertex. In other words, OpenGL somehow needs a way to remember how much of the full color coming out of the fragment shader it should actually apply.</p>
<p><a href="https://learnopengl.com/Advanced-OpenGL/Anti-Aliasing">LearnOpenGL has a great visual introduction</a> into this sampling, which I recommend you read.</p>
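<p>To make the idea of multiple samples a bit more tangible, here is a small CPU-side sketch of it. This is only an illustration under a few assumptions and not how a GPU implements it: I place the four samples at the centers of the four quadrants of a pixel (real hardware uses its own sample patterns), and all names in it are made up for this example.</p>
<pre><code class="language-typescript">// CPU-side illustration of multisampling. All names are hypothetical.
interface Vec2 { x: number, y: number }
type Triangle = [Vec2, Vec2, Vec2]
type Rgb = [number, number, number]

// Assumed sample positions: the centers of the four quadrants of a pixel,
// given as offsets from the pixel's top-left corner.
const SAMPLE_OFFSETS: Vec2[] = [
  { x: 0.75, y: 0.25 }, // top-right
  { x: 0.25, y: 0.25 }, // top-left
  { x: 0.75, y: 0.75 }, // bottom-right
  { x: 0.25, y: 0.75 }  // bottom-left
]

// Edge function: its sign tells on which side of the edge a-b point p lies
function edge (a: Vec2, b: Vec2, p: Vec2): number {
  return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x)
}

// A point is inside if it lies on the same side of all three edges
function isInside (tri: Triangle, p: Vec2): boolean {
  const sides = [edge(tri[0], tri[1], p), edge(tri[1], tri[2], p), edge(tri[2], tri[0], p)]
  const allNonNegative = sides.every((side) => side >= 0)
  const allNonPositive = sides.every((side) => 0 >= side)
  return allNonNegative || allNonPositive
}

// Fraction of the samples of pixel (px, py) that the triangle covers
function coverage (tri: Triangle, px: number, py: number): number {
  let hits = 0
  for (const offset of SAMPLE_OFFSETS) {
    if (isInside(tri, { x: px + offset.x, y: py + offset.y })) {
      hits += 1
    }
  }
  return hits / SAMPLE_OFFSETS.length
}

// Mixes the full fragment color with the background according to coverage
function resolve (color: Rgb, background: Rgb, cov: number): Rgb {
  return [
    background[0] + (color[0] - background[0]) * cov,
    background[1] + (color[1] - background[1]) * cov,
    background[2] + (color[2] - background[2]) * cov
  ]
}

// A large triangle whose right edge runs vertically at x = 2.5
const tri: Triangle = [{ x: 2.5, y: -100 }, { x: 2.5, y: 100 }, { x: -100, y: 0 }]
const edgeCoverage = coverage(tri, 2, 0) // The edge cuts this pixel in half
const edgeColor = resolve([1, 0.5, 0], [0, 0, 0], edgeCoverage)
</code></pre>
<p>In this example, the pixel at <code>(2, 0)</code> has two of its four samples inside the triangle, so it ends up with half of the orange color mixed onto the black background. The pixels to its left are fully covered, and the ones to its right are not covered at all.</p>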
<p>Unfortunately, sampling doesn’t work with textures. For that, we will need yet another concept called a <em>render buffer</em>. This is the final puzzle piece to understanding OpenGL’s rendering pipeline.</p>
<p>A render buffer works <em>almost</em> like a texture, but not quite. In principle, a render buffer can be used as if it was a texture. However, it cannot be used as a <em>source</em> for a fragment shader. In order to use the contents of a render buffer as input to a fragment shader, we have to copy its contents into a regular texture and use that one instead. So, in short, you can let a fragment shader <em>write</em> to a render buffer, but you cannot let it <em>read</em> from it. To access the contents of a render buffer for any additional post-processing steps, you need to “blit” the contents of the render buffer onto a texture.</p>
<p>All of this may sound very abstract, so let us use a real-world example — the iris indicator.</p>
<h2>Implementing MSAA</h2>
<p>To implement MSAA manually, we first have to understand <em>when</em> using MSAA is actually useful. Remember that, for most of the rendering passes, we essentially just have a texture the size of our canvas and modify it progressively to apply blur and bloom and tone mapping. That means that we copy pixel information between two textures of the same size, merely adjusting each pixel’s color.</p>
<p>There is no way aliasing can happen here. The only place where aliasing can happen is if we convert from vector space to pixel space. And that only happens a single time: When we actually draw our rays.</p>
<p>So, to enable MSAA again, we have to convert our <code>scenetarget</code> to use render buffers. So let’s get back to the rendering engine and change the scene target:</p>
<pre><code class="language-typescript">this.scenetarget = {
  fb: gl.createFramebuffer(),
  scene: this.createTexture(),
  fbMSAA: gl.createFramebuffer(),
  rbMSAA: gl.createRenderbuffer()
}
</code></pre>
<p>As you can see, instead of <em>replacing</em> the existing <code>scene</code> texture with a render buffer, I have instead opted to add a second pair that consists of a frame buffer and a render buffer. This way, one can turn MSAA on and off at will. If MSAA is active, we use the newly created frame buffer/render buffer pair, but if it’s disabled, we simply continue to use our existing frame buffer/texture pair.</p>
<p>Next, we have to set up this new pair:</p>
<pre><code class="language-typescript">gl.bindFramebuffer(gl.FRAMEBUFFER, this.scenetarget.fbMSAA)
gl.bindRenderbuffer(gl.RENDERBUFFER, this.scenetarget.rbMSAA)
gl.renderbufferStorageMultisample(gl.RENDERBUFFER, gl.getParameter(gl.MAX_SAMPLES), internalFormat, cWidth, cHeight)
gl.framebufferRenderbuffer(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.RENDERBUFFER, this.scenetarget.rbMSAA)
</code></pre>
<p>As you can see, this code is almost exactly the same as the one to set up the texture. However, instead of binding a texture, we now call <code>bindRenderbuffer</code>. Furthermore, instead of allocating storage for a texture, we call <code>renderbufferStorageMultisample</code> to allocate storage for the render buffer. This tells OpenGL that we want to enable multi-sample antialiasing in this render buffer. What this does, technically, is only enable a setting that tells OpenGL: “Whenever you write to this render buffer, check each pixel multiple times whether a vertex touches it.” The last line is again the same as attaching a texture to the frame buffer, except that we attach a render buffer.</p>
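<p>If you want to be a bit more defensive, the same setup could be wrapped in a helper that limits the number of samples and verifies that the frame buffer is usable afterwards. The following is a minimal sketch: the helper <code>setupMsaaTarget</code> and the interface <code>MsaaContext</code> are hypothetical and only list the parts of the WebGL2 context that the sketch needs, so that a real <code>WebGL2RenderingContext</code> should fit in there.</p>
<pre><code class="language-typescript">// The subset of the WebGL2 context that this sketch relies on (hypothetical
// interface; the constants and methods themselves are part of WebGL2).
interface MsaaContext {
  readonly FRAMEBUFFER: number
  readonly RENDERBUFFER: number
  readonly COLOR_ATTACHMENT0: number
  readonly MAX_SAMPLES: number
  readonly FRAMEBUFFER_COMPLETE: number
  getParameter (pname: number): any
  bindFramebuffer (target: number, fb: unknown): void
  bindRenderbuffer (target: number, rb: unknown): void
  renderbufferStorageMultisample (target: number, samples: number, format: number, w: number, h: number): void
  framebufferRenderbuffer (target: number, attachment: number, rbTarget: number, rb: unknown): void
  checkFramebufferStatus (target: number): number
}

// Sets up the multisampled render target and returns the number of samples
// that were actually requested from the driver.
function setupMsaaTarget (
  gl: MsaaContext,
  fb: unknown,
  rb: unknown,
  internalFormat: number,
  width: number,
  height: number,
  requestedSamples: number
): number {
  // Never ask for more samples than the device supports
  const maxSamples = gl.getParameter(gl.MAX_SAMPLES) as number
  const samples = Math.min(requestedSamples, maxSamples)

  gl.bindFramebuffer(gl.FRAMEBUFFER, fb)
  gl.bindRenderbuffer(gl.RENDERBUFFER, rb)
  gl.renderbufferStorageMultisample(gl.RENDERBUFFER, samples, internalFormat, width, height)
  gl.framebufferRenderbuffer(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.RENDERBUFFER, rb)

  // Fail loudly if this combination of attachments cannot be rendered to
  if (gl.checkFramebufferStatus(gl.FRAMEBUFFER) !== gl.FRAMEBUFFER_COMPLETE) {
    throw new Error('The MSAA frame buffer is incomplete')
  }

  return samples
}
</code></pre>
<p>In the rendering engine, this could be called with <code>this.scenetarget.fbMSAA</code> and <code>this.scenetarget.rbMSAA</code>, and, for example, <code>4</code> as the requested number of samples. Using fewer samples than the maximum can be a reasonable trade-off, since every additional sample costs memory.</p>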
<p>Using this new render buffer is as simple as changing which frame buffer we draw our rays onto:</p>
<pre><code class="language-typescript">if (this.msaaEnabled) {
  this.setFramebuffer(this.scenetarget.fbMSAA, cWidth, cHeight)
} else {
  this.setFramebuffer(this.scenetarget.fb, cWidth, cHeight)
}
</code></pre>
<p>However, the crux now lies in how to get the information back <em>out</em> of this render buffer. Since we probably want to do some post-processing, we need to convert this render buffer into a regular texture. Otherwise, we won’t be able to do anything with this data. Fortunately, this next step is straightforward:</p>
<pre><code class="language-typescript">if (this.msaaEnabled) {
  gl.bindFramebuffer(gl.READ_FRAMEBUFFER, this.scenetarget.fbMSAA)
  gl.bindFramebuffer(gl.DRAW_FRAMEBUFFER, this.scenetarget.fb)

  gl.blitFramebuffer(0, 0, cWidth, cHeight, 0, 0, cWidth, cHeight, gl.COLOR_BUFFER_BIT, gl.LINEAR)

  gl.bindFramebuffer(gl.READ_FRAMEBUFFER, null)
  gl.bindFramebuffer(gl.DRAW_FRAMEBUFFER, null)
}
</code></pre>
<p>It turns out you can explicitly specify whether you want to <em>read</em> from or <em>draw</em> to frame buffers when you bind them. What this code essentially does is tell OpenGL that we want to read data out of the render buffer, and into our regular texture (that is attached to the “normal” frame buffer). Then, we call <code>gl.blitFramebuffer</code>, and this is where the magic happens.</p>
<p>This function realizes that the source of the operation is not a texture but a multi-sampled render buffer. So it will look at the multiple samples to determine <em>how much</em> a pixel on the edge of an object should be colored. For example, if you have a pixel where three out of four samples are set to “Yes, this is touched by the vertex,” it will use the color at 75% intensity, and write that information into the target texture. The <code>gl.LINEAR</code> argument specifies the filter to use if the image has to be stretched. Since our source and target rectangles are identical, it should not affect the result here.</p>
<p>Lastly, we unbind the frame buffers for good measure to prevent OpenGL from even thinking about making funny noises.</p>
<p>The beauty of this approach is that all the remaining code can remain unchanged, because regardless of whether MSAA is on or off, the <code>scenetarget.scene</code>-texture at this point will contain the rendered iris. And since no more aliasing can happen after this point (because we’re transferring equal pixels), this concludes adding MSAA to our little project.</p>
<h2>Why did MSAA work in the first place…?</h2>
<p>One final question you may now have is: Why did antialiasing work to begin with, but stop working when we started drawing to textures instead of the canvas? Here’s a revelation: We never really talked about the canvas other than that it really just is another frame buffer. However, if it’s just another frame buffer, there is nothing that prevents us from attaching a render buffer to it, instead of a texture. It turns out that there is a setting, called <code>antialias</code>, that is true by default:</p>
<pre><code class="language-typescript">const gl = canvas.getContext(&quot;webgl2&quot;, { antialias: true })
</code></pre>
<p>If we activate this setting, this tells the browser that we want the canvas to use a render buffer. If we disable this setting, we tell the browser that we want the canvas to use a regular texture. Furthermore, if we write from a texture that is the same size as the canvas onto the canvas, then regardless of this setting, no antialiasing can happen. The reason is that, again, we are transferring pixels, so regardless of how many samples you use, <em>all</em> pixels will be considered “fully covered” because there is no vertex information available.</p>
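<p>One caveat: The <code>antialias</code> setting is only a request, and the browser is free to ignore it, for example if the hardware doesn’t support it. The context reports what it actually uses via <code>getContextAttributes()</code>. Here is a minimal sketch of how one could check this. The helper <code>canvasHasAntialiasing</code> and the interface <code>AttributeSource</code> are hypothetical and only describe the one method of the context that is needed.</p>
<pre><code class="language-typescript">// The one method of the WebGL context that this sketch relies on
// (hypothetical interface; getContextAttributes itself is part of WebGL).
interface AttributeSource {
  getContextAttributes (): { antialias?: boolean } | null
}

// Returns true only if the browser actually granted an antialiased canvas
function canvasHasAntialiasing (gl: AttributeSource): boolean {
  const attributes = gl.getContextAttributes()
  if (attributes === null) {
    return false // The context has been lost, so there is nothing to report
  }
  return attributes.antialias === true
}
</code></pre>
<p>In a browser, calling <code>canvasHasAntialiasing(gl)</code> with the context from above tells you whether drawing directly onto the canvas will give you smooth edges.</p>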
<p>Finally, you may ask how the browser then uses this render buffer. Here are the last two concepts to understand: In 3D rendering, you usually have two buffers: A front buffer and a back buffer. Whenever you draw something onto the screen, you aren’t actually drawing anything onto a screen, but in reality you draw onto a “hidden” back buffer. The contents of this back buffer are what the browser will use to <em>actually</em> draw onto the canvas (the front buffer).</p>
<p>The reason you can’t draw <em>directly</em> onto the canvas is that the browser needs to compose your 3D rendering with the rest of the website, which can include elements or colors <em>behind</em> the canvas. In addition, drawing onto a back buffer means that, if your browser must re-draw the entire website before your code is actually done rendering, it can just re-use the last frame. This avoids seeing half-drawn frames, because only “finished” buffers will be drawn on screen.</p>
<p>This is where the browser can actually perform the MSAA: Because you only write to a render buffer, the browser can check whether your code is done writing, and then the browser simply performs such a “blit” behind the scenes to transfer the written information onto the front buffer for you.</p>
<p>For very simple applications, this is great because you don’t have to manually do any MSAA and can just render things onto the canvas directly without having to worry. But, as you have seen: as soon as you need to do any post-processing, you need to use textures. And, as soon as the vertex-information has been rasterized, there is no way to undo any type of antialiasing. So, as a rule of thumb, whenever you convert vertex information into pixel data, you probably want to use a render buffer.</p>
<h2>Final Thoughts</h2>
<p>This concludes this … rather lengthy exploration into the realm of WebGL. When I sat down to write a few lines of code to make some triangles dance, I would’ve never thought that it would take me <em>so long</em> just to get to any barely acceptable state, and so much longer to get to a very visually pleasing state.</p>
<p>It was a crazy difficult project, and I am very happy that I won’t have to touch much WebGL code anymore in the near or mid-future.</p>
<p>What started as an “An Iris is a simple object, how hard can it be?” turned out to be an entire odyssey into obscure parts of programming. It really helped me to write everything down and tell you about my journey, because I really feel a great relief now.</p>
<p>In the end, I produced 1,685 lines of JavaScript code; 270 lines of GLSL code, and 344 lines of HTML — all just to render a bunch of triangles and make them shine (and give you a way to play with the settings). It’s mind-boggling to think about how much work there is to render things on a computer. And it took me just over 15,000 words to tell you about this journey.</p>
<p>This entire project gave me a whole new appreciation for the 3D artists that produce the movie effects we have come to enjoy; the game developers who allow us to play photorealistic games; and everyone who has to implement all of this nitty-gritty in such an optimized way that we rarely see any stuttering in animated graphics.</p>
<p>But, as for me: I am happy to have learned more about the fundamentals behind my own research; how LLMs calculate their weights; how we turn text into machines; and how mind-bogglingly complex juggling a bunch of numbers can become.</p>
<p>That being said, it feels as if I have freed myself from a curse. Now that I have written down these lines, I am extremely happy to be able to go back to just doing what I enjoy for much longer than two intensive weeks: sociology, and democracy.</p>
<p>I hope you enjoyed this rabbit hole! As always, if you have any questions, ping me on social media. Au revoir!</p>
<h2>The Full WebGL Series</h2>
<p>Jump directly to an article that piques your interest.</p>
<ol>
<li><a href="https://www.hendrik-erz.de/post/a-rabbit-hole-called-webgl">A Rabbit Hole Called WebGL</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-2-setup-and-the-opengl-rendering-pipeline">Setup and the OpenGL Rendering Pipeline</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-3-drawing-things">Drawing Things</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-4-animating-things">Animating Things</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-5-computing-colors">Computing Colors</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-6-post-processing">Post-Processing</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-7-adding-a-bloom-filter">Adding a Bloom-Filter</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-8-implementing-multi-sample-antialiasing-msaa">Implementing Multi-Sample Antialiasing (MSAA)</a></li>
</ol>]]>
  </content>
</entry>
<entry>
  <title>WebGL Series, Part 7: Adding a Bloom-Filter</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2026, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/webgl-series-part-7-adding-a-bloom-filter" />
  <id>https://www.hendrik-erz.de/post/webgl-series-part-7-adding-a-bloom-filter</id>
  <published>2026-02-20T11:00:00+00:00</published>
  <updated>2026-02-28T19:02:06+00:00</updated>
  <summary type="html"><![CDATA[In the second-to-last installment of my series on WebGL, I explain how a Bloom filter works and how I added it into the processing-pipeline of the iris indicator.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/webgl-series-part-7-adding-a-bloom-filter">
    <![CDATA[<p>Last week was all about preparing the code to run post-processing, but aside from some simple tone-mapping, we haven’t done anything that actually requires this elaborate setup. Today’s article changes that. <a href="https://www.hendrik-erz.de/post/webgl-series-part-6-post-processing">To read up on last week’s article, click here</a>.</p>
<p><a href="https://nathanlesage.github.io/iris-indicator/">View the demo-page</a></p>
<h2>Extracting Luminance</h2>
<p>After adjusting the code last week, you should still see the rendered texture, but behind the scenes we write it to a “hidden” frame buffer, and only then copy this information to the canvas. In between these two steps, we can now add our post-processing. So let’s write a simple bloom filter. Remember that bloom consists of extracting a brightness map, blurring it, and combining the result with the original image. Let’s first write a way for the fragment shader to extract brightness information:</p>
<pre><code class="language-glsl">if (v_pass == FRAGMENT_PASS_BRIGHTNESS) {
  fragColor = texture(u_texture, v_texcoord);
  float l = luminance(fragColor.rgb);
  fragColor = l &gt; 1.0 ? fragColor : vec4(0.0, 0.0, 0.0, 0.0);
}
</code></pre>
<p>The luminance function is simple:</p>
<pre><code class="language-glsl">float luminance (vec3 c) {
  return dot(c, vec3(0.2126, 0.7152, 0.0722));
}
</code></pre>
<p>This just computes a luminance value from the red, green, and blue values of a color. These magic numbers are floating around the internet in many places, sometimes with small variations. I unfortunately forgot where I got these particular ones from, but they match the relative-luminance coefficients of the Rec. 709 standard, which weight green most heavily because the human eye is most sensitive to it.</p>
<p>But this is now finally the explanation why we use HDR colors and textures that exceed a brightness of 1.0. If we used regular colors, the luminance would be harder to calculate, and it would be more difficult to extract brightness information. By using bright colors, we can just check if the luminance is $&gt;1.0$, and not worry if we may accidentally extract colors that aren’t supposed to shine.</p>
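<p>To make the threshold logic tangible, here is a minimal CPU-side sketch of the brightness pass in TypeScript. It is not part of the engine; the names <code>RGBA</code>, <code>extractBright</code>, <code>regularRed</code>, and <code>hdrRed</code> are made up for illustration, and the HDR factor of 10 is only an assumed example value.</p>

```typescript
// CPU-side sketch of the brightness pass; all names here are illustrative only.
type RGBA = [number, number, number, number]

// The same three constants as in the GLSL luminance() function.
const LUMA_R = 0.2126
const LUMA_G = 0.7152
const LUMA_B = 0.0722

function luminance (c: RGBA): number {
  return c[0] * LUMA_R + c[1] * LUMA_G + c[2] * LUMA_B
}

// Keep the pixel only if it is brighter than the regular color range allows.
function extractBright (c: RGBA): RGBA {
  return luminance(c) > 1.0 ? c : [0, 0, 0, 0]
}

const regularRed: RGBA = [1.0, 0.0, 0.0, 1.0] // luminance 0.2126, gets dropped
const hdrRed: RGBA = [10.0, 0.0, 0.0, 1.0]    // luminance 2.126, is kept

console.log(extractBright(regularRed), extractBright(hdrRed))
```

<p>A fully saturated regular color can never exceed a luminance of 1.0, because the three weights add up to 1. Only colors that were deliberately scaled into the HDR range pass the check.</p>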
<h2>Applying Gaussian Blur</h2>
<p>Once the brightness information is there, we also need a way to apply blur to this. I have essentially just copied the function from LearnOpenGL, because it worked very nicely. <a href="https://learnopengl.com/Advanced-Lighting/Bloom">Read their explainer</a> for how the blur filter works. What I did change are the blur weights: I added some more and modified some numbers because I felt it looked nicer. Also, this blur function works with alpha values (stay tuned for that!).</p>
<pre><code class="language-glsl">uniform bool u_blur_horizontal;
float blur_weight[7] = float[7] (0.227027, 0.1945946, 0.1216216, 0.054054, 0.016216, 0.007, 0.002);
const int repeats = 7;
vec4 blur () {
  vec2 texel = vec2(1.0, 1.0) / vec2(textureSize(u_texture, 0));
  vec4 result = texture(u_texture, v_texcoord) * blur_weight[0];
  if (u_blur_horizontal) {
    for (int i = 1; i &lt; repeats; i++) {
      result += texture(u_texture, v_texcoord + texel * vec2(i, 0.0)) * blur_weight[i];
      result += texture(u_texture, v_texcoord - texel * vec2(i, 0.0)) * blur_weight[i];
    }
  } else {
    for (int i = 1; i &lt; repeats; i++) {
      result += texture(u_texture, v_texcoord + texel * vec2(0.0, i)) * blur_weight[i];
      result += texture(u_texture, v_texcoord - texel * vec2(0.0, i)) * blur_weight[i];
    }
  }

  return result;
}
</code></pre>
<p>The <code>u_blur_horizontal</code> is another uniform we need (see the explainer I linked above), but I’m not going to repeat the same information for how to set this here.</p>
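<p>If you tweak the weights yourself, it can help to check what they add up to. The following sketch (plain TypeScript, outside the shader; <code>kernelSum</code> and <code>normalizeKernel</code> are hypothetical helper names) sums the kernel the way the shader samples it: the center tap once, every other tap twice. With the weights printed above, the sum lands slightly above 1, so every blur pass brightens the image a tiny bit. That may be exactly the look you want; the normalization is only shown in case it is not.</p>

```typescript
// The one-sided weights from the shader: the center tap plus six offsets.
const blurWeight = [0.227027, 0.1945946, 0.1216216, 0.054054, 0.016216, 0.007, 0.002]

// The center tap is sampled once, every other tap twice (left/right or up/down).
function kernelSum (weights: number[]): number {
  let sum = 0
  weights.forEach(function (w, i) {
    sum += i === 0 ? w : 2 * w
  })
  return sum
}

// Rescale the weights so that the full kernel sums to 1.
function normalizeKernel (weights: number[]): number[] {
  const sum = kernelSum(weights)
  return weights.map(function (w) { return w / sum })
}

console.log(kernelSum(blurWeight).toFixed(4))                  // 1.0180
console.log(kernelSum(normalizeKernel(blurWeight)).toFixed(4)) // 1.0000
```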
<p>Allow the fragment shader to perform the blur by adding another condition:</p>
<pre><code class="language-glsl">if (v_pass == FRAGMENT_PASS_BLUR) {
  fragColor = blur();
}
</code></pre>
<p>And finally, for compositing the blurred image with the original image, another conditional:</p>
<pre><code class="language-glsl">if (v_pass == FRAGMENT_PASS_COMPOSITE) {
  vec4 originalColor = texture(u_texture, v_texcoord);
  vec4 blurColor = texture(u_blurTexture, v_texcoord);
  fragColor = originalColor + blurColor;
}
</code></pre>
<h2>Adjusting the Rendering Code</h2>
<p>With the shaders at hand, we can add a bloom pass function. Here, we pass the rendered rays as well as the number of bloom passes we want (this is the bloom intensity setting). The bloom function is relatively complex, and it makes use of the “ping-pong” buffers. We create the ping-pong buffers like the <code>scenetarget</code> – see the code to verify that it’s essentially the same setup code.</p>
<pre><code class="language-typescript">private bloomPass (sourceTexture: WebGLTexture, nPasses = 32): WebGLTexture {
    const gl = this.gl
    const { cWidth, cHeight } = this.textureSize()

    this.setFramebufferRectangle(cWidth, cHeight)

    // Brightness pass: read the scene, write the bright parts into ping-pong 1.
    this.setFramebuffer(this.pingpong[1]!.fb, cWidth, cHeight)
    gl.bindTexture(gl.TEXTURE_2D, sourceTexture)
    gl.uniform1f(this.passUniformLocation, FRAGMENT_PASS_BRIGHTNESS)
    gl.drawArrays(gl.TRIANGLES, 0, 6)
    gl.bindTexture(gl.TEXTURE_2D, this.pingpong[1]!.rbuf)

    // Blur passes: alternate between both buffers and both blur directions.
    gl.uniform1f(this.passUniformLocation, FRAGMENT_PASS_BLUR)

    for (let pass = 0; pass &lt; nPasses * 2; pass++) {
      gl.uniform1i(this.blurHorizontalUniformLocation, pass % 2)
      this.setFramebuffer(this.pingpong[pass % 2]!.fb, cWidth, cHeight)
      gl.drawArrays(gl.TRIANGLES, 0, 6)
      gl.bindTexture(gl.TEXTURE_2D, this.pingpong[pass % 2]!.rbuf)
    }

    // The loop runs an even number of passes, so the last pass wrote to ping-pong 1.
    const lastActiveTexture = 1
    // Composite into the other buffer: WebGL rejects draws that sample a texture
    // which is attached to the frame buffer currently being written.
    const compositeTarget = 1 - lastActiveTexture
    this.setFramebuffer(this.pingpong[compositeTarget]!.fb, cWidth, cHeight)

    gl.uniform1f(this.passUniformLocation, FRAGMENT_PASS_COMPOSITE)

    // Slot 0: original scene; slot 1: blurred highlights (u_blurTexture).
    gl.bindTexture(gl.TEXTURE_2D, sourceTexture)
    gl.activeTexture(gl.TEXTURE0 + 1)
    gl.bindTexture(gl.TEXTURE_2D, this.pingpong[lastActiveTexture]!.rbuf)

    gl.drawArrays(gl.TRIANGLES, 0, 6)

    // Reset both texture slots and the frame buffer binding.
    gl.bindTexture(gl.TEXTURE_2D, null)
    gl.activeTexture(gl.TEXTURE0)
    gl.bindTexture(gl.TEXTURE_2D, null)
    gl.bindFramebuffer(gl.FRAMEBUFFER, null)

    return this.pingpong[compositeTarget]!.rbuf
}
</code></pre>
<p>Let’s unpack this function. First, I ensure that our full-screen rectangle is in the buffer. Then, I tell the shaders to run a brightness pass. I bind, as a source, the original rendered image (<code>scenetarget.scene</code>), write that into the <em>second</em> ping-pong frame buffer, and let the shaders do their thing by drawing the two triangles.</p>
<p>Then, I immediately bind the ping-pong buffer’s texture, which now contains the brightness information. I then switch the shaders to perform blur-passes instead, and enter a loop that progressively applies more and more and more blur to the image. This is why I wrote the brightness information into the <em>second</em> ping-pong: Because the loop can then just start at 0, write the first blur result in the first ping-pong buffer and then simply re-use the second one to do the second blur pass.</p>
<p>We always do 2 passes, one of which applies horizontal blur, and one of which applies vertical blur. Whenever I call <code>drawArrays</code>, this will take the blurred image and apply even more blur to it.</p>
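<p>The back-and-forth of the blur loop is easier to follow when written out as a table. This small sketch (plain TypeScript without any WebGL calls; <code>BlurStep</code> and <code>pingPongSchedule</code> are hypothetical names) lists, for each pass, which ping-pong buffer is read, which one is written, and which blur direction is active. It assumes, as in the function above, that the brightness map starts out in the second buffer (index 1).</p>

```typescript
// Which buffer a blur pass reads from and writes to. Illustrative only.
interface BlurStep { pass: number, source: number, target: number, horizontal: boolean }

function pingPongSchedule (nPasses: number): BlurStep[] {
  // Two draw calls per bloom pass: one per blur direction.
  return Array.from({ length: nPasses * 2 }, function (_unused, pass) {
    const target = pass % 2
    // The source is always the buffer that was written in the previous step;
    // for pass 0 that is buffer 1, which holds the brightness map.
    return { pass, source: 1 - target, target, horizontal: target === 1 }
  })
}

console.table(pingPongSchedule(2))
```

<p>Because the number of draw calls is always even, the last step writes to buffer 1, no matter how many bloom passes you request.</p>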
<p>Finally, a composite pass. The final image is now in the second ping-pong buffer, so I bind its texture. Now we actually need two textures, so we have to <em>switch</em> the texture slot we are working with to the second one. We bind the source image to the first texture slot and the blurred image to the second texture slot. Then, we tell our fragment shader to run a composite pass (which only adds the colors from the two images), and make sure to reset the state accordingly. Finally, I return the texture. This allows me to conditionally enable or disable the blooming:</p>
<pre><code class="language-typescript">let outputTexture = this.bloomEnabled
  ? this.bloomPass(this.scenetarget.scene, this.nBloomPasses)
  : this.scenetarget.scene
</code></pre>
<p>If bloom is disabled, it will simply use the unmodified scene as a source to draw onto the canvas. But if you enable bloom, and pass that output texture to your final draw-to-canvas pass, it should show you a bloom effect. Hooray!</p>
<h2>Moving the Tone-Mapping into its own Shader Pass</h2>
<p>One additional thing we now want to do is move the tone-mapping around. Until now, we just applied this to the color, but the issue is that we want to retain the HDR colors for as long as possible so that the bloom really pops.</p>
<p>So we should move the tone mapping into its own fragment shader pass and make sure to only adjust the colors shortly before drawing to the canvas. The fragment shader change is minimal and at this point self-explanatory:</p>
<pre><code class="language-glsl">if (v_pass == FRAGMENT_PASS_TONEMAP) {
  vec4 result_color = texture(u_texture, v_texcoord);
  fragColor = vec4(tonemap(result_color.rgb), result_color.a);
}
</code></pre>
<p>In our WebGL engine, we then only have to run the <code>outputTexture</code> from the bloom pass through the tonemapping once:</p>
<pre><code class="language-typescript">gl.bindTexture(gl.TEXTURE_2D, outputTexture)
this.setFramebuffer(this.pingpong[1].fb, cWidth, cHeight)
gl.uniform1f(this.passUniformLocation, FRAGMENT_PASS_TONEMAP)
gl.drawArrays(gl.TRIANGLES, 0, 6)
outputTexture = this.pingpong[1].rbuf
gl.bindTexture(gl.TEXTURE_2D, null)
</code></pre>
<p>As you can see, I bind the output texture, but instead of writing to the canvas, I use one of the two ping-pong frame buffers as a target. Then, I tell the fragment shader that it should perform tone-mapping, and commence the actual shader run by drawing the two triangles once more. Then, I overwrite the <code>outputTexture</code> to use the now-tone-mapped one from the ping-pong.</p>
<p>If you run this version of the code, the colors will pop and the bloom should look adequate. This means we’re <em>almost</em> done!</p>
<h2>Final Thoughts</h2>
<p>The animation should now look almost like the demo page.</p>
<p>But there’s just one issue: Depending on your display, you may have noticed that the edges of the rays somehow look very ragged, especially if you disable the bloom effect. Why is that, and how can we fix this?</p>
<p>This is something you might’ve noticed in the last article as well, and which I didn’t address in this one. So stay tuned for the final article in this series that completes the animation by implementing antialiasing!</p>
<h2>The Full WebGL Series</h2>
<p>Jump directly to an article that piques your interest.</p>
<ol>
<li><a href="https://www.hendrik-erz.de/post/a-rabbit-hole-called-webgl">A Rabbit Hole Called WebGL</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-2-setup-and-the-opengl-rendering-pipeline">Setup and the OpenGL Rendering Pipeline</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-3-drawing-things">Drawing Things</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-4-animating-things">Animating Things</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-5-computing-colors">Computing Colors</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-6-post-processing">Post-Processing</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-7-adding-a-bloom-filter">Adding a Bloom-Filter</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-8-implementing-multi-sample-antialiasing-msaa">Implementing Multi-Sample Antialiasing (MSAA)</a></li>
</ol>]]>
  </content>
</entry>
<entry>
  <title>WebGL Series, Part 6: Post-Processing</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2026, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/webgl-series-part-6-post-processing" />
  <id>https://www.hendrik-erz.de/post/webgl-series-part-6-post-processing</id>
  <published>2026-02-13T11:00:00+00:00</published>
  <updated>2026-02-28T19:02:01+00:00</updated>
  <summary type="html"><![CDATA[In part six of this series on WebGL, I introduce the concepts behind post-processing a rendered image, and how to implement that in a WebGL program.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/webgl-series-part-6-post-processing">
    <![CDATA[<p>This week, I’ll be walking you through post-processing. I’ll start with adding a little bit of post-processing immediately, so that you can see what it does, and then re-organize the code quite a bit to enable us to run arbitrary post-processing stages. <a href="https://www.hendrik-erz.de/post/webgl-series-part-5-computing-colors">Check out last week’s article for a refresher</a>.</p>
<p><a href="https://nathanlesage.github.io/iris-indicator/">View the demo-page</a></p>
<h2>Adding Tone Mapping</h2>
<p>Before we actually get into the guts of post-processing, let us first do a post-processing stage. Last time, I introduced HDR colors that you could control using an <code>hdrFactor</code> of <code>10.0</code>, and I mentioned that you may want to set this factor to <code>1</code> for the time being. The tone mapping is intended to ensure that you can use HDR colors, and still see everything.</p>
<p>If you haven’t changed the HDR factor, you may have noticed that the colors of the circle are very bright, maybe almost white (depending on your display). This is not nice. But this is due to us using HDR colors that must be brought back into the regular color range of $[0; 1]$. If we leave the colors as they are, OpenGL will simply clamp every color channel to the range $[0; 1]$, resulting in a lot of white.</p>
<p>To convert the bright colors back ourselves (and as such prevent any harsh artefacts), we employ tone mapping. I essentially follow the guidance of LearnOpenGL here and adjust the exposure and gamma down to manageable levels.</p>
<p>Since tone mapping involves changing the <em>colors</em>, we will have to add this to the fragment shader. I decided to write a simple function for this:</p>
<pre><code class="language-glsl">vec3 tonemap (vec3 color) {
  const float exposure = 0.5;
  color = vec3(1.0) - exp(-color * exposure);
  const float gamma = 0.8;
  color = pow(color, vec3(1.0 / gamma));
  return color;
}
</code></pre>
<p>While I have seen “gamma correction” several times in some video game settings before, I never knew what exactly it did. It is kind of interesting to see all of these little formulas in action that do something to the colors of our displays! You can play around a bit with both exposure and gamma to see what it does. However, all that it should do at this point is restore the original, vibrant colors without over-exposing them or washing them out to white.</p>
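<p>If you want to get a feeling for the numbers without touching the shader, you can run the same formula on the CPU. This is a rough TypeScript port for a single color channel; <code>tonemapChannel</code> is a made-up name, and the defaults simply mirror the constants from the shader function above.</p>

```typescript
// CPU port of the tonemap() shader function, applied to one color channel.
function tonemapChannel (value: number, exposure = 0.5, gamma = 0.8): number {
  // Exposure mapping: compresses any non-negative value into the range [0, 1).
  const mapped = 1.0 - Math.exp(-value * exposure)
  // Gamma adjustment, the same operation as pow(color, vec3(1.0 / gamma)).
  return Math.pow(mapped, 1.0 / gamma)
}

// Print a few inputs, from black up to a bright HDR value.
for (const v of [0.0, 0.5, 1.0, 10.0]) {
  console.log(v, tonemapChannel(v).toFixed(3))
}
```

<p>However bright the input gets, the output approaches 1 without ever exceeding it. This is why tone-mapped HDR colors keep their hue instead of being cut off at white.</p>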
<p>This is already quite nice, and was a simple step to take. But there are a few additional post-processing steps that I want to add. And for this, we will have to rewrite the shader code quite a bit, and understand frame buffers. This is the second part of the OpenGL rendering pipeline that I announced earlier.</p>
<h2>Understanding OpenGL’s Rendering Pipeline, Part Two</h2>
<p>Rendering in OpenGL always follows some well-defined steps: First, you set up any data your shaders need. This is both vertex buffers for the actual geometry, and some additional values. Our little iris indicator thus far receives a transformation matrix, the triangle data, and colors and ratios for segments. Once this is set up, we have to tell OpenGL that we want to draw something to the canvas by setting the frame buffer to null. Finally, we call <code>gl.drawArrays</code> to actually draw the geometry onto the canvas by running the vertices through the vertex shader, and assigning each pixel a color in the fragment shader.</p>
<p>However, one part I was consistently skipping over was that you aren’t required to directly write to a canvas. And, moreover, the canvas is indeed just a regular frame buffer. So what if we just create another frame buffer to write to? This is how we can add post-processing effects. To understand this a little better, let’s focus on frame buffers first.</p>
<p>A frame buffer in OpenGL is nothing but a data structure that you can write to. You tell OpenGL where you want your shaders to write data to by calling <code>gl.bindFramebuffer</code>. If you bind the frame buffer <code>null</code>, you effectively tell OpenGL to use the frame buffer that is the canvas. But you can also bind your own frame buffers. However, in order for OpenGL to write <em>to</em> something, we also have to attach either a texture or what is called a render buffer to it.</p>
<p>You see, a frame buffer is less an actual “buffer” (as one might understand it in terms of data storage), and more like a container for other buffers that you <em>then</em> actually write to. The basic idea behind frame buffers is that they allow you to organize your rendering stages. For example, you may have one frame buffer to write your actual geometry to, then you may have a frame buffer that you use in the post-processing stage. And only once you’re done with the post-processing steps, <em>then</em> you finally select the canvas to draw to and effectively transfer all the processed pixels onto the screen for the user to see.</p>
<p>There are three additional concepts to understand here. First, how you actually perform post-processing using textures; second the concept of performing post-processing using what is called a “ping-pong” setup; and lastly reading <em>from</em> and writing <em>to</em> multiple sources/targets at the same time.</p>
<p>First, how does post-processing actually work, given the tools we have? Well, I found the solution hilarious, but at the same time very smart. So, the rendering consists of taking some geometry and drawing it onto a frame buffer. Then we have just basic geometry that doesn’t look very nice. But what we also have is an entire width × height <em>picture</em>. And we can essentially take that entire picture and run post-processing on it. When we then draw this picture to the canvas, we see a processed image. So here we are at the stage of, quite literally, doing the browser’s work of displaying a simple image, but one that we had to generate before.</p>
<p>In order to draw your geometry onto a picture, you will need to use a texture. Textures are the way to keep large chunks of image data on your GPU and run shaders on them. If you write to a texture instead of the canvas, you can transform this texture, and then you simply have to paint the texture onto the canvas. How do you do that? Well, a texture must be attached to some geometry, but we already drew our geometry. The solution is to draw a rectangle that is the same size as the canvas, and tell it to use our texture.</p>
<p>That’s it. That’s the entire magic trick. Effectively, to post-process some image, we first draw our <em>actual</em> geometry onto a texture that is the same size as our canvas. Then, we just have to draw a rectangle that is also the same size as our canvas, using the texture as a source for the fragment shader and transform the colors in the fragment shader according to what we want. And we can repeat this step ad infinitum, if we so wish, constantly drawing a texture the size of our canvas onto a rectangle the size of our canvas, progressively adjusting the colors of that texture. That brings us to the second concept.</p>
<p>To do post-processing you <em>could</em>, in principle, generate as many frame buffers as you have steps for post-processing. But that can become cumbersome, and OpenGL does allow us to re-use a lot of our code. So why not re-use frame buffers, too?</p>
<p>This is where the “ping-pong” method comes into play. For this, you need two frame buffers and two textures, one for each frame buffer. Then, you load your source image for the fragment shader to use; bind the first ping-pong frame buffer, and draw our canvas-sized rectangle, telling the fragment shader to use the source image; and transform the colors on it to draw on the frame buffer. Then, we have the result of this step in the texture of the first ping-pong frame buffer. Now we load that resulting texture as the source for the fragment shader, bind the <em>other</em> frame buffer to write to, and run the fragment shader <em>again</em>. We can do so for as long as we want, and all we need are two frame buffers that we switch back and forth between (hence the name). After the post-processing is done, the texture that is associated with the last ping-pong buffer contains the result.</p>
<p>Finally, one last concept to understand is that you can actually read and write from and to multiple sources and targets at the same time. For example, when you call <code>gl.bindTexture</code>, you tell OpenGL which texture you want your fragment shader to <em>read</em>. And, whatever texture is associated with your target <em>frame buffer</em> is what the fragment shader will write its <code>fragColor</code> to. But we can also pass multiple textures, e.g., to combine two pictures. We do so by calling <code>gl.activeTexture</code> before <code>gl.bindTexture</code> to select one of the available texture slots. WebGL 2 guarantees at least 16 such slots for the fragment shader; you can query the actual limit of your device with <code>gl.getParameter(gl.MAX_TEXTURE_IMAGE_UNITS)</code>.</p>
<p>To tell your shader to use multiple textures as sources, you just specify all textures using variables (<code>uniform sampler2D u_texture;</code>). In your drawing code, you then only have to use <code>gl.activeTexture</code> to select one of the slots, <code>gl.bindTexture</code> to provide the data, and then tell your fragment shader where the correct texture is by setting the uniform <code>u_texture</code> to the correct number slot (i.e., <code>1</code> if you want the second texture).</p>
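<p>The three calls from the previous paragraph can be bundled into a small helper. The following is only a sketch: <code>bindTextureToUnit</code> and the <code>TextureUnitContext</code> interface are names I made up for illustration. The interface lists just the members of the WebGL context that the helper touches, so that the snippet stays self-contained; in a real engine you would pass your <code>WebGL2RenderingContext</code> instead.</p>

```typescript
// Minimal slice of the WebGL context that this sketch relies on.
interface TextureUnitContext {
  TEXTURE0: number
  TEXTURE_2D: number
  activeTexture (unit: number): void
  bindTexture (target: number, texture: unknown): void
  uniform1i (location: unknown, value: number): void
}

// Hypothetical helper: bind `texture` to slot `unit` and point a sampler at it.
// Assumes that the shader program owning `samplerLocation` is currently in use.
function bindTextureToUnit (
  gl: TextureUnitContext, unit: number, texture: unknown, samplerLocation: unknown
): void {
  gl.activeTexture(gl.TEXTURE0 + unit)   // select the slot
  gl.bindTexture(gl.TEXTURE_2D, texture) // put the texture into that slot
  gl.uniform1i(samplerLocation, unit)    // tell the sampler which slot to read
}
```

<p>Note that the uniform receives the plain slot number (<code>0</code>, <code>1</code>, …), whereas <code>gl.activeTexture</code> expects the constant <code>gl.TEXTURE0</code> plus that number.</p>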
<p>Likewise, you can also specify multiple <em>outputs</em> of a fragment shader. For this, you must attach multiple textures to a frame buffer. A frame buffer also has several slots (WebGL 2 guarantees at least four; query <code>gl.MAX_COLOR_ATTACHMENTS</code> for the actual limit), but here they are called “color attachments.” To enable your shader to write to multiple targets, you have to define the output variables, specifying a “layout location” that corresponds to the color attachment slot (i.e., <code>layout (location = 0) out vec4 fragColor;</code> will always write to color attachment zero). Additionally, you need to make sure that the frame buffer you are writing to also has a color attachment (read: texture) assigned to each <code>location</code> that you are producing output for, and that you have enabled these attachments as draw targets with <code>gl.drawBuffers</code>.</p>
<p>There is a third part to the rendering pipeline that I will explain in due time, but with what we know now, we can continue to do the first “cool” post-processing step.</p>
<h2>Preparing Post-Processing</h2>
<p>Since I want the colors of the iris indicator to pop a bit more, my brain immediately jumped to “Oh, bloom!”</p>
<p>If you were around in the mid-2000s, you may remember a video game called “The Elder Scrolls IV: Oblivion,” which I enjoyed as a child. One innovation it brought to computer gaming was the extensive use of a bloom filter. Apparently, video game developers used bloom to convey brightness before other techniques became possible. But because of how much Oblivion overdid the bloom filter, everyone was talking about it. To see how much is too much bloom, I invite you to go to the demo again and set the bloom intensity to 8×. Then you have an idea of what Oblivion looked like at times.</p>
<p>But an iris indicator is not a video game, and here I personally believe that it can really benefit from some overly bright filter. So, how do we actually implement a bloom filter? That is quite simple, and I’m indebted to LearnOpenGL for providing a simple algorithm for this.</p>
<p>Bloom involves three steps: First, extract only the bright spots of a rendered scene. Then, blur the hell out of those bright spots. Finally, combine the blurred highlights with the original image to get this impression of brightness. Let’s see how this is implemented in OpenGL.</p>
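<p>Before moving to the GPU, the three steps can be sketched on the CPU with a toy example: a single row of grayscale pixels instead of a full image. All names (<code>extractBright</code>, <code>blur</code>, <code>bloom</code>) and the tiny three-tap kernel are assumptions made for this illustration and have nothing to do with the actual shader code.</p>

```typescript
// One-dimensional, grayscale toy version of the three bloom steps.

// Step 1: keep only values brighter than the threshold.
function extractBright (pixels: number[], threshold = 1.0): number[] {
  return pixels.map(function (p) { return p > threshold ? p : 0 })
}

// Step 2: blur with a small symmetric kernel; pixels outside the row count as black.
function blur (pixels: number[]): number[] {
  return pixels.map(function (_p, i) {
    const left = pixels[i - 1] ?? 0
    const center = pixels[i] ?? 0
    const right = pixels[i + 1] ?? 0
    return left * 0.25 + center * 0.5 + right * 0.25
  })
}

// Step 3: add the blurred highlights on top of the original image.
function bloom (pixels: number[]): number[] {
  const glow = blur(extractBright(pixels))
  return pixels.map(function (p, i) { return p + (glow[i] ?? 0) })
}

// One HDR pixel (4.0) in a dim row: its glow spills onto both neighbors.
console.log(bloom([0.2, 0.2, 4.0, 0.2, 0.2])) // values: 0.2, 1.2, 6, 1.2, 0.2
```

<p>The dim pixels at the edges stay untouched, while the neighbors of the bright pixel pick up some of its light. On the GPU, the same idea runs on the full two-dimensional image with a larger blur kernel.</p>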
<p>First, let us focus on the WebGL engine. We now need to add a few frame buffers, and we need to stop directly rendering to the canvas. Instead, we want to render our rays onto a frame buffer and then, in a <em>second</em> pass, we want to process them to apply bloom.</p>
<h3>Adding a Frame Buffer</h3>
<p>To get started, we define a frame buffer and a texture called “scene target” because it’s going to be the frame buffer we write our geometry (the triangles) to:</p>
<pre><code class="language-typescript">this.scenetarget = {
  fb: gl.createFramebuffer(),
  scene: this.createTexture()
}
</code></pre>
<p>For this, we also need a routine to create a new texture. Why? Because we have to adjust some settings for each texture. Here’s how that works:</p>
<pre><code class="language-typescript">private createTexture (): WebGLTexture {
  const gl = this.gl
  const texture = gl.createTexture()
  gl.bindTexture(gl.TEXTURE_2D, texture)
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE)
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE)
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR)
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR)
  return texture
}
</code></pre>
<p>Again, you can see that we first have to “bind” a texture, and then we can adjust the settings of the currently bound texture. To learn what these settings do, I recommend reading WebGL fundamentals, from which I adapted this function. Back to the frame buffer, we now have to set up the frame buffer to use the texture. Because this involves modifying the settings of the frame buffer, we need to – you guessed it – bind it first:</p>
<pre><code class="language-typescript">gl.bindFramebuffer(gl.FRAMEBUFFER, this.scenetarget.fb)
gl.bindTexture(gl.TEXTURE_2D, this.scenetarget.scene)
gl.texImage2D(gl.TEXTURE_2D, 0, internalFormat, cWidth, cHeight, 0, format, type, null)
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, this.scenetarget.scene, 0)
</code></pre>
<p>In this code, we first bind the frame buffer and texture, because we want to couple them. With <code>texImage2D</code> we effectively just tell OpenGL that we want to allocate enough space for the fragment shader to write an image of <code>cWidth</code> × <code>cHeight</code> into it.</p>
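<p>Since a misconfigured frame buffer usually fails silently (you just get a black canvas), it can be worth verifying the setup right after attaching the texture. The sketch below uses <code>gl.checkFramebufferStatus</code>, which is part of WebGL; the helper name <code>assertFramebufferComplete</code> and the reduced <code>FramebufferStatusContext</code> interface are my own inventions to keep the example self-contained.</p>

```typescript
// Minimal slice of the WebGL context that this sketch relies on.
interface FramebufferStatusContext {
  FRAMEBUFFER: number
  FRAMEBUFFER_COMPLETE: number
  checkFramebufferStatus (target: number): number
}

// Hypothetical helper: throws if the currently bound frame buffer is unusable,
// for example because the attached texture has a format that cannot be rendered to.
function assertFramebufferComplete (gl: FramebufferStatusContext, label: string): void {
  const status = gl.checkFramebufferStatus(gl.FRAMEBUFFER)
  if (status !== gl.FRAMEBUFFER_COMPLETE) {
    throw new Error(label + ' is incomplete (status 0x' + status.toString(16) + ')')
  }
}
```

<p>You would call this while the frame buffer is still bound, i.e., directly after <code>gl.framebufferTexture2D</code>, and only during setup rather than on every frame.</p>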
<h3>Changing the Texture Size</h3>
<p>Where do <code>cWidth</code> and <code>cHeight</code> come from? Well, here’s another lesson: There are CSS pixels, and there are actual pixels. Many modern displays have such a high resolution that drawing a pixel the size of an actual pixel would be almost imperceptible. Instead, on such high-resolution displays, drawing one pixel usually involves drawing four pixels (a 2×2 square). The canvas size is provided in CSS pixels, which already account for high-resolution displays. This is why we have set the frame buffer viewport to the actual canvas size whenever we drew to it. But for writing to a <em>texture</em>, we want that texture to have as much resolution as the display itself. This means that the textures should always have the <em>actual</em> resolution of the canvas, not its reported CSS size.</p>
<p>To do so, we simply multiply the reported canvas size by the <code>window.devicePixelRatio</code>. This device pixel ratio will be, e.g., <code>2</code> for my MacBook screen, and it is <code>1</code> for my normal work monitor. For your displays, it might differ. In any case, by using the device pixel ratio, we ensure that one pixel on the textures we’re writing to corresponds to one actual, physical pixel. I decided to write a simple utility function for that:</p>
<pre><code class="language-typescript">textureSize (): { cWidth: number, cHeight: number } {
  const gl = this.gl
  const cWidth = Math.ceil(gl.canvas.clientWidth * this.textureSizeModifier)
  const cHeight = Math.ceil(gl.canvas.clientHeight * this.textureSizeModifier)
  return { cWidth, cHeight }
}
</code></pre>
<p>The <code>textureSizeModifier</code> is just a variable so that you can change the resolution of the texture to other values than your device pixel ratio. Go ahead and try it out on the demo page to see the effect.</p>
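<p>To see what the rounding does, here is the same arithmetic as a standalone function with a few example values. <code>physicalSize</code> is a hypothetical name, and the canvas sizes and ratios are made-up numbers; the fractional ratio of 1.5 is included because browser zoom or display scaling can produce such values.</p>

```typescript
// Convert a CSS-pixel size into a texture size, rounding up to whole pixels.
function physicalSize (cssWidth: number, cssHeight: number, ratio: number) {
  return {
    cWidth: Math.ceil(cssWidth * ratio),
    cHeight: Math.ceil(cssHeight * ratio)
  }
}

console.log(physicalSize(300, 150, 1))   // { cWidth: 300, cHeight: 150 }
console.log(physicalSize(300, 150, 2))   // { cWidth: 600, cHeight: 300 }
console.log(physicalSize(301, 150, 1.5)) // { cWidth: 452, cHeight: 225 }
```

<p>Rounding up rather than down means the texture is never smaller than the area it has to cover, at the cost of at most one extra row or column of pixels.</p>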
<p>But now, back to the frame buffers: By calling <code>framebufferTexture2D</code> we attach the texture to the frame buffer. We here do so as color attachment 0, but there are multiple ones to choose from. The various settings for the texture are as follows:</p>
<pre><code class="language-typescript">const internalFormat = gl.RGBA16F
const format = gl.RGBA
const type = gl.FLOAT
</code></pre>
<p>It is common to use simple <code>RGBA</code> as the internal format and use <code>UNSIGNED_BYTE</code> as the type of texture. <em>However</em>, because we’re working with HDR colors that exceed the common maximum of <code>1.0</code>, we can’t do that. One little-known fact that I had to research first is that WebGL does not support rendering to floating-point textures by default. This needs to be explicitly enabled:</p>
<pre><code class="language-typescript">gl.getExtension('EXT_color_buffer_float')
</code></pre>
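<p>One caveat: <code>gl.getExtension</code> returns <code>null</code> if the browser or GPU does not offer the extension, so it is worth checking the result. A minimal sketch, in which <code>enableFloatRenderTargets</code> and the reduced <code>ExtensionContext</code> interface are hypothetical names used only for this example:</p>

```typescript
// Minimal slice of the WebGL context that this sketch relies on.
interface ExtensionContext {
  getExtension (name: string): unknown
}

// Returns true if rendering to floating-point textures is available.
function enableFloatRenderTargets (gl: ExtensionContext): boolean {
  // getExtension returns null when the extension is not supported.
  return gl.getExtension('EXT_color_buffer_float') !== null
}
```

<p>What to do when this returns <code>false</code> is a design decision. One possible fallback would be to use regular 8-bit textures (<code>gl.RGBA8</code> with <code>gl.UNSIGNED_BYTE</code>) and keep all colors within the regular range, at the cost of losing the HDR-based bloom.</p>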
<p>Now we have a texture to render to. After the setup, it’s a good habit to unbind both again, because if you accidentally leave a texture or frame buffer bound, WebGL can get kind of funny:</p>
<pre><code class="language-typescript">gl.bindFramebuffer(gl.FRAMEBUFFER, null)
gl.bindTexture(gl.TEXTURE_2D, null)
</code></pre>
<h3>Adjusting the Draw Code</h3>
<p>Next, we have to modify the <code>draw</code> code to draw not to the canvas, but rather to this frame buffer. We do so by setting our just created frame buffer as the rendering target. The <code>setFramebuffer</code> is simply a method that binds the provided frame buffer and also sets the viewport, which I have adapted from WebGL fundamentals:</p>
<pre><code class="language-typescript">this.setFramebuffer(this.scenetarget.fb, cWidth, cHeight)
gl.clear(gl.COLOR_BUFFER_BIT)
</code></pre>
<p>We also have to clear the buffer again to reset all the colors in the texture. Otherwise, we would simply overwrite <em>changing</em> colors, and that would lead to a smearing effect. Now when we call <code>gl.drawArrays</code>, the shaders will write to the scene texture. And then we can use this scene texture as a source for our post-processing. But how do we draw the result of that onto the canvas? Let’s for now just immediately implement that so that we can add our post-processing in between.</p>
<p>To draw to our canvas, we first have to bind the canvas as our rendering target, read: as the frame buffer:</p>
<pre><code class="language-typescript">this.setFramebuffer(null, this.gl.canvas.clientWidth, this.gl.canvas.clientHeight)
</code></pre>
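<p>Both calls above go through <code>setFramebuffer</code>. As a rough sketch of what such a helper does (bind the target, then set the viewport), here is a standalone variant. It takes the context as its first argument, and the <code>FramebufferContext</code> interface is a hypothetical stand-in that lists only the calls used here:</p>
<pre><code class="language-typescript">// Minimal structural type covering only the calls used in this sketch.
// A real WebGL2RenderingContext provides all of them.
interface FramebufferContext {
  readonly FRAMEBUFFER: number
  bindFramebuffer (target: number, framebuffer: unknown): void
  viewport (x: number, y: number, width: number, height: number): void
}

// Binds `fb` as the rendering target (null means the canvas) and sets the
// viewport so that clip space maps onto the full target.
function setFramebuffer (gl: FramebufferContext, fb: unknown, width: number, height: number): void {
  gl.bindFramebuffer(gl.FRAMEBUFFER, fb)
  gl.viewport(0, 0, width, height)
}
</code></pre>
<p>Setting the viewport together with the binding avoids a mismatch whenever two rendering targets have different sizes.</p>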
<p>Next, we have to bind the texture we have just written to as the source for our fragment shader:</p>
<pre><code class="language-typescript">gl.bindTexture(gl.TEXTURE_2D, this.scenetarget.scene)
</code></pre>
<p>Now we have to add a way for the fragment shader to actually retrieve the colors from this texture. For this, we first declare a new sampler uniform in the fragment shader:</p>
<pre><code class="language-glsl">uniform sampler2D u_texture;
</code></pre>
<p>Since this is a variable we also have to fill, we have to adjust our engine code accordingly:</p>
<pre><code class="language-typescript">this.textureUniformLocation = gl.getUniformLocation(this.program, 'u_texture')
gl.uniform1i(this.textureUniformLocation, 0)
</code></pre>
<p>This tells OpenGL to always read the texture bound to texture unit 0 through that variable. Note that we only pass the <em>index</em> of the texture unit here; we don’t have to copy the texture. That is work that OpenGL does for us whenever we call <code>bindTexture</code>.</p>
<blockquote>
<p>Nota bene: While writing this article I realized that I never actually did this. And yet, the animation worked. Why? Well, because OpenGL initializes all uniforms with zero, and a uniform keeps this value as long as you don’t overwrite it by calling <code>uniform1i</code>. So even if you only declare a sampler in your fragment shader, but never set its value in your WebGL code, it will still work, because the value is implicitly preset to <code>0</code>. It wouldn’t work with a second texture, of course.</p>
</blockquote>
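<p>For completeness, here is a minimal sketch of what using more than one texture would involve: each texture is bound to its own texture unit, and each sampler uniform receives the index of that unit. The helper <code>bindTextureToUnit</code> and the <code>TextureContext</code> interface are hypothetical; the three calls themselves are regular WebGL calls:</p>
<pre><code class="language-typescript">// Minimal structural type covering only the calls used in this sketch.
// A real WebGL2RenderingContext provides all of them.
interface TextureContext {
  readonly TEXTURE0: number
  readonly TEXTURE_2D: number
  activeTexture (unit: number): void
  bindTexture (target: number, texture: unknown): void
  uniform1i (location: unknown, value: number): void
}

// Binds `texture` to texture unit `unit` and points the sampler uniform at
// `location` to that unit.
function bindTextureToUnit (gl: TextureContext, texture: unknown, location: unknown, unit: number): void {
  // Texture units are numbered consecutively, starting at TEXTURE0.
  gl.activeTexture(gl.TEXTURE0 + unit)
  gl.bindTexture(gl.TEXTURE_2D, texture)
  // The sampler only receives the unit index, never the texture itself.
  gl.uniform1i(location, unit)
}
</code></pre>
<p>With this, a second sampler would be served by calling the helper with unit <code>1</code>, while the scene texture stays on unit <code>0</code>.</p>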
<h3>Conditional Shading</h3>
<p>Next, we have to change our shaders. Until now, both vertex and fragment shader could assume that they would receive a bunch of triangles, and had to draw those onto a canvas and compute an appropriate color for each pixel. This would be the point at which you could create another shader-pair. But this entire project should only render an iris indicator, and we are currently at ~12,000 words, so I won’t go through the added complexity of having to juggle multiple shader programs.</p>
<p>Instead, we’re going to hide all our different shaders within the two we already have. To do so, we have to define a new variable that will tell our shaders how they should behave. So let’s define an enumeration to know which passes we have:</p>
<pre><code class="language-glsl">float FRAGMENT_PASS_PASSTHROUGH = 0.0;
float FRAGMENT_PASS_NORMAL = 1.0;
float FRAGMENT_PASS_BLUR = 2.0;
float FRAGMENT_PASS_COMPOSITE = 3.0;
float FRAGMENT_PASS_TONEMAP = 4.0;
float FRAGMENT_PASS_BRIGHTNESS = 5.0;
</code></pre>
<p>(There are a bunch of additional ones that we will slowly add to the shaders). You need to copy these definitions to both shaders. Then, in the rendering engine, you’ll also want to add them:</p>
<pre><code class="language-typescript">const FRAGMENT_PASS_PASSTHROUGH = 0.0
const FRAGMENT_PASS_NORMAL = 1.0
const FRAGMENT_PASS_BLUR = 2.0
const FRAGMENT_PASS_COMPOSITE = 3.0
const FRAGMENT_PASS_TONEMAP = 4.0
const FRAGMENT_PASS_BRIGHTNESS = 5.0
</code></pre>
<p>Now, we have to let our shaders know which pass we currently run with a simple uniform. We define it in the vertex shader, and pass it through to the fragment shader:</p>
<pre><code class="language-glsl">uniform float u_pass;

out float v_pass;

void main () {
  // ... other code
  v_pass = u_pass;
}
</code></pre>
<p>The fragment shader:</p>
<pre><code class="language-glsl">in float v_pass;
</code></pre>
<blockquote>
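<p>Side note: I had to learn the hard way that, even though “uniforms” are kind of constants, WebGL will make funny noises if you simply declare them in both shaders and the two declarations don’t match exactly. As far as I understand it, the usual culprit is precision: floats default to <code>highp</code> in the vertex shader, whereas the fragment shader uses whatever precision you declare there. So if you have one value that you need to address in both shaders, the simplest way is to declare it only in the vertex shader, and pass it through to the fragment shader.</p>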
</blockquote>
<p>Of course, we also need to be able to change this value, so we’ll have to adapt the rendering engine. Nothing here is new:</p>
<pre><code class="language-typescript">this.passUniformLocation = gl.getUniformLocation(this.program, 'u_pass')

// At render time, for example to tell the shaders we do a regular pass:
gl.uniform1f(this.passUniformLocation, FRAGMENT_PASS_NORMAL)
</code></pre>
<p>Now we can write conditional logic in our shaders. First, let’s change the vertex shader. In the first pass, the vertex shader will receive triangles and is supposed to transform them (translate to center of canvas, and apply a rotation). But in all other passes, we will only give it a rectangle to draw, and it should not do anything fancy with it. So we have to change the code accordingly:</p>
<pre><code class="language-glsl">vec2 transformed = u_pass == FRAGMENT_PASS_NORMAL
    ? (u_matrix * vec3(a_position, 1)).xy
    : a_position;

vec2 normalized = transformed / u_resolution;
</code></pre>
<p>That is all our vertex shader needs to know: If there are triangles incoming (in our “normal” rendering pass), it should transform the coordinates, and in all other passes, it should simply convert them to clip space. The part that actually does need all of these funny constants we just defined is the <em>fragment</em> shader, because everything that changes between the passes affects the colors. For now, we can just check the pass value and either compute a pixel color, or literally pass through the texture value:</p>
<pre><code class="language-glsl">void main () {
  if (v_pass == FRAGMENT_PASS_NORMAL) {
    fragColor = compute_color();
  } else if (v_pass == FRAGMENT_PASS_PASSTHROUGH) {
    fragColor = texture(u_texture, v_texcoord);
  }
}
</code></pre>
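<p>To make the non-transforming branch of the vertex shader a bit more tangible, here is a CPU-side sketch of the conversion from pixel coordinates to clip space. It assumes the common pattern of normalizing by the resolution, mapping to the range from -1 to 1, and flipping the $y$-axis; the function name <code>toClipSpace</code> is hypothetical and not part of the engine:</p>
<pre><code class="language-typescript">type Vec2 = [number, number]

// Hypothetical CPU-side mirror of the non-transforming vertex shader branch.
function toClipSpace (position: Vec2, resolution: Vec2): Vec2 {
  // Pixel coordinates to the range [0, 1]
  const nx = position[0] / resolution[0]
  const ny = position[1] / resolution[1]
  // [0, 1] to [-1, 1], with y flipped so that pixel (0, 0) is the top-left
  // corner instead of the bottom-left one.
  return [nx * 2 - 1, 1 - ny * 2]
}

// For a target of 600 by 300 pixels, the top-left pixel ends up at (-1, 1)
// and the bottom-right corner at (1, -1).
const topLeft = toClipSpace([0, 0], [600, 300])
const bottomRight = toClipSpace([600, 300], [600, 300])
</code></pre>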
<h3>Drawing a Texture to the Canvas</h3>
<p>Now all we have to do is give the fragment shader the correct texture, which we have already done. We now return to the rendering engine. The last thing we have done is assign the rendered scene as a texture:</p>
<pre><code class="language-typescript">gl.bindTexture(gl.TEXTURE_2D, this.scenetarget.scene)
</code></pre>
<p>Now we want to draw this texture simply onto the canvas. How do we do that? Well, first we must define a rectangle that equals the canvas size and write its coordinates into our position buffer:</p>
<pre><code class="language-typescript">this.setFramebufferRectangle(cWidth, cHeight)
</code></pre>
<p>This function is very simple:</p>
<pre><code class="language-typescript">function setFramebufferRectangle (width: number, height: number) {
  const coords = new Float32Array([
    0.0, 0.0, width, 0.0, width, height,
    0.0, 0.0, width, height, 0.0, height
  ])
  this.gl.bufferData(this.gl.ARRAY_BUFFER, coords, this.gl.STATIC_DRAW)
}
</code></pre>
<p>Note that for <code>gl.bufferData</code> to do the right thing, you need to make sure that the position buffer is bound. Because we only use a single buffer, the position buffer should still be bound. What we do here is effectively overwrite the triangles with our single rectangle. At this point, we have the correct frame buffer (<code>null</code> for the canvas), the correct texture (<code>this.scenetarget.scene</code>), and a simple rectangle in the buffer. Now, let’s start up our rendering pipeline by issuing the draw command:</p>
<pre><code class="language-typescript">gl.drawArrays(gl.TRIANGLES, 0, 6)
</code></pre>
<p>This tells OpenGL to draw triangles, and to expect six vertices (read: six x/y coordinate pairs). It will pass those first to the vertex shader, which now only converts the coordinates to clip space and skips the matrix transformation. Then, OpenGL will calculate which pixels are affected (which, for this particular rectangle, is all of them) and pass the information to the fragment shader, which is instructed to just copy the information from the texture to the drawing target. Since the drawing target has the same size as the texture, this means that it effectively copies the information.</p>
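<p>Where does the <code>6</code> come from? The following sketch rebuilds the rectangle data outside of the engine and derives the count from it. The names <code>rectangleVertices</code> and <code>COMPONENTS_PER_VERTEX</code> are hypothetical and only serve the illustration:</p>
<pre><code class="language-typescript">// Returns the two triangles that together cover a rectangle of the given size.
function rectangleVertices (width: number, height: number): Float32Array {
  return new Float32Array([
    0, 0, width, 0, width, height, // First triangle
    0, 0, width, height, 0, height // Second triangle
  ])
}

const COMPONENTS_PER_VERTEX = 2 // Each vertex consists of an x and a y value

const coords = rectangleVertices(600, 300)
// Twelve numbers divided by two components per vertex gives six vertices.
// This is the count that is passed as the third argument to gl.drawArrays.
const vertexCount = coords.length / COMPONENTS_PER_VERTEX
</code></pre>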
<h2>Final Thoughts</h2>
<p>At this point, everything is set up to add whatever post-processing stages you like. Next week, I’ll explain how I added a bloom filter using this setup.</p>
<p>Also, you may notice, if you’ve followed along, that the image suddenly looks very rough and pixelated. That’s because, due to what we just did, we lost the ability to antialias the rendered image. That will follow in article 8 in two weeks. So, as always, stay tuned!</p>
<h2>The Full WebGL Series</h2>
<p>Jump directly to an article that piques your interest.</p>
<ol>
<li><a href="https://www.hendrik-erz.de/post/a-rabbit-hole-called-webgl">A Rabbit Hole Called WebGL</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-2-setup-and-the-opengl-rendering-pipeline">Setup and the OpenGL Rendering Pipeline</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-3-drawing-things">Drawing Things</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-4-animating-things">Animating Things</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-5-computing-colors">Computing Colors</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-6-post-processing">Post-Processing</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-7-adding-a-bloom-filter">Adding a Bloom-Filter</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-8-implementing-multi-sample-antialiasing-msaa">Implementing Multi-Sample Antialiasing (MSAA)</a></li>
</ol>]]>
  </content>
</entry>
<entry>
  <title>WebGL Series, Part 5: Computing Colors</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2026, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/webgl-series-part-5-computing-colors" />
  <id>https://www.hendrik-erz.de/post/webgl-series-part-5-computing-colors</id>
  <published>2026-02-06T11:00:00+00:00</published>
  <updated>2026-02-28T19:01:54+00:00</updated>
  <summary type="html"><![CDATA[In this fifth article on WebGL, I explain how I procedurally generate colors and animate them to convey changes in the state of the iris indicator.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/webgl-series-part-5-computing-colors">
    <![CDATA[<p>Last week, I walked you through animating the render so that it can convey a sense of movement. But one central part of the entire indicator is to visualize various <em>states</em> of tasks that we might have. To refresh your memory of what the entire purpose of this is, <a href="https://www.hendrik-erz.de/post/a-rabbit-hole-called-webgl">read the first article</a>. To get a quick refresher on what we did last week, <a href="https://www.hendrik-erz.de/post/webgl-series-part-4-animating-things">click here for last week’s article</a>.</p>
<p><a href="https://nathanlesage.github.io/iris-indicator/">View the demo-page</a></p>
<h2>Setting Up the Data Structures</h2>
<p>With the animation out of the way, the next construction site to tackle concerns color. Until now, the rays are all just equally colored, and you have probably even changed the color if you were bored. But if you remember, I want to indicate various states in the iris using colors. So let’s add them now.</p>
<p>This is unfortunately very convoluted. I tinkered a lot with the code until I found a proper solution. This solution is what I call “segments.” I define four segments that one can set. Why four? Two reasons. First, shaders are really strict and don’t allow for variable-sized arrays. Second, I want to blend the colors into each other, meaning that, if you use all four segments, you will already have eight colors in the circle. Any additional color would drastically reduce the informativeness of the iris, countering its purpose to be quickly comprehensible.</p>
<p>The segments are part of the overall state, so let’s move them there. I use three properties to remember their state (actually, it’s four, but let’s keep it simple for now):</p>
<pre><code class="language-typescript">private segmentCounts: Vec4
private segmentColors: Vec4&lt;Vec4&gt;
private segmentRatiosTarget: Vec4
</code></pre>
<p><code>segmentCounts</code> contains the current count for each segment. If a segment’s count is zero, it will be discarded, so you can use fewer segments if you prefer. The idea is that you associate one state with each segment. How I see it: The first segment includes failed tasks. The second segment includes successful tasks. The third segment contains in-progress tasks. (I am still pondering if I should use the fourth for cancelled tasks.) So I provide four colors to the engine in <code>segmentColors</code>: red, green, blue, and purple (for the segment that is unused for my purposes). I decided to hard-code the colors to ensure they always “pop” and are easily distinguishable from each other. The colors are provided by a simple color map:</p>
<pre><code class="language-typescript">this.colormap = {
  blue:   [0.2, 0.5, 1.0, 1.0],
  red:    [1.0, 0.3, 0.3, 1.0],
  green:  [0.3, 1.0, 0.3, 1.0],
  yellow: [1.0, 1.0, 0.3, 1.0],
  purple: [1.0, 0.3, 1.0, 1.0]
}
</code></pre>
<p>As you can see, these colors already follow the RGBA-format used by the OpenGL shaders, for simplicity.</p>
<p>One side note: I defined a new utility type, <code>Vec4</code>, which is simply an array with four numbers. Again, if you use JavaScript, you won’t need that.</p>
<p>I’ll skip the getters and setters for these properties, but I need to compute <em>ratios</em> from these counts (because a circle just has 100%, and not an arbitrary count):</p>
<pre><code class="language-typescript">const sum = this.segmentCounts.reduce((p, c) =&gt; p + c, 0)
this.segmentRatiosTarget = this.segmentCounts.map(c =&gt; c / sum)
</code></pre>
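<p>As a small worked example, here is a standalone sketch of the same conversion. The function name <code>countsToRatios</code> is hypothetical, the <code>Vec4</code> type is my assumption of how such a type could look, and the guard for the case that all counts are zero is my addition (without it, the division would produce <code>NaN</code>):</p>
<pre><code class="language-typescript">// Assumed shape of the utility type: a tuple of exactly four numbers.
type Vec4 = [number, number, number, number]

// Converts absolute counts into fractions of the full circle.
function countsToRatios (counts: Vec4): Vec4 {
  const sum = counts[0] + counts[1] + counts[2] + counts[3]
  if (sum === 0) {
    // Assumption: without any tasks, every segment stays empty.
    return [0, 0, 0, 0]
  }
  return [counts[0] / sum, counts[1] / sum, counts[2] / sum, counts[3] / sum]
}

// 2 failed, 6 successful, and 2 in-progress tasks give [0.2, 0.6, 0.2, 0].
const ratios = countsToRatios([2, 6, 2, 0])
</code></pre>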
<p>Next, we have to tell the shaders about this. For this, I decided to use a <code>struct</code> because it makes working with the data simpler:</p>
<pre><code class="language-glsl">struct Segment {
  float ratio;
  vec4 color;
};

uniform Segment u_segments[4];
</code></pre>
<p>Each segment has a color (this way, I could modify the colors dynamically if I really wanted to), and a ratio between 0 and 1.</p>
<p>Now we have to tell OpenGL how we can provide the data to the shaders. This requires a few new pointers in the rendering engine:</p>
<pre><code class="language-typescript">this.segmentLocs = []
for (let i = 0; i &lt; MAX_SUPPORTED_SEGMENTS; i++) {
  this.segmentLocs.push({
    ratio: gl.getUniformLocation(this.program, `u_segments[${i}].ratio`),
    color: gl.getUniformLocation(this.program, `u_segments[${i}].color`)
  })
}
</code></pre>
<p><em>Huh, what is that?</em> One limitation of the shader language is that everything is <em>really tightly</em> guarded, and this includes the memory positions of data. I could have forced the shaders to move the segment colors to very specific memory locations and then address them using indices (e.g., <code>firstLocation + 1</code>, <code>firstLocation + 4</code> and so on), but that seemed a bit too cryptic to still understand six months from now.</p>
<p>So instead I used a property of WebGL that allows me to get the memory location for each individual ratio and color individually. This is a bit verbose, but remember that OpenGL will retain the last data we have provided, and only update what we explicitly change. So as long as we don’t need new data in those memory locations, we can just not access these locations. This keeps the amount of updating to do during each individual rendering run much lower, as the shaders can reuse existing data.</p>
<p>To set the segments, we can now add a simple function to the rendering engine (and pass this data from our <code>IrisIndicator</code> class):</p>
<pre><code class="language-typescript">setSegments (segments: Vec4&lt;Segment&gt;) {
  const gl = this.gl

  for (let i = 0; i &lt; MAX_SUPPORTED_SEGMENTS; i++) {
    const seg = segments[i % segments.length]
    let [r, g, b, a] = seg!.color
    r *= this.hdrFactor
    g *= this.hdrFactor
    b *= this.hdrFactor
    gl.uniform4fv(this.segmentLocs[i]!.color, [r, g, b, a])
    gl.uniform1f(this.segmentLocs[i]!.ratio, seg!.ratio)
  }

  let blendRatio = Infinity
  for (const { ratio } of segments) {
    if (ratio === 0.0) {
      continue
    }

    if (ratio &lt; blendRatio) {
      blendRatio = ratio
    }
  }

  blendRatio /= 2.0
  blendRatio = Math.max(Math.min(blendRatio, 0.1), 0.01)

  gl.uniform1f(this.blendRatioUniformLocation, blendRatio)
}
</code></pre>
<p>This code requires some elaboration. <code>const seg = segments[i % segments.length]</code> simply ensures that, if someone ever manages to pass in fewer or more segments, this will only use at most four of those, repeating if necessary. The next few lines disassemble a passed-in color, and multiply it by an <code>hdrFactor</code>. <em>What is that?</em>, you may ask. Well, it is a factor (currently 10) that simply makes the colors “pop” more. This is literally what “HDR” or “High Dynamic Range” means: Color values that can get <em>much</em> brighter than regular standard dynamic range (SDR) colors. I took this advice from the LearnOpenGL tutorials because it will be necessary for the bloom filter, and it works well. This doesn’t really need to be a setting, but if I <em>ever</em> want to revisit this code again, this will come in handy.</p>
<p>(Also, you may want to set this factor to <code>1.0</code> until we have implemented tone mapping, which we do in the next article. This way the colors are discernible.)</p>
<p>Lastly, we update each segment location with both its (possibly new) color and its new ratio. Below that follows code that determines the range in which we should blend two adjacent colors instead of returning a solid color. I have found that this blend ratio needs to be dynamically calculated. Since it won’t change for all pixels in a single run, we can pre-calculate this ratio here and provide it for all rendering passes. To use this information, we can now turn to the <em>fragment</em> shader.</p>
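<p>To see what the blend ratio calculation does in isolation, here is a minimal sketch that extracts it into a standalone function. The name <code>computeBlendRatio</code> is hypothetical; the logic follows the second half of <code>setSegments</code> above:</p>
<pre><code class="language-typescript">// Derives the blend ratio from the smallest active segment.
function computeBlendRatio (ratios: number[]): number {
  let smallest = Infinity
  for (const ratio of ratios) {
    // Unused (zero) segments do not limit the blend range.
    if (ratio !== 0) {
      smallest = Math.min(smallest, ratio)
    }
  }

  // Blend over at most half of the smallest segment, clamped to the range
  // between 1% and 10% of the circle.
  return Math.max(Math.min(smallest / 2, 0.1), 0.01)
}

const wide = computeBlendRatio([0.2, 0.6, 0.2, 0]) // 0.1 (upper clamp)
const narrow = computeBlendRatio([0.04, 0.96, 0, 0]) // 0.02 (half of 0.04)
const tiny = computeBlendRatio([0.01, 0.99, 0, 0]) // 0.01 (lower clamp)
</code></pre>
<p>Halving the smallest ratio means that the two blend zones at either end of that segment can never overlap.</p>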
<h2>Computing Colors</h2>
<p>The fragment shader now has access to both the segments and their associated colors and ratios and the blend ratio. It can use only this information to accurately compute a color based on the pixel’s position. To do so, I have after many hours come up with the following monstrosity of a function:</p>
<pre><code class="language-glsl">vec4 compute_color () {
  vec2 coords = (v_texcoord.xy - 0.5) * 2.0 * vec2(-1, 1);
  float rad = atan(coords.y, coords.x) + PI;

  float radThreshold = MAX_RADIANS * u_blendRatio;

  float segmentStart = 0.0;
  float segmentEnd = 0.0;
  vec4 prevColor = INACTIVE_COLOR;
  for (int i = u_segments.length() - 1; i &gt;= 0; i--) {
    if (u_segments[i].ratio &gt; 0.0) {
      prevColor = u_segments[i].color;
      break;
    }
  }

  for (int i = 0; i &lt; u_segments.length(); i++) {
    if (u_segments[i].ratio == 0.0) {
      continue;
    }

    vec4 currentColor = u_segments[i].color;
    segmentEnd = segmentStart + u_segments[i].ratio * MAX_RADIANS;

    if (rad &gt;= segmentStart &amp;&amp; rad &lt;= segmentStart + radThreshold) {
      float blendStart = segmentStart - radThreshold;
      float blendEnd = segmentStart + radThreshold;
      return mix(prevColor, currentColor, (rad - blendStart + 1.0) / (blendEnd - blendStart + 1.0));
    } else if (rad &gt; segmentStart + radThreshold &amp;&amp; rad &lt;= segmentEnd - radThreshold) {
      return currentColor;
    } else if (rad &gt; segmentEnd - radThreshold &amp;&amp; rad &lt;= segmentEnd) {
      vec4 nextColor = INACTIVE_COLOR;
      int next = i == u_segments.length() - 1 ? 0 : i + 1;
      Segment nextSegment = u_segments[next];
      for (int j = 0; j &lt; u_segments.length(); j++) {
        if (nextSegment.ratio &gt; 0.0) {
          nextColor = nextSegment.color;
          break;
        }

        next++;

        if (next &gt;= u_segments.length()) {
          next = 0;
        }

        nextSegment = u_segments[next];
      }

      float blendStart = segmentEnd - radThreshold;
      float blendEnd = segmentEnd + radThreshold;
      return mix(currentColor, nextColor, (rad - blendStart) / (blendEnd - blendStart));
    }

    segmentStart = segmentEnd;
    prevColor = u_segments[i].color;
  }

  return INACTIVE_COLOR;
}
</code></pre>
<p>Let’s go through it piece by piece. First, we take the texture coordinates that have been provided by the vertex shader (see my earlier note). Since OpenGL interpolates them, each run of the fragment shader receives the coordinate of the pixel it is currently coloring. We now have to transform them back into coordinates relative to the unit circle (i.e., where $x$ and $y$ are between $-1$ and $+1$).</p>
<blockquote>
<p>Side note: Why do we have to transform the coordinates into clip space and then back again?! Well, as I mentioned earlier, in between the vertex and the fragment shader, OpenGL will take a look at the output produced by the vertex shader, and determine the affected pixels to provide to the fragment shader. This means that the OpenGL rasterizer needs to take a look at the texture coordinate. In other words, we have to set the texture coordinate to the proper coordinate system, and undo that work in the fragment shader again to get absolute coordinates. This certainly seems superfluous, but there are reasons for doing it this way that will probably make sense for more complex applications.</p>
</blockquote>
<p>Once we have the coordinates in unit-circle units, we can use a trigonometric function that I myself have never used before to convert each coordinate into the angle (in radians) that it occupies on the unit circle: the two-argument arc tangent. We add PI to it to move the function’s output from the range $[-\pi; \pi]$ to the associated radians $[0; 2 \pi]$.</p>
<blockquote>
<p>Side note: Why wouldn’t a shader that <em>literally</em> deals with numbers not have a constant for Pi?</p>
</blockquote>
<p>Lastly, you will see that I multiplied the coordinates with <code>vec2(-1, 1)</code>. Like the code I’ve used in the vertex shader to flip the $y$-axis, this effectively flips the $x$-axis. Why do I do that? Well, just as radians have this weird habit of moving counter-clockwise, the arc tangent function has the weird habit of placing its “starting point” oddly. I want the colors to start at the right-center, and move clockwise. The native output of the arc tangent with the addition of $\pi$ would place the start on the wrong side, so by flipping everything along the vertical axis, I fix that.</p>
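<p>If you want to check the angle calculation outside of a shader, the following sketch mirrors the first two lines of <code>compute_color</code> in TypeScript. The function name <code>pixelAngle</code> is hypothetical; <code>Math.atan2</code> corresponds to the two-argument <code>atan</code> in GLSL:</p>
<pre><code class="language-typescript">// Mirrors the first two lines of compute_color for a texture coordinate
// (u, v), where both values lie between 0 and 1.
function pixelAngle (u: number, v: number): number {
  // Map the coordinate to the range [-1, 1] and mirror the x-axis, just as
  // the multiplication with vec2(-1, 1) does in the shader.
  const x = (u - 0.5) * 2 * -1
  const y = (v - 0.5) * 2
  // Shift the result from [-PI, PI] to [0, 2 * PI].
  return Math.atan2(y, x) + Math.PI
}

const quarter = pixelAngle(0.5, 0) // PI / 2
const half = pixelAngle(0, 0.5) // PI
const threeQuarters = pixelAngle(0.5, 1) // 3 * PI / 2
</code></pre>
<p>Thanks to the mirrored $x$-axis, the point opposite the right-center (<code>u = 0</code>, <code>v = 0.5</code>) ends up at exactly half a turn, which places the seam between $0$ and $2 \pi$ at the right-center.</p>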
<p>Now we have information on all the segments as well as where on the circle the current pixel lies, and we can use this to compute its color. I’m not going to dissect the entire function here, but the big picture is as follows: I treat the $2 \pi$ radians of space that we have around the unit circle as a line, where we start with the first segment at 0 radians, and end with the last segment at $2 \pi$ radians.</p>
<p><img src="https://www.hendrik-erz.de/storage/app/media/blog/webgl-series/figure_5.1_computing_color.png" alt="A visualization of how to calculate the color for a given coordinate." title="A visualization of how to calculate the color for a given coordinate." /></p>
<p>The rest of the function checks in which segment the pixel actually lies. If the pixel lies in one of the threshold areas (coming <em>from</em> the previous color, or going <em>into</em> the next color), I calculate how far the pixel lies in that threshold area, and mix the two colors based on that. The whole <code>prevColor</code> and <code>nextColor</code> calculations merely ensure that the circle wraps around, meaning the <code>prevColor</code> of the first segment we check is the last segment’s color, and the <code>nextColor</code> of the last segment is the first segment’s color. The code also skips empty (=unused) segments.</p>
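<p>The core of that lookup can be sketched on the CPU as well. The following simplified version only determines <em>which</em> segment contains a given angle and leaves out the blending entirely; the name <code>segmentAt</code> is hypothetical:</p>
<pre><code class="language-typescript">const TWO_PI = 2 * Math.PI

// Returns the index of the segment that contains the angle `rad` (between 0
// and 2 * PI), or -1 if no active segment covers it.
function segmentAt (rad: number, ratios: number[]): number {
  let segmentStart = 0
  for (let i = 0; i &lt; ratios.length; i++) {
    const ratio = ratios[i] ?? 0
    if (ratio === 0) {
      continue // Unused segments occupy no space on the circle
    }

    const segmentEnd = segmentStart + ratio * TWO_PI
    if (rad &gt;= segmentStart &amp;&amp; rad &lt;= segmentEnd) {
      return i
    }

    // The next active segment starts where this one ends.
    segmentStart = segmentEnd
  }

  return -1
}

// With ratios of 25%, 25%, and 50%, the boundaries lie at PI / 2 and PI.
const second = segmentAt(3, [0.25, 0.25, 0.5, 0]) // 1
const third = segmentAt(4, [0.25, 0.25, 0.5, 0]) // 2
</code></pre>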
<p>Now we can, very unceremoniously, change the <code>fragColor</code> output:</p>
<pre><code class="language-glsl">fragColor = compute_color();
</code></pre>
<p>That’s it! If you now re-run the code, it should appropriately color each segment, starting from the right-middle position in a clockwise motion. In between the segments, it should blend the adjacent colors for a neat gradient effect.</p>
<h2>Animating Segment-Changes</h2>
<p>Now it is time to turn to the fourth property I used to store the segment ratios:</p>
<pre><code class="language-typescript">private segmentRatiosTarget: Vec4
private segmentRatiosCurrent: Vec4
</code></pre>
<p>There are two properties to remember the ratios of the segments. One contains the <em>current</em> ratio, and one contains the <em>target</em> ratio. You hopefully can see where this leads: Whenever the segment counts update, I don’t actually change the <code>current</code> ratios, I only set a new <code>target</code> ratio. The current ratios are only overwritten in the render function, naturally time-dependent. The code is straightforward:</p>
<pre><code class="language-typescript">const step = deltaMs / this.segmentAdjustmentAnimationStepDuration
let hasRatioChanged = false
for (let i = 0; i &lt; MAX_SUPPORTED_SEGMENTS; i++) {
  const cur = this.segmentRatiosCurrent[i]!
  const tar = this.segmentRatiosTarget[i]!

  if (cur === tar) {
    continue
  }

  hasRatioChanged = true

  const direction = cur &gt; tar ? -1 : 1
  const difference = Math.abs(tar - cur)

  if (difference &lt; 10e-4) {
    this.segmentRatiosCurrent[i] = tar
  } else {
    this.segmentRatiosCurrent[i]! += direction * step * difference
  }
}

if (hasRatioChanged) {
  this.setSegments()
}
</code></pre>
<p>This is quite a bit of code, but it works simply. For each segment, we retrieve its current ratio and its target ratio. If both are the same, we are already done. However, if they are not, we move the current ratio closer towards the target ratio by <code>direction * step * difference</code>. Doing it this way gives us the (accidental) benefit of an easing function.</p>
<p>An easing function simply means that initially the changes towards the target ratio will be very fast, and get incrementally slower as the current ratio approaches the target ratio. Essentially, this is similar to the CSS <code>ease-out</code> function that you might have seen at some point. Finally, to avoid updating endlessly (since the difference only ever approaches 0 this way, but never reaches it), we “snap” the current ratio to the target ratio if the difference becomes barely perceptible. The number <code>10e-4</code> (0.001) is a purely arbitrary number that I found sufficient.</p>
<p>Finally, this code also ensures that we don’t needlessly update the segment structure if nothing has changed. This keeps performance up as long as you don’t change the ratios. The movement speed is controlled by <code>segmentAdjustmentAnimationStepDuration</code> which you can control on the demo page.</p>
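<p>To see the easing in numbers, here is a minimal sketch of a single update step for one ratio. The function name <code>easeToward</code> is hypothetical, and clamping the step to <code>1</code> is my own addition so that a very long frame cannot overshoot the target:</p>
<pre><code class="language-typescript">// Moves `current` a fraction of the remaining distance towards `target`.
function easeToward (current: number, target: number, deltaMs: number, stepDurationMs: number): number {
  const difference = target - current
  if (Math.abs(difference) &lt; 10e-4) {
    return target // Snap once the difference is barely perceptible
  }

  // Assumption: never move further than the full remaining distance.
  const step = Math.min(deltaMs / stepDurationMs, 1)
  return current + step * difference
}

// With a step of 0.5, the remaining distance halves on every frame.
const first = easeToward(0, 1, 500, 1000) // 0.5
const secondStep = easeToward(first, 1, 500, 1000) // 0.75
const thirdStep = easeToward(secondStep, 1, 500, 1000) // 0.875
</code></pre>
<p>Because <code>difference</code> already carries its sign, the separate <code>direction</code> variable from the code above is not needed in this variant.</p>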
<h2>Final Thoughts</h2>
<p>At this point, the iris indicator is basically done. And all it took was about 9,000 words (about a regularly-sized research paper)!</p>
<p>The next steps involve doing some post-processing. Specifically, I include three post-processing stages: Multi-sample antialiasing, a bloom filter, and tone mapping.</p>
<p>Everything up until now you could have easily also done in SVG. But it’s the post-processing stages that really set OpenGL apart, and which I was looking forward to the most. So stay tuned for part 6 of this journey!</p>
<h2>The Full WebGL Series</h2>
<p>Jump directly to an article that piques your interest.</p>
<ol>
<li><a href="https://www.hendrik-erz.de/post/a-rabbit-hole-called-webgl">A Rabbit Hole Called WebGL</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-2-setup-and-the-opengl-rendering-pipeline">Setup and the OpenGL Rendering Pipeline</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-3-drawing-things">Drawing Things</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-4-animating-things">Animating Things</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-5-computing-colors">Computing Colors</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-6-post-processing">Post-Processing</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-7-adding-a-bloom-filter">Adding a Bloom-Filter</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-8-implementing-multi-sample-antialiasing-msaa">Implementing Multi-Sample Antialiasing (MSAA)</a></li>
</ol>]]>
  </content>
</entry>
<entry>
  <title>WebGL Series, Part 4: Animating Things</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2026, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/webgl-series-part-4-animating-things" />
  <id>https://www.hendrik-erz.de/post/webgl-series-part-4-animating-things</id>
  <published>2026-01-30T11:00:00+00:00</published>
  <updated>2026-02-28T19:01:47+00:00</updated>
  <summary type="html"><![CDATA[In this fourth article of my eight-part series on WebGL, I explain how I animated the rendering and ensure that it conveys a sense of motion.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/webgl-series-part-4-animating-things">
    <![CDATA[<p>This week, I’m going to take the rays we set up last week, and start to animate the entire thing. This means that, after today’s article, your rays will move around and you’ll be roughly 50% through the ordeal! <a href="https://www.hendrik-erz.de/post/webgl-series-part-3-drawing-things">Read last week’s article here</a>.</p>
<p><a href="https://nathanlesage.github.io/iris-indicator/">View the demo-page</a></p>
<h2>Understanding Transformations</h2>
<p>So, cool, we now have a stationary image of some rays that should at least somewhat resemble an iris. But we can do more. The simplest step that does not require many changes to the rendering engine is to add animations. I wanted the animation to convey a sense of movement. We can do so in two ways: First, by rotating all the rays slowly around the origin, and second by varying the lengths of the rays slowly over time.</p>
<p>This is actually quite simple to achieve. We already have most of the code in place, and we would not have to really modify any code in the rendering engine for that. We’ll do that anyways, because it’s simpler, faster, and helps you understand more complex setups.</p>
<p>Let us first talk about the rotation, as that is simpler. As you may have noticed when running just the code I provided earlier, the rays were all centered around the origin of the canvas – the top-left corner. That’s clearly not desirable. The reason is that in the ray generation code I only calculate positions based on the unit circle, but the unit circle is centered at the origin of $x = 0$ and $y = 0$. But we want the origin to be at the center of the canvas, i.e., half its height and half its width.</p>
<p>How do we fix that? Well, first we could indeed move all the rays away from the origin into the center of the canvas right in the ray coordinate generation. But as I mentioned earlier, we can rather do that in the vertex shader. The same would have applied to the scaling, which I do perform in the ray generation code. Again, that was an optimization I could have done, but decided not to, because this project has already eaten up so much time.</p>
<blockquote>
<p><strong>Note, February 1, 2026</strong>: This is not quite true. As this article series has been unfolding for you, I have added a scale matrix to the mix, based on my impression that we can just do the minimal work in JavaScript, and move everything parallelizable into the (parallel running) vertex shader. However, there's one thing I forgot: When I calculate the ray coordinates by multiplying the cosine and sine of the coordinate with an inner and outer radius, I forgot that this calculation <em>has</em> to happen in the JavaScript code. The only &quot;optimization&quot; I could have done would've been to use the fraction-of-1 inner and outer radii that I define in the ray generation code, but that is in fact a calculation that the JavaScript has to do very infrequently. So in fact there is not much to be gained. So, having the scaling operation in the JavaScript code instead of the vertex code really is the only option we have to actually produce properly spaced vertices. So, please remember also in the following, when I talk about some &quot;optimization,&quot; this is really not the case. It has to be done this way if we want to end up with visible rays.</p>
</blockquote>
<p>So instead there are only two transformations we now have to add to the code: translation and rotation. We do so using matrices. In game development code, you will frequently encounter matrices, because they are simpler to work with. Indeed, in my actual research work, I also have to frequently work with matrices, so for me this was a mere exercise. Essentially, what you do in game development (or any rendering, for that matter), is you define three matrices. One describes a translation, i.e., moving a point from one position to another. The second describes the rotation of all vertices. And the last one describes how to scale all points uniformly up or down (make them bigger or smaller). The scaling is already done, so we only have two matrices left.</p>
<p>What we will do is produce these two matrices, and then multiply them, because the product of two transformation matrices is a single matrix that applies both transformations to each vertex at the same time. And, because these matrices remain the same for all vertices (that is, they are “global”), we only have to perform that work <em>once</em> and then we can just pass the matrix to the vertex shader that will transform all vertices, but in <em>parallel</em>.</p>
<h2>Creating Matrices</h2>
<p>To achieve that, we again borrow three utility functions from WebGL Fundamentals. I was too lazy to derive the matrices myself, and these matrices are so common, I didn’t want to bother reinventing the wheel here. To learn more, I recommend reading the <a href="https://webgl2fundamentals.org/webgl/lessons/webgl-2d-matrices.html">corresponding WebGL Fundamentals guide</a>.</p>
<p>The code is simple:</p>
<pre><code class="language-typescript">function translationMatrix (tx: number, ty: number): Mat3 {
  return [
    1, 0, 0,
    0, 1, 0,
    tx, ty, 1
  ]
}

function rotationMatrix (rad: number): Mat3 {
  const c = Math.cos(rad)
  const s = Math.sin(rad)
  return [
    c, -s, 0,
    s, c, 0,
    0, 0, 1
  ]
}

function mat3mul (mat1: Mat3, mat2: Mat3): Mat3 {
  const [a00, a01, a02] = [mat1[0 * 3 + 0]!, mat1[0 * 3 + 1]!, mat1[0 * 3 + 2]!]
  const [a10, a11, a12] = [mat1[1 * 3 + 0]!, mat1[1 * 3 + 1]!, mat1[1 * 3 + 2]!]
  const [a20, a21, a22] = [mat1[2 * 3 + 0]!, mat1[2 * 3 + 1]!, mat1[2 * 3 + 2]!]
  const [b00, b01, b02] = [mat2[0 * 3 + 0]!, mat2[0 * 3 + 1]!, mat2[0 * 3 + 2]!]
  const [b10, b11, b12] = [mat2[1 * 3 + 0]!, mat2[1 * 3 + 1]!, mat2[1 * 3 + 2]!]
  const [b20, b21, b22] = [mat2[2 * 3 + 0]!, mat2[2 * 3 + 1]!, mat2[2 * 3 + 2]!]
  return [
    b00 * a00 + b01 * a10 + b02 * a20,
    b00 * a01 + b01 * a11 + b02 * a21,
    b00 * a02 + b01 * a12 + b02 * a22,
    b10 * a00 + b11 * a10 + b12 * a20,
    b10 * a01 + b11 * a11 + b12 * a21,
    b10 * a02 + b11 * a12 + b12 * a22,
    b20 * a00 + b21 * a10 + b22 * a20,
    b20 * a01 + b21 * a11 + b22 * a21,
    b20 * a02 + b21 * a12 + b22 * a22
  ]
}
</code></pre>
<p>One thing I <em>did</em> learn while copying this code is that I really grew accustomed to how easy it is to work with matrices in <code>numpy</code>, and how cumbersome it looks in JavaScript. But, alas, these three functions do the work perfectly.</p>
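<p>To see what the combined matrix does to a single point, here is a minimal sketch that can run outside the browser. It re-declares the three functions in a compact form (the loop-based <code>mat3mul</code> is meant to produce the same nine entries as the unrolled version above), and it assumes that <code>Mat3</code> is a flat array of nine numbers. The helper <code>applyMat3</code> is hypothetical and not part of the project; it mimics what <code>u_matrix * vec3(x, y, 1)</code> computes in the shader.</p>

```typescript
// Assumption: Mat3 is a flat array of 9 numbers, laid out like the matrices above.
type Mat3 = number[]

function translationMatrix (tx: number, ty: number): Mat3 {
  return [1, 0, 0, 0, 1, 0, tx, ty, 1]
}

function rotationMatrix (rad: number): Mat3 {
  const c = Math.cos(rad)
  const s = Math.sin(rad)
  return [c, -s, 0, s, c, 0, 0, 0, 1]
}

// Compact loop version of mat3mul with the same index convention as the unrolled one.
function mat3mul (a: Mat3, b: Mat3): Mat3 {
  const out: Mat3 = []
  for (let col = 0; col < 3; col++) {
    for (let row = 0; row < 3; row++) {
      let sum = 0
      for (let k = 0; k < 3; k++) {
        sum += a[k * 3 + row]! * b[col * 3 + k]!
      }
      out.push(sum)
    }
  }
  return out
}

// Hypothetical helper: applies the matrix to the point (x, y, 1) and keeps only x/y.
function applyMat3 (m: Mat3, x: number, y: number): [number, number] {
  return [
    m[0]! * x + m[3]! * y + m[6]!,
    m[1]! * x + m[4]! * y + m[7]!
  ]
}

// Rotate by a quarter turn, then move to a made-up canvas center of (100, 50).
const combined = mat3mul(translationMatrix(100, 50), rotationMatrix(Math.PI / 2))
const [px, py] = applyMat3(combined, 1, 0)
console.log(px.toFixed(2), py.toFixed(2)) // the point (1, 0) ends up at roughly (100, 49)
```

<p>Note the order: with <code>mat3mul(translation, rotation)</code>, the point is first rotated around the origin and only then moved to the center. Swapping the arguments would instead rotate the already-moved point around the top-left corner.</p>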
<p>Now we have to modify the <code>drawFrame</code> function in the <code>IrisIndicator</code> class:</p>
<pre><code class="language-typescript">const now = Date.now()
const msPerRotation = 120_000
const rot = now % msPerRotation / msPerRotation
const moveByRadians = -rot * (2 * Math.PI)

const originX = this.gl.canvas.clientWidth / 2
const originY = this.gl.canvas.clientHeight / 2

const mat = mat3mul(
  translationMatrix(originX, originY),
  rotationMatrix(moveByRadians)
)
</code></pre>
<p>What we first do is determine the rotation of all the rays. We want this to be an endless spinning motion, and we want this to depend on time, <em>not</em> the frame rate. Why? Bear with me, I will explain below. First, let’s finish talking about this code block.</p>
<p>First, I define <code>rot</code> as a ratio between 0 and 1, based on time. So, if we say that the rays should take 120 seconds, or two minutes, for one full rotation, line three essentially wraps the current time into a value between 0 and 1 (via the modulo operation), and does so in lockstep with time. So even if you reload the page, the rotation will pick up at the position that corresponds to the current time, independent of frame rate. We then turn the rotation into radians, because that is the unit that most trigonometric functions expect.</p>
<blockquote>
<p>Side note: Why have we, as a society, decided that we want to use degrees to describe portions of a circle, when all the math only works with radians? I have never understood that. I mean, I can intuitively say what 270° of rotation means, and it looks weird to call it $\frac{3}{2}\pi$ radians. But that’s what sine and cosine functions expect. Anyways, if you really insist on using degrees instead of radians, you can convert between the two using $rad = \frac{d}{180} \cdot \pi$. Back to topic.</p>
</blockquote>
<p>As you may or may not know, angles on the unit circle are conventionally measured “backwards,” in a counterclockwise motion. The code <code>-rot * (2 * Math.PI)</code> essentially reverses this motion to move in clockwise direction.</p>
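<p>The time-to-angle calculation can be isolated into a small pure function to check that it really wraps around after one full rotation. The function name <code>rotationAt</code> is made up for this sketch; the arithmetic is the same as in the code block above.</p>

```typescript
// Hypothetical helper: maps a timestamp (in ms) to the negated rotation angle in radians.
function rotationAt (nowMs: number, msPerRotation: number): number {
  // The modulo wraps the elapsed time, so rot is always in the range [0, 1).
  const rot = (nowMs % msPerRotation) / msPerRotation
  return -rot * (2 * Math.PI)
}

// 30 seconds into a 120-second rotation is a quarter turn ...
const quarterTurn = rotationAt(30_000, 120_000)
// ... and exactly one full rotation later we arrive at the same angle again.
const sameAngleOneTurnLater = rotationAt(150_000, 120_000)
console.log(quarterTurn, sameAngleOneTurnLater)
```

<p>Because the angle only depends on the timestamp, two frames drawn at the same time always show the same rotation, no matter how many frames were drawn before.</p>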
<p>The next lines are fortunately simple: Calculate the center of the canvas and create a translation matrix based on that. Finally, we just multiply the two matrices to arrive at one single matrix that performs both transformations at the same time. Now we have to provide that matrix to the WebGL engine.</p>
<p>To do so, we first add a third parameter to the draw function that receives this matrix:</p>
<pre><code class="language-typescript">this.engine.draw(triangleData, nComponents, mat)
</code></pre>
<p>Next, in the vertex shader, we have to tell it that we want to pass a matrix in:</p>
<pre><code class="language-glsl">uniform mat3 u_matrix;
</code></pre>
<p>We have to again retrieve the actual memory position:</p>
<pre><code class="language-typescript">this.matrixUniformLocation = gl.getUniformLocation(this.program, 'u_matrix')
</code></pre>
<p>And finally, we provide our matrix at draw time:</p>
<pre><code class="language-typescript">gl.uniformMatrix3fv(this.matrixUniformLocation, false, matrix)
</code></pre>
<p>Finally, tell the vertex shader to utilize this matrix:</p>
<pre><code class="language-glsl">vec2 transformed = (u_matrix * vec3(a_position, 1)).xy;
vec2 normalized = transformed / u_resolution;
// ... the rest of the shader code
</code></pre>
<p>One thing to note: As you can see, the matrices are 3×3, but we are only dealing with x/y coordinates. To correctly multiply our 2D coordinate with a 3×3 matrix, we have to add a third component (a constant 1, which is what allows the matrix to translate the point), calculate, and then discard it again by only extracting <code>.xy</code> from the result. We’re not dealing with depth here, it’s all perfectly two-dimensional. But if you want to add a third dimension, you can absolutely do so, you’d just have to extend both matrices above by one dimension (to 4×4). I’ll leave that for you to look up yourselves.</p>
<p>If you now re-render the entire thing, you should see both that the rays are actually centered in the canvas now, and that they rotate as you reload the page. But it would be great to actually see the animation in motion without hitting the reload-button, right? For that, we have to do a slight modification to the <code>IrisIndicator</code> class. We need to create an infinite loop of animating.</p>
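<p>For context, the <code>transformed / u_resolution</code> line is the first step of turning pixel coordinates into clip-space coordinates. The rest of the shader is not shown here, so the following is only a sketch under the assumption that it follows the usual WebGL2 Fundamentals recipe (scale to 0..2, shift to -1..1, flip the y axis). The function <code>pixelToClipSpace</code> is a hypothetical CPU-side mirror of that recipe, not code from the project.</p>

```typescript
// Hypothetical CPU-side mirror of the assumed vertex shader conversion.
function pixelToClipSpace (x: number, y: number, width: number, height: number): [number, number] {
  // Normalize to 0..1, stretch to 0..2, then shift to -1..1.
  const clipX = (x / width) * 2 - 1
  const clipY = (y / height) * 2 - 1
  // Clip space has y pointing up; flipping it puts pixel (0, 0) in the top-left corner.
  return [clipX, -clipY]
}

// Made-up canvas size of 800x600 pixels.
const topLeft = pixelToClipSpace(0, 0, 800, 600)
const center = pixelToClipSpace(400, 300, 800, 600)
const bottomRight = pixelToClipSpace(800, 600, 800, 600)
console.log(topLeft, center, bottomRight)
```

<p>Under that assumption, translating the rays by half the canvas width and height moves them to the clip-space origin, which is the middle of the canvas.</p>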
<h2>Defining a Rendering Loop</h2>
<p>Let’s start with a simple solution:</p>
<pre><code class="language-typescript">setInterval(() =&gt; iris.drawFrame(), 1000/60)
</code></pre>
<p>The <code>1000/60</code> just means: “Run the function 60 times a second,” a.k.a.: Render at 60 fps.</p>
<p>If you do so, you should see the rotation in action, but depending on your display, it might flicker badly. The reason is that <code>setInterval</code> doesn’t care about your display’s refresh rate, and as such it may draw while your display refreshes, leading to flickering. To fix that, we’ll just have to instead call <code>requestAnimationFrame()</code> and, after each drawn frame, request another animation frame:</p>
<pre><code class="language-typescript">class IrisIndicator {
  // ... other code
  loop (timestamp: number) {
    this.drawFrame()
    // Schedule the next frame; the browser passes in the new timestamp
    requestAnimationFrame(ts =&gt; this.loop(ts))
  }
}

// And, at the end of the entire setup code:
requestAnimationFrame(ts =&gt; iris.loop(ts))
</code></pre>
<p>This essentially tells the browser: “Please run this function at the appropriate time to make sure we can draw without flickering.” How often this function gets executed therefore depends on your display’s refresh rate. You can actually figure out your display’s refresh rate by taking the <code>timestamp</code> that the animation frame passes to the callback (see the code repository for how that is done) and, if you compute <code>1000/(currentTime - previousTime)</code>, you get the frame rate. Neat! Using this information you can even implement a frame limiter that limits the refresh rate to, say, 30 fps, by simply not doing any work until at least <code>1000/fpsLimit</code> milliseconds have passed.</p>
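<p>Both ideas, measuring the frame rate and limiting it, boil down to a bit of timestamp arithmetic. The following is a minimal sketch of how that could look; <code>fpsFromTimestamps</code> and <code>FrameLimiter</code> are hypothetical names, and the actual implementation in the code repository may differ. The timestamps are fed in by hand here, so the logic can be tried without a browser.</p>

```typescript
// Hypothetical helper: frame rate from two consecutive animation-frame timestamps (in ms).
function fpsFromTimestamps (currentTime: number, previousTime: number): number {
  return 1000 / (currentTime - previousTime)
}

// Hypothetical frame limiter: skips frames until enough time has passed.
class FrameLimiter {
  private lastRendered: number | null = null
  private readonly minInterval: number

  constructor (fpsLimit: number) {
    this.minInterval = 1000 / fpsLimit
  }

  shouldRender (timestamp: number): boolean {
    if (this.lastRendered !== null && timestamp - this.lastRendered < this.minInterval) {
      return false // Not enough time has passed; do no work this frame
    }
    this.lastRendered = timestamp
    return true
  }
}

// Simulate a display that fires roughly every 16.7 ms, limited to 30 fps.
const limiter = new FrameLimiter(30)
const decisions = [0, 16.7, 33.4, 50.1, 66.8].map(ts => limiter.shouldRender(ts))
console.log(decisions, fpsFromTimestamps(16, 0))
```

<p>In a real loop, you would call <code>shouldRender(timestamp)</code> at the top of the animation-frame callback and skip the drawing when it returns <code>false</code>, while still requesting the next frame.</p>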
<p>This is also what I alluded to earlier: The rotation only depends on time, not on the frame rate. This means that, strictly speaking, the rotation “continues” even if your browser doesn’t render the animation <em>at all</em> (because, e.g., your browser is minimized). But, more crucially, regardless of display refresh rate, this ensures that the animation will not change in speed. This is generally a good habit to foster.</p>
<h2>Animating the Rays</h2>
<p>Now that our iris rotates successfully, it is time to also animate the individual rays. Earlier, I already introduced some properties for each ray that we will now use to actually animate them. What I will animate is only their lengths, so that they move inward, then outward, then inward, and so on.</p>
<p>Fortunately, we can do this entirely in the <code>drawFrame</code> method without any changes to the rest of the code:</p>
<pre><code class="language-typescript">const speed = deltaMs / this.rayMovementSpeed
for (const ray of this.rays) {
  const { min, max } = ray.radius
  let { current, inc } = ray.radius
  const increment = (max - min) * speed
  current = inc ? current + increment : current - increment
  if (current &lt;= min) {
    current = min
    inc = true
  } else if (current &gt;= max) {
    current = max
    inc = false
  }

  ray.radius = { ...ray.radius, current, inc }
}
</code></pre>
<p>As you can see, the more we progress in this rabbit hole, the more complex the code becomes. What do we do here? First, we determine the speed with which we want to adjust the current length of each ray – again, based on time, not on frame rate. <code>this.rayMovementSpeed</code> is essentially just another parameter that we can set that determines how fast the rays will change their length over time. Feel free to play around with some values.</p>
<p>In any case, here we actually perform a change in how the rays will be calculated: First we extract the minimum and maximum radius. These are randomly allocated (within some limits), and between these two the rays oscillate. We extract <code>current</code> and <code>inc</code> separately because we need to modify them. <code>current</code> remembers the current radius of each ray, and <code>inc</code> simply tells us if we’re currently in a “lengthening” motion or in a “shortening” motion. Next, we determine the increment by which we need to adjust the ray. Because the increment is proportional to the distance between <code>min</code> and <code>max</code>, rays move faster the longer the distance they travel.</p>
<p>Then, based on <code>inc</code>, we either increase or decrease the current radius of the ray, and change direction if we overshoot the upper or undershoot the lower limit. Finally, we adjust the values in the ray object itself. Afterwards, we can again call <code>this.rays.map()</code> to calculate the coordinates based on these updated positions.</p>
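<p>To convince yourself that this update is independent of the frame rate, it helps to pull the loop body into a pure function and feed it different time deltas. This is only a sketch: <code>RadiusState</code> and <code>stepRadius</code> are hypothetical names, and the numbers are made up, but the logic mirrors the loop above.</p>

```typescript
interface RadiusState { min: number, max: number, current: number, inc: boolean }

// Hypothetical pure version of the loop body: returns the updated radius state.
function stepRadius (radius: RadiusState, deltaMs: number, rayMovementSpeed: number): RadiusState {
  const increment = (radius.max - radius.min) * (deltaMs / rayMovementSpeed)
  let current = radius.inc ? radius.current + increment : radius.current - increment
  let inc = radius.inc
  if (current <= radius.min) {
    current = radius.min
    inc = true // Hit the lower limit: start lengthening
  } else if (current >= radius.max) {
    current = radius.max
    inc = false // Hit the upper limit: start shortening
  }
  return { ...radius, current, inc }
}

const start: RadiusState = { min: 10, max: 20, current: 12, inc: true }
// One frame of 100 ms moves the ray just as far as two frames of 50 ms each.
const oneBigStep = stepRadius(start, 100, 1000)
const twoSmallSteps = stepRadius(stepRadius(start, 50, 1000), 50, 1000)
// A ray that would overshoot the maximum gets pinned to it and changes direction.
const bounced = stepRadius({ ...start, current: 19.5 }, 100, 1000)
console.log(oneBigStep.current, twoSmallSteps.current, bounced)
```

<p>With a <code>rayMovementSpeed</code> of 1000, a ray needs one second to travel from <code>min</code> to <code>max</code>, regardless of how many frames are drawn in that second.</p>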
<p>If you now re-run the code, you should see both a rotational movement <em>and</em> the rays moving around. This completes everything I wanted to animate in terms of geometry.</p>
<h2>Final Thoughts</h2>
<p>We’re four articles deep into the series, and we still haven’t quite completed the journey. At this point, you should have something that starts to resemble the final animation, but there are two things still missing. First, and crucially, there is no color variation. This is what I will go through in the next installment next week. Second, we will add some post-processing in the articles afterwards. So, once more, stay tuned!</p>
<h2>The Full WebGL Series</h2>
<p>Jump directly to an article that piques your interest.</p>
<ol>
<li><a href="https://www.hendrik-erz.de/post/a-rabbit-hole-called-webgl">A Rabbit Hole Called WebGL</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-2-setup-and-the-opengl-rendering-pipeline">Setup and the OpenGL Rendering Pipeline</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-3-drawing-things">Drawing Things</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-4-animating-things">Animating Things</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-5-computing-colors">Computing Colors</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-6-post-processing">Post-Processing</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-7-adding-a-bloom-filter">Adding a Bloom-Filter</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-8-implementing-multi-sample-antialiasing-msaa">Implementing Multi-Sample Antialiasing (MSAA)</a></li>
</ol>]]>
  </content>
</entry>
<entry>
  <title>WebGL Series, Part 3: Drawing Things</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2026, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/webgl-series-part-3-drawing-things" />
  <id>https://www.hendrik-erz.de/post/webgl-series-part-3-drawing-things</id>
  <published>2026-01-23T11:00:00+00:00</published>
  <updated>2026-02-28T19:01:41+00:00</updated>
  <summary type="html"><![CDATA[In this third installment of my 8-part series on WebGL, I explain how I finally was able to draw triangles onto the screen, based on the previous two articles that were merely concerned with setting things up.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/webgl-series-part-3-drawing-things">
    <![CDATA[<p>Welcome back to the third installment of my weirdly complicated journey through WebGL. Today I’ll treat you to actually using the setup from last week’s article to draw some shapes. <a href="https://www.hendrik-erz.de/post/webgl-series-part-2-setup-and-the-opengl-rendering-pipeline">Read the previous article here</a>.</p>
<p><a href="https://nathanlesage.github.io/iris-indicator/">View the demo-page</a></p>
<h2>Setting up State</h2>
<p>With the distinction between engine and rendering engine from last week’s article, an understanding of the rendering pipeline, and a simple program at hand, we can <em>finally</em> get to drawing things!</p>
<p>To recap, what we want to achieve is an iris look, and we need to use triangles for that. So let’s do that.</p>
<p>Essentially, what we want is to represent a single ray by a single triangle, and simply instantiate as many rays as we want. However, we will also need to remember some state because we do have to move these rays at some point.</p>
<p>How can we represent a triangle, though? Well, the easy answer would be: three coordinates. But we shouldn’t do this, for two reasons: First, I want to animate those rays at some point. And second, we can make use of circle math for that.</p>
<p>Here’s the first part of circle math coming in. Since we want to draw an iris, we have a bunch of rays arranged in a circle, and instead of using coordinates to describe them, we can also describe them by providing an angle and a radius. So instead of remembering six numbers (three x/y pairs), we only need to store two: an angle and a radius.</p>
<p>To make working with the data structure simpler, let’s set up a simple interface (if you’re sticking to plain JavaScript, you won’t need this):</p>
<pre><code class="language-typescript">interface Ray {
  radians: number,
  width: number,
  radius: {
    inner: number;
    min: number;
    max: number;
    current: number;
    inc: boolean;
  }
}
</code></pre>
<p>Here, we remember a few additional pieces of data that we will only need for the animation. Effectively, what we need is only the <strong>radians</strong>, which describes where on the unit circle this triangle will be; the <strong>width</strong> which describes how “thick” the triangle will be (where width is provided in radians, too); and the inner and outer (“max”) <strong>radius</strong>. We supply an inner radius because we don’t want the rays to start at the center. This way, we can make the rays form a circle at their center, which symbolizes the center of the eye. Furthermore, by saving the radius individually for each ray, we can vary that and make the rays vary in size. The other properties will be used in the animation later.</p>
<p>Now, we need to create a few rays. Here’s the function that does this in the <code>IrisIndicator</code> class:</p>
<pre><code class="language-typescript">function generateRays () {
    this.rays = []

    const { cWidth, cHeight } = this.engine.textureSize()
    
    const canvasDiameter = Math.min(cWidth, cHeight)
    const canvasRadius = canvasDiameter / 2
    const outerRadius = 1.0 * canvasRadius
    const innerRadius = 0.3 * canvasRadius
    const minVaryRadius = 0.6 * canvasRadius

    const overlapFactor = 3 // Should be &gt; 1
    const widthInRadians = ((2 * Math.PI) / (this.nRays * 3)) * overlapFactor
    
    for (let i = 0; i &lt; this.nRays; i++) {
      const pos = i / this.nRays
      const centerInRad = pos * 2 * Math.PI
    
      const RADIUS_VARIATION = 0.1
      const rMin = minVaryRadius + Math.random() * RADIUS_VARIATION * canvasRadius
      const rMax = outerRadius - Math.random() * RADIUS_VARIATION * canvasRadius
      const startRadius = rMin + (rMax - rMin) * Math.random() // start somewhere between rMin and rMax
    
      this.rays.push({
        radians: centerInRad, width: widthInRadians,
        radius : { inner: innerRadius, min: rMin, max: rMax, current: startRadius, inc: Math.random() &gt; 0.5 }
      })
    }
}
</code></pre>
<p>Aside from the circle math, this code should be fairly straightforward. A few notes: First, we calculate positions based on the actual size of the canvas (which is returned by the <code>textureSize</code> method which will be introduced later). Second, we allow the number of rays to vary. Experimenting with that number I have found that, the larger the canvas becomes, the more rays are necessary to retain that “iris” resemblance. Too few, and it looks like a comic star; too many, and it tanks performance. (I would have never thought that rendering a few triangles would actually be able to make my M2 Pro struggle, but here we are.)</p>
<p>Next, there is an “overlap factor.” What exactly is this? Well, we want the rays to not be discernible as the triangles that they are. For this, we need to overlap them a bit. This will make the end of one triangle overlap with the beginning of the next one. That factor as well is entirely arbitrary, and I have found a factor of <code>3</code> to look quite decent. (If you make that factor too big, things will start to look weird.)</p>
<p>With this information at hand, we can quickly write a small function that takes in this circle information and spits out a set of coordinates for that triangle:</p>
<pre><code class="language-typescript">function coordsForRay (radians: number, width: number, innerRadius: number, outerRadius: number) {
  const [rad1, rad2, rad3] = [radians - width, radians, radians + width]
  const [
    x1, y1, x2, y2, x3, y3
  ] = [
    Math.cos(rad1) * innerRadius, Math.sin(rad1) * innerRadius,
    Math.cos(rad2) * outerRadius, Math.sin(rad2) * outerRadius,
    Math.cos(rad3) * innerRadius, Math.sin(rad3) * innerRadius
  ]

  return [ x1, y1, x2, y2, x3, y3 ]
}
</code></pre>
<p>This should make intuitive sense: We have a center position and an offset for the two base-coordinates, and we simply calculate a position based on sine and cosine according to circle math. We then multiply these values (which are bound to $[-1; 1]$) by the actual radius length, and we have three coordinates, centered around <code>radians</code> with the base being spaced <code>width</code> radians away from that.</p>
<blockquote>
<p>(Nota bene: I could also have removed the dependency on the actual canvas size in this calculation of coordinates, and instead provided a scaling matrix later, but that’s an optimization that is simply not necessary for this simple animation, so I kept this code. Believe me, we’re not even 10% of the way to the final animation.)</p>
</blockquote>
<p>Here’s a visualization of that:</p>
<p><img src="https://www.hendrik-erz.de/storage/app/media/blog/webgl-series/figure_3.1_ray_setup.png" alt="The ray setup, based only on an angle on the unit circle, an inner radius, and outer radius, and a width." title="The ray setup, based only on an angle on the unit circle, an inner radius, and outer radius, and a width." /></p>
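<p>To make the figure concrete, here is a small worked example with deliberately exaggerated, made-up values: a ray pointing to the right (angle 0) whose base points sit a quarter turn to either side. The function is a condensed re-declaration of <code>coordsForRay</code> from above so that the snippet can run on its own.</p>

```typescript
// Condensed re-declaration of coordsForRay from above.
function coordsForRay (radians: number, width: number, innerRadius: number, outerRadius: number): number[] {
  const rad1 = radians - width // first base point
  const rad2 = radians // tip of the ray
  const rad3 = radians + width // second base point
  return [
    Math.cos(rad1) * innerRadius, Math.sin(rad1) * innerRadius,
    Math.cos(rad2) * outerRadius, Math.sin(rad2) * outerRadius,
    Math.cos(rad3) * innerRadius, Math.sin(rad3) * innerRadius
  ]
}

// Exaggerated example values: inner radius 1, outer radius 2, width of a quarter turn.
const coords = coordsForRay(0, Math.PI / 2, 1, 2)
// Rounded for readability: the base points land at (0, -1) and (0, 1), the tip at (2, 0).
console.log(coords.map(v => v.toFixed(2)))
```

<p>In the real animation, <code>width</code> is of course much smaller, so the triangles become thin slivers, and the radii are given in pixels rather than in unit-circle lengths.</p>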
<h2>Preparing the Shaders</h2>
<p>At this point, we have some data available that we can draw. But how do we do that? Now we need to head back into the <em>rendering</em> engine and prepare it for actually taking in the data. This is something where OpenGL just provides a “default” way to do so. Remember from above that we have defined a variable in the vertex shader:</p>
<pre><code>in vec2 a_position;
</code></pre>
<p>The <code>in</code> keyword tells OpenGL that this variable receives vertex data, <code>vec2</code> just mentions that there will be two floating point values that comprise each position (one $x$ and one $y$ coordinate). What you have to do is provide OpenGL with all coordinates that need to be processed, and it will make sure to run the vertex shader with each of them.</p>
<p>To pass the data to OpenGL, we have to perform five steps:</p>
<ol>
<li>Create a vertex buffer</li>
<li>Find the location of the corresponding vertex attribute in the vertex shader</li>
<li>Bind that buffer</li>
<li>Mark the buffer as being used to transfer vertex information</li>
<li>Tell OpenGL what the data format is that you will write into that</li>
</ol>
<p>Creating the vertex buffer is comparatively simple:</p>
<pre><code class="language-typescript">this.positionBuffer = gl.createBuffer()
</code></pre>
<p>We also need to tell OpenGL that whatever we will be putting into this buffer should be passed in as the <code>a_position</code> variable into the vertex shader. For this, we essentially have to “find” the memory location of that variable in the compiled program. We also need to do that with the <code>u_resolution</code> parameter that we use to tell the vertex shader what actual canvas resolution we will be basing the (absolute) coordinates on:</p>
<pre><code class="language-typescript">this.positionAttributeLocation = gl.getAttribLocation(this.program, 'a_position')
this.resolutionUniformLocation = gl.getUniformLocation(this.program, 'u_resolution')
</code></pre>
<p>Next, we tell OpenGL the data format that we will put into it:</p>
<pre><code class="language-typescript">gl.bindBuffer(gl.ARRAY_BUFFER, this.positionBuffer)
gl.enableVertexAttribArray(this.positionAttributeLocation)
gl.vertexAttribPointer(this.positionAttributeLocation, 2, gl.FLOAT, false, 0, 0)
</code></pre>
<p>One thing to always remember in OpenGL is that, before you do <em>anything</em> with any data structure, you usually need to “bind” it, which essentially makes it active. This runs a bit counter to how JavaScript typically works, so it takes some getting used to. What we do above is (1) tell OpenGL to make the position vertex buffer active; (2) tell it that this buffer will be used to provide vertices to the shaders; (3) tell OpenGL the data format that we will provide the coordinates in (two floating point values, that is x/y coordinates).</p>
<p>Again, these functions are all separate, because for more complex applications you might want to re-use buffers to provide various types of data in various shapes, or maybe even write the same buffer into multiple locations.</p>
<h2>Drawing Things</h2>
<p>Now we have everything in place to finally draw something! Just as a notice: Zettlr tells me that we’re now at slightly above 4,700 words (since the beginning of the first article), and <em>only now</em> can we actually produce something. <em>Everything before now was just the setup</em>!</p>
<p>To draw something, let us quickly mock up a draw function:</p>
<pre><code class="language-typescript">class IrisIndicator {
    // ... other code
  private drawFrame () {
    const data = this.rays
      .map(({ radians, width, radius }) =&gt; {
        return coordsForRay(radians, width, radius.inner, radius.current)
      })
      .flatMap(coords =&gt; coords)
    const componentsPerTri = 3
    const triangleData = new Float32Array(data)

    const nComponents = componentsPerTri * this.rays.length
    this.engine.draw(triangleData, nComponents)
  }
}
</code></pre>
<p>And in the engine:</p>
<pre><code class="language-typescript">class WebGLEngine {
    // ... other code
    draw (triangleData: Float32Array, count: number) {
      const gl = this.gl
      resizeCanvasToDisplaySize(gl.canvas as HTMLCanvasElement)

      gl.bindFramebuffer(gl.FRAMEBUFFER, null) // null = draw directly to the canvas
      gl.viewport(0, 0, gl.canvas.clientWidth, gl.canvas.clientHeight)
      gl.uniform2f(this.resolutionUniformLocation, gl.canvas.clientWidth, gl.canvas.clientHeight)
      gl.clear(gl.COLOR_BUFFER_BIT)
      gl.bufferData(gl.ARRAY_BUFFER, triangleData, gl.DYNAMIC_DRAW)
      gl.drawArrays(gl.TRIANGLES, 0, count)
    }
}
</code></pre>
<p>And, finally, because we do not yet provide any textures or something to the fragment shader, simply make it produce the same color for all pixels that are rendered:</p>
<pre><code class="language-glsl">fragColor = vec4(0.0, 1.0, 1.0, 1.0);
</code></pre>
<p>This is simply a cyan color so that we can actually see the triangles when they are rendered.</p>
<p>That’s a lot of code. What the <code>IrisIndicator</code> class does is simply take our generated rays, produce coordinates for them, turn them into a one-dimensional floating point array, and calculate how many vertices are in the array (three per triangle). That count tells OpenGL how many vertices to draw; each vertex consists of two consecutive numbers in the array, as declared earlier via <code>vertexAttribPointer</code>. We use a flat array because this just makes the memory layout simpler and the operations faster.</p>
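<p>The relationship between the flat array and the count passed to <code>drawArrays</code> is easy to get wrong, so here is a small sketch with two made-up triangles. The variable names are illustrative only; the point is merely how the numbers relate to each other.</p>

```typescript
// Two made-up triangles, each given as [x1, y1, x2, y2, x3, y3].
const perRayCoords: number[][] = [
  [0, 0, 10, 0, 0, 10],
  [5, 5, 15, 5, 5, 15]
]

// Flatten everything into one long list of numbers, as the vertex buffer expects.
const flatCoords = perRayCoords.reduce((all, coords) => all.concat(coords), [] as number[])
const exampleTriangleData = new Float32Array(flatCoords)

const floatsPerVertex = 2 // one x and one y value, matching the vec2 attribute
const verticesPerTriangle = 3
// This is the number that ends up as the count for drawArrays.
const vertexCount = verticesPerTriangle * perRayCoords.length

console.log(exampleTriangleData.length, vertexCount)
```

<p>The array holds twelve floats, but the count is six, because the count is measured in vertices, not in individual numbers.</p>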
<p>We then pass this prepared information to the engine, which handles the drawing. First, we need to ensure that the display size equals the canvas size (this is a utility function courtesy of WebGLFundamentals). Second, we need to tell OpenGL that we wish to write to the canvas. We do so by setting the current frame buffer to <code>null</code>. (In OpenGL, we can also write to other frame buffers, for example if we want to do something else to the colors, like post-processing.) Third, we have to ensure that the viewport is set accurately.</p>
<p>The viewport simply tells OpenGL how to rasterize (i.e., how large the frame buffer actually is). While we’re at it, we also set the resolution variable so that the vertex shader can turn our absolute ray coordinates into clip-space coordinates. Fourth, we have to clear the canvas. (We need to provide a color, by calling <code>gl.clearColor(r, g, b, a)</code>.) If we don’t clear the canvas, we’re only overwriting some pixels, which can lead to funny results (try it out!). I set the clear color in the setup code to a charcoal-gray-ish color:</p>
<pre><code class="language-typescript">const BACKGROUND_COLOR = [0.3, 0.3, 0.4, 1.0]
</code></pre>
<p>The last two lines finally <em>actually</em> do the drawing. <code>bufferData</code> writes the provided triangle coordinates into the <code>positionBuffer</code>, and <code>drawArrays</code> commences the actual draw. Only that final line really starts up the vertex shader and does all the calculating. One note of caution, however: You need to make sure to <code>bind</code> the position buffer properly. If you don’t, then OpenGL doesn’t know where it should write the data to. This line is missing here, because we bind the buffer above, and never unbind it. This is why I only kept a single vertex buffer: This just makes it easy, and avoids me having to remember that I need to properly bind and unbind the buffer.</p>
<blockquote>
<p>(Nota bene: The binding and unbinding especially of <em>textures</em> was my most common error source, because it’s extremely easy to get confused which texture is currently bound, and if you try to write to a texture that is also bound to be read from, OpenGL will make funny noises.)</p>
</blockquote>
<p>To actually run all of this, make sure to retrieve a canvas and the WebGL2 context, then instantiate a <code>new IrisIndicator(gl)</code>, and finally call <code>.drawFrame()</code> on it.</p>
<p>At this point, you should be able to see some cyan rays displayed on your canvas!</p>
<h2>Final Thoughts</h2>
<p>That took an awful lot of code just to draw a bunch of small triangles onto the screen. When I was at this point, I think I was already en route to my New Year’s holiday, so past Christmas. It dawned upon me that I may have grossly underestimated the amount of work necessary to get all of this done.</p>
<p>But, alas, I did start, and I did produce something, so the sunk-cost fallacy started to keep me afloat for the remainder of this odyssey. Make sure to come back next week, where I will explain how I animated all of this. I’ll also explain how to actually render in sync with your display to avoid flickering. So stay tuned!</p>
<h2>The Full WebGL Series</h2>
<p>Jump directly to an article that piques your interest.</p>
<ol>
<li><a href="https://www.hendrik-erz.de/post/a-rabbit-hole-called-webgl">A Rabbit Hole Called WebGL</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-2-setup-and-the-opengl-rendering-pipeline">Setup and the OpenGL Rendering Pipeline</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-3-drawing-things">Drawing Things</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-4-animating-things">Animating Things</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-5-computing-colors">Computing Colors</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-6-post-processing">Post-Processing</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-7-adding-a-bloom-filter">Adding a Bloom-Filter</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-8-implementing-multi-sample-antialiasing-msaa">Implementing Multi-Sample Antialiasing (MSAA)</a></li>
</ol>]]>
  </content>
</entry>
<entry>
  <title>WebGL Series, Part 2: Setup and the OpenGL Rendering Pipeline</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2026, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/webgl-series-part-2-setup-and-the-opengl-rendering-pipeline" />
  <id>https://www.hendrik-erz.de/post/webgl-series-part-2-setup-and-the-opengl-rendering-pipeline</id>
  <published>2026-01-16T11:00:00+00:00</published>
  <updated>2026-02-28T19:01:31+00:00</updated>
  <summary type="html"><![CDATA[This is the second part of my eight-part series on WebGL. This article introduces the basic architectural design of the iris indicator and provides a primer on what WebGL is.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/webgl-series-part-2-setup-and-the-opengl-rendering-pipeline">
    <![CDATA[<p>This is part two of a series on WebGL and creating an iris indicator from scratch. <a href="https://www.hendrik-erz.de/post/a-rabbit-hole-called-webgl">Read the first article here</a>.</p>
<p>In this article, I will talk about setting up a new project to use WebGL. This includes state management, and an initial understanding of the OpenGL rendering pipeline. Next week, I will finally talk about actually drawing things.</p>
<p><a href="https://nathanlesage.github.io/iris-indicator/">View the demo-page</a></p>
<h2>Primer to WebGL</h2>
<p>With the background out of the way, I sat down to create my contraption from hell. What I already knew going into this project is that I would be needing an HTML canvas element, some WebGL context, and <em>a lot of circle math</em>. I also knew that WebGL is essentially just a browser-based version of OpenGL, so I knew I’d be dealing with shaders. But there were still a lot of unknowns that I will introduce along the way.</p>
<p>Let’s first start with the basics. Displaying things on a website is today among the easiest things to do. Many children<sup id="fnref:1"><a class="footnote-ref" href="#fn:1" role="doc-noteref">1</a></sup> could code up a very simple HTML website that can be opened in a browser. Once they add some image, they are already drawing something (even though most of the work is being done by the browser). Displaying <em>rendered graphics</em> is comparable to displaying an image on a website, but with the added complexity of you having to create the image first.<sup id="fnref:2"><a class="footnote-ref" href="#fn:2" role="doc-noteref">2</a></sup></p>
<p>But how do you actually render something? Well, for that you need a graphics rendering engine. No surprise here. Today, there are roughly four big players in this game: Microsoft has DirectX, which effectively is what almost all games use to deliver next-gen photorealism yadda-yadda. Because DirectX is proprietary Windows-stuff, there is an alternative called Vulkan that can also render things, but which is also supported on Linux. Since Apple introduced its ARM-chips (“Apple Silicon”) back in 2020, they have also invested heavily in an Apple-only solution called “Metal.” And, finally, there is also OpenGL, the “Open Graphics Library.” OpenGL is even a few years older than DirectX and has long been the <em>open</em> alternative to it.</p>
<p>A little more than a decade ago, the big browser vendors decided that they also wanted to enable web developers to do some graphics rendering on their websites. Was this a smart decision? I don’t know, but the end result is what we know as WebGL: A JavaScript API modeled on OpenGL (more precisely, on its slimmed-down sibling OpenGL ES) so that we can render silly little boxes in our favorite data-stealing browser. In 2017, it received its version 2.0, which is based on the newer OpenGL ES 3.0. And ever since, when you wanted to do some fancy 3D-rendering in the browser, you’d be using WebGL.</p>
<p>Using WebGL is actually quite simple: You first have to create a canvas:</p>
<pre><code class="language-html">&lt;canvas id=&quot;webgl&quot;&gt;&lt;/canvas&gt;
</code></pre>
<p>Then, you access it in JavaScript, and retrieve the WebGL context from it:</p>
<pre><code class="language-javascript">const canvas = document.querySelector('#webgl')
const gl = canvas.getContext(&quot;webgl2&quot;) // Or &quot;webgl&quot; if you want the old API.
</code></pre>
<p>Now you can use this <code>gl</code> object to draw something! So simple!</p>
<p>… is what I would say if it were. But initializing WebGL is actually the simplest and most straightforward step of them all. And at this point you haven’t even thought about writing something on the canvas.</p>
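<p>One caveat with the two lines above: Both <code>querySelector</code> and <code>getContext</code> can return <code>null</code>, for example if the element is missing or the browser cannot provide a WebGL2 context. A more defensive variant could look roughly like the following sketch. Note that the helper <code>getWebGL2Context</code> and the <code>CanvasLike</code> type are names I made up for illustration; they are not part of WebGL or the DOM.</p>
<pre><code class="language-typescript">// Hypothetical type: the smallest surface we need from a canvas.
// A real HTMLCanvasElement satisfies this shape.
interface CanvasLike {
  getContext (contextId: 'webgl2'): unknown
}

// Hypothetical helper: turns the two possible null results into errors.
function getWebGL2Context (canvas: CanvasLike | null): WebGL2RenderingContext {
  if (canvas === null) {
    throw new Error('Could not find the canvas element!')
  }

  // getContext returns null if no WebGL2 context can be provided
  const gl = canvas.getContext('webgl2')
  if (gl === null) {
    throw new Error('WebGL2 is not supported in this environment!')
  }

  return gl as WebGL2RenderingContext
}
</code></pre>
<p>In the browser, you could call it as <code>getWebGL2Context(document.querySelector('#webgl') as HTMLCanvasElement | null)</code>. Everything after that call can then assume that a context actually exists.</p>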
<h2>Setting up WebGL</h2>
<p>Once you have the WebGL context, the fun just starts. Now you have to configure it, set everything up, and perform quite a lot of work before you can even draw anything. However … what should you even configure? Because I had no prior experience with WebGL or OpenGL at all, I heavily consulted the websites <a href="https://www.webglfundamentals.org">WebGLFundamentals.org</a>, and later on <a href="https://www.webgl2fundamentals.org">WebGL2Fundamentals.org</a>. For the bloom filter and MSAA I consulted <a href="https://www.learnopengl.com">LearnOpenGL.com</a>. Big kudos to all contributors to these pages, because without those comprehensive “From zero to hero”-guides, I would have taken much, much longer to make my dream a reality.</p>
<p>So, how should we start? Before setting up WebGL directly, we need to think about the project structure more generally. Because there is <em>a lot</em> of work to be performed, and you will need to do quite a bit of state management to achieve the desired result. So before writing any line of code, let’s talk a bit about the fundamentals of any dynamic rendering engine, and, by extension, how to organize the different parts.</p>
<h3>State Management</h3>
<p>Let’s start with understanding how to actually render something to the screen, and let’s use a video game analogy because that is easier to understand. In every video game, you have two things at the same time: a state of your game world, and the rendering pipeline. For example, in Civilization, you have the current camera position, a zoom level, and you have the various cities and units of the players. This is the game state, and this is firstly independent of the rendering state. It only tells us where things are, but not yet how to display them. In a first-person shooter, you’d likewise have a camera position (which equals the player position), and then you have the same for a bunch of additional players, you have the positions of all the objects on the map, and so on.</p>
<p>All of this is game state, because it has an influence on the game itself. How it’s rendered is a question detached from this. Whether, say, an AI player defeats some unit of yours in Civilization is not a question the renderer has to handle. The renderer should just <em>display</em> that state. The renderer of course also has some state, but that only includes settings such as the rendering resolution, texture size, etc.</p>
<p>Now, what does this have to do with rendering a simple indicator? Quite a lot, it turns out. Whenever you have <em>any</em> form of dynamic rendering, you’ll want to maintain actually two states: One for whatever it is you are rendering, and then one for the actual rendering process. This principle applies to both entire games, and simple animations like the indicator we want to implement. How do I know? Well, because at first I thought “Hah, it’s just a simple animation, how difficult can it be?,” and as you already know, I was very, very wrong.</p>
<p>So, we first create a class to manage the non-rendering state, and second a class for all the WebGL:</p>
<pre><code class="language-ts">class WebGLEngine {
    constructor (private readonly gl: WebGL2RenderingContext) {}
}

class IrisIndicator {
    private engine: WebGLEngine
    constructor (gl: WebGL2RenderingContext) {
        this.engine = new WebGLEngine(gl)
    }
}
</code></pre>
<blockquote>
<p>Note that I’ll be using TypeScript throughout the article series. I did start writing everything in JavaScript (because I don’t like having to set up a build pipeline for small projects), but it really turned out to be a big mistake as I was writing the code.</p>
</blockquote>
<p>This looks like a very simple setup. We have one class, <code>IrisIndicator</code>, that will contain all the code for managing our state, and then a class <code>WebGLEngine</code>, where I want to centralize all the nitty-gritty of the WebGL code. During rendering, the indicator will derive from its state the data necessary to produce a frame and pass it to the engine, which in turn maintains whatever objects, buffers, and what not it needs to actually do the rendering.</p>
<p>One thing to note is that there is an explicit hierarchy between these two classes: Each <code>IrisIndicator</code> contains one <code>WebGLEngine</code>. The reason is that the WebGL engine should simply draw whatever state the indicator has, but never vice versa. The rendering engine is thus dependent on whatever the iris indicator contains. By encapsulating the renderer inside the indicator, we ensure that anyone actually instantiating an indicator will, in the future, only have to tell it how many segments to render. Everything else will be abstracted away in the rendering logic that we are going to write across this series.</p>
<p>Further, note that all of what I am presenting you in these articles is the <em>end</em> result. I’m skipping over all my unlucky experiments and bad decisions.</p>
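<p>To make the split between the two kinds of state a bit more tangible, here is a minimal sketch of how an indicator could derive the data for one frame from its own state. All names in it (<code>SegmentStatus</code>, <code>IndicatorState</code>, <code>FrameData</code>, <code>deriveFrameData</code>) and the concrete colors are assumptions for illustration only; the actual code in the repository looks different.</p>
<pre><code class="language-typescript">// Hypothetical names throughout; this only illustrates the idea.
type SegmentStatus = 'idle' | 'running' | 'done'

// The "game state": what exists, independent of how it is drawn.
interface IndicatorState {
  segments: SegmentStatus[]
}

// What a renderer needs for one frame: plain numbers, no domain knowledge.
interface FrameData {
  segmentCount: number
  // One RGB triple (values in [0; 1]) per segment
  colors: number[][]
}

// Assumed color mapping, chosen arbitrarily for this sketch
function colorFor (status: SegmentStatus): number[] {
  switch (status) {
    case 'running': return [0, 1, 1] // cyan
    case 'done': return [0, 1, 0] // green
    case 'idle': return [0.5, 0.5, 0.5] // gray
  }
}

// The indicator would call this once per frame and hand the result to the engine.
function deriveFrameData (state: IndicatorState): FrameData {
  return {
    segmentCount: state.segments.length,
    colors: state.segments.map(colorFor)
  }
}
</code></pre>
<p>The point of such a function is that the engine never needs to know what a “running” task is. It only receives numbers it can push to the GPU.</p>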
<h3>Setting up WebGL</h3>
<p>With this first architectural decision out of the way, we can do the minimal setup we need for WebGL. And that minimal setup is actually quite… minimal. There is only one requirement for any WebGL rendering process: It requires a program, and such a program consists of one vertex shader and one fragment shader. That’s it. Everything else is not necessary (at least not if you don’t want to produce any output).</p>
<p>So let’s do so.</p>
<p>First, we need a <em>vertex shader</em>. An (almost) minimal version of such a vertex shader could look like this:</p>
<pre><code class="language-glsl">#version 300 es

in vec2 a_position;

uniform vec2 u_resolution;

out vec2 v_texcoord;

void main () {
  // Convert absolute pixel positions to [0; 1], then to [0; 2], then to [-1; +1]
  vec2 normalized = a_position / u_resolution;
  vec2 scaled = normalized * 2.0;
  vec2 centered = scaled - 1.0;
  vec2 clipSpacePx = centered * vec2(1, -1); // Flip y-coordinates
  gl_Position = vec4(clipSpacePx, 0, 1);
  v_texcoord = clipSpacePx * 0.5 + 0.5;
}
</code></pre>
<p>Next, a <em>fragment shader</em>. A very minimal version of that can look like this:</p>
<pre><code class="language-glsl">#version 300 es

precision highp float;

in vec2 v_texcoord;

uniform sampler2D u_texture;

out vec4 fragColor;

void main () {
  fragColor = texture(u_texture, v_texcoord);
}
</code></pre>
<p>Finally, we have to, quite literally, <em>compile</em> these two shaders onto the GPU. This can be done in a few steps:</p>
<pre><code class="language-typescript">function compileShader (gl: WebGL2RenderingContext, type: 'vertex'|'fragment', source: string): WebGLShader {
  const shader = gl.createShader(type === 'vertex' ? gl.VERTEX_SHADER : gl.FRAGMENT_SHADER)

  if (shader === null) {
    throw new Error('Could not create shader from WebGL Context!')
  }

  gl.shaderSource(shader, source)
  gl.compileShader(shader)
  const success = gl.getShaderParameter(shader, gl.COMPILE_STATUS)

  if (success) {
    return shader
  }

  const msg = `Error compiling &quot;${type}&quot; shader: ${gl.getShaderInfoLog(shader)}`
  gl.deleteShader(shader)
  throw new Error(msg)
}
</code></pre>
<p>This is a utility function that I’ve adapted from WebGLFundamentals and essentially, what this does is compile one of the two shaders and returns it. Some things to note:</p>
<ol>
<li>You can only create two types of shaders, vertex, and fragment shaders. I’m passing in a string literal for that, just because that is easier to handle. WebGL inherits from OpenGL its extensive use of flags and constants, and the types that TypeScript provides for WebGL are a bit lacking. You could absolutely pass the type directly.</li>
<li>Next, the operations with WebGL can sometimes seem a bit redundant. Why can’t we directly create a fully compiled shader by calling a single function, passing it both the type and source code? Well, because you can actually provide new source code on the fly to a shader and then re-compile it. Do we need this with such a simple application? Absolutely not. But as soon as we enter complex game territory, this ability of WebGL to re-use shader objects may come in handy. I don’t know because I’m not keen on developing an entire engine.</li>
<li>Because WebGL heavily inherits from OpenGL, it doesn’t make use of the JavaScript way™ to throw errors. Rather, you’ll have to implement modern error handling yourself by calling a function that checks some result, and throw an error yourself. Also, you will have to call <code>getShaderInfoLog</code> <em>before</em> you delete the shader, because otherwise the error will also get deleted, so you can’t throw an error in a single line.</li>
</ol>
<p>Because there is a bit of logic involved, it makes sense to create a dedicated utility function for that. Once we have the shaders at hand, we can create a program out of them. This looks almost identical to the shader compiling, with the small difference that, while you <em>compile</em> a shader, you <em>link</em> a program. That’s simply some system programming terminology and not terribly relevant for us here. Again, the function is courtesy of WebGLFundamentals and I adapted it slightly.</p>
<pre><code class="language-typescript">function compileProgram (gl: WebGL2RenderingContext, vertexShader: WebGLShader, fragmentShader: WebGLShader): WebGLProgram {
  const program = gl.createProgram()

  if (program === null) {
    throw new Error('Could not create program from WebGL Context!')
  }

  gl.attachShader(program, vertexShader)
  gl.attachShader(program, fragmentShader)
  gl.linkProgram(program)

  const success = gl.getProgramParameter(program, gl.LINK_STATUS)
  if (success) {
    return program
  }

  const msg = `Could not link program: ${gl.getProgramInfoLog(program)}`
  gl.deleteProgram(program)
  throw new Error(msg)
}
</code></pre>
<p>Now, we can save that program for reference, and tell WebGL to use this little program of ours:</p>
<pre><code class="language-typescript">const vertexShader = compileShader(this.gl, 'vertex', vertexShaderSource)
const fragmentShader = compileShader(this.gl, 'fragment', fragmentShaderSource)
this.program = compileProgram(this.gl, vertexShader, fragmentShader)
this.gl.useProgram(this.program)
</code></pre>
<p>And that’s it! At this point, we have told WebGL that, whenever we want to draw something, it should use our program, which consists of our two shaders. However, as you may see now, this is still quite a lot of code, and we <em>still</em> haven’t drawn anything onto the screen. Also, these shaders aren’t written in JavaScript (or TypeScript, for that matter), but rather in GLSL, the OpenGL Shading Language. And they do quite a bit of the work. But, what is even more detrimental to an easy understanding of what they do is that OpenGL <em>also</em> does quite a lot of work in between.</p>
<p>So next, let’s talk a bit about the rendering pipeline and what happens when you actually draw something onto the screen. This knowledge will come in handy as we continue.</p>
<h2>Understanding OpenGL’s Rendering Pipeline, Part One</h2>
<p>In order to draw something, the following steps have to happen:</p>
<ol>
<li>Your engine needs to calculate the positions of all your objects in the 3D-world. Then, it passes those positions to the rendering engine.</li>
<li>The rendering engine then passes these positions to OpenGL, and tells it to render these objects.</li>
<li>OpenGL then calls the vertex shader <em>for each element</em> in the positions that you have passed. The vertex shader is responsible for taking the position of a thing in the world, and transforming it into what is known as “clip space” (which is essentially a coordinate system that ranges from $-1$ to $+1$ in the $x$ and $y$ directions). The vertex shader must return these clip positions. What the shader above does is perform a relatively simple, z-transform-style operation: It expects absolute pixel positions, transforms them into the domain $[0; 1]$, scales them to $[0; 2]$, then subtracts $1.0$ to convert them into $[-1; +1]$, and finally multiplies the coordinate with <code>vec2(1, -1)</code>, which flips the $y$-axis. The last step is only required for WebGL, because OpenGL usually treats the $y$-axis as incrementing from bottom to top, while HTML canvas elements treat the $y$-axis as incrementing from top to bottom. If you’re not writing OpenGL code for the web, you won’t have to do that.</li>
<li>Then, OpenGL does a trick behind the scenes: It takes the vertex positions produced by the vertex shader, and takes a look at the output which you want to draw the things onto. It then calculates all the pixels that are touched by the shape that these vertices form, and runs the fragment shader <em>on each</em>. The fragment shader then has the task to calculate the color of the provided pixel, which it can do in the simplest case by looking up a position on a texture. (We will be actually computing the colors later.)</li>
<li>This color is then what gets applied to the correct pixel in whatever you’re drawing onto.</li>
</ol>
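<p>The coordinate conversion from step three can be replayed on the CPU to check one’s understanding. The following is merely a sketch that mirrors the vertex shader’s math in TypeScript; the function name <code>pixelToClipSpace</code> and the canvas size of 800 by 600 pixels are assumptions for the example.</p>
<pre><code class="language-typescript">// Hypothetical helper that mirrors the vertex shader's math on the CPU.
function pixelToClipSpace (x: number, y: number, width: number, height: number): number[] {
  // Pixel positions become values in [0; 1]
  const normalizedX = x / width
  const normalizedY = y / height
  // [0; 1] becomes [0; 2], and then [-1; +1]
  const centeredX = normalizedX * 2 - 1
  const centeredY = normalizedY * 2 - 1
  // Flip y: canvas pixels grow downwards, clip space grows upwards
  return [centeredX, centeredY * -1]
}

pixelToClipSpace(0, 0, 800, 600) // top-left corner: [-1, 1]
pixelToClipSpace(800, 600, 800, 600) // bottom-right corner: [1, -1]
pixelToClipSpace(200, 150, 800, 600) // a quarter in from the top left: [-0.5, 0.5]
</code></pre>
<p>Note how the top-left pixel ends up at $y = +1$: That is the flip of the $y$-axis at work.</p>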
<p>This is quite something to unpack.</p>
<p>First, why do we even need a vertex shader if all it does is transform some position into a coordinate system of $-1$ to $+1$? Can’t you just simply provide all the positions already in the correct coordinate space to begin with? Well, yes, you certainly can. But you shouldn’t. Why? Well, here we can return to our distinction between the general <em>engine</em> and the <em>rendering engine</em>. Remember that the general engine should remember all the positions of objects in your world. But you usually can move the camera around. And that means that <em>all object positions</em> will move accordingly, <em>but only from the perspective of the camera</em>. They don’t actually move in the “game world.”</p>
<p>Now, you could absolutely re-calculate all object positions relative to the camera in JavaScript, and only pass the final positions to the rendering engine, making the vertex shader kind of redundant. However, this is a computationally heavy task, since <em>every vertex</em> has to be moved. What you commonly do in graphics rendering therefore is to have a set of matrices. One matrix is used to <em>translate</em> (that is, move) every vertex uniformly. When you move the camera left or right, you’d only update the translation matrix, in which just two numbers change. Then you’d provide that matrix to the vertex shader and make use of a convenient property of GPUs, or graphics processing units: These can heavily parallelize tasks. Which means: If you have thousands of positions to translate, letting the vertex shader do the translation of each vertex is much more efficient. In JavaScript, you can only translate one vertex at a time, but the vertex shader can translate as many vertices as you have compute cores on your GPU. Usually, these are … quite many.</p>
<p>If you have ever seen the technical specifications of a GPU, you’ll probably have seen that these graphics cards nowadays have many, many more computing cores than normal CPUs.<sup id="fnref:3"><a class="footnote-ref" href="#fn:3" role="doc-noteref">3</a></sup> Which means: If you have a GPU with 1,000 cores, and you have a thousand vertices to translate, your GPU can essentially run the vertex shader once on each core, and it will be done in a single iteration. In addition, you won’t have to update your object positions if only your camera position changes. You only have to provide a single changed matrix to your vertex shader and let your GPU do the work.</p>
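<p>To illustrate the idea of moving every vertex by the same offset with a single matrix, here is a small CPU-side sketch in two dimensions. It assumes a $3 \times 3$ matrix written in row-major order for readability (GLSL itself stores matrices column by column, so the layout would differ when uploading one to a shader). The helper names <code>translationMatrix</code> and <code>applyMatrix</code> are made up for this example.</p>
<pre><code class="language-typescript">// Hypothetical helpers for illustration only.

// Builds a 3x3 matrix (row-major) that moves a 2D point by (tx, ty).
// When the camera moves, only tx and ty change.
function translationMatrix (tx: number, ty: number): number[] {
  return [
    1, 0, tx,
    0, 1, ty,
    0, 0, 1
  ]
}

// Applies the matrix to the point (x, y, 1). This is the per-vertex work
// that a vertex shader would perform in parallel on the GPU.
function applyMatrix (m: number[], x: number, y: number): number[] {
  return [
    m[0] * x + m[1] * y + m[2],
    m[3] * x + m[4] * y + m[5]
  ]
}

// The positions in the "world" stay untouched ...
const worldPositions = [[0, 0], [10, 0], [10, 5]]
// ... only the camera offset changes from frame to frame.
const camera = translationMatrix(-3, 2)
const viewPositions = worldPositions.map(function (p) {
  return applyMatrix(camera, p[0], p[1])
})
// viewPositions is now [[-3, 2], [7, 2], [7, 7]]
</code></pre>
<p>In JavaScript, the <code>map</code> call visits one vertex after the other. On the GPU, the same multiplication would run for many vertices at once, which is exactly why this job is usually left to the vertex shader.</p>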
<p>There is no strict boundary as to which computations you should be doing in JavaScript, and which to do on the GPU. But if you ever run into a bottleneck, you can probably do some performance benchmarking to figure out what the GPU should be doing, and what your JavaScript code should be doing. But again, we’re only rendering a few simple shapes, so I’m not going to do that.</p>
<p>The other thing to unpack is what happens in between the vertex and fragment shaders, because this is difficult to understand: You can’t really “see” it in the code. It’s important to understand that the <em>vertex shader</em> actually performs <em>vector graphics</em>. What comes out of the vertex shader is still a set of vertices connected by lines, and these lines can be described by mathematical formulas. What OpenGL then does with these vertices is it <em>rasterizes</em> them.</p>
<p>Rasterization is the process of taking some “pixel-perfect” vector graphics and turning them into … well, pixels. For that, OpenGL will check each individual pixel on whatever you’re drawing onto, and ask: “Does the shape formed by these vertices touch this pixel?” If it does, it will remember that position. Once OpenGL has checked every single pixel, it will then (and <em>only then</em>) start up the fragment shader. The fragment shader then gets a pixel and has the task to calculate a color for that pixel. In other words, the fragment shader will <em>never run</em> for a pixel that is not touched by any shape. What you see is the vertex shader and the fragment shader, and you see that there is some data being passed around, but the entire work of OpenGL in between these two shaders is hidden from view. I found that very difficult to understand.</p>
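<p>Since this hidden step is so hard to picture, here is a heavily simplified sketch of what a rasterizer conceptually does for a single triangle. It is not how a GPU actually implements it (that happens in hardware, with more rules, e.g., for pixels on shared edges); the function names <code>edge</code> and <code>rasterize</code> are my own, and I assume that a pixel counts as “touched” if its center lies inside the triangle or on its border.</p>
<pre><code class="language-typescript">// Hypothetical, heavily simplified rasterizer for a single triangle.

// Tells us on which side of the edge from a to b the point p lies
// (positive, negative, or zero if p is exactly on the edge).
function edge (ax: number, ay: number, bx: number, by: number, px: number, py: number): number {
  return (bx - ax) * (py - ay) - (by - ay) * (px - ax)
}

// tri holds three vertices as [x, y] in pixel coordinates.
// Returns the pixels for which a fragment shader would be started.
function rasterize (tri: number[][], width: number, height: number): number[][] {
  const [a, b, c] = tri
  const covered: number[][] = []
  for (let y = 0; y &lt; height; y++) {
    for (let x = 0; x &lt; width; x++) {
      // Sample at the center of the pixel
      const px = x + 0.5
      const py = y + 0.5
      const w0 = edge(a[0], a[1], b[0], b[1], px, py)
      const w1 = edge(b[0], b[1], c[0], c[1], px, py)
      const w2 = edge(c[0], c[1], a[0], a[1], px, py)
      // Inside if the point lies on the same side of all three edges
      const allPositive = Math.min(w0, w1, w2) &gt;= 0
      const allNegative = Math.max(w0, w1, w2) &lt;= 0
      if (allPositive || allNegative) {
        covered.push([x, y])
      }
    }
  }
  return covered
}

// A right triangle on a tiny 4x4 "canvas"
const pixels = rasterize([[0, 0], [4, 0], [0, 4]], 4, 4)
// pixels.length is 10: the fragment shader would run for 10 of the 16 pixels
</code></pre>
<p>The remaining six pixels never see a fragment shader at all, which is the behavior described above.</p>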
<p>Anyways, that is part 1 of understanding the rendering pipeline. (There will be a second and a third part which involve understanding read and draw frame buffers, the back buffer and front buffer, and rendering buffers, but we’ll get to that later.)</p>
<h2>Final Thoughts</h2>
<p>At this point, you should have understood the basic state management and setup of WebGL so that we can now turn to finally drawing things onto the screen. Since we’re already 3,000 words into this single article, I’ll keep up the suspense until the next article, where we will actually draw things. So stay tuned!</p>
<h2>The Full WebGL Series</h2>
<p>Jump directly to an article that piques your interest.</p>
<ol>
<li><a href="https://www.hendrik-erz.de/post/a-rabbit-hole-called-webgl">A Rabbit Hole Called WebGL</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-2-setup-and-the-opengl-rendering-pipeline">Setup and the OpenGL Rendering Pipeline</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-3-drawing-things">Drawing Things</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-4-animating-things">Animating Things</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-5-computing-colors">Computing Colors</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-6-post-processing">Post-Processing</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-7-adding-a-bloom-filter">Adding a Bloom-Filter</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-8-implementing-multi-sample-antialiasing-msaa">Implementing Multi-Sample Antialiasing (MSAA)</a></li>
</ol>
<div class="footnotes" role="doc-endnotes"><hr /><ol><li class="footnote" id="fn:1" role="doc-endnote"><p>I’m not saying “every child” because I want to avoid the heated discussions about “today’s youth” and what they all can’t do anymore. I could do this as a child, and so should have you.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:1" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:2" role="doc-endnote"><p>Insert some cue to “To make apple pie, you first have to invent the universe” here.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:2" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:3" role="doc-endnote"><p>As a reference, the GeForce RTX 5090 has, <a href="https://www.techpowerup.com/gpu-specs/geforce-rtx-5090.c4216">according to its technical specification</a>, 21,760 shader units. This means that it can run 21,760 shader calculations in parallel.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:3" role="doc-backlink">↩</a></p></li></ol></div>]]>
  </content>
</entry>
<entry>
  <title>A Rabbit Hole Called WebGL</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2026, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/a-rabbit-hole-called-webgl" />
  <id>https://www.hendrik-erz.de/post/a-rabbit-hole-called-webgl</id>
  <published>2026-01-09T16:00:00+00:00</published>
  <updated>2026-02-28T19:01:24+00:00</updated>
  <summary type="html"><![CDATA[Over Christmas, I made another little side project. But this time around, it turned out to be a rabbit hole of galactic extent. So read this article on my journey, as I fell down the rabbit hole of OpenGL and graphics rendering.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/a-rabbit-hole-called-webgl">
    <![CDATA[<p>I have this tradition. At least, it appears like a tradition, because it happens with frightening regularity. Every one to two years, as Christmas draws close, I get this urge to do something new. In 2017, I released a tiny tool that has turned into one of the go-to solutions for hundreds of thousands of people to write, <a href="https://www.zettlr.com/">Zettlr</a>. In 2019, I wrote my first <a href="https://github.com/nathanlesage/visualizrs">Rust program</a>. In 2021, I did a <a href="https://www.hendrik-erz.de/post/analyse-koalitionsvertrag-2021-spd-grune-fdp-ampel">large-scale analysis of the coalition agreement</a> of the German “Traffic light” government. During the pandemic, I built a bunch of mechanical keyboards (because <em>of course</em> I did). In 2023, I didn’t really do much, but in 2024, I wrote a <a href="https://www.hendrik-erz.de/post/localchat-chat-with-an-ai-assistant-on-your-computer">local LLM application</a>. So okay, it’s not necessarily every year, but if you search this website, you’ll find many tiny projects that I used to distract myself from especially dire stretches in my PhD education.</p>
<p>Now, is it a good use of my time to spend it on some weird technical topics instead of doing political sociology? I emphatically say yes. If you are a knowledge-worker, you need to keep your muscles moving. Even as a researcher, if you do too much of the same thing, you become less of a knowledge-worker, and more of a secretary. Call it an artistic outlet, that just so happens to make my research job <em>so much easier</em>. The last time I had to think about wrong data structures in my analytical code or when running some linear regression was … let’s say a long time ago. The more I know about software and hardware, the more I can actually focus on my research questions when I turn to the next corpus of text data.</p>
<p>But alright, you didn’t click on this article because you wanted to hear me rationalize my questionable life choices, you want to read up on the next rabbit hole I fell into: OpenGL and WebGL. In the following, I want to walk you through the core aspects of what WebGL is and what you can do with it, what I actually did with it, and what the end result was. If you’re not into technical topics (which, given the history of articles here, I actually have to start to doubt at this point), <a href="https://nathanlesage.github.io/iris-indicator/">click here to see the full glory of my recent escapade</a>.</p>
<blockquote>
<p>Note: In the following, I will skip over a lot of basics, and merely explain some interesting bits of the source code (<a href="https://github.com/nathanlesage/iris-indicator">which you can find here</a>), central decisions I took, and things I learned. I don’t verbatim copy the entire code that you can find <a href="https://github.com/nathanlesage/iris-indicator">in the repository</a>. The entire thing is still insanely long and will span multiple articles, even though I try to leave out a lot which you can learn via, e.g., <a href="https://webgl2fundamentals.org/">WebGLFundamentals</a>, which I recommend you read to learn more.</p>
</blockquote>
<h2>Background</h2>
<p>First, some context. At the end of 2024, <a href="https://github.com/Zettlr/Zettlr/issues/5414">someone complained</a> that project exports in my app, Zettlr, were lacking any visual indication of their progress. As a quick primer: Zettlr uses Pandoc to convert Markdown to whichever format you choose. However, especially for long projects, exporting may take quite some time, during which the app looks as if it’s doing nothing. You can still work with the app, and do things, but it’s hard to know when Zettlr is actually done performing the project export. The biggest issue was less finding a way to just <em>tell</em> users which background tasks are currently running, and more how to adequately visualize this to them. For quite a bit of time, my brain kept churning idea after idea in the search for a cool way to visualize “something is happening in the background.” You can read up on many discussions that I’ve had with Artem in the <a href="https://github.com/Zettlr/Zettlr/issues/5414">corresponding issue</a> on the issue tracker.</p>
<p>Indeed, the task was quite massive, because the requirements were so odd:</p>
<ul>
<li>The indication should convey a sense of “something is happening” without actually knowing the precise progress of the task being performed.</li>
<li>It should quickly and easily convey how many tasks are currently running in the background, and what their status is.</li>
<li>It should be so compact that it fits into a toolbar icon.</li>
<li>It should absolutely avoid giving people the impression that something might be stuck.</li>
</ul>
<p>At some point, I had my <em>eureka</em> moment: Why not produce an iris-like visualization? Intuitively, it ticked all the boxes: One can animate the picture to convey a sense of movement without looking like a run-of-the-mill loading spinner that we have collectively come to dread; by coloring its segments, one can include several “things” with different status; and by toggling between an “on”- and “off”-state, one could indicate whether something is running, or not.</p>
<p>I currently suspect that my brain simply mangled together the circular appearance of a loading spinner and the <a href="https://www.3blue1brown.com/">logo of 3Blue1Brown</a> into a contraption that would prove to be insanely difficult to create.</p>
<p>Because I wanted to convey a lot of subtle movement, I opted to choose WebGL to implement it, using all the fanciness of graphics processing. My thinking was as follows: I could combine something I’d have to do at some point anyway with something new to learn. I thought: “How hard can it be to learn some shader programming on the side?”</p>
<p>… well, if you’ve read until here, you know that I was <em>rarely</em> so wrong with my estimate of how long it would take as this time. What started as a “let me hack something together in two Christmas afternoons” ended up being an almost two-week intensive endeavor that has had my partner get <em>real</em> mad at me for spending so much time in front of my computer.</p>
<p>But now, it is done, and I have succeeded in achieving exactly what I had imagined weeks ago. To salvage what I can, I am writing these lines to let you partake in my experience, and maybe you will even find understanding the guts of GPU-accelerated rendering on the web intriguing!</p>
<h2>The Result</h2>
<p>The result of all these efforts is a beautiful iris indicator that subtly moves and changes colors based on the various tasks that are running. I took half an hour to code up <a href="https://nathanlesage.github.io/iris-indicator/">a demonstration page so that you can play around with the result</a>.</p>
<p><img src="https://www.hendrik-erz.de/storage/app/media/blog/webgl-series/iris_indicator_screenshot.png" alt="A screenshot of the Iris Indicator demo page" title="A screenshot of the Iris Indicator demo page" /></p>
<p>On the page, there are four sections: Some settings, configuration for the segments, a frame counter, and the actual animation below that.</p>
<p>Let me guide you through the settings first:</p>
<ul>
<li><strong>Seconds per rotation</strong>: This setting sets how long it takes for the indicator to rotate once around. By default it is set to 120 seconds, so two minutes, but you can turn it down to increase its speed. The minimum setting is 10 seconds which is quite fast.</li>
<li><strong>Ray movement speed</strong>: This setting determines how fast the individual rays will increase and shrink in size. It is pre-set to five seconds for one full movement, but you can turn it down to increase their speed. The minimum is 100ms, which is stupidly fast.</li>
<li><strong>Enable MSAA</strong>: This enables or disables multi-sample antialiasing. If disabled, the animation can look very jagged and pixelated.</li>
<li><strong>Enable Bloom Effect</strong>: This setting enables or disables the bloom effect which makes the entire indicator “glow.” This can actually reduce the performance of the animation quite a bit, but it also has a great visual impact.</li>
<li><strong>Bloom intensity</strong>: This effectively allows you to determine how much blurring will be applied to the image. It is preset to 2×, which is a good default. You can reduce it to 1× which will make the effect more subtle. A setting of 8× may be a bit much, but I decided to leave it in since I feel it is instructive.</li>
<li><strong>Rendering resolution</strong>: This setting determines how detailed the resolution is. It is preset with whatever device pixel ratio your display has. If you’re opening the website on a modern phone or on a MacBook, it will probably be preset to 2×, but on other displays, it will be 1×. It has a moderate performance impact.</li>
<li><strong>Segment adjustment step duration</strong>: This setting determines how fast the segment colors adjust when you change the segment counts in the next section.</li>
</ul>
<p>The next section allows you to determine the segments that will be displayed. As a reminder: The whole intention of this project was to visualize the status of running tasks, which might be successful, unsuccessful, or still en route. You have four segments available, and can determine how many tasks are in each segment, alongside their color. The colors are hard-coded because this way I can ensure that they all fit and blend together well.</p>
<p>By default, the demonstration page will auto-simulate changes to the segments so that you don’t have to click around. When the simulation is active it will, each second, determine what to do. There is a 30% chance each that one of the first three segments will be incremented by one. Further, there is a 10% chance that the simulation will reset everything to zero and start again.</p>
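<p>To make these odds concrete, here is a minimal TypeScript sketch of one such simulation tick. It is not taken from the demo’s source code: the names <code>SegmentCounts</code> and <code>simulationStep</code> are hypothetical, and the sketch assumes that the four outcomes are mutually exclusive, so that a single random number per tick decides which one happens (30% + 30% + 30% + 10% = 100%).</p>

```typescript
// Illustrative sketch only; the names below are hypothetical and not from the demo source.
type SegmentCounts = [number, number, number, number]

// Performs one simulation tick and returns the new task counts.
// `roll` is a uniform random number in [0, 1). It is a parameter so that
// the function can be exercised deterministically.
function simulationStep (counts: SegmentCounts, roll: number = Math.random()): SegmentCounts {
  const [first, second, third, fourth] = counts

  // Assumption: the top 10% of the range resets every segment to zero.
  if (roll >= 0.9) {
    return [0, 0, 0, 0]
  }

  // The remaining 90% is split into three equal 30% bands, one per segment.
  // The fourth segment is never incremented by the simulation.
  if (roll >= 0.6) {
    return [first, second, third + 1, fourth]
  }
  if (roll >= 0.3) {
    return [first, second + 1, third, fourth]
  }
  return [first + 1, second, third, fourth]
}
```

<p>In a browser, one could run this once per second with something like <code>setInterval(() => { counts = simulationStep(counts) }, 1000)</code>, where <code>counts</code> is whatever state the page keeps for the four segments.</p>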
<p>The last section includes settings for the frame rate. The frame rate simply means how often the entire animation will be re-drawn (hence, frames-per-second). At the top, it displays the current frame rate. The frame rate is bound to your display, so on a MacBook (which has a refresh rate of 120 Hz), the frame rate will be at most 120 frames per second. On my secondary display, the frame rate is 75 Hz.</p>
<p>By default, I have implemented a frame limit of at most 30 frames per second. This ensures that the animation is still smooth without being too demanding on your computer or phone. However, you can change the frame rate to, e.g., 60 fps. This will render the animation twice as frequently. Especially if you turn the rotation speed to the max, you actually want to increase the frame limit, because at 30 frames per second, it can indeed look very stuttery.</p>
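<p>A frame limit like this is commonly implemented by skipping redraws that arrive too early. The following TypeScript sketch shows the idea under that assumption. <code>createFrameLimiter</code> is a hypothetical helper, not a function from the demo’s source code, and the timestamps are assumed to be in milliseconds, as they are passed to a <code>requestAnimationFrame</code> callback.</p>

```typescript
// Illustrative sketch only; `createFrameLimiter` is a hypothetical helper.
// It returns a function that decides, per display refresh, whether enough
// time has passed since the last rendered frame to draw a new one.
function createFrameLimiter (maxFps: number): (timestampMs: number) => boolean {
  const minIntervalMs = 1000 / maxFps // e.g., roughly 33.3 ms for 30 fps
  let lastRenderMs = -Infinity // Guarantees that the very first frame is drawn

  return (timestampMs: number): boolean => {
    if (timestampMs - lastRenderMs >= minIntervalMs) {
      lastRenderMs = timestampMs
      return true // Enough time has passed: render this frame
    }
    return false // Too early: skip this refresh
  }
}
```

<p>Inside a <code>requestAnimationFrame</code> callback, one would call the returned function with the callback’s timestamp and only redraw if it returns <code>true</code>. Because the display refreshes at fixed intervals, the effective frame rate of such a simple limiter snaps to what the display can offer and may end up slightly below the configured limit.</p>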
<p>Feel free to play around with the settings to see how they change the animation. Again, you can also go through <a href="https://github.com/nathanlesage/iris-indicator">the source code of the animation</a> to learn how it works.</p>
<h2>About This Article Series</h2>
<p>Over the next three months, I will publish one part per week on how I finally managed to achieve this feat. The logic behind it is very complex, and it takes a lot of research to understand how to achieve the various effects. The articles will be as follows:</p>
<h3>Setup</h3>
<p>In the next article, I will introduce you to WebGL, OpenGL, and how to set everything up to actually start doing things with WebGL. I will talk about the basic architectural decisions I took, and how code can be properly organized. I will also introduce you to OpenGL’s rendering pipeline, and how it works.</p>
<h3>Drawing Things</h3>
<p>In article three, I will guide you to drawing the rays that make up the iris. You will learn about how to provide data to OpenGL, and how the drawing actually works.</p>
<h3>Animation</h3>
<p>In the fourth installment, I will talk through how to add two of the three animations that make up the iris: rotation, and the movement of the rays. This article almost exclusively focuses on JavaScript, and contains minimal changes to the shaders, because movement is mostly a thing of JavaScript.</p>
<h3>Computing Colors</h3>
<p>In article five, I will introduce you to the algorithm I designed to color the segments of the iris according to the number of running tasks, i.e., the main goal of the entire endeavor. I will also explain the third and final animation that the indicator includes: animating the colors of the iris.</p>
<h3>Enabling Post-Processing</h3>
<p>This article will be more in-depth and explain another big part of OpenGL’s rendering pipeline. It explains how to enable a renderer to perform post-processing. It also adds one post-processing step: tone-mapping.</p>
<h3>Adding a Bloom-Filter</h3>
<p>Article seven focuses on the centerpiece of the animation, the one big part that would not have been possible using other techniques such as SVG. I explain how to add a bloom post-processing step in between the ray rendering and the output, and how bloom actually works. (It’s surprisingly simple!)</p>
<h3>Adding Multi-Sample Antialiasing</h3>
<p>In the eighth and final practical article in this series, I explain MSAA in a bit more detail, why it sometimes works and sometimes doesn’t, and how to actually add it to the animation. I also explain the final piece of the OpenGL rendering pipeline that you probably need to know to understand what is happening.</p>
<h2>Concluding Thoughts</h2>
<p>When I set out to create this animation, I imagined it would take me maybe two days — nothing to write home about (literally). However, I was wrong, and, to the contrary, we are now looking towards an astonishing nine (!) articles just to explain what has happened here.</p>
<p>I found the journey extremely rewarding, even though it ate up my winter holidays. I want to let you partake in what I learned, and I hope you stick around for the ride.</p>
<p>So, please, come back next Friday for part two: Setting everything up!</p>
<h2>The Full WebGL Series</h2>
<p>Jump directly to an article that piques your interest.</p>
<ol>
<li><a href="https://www.hendrik-erz.de/post/a-rabbit-hole-called-webgl">A Rabbit Hole Called WebGL</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-2-setup-and-the-opengl-rendering-pipeline">Setup and the OpenGL Rendering Pipeline</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-3-drawing-things">Drawing Things</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-4-animating-things">Animating Things</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-5-computing-colors">Computing Colors</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-6-post-processing">Post-Processing</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-7-adding-a-bloom-filter">Adding a Bloom-Filter</a></li>
<li><a href="https://www.hendrik-erz.de/post/webgl-series-part-8-implementing-multi-sample-antialiasing-msaa">Implementing Multi-Sample Antialiasing (MSAA)</a></li>
</ol>]]>
  </content>
</entry>
<entry>
  <title>Vibe Coding: The Final Form of Hyper-Individualism</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2025, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/vibe-coding-the-final-form-of-hyper-individualism" />
  <id>https://www.hendrik-erz.de/post/vibe-coding-the-final-form-of-hyper-individualism</id>
  <published>2025-11-29T13:00:00+00:00</published>
  <updated>2026-02-04T10:14:17+00:00</updated>
  <summary type="html"><![CDATA[A few days ago, I had to deal with the first &quot;vibe coded&quot; PR to my software. In this article, I reflect on this encounter, and analyze the social habitus of the &quot;vibe coder.&quot; I conceptualize &quot;vibe coding&quot;—inexperienced users generating complex code via AI tools—as the final manifestation of hyper-individualism. Drawing on sociological frameworks, I argue that this practice disrupts open-source norms by producing unreviewable, high-impact PRs that ignore community standards and technical context. While motivated, their output reflects a &quot;tragedy of the lone producer&quot; who sacrifices meaningful engagement for isolated productivity. This trend can threaten software integrity and community health.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/vibe-coding-the-final-form-of-hyper-individualism">
    <![CDATA[<p>It finally happened to me. About two weeks ago, someone opened <a href="https://github.com/Zettlr/Zettlr/pull/6004">the first vibe-coded Pull Request</a> to the Zettlr repository. What has cast its shadows long ago in <a href="https://www.reddit.com/r/ProgrammerHumor/comments/1oyhpez/vibecoding/">memes</a> on <a href="https://www.reddit.com/r/ProgrammerHumor/comments/1oqyki5/vibecodingreplacesdevelopers/">the</a> more <a href="https://www.reddit.com/r/ProgrammerHumor/comments/1osqk2y/whyidonotvibecode/">programming-affiliated</a> parts of <a href="https://www.reddit.com/r/ProgrammerHumor/comments/1ot13pa/thissubusedtobefunny/">Reddit</a> has reached my own work. I’m not entirely sure if I should see it as a badge of honor that my app is well-regarded enough to attract this kind of person, or see it for the frustration it causes. What I do know is that I was not prepared for this first encounter with the new reality of programming.</p>
<p>But this article is not about vibe coding. Being a sociologist, I don’t want to reiterate the <em>Unkenrufe</em><sup id="fnref:1"><a class="footnote-ref" href="#fn:1" role="doc-noteref">1</a></sup> of a vibe-coded future and why it is bad. Instead, I want to tell a story about the vibe coder as a social phenomenon. One that focuses on the <em>social dynamics</em> of what <em>makes</em> a vibe coder, and what happens when vibe coders interact with the outside world.</p>
<p>If you want to learn more about how vibe coding is inefficient, dangerous, and not going to replace programmers anytime soon, a quick Google search has got you covered. Here, I would rather focus on the <em>individual who performs</em> vibe coding: a cultural phenomenon that has a profound social component, and one that is indeed quite in line with how sociology has conceived of society in the past few decades.</p>
<p>I believe that vibe coding is (one of) the final form(s) of the hyper-individualism that social scientists have been concerned with ever since the rise of neoliberalism in the 1980s. After having experienced it first-hand, I believe that vibe coding is a simple, natural extension of existing trends in which the individual is being made productive through a process entirely disconnected from society. A vibe coder, essentially, is an extremely productive individual whose productivity has no tangible impact on the world, and whose action can feel meaningful only in a reality in which there is no society, only individuals. Vibe coding is what David Graeber once called a “bullshit job.”</p>
<h2>What is Vibe Coding?</h2>
<p>In case this trend has completely eluded you, let me begin this article with a quick summary of what vibe coding is. Vibe coding is a phenomenon in which a person with little or no experience with programming, but with a more or less hazy vision of some tool, sets out to make that tool a reality. Since they have little experience with coding, they rely on chatbots (generative AI) to write the necessary code for that.</p>
<p>Vibe coding, therefore, emerged only after capable GPT models had been released by OpenAI and other companies. The first GPT models were generalists, since nobody had really found an appropriate use case for them yet. They were impressive in their own right, simply because they generated text really well in response to user queries.</p>
<p>It took about three months after the launch of ChatGPT until the first good use cases slowly emerged from the mountains of playful interactions with the model. <a href="https://www.hendrik-erz.de/post/how-to-use-chatgpt-productively">I wrote about them here</a>. Since then, companies with large GPT models running on expensive servers have tried to monetize these models to the best of their abilities. People realized that some models were <em>really good</em> at producing syntactically correct code. From there, it was only natural to start producing models that were tuned to help explicitly with writing code. Microsoft’s Copilot, which started out as OpenAI’s Codex model (ChatGPT’s little brother), is probably the most famous one.</p>
<p>With a few additional improvements, such as giving these models information on the codebase more generally, or allowing them to directly modify files, it became easier to write some instructions using natural language, and wait for the model to generate corresponding code. The essential structure of GPT models has not changed throughout the past three years. But what <em>has</em> changed is how these models are integrated, and the tooling that surrounds them. Microsoft’s Copilot model is now deeply integrated in VS Code. Anthropic’s Claude is being lauded as a highly capable model to generate code. And, to some degree, utilizing GPT models to produce boilerplate code, or to reason about code, is really helpful. I use them myself quite often.</p>
<p>But vibe coding is not about programmers using code-focused GPT models to help in their own work. Vibe coding is about individuals with big ambitions and little experience being empowered to produce software. In a positive reading, the arrival of GPT models on the world stage has democratized access to programming. But in a more realistic reading, in my humble opinion, GPT models encourage people with little experience to contribute to a software development world that is guarded by rules, norms, and traditions, disrupting well-functioning workflows and wreaking havoc.</p>
<p>In essence, vibe coding is a practice of inexperienced people writing code so complex that it appears as if it is worthy of inclusion in some of the world’s important pieces of software. And this is what I would like to focus on today.</p>
<h2>The Encounter</h2>
<p>Let me begin with a subjectively faithful reproduction of my encounter with the above-mentioned pull request. One thing that some users of Zettlr have asked for in the past is better support for right-to-left (RTL) languages, such as Hebrew or Arabic. Since I am a boring European only capable of writing Latin letters, I am hesitant to address this issue, because I don’t want to make it worse than it already is. I acknowledge this as a limitation. But this also means that I am very excited to see someone with more experience with RTL languages jump in and help out.</p>
<p>And so, when I read the title of this PR, I was excited: Finally, someone has started taking this issue upon themselves! I clicked on the link, opened the PR summary page and … immediately saw the hallmark indicator of vibe coding: +16,555 and −1,510 lines of code. To give you some context (please notice what I am doing here, it’s going to become important later), usually PRs aim to fix or add a single thing, and they usually work with maybe 500 changed lines <em>at most</em>. Usually, it’s more in the range of a dozen lines of code. Seeing more than a thousand, let alone more than <em>ten thousand</em> changed lines is very unusual and typically means something is very, very wrong.</p>
<p>A PR is not like a paper review. Each PR needs to be reviewed, but because code is very sensitive to typos and logical errors, a review of a PR involves literally reading every single <em>character</em>. It’s not like when you receive comments, and fully rewrite a paper before re-submitting it.</p>
<p>But I felt like dismissing this PR outright would be unfair. After all, there is the reasonable assumption that vibe coding is not just “idiots who write dangerous code,” but it can also be someone who is really motivated and is trying to use whatever help they can get in order to help. So my mindset was extremely committed to trying my best to help this guy shape the PR until it is actually reviewable and, hopefully, able to be merged into the codebase.</p>
<p>I started reading the (obviously AI generated) PR description to get a sense of what I was dealing with. It starts off very promising:</p>
<blockquote>
<p>This PR introduces comprehensive Arabic language support for Zettlr, demonstrating what it takes to make Zettlr truly accessible to Arabic-speaking users.</p>
</blockquote>
<p>What’s not to love! Exactly what I needed! I continued reading.</p>
<blockquote>
<p>This fork addresses this gap by providing a complete implementation that can serve as a foundation for upstream RTL support.</p>
</blockquote>
<p>This wording sounded odd. Why would he speak about a fork, talk about a “complete implementation,” and then have it “serve as a foundation for” something? But, undeterred, I continued.</p>
<blockquote>
<p><strong>Note</strong>: This is a <strong>proof-of-concept fork</strong> meant to:</p>
<ul>
<li>Demonstrate the feasibility of comprehensive RTL support in Zettlr</li>
<li>Provide a reference implementation for upstream integration</li>
<li>Showcase innovative solutions like the dual-cursor system for connected scripts</li>
<li>Serve Arabic-speaking users immediately while upstream considers RTL support</li>
</ul>
</blockquote>
<p><em>Wait, <strong>what</strong>?</em> I’m sorry, but first, nobody has to demonstrate the feasibility of that; we’ve done this over <em>years</em> of debating this issue. Second, “reference implementations” are nothing for a PR; those only exist to help adoption of some protocol by various programming languages. Third, nobody wants a “showcasing [of] innovative solutions like the dual-cursor system.” Your PR either fixes something, or it doesn’t. And lastly, “while upstream considers RTL support” — we don’t consider it, we want it.</p>
<p>All of this was more than bizarre. But of course, I wasn’t even halfway done with the description. It goes on to show some screenshots, explain some more things, confabulate obviously wrong things (“Maintains full compatibility with upstream Zettlr”? GitHub already reported about a thousand merge conflicts. “No breaking changes to existing functionality”? My, my, you have <em>literally destroyed the entire existing tooling on the repository</em>.), and so on.</p>
<p><em>But I was still undeterred.</em> So I started to look at the files. Besides a whole lot of AI-generated context documentation that appears to be primarily intended to keep the GPT-model aligned with the goals while it generated more code, the PR appears to have rewritten entirely unrelated features and demonstrated a complete lack of awareness for how the repository works.</p>
<p>Meanwhile, the person continued to add more commits, change more files, and dump more stuff into this already unreviewable mess of a PR. So I asked him to stop committing and give me a second while I was trying to come up with the words I was lacking to describe what I just saw. But then I found something just plain insulting: Apparently he decided to design a completely new icon for no reason. That hurt.</p>
<p>But anyways: This is likely an inexperienced user, and if he really is committed, I surely can talk reason into him.</p>
<p>And so I started typing. I began with an elaborate first response on why this PR was not mergeable, and tried to explain to him the steps necessary to bring it into a somewhat reviewable state. After a few hours, I saw that he had added more commits, meaning that he had clearly seen my message, but had not yet responded. So I told him to stop producing more code, and instead respond to my message. Which he did. Kind of. Most of his following response was a rehashing of the previous explanations in the original description, bundled with a request to me to read through his changes. To me, it implied that he wanted me to “pick and choose” the things he did that I liked. But this is not how PRs, or any of this, really, work.</p>
<p>So I try again, this time with actionable items, in the hopes that, instead of piling up more code, he’d start deleting some clearly unrelated stuff. He responds, requesting that I book some meeting with him to discuss this. <em>Excuse me</em>? You want something from me, and I am not going to “book a meeting time” with you just to tell you once more that I will not be spending countless hours scouring the depths of whatever Claude has generated there. But he doubles down. He really insists I read through it, and is unwilling to change anything on his part. From his perspective, he has done his work, and now it’s on me to respond.</p>
<p>I realized that nothing would likely change his perspective, and so I closed the PR, and locked the discussion.</p>
<p>A few days later, after having discussed this encounter with friends and colleagues, and doubting whether I made the correct decision, a friend sent me a link to <a href="https://github.com/ocaml/ocaml/pull/14369">this PR on the OCaml compiler</a>. The PR, and the ensuing discussion, looked eerily familiar. It was also a vibe coded PR changing humongous amounts of code, oftentimes unnecessarily, and its creator showed a complete unwillingness to engage in a meaningful discussion. The maintainers of OCaml tried their best to steer the PR and its creator in a more productive direction, to no avail.</p>
<h2>Software Development as a Cultural Practice</h2>
<p>After having stared at that disaster of an interaction for a few days like the <em>angelus novus</em> stared at history, I tried to categorize and analyze what had just happened. There are two parts to an answer. The first is that software development is really a cultural practice, like many others, such as science, lawmaking, or creating a product. There are certain ways things are done. You don’t get to be a researcher just because you feel like it, you become one through years of training. You also don’t get to just start making laws unless you pass through a few stages of becoming an elected parliamentarian. And, before any new product is launched, there is a checklist you have to go through, lest the launch become a disaster. With software development, it’s the same thing, and Open Source is no exception.</p>
<p>Software development is surrounded by unwritten rules and cultural norms. When you start contributing to a repository, you first tread carefully. After all, you want to literally change something someone else has done. And thus, you would want to treat this endeavor with respect. First contributions to any Open Source project thus often comprise opening issues. Sometimes, if you feel confident, you can directly open a PR and fix something. But you usually don’t add some new feature at first. By doing PRs, you start interacting with the people behind the project, the maintainers whose job is to — well — maintain the program.</p>
<p>Every software project is different. Some people want you to discuss everything first on their issue tracker, or even on their Discord. Other projects have extremely rigid policies surrounding even the creation of issues, and others again are much more lenient. But while exploring the repository typically gives you some hints, there remains some insecurity, and so nothing can replace the first encounter with the maintainers.</p>
<p>Another, more explicit rule is the purpose of pull requests. Pull requests are intended to add functionality to some repository. The idea is that you make a copy of the software, set everything up on your own computer, change the thing you want to change, and ensure it works as intended. And, once it does, you propose your changes by creating a PR. That is the sole purpose of a PR: Moving some code you wrote to fix something into the general code base (“upstream”) from which you copied the code.</p>
<p>Third, one norm that is usually heavily enforced is that the maintainers are doing all of their work in their free time, and as such are not required to engage with anyone, including you. The common understanding is that maintaining an entire software project is a huge amount of work, and if you want to contribute to a project, you’ll have to play by the maintainer’s rules and shall make no demands. While this might sound as if it diametrically opposes the spirit of <em>Open</em> Source, this is not the case. If anyone had unrestricted access to software code, all kinds of dangerous things could happen (looking at you, <em>Sha1-Hulud</em>); starting from the software breaking, and ending with serious security flaws. By enforcing a strict ruleset over what gets permitted into the code base, and what not, maintainers ensure that whatever they maintain remains safe for users to use. Once maintainers stop understanding every single inch of the code base, this is no longer the case. And this is why they are essentially forced to enforce very strict rules about how the software is built.</p>
<p>Fourth, this leads to a rule that many people often forget, but that is quite central to the social function of software maintainers: they carry the sole responsibility. If the software breaks, they are the ones to take the responsibility of fixing it. Nobody will demand that from some random, unknown contributor who may have introduced the bug in the first place. The maintainers are the public-facing people behind the code; the ones people usually know. In order to guarantee that, and in order to ensure they can indeed carry that responsibility, they have to not just understand the entire code base inside and out, but in addition understand every single character that is being changed by contributions in PRs.</p>
<p>Vibe coded PRs break all of these rules and norms at once.</p>
<h2>Vibe Coders Disrupt Open Source Development</h2>
<p>By definition, vibe coders are people who do not just use GPT models as tools to help generate some code, but who use GPT models to write the code for them. They rarely understand much of what these models do, and primarily judge the code quality by its output, that is: does the software break when they themselves test it out? No one in their right mind would change over ten thousand lines of code before realizing that they should probably just change one thing at a time, lest they break something elsewhere.</p>
<p>This sidesteps most of the rules that have developed around Open Source Development. And this is a problem. Because while institutionalism tells us that many rules are likely very inefficient, they serve a crucial role. Rules and regulations safeguard against dangers and issues. The rule that a PR should change only one thing at a time exists because this is the most efficient way for an outside contributor and a maintainer to communicate. This way they communicate a single fix (something the contributor wants), and confirm that it is also safe to merge (something the maintainer wants). When you change ten thousand lines of code, this cannot be guaranteed anymore.</p>
<p>Also, vibe coded PRs are full of obvious mistakes once you take into account the broader context in which the software exists. For example, every single vibe coded PR I’ve seen so far rewrites or changes part of the tooling around a software; often to fix errors which the tooling produces because something is deeply wrong with the code. Do not misunderstand me: PRs that change tooling around a software are perfectly fine. But changes to the tooling of a software inside a PR that aims at fixing something else are unacceptable. And I have seen many times that vibe coded contributions have changed some feature <em>and</em> some of the safeguards, because the safeguards were working correctly and actually flagged something wrong with the code. And, GPT models being GPT models, instead of fixing <em>their own</em> code, they “fix” the errors by removing the safeguards that were flagging the dangerous code in the first place.</p>
<p>I believe this is sufficient to understand how dangerous it is to ignore the rules of the game of contributing to Open Source software. A friend gave me the correct term for this: vibe coding is the <em>performative</em> act of working without actually producing anything.</p>
<p>But there’s more to it.</p>
<h2>The Tragedy of the Vibe Coder</h2>
<p>Vibe coding is not intentionally malicious. No vibe coder wants to hurt anybody or cause problems with the software they contribute to. They genuinely want to improve the software. And they do it the only way they can: by letting a GPT model generate the code, instead of understanding the software first, and trying to slowly move towards fixing the bug. And I don’t believe that it’s their fault.</p>
<p>I believe that a vibe coder is a phenotypical example of an individual that is fully enmeshed in the current state of individualized digital capitalism. A vibe coder looks to me like an individual that is just trying to produce something; have an impact in the world. This is something that gets hammered into our brains every day. But, at the same time, we also are stressed, chased by the fear of becoming unemployed, of being a <em>burden</em> to society. So we can’t <em>think</em>, do <em>unproductive</em> work, such as engaging with a community of people.</p>
<p>To an extent, the vibe coder is the dream worker that corporations have wanted for so many years. But because we couldn’t automate everything yet, we had to push employees through school, college, trainings, and workshops. We have to spend a non-trivial amount of our lives just learning. GPT models and the current wave of AI models have finally opened up a possibility for employees to skip all of that. Instead of having to learn <em>how</em> to do things, they just need to imagine a product, and describe it to some generative AI model which then proceeds to generate it.</p>
<p>And the worst is: Initially, it indeed works. AI models can produce working code. And that is what prompts so many companies – but also students whom I teach – to uncritically adopt AI to a far greater extent than what is safe.</p>
<p>A vibe coder is an individual who has eliminated all the seemingly “unproductive” work around work: the training, the understanding, the listening. Vibe coding is what the protestant work ethic looks like when pushed to its logical end. An individual who takes nothing as input, and produces, all day long. Code, apps, entire services.</p>
<p>And this is what makes a vibe coder a very lonely, isolated individual. Because it turns out, a lot of the seemingly unproductive work that we all do every day – learning things, reading, contemplating, and care work – is what we require to live fulfilling lives. And this is also what can enable us to actually make an impact. But it doesn’t generate money. And so it has been branded as “unproductive” and has a negative connotation to it.</p>
<p>The person who opened the PR on my repository had never talked to either me or anyone in the community before. He likely read through some discussions, identified the need for RTL-support, and then just prompted away. He didn’t comment on issues, ask people, or engage with the community. And now, his first interaction, by being so overbearing, has left a mark on both him and me. And all his relentless productivity was for nothing. (And Zettlr still has no proper RTL-support.)</p>
<h2>Conclusion</h2>
<p>What should we do with this new and coming trend? I really don’t see this ending before the AI bubble bursts. It’s just too fitting. Vibe coding is the perfect work in an age of cutthroat capitalism that needs to produce at breakneck speed. And I can’t blame the people who do it. But, personally, it makes me sad.</p>
<p>What vibe coders certainly have is motivation and eagerness to do something great. But because they were pushed through the same capitalist pipeline as all of us, and are probably even more afraid of plummeting into meaninglessness, they utilize this energy for meaningless work. I know many people probably have to look this up, but an apt metaphor that was constantly on my mind while writing this article is that vibe coding seems to me like writing mountains upon mountains of code, but echoing it all to <code>/dev/null</code>.</p>
<p>I surely hope that the AI bubble will burst sooner rather than later. Not just for all the other benefits that will come from it, but also because it will free vibe coders from the shackles of having to produce for production’s sake. They may then find that – even in capitalist conditions – there is no way around meaningfully engaging with communities before producing. What they will produce then will have a far, far greater impact than any amount of vibe-coded PRs ever will.</p>
<div class="footnotes" role="doc-endnotes"><hr /><ol><li class="footnote" id="fn:1" role="doc-endnote"><p>Totally unrelated, but I couldn’t think of an appropriate translation. I later looked this up, because from my childhood I only remember that “Unkenrufe” means “Someone says something will become disastrous.” Turns out, “Unke” is the German name for a toxic frog from Europe and Asia, the fire-bellied toad. Imagine that great doom is foreshadowed by ominous croaking.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:1" role="doc-backlink">↩</a></p></li></ol></div>]]>
  </content>
</entry>
<entry>
  <title>Shutdown on Capitol Hill: An Afterword to my PhD Thesis</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2025, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/shutdown-on-capitol-hill-an-afterword-to-my-phd-thesis" />
  <id>https://www.hendrik-erz.de/post/shutdown-on-capitol-hill-an-afterword-to-my-phd-thesis</id>
  <published>2025-11-01T21:00:00+00:00</published>
  <updated>2025-11-01T21:47:12+00:00</updated>
  <summary type="html"><![CDATA[After five years, we know a little bit more about the lawmaking processes in U.S. Congress. At the same time, Congress is barely legislating anymore, because the government is currently under shutdown. What remains from U.S. democracy as we know it after nine months of Trump? An afterword to my PhD thesis.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/shutdown-on-capitol-hill-an-afterword-to-my-phd-thesis">
    <![CDATA[<p>Academic dissertations are a very peculiar genre of text. Written to be primarily read only by yourself, your committee, and some members of your own family who are extraordinarily proud of your achievement. This also implies that they commonly lack some features of ordinary scientific books, such as an afterword.</p>
<p>I do believe that an afterword to my dissertation is in order, and since it is uncommon to add one to a PhD thesis, I am writing one here, as an article. I do not believe that my thesis itself merits an afterword, but current circumstances do.</p>
<hr />
<p>I embarked on my PhD journey to the day exactly five years ago, on November 1st, 2020. Just two days into my PhD, Joe Biden was elected President of the United States, and the reign of Donald Trump was coming to an end.</p>
<p>However, nobody believed that it was going to be a smooth transition. Too often, Trump had threatened to just continue living in the White House, half jokingly, half seriously. And indeed, on January 6th, 2021, in an unprecedented act of domestic terrorism, a mob of several hundred Trump-supporters invaded the U.S. Capitol in an attempt to bring U.S. democracy to an abrupt end. Only a few capitol police officers stood between the angry mob and the machine room of democracy. Representatives had to be evacuated via tunnels below the building.</p>
<p>After a few, gruesome hours, the horror ended, and two weeks later, Biden was inaugurated as planned. For the next four years, U.S. democracy was “back on track.”</p>
<p>For me, this was a sign of hope. I decided to dedicate my thesis to U.S. Congress. To understand how U.S. democracy works beyond the president. To understand how powerful Congress exactly is, and how it performs its role as the machine room of democracy. For the next four years, I spent countless hours reading Congressional speeches, extracting information from them, and understanding how laws are being made. I was convinced that understanding Congress was a useful and fruitful endeavor to help researchers and the public at large have a better understanding of the mechanisms of lawmaking.</p>
<p>When the next election cycle started, and Joe Biden announced his new bid for the presidency, it was apparent that this might not go well. After only about three and a half years, maybe a little less, it was clear that democracy in the U.S. was back in crisis. And indeed, on November 5, 2024, Donald Trump was once again elected President of the United States.</p>
<hr />
<p>For me, this was a shock. Not because I desperately wanted Biden back in the White House (Harris would have been the better choice), but because I knew that this time around, Trump and his entourage had a plan, provided by the Heritage Foundation: <a href="https://www.project2025.observer/en">Project 2025</a>. It was serious. We couldn’t hope for Trump to doodle his way through a second presidency until the electorate would put an actual president in power again. No, this time, his administration would have much more capable personnel, and a dedicated plan to reshape the U.S. government according to a deeply authoritarian and Christian-fundamentalist worldview, including what is known as the “<a href="https://en.wikipedia.org/wiki/Unitary_executive_theory">Unitary Executive Theory</a>.”</p>
<p>But still, I retained hope. After all, the U.S. Constitution clearly outlines a separation of powers and implements checks and balances. The president can’t introduce new laws, Congress has a broad range of leeway and the “power of the purse,” and the courts can always rule against the administration. So, when Biden waved goodbye for a final time, and Trump moved back into the White House, I remained as optimistic as possible.</p>
<p>I was able to keep this feeling for about two weeks. By January 31st, I couldn’t sugarcoat the developments in the administration anymore. “DOGE” wreaked havoc among Congressionally appointed agencies and ripped apart the fabric of the regulatory government. In addition, “Schedule F” was reinstated, court orders were violated, innocent people were deported to random countries, and Trump started to make laws, not via Congress but via executive order. And all of it with impunity. There is little the Trump administration had to fear from Congress, which is fully under control of the Republican Party. And thanks to a conservative stacking of the Supreme Court, not even from the constitutional court. SCOTUS even decided to <a href="https://www.supremecourt.gov/opinions/23pdf/23-939_e2pg.pdf">give the president full immunity during his presidency in an unprecedented ruling</a>. And, as if to kick someone who’s already on the ground, Trump recently demolished the entire East Wing of the White House for a ballroom that is not just architecturally reminiscent of a certain, bygone era.</p>
<p>Today, just nine short months after Trump’s inauguration, it is almost impossible to recount all the violations of the Constitution, court orders, and social norms of the rule of law. The administration’s strategy of maximalism, to flood the ether with outrage to incapacitate media and the public alike, has worked. These days, Trump violating a court order is just a regular Tuesday. Gutting an entire agency is a Wednesday afternoon. And withholding funds for SNAP (Supplemental Nutrition Assistance Program) despite the existence of emergency funds, thus putting 12% (!) of the U.S. population in danger of severe malnutrition (!) is just another political fight.</p>
<hr />
<p>How can I justify having worked on U.S. Congress for five years, if it all doesn’t seem to matter anymore? What help is it that we now know how responsive representatives are to economic crises, if they do not enact legislation anymore? Why bother understanding the party pressure that influences representatives in their vote decisions, if there are no more votes?</p>
<p>Since February of this year, I have struggled not to lose hope in the relevancy of my thesis. I lived through the five stages of denial; believed that, if Congress stops being relevant, then maybe my work has historical worth; or maybe one could archive the thesis for the future, when Congress may be operational again. But right now, I do not feel that my thesis has any worth for contemporary observers of U.S. policy. It seems to me worth even less than a regular PhD thesis that is only read by you, your committee, and your overexcited aunt.</p>
<p>I believed that, after decades of the public almost forgetting about Congress, it was in order to shed a light onto the not-so-photogenic parts of U.S. government, to highlight the grunt work of lawmaking. And for what? For a government shutdown? For Mike Johnson completely locking down the entire House? Refusing to swear in a newly elected representative? For a president simply bypassing Congress via executive order?</p>
<hr />
<p>In 2018, U.S. political scientists Steven Levitsky and Daniel Ziblatt wrote an instructive book on democratic backsliding in the U.S., “How Democracies Die.” They revisit a century of authoritarian rule around the world and try to come up with a benchmark for how well the U.S. is doing. The book was written in the middle of the first Trump presidency, and they outline three potential scenarios for the future: A swift democratic recovery (“unlikely”), a second Trump term leading to white-national autocracy (what they call an unlikely “nightmare-scenario”), or a continuing polarization and vile political fights between the two parties (what they deem likely).</p>
<p>It turns out, the nightmare-scenario might have come true. But I believe that even this scenario appears almost <em>benign</em> compared to the political developments we are now witnessing in the U.S.</p>
<hr />
<p>I really hope that none of that persists in the long term. I really hope that the U.S. can find a way back to democracy, and that it does not devolve into bleak authoritarianism. Even if it is not relevant to lawmaking in the U.S. right now, I really do hope that my thesis will not remain irrelevant to understanding the current “State of the Union” for too long.</p>
<p>I truly believe that understanding how parliaments work is a worthwhile endeavor. But for the time being, I will probably focus more on European democracies. None of these are resistant to autocracy either, but the constitutional settings at least make it appear less likely to see an authoritarian takeover anytime soon. But democratic backsliding is at full throttle. And we need to understand how it unfolds before it is too late.</p>]]>
  </content>
</entry>
<entry>
  <title>I think I Finally Got Monads</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2025, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/i-think-i-finally-got-monads" />
  <id>https://www.hendrik-erz.de/post/i-think-i-finally-got-monads</id>
  <published>2025-10-27T11:00:00+00:00</published>
  <updated>2026-02-04T10:14:27+00:00</updated>
  <summary type="html"><![CDATA[Sometimes, we all get hung up on fringe phenomena that are largely inconsequential for the world&#039;s pressing issues, but still satisfy some urge to understand within us. One such thing for me were monads, a weird little concept from category theory that sits at the heart of many programming jokes. I have spent years trying to understand them, and now that I finally did, I had to realize that I will probably never need them.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/i-think-i-finally-got-monads">
    <![CDATA[<p>So, hear me out. At this point, “I understand monads” is basically a meme. Nobody understands monads. And, once you understand them, “you lose the ability to explain them.”<sup id="fnref:1"><a class="footnote-ref" href="#fn:1" role="doc-noteref">1</a></sup> This means: no promises for this article. There are only two options: Either I actually understand monads, and you won’t understand what I write here, or I didn’t actually understand monads, and so you won’t be any the wiser.</p>
<p>This article is for you if you either (a) share the same, subconscious urge to understand this programming concept; or (b) if you want to simply follow me down yet another rabbit hole. For me personally, writing this article feels relieving, because my brain can now follow more relevant political and sociological puzzles for the final weeks of my PhD studies!</p>
<h2>Why Is It So Hard To Understand Monads?</h2>
<p>First, why is it so hard to understand monads? I have been trying to understand monads for the past five years, on and off. I wasn’t consumed by the thought that I didn’t yet understand them, but once or twice a year, I got this <em>urge</em> and wanted to know more about them. But I never really got them. And I grew frustrated. Because over time, <a href="https://www.hendrik-erz.de/post/the-transformer-architecture-a-visual-guide-pdf-download">I understood transformers</a> (both encoder and decoder layers), and I even understood word2vec enough to <a href="https://github.com/nathanlesage/node-sgns">build a functional prototype</a>. The more I understood complex concepts of programming, statistics, and mathematics, the more it bugged me that I didn’t understand “that one concept.” I am always keen on improving my knowledge of programming, which led me from class-based OOP (Object-Oriented Programming) to functional programming, and I felt like monads would be the next step. However, they always eluded me.</p>
<p>I believe there are several reasons for why this is. The first is, more tongue-in-cheek, that, once you actually get them, you “lose the ability to explain them.” The second, I believe, is that we all come from different trajectories into programming, and thus need a different starting point to understand monads. A third is that monads basically break how we think about programming languages. And a fourth is that we really don’t need monads all that often.</p>
<p>The first reason is obviously not meant entirely seriously, but there is a kernel of truth to it. To everyone who does <em>not</em> understand monads, including me, every explanation sounds like chanting a mantra, or cursing your family. I mean, come on: “a monad is a monoid in the category of endofunctors.” Excuse me? I kept on reading more and more explainers, but all simply used the same magic incantations to “explain” monads. There was simply no breakthrough moment, because I just could not make the necessary connections between the words, their meanings, and, most importantly, the <em>why</em> of monads.</p>
<p>The thing is, math is a very efficient language. Math uses very short and concise words, but words that come with a barrage of meaning. It’s easy to say “Monads separate pure functions from impure functions, and they elevate values from the concrete context into the abstract monadic context until we execute the functor and retrieve the result.” But entire books could be written on that modest sentence to explain it. And these words are so rare that my spell checker has littered this paragraph with red squiggly lines.<sup id="fnref:3"><a class="footnote-ref" href="#fn:3" role="doc-noteref">2</a></sup></p>
<p>The second reason is something I just realized recently. All of us who didn’t enjoy a formal computer science education (a.k.a. most people who write code) usually started at some odd point in time. My first entry into programming was C++, Java, and then PHP. All of which are <em>heavily</em> class-based, that is, these are languages that encourage you to write object-oriented code. Later, I switched into more functional programming, when it actually started to make sense. Other people may start with Bash, Python, or simply R. A select few have started with Haskell and thus learned monads immediately. Nowadays, more and more people start by learning Rust and get exposed to monads from the start. All in all, this means that, regardless of where we come from, we need different analogies and mental models to build the necessary bridges between the origin of monads in math, and their implementation in software. For me, for example, I felt like the breakthrough came in a video that explained monads using Python (a language I know), and then made a connection to Rust’s <code>Option</code> type, which finally made it “click.”</p>
<p>The third reason speaks to the fact that monads are (to my very limited knowledge) the only concept that is still extremely close to its mathematical foundations and thus is more difficult to understand than, say, functional programming. And even though there are many bridges between math and programming, there is this fine distinction between pure math and the more pragmatic world of software engineering. Some concepts simply do not translate well from one world into the other. Also, the problem that monads solve is a very particular one that has been solved differently in nearly all existing programming languages, and so monads just don’t make a ton of sense in most of them. In other words, you literally have to break how you think in whichever programming language you know, in order to be able to understand monads.</p>
<p>Lastly, every explainer of monads that I have seen so far has used <em>extremely</em> simple examples to explain monads, but as I have realized, if you have a very simple program, monads actually overcomplicate it. And I am a person that always needs to know a <em>reason</em> for why we do certain things. If nobody gives me a reason, then why should I understand it? Monads simply do not make sense in simple programs, and the more data you have, the less it makes sense. In fact, I would argue that relatively shallow, pipeline-style data-analysis programs (that is, 90% of the programs I write in my research, because I work with data), would be <em>hurt</em> if we used monads in them.</p>
<p>Add to these reasons the fact that, by now, monads have become basically a meme, and you have a good sense for why monads are (a) so mysterious, and (b) so difficult to understand.</p>
<p>Side note: on the day I think I understood monads (the day I originally wrote this article), I started by asking an LLM. I figured that, since explainers for monads are <em>everywhere</em>, every GPT model should have at least a few dozen of them in its training data. But after first querying Llama v3.2, then Mistral, and lastly ChatGPT, I realized that none would be able to explain monads to me. All of them, when asked “why” we should use monads, replied the same: “It’s easier to reason about your program.” Yes, I’ve read that sentence a hundred times, but <em>why</em>?! This has cemented my suspicion that most monad-explainers basically just use the same words, which all don’t make sense to someone who doesn’t yet understand monads.</p>
<h2>TL;DR: Monads are Essentially a Way of Error Handling</h2>
<p>After that, the very abridged claim for what monads actually are (according to my understanding, which may be totally wrong) is quite simple: It’s a programming pattern that handles errors in a particular way. This is what caused the “Aha” moment in my understanding: As I was <a href="https://www.youtube.com/watch?v=Q0aVbqim5pE">watching this guy explain monads using Python</a>, he said the magic words: Monads are a way to handle errors that is <em>different from how most programming languages handle errors</em>. Aha! Suddenly, it all made sense.</p>
<p>First, the video explained why it is so hard to see a reason for them to exist: Because we don’t <em>need</em> them. Each language has some error handling. Most have no monads, but we survive. Second, Rust uses monads in its <code>Option</code> type, and since it enforces a very different type of error handling, it explains why writing Rust always felt off to me. Third, how they actually work is slightly odd, but it is possible to understand it if one tries to set aside the fancy math language for a second. Fourth, it explained why it was so difficult to grasp the functionality of monads: Because most explainers use Haskell to explain monads. But Haskell has a built-in syntax for monads, meaning that it literally hides away the actual functioning of monads. In other words: Every tutorial (<a href="https://www.youtube.com/watch?v=t1e8gqXLbsU">except the one from Computerphile</a>) that uses Haskell to explain monads explains how to write Haskell code, but not how monads work.</p>
<p>With that out of the way, let’s get into my own take on monads, in the hopes that I avoid using jargon to explain jargon to you. But again, I might not have understood monads, actually.</p>
<h2>What are Monads?</h2>
<p>Let us build up an understanding of Monads, trying to explain the crucial parts of the weird monadic language as we go.</p>
<h3>Side Effects</h3>
<p>At the foundation of an understanding for monads is the concept of “side effects.” A side effect, in plain terms, is <em>anything</em> where your code interacts with the external world. Here, “external world” is defined as <em>everything</em> outside your CPU. The idea is that your program “lives” in the CPU, and we assume the CPU not to die. When you write a function that adds up 2 and 2, and it does not take any input, it cannot, by definition, have any side effect, because your CPU will be able to calculate that. (We’re assuming that a computer “just works” and ignore weird stuff like cosmic rays here.)</p>
<p>A side effect, then, is anything unexpected that happens. This doesn’t happen when your computer calculates 2+2, but it <em>can</em> happen when you read from your disk. Because the file you wanted to read may not be there. Or, when you want to fetch a file from the internet, but the computer your code runs on has no internet. Finally, an important but overlooked “side effect” is printing stuff to a console. This can also go wrong, as it requires your computer to have a display, a cable to that display, etc. I hope you get the gist. Any interaction with any piece of hardware outside your CPU is a side effect.</p>
<p>Essentially, every program must have side effects, otherwise it is not going to do useful things for you. Any useful program will have at least two side effects: First, getting data into it, and second showing you the result. Both of these things happen outside the “pure” world of the CPU, and so — theoretically — something bad can happen. The aim of monads is, as texts frequently say, to “push side effects to the edges of your program.” I <em>always</em> thought this meant “run side effects either at the beginning or the end, but never in between.” Instead, I figured out, side effects can occur at any time. It is not <em>when</em> you run them, it is <em>where you deal with them</em>.</p>
<h3>Error Handling: Monads Vs. The Rest</h3>
<p>This was the core blocker for my understanding of monads. Most programming languages have a <code>try-catch</code> construction that allows you to safely run a piece of code that may throw, catch the error, and instead of outright crashing, try to recover. For example, take the following Node.js function that might read a plain text file from disk:</p>
<pre><code class="language-typescript">function readFile (path: string): string {
    return fs.readFileSync(path, 'utf-8')
}
</code></pre>
<p>Looks good, right? Except when the file at <code>path</code> doesn’t exist, or your program lacks permission to read it. Then, the function <code>readFileSync</code> will throw an error. But the function I wrote is oblivious to it, it doesn’t catch it. So the error will be passed to the calling function, and so on. Now, in most programming languages, including every one I use, you would have to decide: What do I do in case of an error? Who should handle that? If you have a function that should extract, say, the mean of a column from a CSV file, you might do the following:</p>
<pre><code class="language-typescript">import * as fs from 'fs'

// parseCSVData and mean are assumed helpers defined elsewhere.
function meanCSVColumn (path: string, col: number): number|undefined {
    try {
        const fileContents = fs.readFileSync(path, 'utf-8')
        const csvData = parseCSVData(fileContents)
        // Use a new name so the `col` parameter is not shadowed
        const column = csvData[col]
        return mean(column)
    } catch (err) {
        return undefined
    }
}
</code></pre>
<p>What we did here is very important: The function suddenly got an opinion, and it decided: If there is <em>any</em> error, I simply return <code>undefined</code>. Regardless of whether the file doesn’t exist, it’s not valid CSV data, or the column doesn’t exist, or it’s not a numeric column. We don’t care about it, we only want either the mean of that column, or undefined. This then means that whoever calls this function needs to handle the <code>undefined</code> result:</p>
<pre><code class="language-typescript">function selectFile (path: string): void {
    const res = meanCSVColumn(path, 3)

    if (res === undefined) {
        showDialog('CSV seems to be wrong: Could not calculate mean of column 4.')
    } else {
        // ... use the result ...
    }
}
</code></pre>
<p>This code is already somewhat “monadic.” Why? Because the function <code>meanCSVColumn</code> cannot fail. There will never be an error thrown. It fails not by throwing, but by returning undefined. The caller — here the function <code>selectFile</code> — checks whether the function fails, and if so, displays a proper error message to the user. I have chosen this name on purpose: <code>selectFile</code> is supposed to be called once the user has selected a file to open. In other words, the function runs by request of the user. The mean calculating function should not throw errors, it should just either return <code>undefined</code>, or said mean. It should not do error handling, because it doesn’t know what the caller’s intention was. Is the caller fine with the function failing? In that case, throwing an error would be bad. Is the caller not? Then let them decide to display an error to the user that makes sense in context.</p>
<p>Monads essentially just abstract all of this into a nice programming pattern. That’s all there is to it. Monads ensure that error handling never happens while some functions do something to your data, but it is always the caller’s responsibility. We could make the <code>selectFile</code> monadic, too. If we decided that we should not do any error handling here, we could simply make that function return <code>undefined</code>, too. Then it would be up to whichever function calls <code>selectFile</code> to check if everything works properly.</p>
<p>Monads literally are just neat types around these two states. A function that either returns a number (a.k.a.: a result) or undefined (a.k.a.: it failed), could also just either return <code>Some(number)</code>, or <code>None()</code>. This is precisely what Rust’s <code>Option</code> and <code>Result</code> types do: <code>Option</code> holds either <code>Some(value)</code> or <code>None</code>, and <code>Result</code> holds either <code>Ok(value)</code> or <code>Err(error)</code>. Based on whatever a function gives back, you immediately know if it succeeded or not. This means that you won’t need <code>null</code>, <code>undefined</code>, <code>void</code>, or whatever anymore to indicate failure. In fact, a function could return <code>undefined</code> as a successful result!</p>
<p>But this also shows why monads often do not make sense: Programming languages that have a built-in way to represent failure, such as <code>undefined</code> or throwing errors, naturally do not need monads. The function I have provided above uses Node.js’s built-in file reading function. If <code>readFileSync</code> itself returned a result container instead of throwing, a monadic version could look like this:</p>
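<p>To make the two states concrete, here is a minimal sketch in TypeScript. The <code>OptionalNumber</code> type and the helpers <code>Some</code>, <code>None</code>, <code>mean</code>, and <code>formatMean</code> are hypothetical names made up for illustration; they only mirror the shape of Rust’s <code>Option</code> and are not taken from any library.</p>
<pre><code class="language-typescript">// Hypothetical container: either "some" value or "none" at all.
type OptionalNumber =
    | { kind: 'some', value: number }
    | { kind: 'none' }

function Some (value: number): OptionalNumber {
    return { kind: 'some', value }
}

function None (): OptionalNumber {
    return { kind: 'none' }
}

// Assumed helper: the mean of an empty list is reported as a failure.
function mean (values: number[]): OptionalNumber {
    if (values.length === 0) {
        return None()
    }
    let sum = 0
    for (const v of values) {
        sum += v
    }
    return Some(sum / values.length)
}

// The caller decides what a failure means in its own context.
function formatMean (result: OptionalNumber): string {
    if (result.kind === 'some') {
        return `Mean: ${result.value}`
    }
    return 'Could not calculate the mean.'
}
</code></pre>
<p>Calling <code>formatMean(mean([1, 2, 3]))</code> takes the success branch, while <code>formatMean(mean([]))</code> takes the failure branch. Neither call throws, and neither needs <code>undefined</code> to signal that something went wrong.</p>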
<pre><code class="language-typescript">function readFile (path: string): Result&lt;string&gt; {
    return fs.readFileSync(path, 'utf-8')
}
</code></pre>
<p>It looks <em>almost</em> like the original function. But instead, now the function doesn’t throw an error, but it <em>always</em> returns a result container which <em>may</em> contain a string (if the read was successful), or an error. Now you see why monads can make “reasoning about code” simpler: They are <em>explicit</em>. When it comes to error throwing, you will always need tracebacks to find the source of the error, but with monads, it can be faster to follow the call-chain to find where the error originated. In addition, you can see why implementing monads should usually follow the language: If a function already returns a monad instead of throwing errors, it would be weird to take the function result and throw an error if the result is <code>None</code>. Instead, it is much simpler to just pass on that monad to the next function.</p>
<p>You can “force” monads into try-catch languages, however. As a final example, take a look at the following snippet that takes the <code>readFileSync</code>-function as it is, but turns it into a monad:</p>
<pre><code class="language-ts">function readFile (path: string): Result&lt;string&gt; {
    try {
        return Ok(fs.readFileSync(path, 'utf-8'))
    } catch (err) {
        return Err(err)
    }
}
</code></pre>
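<p>The snippet above leaves <code>Result</code>, <code>Ok</code>, and <code>Err</code> undefined. One possible way to fill them in is sketched below. This is an assumption on my part, not a standard API: <code>StringResult</code> is a hypothetical, non-generic stand-in specialised to strings to keep the example short, and <code>describeRead</code> is an invented caller. The <code>readFile</code> function is repeated so that the block is self-contained.</p>
<pre><code class="language-typescript">import * as fs from 'fs'

// Hypothetical stand-in for the Result type, specialised to strings.
type StringResult =
    | { kind: 'ok', value: string }
    | { kind: 'err', error: unknown }

function Ok (value: string): StringResult {
    return { kind: 'ok', value }
}

function Err (error: unknown): StringResult {
    return { kind: 'err', error }
}

// Same wrapper as above: the thrown error becomes a returned value.
function readFile (path: string): StringResult {
    try {
        return Ok(fs.readFileSync(path, 'utf-8'))
    } catch (err) {
        return Err(err)
    }
}

// The caller inspects the container; no try-catch is needed here.
function describeRead (path: string): string {
    const res = readFile(path)
    if (res.kind === 'ok') {
        return `Read ${res.value.length} characters.`
    }
    return `Could not read ${path}.`
}
</code></pre>
<p>The try-catch now lives in exactly one place, right where the throwing function is called. Everything above it only ever sees a value that is either <code>ok</code> or <code>err</code>.</p>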
<p>Now, with the main purpose of monads at hand, let’s discuss all the other fuzzy terms that are frequently being thrown around when trying to explain monads.<sup id="fnref:4"><a class="footnote-ref" href="#fn:4" role="doc-noteref">3</a></sup></p>
<h3>Pure vs. Impure Code</h3>
<p>Monads are often explained as separating pure from impure code. But for that, we have to understand what these terms mean. This is relatively straightforward now. “Pure” code is code that <em>cannot</em> have any side effects whatsoever. “Impure” code, on the other hand, can. Consider the following program:</p>
<pre><code class="language-typescript">function square (x: number): number {
    return x ** 2
}

const result = square(5)
</code></pre>
<p>This is a pure program. Which means: Under the assumption that your CPU doesn’t suddenly explode or that there is some cosmic ray flipping a bit in its L1 cache, it will <em>always</em>, until the end of time, be guaranteed to take the number five, and square it. So let’s print that out:</p>
<pre><code class="language-typescript">function square (x: number): number {
    return x ** 2
}

const result = square(5)

console.log(`Result: ${result}`)
</code></pre>
<p><em>Oh no! We just wrote impure code!</em> The <code>console.log</code> function tries to print “result” to the screen. This requires a <code>stdout</code> stream to exist, and writing to it could fail, because the print statement interacts with the world outside of the processor, here: all the hardware required to take data from the CPU (the result) and put it onto a computer display. This makes it impure.</p>
<p>Bottom line: There is no pure program. Every program <em>must</em> at some point interact with the outside world, which might fail, so it’s impure. There are <em>always</em> side effects. A perfectly pure program would just run, and not leave any trace of it. We would never know what it actually did. In other words: It’s useless to us.</p>
<p>This means that the distinction pure vs. impure makes more sense when talking about certain functions, or sections of your code. Some parts of your code may be pure, because they do not rely on anything external that could fail. (Again, the CPU strictly speaking is … external to the program, but we just assume it can never fail.) But your program <em>in total</em> is always impure, because at some point you inevitably need input or write text on the console.</p>
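<p>To make the idea of pure and impure <em>sections</em> concrete, here is a small sketch. The function names <code>formatResult</code> and <code>main</code> are made up for this illustration. All the computation happens in pure functions, and the only contact with the outside world sits in one place.</p>

```typescript
// Pure: depends only on its input and touches nothing outside itself.
function square (x: number): number {
  return x ** 2
}

// Also pure: it builds a string, but does not print it.
function formatResult (value: number): string {
  return `Result: ${value}`
}

// Impure: the single place where the program talks to the outside world.
function main (): void {
  console.log(formatResult(square(5)))
}

main()
```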
<p>However, the aim of a monad is not to separate pure from impure code. This is a big misunderstanding that I always had. The aim of <em>functional programming</em> is to separate pure from impure code.<sup id="fnref:5"><a class="footnote-ref" href="#fn:5" role="doc-noteref">4</a></sup> The aim of monads, however, is something else. Monads embrace impure code, and they don’t care whether some function is pure or impure. Instead, monads are a way to move error-bearing code to very particular parts of your application where <em>you</em> control what happens if something goes south.</p>
<p>This also finally explains why it is difficult to see the point of monads when all you get as examples are very simple ones like the one in the Computerphile video.</p>
<h3>Functor</h3>
<p>The next tricky term is “functor.” Many have tried to explain it to me, but once I learned that a functor is just a data structure that combines a value with a function, it made more sense. The following is a crude approximation of a functor in JavaScript:</p>
<pre><code class="language-javascript">function Functor (value, func) {
    return { value, func }
}
</code></pre>
<p>As you can see, a functor is nothing but a data structure which includes both some data and a function that can operate on that data. But it doesn’t do so (yet). This will become important later, but for now: A functor basically “stores” a function call and a value you can provide to that function for later calling.</p>
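<p>For comparison: many functional-programming tutorials present a functor slightly differently, namely as a container with a <code>map</code> method that applies a function to the wrapped value and wraps the result again. The following sketch shows that formulation. <code>Box</code> is a hypothetical name chosen for this illustration and is not taken from any library.</p>

```typescript
// Hypothetical container; "Box" is an illustrative name.
class Box<T> {
  readonly value: T

  constructor (value: T) {
    this.value = value
  }

  // Apply a function to the wrapped value and re-wrap the result.
  map<U> (func: (val: T) => U): Box<U> {
    return new Box(func(this.value))
  }
}

// The value stays inside a Box through every step.
const boxed = new Box(5).map(x => x * 3).map(x => `got ${x}`)
```

<p>JavaScript arrays behave the same way: <code>[1, 2].map(f)</code> gives you an array again, no matter what <code>f</code> does to the individual values.</p>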
<h3>Endofunctor</h3>
<p>Now you can understand an endofunctor. The prefix “endo” (Greek, you know how math works) basically just means something akin to “within” or “contain.” An endofunctor is essentially just a functor that returns something that has the same type signature. So if you have a functor of type <code>int</code>, when you call it, it must also return a functor of type <code>int</code>. When it returns a functor of type string, it’s not an <em>endo</em>functor anymore. This makes more sense with actual types. Say you have a functor that takes a number and a function that operates on the number:</p>
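<pre><code class="language-typescript">// The type of the returned data structure (same name as the function)
interface Functor { value: number, func: (val: number) =&gt; Functor }

function Functor (value: number, func: (val: number) =&gt; Functor): Functor {
    return { value, func }
}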
</code></pre>
<p>This is <em>not</em> an endofunctor, because it does not return something of the same type yet. It takes a number, but returns a functor. To achieve that, you will need a functor that accepts itself as its value. However, the opposite also works, and you can just write a functor that accepts <em>any</em> as the value:</p>
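<pre><code class="language-typescript">// The type of the returned data structure (same name as the function)
interface Endofunctor { value: any, func: (val: any) =&gt; Endofunctor }

function Endofunctor (value: any, func: (val: any) =&gt; Endofunctor): Endofunctor {
    return { value, func }
}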
</code></pre>
<p>Now it doesn’t matter whether you call <code>Endofunctor(5)</code> or <code>Endofunctor(Endofunctor(null))</code>. However, as you may have noticed, we just got rid of the data types here. There’s nothing stopping you from providing whatever value to it. This is (a part of) the abstraction that people often talk about, and it is similar to how Rust’s <code>Result</code> type works: It can wrap any type of data (although Rust does this with generics, so the compiler still knows which type is inside). The only important part for the <code>Result</code> monad is that it can be either <code>Ok</code> or <code>Err</code>, a container for the value (<code>Ok</code>), or the error message (<code>Err</code>).</p>
<h3>Abstraction</h3>
<p>Monads are “abstract” in such a way that they don’t really care about what <em>value</em> they hold. Again, a monad is a way of error handling, and errors can appear in dozens of operations. Remember: most code is impure. This means that monads need to be able to ignore the values they receive. However, you will frequently find people tell you that monads have very specific function signatures: <code>int --&gt; int --&gt; double</code>. This is not essential for a monad because, again, it wants to abstract. Reintroducing some specific data types doesn’t help understand them.</p>
<p>This is best understood by <em>finally</em> introducing something that clicked for me, too, once I understood monads: Rust’s <code>Option</code> and <code>Result</code> types. In Rust, you typically have <code>Option</code>s and <code>Result</code>s. Specifically, you have a lot of functions that either return <code>Some</code> or <code>None</code>, or <code>Ok</code> or <code>Err</code>. Structurally, <code>Option</code>s and <code>Result</code>s are one and the same. The only difference is that <code>Err</code> contains an error description that can help you understand what went wrong, whereas <code>Option</code>s only give you <code>None</code> with no further explanation.</p>
<p>If a function gave you a <code>Some</code> you know that there is a useful value in it which you can use. However, if that function returned a <code>None</code>, you know that something went wrong. What you actually put into that <code>Some</code> is completely up to you. You could use <code>Some(5)</code>, or <code>Some(&quot;string&quot;)</code>, or even <code>Some(null)</code>. What is important is that the value is wrapped in a <code>Some</code>, and not in a <code>None</code>. You don’t really interact with <code>Option</code> directly, because that is just a type, which has two variants: Either a <code>Some</code> or a <code>None</code>. And whichever it is, you know whether the function call succeeded, or not.</p>
<p>If a program that accesses some file on disk and parses its contents returns a <code>Some</code> to you, then you know that (a) that file existed, (b) its contents were valid, and (c) you now have the data and can work with it. If the program returned you <code>None</code> you know that <em>something</em> has gone wrong: Either the file didn’t exist, or your program did not have read access, or the data couldn’t be parsed. This already gives you some glimpse into how monads perform error handling.</p>
<p>This is by the way also why many programming languages that support monads or are built on them make heavy use of <code>match</code> statements. In Rust, for example, you often <code>match</code> all available results from a function call, so that you explicate what the program should do if the call succeeds, versus if it does not. In Haskell, there’s something similar.<sup id="fnref:6"><a class="footnote-ref" href="#fn:6" role="doc-noteref">5</a></sup></p>
<p>Here’s a toy example from Rust demonstrating the use of <code>match</code>:</p>
<pre><code class="language-rust">let result = divide(a, b);

match result {
    Some(value) =&gt; println!(&quot;The result is: {}&quot;, value),
    None =&gt; println!(&quot;Cannot divide by zero&quot;),
}
</code></pre>
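<p>TypeScript has no <code>match</code>, but a rough equivalent can be sketched with a tagged union and a <code>switch</code>. The <code>Option</code> type, the <code>divide</code> function, and the <code>explain</code> function below are assumptions made for this illustration. With strict compiler settings, TypeScript should complain if one of the two cases is forgotten, which is the closest it gets to Rust’s exhaustive matching.</p>

```typescript
// Hypothetical Option type modeled as a tagged union.
type Option<T> =
  | { kind: 'some', value: T }
  | { kind: 'none' }

// Returns "none" instead of throwing or producing Infinity.
function divide (a: number, b: number): Option<number> {
  return b === 0 ? { kind: 'none' } : { kind: 'some', value: a / b }
}

// Both outcomes have to be spelled out, just like in a match.
function explain (result: Option<number>): string {
  switch (result.kind) {
    case 'some':
      return `The result is: ${result.value}`
    case 'none':
      return 'Cannot divide by zero'
  }
}
```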
<h3>“Elevating” Values</h3>
<p>Another term you will frequently stumble upon is that monads will “elevate,” or “lift,” values into some abstract realm, before moving them back down again. I always found this metaphor appalling, because it doesn’t seem to make sense. But what it means is simply to wrap a value in a functor. At the start of running some monadic code, you will just have a plain value. Something boring, such as a number, or a string, or a Boolean. Then you wrap it into such a functor thingy. By doing so, you “elevate” the value into the realm of monads. Once you unwrap this functor (note the term “unwrap”), you pry the value back out of it.</p>
<p>However, for now forget about this metaphor, because I personally find it very unhelpful. Instead, think rather about “wrapping” a value into a data structure, because that’s what happens in practice.</p>
<h3>Monoid</h3>
<p>A monoid is some math concept that I don’t completely understand, but I gathered the following: it is a set of values together with an operation that combines two of them into a third value of the same kind, plus a “neutral” element that changes nothing when combined with another value. The operation needs to be associative (so the grouping of the computations should not matter), but it does not need to be commutative. The <code>Option</code> monad in Rust, for example, comes in the flavors <code>Some</code> and <code>None</code>, and you can chain up computations you do on whichever value is wrapped in <code>Some</code>. And so on. As you can see: What you actually call the things is pure semantics at this point. The important part is this “chains computations on some piece of data.”</p>
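<p>In the usual mathematical definition, a monoid is a set with an associative combining operation and an identity element; commutativity is not required. String concatenation is a handy example, sketched below. The <code>Monoid</code> interface and the names <code>stringConcat</code> and <code>combineAll</code> are hypothetical and exist only for this illustration.</p>

```typescript
// Hypothetical interface: a monoid needs an identity element ("empty")
// and an associative way to combine two values.
interface Monoid<T> {
  empty: T
  combine: (a: T, b: T) => T
}

// Strings with concatenation; the empty string is the identity element.
const stringConcat: Monoid<string> = {
  empty: '',
  combine: (a, b) => a + b
}

// Fold a whole list of values into one, starting from the identity.
function combineAll<T> (monoid: Monoid<T>, values: T[]): T {
  return values.reduce((acc, val) => monoid.combine(acc, val), monoid.empty)
}

const word = combineAll(stringConcat, ['mo', 'no', 'id'])
```

<p>Grouping does not matter here (<code>('a' + 'b') + 'c'</code> equals <code>'a' + ('b' + 'c')</code>), but order does (<code>'a' + 'b'</code> is not <code>'b' + 'a'</code>).</p>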
<h3>“Binding”</h3>
<p>Another term I frequently encountered is to “bind” something. This is, if I am not mistaken, just a fancy term for chaining up functions together. In Rust contexts, I have seen <code>and_then</code>, and some LLM has spat out <code>of</code>. Remember, monads are a variant of functional programming, where the idea is that you have some value, and you want to massage it a bit, until you get the result. The “binding” essentially just means to add one additional step. For example, you may want to take a number and multiply it by three, and then divide by two. “Binding” then just means that you take the value, wrap it into a monadic type, queue up one function to multiply it by three and one to divide it by two, and then run the functions.</p>
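<p>The multiply-by-three, divide-by-two example can be sketched as follows. Everything here (<code>Option</code>, <code>some</code>, <code>none</code>, <code>bind</code>, and the step functions) is a hypothetical helper written for this illustration. Note that this simple version runs each step immediately instead of queueing it; the point is only to show that a step is skipped as soon as an earlier one has failed.</p>

```typescript
// Hypothetical Option type and constructors.
type Option<T> = { kind: 'some', value: T } | { kind: 'none' }

function some<T> (value: T): Option<T> {
  return { kind: 'some', value }
}

function none<T> (): Option<T> {
  return { kind: 'none' }
}

// "bind": run the next step only if there is a value to run it on.
function bind<T, U> (input: Option<T>, step: (val: T) => Option<U>): Option<U> {
  return input.kind === 'some' ? step(input.value) : none<U>()
}

const timesThree = (x: number): Option<number> => some(x * 3)
const divideByTwo = (x: number): Option<number> => some(x / 2)
const alwaysFails = (_x: number): Option<number> => none<number>()

// 5 * 3 = 15, then 15 / 2 = 7.5
const result = bind(bind(some(5), timesThree), divideByTwo)

// The failing step produces "none", so divideByTwo never runs.
const aborted = bind(bind(some(5), alwaysFails), divideByTwo)
```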
<h3>Deferred Computing</h3>
<p>One term that I don’t read often but that I find fundamental to understanding monads is deferred computing, meaning: With a monad, you first chain up the data transformations you want, and then, once you’re done with that, you execute all of these functions and see what pops out at the end. This is counterintuitive with most of the examples people use to explain monads, because in those toy examples, execution is practically instant – both theoretically, because there is little chaining, and practically, because they usually contain only a few instructions. It does make sense, however, if you manage to think like a mathematician again, for a second.</p>
<p>Monads are programming patterns that allow you to <em>describe</em> a program. Let us imagine a program that allows a user to open and view a file from some cloud storage. Let’s use Google Drive. You can describe what the program should do in a series of steps:</p>
<ol>
<li>Check if there is internet</li>
<li>Access the credentials for Google Drive</li>
<li>Connect to the server</li>
<li>Query the file</li>
<li>Get the file</li>
<li>Parse the file</li>
<li>Display the file to the user</li>
</ol>
<p>In monadic code, you could write it as simply as this:</p>
<pre><code class="language-typescript">const result = Some('/path/to/file.txt')
  .bind(check_internet)
  .bind(get_credentials)
  .bind(connect_to_server)
  .bind(access_file)
  .bind(download_file)
  .bind(parse_file)
  .run()

if (result instanceof Some) {
    console.log('Computation successful', result.value)
} else if (result instanceof None) {
    console.log('Something went wrong')
}
</code></pre>
<p>As you can see: Very easy to understand (as some would say, “reason about”), because the code tells you exactly what it does. Only at the end, the result may be either <code>Ok</code> or <code>Err</code> (or <code>Some</code> or <code>None</code>, depending on which monad type you personally decided to use).</p>
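<p>How could <code>bind</code> and <code>run</code> work under the hood? The following is one possible sketch, not a definitive implementation. The <code>Pipeline</code> class, the <code>Some</code> and <code>None</code> classes, and the three step functions are all hypothetical. The steps are only collected by <code>bind</code>; nothing executes until <code>run</code> is called, and once a step returns <code>None</code>, the remaining steps are skipped.</p>

```typescript
// Hypothetical Option classes so that `instanceof` checks work.
class Some<T> {
  readonly value: T
  constructor (value: T) { this.value = value }
}

class None {
  readonly isNone = true
}

type Option<T> = Some<T> | None
type Step = (input: any) => Option<any>

// Collects steps without running them; run() executes them in order.
class Pipeline {
  private readonly start: Option<any>
  private readonly steps: Step[]

  constructor (start: Option<any>, steps: Step[] = []) {
    this.start = start
    this.steps = steps
  }

  // Queue one more step and return a new pipeline.
  bind (step: Step): Pipeline {
    return new Pipeline(this.start, [...this.steps, step])
  }

  run (): Option<any> {
    let current = this.start
    for (const step of this.steps) {
      if (current instanceof None) return current // skip the remaining steps
      current = step(current.value)
    }
    return current
  }
}

// Stand-in steps that record when they are called.
const log: string[] = []
const checkInternet: Step = (path: string) => { log.push('check'); return new Some(path) }
const getCredentials: Step = (_path: string) => { log.push('creds'); return new None() }
const downloadFile: Step = (_creds: string) => { log.push('download'); return new Some('contents') }

const pipeline = new Pipeline(new Some('/path/to/file.txt'))
  .bind(checkInternet)
  .bind(getCredentials)
  .bind(downloadFile)

const callsBeforeRun = log.length // still zero: nothing has run yet
const outcome = pipeline.run()
```

<p>In this run, <code>getCredentials</code> fails on purpose, so <code>downloadFile</code> is never called and the outcome is a <code>None</code>.</p>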
<p>“But there is a clear order to the computations!” you may now think. And indeed, we can’t just arbitrarily move around the various functions we have chained up. If we parse a file before we actually download it, there won’t be much to parse. But this is still a monad. Why? Because the <code>Option</code> monad doesn’t care about the <em>value</em>.</p>
<p>The beauty of this approach now becomes apparent once you think about each of these steps. For example, you may support multiple cloud storage providers, not just Google Drive. In that case, the <code>get_credentials</code> function gets more complex. It itself may call a variety of functions, such as checking some database where multiple credentials are stored, or checking a file on disk. The list goes on. However, it may be that <code>get_credentials</code> does not need to run, because there is no internet. Does the <code>get_credentials</code> function need to understand that? Not at all. Indeed, here’s how you can implement this using monads:</p>
<pre><code class="language-typescript">function get_credentials (path: Option&lt;string&gt;): Option&lt;Credentials&gt; {
    if (path instanceof None) {
        return None()
    } else {
        return fetch_for_path(path) // This itself takes an Option and returns one.
    }
}
</code></pre>
<p>As you can see, the function <code>get_credentials</code> is now pure by the definition introduced earlier, because it doesn’t have any side effects. The side effects kind of “hide” within the <code>Option</code>. This reinforces the thought introduced earlier: Monads happily take all kinds of side effects. Their job is not to produce purely “pure” code. There will always be errors. It’s just that many functions can be implemented without any error handling, as that will be passed on to the caller. It is really just about ensuring that errors, if they occur, are moved into a space in your program where you can deal with them properly. Instead of throwing errors randomly across your code base, you can define a central place where you want all errors to end up, and then you can handle them appropriately.</p>
<p>Indeed, given that this other function, <code>fetch_for_path</code> also takes an option and returns an option, you can completely get rid of this instance check. This then gives you some additional space to check for more meaningful things, for example: Should we grab Google Drive credentials, or rather iCloud credentials?</p>
<pre><code class="language-typescript">function get_credentials (path: Option&lt;string&gt;): Option&lt;Credentials&gt; {
    if (is_on_google_drive(path)) {
        return get_gdrive_creds()
    } else if (is_on_icloud(path)) {
        return get_icloud_creds()
    }
    // Neither provider matched, so there are no credentials to return
    return None()
}
</code></pre>
<p>Note that this doesn’t really describe deferred computing in the strict sense. We don’t set some timeout after which the code will actually run. From the computer’s perspective, it <em>will</em> absolutely run. But from the perspective of us <em>reading</em> this code, it will only run once we call the <code>run</code> function. I find this a helpful way to think about it.</p>
<h2>A Better Explainer of Monads</h2>
<p>So, what are monads? They are basically a way of directing your traffic where you want it. Instead of <code>throw</code>ing errors, or <code>raise</code>ing them (like in JavaScript or Python), you wrap your computations in some container that keeps them “floating” in the air, until you call the <code>run</code> function. In Rust, you usually <code>unwrap()</code> them. This will actually commence the computation. And this means that, if some error occurs, it will be delivered to the code that calls the <code>unwrap</code>. If you have a function in which you don’t want to deal with any error, you just return such a monadic type, too.</p>
<p>All of this basically allows you to organize your programs differently than what we all have learned with so many programming languages. Instead of erroring left and right, you control what happens. This <em>can</em> be useful, but it is not necessary.</p>
<p>This leads me to a final section:</p>
<h2>Conclusion: You Probably Don’t Need Monads</h2>
<p>Most of us very likely never need any monads. In fact, I think there are only three cases in which using monads makes actual sense:</p>
<ol>
<li>You write in a programming language where monads are first-class citizens (Rust, Haskell).</li>
<li>You want to explicitly move any error-handling code to a particular part of your program, and not rely on throwing and catching errors at specific parts of the application.</li>
<li>The program has been built on top of the idea of monads.</li>
</ol>
<p>I would argue that points 2 and 3 are not always a given. I have seen <em>very</em> large programs that run without a single monad. So it’s more about preference rather than necessity, even for large programs.</p>
<h2>Don’t Use Monads Unless You Need to</h2>
<p>In fact, I don’t think you even <em>should</em> use monads unless you absolutely need to. If your language of choice defines a way to throw errors, this is a good indicator that it was written without monads in mind. Plugging monads into such a language is certainly possible, but what is the point? If you come from math and have learned to think in such a “monadic” way, then absolutely, go ahead, but if not, I feel it makes everything more complicated.</p>
<p>This leads me to the type of code I usually write: Data analysis pipelines. This is specific code that has one particular property that makes monads useless. Completely useless. And that is that data analysis code is <em>required</em> to run without errors. Any “side effect” such as some data column not being present or some file being missing is a complete stop for your program. The only <code>Option</code> you have then is to abort any further processing because the data required for anything afterwards is not present. Your program is useless without the data, or side effects. So you don’t need to go through the hassle of implementing monads, because you never want the computation to return <code>None</code>, and, if it does, your code is wrong.</p>
<p>Now, if you decide to use Haskell to do your statistics, then be my guest. But Haskell implements monads already, so it’s simple to use them. Not using monads would mean working <em>against</em> the language. But in any other case, raise errors liberally. Because R, Python, and many other languages have been designed around the idea of throwing errors, not around monads. And for data analysis code, it’s really okay never to use them.</p>
<h2>Never Work Against the Language</h2>
<p>And this is why monads are so hard to grasp: Most programming languages haven’t been built with monads in mind, and since it’s just <em>one way</em> of writing code that can fail, it is just as fine to simply throw errors, especially if the language provides them. Writing monads means adding a lot of code that is not strictly necessary.</p>
<p>However, if you write in a language that implements monads as first-class citizens, such as Haskell or Rust, then you absolutely need to understand monads, because otherwise you would work against the language by <em>not</em> using monads.</p>
<p>The result of working against the language is code like that from my first experiment in writing Rust. I am so used to the error handling patterns of other languages that I subconsciously forced my Rust code to check errors every time, instead of simply passing them along. Take this function that can switch an audio device:</p>
<pre><code class="language-rust">pub fn switch_device (&amp;mut self, device_index: usize) {
    let (stream, config, rx, real_index) = create_stream(Some(device_index), None);
    self.sample_rate = config.sample_rate.0;
    self.stream = Some(Box::new(stream));
    self.thread_recv = rx;
    if self.event_sender.is_some() {
        self.event_sender.as_ref().unwrap().send(AudioEvent::InputDeviceChanged(real_index)).unwrap();
    }
}
</code></pre>
<p>After understanding monads, I suddenly saw what violence I was doing to the language here. Specifically, the function never returns anything. Instead, what can go wrong — that is, the actual switching inside the <code>if</code>-statement — is immediately <code>unwrap</code>ped. This means that, if something goes wrong, the program <em>will</em> crash completely. And not give me any error. Instead, I should’ve removed all of these various <code>unwrap</code>s in the code, and returned a <code>Result</code> instead. Then, my main application which calls this function would’ve had to actually look at the result, and do some more proper error handling than simply … well, crashing the entire app, just because I couldn’t switch the audio device. But as it stands now, my main code will literally just call this function, and ignore whatever is going on with it.</p>
<p>As you see: If you think in terms of <code>try</code>-<code>catch</code>-constructs, you will fail at writing proper Rust or Haskell code. But, likewise, if you want to force monads into a language that does not have an understanding of them, you will just as well work against the language.</p>
<p>In short: If you don’t know whether you need monads – you probably don’t.</p>
<div class="footnotes" role="doc-endnotes"><hr /><ol><li class="footnote" id="fn:1" role="doc-endnote"><p>Taken from a comment under the <a href="https://www.youtube.com/watch?v=t1e8gqXLbsU">Computerphile video on monads</a>. Also found in <a href="https://rybicki.io/blog/2023/12/23/promises-arent-monads.html">this blog post by Chris Rybicki</a>.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:1" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:3" role="doc-endnote"><p>Interestingly enough, this is no longer true. When I first wrote this paragraph in May 2025, my spell checker annotated most words in that paragraph, but now it only complains about “monadic.”&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:3" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:4" role="doc-endnote"><p>Of <em>course</em> none of these terms are “fuzzy” because they are mathematical and have well-defined meanings. This was a joke.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:4" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:5" role="doc-endnote"><p>I may write an article on that later.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:5" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:6" role="doc-endnote"><p>I’m intentionally vague with the concepts of Haskell or their names, because I’m not familiar with it, albeit my friend Albert Krehwinkel has, at times, tried to make me familiar with Pandoc’s source code.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:6" role="doc-backlink">↩</a></p></li></ol></div>]]>
  </content>
</entry>
<entry>
  <title>PhDone</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2025, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/phdone" />
  <id>https://www.hendrik-erz.de/post/phdone</id>
  <published>2025-10-24T11:00:00+00:00</published>
  <updated>2026-02-04T10:14:33+00:00</updated>
  <summary type="html"><![CDATA[After five years of research and writing, I successfully defended my dissertation thesis on Monday. I am now officially PhDone. Time for a first reflection on the process of writing a dissertation. I also share a &quot;Thank you&quot; to my supervisors and all of my colleagues and peers for being part of this incredible journey.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/phdone">
    <![CDATA[<p>Thirteen years. I waited thirteen years to write this post. A few days ago, on Monday, October 20, <a href="https://liu.se/en/news-item/research-reveals-the-link-between-language-and-lawmaking">I successfully defended my dissertation thesis</a>, <em>On the Record: Understanding a Century of Congressional Lawmaking through Speech and Vote Behavior</em> to earn the title of PhD.</p>
<p>On Monday, I completed all three stages of research education. My PhD studies only lasted for the past five years, but my dream of becoming a researcher is much older than this. It took only a few weeks, maybe two or three months, after starting my undergrad before I was certain that I wanted to do research. And now I finally made it.</p>
<p>Even after five days of recovery, it still feels surreal that I can now officially call myself “Dr. Hendrik Erz.” Of course, I didn’t suddenly feel like a different person (counter to what we jokingly say to ourselves before the defense). But I do feel a subtle change. It is mostly the stress slowly leaving my body. But it is also the feeling of starting a new chapter of my life: the Postdoc phase.</p>
<hr />
<p>On October 30, 2020 — to the day exactly four years and 51 weeks ago — I published <a href="https://www.hendrik-erz.de/post/new-roads">the first article on this website</a>, titled “New Roads”:</p>
<blockquote>
<p>One large suitcase and a ticket to Stockholm: This is all I take with me to begin my PhD in Analytical Sociology at Linköpings Universitet.</p>
</blockquote>
<p>Now, almost five years later, I know all of these roads in and out. They are no longer new. They have become familiar roads. It is time to review the past five years.</p>
<p>Let me start directly with this website itself. I have promised the readers of this website — that’s you! — that I would be aiming at writing one article per week. Including this one, I have written 125 articles, and it has been 259 weeks since I started my PhD. This means that, while I was not able to keep my promise, I did manage to publish roughly once every fortnight. Which is still impressive, given that in that time I had to write a dissertation.</p>
<p>Five years is quite a lot of time (259 weeks, or 1,813 days), and so I managed to do more than just write some articles here. I also published three papers (<a href="https://www.ssoar.info/ssoar/handle/document/99217">[1]</a>, <a href="https://dl.acm.org/doi/full/10.1145/3703465.3703475">[2]</a>, <a href="https://www.sciencedirect.com/science/article/pii/S2543925124000287">[3]</a>), helped organize three conferences (<a href="https://liu.se/dfsmedia/dd35e243dfb7406993c1815aaf88a675/85352-source/schedule-sessions-2024-05-05">NSA 2024</a>, <a href="https://www.ic2s2-2025.org/">IC2S2 2025</a>, <a href="https://liu.se/en/event/eusn-2026">EUSN 2026</a>), became the master of disaster (“webmaster”) for the International Network of Analytical Sociologists<sup id="fnref:2"><a class="footnote-ref" href="#fn:2" role="doc-noteref">1</a></sup>, became co-organizer of the <a href="https://liu.se/en/research/research-school-in-computational-social-science">Swedish Interdisciplinary Research School in Computational Social Science (SIRCSS)</a>, co-supervised three master students, and became a reviewer for <em>Network Science</em>.</p>
<p>I have done my fair share of professional service, and I do it happily. I have built a network of amazing colleagues with whom I share interests, work, methods, and theoretical approaches. I have been able to visit the U.S. two times (Princeton in 2023, NYC in 2025) and Canada once (Montréal in 2024). And I became familiar with a foreign country that, in some respects, by now feels closer to home than my country of origin. I learned the language and the customs, and have made many friends along the way.</p>
<hr />
<p>This leads to another piece of reflection on my dissertation. During my defense, a colleague asked me an intriguing question: “If you were able to start your dissertation all over with the knowledge you have now, would you do anything differently?” I thought for a few seconds, trying to recollect the past five years, and think of an appropriate answer. After a long pause, I answered: “No.”</p>
<p>Later this week, a few colleagues were discussing my defense, and explained to me their amusement at this answer. When I asked them why, they told me that they read this question as one which one may use to demonstrate growth over the dissertation. They would’ve answered the question differently, highlighting potential mistakes they would like to avoid, etc.</p>
<p>I stand by my answer, however. I would have done nothing differently. At least not consciously. Of course, having lived through the experience of a PhD, I now have much more knowledge than before, and I will automatically avoid some of my earlier mistakes. But I believe this is not the point of a PhD. At least not the only one.</p>
<p>I understand a PhD to be an exercise in both doing research and acquiring a decisive set of cultural norms. Of course one is expected to publish an entire book at the end of the process. And of course one should be able to demonstrate that one is capable of performing independent research, publishing papers, and contributing to their own field. But here’s the thing: Many of these things are not instances of some mechanistic knowledge. It is culture.<sup id="fnref:3"><a class="footnote-ref" href="#fn:3" role="doc-noteref">2</a></sup></p>
<p>How does one publish a paper? Well, for this one needs to know how to write one and perform the required research for this. But the paper writing process starts with coming up with a research question. And already this initial part of doing research requires a lot of unwritten, cultural knowledge. What is a <em>good</em> research question? What is one that will likely be accepted at the best journals of the field? And what even <em>are</em> the best journals of the field?</p>
<p>All of these are questions one can only answer if one interacts with one’s own colleagues and peers; people one cherishes because of shared research interests and approaches to society. There is no walkthrough guide for becoming a researcher. And this is a crucial point: one can learn in a structured way how to do research. How to wield the tools available to the field; the different theoretical schools; and what the field focuses on more broadly. But what one cannot learn through reading and studying alone is all the small details. Things of seemingly no importance whatsoever, but which turn out to be the greatest helpers in succeeding in academia.</p>
<hr />
<p>Incidentally, this insight is not new to me. After only three weeks of doing my PhD, <a href="https://www.hendrik-erz.de/post/responsibility">I wrote in an article</a>:</p>
<blockquote>
<p>“Totes Wissen,” or <em>dead knowledge</em> is something you acquire just by hearing or reading something. Dead knowledge are the countless stories of PhD students who’ve thankfully shared their experiences with others like me so that we had an idea of what to prepare for. “Lebendiges Wissen,” or <em>living knowledge</em>, on the other hand, is such knowledge you don’t just possess based on reading, but your own experience as well. It is knowledge that has reified itself. You just <em>know</em> when “dead” knowledge has become “alive.”</p>
</blockquote>
<p>And this is, I believe, the biggest feat one achieves by finishing a PhD: Turning mountains upon mountains of “dead knowledge” into “living knowledge.”</p>
<p>To achieve this feat, however, intelligence is not as relevant as one may think. What one needs to learn the cultural norms, the small things, the seemingly inconsequential distinctions, is not intelligence, but an open mind. One needs to embrace the PhD, the field, the research, the journey.</p>
<p>When I started my PhD, I was full of gleeful happiness — and this in the midst of a global pandemic. I <em>wanted</em> to do a PhD. I wanted to be a good student, and become a better researcher. Throughout the past five years, I took every opportunity to learn the ropes of the trade, to network, and to <em>experience</em> academia.</p>
<p>No, I wouldn’t have done anything differently. A PhD is a journey one cannot speed up. This is why it takes so long to finish it. I believe that now I could speed up writing an entire book. But I don’t think that this is the point of being a PhD student.</p>
<p>Just as going to school and university are rites of passage, finishing a PhD is a rite of passage, too. When you finish school, you are not just educated, you are also a full member of society. When you finish university, you are not just an expert in a scientific field, but also an academic. And when you finish a PhD, you are not just at the forefront of a particular part of your scientific field, but also a scientist.</p>
<hr />
<p>The turn from PhD student to researcher is gradual. You don’t suddenly wake up on the morning of your defense, and turn from PhD student into researcher. No, when the morning of your defense dawns, you already <em>are</em> a researcher. The defense does not mark the transition from student to researcher; it signifies its end. The evolution from student into researcher is a slow process that happens during the PhD. And a large part of the reason is that, along the way, you acquire the aforementioned cultural knowledge.</p>
<p>There are only very few reports of PhD students who fail their defense. Many explain that the primary reason is that your supervisor would never let you proceed to the defense stage of your PhD if you were not ready. But when <em>are</em> you ready? For the first four years of my PhD, I strongly believed this to be a mere function of your research output. I thought that your supervisor would occasionally take a look at your research output and, once it looked like three proper papers in a trench coat, they would tell you to go on and defend it.</p>
<p>But early this year, I realized that this was completely wrong. No, your supervisor will let you proceed to the defense only once they believe you have become an independent researcher. Once they are certain you have learned all the rules of the game. Of course, they won’t let you defend if the research itself is not ready. But they will take into account your personal character development, and maybe sometimes even weigh it more than your research.</p>
<p>I realized this because I stopped <em>feeling</em> like a student. I started to feel more and more like I knew what I had to do. And this was when I realized that it is much more important to embrace the journey in its entirety. After I stopped feeling like a student, the speed of my research increased by orders of magnitude, and this is something that my supervisors as well as colleagues kept on telling me. Until November of last year I had barely one and a half papers, and even less of an idea of the bigger picture. Only a few months later, I suddenly had three papers, and an introductory chapter that showed a much clearer vision of my work.</p>
<p>You probably can write a PhD without becoming an independent researcher. But I don’t believe this should be your goal. I believe that it is crucial to not treat your PhD as pure employment, but also as the educational journey that it is. Yes, we are being paid to do that work. But we shall never forget that a PhD is still part of one’s education. We are not being paid to simply apply our knowledge to increase some company’s profits. We are being paid to <em>learn</em>.</p>
<hr />
<p>I still can’t believe that it’s over now. I still feel as if I only started a few weeks ago. But it has been five years. It has been an amazing journey, and I would never exchange this experience for anything else in the world.</p>
<p>While I slowly recover from the stress, I remain deeply grateful. I am thankful to so many people who enabled me to have this experience, who helped me along the way, who supported me, shared insight, made me laugh, and without whom this journey would’ve been a <em>tristesse</em>. “It takes a village to write a PhD,” and I feel privileged for the particular village I ended up being in. Thank you all.</p>
<p>As the saying goes:</p>
<p><em>Tack så mycket, and thanks for all the fish</em>.</p>
<div class="footnotes" role="doc-endnotes"><hr /><ol><li class="footnote" id="fn:2" role="doc-endnote"><p><a href="https://www.analyticalsociology.com/about/council">https://www.analyticalsociology.com/about/council</a>.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:2" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:3" role="doc-endnote"><p>To be fair, I’m a cultural sociologist, so I’m biased towards seeing culture everywhere.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:3" role="doc-backlink">↩</a></p></li></ol></div>]]>
  </content>
</entry>
<entry>
  <title>What is Analytical Sociology?</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2025, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/what-is-analytical-sociology" />
  <id>https://www.hendrik-erz.de/post/what-is-analytical-sociology</id>
  <published>2025-10-17T21:00:00+00:00</published>
  <updated>2026-02-04T10:14:40+00:00</updated>
  <summary type="html"><![CDATA[This might be one of the hardest articles I have ever written. But it answers a seemingly simple question: What exactly is analytical sociology? The answer, it turns out, is surprisingly hard, and this article, too, is unable to answer it conclusively.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/what-is-analytical-sociology">
    <![CDATA[<p>In the last reflection article on my dissertation, I answered the most recent question that has been posed to me. Today, I want to revisit one of the oldest questions that someone has asked me.</p>
<p>I remember it vividly: It was December 2020, the pandemic still raging globally, and we were sitting in the traditional go-to bar of IAS, Ölstugan Tullen in Norrköping.<sup id="fnref:1"><a class="footnote-ref" href="#fn:1" role="doc-noteref">1</a></sup> It was me, a few coworkers, and the question-asker: the first PhD graduate of the Institute for Analytical Sociology, Alex Giménez de la Prada. Back then, he was almost finished with his thesis and set to successfully defend a few months later. We were sitting with a beer, and I was asking what the overarching goal of a PhD at the IAS would be. He answered: “What is analytical sociology? This is the big question that you should be answering with your PhD.”</p>
<p>I already suspected that this harmless-sounding question would probably turn out to be an entire can of worms. And why wouldn’t it? It’s usually the innocent questions that turn out to be the monster under your bed. And since he told me that my entire PhD (that is, four entire years) should be dedicated to answering this, I took it with the appropriate amount of respect.</p>
<p>Over the following years, I had many discussions with colleagues and friends about analytical sociology. I am especially thankful to Rodrigo Martínez Peña, with whom I discussed this question for years until he also successfully defended his PhD two years ago. It felt almost like a treasure hunt. But a treasure hunt without an end goal.</p>
<p>So, what exactly <em>is</em> analytical sociology? Do I have an answer, after five years?</p>
<h2>The Core Tenets of Analytical Sociology</h2>
<p>Having observed the field and the research emanating from there over the past five years, I believe there are five “core tenets” of analytical sociology, if you will. Five things that mark something as a piece in analytical sociology. These stem largely from the four central theoretical tomes on analytical sociology, released in 1998,<sup id="fnref:2"><a class="footnote-ref" href="#fn:2" role="doc-noteref">2</a></sup> 2005,<sup id="fnref:3"><a class="footnote-ref" href="#fn:3" role="doc-noteref">3</a></sup> 2009,<sup id="fnref:4"><a class="footnote-ref" href="#fn:4" role="doc-noteref">4</a></sup> and 2021.<sup id="fnref:5"><a class="footnote-ref" href="#fn:5" role="doc-noteref">5</a></sup> But they are also based on recent developments starting only in the late 2010s, such as the emergence of computational social science, and on my own observations.</p>
<p>The central hallmark of analytical sociology is certainly its focus on methodological individualism, “the striving for explaining social phenomena (mostly) without reference to structures and institutions.”<sup id="fnref:6"><a class="footnote-ref" href="#fn:6" role="doc-noteref">6</a></sup> Essentially, methodological individualism posits that, in order to understand <em>why</em> society works the way it does, we must look at the individuals comprising it, and not some macro-phenomenon we may imbue with meaning.</p>
<p>At the core of methodological individualism sits a specific tool that offers a <em>visual</em> icon like no other to identify a methodological individualist: the Coleman boat, or (in the German context), Coleman’s bathtub.<sup id="fnref:7"><a class="footnote-ref" href="#fn:7" role="doc-noteref">7</a></sup> The Coleman boat describes a way to think about social mechanisms, in which one can explain macro-phenomena through their micro-foundations.</p>
<p>It is clear that the Coleman boat is merely a <em>tool</em> to think about social phenomena, not the one and only description. Because it is possible to define a Coleman boat for any time period from minutes to centuries, and for a variety of use cases, it is a great analytical tool. But this flexibility also comes at the expense of ease of understanding.</p>
<p>This is one of the biggest issues that plague any new student of analytical sociology, including me. Where should one start? And what should I include in my theory, and what should I exclude? These are all very hard analytical questions, and part of what makes analytical sociology “analytical” in the first place.</p>
<p>The Coleman boat seamlessly leads into another core feature of what people globally associate with “analytical sociology”: agent-based models (ABMs). Once you accept the premise that society only works via micro-foundations, and that it is the combination of individuals and their actions that leads to the emergence of macro-phenomena, you enter the realm of ABMs.</p>
<p>Some believe ABMs to be a core feature of analytical sociology because of the way they map onto the epistemological foundations of the Coleman boat.<sup id="fnref:8"><a class="footnote-ref" href="#fn:8" role="doc-noteref">8</a></sup> An ABM is a simulation in which the researcher defines a set of actors and a set of actions they can perform, and observes what macro-patterns emerge from there. The most famous and “OG” ABM is probably Thomas Schelling’s segregation model.<sup id="fnref:9"><a class="footnote-ref" href="#fn:9" role="doc-noteref">9</a></sup> Indeed, it is one of the first <em>visual</em> demonstrations of individual-level behavior leading to emergent macro-phenomena any student of analytical sociology is exposed to.</p>
<p>The appeal of ABMs for analytical sociology is that they allow for testing highly specific mechanisms for how society works. They require a Coleman boat and a definition for each of the four nodes in it, and then one can just run a simulation to see if the theoretical hypothesis holds.<sup id="fnref:10"><a class="footnote-ref" href="#fn:10" role="doc-noteref">10</a></sup></p>
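<p>To make this description a little more concrete, here is a minimal sketch of a Schelling-style segregation model in Python. It is an illustration under simplifying assumptions, not Schelling’s original setup and not anyone’s research code: the function names (<code>make_grid</code>, <code>similar_share</code>, <code>step</code>, <code>run</code>), the wrap-around grid, the parameter values, and the rule that unhappy agents jump to a random vacant cell are all choices made for this example.</p>

```python
import random

EMPTY = 0  # marker for a vacant cell; agents are of type 1 or 2


def make_grid(size, vacancy, rng):
    """Fill a size x size grid with two agent types and some vacant cells."""
    cells = []
    for _ in range(size * size):
        if rng.random() < vacancy:
            cells.append(EMPTY)
        else:
            cells.append(rng.choice((1, 2)))
    return [cells[r * size:(r + 1) * size] for r in range(size)]


def similar_share(grid, row, col):
    """Share of occupied neighbors (8 surrounding cells, wrapping at the edges)
    that have the same type as the agent at (row, col)."""
    size = len(grid)
    me = grid[row][col]
    same = 0
    occupied = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            other = grid[(row + dr) % size][(col + dc) % size]
            if other != EMPTY:
                occupied += 1
                same += other == me
    # An agent without any neighbors counts as content.
    return same / occupied if occupied else 1.0


def mean_similarity(grid):
    """Macro-level outcome: average same-type neighbor share over all agents."""
    size = len(grid)
    shares = [similar_share(grid, r, c)
              for r in range(size) for c in range(size)
              if grid[r][c] != EMPTY]
    return sum(shares) / len(shares)


def step(grid, threshold, rng):
    """Micro-level rule: every agent whose same-type share is below the
    threshold moves to a random vacant cell. Returns the number of unhappy
    agents found at the start of the step."""
    size = len(grid)
    positions = [(r, c) for r in range(size) for c in range(size)]
    unhappy = [(r, c) for r, c in positions
               if grid[r][c] != EMPTY and similar_share(grid, r, c) < threshold]
    vacant = [(r, c) for r, c in positions if grid[r][c] == EMPTY]
    rng.shuffle(unhappy)
    for r, c in unhappy:
        if not vacant:
            break
        i = rng.randrange(len(vacant))
        new_r, new_c = vacant[i]
        grid[new_r][new_c] = grid[r][c]
        grid[r][c] = EMPTY
        vacant[i] = (r, c)  # the old position is now free
    return len(unhappy)


def run(size=20, vacancy=0.2, threshold=0.4, max_steps=100, seed=1):
    """Run the simulation; return mean similarity before and after."""
    rng = random.Random(seed)
    grid = make_grid(size, vacancy, rng)
    before = mean_similarity(grid)
    for _ in range(max_steps):
        if step(grid, threshold, rng) == 0:
            break  # everyone is content, nothing left to do
    return before, mean_similarity(grid)
```

<p>Calling <code>run()</code> returns the average share of same-type neighbors before and after the simulation. The point of the exercise is the gap between the two numbers: the agents only follow a mild individual-level rule, yet the macro-level pattern that emerges is typically far more segregated than the random starting point.</p>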
<p>This leads to a fourth tenet of analytical sociology: its use of middle-range theories. In a short chapter in the 2009 book, Peter Hedström and Lars Udehn explain what they understand as “middle-range.” Essentially, they occupy a middle space between the “grand theories” of the mid-twentieth century and very particular “mini-theories.” The idea is that sociology should only aim to explain a single phenomenon, using a Coleman boat approach and possibly an ABM, and neither try to limit its theory to a particular instantiation of the phenomenon, nor try to make general claims about a wide range of different instantiations of this and similar phenomena.</p>
<p>The fifth, and final, core tenet is relatively new, and it might be debatable in what way it actually constitutes a core feature of analytical sociology. In 2009, a paper by David Lazer et al. made headlines.<sup id="fnref:11"><a class="footnote-ref" href="#fn:11" role="doc-noteref">11</a></sup> Published in <em>Science</em>, it was one of the first social scientific papers that recognized and embraced the “age of big data.” The authors argue that, with the new surge of available data, and increasing capabilities of computers, social scientists could tap into a data source of a never-before-known resolution. Instead of relying on large-scale surveys, (field) experiments, or qualitative efforts, social scientists could just take a look at the massive amounts of data generated by human interactions globally to understand how society works.</p>
<p>This revolution has led to an increase in the computational requirements of analytical sociology. Today, I know only a few colleagues who perform analytical sociology without the need for high-performance computing (HPC) clusters. Almost all of my colleagues, including me, know of at least one period where we had to leave our computers running for days waiting for an analysis to conclude. Even our master’s students at the institute (who, coincidentally, are formally studying computational social science) all know of this. And once you realize that students usually don’t have the money for powerful laptops, you begin to understand the frustration they sometimes have to go through.</p>
<h2>But What is Analytical Sociology, Really?</h2>
<p>But none of this really explains what <em>analytical</em> sociology is. Methodological individualism and the Coleman boat were around way before the first book on analytical sociology. The same holds for agent-based models and computationally heavy workloads. Middle-range theories even stem from the times of Robert K. Merton. None of this really tells us what makes analytical sociology … well, <em>analytical</em>.</p>
<p>I believe that an answer to this question requires an indirect approach. There is no singular definition of what analytical sociology is. Indeed, if you take a look at the literature, many scholars have attempted to define analytical sociology, and there is still no consensus. Some have called it a “superfluous revolution,” others obsess over ontological problems in the practical application of it, and even the godfather of the field, Peter Hedström, continuously tweaks the definition of analytical sociology.</p>
<p>Instead of providing a direct answer, I will follow the lead of my university’s dean, who has — in my opinion — asked in the correct way: Does analytical sociology <em>solve the problem of lawmaking</em>?</p>
<p>Essentially, therefore, this article cannot answer the question generally, but by means of my dissertation. By doing so, this entire article is necessarily incomplete and only one piece to the puzzle. It is based on my personal experiences, having spent five years at the IAS. Due to the simple fact that any observer might see it differently,<sup id="fnref:12"><a class="footnote-ref" href="#fn:12" role="doc-noteref">12</a></sup> there cannot be a final answer to the question of what analytical sociology is or is not.</p>
<h2>Does Analytical Sociology Solve the Issue of Lawmaking?</h2>
<p>I personally believe that having access to the methodological toolkit of analytical sociology has been immensely beneficial. I cannot count how often I have sketched a Coleman boat in the past five years when trying to understand a particular phenomenon related to lawmaking. Also, simply being forced to take an individual-level perspective has made it much simpler to reason about lawmaking processes. It has prevented me from falling into the trap of treating the entire U.S. legislature as one large “black box,” as some scholars tend to do.<sup id="fnref:13"><a class="footnote-ref" href="#fn:13" role="doc-noteref">13</a></sup></p>
<p>But I didn’t make use of the entirety of the toolbox of analytical sociology. I did not run an ABM, I did not perform simulations, and as such I was unable to perform proper in-depth causal analyses of the data. This also had to do with the fact that text data does not lend itself easily to analytical sociology. Analytical sociology was developed in an age when all that social scientists had available to them were simple, straightforward behavioral datasets. Text data is not simply behavioral data. And this really is an issue.</p>
<p>I think that my dissertation is a testament to how difficult it can be to transform textual data in such a way that standard approaches to sociology become possible. And I further believe that much of the current deluge of text analysis that emerges in the field of computational social science is neat, but it also severely lacks the analytical depth required for rigorous research. Many of the papers that currently float on the hype wave will likely prove to be unreproducible. Also, if you think about them more deeply, you will realize that many of these papers cannot provide us with any form of mechanism for what makes society work. They have a certain form of … theoretical shallowness to them.</p>
<p>Turning back to the core question, I also believe that analytical sociology sometimes does <em>not</em> help us in understanding the issue of lawmaking. I have found that, in order to understand the discursive dynamics of U.S. Congress, I had to take recourse to more traditional political science frameworks. And this includes something analytical sociology tries to avoid at all costs: referencing macrostructures as part of the explanation.</p>
<p>You see, analytical sociology sees individuals as the atomic movers of society, the fundamental force of the universe. For analytical sociology, at least in a strict sense, there cannot be any macrostructure that isn’t explainable as the emergent outcome of dozens or thousands of individual interactions. And if you think about what this presumed ontology of society means in practice, you cannot escape a very particular quote manifesting in your head: “There is no such thing as society. Only individuals” (and families).</p>
<p>And maybe this is wholly true. Maybe it <em>was</em> a mistake of Hedström to renounce the necessity for accessing the mental states of individuals in order to arrive at theoretically solid causal mechanisms.<sup id="fnref:14"><a class="footnote-ref" href="#fn:14" role="doc-noteref">14</a></sup> But I would argue that, in practice, it is simply impossible to fulfill the requirements of a strict, narrow, causal form of analytical sociology.</p>
<p>And the primary reason is that we simply cannot have all data available to us. This is one of the – in my opinion – primary insights I gained over the past five years. I always struggled to place my work in the framework of analytical sociology, and only this year did I realize why this is the case. Data on politics, and on lawmaking more specifically, are always incomplete. We <em>cannot</em> ever have full information on these processes. This insight emerged from my inability to include parties in my work. As I write in my dissertation introduction:</p>
<blockquote>
<p>Indeed, the epistemologically hazy foundations of party power in U.S. Congress is one of the primary reasons why this thesis must subscribe to the “weak” form of methodological individualism: Parties absolutely do wield influence over the lawmaking process, but most of this influence happens via backroom-politics, that is: off the record. Conversely, this thesis has only access to sources that were deliberately put on the record. Thus, in order to account for party power, the essays of this thesis have to utilize proxy measurements wherever necessary. This includes controlling for party affiliation, committee assignments, or seniority. However, all of these controls need to measure the party as a single macro-institution, without being able to track individual behavior. None of these controls will accurately measure the influence parties have over the representatives. This becomes evident in the final essay of this thesis, which resorts to measuring party pressure via the standard error of an OLS regression model (Nokken &amp; Poole, 2004). It serves as a reminder that parties can remain hidden in plain sight.</p>
</blockquote>
<p>So, in the final analysis, I have to conclude that analytical sociology does not, in fact, solve the issue of lawmaking. It provides a plethora of tools, and it makes it a thousand times easier to think about lawmaking. It has helped me tremendously to make sense of the data that I have, and figure out how lawmaking processes work. But ultimately, analytical sociology fell short of helping me cross the final threshold. Because, in the end, proper analytical sociology requires access to a perfect database.</p>
<h2>Final Thoughts</h2>
<p>I would call myself an analytical sociologist. I believe that analytical sociology can help us understand society to a degree that is impossible with many other approaches. It offers unprecedented resolution. And for this reason, I believe it is the single most suitable approach to work on lawmaking processes.</p>
<p>But analytical sociology is no silver bullet. Strong methodological individualism imposes strict requirements on both data and analytical approach. It requires both removing a lot of the social grit that makes society interesting to study in the first place, and at the same time almost perfect data availability. And lawmaking can fulfill neither. Lawmaking is influenced by many factors, all of which can become quite important, including mere <em>cohabitation</em>. At the same time, a lot of lawmaking happens behind closed doors, where one of the decisive factors is that no data is being produced.</p>
<p>While I believe that analytical sociology is an incredibly powerful approach to society, lawmaking, and text analysis more specifically, demonstrate its limits. Not everything can be answered by analytical sociology. And methodological and theoretical pluralism in the social sciences is really the only answer if one aims to deliver a holistic understanding of society. Sometimes, analytical sociology fails. But I see this as an opportunity.</p>
<p>So, what is analytical sociology? It is many things. And for different scholars, it means different things. The only thing I know after five years is that it offers amazing resolution that I would never want to miss.</p>
<div class="footnotes" role="doc-endnotes"><hr /><ol><li class="footnote" id="fn:1" role="doc-endnote"><p>Reminder: Sweden implemented the “YOLO” pandemic protocol.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:1" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:2" role="doc-endnote"><p>Hedström, P., &amp; Swedberg, R. (1998). <em>Social Mechanisms: An Analytical Approach to Social Theory</em>. Cambridge University Press.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:2" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:3" role="doc-endnote"><p>Hedström, P. (2005). <em>Dissecting the Social: On the Principles of Analytical Sociology</em>. Cambridge University Press.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:3" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:4" role="doc-endnote"><p>Hedström, P., &amp; Bearman, P. S. (Eds.). (2009). <em>The Oxford Handbook of Analytical Sociology</em>. Oxford University Press.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:4" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:5" role="doc-endnote"><p>Manzo, G. (2021). <em>Research Handbook on Analytical Sociology</em>. Edward Elgar Publishing. <a href="https://doi.org/10.4337/9781789906851">https://doi.org/10.4337/9781789906851</a>&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:5" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:6" role="doc-endnote"><p>Erz, H. (2025). <em>On the Record: Understanding a Century of Congressional Lawmaking through Speech and Vote Behavior</em> [PhD Thesis, Linköping University]. <a href="https://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-217773">https://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-217773</a>, page 7.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:6" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:7" role="doc-endnote"><p>To quote my supervisor Jacob Habinek: “It’s either the Coleman boat or the Coleman bathtub, but we don’t know because James never drew any water.”&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:7" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:8" role="doc-endnote"><p>See Hedström 2005, Coda.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:8" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:9" role="doc-endnote"><p>Schelling, T. C. (1978). <em>Micromotives and macrobehavior</em> (1st ed.). Norton.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:9" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:10" role="doc-endnote"><p>Disclaimer, before anyone gets mad: I have never run an actual ABM to test a mechanism-based hypothesis yet, and this is merely based on close contact with colleagues who did. It’s probably more involved than I make it appear here.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:10" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:11" role="doc-endnote"><p>Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabasi, A. L., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D., &amp; van Alstyne, M. (2009). Life in the network: The coming age of computational social science. <em>Science</em>, <em>323</em>(5915), 721–723. <a href="https://doi.org/10.1126/science.1167742">https://doi.org/10.1126/science.1167742</a>&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:11" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:12" role="doc-endnote"><p>Luhmann, N. (1992). <em>Beobachtungen der Moderne</em>. Westdeutscher Verlag.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:12" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:13" role="doc-endnote"><p>Looking at you, international relations.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:13" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:14" role="doc-endnote"><p>Opp, K.-D. (2024). The recent turn in analytical sociology: The dismissal of general theories, mental states, and analytic philosophy – and the old issue of mechanism explanations. <em>Social Science Information</em>, <em>63</em>(2), 131–154. <a href="https://doi.org/10.1177/05390184241247724">https://doi.org/10.1177/05390184241247724</a>&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:14" role="doc-backlink">↩</a></p></li></ol></div>]]>
  </content>
</entry>
<entry>
  <title>Between Theory and Methods</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2025, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/between-theory-and-methods" />
  <id>https://www.hendrik-erz.de/post/between-theory-and-methods</id>
  <published>2025-10-06T10:00:00+00:00</published>
  <updated>2026-02-04T10:14:46+00:00</updated>
  <summary type="html"><![CDATA[In this second reflection article on my dissertation, I talk about theory. I explore my theoretical origins, why I am no longer a theory guy, and how the PhD journey over the past five years has changed the way I approach and write theory. I reflect on the style of theory, and why it needs to differ between theoretically heavy and methodologically heavy papers, as both parts need to match each other.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/between-theory-and-methods">
    <![CDATA[<p>In this article, I wish to answer a question that has been posed by a colleague almost immediately after I submitted my PhD thesis: which parts of the various papers did I enjoy writing the most, and which one did I enjoy the least? It is a natural question and one that I instinctively believed to be fairly easy.</p>
<p>I thought for a few seconds, and then responded that I couldn’t really put my finger on it, and that I probably enjoyed everything somewhat equally. However, I noticed an oddity. I did enjoy writing the chronologically first and third papers more than the second one.</p>
<p>I struggled for about a week to find a proper hook for this article, until, in the dark of night, in a heavily delayed German train, I had the necessary insight. Yes, I did enjoy writing all my papers, but I enjoyed the second paper a little less because it fell into a time when something in my writing changed: I went from writing pure theory pieces to writing methods papers.</p>
<p>At first, my writing was heavily influenced by the more classical, German sociological tradition. It shows in my chronologically first paper, as it is somewhat sparse in methods, but rich in theory. As I grew accustomed to the toolbox of analytical sociology and computational social science, however, this began to change. The chronologically second paper falls into this twilight zone, where my writing adapted to accommodate more complex methodological setups, but wasn’t quite there yet. And the chronologically last paper in my dissertation then manages the switch.<sup id="fnref:1"><a class="footnote-ref" href="#fn:1" role="doc-noteref">1</a></sup></p>
<p>This is what I want to focus on today: My academic provenance in theory, the acquisition of methodological skills during my PhD, and how this has completely changed how I do research and what I enjoy about it.</p>
<h2>Theory and the Historical Sciences</h2>
<p>During my B.A. and M.A., I heavily focused on historical and social theory, and until I joined IAS, this didn’t change much. This was not least due to the fact that I simply lacked methodological experience, and felt uncomfortable with having to perform analyses that others could <em>trust</em>. I knew the basics of survey methodology, and what an OLS regression looks like. I learned about PCA, correspondence and factor analysis,<sup id="fnref:2"><a class="footnote-ref" href="#fn:2" role="doc-noteref">2</a></sup> but neither did I ever work with actual data, nor did I have a feeling for what an OLS was telling me.<sup id="fnref:3"><a class="footnote-ref" href="#fn:3" role="doc-noteref">3</a></sup></p>
<p>During my B.A., theoretical work was essentially built-in, because that’s what history is about: collecting historical evidence and then building a theory that can explain, based on this evidence, how historical events likely unfolded.<sup id="fnref:4"><a class="footnote-ref" href="#fn:4" role="doc-noteref">4</a></sup> In history, theorizing about past events is really the only way to tell a cohesive story, because historical evidence does not lend itself to straightforward causal claims. But very quickly I realized that purely historical methodology is inadequate for explaining how history really happened.</p>
<p>The inflection point for this insight came in a seminar on medieval history. In my term paper, I was tasked with exploring the founding history of the diocese of Verden, a small German town close to the city of Bremen. Like with any other source of data, the farther you go into the past, the less reliable the data becomes. And we’re talking about the European early Middle Ages (around 800 AD), so you can probably imagine how bad it was. By 2013, when I took the course, diplomatics researchers had already done a good job at sifting through the archival evidence, and we had a good understanding of the available facts. But those facts were so sparse that <em>any</em> statement about what might have happened back then amounted to little more than tea-leaf reading.</p>
<p>So I decided to throw in a bit of sociological theory to see if it stuck. I chose neo-institutionalism,<sup id="fnref:5"><a class="footnote-ref" href="#fn:5" role="doc-noteref">5</a></sup> because that was what I had just learned at that time. And indeed, by applying this theory to the data, a somewhat better picture of the founding history of that diocese emerged. What I didn’t think about, however, was that I was dealing with a medievalist, and not a sociologist. The term paper ended up being the worst-graded thing I wrote during my entire undergrad. However, when I applied the same theoretical framework to a similar historical episode in my bachelor thesis, I received a very decent A-. That confirmed my suspicion that my bad grade in the seminar was mostly due to my undergrad hubris and my failure to read the room, and not a fundamental flaw in my thinking.</p>
<p>At this point I realized that history was not my vocation. I was too interested in understanding social change and how society works, and had too little interest to continue on my path of becoming a historian. So, for my graduate studies, I switched to sociology.</p>
<h2>From Historical to Sociological Theory</h2>
<p>In the fall of 2014, I started my graduate studies in sociology. Or rather, as my university called it, “Societies, Globalization, and Development” (<em>Gesellschaften, Globalisierung und Entwicklung</em>, often abbreviated GGE). I realized only after I started the program that the reason we didn’t have a “Sociology” program was that the institute was simply lacking personnel for it.<sup id="fnref:6"><a class="footnote-ref" href="#fn:6" role="doc-noteref">6</a></sup></p>
<p>However, I am very happy that I received the GGE-package, and not a straightforward sociology package. The downside of the program was certainly its lack of methodological education, but this lack was compensated for by a huge amount of sociological theory and sheer breadth of knowledge. I took courses in international relations, domestic policy, area studies, systems theory, history of sociological science, and more. I was taught the primary schools in international relations, economic sociology, how finance works in Ethiopia and Bangladesh, and the history of Western sociology. In the evenings, I was a frequent guest at lectures from the philosophy department to bring myself up to speed on the philosophy of science.</p>
<p>All of this meant that I was well-equipped with broad world knowledge, but little methodological knowledge. There were a few methods courses that mostly revolved around surveys and simple OLS regression, but those made up only part of the study program. I received the biggest push in methods education only in the final year of my graduate studies, when I took a course on advanced methods. The closest I got to proper applied quantitative methods during that time was when I was employed in a project on food security in Ethiopia, which required me to wade through a few gigabytes of quantitative survey data using Stata.</p>
<p>What I am eternally grateful for is that the program helped me develop a good theoretical sense and critical thinking skills. This helped me publish my very first peer-reviewed paper in 2019 on crowd science and the sociology of violence.<sup id="fnref:7"><a class="footnote-ref" href="#fn:7" role="doc-noteref">7</a></sup> I ended the program with a 100-page theoretical thesis on the cycle of violence between British and U.S. imperialism in the Middle East and radical Islamism that culminated in the founding of ISIS.</p>
<h2>Entering the Lab</h2>
<p>With the start of my employment at the Institute for Analytical Sociology, the days of me being a theory-nerd were over. Now it was all about quantitative methods: data, precision, methodological rigor. And, more than some of my colleagues, I really had to sit down and learn. I lacked what colleagues from more traditional quantitative programs already brought to their PhD. Given that I made it this far, I appear to have been successful, although this is for my committee to decide.</p>
<p>The papers that comprise my dissertation are approximately chronologically ordered, and across all three, one can clearly see a trajectory of me acquiring methodological skills. The first paper, <em>Policymaking in Times of Crisis</em>, is very theory-heavy. I spent a lot of time scouring the qualitative literature to understand the case I was dealing with. The methodological part is comparatively short. In the end, it turns out to be the most straightforward methods framework of all my papers. I spend pages introducing and discussing the underlying theory, and then a few paragraphs on collecting data and testing my hypotheses using very simple fixed-effects OLS regressions.</p>
<p>My second paper, <em>Measuring Issue-Level Polarization</em>, is somewhat of a middle child. I got a bit more adventurous with my methods, and my theoretical part shrank substantially. However, the paper does not feel holistic. Methodologically, I was still learning, and the theory did not lend itself nicely to the approach. That paper was all about a methods-heavy approach where theory came second. It almost appears as if the paper’s methods and theory parts are in a form of “competition” for attention. I will come back to this below.</p>
<p>The third paper then, <em>Brittle Parties?</em>, appears to be better both in terms of methods and in terms of theory. Re-reading it now, I get the feeling that it’s a good example of what I learned during my PhD. The theory part is more solid than in paper number two, and it leads naturally to a set of measurements I need in order to answer my research question. The theory properly binds my research question to my methods. And the methods, in turn, are suitable to answer the questions. They neither attempt to look as fancy as possible, nor are they standard run-of-the-mill.</p>
<p>While none of them are perfect, I think the papers show an evolution from an abstract theoretical writing and research style to a methodologically heavy, quantitative one. And the more I think about it, the more I believe that it would be weird if all my papers were equally good. It appears almost “honest” to have a paper somewhere in the middle, which is neither my best nor my worst work, as it serves as evidence for the development that has happened in my thought processes over the past five years.</p>
<p>Looking back, it is interesting to see how my interests have completely changed. I still enjoy theory and thinking about concepts, but I am equally fascinated by applying state-of-the-art methods. At first, my theory was an end in itself. Now, it serves the purpose of answering a research question, and rests on equal footing with the methodological approach.</p>
<p>But it was a long journey. In the first two years of my PhD, I was happy if I got a simple linear regression going. At the same time, my colleagues were pushing for complex causal models, simulations, and agent-based models. Gradually, this changed. As I participated in more and more courses (one of the requirements for earning a Swedish PhD is accumulating 60 ECTS in course work<sup id="fnref:8"><a class="footnote-ref" href="#fn:8" role="doc-noteref">8</a></sup>), my confidence in applying methods grew. By now, I am comfortable applying run-of-the-mill statistics as well as top-notch text analysis methods, from LDA through word2vec to encoder models and generative AI.</p>
<p>There is one more thing, though. It is not just that I learned to weave methods into my theoretical thinking. In order to properly include quantitative methods in my research process, I had to change the way I approach theory.</p>
<h2>Theory ≠ Theory</h2>
<p>This brings me to a final point for today’s reflection. The way one writes theory, and how one approaches theoretical problems, is partly a function of the discipline, and partly a function of the methods. I mentioned that my second paper is a middle child. The reason for this is that the theoretical part does not appear to fully fit the methodological approach. In that paper, I used the same theory-heavy writing style as in my first paper, but then added a heavy methodological part onto it. Now I realize that the theory one writes for theory-papers is of a different kind than the theory required to write methodological papers.</p>
<p>Indeed, no theoretical approach is the same. Historical theory is fundamentally different from political theory, which in turn differs from sociological theory, which differs from the theory you need for a quantitative paper. I never excelled at historical or political theory; and I believe this is partly due to the fact that historical theory never fully “clicked” for me and that political theory has its very own language which I do not speak.</p>
<p>But even within sociology, theory can differ tremendously, both in form and content. Gabriel Abend has written an excellent piece on “The Meaning of Theory” (2008)<sup id="fnref:9"><a class="footnote-ref" href="#fn:9" role="doc-noteref">9</a></sup> that tries to disentangle the various notions of theory, and how to justify one’s own choice in a paper. Abend argues that which definition, or concept, of “theory” one uses is partly a political question because there is no “real or objective referent for ‘theory’” (p. 176). I argue that the way you write theory is partly defined by the type of paper you write, and is hence a function of the methodological approach.</p>
<p>In my paper on the sociology of violence, I rely on research literature to argue that the conceptual perspective of crowd science has affected the sociology of violence, and is visible in many concepts the latter uses. I spend many paragraphs arguing that the choice of words from the 19th century crowd science was derogatory and subjective, and that sociology can – and should – do better.</p>
<p>In my dissertation papers, I do not produce an argumentative chain to discuss concepts and their drawbacks. Instead, I focus on particular phenomena, and produce theoretical insight only insofar as it is relevant to the case at hand. Instead of weighing theoretical notions and their pros and cons, I draw on theory for a set of expectations and relevant concepts, and then let the data speak. However, and this is important, my methodological choices and my measurement instruments are all derived from theoretical insight. I use measurements not because they are convenient, but because theoretical accounts argue that they are the correct ones.</p>
<p>Doing so requires a learning process. The reason that the theory part of my chronologically first paper, <em>Policymaking in Times of Crisis</em>, looks substantial is that I had been working theoretically on the concept for a few years, so I already had a good grasp of it. The methods are fully in the service of the theoretical part, and are used merely as a test for some of the theoretical assumptions I make. The core of that paper is theory, and since I had ample experience with theory writing, this is a potential explanation for why it looks good (at least to me – again, the dissertation committee may come to a different conclusion).</p>
<p>The reason why my chronologically third paper, <em>Brittle Parties?</em>, appears (to me) to have an equally good match between methods and theory is that the type of theory used here is entirely different from the one I used in my first paper. While in the first paper my methods were in the service of my theory, in the third paper theory is in the service of my methods. Theory serves as the connecting thread that ties my research questions to my measurements and methodological challenges.</p>
<p>Both my first and third papers, therefore, feel well-rounded to me because the fit between theory and methods is better than in my second paper. The style of theory differs completely, but it matches the corresponding methods. Thus, the reason why my chronologically second paper, <em>Think Alike, Talk Alike?</em>, looks much weaker in comparison is that I wrote a theory part similar to that of my first paper, while at the same time applying methods like those in my third paper. Theory and methods thus do not match. This creates the impression that theory and methods are in some kind of competition for attention. What should the reader focus on? The theory? Or the methods? Or both? It’s hard to determine this just from looking at the outline.</p>
<h2>Final Thoughts</h2>
<p>So, what do I enjoy most — theory or methods? This might be the wrong question. No theory can survive without a methodological approach, regardless of whether it is qualitative or quantitative. Likewise, methods cannot live without theory, as has become clear during the debate on “post-theory” from the late 2000s. But depending on how one weighs each part, the style of theory must adapt. Writing theory-papers where the “data” is other research literature requires a different process than writing methods-papers.</p>
<p>This change that I lived through was subtle. Through my rich PhD education, I learned to value being able to test theories using quantitative data. Instead of restricting myself to theoretical thoughts in the proverbial ivory tower, I gained the ability to confront them with data. And while this means that my papers now have a large methodological part which necessarily makes the theoretical parts smaller, this approach comes with its own challenges. And this trajectory shows in my dissertation.</p>
<p>To answer the question: I enjoyed all parts of the writing process. What I enjoyed less was the at times painful learning process that enabled me to write proper research papers in the first place.</p>
<div class="footnotes" role="doc-endnotes"><hr /><ol><li class="footnote" id="fn:1" role="doc-endnote"><p>At least that’s what I hope. I haven’t defended it yet.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:1" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:2" role="doc-endnote"><p>This is primarily thanks to Prof. Andreas Schmitz, who joined my university during my graduate studies, and who did an amazing job in stuffing our brains with fundamental methodological knowledge in a short amount of time. More so, he will even be able to verify first-hand whether I learned properly as part of my dissertation committee. The world is small.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:2" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:3" role="doc-endnote"><p>Several years ago, I was in Stockholm together with my supervisor, and on the train ride back I voiced concerns about my regression models, and said that I felt uncomfortable with reporting their results. When he asked why, he was surprised to learn that I had never run a proper regression before.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:3" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:4" role="doc-endnote"><p>As our teachers never tired of telling us: Following Leopold von Ranke, we were supposed to tell history “wie es eigentlich gewesen” (“<em>as it really happened</em>”).&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:4" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:5" role="doc-endnote"><p>Hasse, R., Krücken, G., &amp; Meyer, J. (2005). <em>Neo-Institutionalismus</em> (2., vollst. überarb. Aufl). Transcript-Verl.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:5" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:6" role="doc-endnote"><p>Fortunately, things have changed. By now there is not only sufficient sociological personnel, but word has it that there are plans for splitting up the combined political science and sociology department into one for political science, and another one for sociology.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:6" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:7" role="doc-endnote"><p>Erz, H. (2019). Der lange Schatten von Gustave Le Bon. Zum sprachlichen Einfluss der Crowd Science auf die Soziologie der Gewalt. <em>Soziologiemagazin</em>, <em>2019</em>(2), 71–88. <a href="https://doi.org/10.3224/soz.v12i2.06">https://doi.org/10.3224/soz.v12i2.06</a>.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:7" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:8" role="doc-endnote"><p>I accidentally accumulated over 70, because I forgot to count.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:8" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:9" role="doc-endnote"><p>Abend, G. (2008). The Meaning of ‘Theory.’ <em>Sociological Theory</em>, <em>26</em>(2), 173–199. <a href="https://doi.org/10.1111/j.1467-9558.2008.00324.x">https://doi.org/10.1111/j.1467-9558.2008.00324.x</a>.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:9" role="doc-backlink">↩</a></p></li></ol></div>]]>
  </content>
</entry>
<entry>
  <title>Five Years of Studying U.S. Congress: What Remains?</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2025, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/five-years-of-studying-us-congress-what-remains" />
  <id>https://www.hendrik-erz.de/post/five-years-of-studying-us-congress-what-remains</id>
  <published>2025-09-26T10:00:00+00:00</published>
  <updated>2026-02-04T10:14:51+00:00</updated>
  <summary type="html"><![CDATA[I am almost done. A few days ago, my dissertation &quot;On the Record: Understanding a Century of Congressional Lawmaking through Speech and Vote Behavior&quot; was published. Now it is time to sit back, and reflect. With this article, I am beginning a series of articles that will answer many questions and contextualize findings from my PhD research.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/five-years-of-studying-us-congress-what-remains">
    <![CDATA[<p>A few days ago, my dissertation, titled “On the Record: Understanding a Century of Congressional Lawmaking through Speech and Vote Behavior” was published via Linköping University Press.<sup id="fnref:1"><a class="footnote-ref" href="#fn:1" role="doc-noteref">1</a></sup> As I am writing these lines, it is still about four weeks until I have to defend this thesis. I will have to answer what I have been doing these past five years, how I have been doing this, and what its results mean.</p>
<p>And these are very relevant questions. What exactly <em>is</em> it that I have been doing in these past five years? What did we learn? How did I contribute to the scientific field? It is a cold September night, and in trying to find some sleep, I realized that the best preparation for my defense may not be to simply re-read the entire thing, but to actually reflect on it, and to answer questions that others possibly also have. In turn, I may come to better understand for myself what it is that I have been doing, and what it means.</p>
<p>I don’t know how long these reflections will become, or how structured these articles are going to be. But what I want to do with this and any following articles is three things: (1) explain to myself what I have been doing the past five years; (2) explain to others why it is important; and (3) answer questions that colleagues and friends are asking me or that emerge while writing.</p>
<h2>Lawmaking in U.S. Congress</h2>
<p>Let me start with explaining what I have been doing. In essence, I study how the U.S. is making its laws, and how the interactions of various representatives can sometimes lead to unexpected policy outcomes. That’s it. This is the “Tweet my thesis”-version of my thesis.</p>
<p>But there is more, obviously. I take a close look at episodes in which lawmaking appears to diverge from its regular paths. I am not looking at episodes in which lawmaking works as it is supposed to. Instead, I look at crises, disagreement, and when laws get passed or rejected against all odds.</p>
<p>The core of my thesis consists of three distinct “irregularities,” if you so wish, of U.S. Congressional lawmaking. In one paper, I focus on what we mostly know under the term “neoliberal revolution.” In another one, I focus on the increasing party polarization, and try to understand how it plays out on the level of policy preferences. And, lastly, in a third paper I focus on dissent and breaking ranks by representatives during votes.</p>
<p>So, essentially, the correct answer to “What have I been doing the past five years?” would be: “To understand what happens when customs and regularities break down. What do legislators do when a crisis happens?”</p>
<!-- Deep down, this also satisfies a less scientific, but strong curiosity of mine. For the most part of my early life, I was surrounded by people who held the conviction that politicians are far removed from the lives of its population, and rarely do something that helps “the people.” I know that this is obviously not true, but it is also quite difficult to determine how it actually is. I believe that this was one subconscious driver of my research: trying to prove those people wrong, and understand where this misconception may come from. -->
<h2>How do Crises Help Understand Peacetime?</h2>
<p>But how can <em>crises</em> help us understand how a legislature works in general? Are crises not defined to be states of exception?<sup id="fnref:2"><a class="footnote-ref" href="#fn:2" role="doc-noteref">2</a></sup> The answer to this question is threefold.</p>
<p>First, “man is an animal of difference,”<sup id="fnref:3"><a class="footnote-ref" href="#fn:3" role="doc-noteref">3</a></sup> as Georg Simmel once put it in his essay “The Metropolis and Mental Life” (1903). In simple terms, this means that it is easier for us to understand how society works by looking at the “bumps” of history. Crises enable a comparison that is much harder to find in peacetime, because crises are precisely those times in which “[e]stablished cultural ends are jettisoned with apparent ease.”<sup id="fnref:4"><a class="footnote-ref" href="#fn:4" role="doc-noteref">4</a></sup> This implies two things: first, there must have been somewhat stable cultural ends which can actually get “jettisoned.” Second, comparing how social indicators change in this time of crisis gives us a measure of what changed, and in what direction.</p>
<p>This points to the second part of the answer to our question. Unlike in physics, there is no “absolute zero” (no baseline) in society. That means: when we attempt to take measurements of society, we often cannot tare them against some known baseline. The closest concept of a “baseline” we have would probably be “normality,” but this is a curious concept. We all intrinsically understand when something seems normal, but it is almost impossible to define.<sup id="fnref:5"><a class="footnote-ref" href="#fn:5" role="doc-noteref">5</a></sup> The same holds true for legislative processes. What exactly makes lawmaking “normal”? This is why shedding light on “abnormal” times can help. When we cannot take absolute measurements and have no way of knowing what a normal procedure looks like in our measurements, sudden changes can help us infer how the process might’ve worked in normal times. Now that I am writing this, I realize that essentially all of my colleagues who have already defended did the same thing. In all of their theses and papers, you will find instances of looking at abnormal events or phenomena in order to understand society.<sup id="fnref:6"><a class="footnote-ref" href="#fn:6" role="doc-noteref">6</a></sup></p>
<p>The third part of the answer is somewhat more mundane and innately practical: The U.S. is, as of today, one of the oldest continuous democracies in existence. And this means that “understanding Congressional lawmaking” involves making sense of the figurative “metric ton of data”<sup id="fnref:7"><a class="footnote-ref" href="#fn:7" role="doc-noteref">7</a></sup> that Congress has produced. The longest continuous stretch of time which is covered by the various data sources spans significantly more than a century (116 years). Amid this flood of data, it is difficult to find anchor points, and as such, choosing well-known episodes of turmoil is a safe strategy to be able to get a hold of the phenomenon of interest.</p>
<h2>So, What Did We Learn?</h2>
<p>This leads to a final question (for today, at least): What did I actually show in my dissertation? As is the case with compilation theses, as opposed to monographs, the results are fractured; divided into three papers that comprise the core of my thesis.</p>
<p>In the first paper, I look at the “neoliberal revolution” under President Ronald Reagan in the 1980s. I wanted to understand why U.S. Congress would suddenly shift gears to pass laws that would abrogate decades of Keynesian economics and replace them with a new (and at that point essentially still unproven) idea. Mind you, the idea of “neoliberalism” was very young even in the 1980s. However, it had already been taught at universities, and as such had been passed on to the highly educated, who would then go on to become representatives in U.S. Congress.</p>
<p>Much of the literature on neoliberalism in the U.S. focuses on the presidency, and as such the role of Congress has been culpably neglected. What this paper can show is a sudden uptick in the salience of Congressional economic speech during the especially disastrous 1970s. This appears to imply that Reagan received help not just from his party, but also from many highly educated Democrats. Many appeared to agree with him on economic grounds, since the ideas of tax cuts and budget reductions likely resonated with what they had learned in university.<sup id="fnref:8"><a class="footnote-ref" href="#fn:8" role="doc-noteref">8</a></sup> This might furthermore imply that, at least in the 1980s, the U.S. government worked the way it was intended, with the presidency and U.S. Congress in a mutual dependency.</p>
<p>Moving on to the second paper in my thesis, I focus on the phenomenon of polarization. Much of the political science literature since the 1980s has detected a sharp increase in party polarization in the U.S. What this means is that the two parties move apart, rendering bipartisan agreements on legislative initiatives less and less likely. Over the years, political scientists have linked this to many ills in the U.S. legislature. These range from frequent stalemates that may lead to government shutdowns to a generally hostile atmosphere, where politicians are less interested in mutual understanding, and more in personal gains.</p>
<p>However, most of the studies analyzing polarization rely on vote results. But vote results can only tell us how someone has voted, not why.<sup id="fnref:9"><a class="footnote-ref" href="#fn:9" role="doc-noteref">9</a></sup> This means that these studies tell us little about what may drive this increase in polarization. Indeed, the discussion on the causes of polarization is one of the most intense debates in political science I have witnessed. What I do in my paper is look at a neglected source of information in this realm: speeches. What the paper can show is that, as we zoom in on individual issues, polarizing trends disappear. While polarization still occurs when taking all speeches into account, this implies that polarization is a macro-phenomenon that may not solely be caused by partisan identities and extreme politicians pulling the parties apart. What might be at play here are institutional pressures that produce polarizing trends against the representatives’ preferences.</p>
<p>Indeed, what has received surprisingly little quantitative attention are the institutions that dominate Congressional life — parties, whips, and committees. There are great qualitative works, do not misunderstand me,<sup id="fnref:10"><a class="footnote-ref" href="#fn:10" role="doc-noteref">10</a></sup> but what is lacking is a quantitative understanding of how these institutions interact with lawmaking processes. We know that parties have a huge impact on which legislation is being passed, and that committees decide on life and death of bills. But we know little about how they do so.</p>
<p>This is what I look at in my third paper. Here, I focus on what happens when representatives disagree with their party. How free, or capable, are they to break ranks with their own party? This is a surprisingly hard question to answer, not least because what a party “is” is epistemologically a very difficult question. In fact, political scientist Keith Krehbiel nailed the problem when he asked “Where’s the party?”<sup id="fnref:11"><a class="footnote-ref" href="#fn:11" role="doc-noteref">11</a></sup></p>
<p>In order to at least approximate a measurement for the power of parties to direct their representatives in their votes, I look at how well the preferences of individual representatives align with their party. Based on this, I check whether a misalignment may make them more likely to vote against their own party. To my surprise, the results show a weird trend: parties have been rapidly losing power over their representatives since the early 1990s. Today, if a representative wants to break ranks, they can simply do so. This was much harder in the 1950s or 1960s. This seems to imply that what happened is a strong alignment of parties and representatives in their preferences.</p>
<p>Taken together, these results paint a picture of how Congressional legislation worked and changed over more than a century. While I am unable to say much in detail about the period before the end of World War II, since the data deteriorates quickly as we move into the past, some clear patterns emerge afterwards. First, it appears that the 1940s, 1950s, and 1960s were a very stable period for U.S. Congress. To be sure, politically ground-shaking events took place at the same time, such as the civil rights movement. What I want to express is that the Congressional machinery worked like a well-oiled motor. In the 1970s, as the U.S. economy was devastated by a flurry of crises, the outlook changed. The presidency of Ronald Reagan shifted Congressional priorities. However, one cannot say that this necessarily cemented all the trends we can see in Congress. Many of the detrimental trends scientists have found started to pick up speed only in the early 1990s, and I believe that representatives such as Newt Gingrich with his “Contract with America” may have done more long-term harm than Reagan.</p>
<p>All of this leaves us with a legislative system that, in 2025, is barely able to withstand presidential authoritarian pressure and folds almost daily. Daniel Ziblatt and Steven Levitsky have argued in their 2018 book “How Democracies Die” that the pivotal point at which the U.S. political system took a nosedive was when the parties stopped working as gatekeepers. This is certainly a possibility. What remains is that, by focusing on three moments of crisis in the Congressional system, we can see that what happens today is a result of a progressive degradation of parties and representative culture alike.</p>
<h2>Open Questions and Further Threads</h2>
<p>All of this leads to more questions. Based on my results, how might we conceptualize the second Trump presidency? Is this also a form of crisis? I also have not talked about my methods here yet. Additionally, as I talk about my thesis with friends and colleagues, more questions emerge. Just today, the question popped up: Which parts did I enjoy writing, and which ones didn’t I? This has led to an interesting discovery: a trajectory of turning from a theory-driven researcher into a more methods-based researcher.</p>
<p>There are many open questions, and as I read through and reflect on my work of the past five years, more will emerge. I will do my best to write these down; partly as research notes, partly as targeted articles, but always with the intention to publish them here, which will greatly increase their readability and motivate me to think further.</p>
<p>The coming weeks will probably be dedicated exclusively to this one, big, white building on the banks of the Potomac River, and the people who fill it with life.</p>
<div class="footnotes" role="doc-endnotes"><hr /><ol><li class="footnote" id="fn:1" role="doc-endnote"><p>Erz, H. (2025). <em>On the Record: Understanding a Century of Congressional Lawmaking through Speech and Vote Behavior</em> [PhD Thesis, Linköping University]. <a href="https://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-217773">https://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-217773</a>&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:1" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:2" role="doc-endnote"><p>See Koselleck, R., &amp; Richter, M. (2006). Crisis. <em>Journal of the History of Ideas</em>, <em>67</em>(2), 357–400.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:2" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:3" role="doc-endnote"><p>Full quote: “Der Mensch ist ein Unterschiedswesen, d. h., sein Bewußtsein wird durch den Unterschied des augenblicklichen Eindrucks gegen den vorhergehenden angeregt” (transl.: “Man is an animal of difference, i.e., his mind is stimulated by the difference between his current and previous impression”).&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:3" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:4" role="doc-endnote"><p>Swidler, Ann. 1986. ‘Culture in Action: Symbols and Strategies’. <em>American Sociological Review</em> 51 (2): 273–86. <a href="https://doi.org/10.2307/2095521">https://doi.org/10.2307/2095521</a>.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:4" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:5" role="doc-endnote"><p>To state the obvious comparison, “Normalcy is like porn: we know it when we see it.”&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:5" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:6" role="doc-endnote"><p>This might just be a variant of survivorship bias, because of course I have not read all of my predecessors’ theses completely. But of those which I did read and understood — Rodrigo Martínez Peña (<a href="https://doi.org/10.3384/9789180755283">link</a>), Miriam Hurtado Bodell (<a href="https://doi.org/10.3384/9789180756181">link</a>), and Anastasia Menshikova (<a href="https://doi.org/10.3384/9789181180992">link</a>) — all of them looked at irregularities of society to understand its regularities.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:6" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:7" role="doc-endnote"><p>Completely unnecessary information: While writing this sentence I was interested in knowing why the phrase “metric ton” has this ring of “a very large amount,” and according to a quick internet search, it’s because there are two definitions of tonnes. The U.S. defines a ton as 2,000 pounds, but the metric ton actually weighs a bit more, with ~2,204lbs.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:7" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:8" role="doc-endnote"><p>Related to this, Elizabeth Popp Berman has written a highly instructive book on the Democrats adopting a very <em>technical</em> language during that time, called “Thinking like an economist: how efficiency replaced equality in U.S. public policy.”&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:8" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:9" role="doc-endnote"><p>Incidentally, this was the point at which I realized that a somewhat implicit question in a lot of quantitative text analysis is: do you look at the <em>form</em> of text, or its <em>contents</em>? I will go into this further.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:9" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:10" role="doc-endnote"><p>The “classics” are certainly Richard Fenno’s “Congressmen in Committees” (1973); Gary Cox’s and Mathew McCubbins’s “Setting the Agenda” (2005); and Lawrence Evans’s “The Whips” (2018).&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:10" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:11" role="doc-endnote"><p>Krehbiel, K. (1993). Where’s the Party? <em>British Journal of Political Science</em>, <em>23</em>(2), 235–266.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:11" role="doc-backlink">↩</a></p></li></ol></div>]]>
  </content>
</entry>
<entry>
  <title>Supporting Liquid Glass Icons in Apps Without XCode</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2025, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/supporting-liquid-glass-icons-in-apps-without-xcode" />
  <id>https://www.hendrik-erz.de/post/supporting-liquid-glass-icons-in-apps-without-xcode</id>
  <published>2025-09-19T12:00:00+00:00</published>
  <updated>2025-09-19T12:56:51+00:00</updated>
  <summary type="html"><![CDATA[Apple released its new operating system five days ago, and so in this article I will explain how developers can add new icons based on Liquid Glass to their apps if they don&#039;t use XCode, but rather Electron or Tauri.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/supporting-liquid-glass-icons-in-apps-without-xcode">
    <![CDATA[<p>On Monday, Apple released the new version of its operating system, macOS 26 Tahoe. With it comes a bunch of new features, apps, and the big design overhaul, called “Liquid Glass.” There have been quite a few discussions on the pros and cons of this new design language, which I don’t want to go into here.</p>
<p>Instead, I want to talk about icons. Because those also got a facelift with the new update. Since I had less to do this week, I took my newly-gained freedom to sit down after work and try to get Zettlr’s app icon up to speed to conform with this new logo design.</p>
<p>What I imagined being a quick exercise in testing out Apple’s new Icon Composer — an app that does make creating icons very easy — turned out to take my entire evening. The main reason for this is that I don’t use XCode. After all, the app is cross-platform, and while I am happy to go the extra mile to accommodate macOS, Windows, and Linux, there is a limit to how much I can do.</p>
<p>It turns out that, while redesigning the entire app icon process, Apple also decided to clean it up a bit, and so now they added this subtle requirement for having XCode installed. In this article, I’ll quickly go over what developers have to do without having to keep XCode installed because they – for example – rely on CI pipelines to automatically build their apps on computers without XCode.</p>
<h2>Creating macOS Icons — Then and Now</h2>
<p>First, a short excursus on what has changed that suddenly makes adding custom icons to macOS apps so much less straightforward than before.</p>
<p>For the past decades, all operating systems had relatively simple mechanisms for defining software icons. On Windows, it has been and will always be <code>.ico</code>-files, Linux opted for PNG images, and macOS for the longest time had <code>.icns</code>-files.</p>
<p>Essentially, <code>.icns</code>-files are just containers that bundle the same icon in various sizes. To make an app icon look as good as possible across our vastly differing screen sizes, Apple decided to invent this icon format to store various pre-calculated sizes in one place. That’s pretty neat!</p>
<p>However, with Liquid Glass, they made some changes. Specifically, and <a href="https://developer.apple.com/videos/play/wwdc2025/361/">Apple themselves said this in a training video</a>, there are two issues that make having a set of static images less desirable with the new iteration of Apple’s operating systems. First, the icon should be the same on all platforms, including iOS, iPadOS, watchOS, visionOS, macOS, and whichever OS I just forgot to list. All of these devices come with different screen sizes and layouts for icons. Having several icons for each of these platforms would increase the size of this small icon file by a large amount. Second, and more crucially, however, the new Liquid Glass design rests on a lot of graphical fidelity and effects. We are talking about background filters, specular lighting, blurring, and various amounts of color tinting. Trying to fit all of that into static images would just be insane.</p>
<p>So they opted for a new format, aptly named <code>.icon</code>. This is not a file, but rather a folder that contains two things: The vector graphics that make up the icon, and a file describing certain properties of the various icon layers. For example, app developers can choose the amounts of translucency for certain parts of their icon, how the different layers blend in with each other, and so on.</p>
<p>However, this <code>.icon</code>-folder just contains some instructions on how to create the icon — it’s not yet the icon itself that your app can use. That is why it’s now a bit more difficult to add such a new icon to your app, especially if you don’t use XCode. For XCode users, nothing has changed; they can continue to import their icons into XCode and the software will make sure that the icon ends up in the correct place.</p>
<p>For everybody else — this includes Electron developers, Tauri developers, React Native, and anyone who develops for macOS but without XCode — this process is now much more involved.</p>
<h2>Creating a Liquid Glass Icon (Almost) Without XCode</h2>
<p>From a bird’s eye perspective, the new workflow for creating icons outside XCode looks like this:</p>
<ol>
<li>Use the <a href="https://developer.apple.com/icon-composer/">Icon Composer</a> to generate the <code>.icon</code>-file.</li>
<li>Run the <code>actool</code> utility to compile it into an <code>Assets.car</code>-file.</li>
<li>Place the <code>Assets.car</code>-file into the <code>Contents/Resources</code> folder of your app bundle and make sure that it gets signed along with the rest of the bundle.</li>
<li>Add the entry <code>CFBundleIconName</code> to your app’s <code>.plist</code>-file.</li>
</ol>
<p>Step 2 (<code>actool</code>) is what actually introduces the dependency on XCode. For whatever reason, this tool comes preinstalled on macOS, but requires XCode to work. So unfortunately, to create a Liquid Glass icon you will need to have access to a Mac with XCode installed on it. However, once you have created the <code>Assets.car</code>-file, you can ditch the Mac. I recommend you do steps 1 and 2 in one sitting; you can figure out steps 3 and 4 later, depending on your tool chain.</p>
<p>To get started, you will need to create an icon using Apple’s Icon Composer. This is not part of this guide, so I recommend consulting Apple’s resources to learn more, starting with <a href="https://developer.apple.com/videos/play/wwdc2025/361/">this informative video</a>.</p>
<p>Once you have the <code>.icon</code> available, you’ll need to compile it into an <code>Assets.car</code>-file. This file is an “asset catalog” and it is essentially very similar to the old <code>.icns</code>-files. However, unlike the <code>.icns</code>-files, it does not just contain a set of images, but additional instructions and materials such as gradients, colors, and basically all the things that macOS needs to properly render an icon in the new Liquid Glass UI.</p>
<p>To generate such an asset catalog, we need the <code>actool</code> mentioned above. It is essentially a small utility that enables compilation of these catalogs, and XCode uses it when you develop, e.g., iOS or macOS apps.</p>
<p>Apple intended the tool for compiling apps via XCode, but there is nothing preventing us from running the tool without any code project. Here’s a handy Bash-script that I have written which you can adapt to take in an <code>.icon</code>-file, and spit out an <code>Assets.car</code>-file. Note that it will generate a <code>.plist</code>-file as a by-product, and you cannot disable this behavior. But you can just remove that file afterwards.</p>
<pre><code class="language-bash">#!/usr/bin/env bash
set -euo pipefail # Abort on errors, unset variables, and failed pipes

ICON_PATH=&quot;./resources/icons/Zettlr.icon&quot;
OUTPUT_PATH=&quot;./resources/icons&quot;
PLIST_PATH=&quot;$OUTPUT_PATH/assetcatalog_generated_info.plist&quot;
DEVELOPMENT_REGION=&quot;en&quot; # Change if necessary

# Compile the .icon into $OUTPUT_PATH/Assets.car. All variables are quoted
# so that paths containing spaces keep working.
# Adapted from https://github.com/electron/packager/pull/1806/files
actool &quot;$ICON_PATH&quot; --compile &quot;$OUTPUT_PATH&quot; \
  --output-format human-readable-text --notices --warnings --errors \
  --output-partial-info-plist &quot;$PLIST_PATH&quot; \
  --app-icon Icon --include-all-app-icons \
  --enable-on-demand-resources NO \
  --development-region &quot;$DEVELOPMENT_REGION&quot; \
  --target-device mac \
  --minimum-deployment-target 26.0 \
  --platform macosx

# actool always writes a partial Info.plist; it is not needed here.
rm -f &quot;$PLIST_PATH&quot;
</code></pre>
<p>Take note of the filename of your icon, because <code>actool</code> will quietly use it as the name of your icon in the asset catalog. Since asset catalogs contain all assets of an app within a single file, it is important to tell macOS which of the many assets is the actual icon.</p>
<p>At this point, you should have your icon, and your icon inside an asset catalog that you can now use to build your app with whatever tool chain you use.</p>
<h2>Adding the Icon to Your App</h2>
<p>So, how do we get this icon into the app? For this we will have to do two things: First, add the <code>Assets.car</code> into your app’s <code>Resources</code>-folder and sign it, and second, add the key <code>CFBundleIconName</code> with the string value <code>Icon</code> (or whatever your icon has been named) to the app’s <code>.plist</code>-file.</p>
<p>Let’s start with adding the <code>Assets.car</code>-file. If you’re using Electron Forge to build the app, adding the catalog to your app requires a single line of code. Inside your Forge configuration, you simply need to add the path to your <code>.car</code>-file to the property <code>packagerConfig</code> → <code>extraResource</code>. This ensures that Electron will pick up the file at the correct time, copy it into the <code>Resources</code>-directory, and sign it.</p>
<p>For Tauri, it’s almost as simple. <a href="https://v2.tauri.app/distribute/macos-application-bundle/#infoplist-localization">Here</a>, the property is called <code>bundle</code> → <code>resources</code>, and this should also place the file correctly. For any other tool chain, there should be an analogous way of performing this step.</p>
<p>Next, you need to tell macOS that there’s a new icon in your app. You do this by modifying your app’s <code>.plist</code>.</p>
<p>Again, if you use Electron Forge, it is extremely simple. Add the key <code>CFBundleIconName</code> either to your <code>.plist</code>-file, if you make use of it, or provide it directly as a key to your <code>extendInfo</code>-property in the <code>packagerConfig</code>.</p>
<p>If you place it into a <code>.plist</code>-file, it should be contained under the top-level <code>&lt;dict&gt;</code>, and look like this:</p>
<pre><code>&lt;key&gt;CFBundleIconName&lt;/key&gt;
&lt;string&gt;Icon&lt;/string&gt;
</code></pre>
<p>A minimal example for Forge would look like this:</p>
<pre><code class="language-javascript">// In file forge.config.js
module.exports = {
    packagerConfig: {
        extraResource: [ './path/to/Assets.car' ],
        extendInfo: {
            CFBundleIconName: 'Icon'
        }
        // Alternatively: extendInfo: './path/to/info.plist'
    }
}
</code></pre>
<p>For Tauri, the process looks essentially the same and is <a href="https://v2.tauri.app/distribute/macos-application-bundle/#native-configuration">fully documented here</a>.</p>
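<p>Whichever tool chain you use, it can be worth checking the built app bundle before shipping it. The following is a minimal sketch of such a check in Node.js, not part of any official tooling: the helper name <code>checkIconSetup</code> and the example path are made up for illustration, and it assumes that the bundle's <code>Info.plist</code> is stored as XML text rather than in binary form.</p>
<pre><code class="language-javascript">const fs = require('fs')
const path = require('path')

// Hypothetical helper: returns a list of problems found in a built .app bundle.
// An empty list means that both steps (catalog + plist entry) were applied.
function checkIconSetup (appPath) {
  const problems = []
  const catalog = path.join(appPath, 'Contents', 'Resources', 'Assets.car')
  const plist = path.join(appPath, 'Contents', 'Info.plist')

  if (!fs.existsSync(catalog)) {
    problems.push('Assets.car is missing from Contents/Resources')
  }

  if (!fs.existsSync(plist)) {
    problems.push('Info.plist is missing')
  } else {
    // Assumption: the plist is XML, so a plain text search is enough here.
    const contents = fs.readFileSync(plist, 'utf8')
    if (!contents.includes('CFBundleIconName')) {
      problems.push('Info.plist has no CFBundleIconName entry')
    }
  }

  return problems
}

// Example path only; point this at wherever your tool chain writes the bundle.
console.log(checkIconSetup('./out/MyApp.app'))
</code></pre>
<p>Note that this only checks that the files are in place. It does not verify the code signature, and it does not confirm that the icon name in the plist matches the name inside the asset catalog.</p>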
<h2>Final Thoughts</h2>
<p>While it took me an entire night to figure this process out, which, in the end, turned out to be relatively benign, I hope that these instructions will help you do the same but faster.</p>
<p>Just a final note: Don’t just throw out your existing <code>.icns</code>-file. There are still many people out there with Macs that don’t run the newest macOS, and if you remove the <code>.icns</code>-file, the app will break for all of them. For the coming years, be prepared to ship both the <code>.icns</code>-file and the new Liquid Glass version.</p>]]>
  </content>
</entry>
<entry>
  <title>I Built an Interactive Schedule for my Field&#039;s Largest Conference (and yours, too)</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2025, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/i-built-an-interactive-schedule-for-my-fields-largest-conference-and-yours-too" />
  <id>https://www.hendrik-erz.de/post/i-built-an-interactive-schedule-for-my-fields-largest-conference-and-yours-too</id>
  <published>2025-06-21T09:00:00+00:00</published>
  <updated>2026-02-04T10:15:02+00:00</updated>
  <summary type="html"><![CDATA[How do you communicate the what and where of various events at large conferences? This seems to be still a relatively unsolved problem, at least in the FOSS space. To properly navigate participants around the IC2S2 conference 2025 in Norrköping at my institute, I decided to build a flexible solution for this. In addition, it should work for any conference. So if you have to organize a conference at some point, this tool may come in very handy.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/i-built-an-interactive-schedule-for-my-fields-largest-conference-and-yours-too">
    <![CDATA[<p>At some point in time, I published an article on this website in which I said something to the effect of “academia runs on spite.” I don’t remember if I wrote that <em>I</em> run on spite, or my entire industry. But someone does. And when it comes to conferences, this is even more true.</p>
<p>A few months ago, I was asked by a professor at my department whether I would be interested in helping organize the largest conference of my field, the International Conference on Computational Social Science (IC2S2). Of course, I said yes.<sup id="fnref:1"><a class="footnote-ref" href="#fn:1" role="doc-noteref">1</a></sup> Since I’m the IT guy at the department, it fell to me to create the website. However, another task that already dawned on me was one that no one else was thinking of at that point: we would have to come up with a clever way to design an interactive schedule for the conference, because organizing several hundred people across four days <em>will</em> yield a maze of program points that people will have to navigate.</p>
<p>What I came up with after some weeks of engineering work was <a href="https://github.com/nathanlesage/conferia">Conferia.js</a>, an interactive schedule tool that respects organizers’ time and gives participants a clear schedule to never get lost amid dozens of sessions and keynotes. In this article, I want to share with you why I wrote this tool, how I did so, and how you can use it for any conference going forward to prepare an easy interactive and feature-rich schedule.</p>
<h2>The Problem: Scheduling for Large Conferences</h2>
<p>Let us begin with the problem that this tool solves: scheduling for large conferences. It all begins with the insight that, as the size of a conference grows, so does its need for a powerful tool to navigate the event. IC2S2 2025 is such an event. The conference features many hundreds of participants who will visit Norrköping over the course of four days with 10 tutorials, 9 keynotes, hundreds of posters, and 290 accepted presentations sorted into 48 parallel sessions of about 6 talks each across 8 rooms and 3 days. As you can see: the conference is <em>really</em> big, and as such having <em>no</em> agenda would be very bad. One would still be able to somehow navigate the participants through the keynotes, since those are scheduled sequentially. But how are participants supposed to decide which of the eight concurrent sessions they want to visit? And do so six times (two parallel slots times three days)?</p>
<p>A common sight at many large conferences is an interactive agenda which lets participants quickly see the entire schedule at a glance, search for keywords and sessions. Maybe, it even lets participants create their own, personal agenda so that they can already weed out things they are not interested in before the conference even starts.</p>
<p>While small conferences and workshops usually do not have an interactive agenda, and do fine, this doesn’t work here. And indeed, IC2S2 is a great showcase for the various approaches organizers can choose to follow:</p>
<ul>
<li><a href="https://boothuchicagocaai.wixsite.com/website-2">IC2S2 2022</a> at the University of Chicago decided to go with a <a href="https://bee8d65b-f2d5-4224-83f6-df475c333f5b.filesusr.com/ugd/3d8e26_1394ecaa666e4d6d8d8199fe28d17746.pdf">simple, non-interactive PDF program</a>.</li>
<li><a href="https://www.ic2s2-2023.org/">IC2S2 2023</a> at the University of Copenhagen implemented a <a href="https://ic2s2-2023.org/program_interactive">fully custom web app for the interactive agenda</a>.</li>
<li><a href="https://ic2s2-2024.org">IC2S2 2024</a> at the University of Pennsylvania in Philadelphia opted for an <a href="https://ic2s2-2024.org/schedule">Excel solution</a>.</li>
</ul>
<p>Now, all of these solutions have benefits and drawbacks. The most traditional solution is certainly the PDF program created by the organizers of IC2S2 2022. It is easy to understand, and the obvious choice in case someone decides to distribute printed programs. However, as you can imagine, this approach only gets you so far. IC2S2 2022 was significantly smaller than IC2S2 2025 will be. But even then, the program has almost 50 pages, and it contains no abstracts. So either the authors did a great job at conveying their idea in the title, or participants are out of luck if they want some help deciding. This is certainly not perfect.</p>
<p>The next-best job was done by the organizers of IC2S2 2024. Using embedded Excel spreadsheets is certainly a … choice, but it does convey the sense of which sessions are in parallel much better than a purely sequential PDF file. I do enjoy this solution somewhat, because it really smashes the ugly reality that most of the world runs on Excel into your face unfiltered. But obviously, it comes with drawbacks: Navigating those sheets can be cumbersome, and they lack obvious conveniences such as a search function. Also, while Excel is powerful, free-form cells only get you so far when it comes to time-based grids, even though Excel assumes anything to be a date.</p>
<p>The — in my humble opinion — best solution thus far has been implemented for IC2S2 2023, courtesy of the amazing Laura Alessandretti. She<sup id="fnref:2"><a class="footnote-ref" href="#fn:2" role="doc-noteref">2</a></sup> has built a fully custom solution to actually deliver a web app designed to present the full program of the conference for maximum convenience. IC2S2 2023 was also the first IC2S2 that I attended, and as such, I was very pleased with the ability to check the program both before and during the conference. There was only one problem: It was a hard-coded solution. At least I could not find any library in there, and the corresponding JavaScript code looked very custom. This means that it only worked for IC2S2 2023, and only for the very … peculiar data format they chose. I commend the organizers for taking the time to develop an interactive agenda, because it clearly pays off. But I can also fully understand many other organizers opting for less ideal solutions.</p>
<p>Because here’s the main problem conference organizers face: It’s a small team of highly motivated but essentially volunteering individuals against a flood of equally motivated researchers who all want a spot in the program. Every minute of work time is reserved months in advance, and so it is crucial to shave off as much overhead as possible. The less work, the better. Organizing a conference has to be very efficient. This is one of the central lessons I learned during my time in the conference chair team.</p>
<p>The problem, in short, is that conference organizers have little time, and there are no good one-size-fits-all solutions that organizers can adopt for offering an interactive agenda to their participants that does not add an unwieldy burden on the organizers.</p>
<h2>Requirements: Identify what needs to be done</h2>
<p>After having identified the problem (no reusable framework for displaying interactive agendas), the next step involves the requirements the proposed solution has to meet. Essentially, it has to balance two contradicting requirements:</p>
<ol>
<li>The <strong>organizers</strong> need to spend as little time as possible on the program creation, as there are many other tasks. Any solution for an <em>interactive</em> agenda therefore must add as little overhead to the already overfilled to-do lists as possible. Otherwise, the “interactive” part of the agenda becomes a low priority, creating a detrimental experience for everyone.</li>
<li>The <strong>participants</strong>, on the other hand, have a vested interest in such an interactive agenda, because without it, navigating a large conference is a pain. I speak from experience here. My first large conference was the DGS-conference (German Sociological Association) 2016 in Bamberg. The available app was very thinly advertised, so I stuck to the <a href="https://kongress2016.soziologie.de/fileadmin/user_upload/DGS_Redaktion_BE_FM/Kongresse/Kongress_2016/dgs2016_Bamberg_Hauptprogramm-Download.pdf">unwieldy PDF program</a>.</li>
</ol>
<p>Any solution to an interactive agenda, in other words, needs to balance these two needs. The obvious solution was something that slots right into what organizers need to do <em>anyhow</em> (creating some big Excel spreadsheet), and transforms this into an interactive agenda <em>automagically</em>. After some thinking (and decent experience of both good and bad agendas), I came up with the following requirements that translate the two needs identified above into actionable points:</p>
<ul>
<li>Every program planning process will result in an Excel spreadsheet, with one event per line, usually sorted into keynotes, parallel sessions, and special events (e.g., conference dinners). The solution must essentially work on top of one of these spreadsheets.</li>
<li>It should work without deep IT expertise. It should be possible for a researcher in the arts and humanities to implement the solution with a little bit of googling and fiddling.</li>
<li>It should work with any type of program layout, that is, conferences with entirely linear programs, parallel sessions, or (most likely) a mix of both. It should accommodate whichever assortment of events there is, for any conference.</li>
<li>It should work on both desktop and mobile, since participants will likely use it on their computer (before) and phones (during the conference).</li>
<li>It should not require any additional software, since one-time use apps are generally not nice for participants. But every phone and computer has a browser installed.</li>
<li>Changes to the schedule should be easy to make.</li>
<li>The tool should have zero additional dependencies. It should be a single, minified JavaScript file that can be easily included by the conference organizers.</li>
</ul>
<p>This should cover what the app should do. But it’s a lot. It requires a lot of the tool to be this flexible and work for any type of conference. However, we luckily do not live in a vacuum, and so we can make some assumptions that we can work with.</p>
<h2>Assumptions: What we can work with</h2>
<p>If we made no assumptions at all, anything short of a literal magic box would take years to develop. And, indeed, there are some companies that offer proprietary conference program scheduling. Those products usually do not assume anything and will work with only an Excel spreadsheet. The problem? They do assume one crucial thing, and that is money. And anyone who has visited an academic conference at least once in their life knows that this is an optimistic assumption to make.</p>
<p>So, what <em>can</em> we assume, given the constraints? Thinking a bit about the way conferences are organized (again, this requires some participation and behind-the-scenes knowledge of conferences), we can make a few assumptions:</p>
<ul>
<li>The conference will have a web presence. Or, if it doesn’t have a dedicated website like IC2S2 does, the university typically has some website on which conference organizers can get a small sub-page for their conference. In the worst case, there is GitHub Pages, which is becoming more common among academics of any couleur.</li>
<li>The organizers will utilize Excel (or an alternative) to produce the schedule.</li>
<li>At each conference, there are only so many types of events, so we can hard code those.</li>
<li>There will be at least one person with a somewhat comfortable handling of code such as HTML or JavaScript. So the app can require a bit of code (as long as it’s reasonable and well-documented).</li>
<li>As long as it is a one-time effort, we can assume the organizers to be happy to spend <em>a bit</em> of additional time implementing the solution, so it does not have to be a completely drag-and-drop solution.</li>
</ul>
<p>This now concludes the preparatory part of the planning. Next is deciding on the actual implementation.</p>
<h2>Implementation: The Tech Stack</h2>
<blockquote>
<p>If you’re less interested in the technicalities of the app, you can skip to the last section: <a href="#the-test-case-ic2s2-2025">The Test Case: IC2S2 2025</a>.</p>
</blockquote>
<p>For implementing Conferia, I chose to go with a somewhat modern, relatively basic tech stack. I wanted to keep the solution small (because bandwidth is still an issue in many places), and also not overcomplicate it. Here’s what and why I chose:</p>
<ul>
<li>Classic HTML/CSS/JavaScript stack: Since I don’t want any complex app solutions, it should run on any device. And the only technology that runs on any device is a simple web app, because every device has a browser.</li>
<li>TypeScript: For the actual language, I chose TypeScript. I could have done this all in vanilla JavaScript, but I have learned over the past years that the added type safety is so much more efficient, because it completely obliterates the need to debug which types you are shoving around in the app.</li>
<li>Rollup.js for bundling: This is a somewhat odd (for me) choice, because I’ve been socialized to the web with Webpack. However, Webpack is kind of a destroyer type ship, while this project requires more like a light littoral cruiser. Also, Marijn Haverbeke (CodeMirror) speaks very fondly of it, so I decided to give it a go. It’s an insanely light-weight solution to the problem.</li>
</ul>
<p>That’s it. If you’re somewhat familiar with web development you may wonder about the many omissions. So here’s what I decided to omit, and why:</p>
<ul>
<li>No framework; no Vue/React/Angular. This is because the app should be very simple. There’s not much state-management involved, and the little that is I can handle with a bit of JavaScript myself. This is the biggest size-saving measure of the entire endeavor.</li>
<li>No CSS/SASS/UI framework. The app should live off of the styles present on the site, and otherwise come with a very limited default design. Simplicity is king, because there is so much information already flying around.</li>
<li>No Papaparse or CSV framework to speak of. This was less a decision I made willingly than a consequence of the fact that Rollup unfortunately can’t properly bundle Papaparse, and CSV loading frameworks in the JavaScript ecosystem are surprisingly rare. And I wanted it to have zero dependencies.</li>
</ul>
<h2>Implementation: How it Works (Technically)</h2>
<p>Finally, how it works. I will introduce to you only how it works <em>technically</em>, because the actual documentation on how you can use it for your conference <a href="https://nathanlesage.github.io/conferia/">is well described here</a>.</p>
<p>Before I start, let me once again acknowledge two people who have inspired this work a lot, and who have ensured that I didn’t have to re-invent the wheel: Laura Alessandretti, who has created the IC2S2 2023 schedule, and Carl Nordlund (IAS), who has also created an interactive schedule, but for the NSA Conference 2024 (Nordic Sociological Association, not what you’re thinking of).</p>
<p>So let’s dig into the code. The main entry is the big <code>Conferia</code> class. It handles data loading, DOM setup, and rendering. One initializes Conferia simply by calling <code>new Conferia(/* options */)</code>. That’s it. Everything else is handled by the library. What it will do first is generate its DOM structure (using the various helpers in the <code>dom</code> folder) and mount it into the document. Then, it adds a few event listeners that handle interactions with the tool. Then, it immediately begins loading the CSV file.</p>
<p>The CSV parser is, as mentioned, not some proven library, but instead a very rudimentary parser that probably breaks with grossly malformed CSV files. It takes in a string of CSV file data, and returns a set of records (<code>CSVRecord</code>) that cover all events that are part of the conference. It happily accepts CSV files with any number of columns, but it does require a set of columns that I have identified to be absolutely necessary:</p>
<ul>
<li><code>date_start</code> and <code>date_end</code> should contain ISO 8601-compatible time stamps that denote the start and end of the event.</li>
<li><code>type</code>: Can be one of <code>session_presentation</code> (for individual presentations), <code>keynote</code>, <code>meta</code>, <code>single</code>, and <code>special</code>. Most of these events are actually interchangeable, and because these are mere semantics for the library, nothing prevents you from defining all your keynotes as simple “single” events, or coffee breaks as “keynotes.” (However, in some instances, these events do matter, such as the detail dialogs.)</li>
<li><code>title</code>, <code>abstract</code>, <code>author</code>, <code>location</code>, and <code>chair</code> contain exactly what the names say – the various pieces of information for the events.</li>
<li><code>session</code> and <code>session_order</code> are required for session presentations. They identify the session several presentations are part of, and their order. The CSV parser uses this information to group all presentations that share the same session name into a single session event.</li>
</ul>
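<p>To make the parsing and grouping step more tangible, here is a minimal sketch of how such a rudimentary parser could look. This is not the actual Conferia code: <code>SketchRecord</code>, <code>splitCsvLine</code>, <code>parseCsv</code>, and <code>groupSessions</code> are hypothetical names, and the sketch assumes that quoted fields never contain line breaks.</p>

```typescript
// Hypothetical record shape for this sketch; the real CSVRecord may differ.
interface SketchRecord { [column: string]: string }

// Split one CSV line on commas, honoring double-quoted fields
// (inside quotes, "" stands for a literal quote character).
function splitCsvLine (line: string): string[] {
  const cells: string[] = []
  let current = ''
  let inQuotes = false
  for (let i = 0; i < line.length; i++) {
    const char = line[i]
    if (inQuotes) {
      if (char === '"' && line[i + 1] === '"') {
        current += '"'
        i++ // Skip the second quote of the escaped pair
      } else if (char === '"') {
        inQuotes = false
      } else {
        current += char
      }
    } else if (char === '"') {
      inQuotes = true
    } else if (char === ',') {
      cells.push(current)
      current = ''
    } else {
      current += char
    }
  }
  cells.push(current)
  return cells
}

// Assumption: the first non-empty line is the header, one record per line.
function parseCsv (data: string): SketchRecord[] {
  const lines = data.split(/\r?\n/).filter(line => line.trim() !== '')
  if (lines.length === 0) {
    return []
  }
  const header = splitCsvLine(lines[0])
  return lines.slice(1).map(line => {
    const cells = splitCsvLine(line)
    const record: SketchRecord = {}
    header.forEach((column, idx) => { record[column] = cells[idx] ?? '' })
    return record
  })
}

// Group session presentations by session name, ordered by session_order.
function groupSessions (records: SketchRecord[]): Map<string, SketchRecord[]> {
  const sessions = new Map<string, SketchRecord[]>()
  for (const record of records) {
    if (record['type'] !== 'session_presentation') {
      continue
    }
    const name = record['session'] ?? ''
    const list = sessions.get(name) ?? []
    list.push(record)
    sessions.set(name, list)
  }
  sessions.forEach(list => {
    list.sort((a, b) => Number(a['session_order']) - Number(b['session_order']))
  })
  return sessions
}
```

<p>Any additional columns simply end up as extra keys on each record, which is what makes the "any number of columns" behavior cheap to support.</p>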
<p>Additional properties are fine, and can be used, via the <code>rowParser</code> utility function, to modify the simple parsed records that the library produces. Also, it turns out that for both Excel and Google Spreadsheets, ISO 8601 seems to be difficult, so I added a <code>dateParser</code> function that allows you to quickly transform all the date columns into something that the date library (luxon) can work with. I figured this is simpler than manually correcting all the errors Google made to my spreadsheets.</p>
<p>Armed with this loaded data, the main class then proceeds to render the entire UI in one humongous <code>updateUI</code> function. This function runs every time there has been a change to the internal state. There are not many of those – adding and removing items from the personal agenda, toggling between viewing all events or just the personal agenda, and filtering.</p>
<p>Whenever the function is called, it will determine the date and time range of the currently visible events (first and last day, earliest and latest time) which define the x- and y-axes of the time grid. It will then calculate a bunch of additional information, such as the number of pixels per second based on some configuration values and what the actual records look like. Also, it calculates room conflicts, that is, which rooms need to be displayed in their own column for each day. (Rooms with no conflicts are always displayed column-spanning.)</p>
<p>Next, the function updates the time and day gutters at the top and left side of the app, as well as the background grid. The latter is simply a repeating CSS gradient pattern, because adding all those grid lines as individual elements would’ve transformed especially older phones into stove tops.</p>
<p>After that is all done, it will actually display the events. Using the various pieces of information, it calculates the left and top offset for each event from the origin in the top-left. The left offset, as well as the width of the event, are dependent on whether it is a conflicting event (= there need to be other events displayed next to it). Then, it applies a height based on the duration of the event and the calculated pixels per second. At the very end, it adds a set of day dividers so that it is easier to distinguish between parallel events on the same day, and the next day.</p>
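<p>The positioning arithmetic described above can be sketched roughly as follows. This is an illustration under my own assumptions, not the code from <code>updateUI</code>: the names <code>LayoutInput</code>, <code>Box</code>, and <code>layoutEvent</code> are hypothetical, and I assume that days run along the x-axis and time along the y-axis.</p>

```typescript
// Hypothetical input for this sketch: times are seconds since midnight.
interface LayoutInput {
  startSeconds: number
  endSeconds: number
  column: number // 0-based column of the event within its day
  columnCount: number // parallel columns needed (1 = spans the whole day)
}

interface Box { top: number, left: number, width: number, height: number }

function layoutEvent (
  event: LayoutInput,
  gridStartSeconds: number, // earliest visible time of day
  pixelsPerSecond: number,
  dayOffsetPx: number, // x-position at which the event's day begins
  dayWidthPx: number // total width reserved for one day
): Box {
  // Conflicting events share the width of the day; others span all of it
  const columnWidth = dayWidthPx / event.columnCount
  return {
    top: (event.startSeconds - gridStartSeconds) * pixelsPerSecond,
    height: (event.endSeconds - event.startSeconds) * pixelsPerSecond,
    left: dayOffsetPx + event.column * columnWidth,
    width: columnWidth
  }
}
```

<p>An event without conflicts would simply be passed in with <code>column: 0</code> and <code>columnCount: 1</code>, so that it takes up the full width of its day.</p>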
<p>Now, there are some performance penalties because the entire DOM is being re-rendered all the time. One could now argue that this is precisely the reason for which there are frameworks such as Vue and React, but I have figured out that since the library is very targeted in its functionality, it is an acceptable overhead. And indeed, on my old smartphone, the entire schedule still rendered appropriately fast. I will get to optimizing this if necessary.</p>
<p>One final aspect I haven’t yet gotten to is another class that manages a different part of the state: The Agenda. This is basically just a rudimentary store that reads the contents of the local storage of the browser upon page load, and stores personal agenda items in it. This means that users can add and remove events from their personal schedules and rest assured that – as long as they don’t clear their browser data – the information will remain across page loads and even computer restarts.</p>
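<p>For illustration, such a store can be sketched in a few lines. The names <code>AgendaSketch</code>, <code>KeyValueStorage</code>, and <code>MemoryStorage</code>, as well as the storage key, are made up for this example and are not the names used in the library. The small interface only exists so that the sketch also runs outside of a browser.</p>

```typescript
// The subset of the Web Storage API that this sketch needs.
interface KeyValueStorage {
  getItem: (key: string) => string | null
  setItem: (key: string, value: string) => void
}

// In-memory stand-in for localStorage, e.g. for tests.
class MemoryStorage implements KeyValueStorage {
  private readonly data = new Map<string, string>()

  getItem (key: string): string | null {
    return this.data.get(key) ?? null
  }

  setItem (key: string, value: string): void {
    this.data.set(key, value)
  }
}

class AgendaSketch {
  private readonly storage: KeyValueStorage
  private readonly key: string
  private readonly items: Set<string>

  constructor (storage: KeyValueStorage, key: string = 'agenda-sketch') {
    this.storage = storage
    this.key = key
    this.items = new Set(this.load())
  }

  // Read the persisted event IDs; fall back to an empty agenda on bad data.
  private load (): string[] {
    try {
      const parsed: unknown = JSON.parse(this.storage.getItem(this.key) ?? '[]')
      if (!Array.isArray(parsed)) {
        return []
      }
      return parsed.filter((id): id is string => typeof id === 'string')
    } catch {
      return []
    }
  }

  private persist (): void {
    this.storage.setItem(this.key, JSON.stringify(Array.from(this.items)))
  }

  add (eventId: string): void {
    this.items.add(eventId)
    this.persist()
  }

  remove (eventId: string): void {
    this.items.delete(eventId)
    this.persist()
  }

  has (eventId: string): boolean {
    return this.items.has(eventId)
  }
}
```

<p>In a browser, one would pass <code>window.localStorage</code> instead of <code>MemoryStorage</code>, since it offers the same two methods.</p>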
<p>However, this doesn’t help once conference participants leave their computer and actually go to the conference and view the schedule on their phones. This is where a utility function comes in, that lives in <code>util/ical.ts</code>. This is a function that generates valid iCalendar files based on the schedule. That allows participants to first create their personal schedule, and then add it to their calendars. I didn’t use a library for this because, again, the iCalendar format is actually quite simple, and this has a very specific use-case. I implemented the bare minimum of <a href="https://www.ietf.org/rfc/rfc2445.txt">RFC 2445</a> and <a href="https://datatracker.ietf.org/doc/html/rfc5545">RFC 5545</a>, and called it a day. And it works very well.</p>
<h2>Some Notes: Caveats and Oddities</h2>
<p>As with most projects, I felt that I learned a lot while implementing this tool. For example, I once again improved my knowledge of timezones, because for conferences, this actually becomes a problem. See, oftentimes, we researchers travel across many timezones to visit conferences across the world. This then means that the dates we see in our local calendars are actually at a different time from the local one of the conference venue. The interesting part here is that the conference schedule itself should <em>always</em> display the local times, even for participants in a different timezone, because this is what will be relevant to the participants once on location. However, if they decide to add the events to their calendar, they need to have them in their correct timezone, because the events are relative to all other events they store.</p>
<p>Another fun part to figure out is that, even though the iCalendar format is relatively simple, it’s not when it comes to timezones. Effectively, you have two options: Either provide all events in UTC, or have fun manually defining all timezones you need. I absolutely opted for the first option, because there was no way I would actually enumerate all existing timezones, as we all know from Tom Scott that this is futile. This ended up going against the preferred practices of luxon, the date time library, resulting in me commenting the conversion function with “Do violence to ISO 8601 datetimes.”</p>
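<p>A rough sketch of the "everything in UTC" approach follows. Note that it uses the built-in <code>Date</code> object rather than luxon, so it is not the conversion function from the library. The name <code>toICalUtc</code> is hypothetical, and the sketch assumes that the input carries an explicit UTC offset.</p>

```typescript
// Convert an ISO 8601 datetime with an explicit offset into the UTC
// notation used in iCalendar files (YYYYMMDDTHHMMSSZ).
// Assumption: without an offset, Date would interpret the string in the
// timezone of the machine running the code, which is not what we want.
function toICalUtc (isoWithOffset: string): string {
  const date = new Date(isoWithOffset)
  if (Number.isNaN(date.getTime())) {
    throw new Error(`Invalid datetime: ${isoWithOffset}`)
  }
  // toISOString() always returns UTC, e.g. 2025-07-22T07:00:00.000Z
  return date.toISOString()
    .replace(/\.\d{3}Z$/, 'Z') // Drop the milliseconds
    .replace(/[-:]/g, '') // Drop the separators iCalendar doesn't use
}
```

<p>The calendar application of the participant then takes care of converting the UTC time into whatever timezone they are currently in.</p>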
<p>Speaking of datetimes: It was very interesting to implement the various helper functions to manage the needs of the library, and think about how that actually works. For some, such as getting the earliest day or the latest day, there were built-in helper functions, but for others, such as <code>getShortestInterval</code>, I had to come up with sometimes odd contraptions. Also, it turns out that luxon is unable to perform calculations on times alone; it always requires a date. On the upside, luxon does a good job at string-representations, since I was able to use built-in JavaScript operators such as <code>&gt;</code> and <code>&lt;</code> to compare datetimes to determine if an event has a conflict.</p>
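<p>To illustrate the conflict check, here is a small sketch that works on plain epoch milliseconds instead of luxon objects. This is my assumption for the sake of a self-contained example, and <code>TimedEvent</code>, <code>overlaps</code>, and <code>findConflicts</code> are hypothetical names rather than library functions.</p>

```typescript
// Hypothetical event shape; start and end are epoch milliseconds.
interface TimedEvent { id: string, start: number, end: number }

// Two events conflict if each one starts before the other one ends.
// Events that merely touch (one ends when the next starts) don't conflict.
function overlaps (a: TimedEvent, b: TimedEvent): boolean {
  return a.start < b.end && b.start < a.end
}

// Return all pairs of conflicting event IDs.
function findConflicts (events: TimedEvent[]): Array<[string, string]> {
  const sorted = [...events].sort((a, b) => a.start - b.start)
  const pairs: Array<[string, string]> = []
  for (let i = 0; i < sorted.length; i++) {
    for (let j = i + 1; j < sorted.length; j++) {
      // Sorted by start: once an event starts after this one has ended,
      // none of the later ones can overlap with it either.
      if (sorted[j].start >= sorted[i].end) {
        break
      }
      if (overlaps(sorted[i], sorted[j])) {
        pairs.push([sorted[i].id, sorted[j].id])
      }
    }
  }
  return pairs
}
```

<p>Restricting the input to the events of a single room or a single day beforehand would turn this into the per-room or per-day check.</p>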
<p>Aside from timezone difficulties, the iCalendar format really showed me how old it actually is. RFC 2445 is from 1998, and its update, RFC 5545 from 2009, sticks to most of the old conventions. While trying to get the utility function to spit out valid ical files, I discovered some … interesting choices. First, they recommend that lines never exceed 75 octets. If a line gets longer than 75 octets, you should employ what they call “line folding,” which essentially means to split up the line, but prepend each following line with a space character to indicate to the parser that it is actually part of the previous line. What is an octet, you ask? Well, it’s a byte. In 1998, this usually corresponded to a single character, but in times of UTF-8 this is no longer true. An emoji, for example, usually takes four bytes (octets) in UTF-8, and emoji sequences take even more. And with grapheme clusters, this gets even more difficult. So I decided to roughly stick to 70 characters (after all, it was a suggestion, not a hard requirement) and do the little silly dance for them. To wrap up the iCalendar oddities, line breaks in summaries were possible, but had to consist of the <em>literal</em> characters “\” and “n,” and not, as I had thought, newline characters. Finally, lines are required to be delimited by CRLF (carriage return, line feed), instead of any valid line separator. Now that I think about it, I may write an article about this, too!</p>
<p>Lastly, some notes on the DOM part. Since I decided against using a framework for the DOM handling, I had to rely on the built-in methods such as <code>document.createElement</code> to handle everything. This means that the DOM code is actually the most verbose part of the code base, but at the same time I am very happy with the progress that has been made on semantic elements in the past years. For example, the event detail modals are implemented using <code>dialog</code> elements which essentially come with a lot of JavaScript functionality already attached, so it was relatively benign. The only remaining issue I have with the DOM part is that state and DOM are somewhat separated, and it showed me very vividly why modern frameworks such as Vue and React exist: Properly managing state and DOM updates is indeed hard, and I ended up spreading state management a bit too far across the code base. I will be fixing this at some point, but this goes to show how valuable frameworks are in keeping our brains sane.</p>
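<p>For the curious, here is a sketch of what folding by octets (rather than by characters) could look like, together with the escaping of text values. This is not what the library does, since it sticks to a rough character count. The names <code>utf8Length</code>, <code>escapeText</code>, and <code>foldLine</code> are hypothetical. The sketch never splits a code point, but it may still split a grapheme cluster.</p>

```typescript
// Number of octets that a single code point occupies in UTF-8.
function utf8Length (char: string): number {
  const codePoint = char.codePointAt(0) ?? 0
  if (codePoint < 0x80) return 1
  if (codePoint < 0x800) return 2
  if (codePoint < 0x10000) return 3
  return 4
}

// Escape a text value: backslashes first, then line breaks (which become
// the literal characters "\" and "n"), then commas and semicolons.
function escapeText (value: string): string {
  return value
    .replace(/\\/g, '\\\\')
    .replace(/\r?\n/g, '\\n')
    .replace(/([,;])/g, '\\$1')
}

// Fold a content line so that no physical line exceeds maxOctets.
// Continuation lines begin with a single space, which counts as one octet.
function foldLine (line: string, maxOctets: number = 75): string {
  const chunks: string[] = []
  let current = ''
  let currentOctets = 0
  for (const char of line) { // Iterates by code point, not by UTF-16 unit
    const size = utf8Length(char)
    if (currentOctets + size > maxOctets) {
      chunks.push(current)
      current = ' '
      currentOctets = 1
    }
    current += char
    currentOctets += size
  }
  chunks.push(current)
  return chunks.join('\r\n') // Lines are delimited by CRLF
}
```

<p>A parser reverses the folding by removing every CRLF that is immediately followed by a space, which is why the leading space of a continuation line is not part of the content.</p>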
<h2>The Test Case: IC2S2 2025</h2>
<p>Let me finish this article with a few words on the first test case for which I have developed Conferia.js: IC2S2 2025 in Norrköping. It turns out that I made the right decision in developing the tool. First, I never explicitly communicated which format I wanted the schedule in; I only requested a schedule so that I could turn it into my data. And, lo and behold, it came to me in the form of a large Excel spreadsheet. Especially nice was that all sessions were already properly ordered. This means that within about a single day (less if the OpenReview API hadn’t been such a chore to work with) I could take a large Excel spreadsheet with almost 300 presentations and turn it into the required data format for Conferia. Also, it turns out that whatever I did with the CSV parser, it worked flawlessly in parsing the CSV file with no additional work on my side.</p>
<p>Adding the tool into the website was just as simple and easy as I had expected it to be, and all in all, the most time-consuming part of the conference schedule creation was simply the creation of the schedule itself. The interactive agenda went from zero to hero in about a single work day, something I believe many conference organizers would be willing to spend for a proper agenda. In addition, updating the conference agenda is extremely straightforward: Edit the Google Spreadsheet, re-download the CSV file from it, and upload that one to the website.</p>
<p>So, in case you’re in the hot seat of a conference at some point, I would be happy if I could convince you that this tool might be worth a shot! All you need to know is outlined <a href="https://nathanlesage.github.io/conferia/organizers-guide/">in the organizer’s guide in the manual</a>. If you end up using it, please send me a message!</p>
<p>Until then, happy conferencing!</p>
<div class="footnotes" role="doc-endnotes"><hr /><ol><li class="footnote" id="fn:1" role="doc-endnote"><p>My advisor would like to let you know that he thinks that this was an overcommitment, and the more time passes, the more I agree. But I digress.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:1" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:2" role="doc-endnote"><p>After scouring the website’s source code, I have concluded that she seems to be the author, since a few files are hosted on her website, but I cannot be entirely certain.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:2" role="doc-backlink">↩</a></p></li></ol></div>]]>
  </content>
</entry>
<entry>
  <title>The Illusion of Thinking</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2025, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/the-illusion-of-thinking" />
  <id>https://www.hendrik-erz.de/post/the-illusion-of-thinking</id>
  <published>2025-06-14T13:00:00+00:00</published>
  <updated>2025-06-14T14:23:24+00:00</updated>
  <summary type="html"><![CDATA[Apple has just released a paper which claims that &quot;reasoning&quot; chatbot models are not much better than their non-reasoning counterparts, and AI apologists are fuming. While the Apple paper certainly is lacking in terms of quality, I believe the researchers are making an important and valid point. In this article, I explain why generative AI fundamentally cannot reason and that mistaking next-word-prediction for thinking abilities is a dangerous fallacy.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/the-illusion-of-thinking">
    <![CDATA[<p>A few days ago, Apple made the headlines <a href="https://arxiv.org/abs/2506.06941">with a new paper</a> titled “The Illusion of Thinking.” In it, a team of Apple researchers essentially drag-raced “reasoning” generative AI models against non-reasoning models and found that “reasoning” doesn’t necessarily make a model better at generating responses. Immediately, observers commented on the paper. However, while this article started off as a piece on how some criticisms are valid, while others are less so, the more I thought about the critique of the paper, the more I realized how insane the criticisms actually are.</p>
<p>The most prominently discussed criticisms are <a href="https://www.seangoedecke.com/illusion-of-thinking/">Sean Goedecke’s blog post</a>, and <a href="https://arxiv.org/abs/2506.09250">Alex Lawson’s arXiv comment</a>. I will discuss the process of “reasoning” in generative AI, Apple’s methodology and findings, and those comments in turn.</p>
<p>I personally believe that Apple is right, even if it might be for the wrong reasons. In this article, I wish to dive into what I believe the point they make is, and what this means for the usage of generative AI more generally. I also address the critique of both Goedecke and Lawson, and discuss why I think they very much miss the point. But there’s a lot to unpack, so let’s start with a brief description of what reasoning is and why it matters.</p>
<h2>What is Reasoning?</h2>
<p>“Reasoning” has two meanings. For us humans, it means the faculty of being able to reason about problems and come up with solutions. As Wikipedia <a href="https://en.wikipedia.org/wiki/Reason">has aptly put it</a>, “Reason is the capacity of consciously applying logic by drawing valid conclusions from new or existing information, with the aim of seeking the truth.” The second meaning of the term “reasoning” has been driven by AI companies such as OpenAI. This meaning says that a chatbot model will – before generating its actual answer – generate a form of “text block” in which it attempts to reformulate the problem in simpler terms and approach a solution step-by-step.</p>
<p>However, there is a fine but crucial distinction between human reasoning and generative AI reasoning: reasoning for a chatbot consists of … literally doing the same it has done before — generating text. Essentially, what AI companies call “reasoning” is nothing but making the model generate a few additional paragraphs of text before actually generating an answer.</p>
<p>And this has nothing to do with reasoning itself. Because reasoning is not inherently … <em>textual</em>.</p>
<p>When a human reasons, we think about a problem and come up with a solution. However, not all problems are math text questions. In fact, most issues we face on a daily basis concern skills of seeing a bigger picture in, say, a project with a lot of code. Oftentimes, we do not reason with text, but with diagrams. In fact, I survived my entire Master’s program with a whiteboard that I used to draw my problem on, until a potential solution became apparent, based simply on the structure of the problem. Many humans are visual thinkers (at least I am, I haven’t run a study), and as such, we are better at problem-solving when we can visualize something than when we cannot. I hope you get the point.</p>
<p>Generative AI models cannot reason visually. They can only reason by blabbering for a few minutes. And this is precisely what the Apple paper shows. Because puzzles like the Tower of Hanoi can be solved more easily by visually experimenting, especially if you don’t know the exact algorithm that would solve it.</p>
<h2>“Reasoning” and the Marketing Hype</h2>
<p>However, many people remain convinced that generative AI models do in fact “reason.” AI companies are good at marketing. <a href="https://platform.openai.com/docs/guides/reasoning?api-mode=chat">As OpenAI writes</a>, “Reasoning models think before they answer, producing a long internal chain of thought before responding to the user. Reasoning models excel in complex problem solving, coding, scientific reasoning, and multi-step planning for agentic workflows.”</p>
<p>So, essentially, they claim that those models excel in four distinct categories: (1) complex problem-solving; (2) coding; (3) scientific reasoning; and (4) “multi-step planning for agentic workflows” (whatever the hell that last point means).</p>
<p>Goedecke mentions in his blog post that “Puzzles aren’t a good example” to test reasoning abilities, because “reasoning models have been deliberately trained on math and coding, not on puzzles.” Oh, have they? Then why would OpenAI claim they excel not just in coding and scientific reasoning, but also in “complex problem-solving”? Isn’t a puzzle something you can only solve using problem-solving?</p>
<p>Furthermore, aside from the whole marketing fuss: If they can reason, they should be able to solve a Tower of Hanoi puzzle, shouldn’t they? If “reasoning” only applies to coding and math, then these models are not reasoning, because reasoning is a concept that exceeds math or coding. If we take AI companies at their word that their models can “reason,” they should be able to do so cross-domain.</p>
<h2>“Puzzles are bad examples”</h2>
<p>The very same point is also being made <a href="https://arxiv.org/pdf/2506.09250">by Lawson</a>. He states that “To test whether the failures reflect reasoning limitations or format constraints, we conducted preliminary testing of the same models on Tower of Hanoi N = 15 using a different representation.” What is this “different representation”? Well, it’s “write a Lua function.” And that apparently worked. Okay, so you have proven that models can write code. But have you proven that they can reason?</p>
<p>This is my major issue with the argument that “puzzles are not a good example.” I disagree and think that both Goedecke’s and Lawson’s comments show that the Hanoi experiment was an even better example than I initially thought. What happened here is simply that the models could not <em>explain</em> the solution of Tower of Hanoi, but they could quickly give you a function that solves this. I wonder why that is? Maybe because there <a href="https://rosettacode.org/wiki/Towers_of_Hanoi">literally is a website dedicated to providing algorithms for the same problems in all imaginable programming languages under the sun</a>, with the implication being that the <em>code</em> for this problem was clearly part of the models’ training data, while the <em>explanation</em> of the solution was not?</p>
<p>You see where I am going with this: You likely get good solutions in terms of code, because that has been solved over and over, but making the model explain a solution clearly demonstrates that they, indeed, cannot reason.</p>
<p>In this light, one statement of Goedecke is especially atrocious: “It’s possible that puzzles are a fair proxy for reasoning skills, but it’s also possible that they aren’t.”</p>
<p>This is a borderline unscientific statement. It is not <em>necessary</em> for puzzles to be a “fair” proxy for reasoning skills. It is <em>sufficient</em> that they <em>do</em> require reasoning skills. And, as we say in science, a hypothesis can never be fully proven true, it can only be proven wrong. And if you find that your hypothesis does not hold in one case, you can say that it needs to be rejected in general.</p>
<p>I do grant Goedecke that I would also have liked Apple to include other reasoning skills in their testing — not the least to prevent this counter-argument, but also to make their point stronger.</p>
<p>In other words, all Apple had to do was find some task that requires reasoning, and prove that the models aren’t generally better if they reason than if they don’t. And I believe they did this. But this should not be the point; the point is rather that a computer – by definition – cannot reason. Which brings me to the actual important point Apple makes.</p>
<h2>GPT Cannot Reason</h2>
<p>It is the old adage: Some magic box spits out intelligible text, and humans are immediately prone to ascribe sentience to it. However, just because the magic box gave you the correct answer to your homework, or wrote your essay for you, doesn’t mean it is smarter than a pea.</p>
<p>Let us think a bit about what is actually happening when a Large Language Model “reasons.” Essentially, in this instance, the model makes more of what it always does: generate more text. The intuition behind “reasoning” in AI is that the model uses the entire existing previous text in a chat session (the “context”) to generate its next token. In turn, this means that each individual word can influence what the model will deem “probable” to be the next word.</p>
<p>By training the model on examples of “dissecting” problems with simpler language first, its creators hope to nudge the internal “hidden state” of the model into a direction that makes it more probable for the model to solve the question. However, and this is a crucial point: The reasoning steps that the dataset engineers provide during model training essentially tell the model “if you see a question like this, generate that text first, and only then generate the answer.”</p>
<p>In fact, you can make any model reason by adding such a “reasoning” stage yourself after your question. Just write your question, and then spell out some thinking steps, before letting the model generate an answer. It might then be better at providing you a solution. But then, obviously, why would you need to chat with a model if you do all the work of reasoning yourself?</p>
<p>And this is precisely the issue that Apple identifies. There is some fundamental limit to how good chatbots can ever become using the known architecture of the Generative Pretrained Transformer. Indeed, a GPT is essentially just a decapitated translation model that got its encoder stage removed. Instead of providing it with some matrix that encodes some meaning, they tell it to endlessly generate new tokens (this is sometimes called “autoregressive”). In other words: There is never <em>new</em> information entering the decoder stage. Essentially, during reasoning, the GPT model “cooks in its own broth.”</p>
<p>After all, Large Language Models are <a href="https://image-journal.de/the-new-value-of-the-archive/">(similar to Diffusion models, see Meyer 2023)</a> simply a ZIP archive of online text. Such a model is not a reasoning machine, but instead an archive that you query by formulating questions. By implication, this means that prompt engineering is essentially a postmodern form of archeology. Prompt “engineers” merely attempt to unearth the tokens in the original training data that make a model generate text akin to the answer the prompter is searching for.</p>
<h2>Next-Word-Prediction is not a Substitute for Reasoning</h2>
<p>This brings me to a broader point that I believe neither the commenters nor the Apple researchers see: next-word-prediction is not a substitute for reasoning. And this is the actual core of the entire debate: When we talk about “reasoning” generative AI models, we are taking a phenomenon we observe and expect that it equals the process we assume, even though there is no sane argument to assume this is actually how it works. Essentially, we observe some statistical model generate word after word, and then look at the end result and deduce “this is a reasoning process!” OpenAI, Goedecke, Lawson, and even the Apple team all never ask whether what we call “reasoning” is actually reasoning, or if the results we see are generated by an entirely different process.</p>
<p>It is a very philosophical problem described by Searle’s “<a href="https://en.wikipedia.org/wiki/Chinese_room">Chinese Room</a>.” If we read some text which looks like a human’s reasoning process, was it actually reasoning that has generated the text? This is the issue: Every language model is trained on next-word-prediction. It takes all input text, and then asks one simple question: Which word is the most likely next one? From its weights and parameters, it creates a probability distribution in which a small set of words are very likely, while most others are unlikely. From those very likely words, it then randomly chooses one (a process that can be controlled with the “temperature” setting). However, every philosopher will be able to tell you that, at no point in this entire process was there any actual thinking involved.</p>
<p>You can even experiment with this yourself. If you set the “temperature” of a model to zero, it will always select the most probable next word, resulting both in perfect reproducibility (the same prompt always results in the same answer) and quite literal dataset reproduction. If you set it very high, however, the text it produces will become an incomprehensible word-salad that will make an insane person sound logical.</p>
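<p>The sampling step itself fits into a few lines of code. The following is a toy sketch with made-up scores, not the implementation of any actual model: <code>softmaxWithTemperature</code> and <code>sampleIndex</code> are hypothetical names, and real systems often add further tricks such as restricting the choice to the most likely candidates.</p>

```typescript
// Turn raw scores ("logits") into a probability distribution. A lower
// temperature sharpens the distribution, a higher one flattens it.
function softmaxWithTemperature (logits: number[], temperature: number): number[] {
  if (temperature <= 0) {
    // Greedy limit: all probability mass goes to the highest-scoring token
    const best = logits.indexOf(Math.max(...logits))
    return logits.map((_, idx) => (idx === best ? 1 : 0))
  }
  const scaled = logits.map(logit => logit / temperature)
  const max = Math.max(...scaled) // Subtracted for numerical stability
  const exps = scaled.map(value => Math.exp(value - max))
  const sum = exps.reduce((acc, value) => acc + value, 0)
  return exps.map(value => value / sum)
}

// Draw one index according to the given probabilities. The random source
// is injectable so that the behavior can be made reproducible.
function sampleIndex (probabilities: number[], random: () => number = Math.random): number {
  const threshold = random()
  let cumulative = 0
  for (let i = 0; i < probabilities.length; i++) {
    cumulative += probabilities[i]
    if (threshold < cumulative) {
      return i
    }
  }
  return probabilities.length - 1 // Guard against rounding errors
}
```

<p>With a temperature of zero, the random number no longer matters, because only one candidate has a non-zero probability. With a very high temperature, all candidates become almost equally likely, which is where the word-salad comes from.</p>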
<p>In short: The process that generates data in a large language model is one of “which words are most likely based on my training data?” while reasoning involves thinking, and only then producing text. In fact, most reasoning processes do not produce any text at all, since we are oftentimes not asked to explain what we are doing when we solve problems (or do you talk to yourself while you solve a Rubik’s Cube?). Maybe that’s one of the primary reasons why “reasoning” models are so bad: We often do not document our thought processes. The important takeaway, however, is simple: LLMs produce text based on next-word-prediction, and next-word-prediction is simply not a substitute for reasoning.</p>
<h2>Final Thoughts</h2>
<p>Before I end, let me add a few remarks on Alex Lawson’s critique of the paper, which merits some comment. While Goedecke makes some arguably bad points, but still tries to understand and contextualize Apple’s paper, Lawson goes beyond that. He not only lists the Claude model as a coauthor (something which is completely ridiculous and for good reasons disallowed by many journals), but even acknowledges Google’s and OpenAI’s models. Not the companies, but the models they have produced. It’s as if I would acknowledge the help of LanguageTool for checking my grammar in every article. But there is a bigger issue aside from this.</p>
<p>Lawson argues that the issue with Apple lies with their experimental setup. Among other things, he claims that generative AI is “aware of its token limitations.” Citing a tweet (!), he claims: “A critical observation overlooked in the original study: models actively recognize when they approach output limits.” Excuse me, what? How can a model that generates numbers filling up a limited array of, say, 50,000 elements, have access to the length of this array? From a computing perspective, this is just an insane statement to make. It is as if someone had seen the jokes on programming forums of people pretending they killed a chatbot by making it “execute <code>rm -rf /</code>” and taken them seriously.</p>
<p>Chatbots are absolutely trained with examples that include a pattern of “if something repeats, you don’t have to verbosely repeat that explicitly” and that is understandable. But that has literally nothing to do with the models “recognizing” their output limits. Because that is computationally impossible. (One of the things lies in the programming code, the other in the data. No generative model has access to source code.)</p>
<p>I get that many people are frustrated with Apple in terms of AI. They did, indeed, over-promise. Siri is still as dumb as a toast, and many of the capabilities of AI come to the iPhone merely by using ChatGPT as an intermediary. This is certainly not great.</p>
<p>But I do believe that the AI capabilities Apple <em>did</em> add thus far are solid. I believe the reason why many people are disappointed with it is that it’s so … little. AI on other devices can do many more things, and indeed, Android (especially Samsung and Google) are far ahead in their adoption of AI. But I also believe that Apple’s cautious approach – while unsexy – is sound. By only adding what actually has been proven to work, they prevent people trusting their lives with these models. Because the adoption of generative AI ought to walk a thin line between making money and instilling in the consumers some sense of limitation in the possibilities of AI. The Illusion-paper is merely an extension of Apple’s design philosophy to research.</p>
<p>Now, Apple also just wants to make money, and I don’t believe the paper to be worth much until it is actually published in peer-reviewed conference proceedings, hopefully including a better setup and more examples. But I do believe they make a valid point. And the reactions to this paper essentially highlight a worrying development: that people have started to buy into the companies’ marketing claims of models actually being able to “think,” and that people are losing their ability for critical thinking themselves. And this is not going to end well.</p>]]>
  </content>
</entry>
<entry>
  <title>Lying to your Research Subjects, and other avoidable ethical pitfalls</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2025, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/lying-to-your-research-subjects-and-other-avoidable-ethical-pitfalls" />
  <id>https://www.hendrik-erz.de/post/lying-to-your-research-subjects-and-other-avoidable-ethical-pitfalls</id>
  <published>2025-05-09T10:00:00+00:00</published>
  <updated>2025-05-10T10:15:55+00:00</updated>
  <summary type="html"><![CDATA[Two weeks ago, a scandal shook the research community. Researchers from the University of Zürich conducted an experiment and drew the ire of an entire Reddit community. While much has already been said about the problems of this study in particular, I want to take today to reflect more broadly on the state of ethics in the research community. Because I believe we can, and should, do better.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/lying-to-your-research-subjects-and-other-avoidable-ethical-pitfalls">
    <![CDATA[<p>Today I want to talk about a recent case that you may or may not have heard of already. A team of researchers from the University of Zürich wondered how dangerous generative AI such as ChatGPT could become if wielded by the wrong people. They were asking themselves: “Can generative AI convince people of arbitrary viewpoints?” In and of itself an interesting research question. But the issue is not the research question in this case. No, the issue is the study design. Because these researchers decided that a lab experiment would not yield proper insights. Instead, they wanted to conduct the study “in the field,” that is, in a setting not defined by experimental conditions.</p>
<p>So here’s what they did: They chose the Subreddit <a href="https://www.reddit.com/r/changemyview">r/ChangeMyView</a>, which lets users post a view they hold, and let comments try to convince them of another viewpoint. They inserted themselves into the discussion, but instead of trying out various argumentation techniques, they simply let an LLM generate a response with some predefined view, and copy-pasted that generated text into their comment. Their IRB <a href="https://www.nau.ch/news/schweiz/universitat-zurich-forscher-fuhrten-heimliches-ki-experiment-durch-66977975">indicated some concerns with the study design</a> (German source), but they proceeded to do it anyway. Once the study appeared as a preprint, users of r/ChangeMyView saw it, and they were <em>furious</em>.</p>
<p>To get an impression of how bad the situation was, <a href="https://www.reddit.com/r/changemyview/comments/1k8b2hj/comment/mp4vgcm/">take a look at the official comment from the research team</a> explaining and defending their research to the users of that Subreddit. And then, take a look at the comments to that explanation. Here’s a small selection:</p>
<blockquote>
<p>The mods and the members of this Community affirmatively revoked consent to interact with AI comments.</p>
<p>“We decided that didn’t matter” is simply not good enough.</p>
<p>This is a gross breach of ethics.</p>
</blockquote>
<p>Or:</p>
<blockquote>
<blockquote>
<p>After completing data collection, we proactively reached out to the CMV moderators</p>
</blockquote>
<p>That's not what &quot;proactively&quot; means.</p>
</blockquote>
<p>And, finally:</p>
<blockquote>
<p>Please cite any other studies where researchers use psychological manipulation techniques on participants who did not consent.</p>
<p>You have confirmed that we now no longer know if these posts and comments are just bots or real people, which leads to the inevitable reverse, where real people facing difficult situations are dismissed as bots. It potentially destabilizes an extremely well moderated and effective community. That is real harm.</p>
<p>You say your study was guided by your so-called principles, including user safety. Frankly I think you are lying. You didn't give a damn about others to do this study, because if you had you would have easily followed the &quot;user safety&quot; principle to it's logical conclusion, given your choice of topics to have the LLM comment about.</p>
<p>How do you think a real user who was dealing with serious trauma from sexual assault would feel after finding comfort or agreement with your bot comments, now finding out that was fake. <em><strong>That is real harm.</strong></em></p>
<p>You even <a href="https://www.reddit.com/r/changemyview/s/osctsxfQPd">tried to convince users</a> that the current situation in the US isn't really a big deal, we should focus on other problems. That is political manipulation, and while I understand this is a small community when compared to the global population, this could impact voters. Done at the wrong time of year, that's foreign election interference, a crime.</p>
<p>I'll be reporting your paper on every platform that I see it published.</p>
<p>As a scientist myself, you should be ashamed.</p>
</blockquote>
<p>As you can see: The research team really did get burned. And I believe they had it coming. In the remainder of this post, I will talk about the fact that there are already studies that follow a very similar pattern, which resulted in similar outrage; what I think this shows about the approach of many researchers to ethics; and how we can safely navigate research ethics as a community.</p>
<p>One reason why I am writing this post now, two weeks after the incident, is that over the past two weeks I have observed something concerning. Several people I’ve seen commenting online seemed very surprised by the amount of backlash. However, I don’t believe any of this is surprising, given the gross ethical negligence at play. But it concerns me that I regularly see other social scientists surprised about the harm that ethical misconduct can cause. It shows to me that we are still not in a place where ethics is taken as seriously as it should be. I will provide some additional examples that show that sometimes, researchers can be surprisingly naïve when it comes to regulatory frameworks that apply to their research.</p>
<p>The bottom line is: While we researchers often can <em>feel</em> like we’re playing in a sandbox, the real world is not a sandbox. And if we don’t “eat our vegetables” (thanks to Jacob Habinek for providing me with this apt metaphor), then we can’t have fries either.</p>
<h2>An Almost Exact Predecessor: The “Hypocrite Commits” Case</h2>
<p>First and foremost, if you were surprised by this incident, you should know that it is not the first one of its kind. In fact, just a few years ago, a study with structurally the same design caused similar outrage. It is known as the <a href="https://linuxreviews.org/images/d/d9/OpenSourceInsecurity.pdf">“hypocrite commits” study</a>. Essentially, researchers tricked real people – Linux Kernel maintainers – by sending them faulty patches, to check how vulnerable the Linux Kernel is to adversaries. Since it’s humans who review patches, <em>of course there is some form of vulnerability</em>. We aren’t machines, so the outcome of this study was entirely predictable: Some of the faulty patches made it through, and were only stopped by unblinding the research subjects. Which then caused backlash.</p>
<p>Essentially, the research team managed to get the entire University of Minnesota (not just their own team) banned from ever contributing to the Linux Kernel again. The community was outraged, because none of this was communicated in advance.</p>
<p>Except for the research question, the recent Reddit experiment followed exactly the same study design.</p>
<h2>No “Respect for Persons”</h2>
<p>“But how can we avoid this backlash?!” you may now ask. Essentially, if one doesn’t know much about ethics, it may now appear as if this could happen to everyone. But that would be very detrimental, because if we are scared of our research subjects, we won’t be doing much of the necessary research to understand human society. So how <em>can</em> we avoid becoming the target of such a backlash?</p>
<p>It’s actually very simple: Show “respect for persons.” That’s it.</p>
<p>Okay, it’s not <em>that</em> simple, and it requires an explanation. “Respect for persons” is the first ethical principle outlined in the <a href="https://en.wikipedia.org/wiki/Belmont_Report">Belmont Report</a>. The Belmont Report is one of <em>the</em> ethical guidelines published about 50 years ago, in 1978. If you are somewhat trained in social science methods, you will know the principle “respect for persons” as the principle of “informed consent.” Informed consent requires researchers to tell any research subject beforehand what they are doing, and why, and asking them whether they agree to participate in that study. It’s just one example of the broader principle of “respect for persons,” but an easy one to remember. Essentially, respect for persons means: Imagine you were a research subject in your own study — would you feel respected after you read the published paper? If you can’t answer this with a confident “yes,” this already indicates that there might be an issue.</p>
<p>And essentially this is what has happened to the hypocrite commits and CMV papers. If these researchers had reflected before committing to the paper, and thought “What would I think if I learned of this research, and I was a Linux Kernel member/CMV member?”, they would probably have already guessed that conducting a study without informing anyone in advance is a bad idea.</p>
<p>The CMV paper folks, however, had a reason for why they never informed their research subjects: They argued that conducting a lab experiment with informed consent would’ve tainted the experimental results. Essentially, they said, “we can’t get to the <em>real</em> data if we do all of that in a lab.” Which is simply wrong. No, I believe that them conducting this on Reddit had two other reasons. First, using Reddit makes the research catchier and easier to communicate, because a lot of people know Reddit, and it highlights how this could happen in the real world, not just in a lab. But second, I also suspect that they simply didn’t want to set up appropriate lab conditions. Because that takes time and money. Reddit already has a platform, they wouldn’t have to invent an experimental setup.</p>
<p>From this view, I think I even find the argumentation from the hypocrite commits team more believable: They argued they didn’t think about the fact that it was real humans who would read and parse their emails. It’s still a stretch, but it’s more believable than the argumentation of the CMV paper authors. This is made worse by the fact that the CMV paper authors are social scientists, whereas the hypocrite commits authors are computer scientists. The former should definitely have enjoyed better ethical training.</p>
<h2>Ethical Principles are not Hard</h2>
<p>This leads me to the next section. I strongly believe that both papers wouldn’t have experienced this backlash if the researchers took ethical principles seriously. Ethics is not actually that hard. Sure, it adds about a week of deep reflection to the research process, and time is money. But I believe that even a superficial knowledge of the Belmont and Menlo Reports (the <a href="https://en.wikipedia.org/wiki/Menlo_Report">Menlo Report</a> updated the Belmont Report in 2012 with a fourth ethical principle) would’ve captured many of the most egregious ethical violations. Here, let’s quickly walk through the four central ethical principles:</p>
<ol>
<li>
<strong>Respect for Persons</strong>: We already talked about that. Get informed consent where possible. When that is not possible, think about why. If you can come up with a very good reason for why you can’t get informed consent, think about “What would I think if I were a research subject in my own study?” If that thought seems fine for you, good!</li>
<li>
<strong>Beneficence</strong>: This one is a bit harder, but essentially it means: minimize risks, maximize benefits. A risk here is any harm that may be caused by your research. This again requires imagining yourself as a research subject in your own study. What harm might you suffer? Also, not all risks are created equal. Some risks are perfectly fine to impose on your research subjects. Others are much more detrimental. Sometimes it’s more important to minimize risks than to maximize benefits.</li>
<li>
<strong>Justice</strong>: This connects to the second principle. Whatever risks or benefits your study comes with, ensure that you distribute them equally across all parts of your study population. Simple example: If you are conducting a study among black and white Americans, ensure that it’s not just the black study population who bears all risk, and that it’s not just the white study population that reaps all the benefits.</li>
<li>
<strong>Respect for Law and Public Interest</strong>: This is the principle the Menlo Report added to the three of the Belmont Report. It contains three elements. First, don’t do illegal research. Second, be accountable for your research. Third, be transparent. I personally find two questions very apt for thinking about whether you conform to this principle (aside from, you know, literally reading the law and consulting a lawyer). First, would you be fine with defending your research in front of your research subjects? And second, would you be fine with having to defend it in court? If you answer both questions with a confident yes, you probably aren’t about to commit <em>egregious</em> ethical misconduct. If not, you absolutely need to rethink your study design.</li>
</ol>
<p>With a very basic understanding of these four principles, I believe you can avoid the most detrimental ethical issues you could face. If you truly think about these issues, and adapt your study design accordingly, I don’t believe any research subject would be very mad at you. A few are <em>always</em> mad, but at least I don’t think <em>your entire study population</em> would hate you for it.</p>
<p>That being said, didn’t the researchers have better options? Let’s think about this next.</p>
<h2>What They Could’ve Done Better</h2>
<p>First and foremost: I’m not an experimental sociologist. I have never conducted an experiment myself. So take the following with a grain of salt. But I have read many experimental studies, and I have participated in quite a few experiments myself.</p>
<p>I believe that both teams would’ve avoided getting roasted by their study population with one very simple change to their study design, without yielding unusable data. The first thing they should’ve done was to actually conduct an experiment <em>in a lab</em>. The reason for this is twofold. First, I do not believe that doing so would’ve invalidated the data in any way, and second, this would’ve made basically everything easier, and would have prevented the two papers from getting retracted.</p>
<p>The CMV paper authors have argued that a lab setting would’ve tainted the data. I believe that this is a lie. Think about it: The authors say that they wanted to know if people can get convinced by a machine. You can replicate this easily in a lab. Simply tell people: “We want you to tell us a belief you have, and be open to being convinced of another viewpoint. Then, we are going to show you a few counter-arguments, and you have to select how much you feel convinced by each one. Some of these may be AI generated, but most are human generated.” This way you retain the core of the research question (the participants do not know in advance which comments are AI generated) and get an unbiased estimate, you are transparent, and you even get informed consent. <em>And</em>, you would even get an actual proper random sample of the population, not just the heavily biased Reddit community. If you so wanted, you could also have replicated the CMV community setup specifically, and set up a very simple forum, where the fact that “there are some AI generated comments” is clearly communicated, but which still essentially replicates the CMV community. If you still believe that this wouldn’t be as “true” data as you want, then you should probably not adapt your study design, but your idea of man. The exact same holds true for the hypocrite commits paper.</p>
<p>Now, again, I am not an expert in experimental study design, but I don’t think that “If we don’t do it in the real world, we won’t get the correct data” is a valid argument to make. And so I fully side with the outraged study populations in both cases. Essentially, the argument of the researchers implies: “We don’t trust other human beings to take us seriously, and so we need to lie to them in order to get correct responses. If we invited them into our lab, they would just deceive us.” And this is a very concerning idea of man.</p>
<h2>Other Instances of Naïvety</h2>
<p>Ethical and legal conundrums don’t stop there. Another thing that I observe frequently across the board is legal and ethical violations of a more benign form. Specifically, with the rise of generative AI, many social scientists are now using ChatGPT, Mistral, Claude, or Gemini in their studies. This very often involves sending some user data to these proprietary service providers. And there are already several studies that do that. What these researchers frequently do not realize is that any user-generated data is either personal data (and as such worthy of extra protection), <em>or plainly copyright-protected</em>. Every piece of text anybody anywhere on earth produces is, by default, copyrighted, meaning that <em>you have no right to send it to a third-party provider without consent</em>.</p>
<p>A common counter-argument is: “If these people participate in my experiment, they can be implied to have agreed to that.” No, they cannot. What they often just agree to is: “<em>You</em> can use my data for research purposes.” They didn’t agree to “<em>OpenAI</em> can have my data.” And this argument becomes even weaker now that we have very capable models that we can run <em>locally</em>. I doubt users would object to you <em>using</em> generative AI in your research as long as they have consented to <em>you</em> using the data. But they are right in objecting to you <em>sending the data to a third-party service provider</em> unless explicitly granted in the consent statement. They granted <em>you</em> the right to use the data for research, they didn’t grant you the right to send it to someone else. Especially since we’re in Europe, the legal boundaries are very clear, and very strict. And if you think it’s annoying, then again, I don’t think it’s a legal issue, it’s an issue with respect for persons.</p>
<p>And this issue isn’t even new. Ten years ago, when I first was unleashed on Bachelor students at my former university, the “TurnItIn” service was becoming popular as a quick way to check for plagiarism in student essays. This service works by collecting a vast database of text and basically doing an advanced form of substring matching. However, for that to actually work, it requires researchers to upload the student essays to the service. And TurnItIn states very plainly that it stores those essays indefinitely. Many of my colleagues were using the service very liberally, simply uploading <em>all</em> student essays from a course. They were aware of the potential privacy issues, but you know what their solution was? <em>Simply strip away any personally identifying information</em>. Which, in some way, made the problem even worse. Removing author information ensured that no student could actually exercise their GDPR rights once the GDPR was finally enacted. All of these student essays are still part of this database. And even though students today have proper data protection rights, they cannot actually exercise them, because their teacher stripped all authorship information from the PDF file before uploading.</p>
<p>You know how many plagiarists we caught this way that we would not have caught based on other clues and experience with a close reading of the essays? Zero. Now, there is an argument to be made here that university teaching conditions are so bad by now that it is impossible for teachers to properly do all the grading manually. But I don’t know if violating students’ rights is the proper way to solve these issues.</p>
<h2>Concluding Thoughts</h2>
<p>I believe social scientists need much better training in the ethical and legal surroundings of our research in order to be able to properly understand what they can and cannot do. Just as law is not optional, ethical reflections aren’t either. Just because you are in a legal gray zone or your IRB has said “Whatever, go ahead” doesn’t mean you should. Our actions have consequences, and as such we must make sure we do our due diligence to keep these consequences from being detrimental to either us or our research subjects. We may be experts in our field, but we are not experts in human relations. And our research subjects deserve to be treated with respect, regardless of whether they are university professors or unemployed high-school dropouts.</p>
<p>The carelessness demonstrated by the two studies I discussed harms not just their specific research subjects (and, ultimately, themselves). No, this carelessness harms research in general. If this becomes a pattern – researchers do something, subjects are outraged, lawsuits follow – this will erode trust in scientists. It will establish our reputation as people who think we’re something better than our research subjects; as people who treat other people as literal lab rats. And this will reduce the general population’s willingness to engage with our research, <em>regardless</em> of whether we actually do our ethical due diligence, or not. This kind of behavior is simply not acceptable.</p>
<p>I expect every researcher to take the basic established ethical principles to heart, and act accordingly. You don’t have to catch <em>every</em> ethical conundrum. We all make mistakes, and I myself have made mistakes. We can’t all become ethical experts. But there is a very big range between “Don’t care” and “Due diligence.”</p>
<p>So: Read the Belmont and Menlo reports (it takes you one afternoon), think about them, and apply these principles from here on. It’s really not that difficult.</p>
<blockquote>
<p>In a previous version of this article, I falsely claimed that the University of Zürich's IRB had granted a waiver for the research. These parts are now corrected. Thanks to Sebastian Gießler for pointing this out.</p>
</blockquote>]]>
  </content>
</entry>
<entry>
  <title>Is Markdown Taking Over?</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2025, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/is-markdown-taking-over" />
  <id>https://www.hendrik-erz.de/post/is-markdown-taking-over</id>
  <published>2025-04-28T09:00:00+00:00</published>
  <updated>2026-02-04T10:15:16+00:00</updated>
  <summary type="html"><![CDATA[A few weeks ago, an article titled &quot;Markdown and the slow fade of the formatting fetish&quot; was published, and it has been suggested to me multiple times. Given the resonance it has generated with the community, I take the opportunity to reflect on Markdown and my personal stance on it.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/is-markdown-taking-over">
    <![CDATA[<p>Recently, someone shared <a href="https://ia.net/topics/markdown-and-the-slow-fade-of-the-formatting-fetish">an article</a> with me from the team of iA Writer. I read it shortly after its initial publication at the end of March, but it has been re-suggested to me multiple times since then. It is titled “Markdown and the slow fade of the formatting fetish.” The article is pretty long, but can be summarized quickly. First, Markdown is gaining popularity, as indicated by the fact that more and more of the software we interact with daily supports it (including WhatsApp, Discord, and generative AI chatbots). Second, opposed to Markdown are proprietary formats, and the article makes the case that Microsoft has been especially egregious in “locking in” users with Word. Third, switching fully to Markdown is the way forward.</p>
<p>Given that the article seems to resonate with readers, it seems worthwhile to pause and ponder its premise, and how it portrays the evolution of Markdown over the past two decades. Naturally, the team of an application that makes money with a Markdown editor can be assumed to be biased. I do share their sentiment that proprietary formats, and Microsoft Word especially, should be a thing of the past, and that Markdown offers undeniable benefits for professional writers.</p>
<p>However, I believe the article misses the broader societal developments that have happened over the past 20 years, and I do not share their optimism. Again, I do believe this to be caused by bias, but I also think it is a good occasion to reflect upon my own stance towards Markdown, now that I have been in the game for about a decade. In the following, I will make the case that Markdown’s continued success is inhibited by four blockers. I argue that, unless developers of Markdown software actively work towards better support for professional use-cases, standardization, and tooling support, Markdown will remain at the level of everyday usage, and is unlikely to increase its visibility going forward.</p>
<h2>The Premise: Professional Markdown in the Last Decade</h2>
<p>Before I start, let me add the disclaimer that I am naturally also biased, so take the following paragraphs with a grain of salt. As the main developer of <a href="https://www.zettlr.com">Zettlr</a>, yet another Markdown editor, I may not make money with promoting Markdown, but I probably wouldn’t continue developing such a Markdown editor if I did not believe in its potential. However, I did keep an open mind towards how users <em>actually</em> write. And I think this is the primary critique I have towards that article.</p>
<p>Back in 2017, when I started developing Zettlr, Markdown support was dire. WhatsApp had already introduced a mutilated form of Markdown syntax for some quick formatting in messages, and there were plenty of small, independent projects that utilized Markdown syntax. But nobody really knew what Markdown actually was. And nobody cared.</p>
<p>To me, Markdown offered two primary benefits over more complex Word processing formats. First, it separates layout from writing, which I identified as a big issue especially for me. I could spend hours re-layouting my essays before finally submitting them. Second, it is much simpler and more versatile. Since the format itself does not consist of zipped folders with many XML files,<sup id="fnref:1"><a class="footnote-ref" href="#fn:1" role="doc-noteref">1</a></sup> it is much easier to work with Markdown documents at scale. Hundreds of reading notes turn from a daunting mess of individually encapsulated Word files into a batch-processable set of searchable plain-text data.</p>
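<p>To make “batch-processable” a bit more tangible, here is a minimal sketch in Python. It assumes a folder of UTF-8 encoded <code>.md</code> files; the function name <code>search_notes</code> is made up for this illustration and is not part of any existing tool.</p>

```python
from pathlib import Path


def search_notes(folder, term):
    """Return (file name, line number, line) for every line containing term."""
    hits = []
    needle = term.lower()
    # rglob walks the folder recursively; sorted() keeps the result order stable.
    for path in sorted(Path(folder).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for number, line in enumerate(text.splitlines(), start=1):
            # Case-insensitive substring match on the raw plain text.
            if needle in line.lower():
                hits.append((path.name, number, line.strip()))
    return hits
```

<p>Calling <code>search_notes("notes", "belmont")</code> on a hypothetical <code>notes</code> folder would list every matching line across all Markdown files in it. Doing the same across hundreds of Word files would first require unpacking each document.</p>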
<p>Isn’t that just … plain good?</p>
<h2>Markdown Misconceptions</h2>
<p>This leads me to a few things we have to talk about. I believe that especially in the Markdown-space, we have some misconceptions about the benefits and drawbacks of Markdown. These are (1) the universality of Markdown; (2) that all users care about how they write; (3) Markdown is suitable for professional needs; and (4) the ecosystem support for Markdown is qualitatively sufficient and simply needs to be scaled up.</p>
<h3>Markdown is not Universal</h3>
<p>The first misconception is that Markdown is universal. Surely, you can use Markdown syntax across a bazillion websites and apps. WhatsApp, Discord, Reddit, this blog, static websites, RStudio, Chatbots, GitHub — you name it. Markdown is everywhere. But it’s not universal.</p>
<p>The big problem with Markdown is that it has been developed to solve one use-case, and instead has been used to solve hundreds of completely different use-cases. Mind you, Aaron Swartz and John Gruber have developed Markdown as a solution to writing proper <em>emails</em>. Which email program do you know that actually allows you to use Markdown to compose emails? Likely none. Most only give you the choice of plain text or rich text. Apple Mail, Outlook, Thunderbird, Gmail: none of these options allow Markdown syntax out of the box. There are some apps that do, but these are either paid, or only support a single provider.</p>
<p>Instead, Markdown has been used to solve a variety of completely different use cases: Comments and discussions on forums and message boards, note-taking, technical documentation, and professional text writing. None of these use-cases have been intended by Gruber and Swartz. And this is evident when looking at <em>how</em> apps implement Markdown.</p>
<p>Here’s an incomplete list of deviations from the standard Markdown syntax that I can name off the top of my head: (a) Obsidian uses a non-standard <code>![[]]</code>-syntax to link images from across your loaded folders; (b) Zettlr and Pandoc use the non-standard citation syntax <code>@citekeyYear</code>; (c) some apps use <code>!!! note</code> for admonitions; (d) GitHub uses <code>&gt; [!NOTE]</code> to the same effect; (e) and even if you’d like to call Wiki links (<code>[[link]]</code>) “standard,” there is disagreement about whether titles should come before or after the link target (<code>[[link|title]]</code> vs <code>[[title|link]]</code>).</p>
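<p>To illustrate how quickly these deviations show up in practice, here is a rough sketch in Python that flags some of the constructs listed above in a piece of text. The regular expressions are deliberately simplistic assumptions made for this example; they are not how Obsidian, Pandoc, Zettlr, or GitHub actually parse their syntax, and the names <code>DIALECT_PATTERNS</code> and <code>find_extensions</code> are hypothetical.</p>

```python
import re

# Illustrative patterns only; real parsers are far more careful than this.
DIALECT_PATTERNS = {
    "obsidian-style embed": re.compile(r"!\[\[[^\]]+\]\]"),
    # A wiki link is a [[...]] that is not preceded by an exclamation mark.
    "wiki link": re.compile(r"(?:^|[^!])\[\[[^\]]+\]\]", re.MULTILINE),
    # Requiring whitespace or a bracket before the @ skips email addresses.
    "pandoc-style citation": re.compile(
        r"(?:^|[\s\[;-])@[A-Za-z0-9_][\w:.-]*", re.MULTILINE
    ),
    "!!! admonition": re.compile(r"^!!! \w+", re.MULTILINE),
    "github-style alert": re.compile(r"^> \[![A-Z]+\]", re.MULTILINE),
}


def find_extensions(markdown):
    """Return the sorted names of all non-core constructs found in the text."""
    return sorted(
        name
        for name, pattern in DIALECT_PATTERNS.items()
        if pattern.search(markdown)
    )
```

<p>Running <code>find_extensions</code> over a file from an unfamiliar editor gives a quick first impression of which app-specific syntax you would have to learn or convert before the file behaves the same elsewhere.</p>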
<p>There’s much more. Every app that wants to support some specific use-case has to invent some syntax <em>ex nihilo</em>. Markdown standardization is so difficult that even John MacFarlane, creator of Pandoc and the <a href="https://spec.commonmark.org">CommonMark specification</a>, has given up trying to create a comprehensive standard, and instead just calls it a baseline. There is <code>pandoc-crossref</code>, Pandoc alone supports four different types of tables (simple, multiline, grid, and pipe), and there is a litany of custom elements scattered across the ecosystem. Some are slowly adopted across the ecosystem as de facto standards, such as footnotes or <code>==highlights==</code>, but this blog, for example, still does not support either natively.<sup id="fnref:2"><a class="footnote-ref" href="#fn:2" role="doc-noteref">2</a></sup></p>
<p>We may convince ourselves that, once you know Markdown, you can easily switch between programs, but the reality is that you will have to re-learn some aspects of Markdown every time you do switch. Even though there is no strict vendor lock-in because it’s all just plain syntax: if you have no experience with whichever Markdown editor has created some file, you will find syntactic elements whose purpose doesn’t immediately become clear to you.</p>
<p>In essence, Markdown applications have re-created a problem very similar to the one they set out to solve twenty years ago. Users who use some application have less and less incentive to switch apps as they continue to use it, because it becomes more and more difficult to exchange the custom syntactic elements between apps. Markdown syntax and the apps which produce it have become less interchangeable over time.</p>
<p>Don’t get me wrong: I still believe that the benefits of Markdown far outweigh its drawbacks. But Markdown is far from universal beyond a very small set of agreed-upon syntactic elements.</p>
<h3>Users Don’t Care About Markdown</h3>
<p>This brings me to the next point: Users don’t care about Markdown as much as us creators of Markdown apps tend to believe. Yes, it feels bad if you set out to create some “Markdown Editor for the 21st century” and then some random dude comes around and demands a WYSIWYG-editor. But is the user in the wrong? Or is it rather that you have made Markdown a part of your identity?</p>
<p>The reason that Markdown is so universal is that, in many respects, a few simple syntactic elements can make messaging and commenting easier. One can securely support bold and italic text with a few additional parsing rules — no need for a complex visual editor. But as soon as you require more complex elements, the question becomes: Wouldn’t a visual editor be simpler? Sure, editing code satisfies some niche urge to maintain “purity,” where there is not a single superfluous character in your documents. But apart from that, the efficiency of whichever writing solution you have should be dictated by the use-case, not the syntax. Sometimes, a graphical editor is just simpler than adding Markdown support. Even Wikipedia now defaults to a visual editor, because it is just simpler than learning the MediaWiki syntax.</p>
<p>While each app has a small subset of users who use the app primarily because of the ability to write Markdown, the vast majority of users use it because it makes their lives easier. Because it provides functionality other apps don’t. And if it would work without Markdown, many of those wouldn’t be sad. Markdown is not a silver bullet, and users know that. Over the past ten years developing a Markdown editor, I have realized that users are very smart when it comes to understanding their needs. If a user switches to Zettlr, they usually have plenty of experience with rich text editors, and are looking for an alternative not because it uses Markdown, but because it makes citing, writing papers, and collecting research notes easier.</p>
<p>However, let me add to this that I believe users would care more about Markdown if a few show-stoppers were removed. Since we all are still heavily indebted to Swartz and Gruber, I feel we, the developers, are a bit hesitant to transform Markdown from its original use-case to a truly capable markup language. In the end, we are missing two crucial things to make Markdown <em>actually</em> useful to users: some added syntactic complexity, and heavy improvements in tooling.</p>
<h3>Markdown Cannot Satisfy Professional Needs</h3>
<p>Let me begin with the first of these missing things with Markdown. It cannot fully cater to professional needs. One of the reasons that there are so many custom elements is that, to satisfy professional needs, we need professional syntax. A great example is the citation- and crossref-syntax from Pandoc. Citations are indispensable tools for academics and other professionals, and as such, first-class support in Markdown is a simple necessity.</p>
<p>But which editors <em>actually</em> support citations? There’s Pandoc and Zettlr, maybe Quarto and RStudio. These are the only apps I can think of that support the syntax natively. Others, including VS Code and Obsidian, support it via plugins. But the vast majority of Markdown apps don’t support citations at all. And, since citations are not considered standard <em>Markdown</em>, nobody can really complain. But this makes the vast majority of apps difficult to use for academics.</p>
<p>However, it is not as simple as making Markdown Turing complete. If we turned Markdown into something very customizable for professional needs, we would end up with a second incarnation of LaTeX. Do we want that? Not if every Markdown app under the sun (including me) doubles down on the fact that Markdown is so much simpler than LaTeX.</p>
<p>No, what I believe Markdown requires for even broader adoption is a combination of two things: Further syntax standards, and improvements in tooling.</p>
<p>On the one hand, we need to improve the syntax by standardizing crucial elements such as citations, Wiki links, and some features that Pandoc has added. We need to go beyond the existing minimal consensus of Markdown syntax and enable additions in a less haphazard way than is currently the case. Instead of leaving every app developer to their own devices when it comes to supporting additional elements, it might make sense to standardize syntax for specific use-cases, and enforce wide ecosystem support.</p>
<p>Think about it: How does “CommonMark for Academia” sound? Or “CommonMark for technical documentation”? This would mean that we still have to learn a new set of elements whenever we switch contexts. There would still be specialized syntactic elements for specific use-cases. But it would mean that there is only <em>one</em> set of specific syntax per use-case. This gives users certainty about which elements they can use in a given app, and gives app developers a standard to implement. In addition, it would make communication simpler. It would be easy to spot whether an app is useful for you as a user (“This app implements Academic CommonMark”), because you know what you need it for.</p>
<p>But more specialized and formalized syntax is only one ingredient if we don’t want to create a second LaTeX incarnation. Instead, we need to improve tooling. Because it is one thing to offer some syntax for a certain use case, but a completely different one to make the feature usable for end users. And if we don’t want to encode all benefits in the Markdown syntax, this functionality has to come from the application ecosystem.</p>
<h3>Wide Adoption Requires Better Tooling</h3>
<p>This leads me to the last point I wish to make. A final misconception we have towards Markdown is that the current ecosystem support for Markdown is qualitatively sufficient and simply needs to be quantitatively scaled up. To the contrary, I believe that many tools remain stuck in plain Markdown land.</p>
<p>I feel that many tools are <em>not opinionated enough</em> for many professional use-cases. Take, for example, Obsidian and VS Code. Both are often mentioned when it comes to alternatives to Zettlr. Or take simpler tools, such as Joplin and Nextcloud Notes. None of them caters to professional needs by default. Footnotes, highlights, exports, or the support for journal and conference templates can sometimes be added by installing plugins. This works for many people, but I have also seen complaints that it adds clutter. Also, plugins usually mean that there are multiple alternatives for the same functionality, making it more difficult for users to decide which plugin serves their need better. Lastly, plugins have a specific UX that will always remain second-rate in terms of ease of use. There is a big difference in experience when you can add a bibliography file using the app’s UI versus when you need to edit a configuration file.</p>
<p>But a better example for my point is collaboration. I want to share a story from a few years back. Sometime in the late 2010s, I was asked to write an opinion piece for a UK outlet. And they asked me to submit a Word file — which makes sense, given that most people back then probably didn’t know what Markdown even was. But everyone had Word installed. So I went and wrote the article in Zettlr, exported it to Word, and sent it to the magazine.</p>
<p>A few days later, I received an email indicating that their review was now done, and that I should approve the article proof before publication. So they sent me a Google Docs link (because that’s the simplest way to collaborate in real time on documents).</p>
<p>And would you believe it? They converted my Word file to Markdown. The email contained a lengthy explanation on what Markdown is and that they used it on their website.</p>
<p>Whenever I think about this story, I chuckle, because it is such a powerful demonstration of the absurdity of everything Markdown. Whenever we have to work with many people, we have to use something everyone intrinsically understands, which is Microsoft Word, ideally with added real time collaboration, which is Google Docs.</p>
<p>This is where Markdown still lags behind. There is simply no proper equivalent to Overleaf for Markdown. Real-time collaboration with a “Track Changes” feature is simply not available. And it is precisely this lack of corporate features that will for the foreseeable future keep Markdown from actual wide adoption.</p>
<p>Now, of course there are services that allow collaboration with Markdown. There are ways around it. There are initiatives such as CriticMarkup. But none of this is widely supported. Collaborative Markdown services usually lack dozens of features the Desktop-apps support. And CriticMarkup, while a great initiative, has never been adopted, because most apps focus on the individual, not teams.</p>
<p>Long story short: in the professional space, we need both better support for professional use-cases per se, and better support for the collaborative team-structures that typically come along with professional work. And this is a qualitative issue, not one of simple scaling. Without these initiatives, we will forever remain stuck in the current state where many people still don’t know what Markdown is.</p>
<h2>Final Thoughts</h2>
<p>I like seeing articles that hold up the flag of Markdown, and which are very optimistic about the power of Markdown. But I also believe that this mindset can induce some misconceptions that remove us application developers from the users. Yes, we all have our devout Markdown-supporters who use an app because of the Markdown. But we are well-advised not to lose sight of the variety of users we hope to have by focusing too much on Markdown, and too little on what users actually need.</p>
<p>Indeed, it is nice that iA Writer apparently didn’t change its layout for 15 years. But in those 15 years, we have received massive improvements on Google Docs, we got Overleaf, and the industry has simply moved on. Not implementing additional features on top of the Markdown core seems more and more like the behavior we would expect from a hermit who leaves civilization because they don’t need all the new things that culture generates.</p>
<p>In short, if Markdown applications do not embrace the needs of their users, and instead cling to the assumed “purity” of plain Markdown syntax, they will lose out in the long term.</p>
<div class="footnotes" role="doc-endnotes"><hr /><ol><li class="footnote" id="fn:1" role="doc-endnote"><p>I will never grow tired of recommending that you take a <code>.docx</code>-file, rename it so that its filename ending is <code>.zip</code>, unzip it, and then take a look at the glorious mess that is the DOCX-format.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:1" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:2" role="doc-endnote"><p>If you’re interested, there’s a big bunch of ugly regular expressions in the backend that facilitate footnotes for this blog. I <a href="https://github.com/wintercms/wn-blog-plugin/issues/54">raised an issue</a> with the maintainers two years ago.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:2" role="doc-backlink">↩</a></p></li></ol></div>]]>
  </content>
</entry>
<entry>
  <title>This is the Age of Bullies</title>
  <author>
    <name>Hendrik Erz</name>
  </author>
  <rights>Copyright © 2025, Hendrik Erz</rights>
  <link rel="alternate" href="https://www.hendrik-erz.de/post/this-is-the-age-of-bullies" />
  <id>https://www.hendrik-erz.de/post/this-is-the-age-of-bullies</id>
  <published>2025-03-17T09:00:00+00:00</published>
  <updated>2026-02-04T10:15:21+00:00</updated>
  <summary type="html"><![CDATA[Liberal democracy is on decline. What Trump exposes in the U.S. is just a very crass version of a more general trend that I believe is common around the world. We are entering an age of strong-man politics, where rulers in representative systems do not feel bound by their electorate as much as they did just thirty years ago. And where political power becomes a more sought-after commodity, bullies are not far.]]></summary>
  <content type="html" xml:base="https://www.hendrik-erz.de/post/this-is-the-age-of-bullies">
    <![CDATA[<blockquote>
<p>Der Mensch ist ein Unterschiedswesen, d. h. sein Bewusstsein wird durch den Unterschied des augenblicklichen Eindrucks gegen den vorhergehenden angeregt; beharrende Eindrücke, Geringfügigkeit ihrer Differenzen, gewohnte Regelmäßigkeit ihres Ablaufs und ihrer Gegensätze verbrauchen sozusagen weniger Bewusstsein, als die rasche Zusammendrängung wechselnder Bilder, der schroffe Abstand innerhalb dessen, was man mit einem Blick umfasst, die Unerwartetheit sich aufdrängender Impressionen. (Georg Simmel, “Die Großstädte und das Geistesleben,” 1903)</p>
</blockquote>
<p>This is a quote from Georg Simmel’s famous essay “The Metropolis and Mental Life,” which he wrote over a century ago. Simmel, one of the grandfathers of modern Urban studies and the study of capitalist society, was apt in reflecting on the epochal changes happening at the dawn of the previous century. Partially inspired by the writings of Marx, he was acutely aware of the negative aspects brought about by the victory march of the steam engine. Many insights of sociology into how society works today stem in part from his studies on individualism, capitalism, and industrialization.</p>
<p>I would like to briefly focus on this particular statement. What he writes is that man is a “being of differences,” that is, a creature that is agitated by differences. In the essay, he argues that it was precisely the sensory overload of the modern metropolis that brought about individualism and a certain narcissism in society. The human mind, he argues, was still used to the slow, rural lifestyle that dominated society until the middle of the 18th century.</p>
<p>This is not, however, how I instinctively interpreted this section. I read it more optimistically. When I first read the essay more than a decade ago, I was convinced that this was something good; that being exposed to constant stimulation was something humans would get inspired by. At least, this is how I felt — progress, learning something new, growing from experiences. Today, I am pretty sure that Simmel wasn’t that optimistic, but I think that something can be learned from my misunderstanding. Something about the condition of modern democracy.</p>
<h2>The Rise and Fall of Liberal Democracy</h2>
<p>Liberal democracy as we know it is a very recent phenomenon. After the fall of absolutism in Europe — which was only about five generations<sup id="fnref:1"><a class="footnote-ref" href="#fn:1" role="doc-noteref">1</a></sup> ago, mind you — the first countries started experiencing Republicanism as the first, cautious start into an era of representation. It took another one and a half centuries until women were recognized as thinking human beings, too, with universal suffrage. It then took two world wars and immense destruction for society to realize that we need even more safeguards against authoritarian and dictatorial tendencies. This has then led to modern representative democracy as we know it. And indeed, we might approximate the stability of a democratic system by its constitution’s age.<sup id="fnref:2"><a class="footnote-ref" href="#fn:2" role="doc-noteref">2</a></sup> If we take this measurement seriously for a moment, the French are <em>the</em> harbingers of democracy, given they’re now at attempt number five in succeeding with this democracy-thing.<sup id="fnref:3"><a class="footnote-ref" href="#fn:3" role="doc-noteref">3</a></sup></p>
<p>What I’m saying is that the kind of democracy which my age group grew up with is a pretty modern phenomenon. Even our parents — the boomer generation — got to experience a less friendly age, with the Cold War commencing fully when they were born, and ending when we were born. My mother used to make the joke that she was born the day the Berlin Wall was raised, and I was born the day it was torn down. And it’s true: My grandfather was born just weeks before World War II started, my grandmother just weeks after the United States entered it, my mother when the Berlin Wall was built, and I … well, I was born the day all that was over. “History” was over.<sup id="fnref:4"><a class="footnote-ref" href="#fn:4" role="doc-noteref">4</a></sup> The “short 20th century”<sup id="fnref:5"><a class="footnote-ref" href="#fn:5" role="doc-noteref">5</a></sup> was over.</p>
<p>And now? Well, some of my friends are having children now. And if we are not all collectively mistaken, it may once be known as the time liberal democracy died. What started as some fringe right-wing movements across the world has finally turned ugly. While Europe is somehow still managing its far-right threat, the U.S. appears to have engaged full throttle. Nobody really knows how the second Trump presidency will end. Will the U.S. Republic still stand in three years and 10 months?<sup id="fnref:6"><a class="footnote-ref" href="#fn:6" role="doc-noteref">6</a></sup> Or will it have devolved into a techno-fascist dystopia akin to what Margaret Atwood described forty years ago?<sup id="fnref:7"><a class="footnote-ref" href="#fn:7" role="doc-noteref">7</a></sup></p>
<p>It definitely doesn’t look good. The January 6 rioters have been pardoned as one of the first executive actions Trump signed; Elon Musk is busy dismantling the federal administration; and every week we hear of someone new who legally tried to enter the United States, just to get arrested by Customs and Border Protection. The YouTube channel Legal Eagle is uploading a new video almost every day now with something the Trump administration did — or didn’t — do that is legally at the very least doubtful, but often straight illegal. The Trump admin is very successful in exhausting public discourse, with stunts such as their constant oscillation between “Musk is in charge” (in which case it would be illegal, so they only say it when they speak to their voter base) and “Musk is not in charge” (which would not be illegal, but is not correct, because he is, in fact, in charge, but they say this to courts whenever they ask to hope that they can get through with it).</p>
<h2>What Is Happening in the U.S. Is Happening Elsewhere, Too</h2>
<p>What we can observe in the U.S. is an especially crass delegitimization of the very foundations of liberal democracy. Neither Congress nor the Judiciary have the power to stop whatever insanity the Trump administration is doing every day. The executive is going rampant, and the zookeeper has fled the confines of the tiger enclosure. More and more often do we hear that the administration straight out ignores court orders, and no longer cares about upholding even the appearance of common sense. It’s just about creating facts, and they do create facts very efficiently.</p>
<p>But the foundations of liberal democracy are more fragile than that. It doesn’t take some Donald Trump to erode a state. No, just a few weeks ago, wedged in between his inauguration and the German federal elections, something equally frightening happened in Germany. On Wednesday, the 29th of January 2025, for the first time since its inception, the far-right <em>Alternative für Deutschland</em> (AfD) was able to provide the necessary majority for an equally far-right resolution by the German conservatives (CDU) under the leadership of Friedrich Merz.</p>
<p>Merz and his party added two items to the German parliamentary calendar just weeks before the federal election: A resolution that was asking the Bundestag to recognize the “dangers” of “illegal migration,” and a bill that would’ve become law, which contained roughly the same. The resolution was under vote on Wednesday, the 29th, the law on Friday, January 31st. The (not legally binding) resolution did get the necessary majority with the help of the far-right, while the (legally binding) bill was rejected with a dangerously small margin. It all came down to a few representatives from the Conservatives falling over last-minute. It was a thriller — and painful to watch. For four hours we had to wait for the actual vote to happen.<sup id="fnref:8"><a class="footnote-ref" href="#fn:8" role="doc-noteref">8</a></sup></p>
<p>Now, why do I call this an “erosion of the foundations of liberal democracy”? After all, unlike some of Trump’s executive orders, this all followed correct procedure, right? Well, technically yes. But here’s the thing: Democracy is not just about technicalities. Democracy is also about values. One of these values is, for example, not to pass a change of the constitution after a new parliament has been voted for, but the old one still being in office. While the old parliament is still regularly in power, it was frowned upon just a few years ago to do anything too drastic without necessity. Do we have to reform the federal debt ceiling? Absolutely. But is it necessary to do it using a “lame duck” parliament? Debatable. Is tomorrow’s vote<sup id="fnref:9"><a class="footnote-ref" href="#fn:9" role="doc-noteref">9</a></sup> technically legal? Absolutely. As I said (and the German constitutional court verified some years back), the old parliament is still regularly in power. Is it clean and tidy? Not at all. Something very similar applies to what Merz did. While technically clean and orderly, it eroded one fundamental value of liberal democracy.</p>
<p>And that value is: Don’t reach for power as an end in itself.</p>
<h2>Welcome To An Age Of Bullies</h2>
<p>The reasoning behind the strategic move by Friedrich Merz was clever. I would almost be inclined to call it extraordinarily strategically smart. Because he successfully scared the entire parliament. There’s nothing the democratic parties are more afraid of than passing legislation with the votes of the far-right. We did learn this much from our own history. By placing a non-binding resolution on the agenda first, and then a legally binding law, winning the resolution with the votes of the AfD and then losing the bill vote by a slim margin, he sent an unmistakable message to the rest of the Bundestag: “I am not afraid of using the AfD if it suits my goals. So if you don’t want the AfD facilitating a majority, do as I say.” It was essentially a thinly veiled threat against democracy itself. Just believable enough, but not too dangerous to Merz himself. He made sure to be able to contain the ghosts he called — barely.</p>
<p>Merz is a politician of power. And so is Trump. And so is, to a degree, Emmanuel Macron in France.<sup id="fnref:10"><a class="footnote-ref" href="#fn:10" role="doc-noteref">10</a></sup> The reason we are somewhat okay with Macron using the technical powers of his office to the fullest extent is that he’s a democrat. But liberal democracy cannot survive on the good will of individuals alone. This is what I mean by “age of bullies”: We are entering an era in which politicians do what they want, by strategically using the technicalities of the rule of law. And more and more, the rule of law is being replaced with charismatic rulers.</p>
<p>The only reason Europe hasn’t fallen as deeply as the U.S. into this pit is because many of our current charismatic rulers are still democrats at heart. Macron is ruling with an iron fist, but at least he’s using his presidential powers for strengthening a democratic Europe. But it doesn’t have to stay like this. And this is what is deeply problematic about this style of government.</p>
<h2>Democracy is not Self-Evident</h2>
<p>If we think about it, this is all reminiscent of the old Greek cycle of government.<sup id="fnref:11"><a class="footnote-ref" href="#fn:11" role="doc-noteref">11</a></sup> Europe went from feudalism to absolutism, then to republicanism, democracy, and now we’re approaching authoritarianism. I often think about the way Max Weber’s forms of government<sup id="fnref:12"><a class="footnote-ref" href="#fn:12" role="doc-noteref">12</a></sup> fit into this cyclical structure. We start with the rule of law, which slowly turns into charismatic government, which turns into traditionalist government, and then… well, then what? We don’t know.</p>
<p>What we are living through is obviously not a cycle. History rarely repeats itself.<sup id="fnref:13"><a class="footnote-ref" href="#fn:13" role="doc-noteref">13</a></sup> Instead, it is a constant progression through forms of government. It looks very much like what Karl Marx once wrote: History is a march of constant innovation and governmental progression. However, unlike Marx, history doesn’t seem as optimistic that all would culminate in communism. Instead, history looks more like the <em>angelus novus</em> from Walter Benjamin; the “angel of history” which is doomed to unstoppably move towards the future while looking at humanity’s bloody and gruesome past. “It wishes to rest, awaken the dead and repair the destroyed. But a storm is pushing from Paradise, into its wings, and so strong that the angel cannot close them anymore. This storm is pushing him inevitably into the future, to which he turns his back, all the while the mountain of debris in front of him grows unstoppably. That, which we call progress, is this storm.”<sup id="fnref:14"><a class="footnote-ref" href="#fn:14" role="doc-noteref">14</a></sup></p>
<p>When I first read Georg Simmel’s essay, I was convinced that we humans <em>needed</em> difference in our lives, to stimulate our brains. And to a certain degree, I still believe that this is true: It is the constant questioning, the constant asking “Why is that?” which drives scientific, but also societal progression. It is this axiom that motivates me every day to continue my research, even if research on U.S. Congress seems futile nowadays. But just as we crave difference to stimulate our minds, a <em>lack</em> of difference can be very dangerous. And I believe that the “end of history” has marked the loss of difference for democratic governments. When the Iron Curtain fell, so did the <em>raison d’être</em> for democracy. All the institutions that the democratic world had built — NATO, INF,<sup id="fnref:15"><a class="footnote-ref" href="#fn:15" role="doc-noteref">15</a></sup> and even democracy itself — weren’t self-evident anymore.</p>
<p>What was self-evident for the founding fathers of the U.S. was only self-evident <em>in the difference, or comparison, to the colonial power of Great Britain</em>. Protecting the “free West” was only self-evident <em>in difference to the Warsaw Pact</em>. And what was self-evident as a beacon of freedom remained so only until its lived counter-example fell. And when democracy is not self-evident anymore, people with a lust for power simply create new difference. Whether it be out of boredom or because others won’t stop them anymore is just a technicality at this point, only serving to illustrate the differences between Macron and Trump. This is the lesson of thinkers such as Karl Popper: In the grand scheme of things, democracy is not self-evident. It cannot guarantee its own fundamental preconditions.<sup id="fnref:16"><a class="footnote-ref" href="#fn:16" role="doc-noteref">16</a></sup> And this is why democracy is always one step from decay. To maintain a democracy, we need difference.</p>
<p>If the Trump administration gets its way, then this might create enough of a counter-example that Europe finds its way back to democracy in a sick twist of history. And Europe does appear to awaken again. Several countries have already started moving towards defensive self-sufficiency. But it might not be enough. Because the only self-evident fact in human history seems to be: “Between equal rights, force decides.”<sup id="fnref:17"><a class="footnote-ref" href="#fn:17" role="doc-noteref">17</a></sup></p>
<hr />
<div class="footnotes" role="doc-endnotes"><hr /><ol><li class="footnote" id="fn:1" role="doc-endnote"><p>Using a 60-year-lifespan definition of one generation. Imagine that: Louis XVI was dethroned just 232 years ago.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:1" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:2" role="doc-endnote"><p>Don’t come at me saying that this is oversimplifying — I know it is.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:2" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:3" role="doc-endnote"><p>This, of course, is a reference to the fact that they have overhauled their entire constitution five times since the end of the Napoleonic era. The most recent was caused by Charles de Gaulle after World War II.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:3" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:4" role="doc-endnote"><p>Francis Fukuyama, 1989, “The End of History?”&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:4" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:5" role="doc-endnote"><p>Eric Hobsbawm, 1994, “The Age of Extremes.”&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:5" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:6" role="doc-endnote"><p>Yup, it’s been less than 60 days.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:6" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:7" role="doc-endnote"><p>Margaret Atwood, 1985, “The Handmaid’s Tale.”&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:7" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:8" role="doc-endnote"><p>I watched the livestream from parliament, which included a more than four-hour long break as all parties called for emergency meetings to discuss the options that were on the table. I really realized this day that watching parliamentary debates is excruciating.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:8" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:9" role="doc-endnote"><p>I’m referring to a planned change to the constitution to reform the debt ceiling (“Schwarze Null”) that is set on the parliament’s agenda for Tuesday, March 18, 2025. This is just a week before the new parliament will be constituted. While technically completely legal, it is frowned upon because it is this weird period where we already had elections, but the old parliament is still the legislative body. I’m all in for the proposed change to the constitution, because that is necessary, but I do understand critics who aren’t happy with the timing. Especially since this reform would have been much simpler for the conservatives with the new parliament, as they have a bigger majority there. But due to how they did it now, the greens of all parties have had a major impact on it. I’m happy with it, but I honestly do not understand the reasoning of the CDU here. I will explain why shortly in the text.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:9" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:10" role="doc-endnote"><p>Do not misunderstand me here. I am not saying that Macron and Trump are anything alike. Trump is a petty autocrat, while Macron still holds dear democratic values. I even believe that Merz is still a democrat, even though he gambled with democracy. Rather, what I am trying to say is that the styles of government they all share are very different instances of a common trend.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:10" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:11" role="doc-endnote"><p>Many Greek thinkers, from Thucydides through Aristotle to Plato, have had their own theories that forms of government alternate, called κύκλος, or cycle. They hypothesized that there is some set of government forms, and that the city states would cycle through them, from good to bad: from aristocracy, through timocracy, then democracy, ochlocracy, and finally tyranny, only to start anew.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:11" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:12" role="doc-endnote"><p>Weber argued for three types of government that would morph into one another. It starts with charismatic leadership, where a charismatic individual reigns absolutely; then, after their death, the rule becomes traditional, that is, “it has always been like this,” before being replaced by the rule of law (“Wertrationalität”).&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:12" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:13" role="doc-endnote"><p>If it does, it does so as a tragedy first, and as a farce second.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:13" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:14" role="doc-endnote"><p>Translated by me from a great piece by Deutschlandfunk on Walter Benjamin’s relationship to the painting “Angelus Novus”, <a href="https://www.deutschlandfunk.de/walter-benjamins-engel-der-geschichte-ein-sturm-weht-vom-100.html">https://www.deutschlandfunk.de/walter-benjamins-engel-der-geschichte-ein-sturm-weht-vom-100.html</a>.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:14" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:15" role="doc-endnote"><p>I believe the differences and similarities between the reasoning for the existence of NATO and the INF are instructive here. The North Atlantic Treaty Organization was created by the U.S. to secure the West against the Soviet Union. With the dissolution of the USSR, it lost its reason to exist, and many people across all member nations hold the belief that it should be disbanded. But as both Russia’s invasion of Ukraine and the Trump administration’s most recent threats of leaving NATO illustrate, NATO has turned from a mere defensive pact into a beacon for the values it was set up to defend. The INF, or Intermediate-Range Nuclear Forces treaty, on the other hand, is not part of public debate, because too few people have ever heard of it. But it, too, was a beacon of peace and meant to safeguard the liberal West from destruction. However, as it was deemed “unnecessary” after decades of nuclear disarmament, it was broken. And you know by whom? Donald Trump. In his first presidency. To a certain degree, the INF was the canary that only few talked about, and that could’ve given us a hint as to what is about to happen to various other important pillars of the “free West” in the coming years.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:15" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:16" role="doc-endnote"><p>See also the <a href="https://en.wikipedia.org/wiki/B%C3%B6ckenf%C3%B6rde_dilemma">Böckenförde dilemma</a>.&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:16" role="doc-backlink">↩</a></p></li>
<li class="footnote" id="fn:17" role="doc-endnote"><p>Karl Marx, “Capital,” volume I. Somewhere towards the end, if I recall correctly. I unfortunately have the book only in its analog version, and not with me to look it up. I believe this is a statement where we clearly see that Marx has read Thomas Hobbes’ “Leviathan.”&nbsp;<a class="footnote-backref" rev="footnote" href="#fnref:17" role="doc-backlink">↩</a></p></li></ol></div>]]>
  </content>
</entry>
</feed>