Telemetry, Data Privacy, and Zettlr | Hendrik Erz

# Telemetry, Data Privacy, and Zettlr

A few days ago, a scandal erupted in the Open Source community: After being acquired by the Muse Group, the free audio editor Audacity was supposed to receive an update that would include telemetry in the app. And users didn't like that. So today I want to talk about telemetry, what it is, and how the Audacity debacle prompted me to finally purge any form of data transmission from Zettlr myself.

You might or might not have heard of this, but recently a discussion ensued about the Open Source audio editing program Audacity, which was recently acquired by the Muse Group. As a hobby musician and audio “engineer”¹ myself, I know and have used both Audacity and MuseScore. Audacity is basically an audio editing program which allows you to do some basic editing of speech, some recordings, and the like. Nothing very special, but reliable. And since the pro-audio market is basically dominated by utterly expensive software (AVID Pro Tools, Apple Logic Pro, and Steinberg Cubase, all of which I’ve used in the past), having one small open source editor for simple editing steps comes in pretty handy for all those of us who, understandably, don’t want to buy the “full” editors themselves.

However, that peculiar acquisition has sparked major outrage across the user base. The reason was twofold. I’m pretty sure that the very fact that Audacity has joined some incorporated organisation sparked some anger, although that is not being uttered openly. The other reason, however, was the much more vocal one: that Audacity was getting an update that suddenly included telemetry. And users were not happy about that. In fact, this announcement, although not that visible in itself, sparked so much anger across the net that even I, who haven’t used the app in years, have basically gotten all the important info just by scrolling through Reddit.

But what is telemetry, why did it cause such a mess, and why did I include “Zettlr”, my own app, in the title?

## Telemetry

Telemetry is something like the “Fall of Man” of consumer software. Its definition and use are summarised quickly: Telemetry, as it’s being used in the consumer software market, describes data that is collected about the usage of an app and automatically transmitted to some server. This data includes, e.g., which buttons in an app were clicked, where the mouse cursor has been while the app was active, and, more generally, how users actually use the app. The use case is simple: Make an app better.

However, the use case might still be somewhat vague if you’ve never developed an app. So let’s take Zettlr as an example. Once Zettlr became a little bit popular, my first thought was “Oh my God, am I going to become popular?” (Really, when your name appears for the first time on serious blogs and other central internet hubs such as Reddit or Hacker News, and it wasn’t you who put it there, that’s a big thing!) The second thought, immediately afterwards, however, was: “Damn, how do I do all of that? Like, creating some consumer software?!”

The thing is: Developing an app for yourself is one thing. But developing an app for other people to use is quite a different beast. I have a lot of repositories, most of which contain some code I’m basically only maintaining for myself, but Zettlr is the “big one” that actually eats away a lot of my free time. If you’re like me and not a douchebag, if some software apparently helps out a lot of other people, you’re going to take their needs into account more and more. However, as a matter of fact, you are not those people. You yourself only know how you use the software, but not how other people use it.

## What is Telemetry?

Let us take one very simple example: The toolbar. Zettlr has a toolbar just like many other apps out there. I put some frequently used functions in there so that you don’t have to open up the context menu, hit a keyboard shortcut or search for the corresponding menu item every time. However, if I were to develop Zettlr just for myself, the toolbar would look way different. For example, there wouldn’t be a button for the preferences since I normally just hit Cmd/Ctrl+, to open these. But I know that other people – especially those coming from Zotero – will expect that button.

That is just a very simple example, but what about all the other buttons? Are they necessary? Or can they go? Should I put other buttons there? Put simply: I don’t know.

That’s where telemetry comes in: Telemetry collects data about which buttons users click, which menu items they use, and this data can tell you quite a lot about how people work with your app. Many companies do that – Microsoft’s Visual Studio Code collects telemetry, for example, and most proprietary software as well. But Open Source doesn’t, most of the time.

The reason is that telemetry can also be viewed as a massive breach of users’ privacy. If some server receives data about what I do, I might not want that. After all, this telemetry data is the kind of “metadata” that tells you a lot about the person themselves – even without knowing exactly what the person has written/said/communicated. We all know that debate.

Back to Zettlr: I thought to myself: “Okay, how am I going to accommodate those people whom I don’t know but who use the app?” My immediate thought was: “Telemetry!” since that’s what I knew other people did. As sociologists would say: If you experience a novel situation, you tend to just copy other people until you know how to handle the situation yourself, and when I realised that Zettlr was going to be an important piece of software for many people across the globe, that was also the action frame I fell into.²

However, since I didn’t know how to cleverly collect some telemetry about the actual usage, and since I had never dealt with anything like it before, I came up with a very basic plan: Whenever the app checks for updates, I simply have it also transmit three things: the UUID (universally unique identifier) of the application, which is generated when the app is first run on a computer and which is practically unique; the current version; and the operating system. And, since I wanted to know where people actually are, I also ran a very coarse GeoIP software on the IP addresses from which the users ran the app (country level, nothing more specific).
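To make that plan a bit more tangible, here is a minimal Python sketch of what such an update-check payload might look like. It is not Zettlr's actual code: the function name `build_update_check_payload` and the field names are hypothetical, and the version string is only an example.

```python
import platform
import uuid


def build_update_check_payload(app_uuid: str, app_version: str) -> dict:
    """Assemble the three values described above (field names are hypothetical)."""
    return {
        "uuid": app_uuid,         # random ID, generated once per installation
        "version": app_version,   # version of the running app
        "os": platform.system(),  # e.g. "Windows", "Darwin", or "Linux"
    }


# Assumption: the ID is created on first run and stored afterwards.
# uuid4() is purely random, so the ID itself says nothing about the person.
installation_id = str(uuid.uuid4())
payload = build_update_check_payload(installation_id, "1.8.9")
```

Note that the IP address is not part of the payload at all. The server sees it anyway as soon as the request arrives, which is exactly where the trouble starts.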

You might already know where this is heading: Even though the UUID does not tell you anything about someone (since it’s being randomly generated), the IP addresses can. Upon sifting through the IPs, for example, I figured out several universities from which researchers used the app. I also figured out that one whole lab of ten people was using the application, since ten different UUIDs curiously shared the same IP address.
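How little it takes to draw such a conclusion can be shown in a few lines of Python. This is only an illustration with made-up log entries, not the analysis I actually ran: the UUID labels are placeholders, and the IP addresses come from ranges reserved for documentation.

```python
from collections import defaultdict

# Made-up log entries: (installation UUID, IP address).
log_entries = [
    ("uuid-a", "192.0.2.10"),
    ("uuid-b", "192.0.2.10"),
    ("uuid-c", "192.0.2.10"),
    ("uuid-d", "198.51.100.7"),
    ("uuid-a", "192.0.2.10"),  # the same installation checking again
]


def installations_per_ip(entries):
    """Map each IP address to the set of distinct UUIDs seen from it."""
    groups = defaultdict(set)
    for installation_id, ip_address in entries:
        groups[ip_address].add(installation_id)
    return dict(groups)


# Keep only the addresses that several installations have in common.
shared = {
    ip: ids
    for ip, ids in installations_per_ip(log_entries).items()
    if len(ids) > 1
}
```

Anonymous random IDs stop being anonymous the moment they sit next to an address that belongs to one lab or one university.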

The actual important information – which version people used and which operating system – basically just mirrored the publicly available statistics and common knowledge: Most users were using Windows, second was macOS, and finally some flavour of Linux. And, people are lazy when it comes to updating. But that’s it. However, it is still kind of a breach of trust of users if you collect this kind of information. Even though it’s just IP addresses!

However, the former me thought “Oh, that’s cool!” Since I work with data a lot, it was fascinating for me to see how an app actually disperses globally. However: It did not bring any benefit for me when it comes to making the app better. Because I’m still developing an Open Source app, and thus people could just alter the source code and implement those changes they wanted to have. And they did. And I merged their changes into the app. And everyone is happy. So after three months of looking at the data in awe, I figured I don’t need it.

Additionally, the users are obviously not stupid, and people saw what kind of information was being transmitted and they fairly quickly demanded an option to turn off the update checks. But, knowing how lazy people are, I didn’t want to turn that off. I didn’t want to have anyone who, completely oblivious to a new version, encountered one bug, opened an issue and thus prompted me to close the issue, referring to the new version which already fixed that bug. However, checking for updates and sending data are two completely different things. It took me quite a while to figure out that the underlying reason for these demands was the data transmission, not necessarily the update check itself.

By coupling data transmission to a vital function (update checks), I effectively prevented myself from easily getting out of the trap I had just built for myself.
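In hindsight, the two concerns could have been kept apart from the start. The following Python sketch shows one way to do that; it is an assumption about a possible design, not how Zettlr is implemented. The names `check_for_updates`, `fetch_latest_version`, `send_usage_data`, and `transmit` are hypothetical, and the version parsing assumes plain dotted numbers such as `1.8.9` (no beta suffixes).

```python
def parse_version(version: str) -> tuple:
    """Turn "1.8.9" into (1, 8, 9) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def update_available(current: str, latest: str) -> bool:
    """Return True if `latest` is a newer version than `current`."""
    return parse_version(latest) > parse_version(current)


def check_for_updates(current: str, fetch_latest_version) -> bool:
    """Ask for the latest version without sending anything about the user.

    `fetch_latest_version` is a hypothetical callable that takes no
    arguments, so there is no place to attach a UUID or similar data.
    """
    return update_available(current, fetch_latest_version())


def send_usage_data(payload: dict, user_opted_in: bool, transmit) -> bool:
    """Separate, optional step: transmits only if the user opted in."""
    if not user_opted_in:
        return False
    transmit(payload)
    return True
```

With a split like this, switching off the data transmission would not touch the update check at all, for example `check_for_updates("1.8.9", lambda: "1.8.10")` still reports a new version regardless of any telemetry setting.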

In short: I was naïve. Telemetry is extremely sensitive data, and just because other people use it doesn’t mean I have to. I never really removed that code even though I haven’t looked at the data at all in three years. Why? I honestly don’t know. Maybe there’s a tiny Mini-Me inside of my head that still thinks “But data is just beautiful!” and I would be lying if I denied that.

However, once the scandal surrounding Audacity became widely known, I thought again about those three lines of code that would submit telemetry to my server. Three lines of code that delivered me cool, but ultimately useless, data.

This week I finally removed those lines of code. No more information being transmitted once everyone switches to a newer version. And the data that’s already on my server? That went down the toilet yesterday. Upon writing this article, I finally realised that would be for the best.

It’s sad for me as someone who likes data and treats it responsibly. But it will be for the best. Because: Can you really trust me? “Of course, you can!” I will shout out in joy once asked, but then I’m just a stranger on the internet to you, kind enough to deliver you some good app to work with. But we’ve most likely never met. So is it good for me to have your data? Probably not.

## You Don’t Need Telemetry If You Have Good Communication Channels

After all, this adventure of mine was eye-opening. It was super interesting to see how an app develops. It was super interesting to see where my app was being used, and how. But in the end, all issues that someone had with the app can be resolved either on Reddit, on the Forum, or on GitHub.

That was also one possibly wise decision by me a long time ago: I don’t require you to have a GitHub account or a Reddit account or to be comfortable with forums in order to voice requests and support other people. I give you enough opportunities to reach out and tell me directly what bugs you, instead of having to rely on telemetry. Yes, sometimes this means I have more work. And this also means that you have to actually come forward and say what you’d like to see changed. But it means that you decide if and how any data of yours is ending up on the internet. None of your data is being shared with anyone without your active doing.

In the end, you don’t need telemetry if you just listen to people. And the Audacity debacle finally gave me the last bit of convincing I needed. From this week onward, I will forfeit any possession of telemetry or any other potentially personal data that you might or might not produce. From now on, the update check of Zettlr will do just that: Check for updates.

Oh, and before I forget: If you don’t want to use unstable releases of Zettlr: I killed every piece of code that would actually record the data that Zettlr is sending. So even if you’re still on 1.8.9 and thus Zettlr happily transmits that info, my server will flush that right down the toilet to where it belongs: the void.

1. Since I’m not a trained engineer, it would be illegal for me to actually call myself that. So as a disclaimer: No, I’m not an engineer, neither in the sense of German nor of U.S. law.

2. If you’re interested in the science behind this, I recommend that you first read a little bit about Hartmut Esser’s concept of “frames”, and later on delve into the sociology of culture (especially Paul DiMaggio) and institutionalism (Walter Powell, Brian Rowan, John Meyer, and, curiously enough, also Paul DiMaggio). It is super interesting to see how often we solve unexpected situations simply by copying – even if we can’t be sure whether or not that works.