Documentation Considered Harmful

Written by Troy Howard

01 April 2019

Key Words and Phrases: documentation, knowledge sharing, engineering culture, technical writing, user experience (UX), product culture

Background

For those of you familiar with my community and professional work—as a co-founder and organizer of the Write The Docs conference (2013-2015), as an outspoken advocate for documentation, and as the Technical Lead (TL) of Twitter's internal TechDocs team (2014-current)—this publication may come as a surprise. It is precisely these professional experiences which have led me to the understanding that I have today: that documentation may in fact be more harmful than it is helpful.

For a number of years I have been familiar with the observation that the quality of software product teams is a decreasing function of the density of documentation that accompanies the products they produce. More recently I discovered why the use of documentation has such disastrous effects, and I became convinced that documentation should be abolished from all "higher level" software development efforts (i.e. everything except, perhaps, plain machine code). At that time I did not attach too much importance to this discovery; I now submit my considerations to the public, because in very recent discussions in which the subject turned up, I have been urged to do so.

What Problem Does Documentation Solve?

My first remark is that, although the programmer's activity ends when they have constructed a correct program, the customer experience taking place under control of that program is the true subject matter of a programmer's activity, for it is this customer experience that has to accomplish the desired effect; it is this process that, in its dynamic behavior, has to satisfy the desired specifications and deliver value to the customer of the product. Yet, once the program has been made, the "making" of the corresponding process is delegated to the machine.

Often these products are used in isolation from the programmer. In fact, the customer is interacting with the machine, not with the programmer who created the product. The programmer attempts to capture all potentially relevant knowledge about the product in the form of documentation, which is then provided to the customer alongside the final product. The customer is then faced with the daunting task of interacting with an inscrutable machine process, which does not (and cannot) explain itself, while digging through a veritable mountain of information provided by the programmer, attempting to locate the one small bit of information that will help them accomplish their goals and ultimately derive value from the product.

The Problem Of Docs

This model is at best anachronistic and at worst achieves the extreme opposite of its intentions. A customer may spend a good deal of time attempting to read documentation, failing to find answers, and falling back to interacting with the process via trial and error. They may spend vast amounts of time interacting with the product without ever succeeding in deriving value from it, which results in a net loss of value to the customer.

This model is anachronistic in that it is based on an old and now outmoded method of communicating knowledge: two-dimensional written language. Hearkening back to the earliest days of human history, regardless of medium—the clay tablets, papyrus sheets, and tortoise shells of 3500 BCE, the printing press of 1440 CE, or the hypertext web of 1989 CE—all such information delivery systems suffer from a number of major flaws.

The first and most notable flaw is in the maintenance and updating of this knowledge. While we have come a long way since the era of clay tablets and hand-copied illuminated texts, even modern systems for publishing documentation rely on a point-in-time capture of knowledge in the written word, which is then published and read by the customers of the product. At some point, the knowledge may change, requiring our diligent programmers and technical writers to manually locate all outdated knowledge, remove it, and replace it with current information, then publish once more. This snapshotting process means that the information will always be, at minimum, slightly out of date, and sometimes, grossly inaccurate.

It is also very easy for even the most well-meaning of documentation authors to "miss a spot," leaving outdated information in the final published product, especially in a large corpus. Further, it is theoretically impossible to include every piece of information: a constant process of prioritization, filtering, and compromise means that only a subset of relevant knowledge is published, omitting details that may matter to certain customers in certain situations and leaving those customers without a resource to discover that knowledge.

The issue of discoverability within what is published is another concern. Modern technologies like search engines, especially those backed by artificial intelligence, attempt to help customers locate knowledge by letting them formulate questions in natural language. This technology is impressive, but ultimately it is just a very sophisticated version of the printed index located in the back of most textbooks and other non-fiction technical publications. These systems improve the speed with which a customer may dig through a corpus of knowledge, but do little to change the fundamental paradigm of the knowledge acquisition task. The customer is still digging through a haystack, looking for a needle that may or may not be there—only now with a fancy metal detector.

Rarely do we stop to question this somewhat insane activity. Our current system is inefficient for both authors and readers, and has large functional gaps against its intended effect. Further, heavy documentation is a strong indicator of two very bright red flags: poorly written code and suboptimal user experience (UX).

Excessive documentation is a side effect of poorly written code. Modern programming languages are just that: languages. They are a form of communication, and when done properly, code represents a form of communication not between a human and a machine, but between two humans, who are working together with a machine to accomplish a task. A machine speaks in 0s and 1s, numbers, and a constantly flowing stream of electrons. Computers are unable to comprehend words. Words—in the form of a programming "language"—must be compiled (i.e., translated) into machine language for the computer to be part of the conversation.

Our programming languages are designed to reduce ambiguity, to allow our natural forms of human expression (words, phrases, and sentences, which capture semantic concepts) to be made concrete as a set of numeric instructions that our machine may process, unconscious of both their intended effects and their actual outcomes. Programming languages are optimized to be readable and expressive for humans, so that we may collaboratively work within the mental model of our natural human languages, while still being able to produce this machine output as a side effect of our conversations.

Since programming languages are designed for humans, there is no reason code should not be easily readable, easily comprehended, and sufficient for anyone who would like to know how a system works. The process of writing documentation is a meta-information process in which we explain what we've already explained, without the benefit of the concision and clarity that the formal language of code provides. In essence we are undergoing multiple stages of translation: from our initial mental model, into a programming language, into machine code, and then into English (or some other natural language).

At every stage of translation, some information and some precision are lost. At the level of translating from our mental model to code, it is the compromises we make to our ideal product, constrained by the limitations of the machine. At the level of programming languages to machine code, it is the bugs we introduce and other unintended behaviors (or lack of behaviors). And at the documentation level, it is the verbose, informal, and incomplete explication of the aforementioned processes.

This complexity is hurting us, not helping us. Code should be simple and readable. Programming languages should be made more sophisticated so that they are closer in nature to our natural human languages, or to other forms of abstract writing like mathematical notation. If the code were clear, concise, and written in a more natural language model, there would be little to no need for additional translation into documentation.

Excessive documentation is also a side effect of poorly designed user experiences (UX). Machine systems should be obvious in their behavior, with clear instrumentation, transparent mechanisms, and intuitive interactions. Products that require explanation to be used are poor products.

Consider something as simple, yet useful, as a hammer. A hammer is a tool that humans have been using in some form or another since time immemorial. A heavy mass is lifted, moved with speed, translating its stored potential energy into kinetic energy, and the resulting kinetic energy is brought to bear upon a focal point, transferring that energy into the subject. Whether it's a rock raised high above one's head and brought down onto a coconut, or a modern metal-headed hammer swung precisely to hit a nail, the hammer augments our abilities as humans, providing power and firmness where we are naturally weak and soft.

The hammer is well adapted to its environment—powered by gravity and the innate forces of our physical world of mass, potential, and kinetic energies, and controlled by a simple hand-grasp motion and arm swing that even a small child is capable of—a hammer needs no instruction manual. The user experience has been refined through thousands of years of trial and error, and the product incrementally improved from lifted heavy rocks, to ideally proportioned swingables made by melting iron and curing hardwoods, augmented now by modern materials like nylon and other advanced polymers. Could our software products be more like hammers? Why aren't they? I propose that the more documentation a product requires to explain it, the worse that product's design is.

Ultimately, the need for documentation can be eliminated by improving both our programming languages (and our use of them), and by improving the design of the products we build.

To complete this analysis, it is pertinent to call out one final example of the many unnecessary layers of abstraction that are part of most modern software documentation systems. We author our documentation in Markdown, which is translated to HTML, which is read and translated to thoughts by the end users. Translation after translation, with transmission loss at each stage.

Couldn't we remove some of this inefficiency by authoring directly in HTML (another language originally meant to be human-readable, now deemed too unreadable)? Or even better, could we not update our web browsers to read and display Markdown directly, without the need for translation, JavaScript libraries, or other additional complexities?
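
To make the layering concrete, here is a minimal sketch in Python (using the third-party markdown package; the example text is invented) showing the translation stage that sits between the author and the reader:

    # A minimal sketch of the translation layer described above.
    # Assumes the third-party "markdown" package (pip install markdown).
    import markdown

    source = "# Shoulder Taps\n\nAsk your *neighbor*, not a search box."

    # Stage 1: human-authored Markdown is translated into machine-oriented HTML.
    html = markdown.markdown(source)
    print(html)
    # <h1>Shoulder Taps</h1>
    # <p>Ask your <em>neighbor</em>, not a search box.</p>

    # Stage 2: a browser translates the HTML into pixels, and only then
    # does a human translate the pixels back into thoughts.

Each stage is lossy in its own way: the Markdown cannot say everything the author meant, and the HTML says it in a form no human was meant to read.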

There are so many areas for improvement in this process, yet we continually build more layers on top, rather than improving the existing structures. This tendency is a strong anti-pattern and such efforts should be discouraged.

A Better Way: Distributed Direct Communication

Rather than simply leveling a critique against our current modality, I would like to propose an alternative that I think has strong advantages: Distributed Direct Communication.

Direct Communication (DC) involves the transference of knowledge from one human to another without intermediaries, using natural language. A good example of this, common in today's workplace, is the practice of "shoulder taps," wherein a coworker who needs information simply taps the shoulder of a knowledgeable peer and asks their question directly. The worker receives their reply, with the exact knowledge they requested, at a minimal investment for both the information seeker and the information provider. The worker is unblocked immediately and can return to using the tool, and deriving value from it.

This method is also nothing new. Orally transmitted knowledge is our oldest system of information retention and knowledge sharing. It has its roots in the storytelling traditions of primitive man, and continues on in many forms today, via cultural activities such as musical lyrics, TED Talks, and instructor-led classroom teaching. Many product organizations understand this value and have customer support teams dedicated to fielding questions posed via a dizzying array of communication media: Twitter, Slack, email, Facebook comments, phone calls, and in-person customer-service desks. These all share one common attribute: the use of natural language to understand the need of the customer and provide a very specific, ad-hoc response that directly serves that need.

These systems, however, are still quite basic. Customer support is nascent. Little effort is made, beyond the simple transactional process of question and answer, to ensure that the customer has not only the knowledge they seek but also the additional knowledge they might need in the future. Imagine a database that served only one field of information per transaction, versus returning an entire row or set of rows. This would be a very inefficient system, and would put all the burden on the querying client to associate multiple pieces of information together, or to group and aggregate. Insights that can only be derived through higher-order analysis are left to the intuition and synthesis powers of the learner, and there is no way to reliably ensure that these insights are ever produced, or are accurate.
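
To illustrate the analogy, here is a hypothetical sketch in Python using the standard library's sqlite3 module (the table, fields, and contents are invented):

    # A hypothetical sketch of the database analogy above; the table,
    # fields, and contents are invented for illustration.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE knowledge (topic, answer, caveats, see_also)")
    db.execute("INSERT INTO knowledge VALUES "
               "('deploys', 'run the release job', 'never on Fridays', 'rollbacks')")

    # Transactional support: one field per question; all context discarded.
    print(db.execute("SELECT answer FROM knowledge WHERE topic='deploys'").fetchone())

    # A fuller transfer: the whole row, caveats and related topics included.
    print(db.execute("SELECT * FROM knowledge WHERE topic='deploys'").fetchone())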

Lacking a complete graph of knowledge, the customer is also unable to assist other customers with any knowledge they haven't personally acquired. Consider the model of Distributed Version Control Systems (DVCS) like git or Mercurial, wherein each copy of the repository contains a full history of all changes—or, said another way, a full graph of all related knowledge. There is no functional difference between a clone and the original source; a clone may serve as a source of truth as easily as the master repository does. This improves the overall system's resiliency, and can also fundamentally change the network's communication patterns, allowing for a highly distributed graph with nodes that are both server and client, where any node may "shoulder-tap" its nearest neighbor to get a full understanding of the system (or any individual part of it).

Similarly, other Peer-To-Peer (P2P) protocols and systems share knowledge through a web of interconnected nodes, with no specific node being any more of an authority than the others. A gossip protocol can be employed, in which each node publishes every new piece of information to its neighbors, and those neighbors publish to their neighbors, until every node in the system has the same knowledge.
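
As a sketch of how such a protocol converges (Python, with an invented four-node topology):

    # A toy push-gossip sketch: each round, every node that holds the new
    # fact forwards it to all of its neighbors, until the graph converges.
    # The topology is invented for illustration.
    neighbors = {
        "alice": ["bob", "carol"],
        "bob":   ["alice", "dave"],
        "carol": ["alice", "dave"],
        "dave":  ["bob", "carol"],
    }

    informed = {"alice"}    # alice learns something new
    rounds = 0
    while len(informed) < len(neighbors):
        # every informed node "shoulder-taps" each of its neighbors
        informed |= {peer for node in informed for peer in neighbors[node]}
        rounds += 1

    print(f"every node informed after {rounds} rounds")    # 2 rounds here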

Could we not do the same with our human knowledge, using our natural language as our transmission mechanism?

Consider an office of ten (10) programmers. One individual authors a new program and, in their head, holds the entire set of knowledge about it: a complete picture even beyond the well-written code, including the original vision and what compromises were made during implementation. This individual can then explain it to the programmers sitting to either side of them. Once they have learned the information, the original programmer may return to work. Now three (3) people in the office have a complete understanding of the code. The other two may then tell two (2) more each, and so on, until every programmer in the office has a full understanding of the product. Each spends only the time required to explain it once to a small set of listeners before immediately returning to work.
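
The arithmetic of this fan-out can be checked with a toy model (assuming each newly informed programmer briefs two colleagues per round):

    # Toy model of the office fan-out above: each round, every newly
    # informed programmer briefs two colleagues, then returns to work.
    office_size = 10
    informed = 1            # the original author
    newly_informed = 1
    rounds = 0
    while informed < office_size:
        newly_informed = min(newly_informed * 2, office_size - informed)
        informed += newly_informed
        rounds += 1
        print(f"round {rounds}: {informed} of {office_size} understand the product")
    # round 1: 3 of 10 ... round 3: 10 of 10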

At this point, documentation is entirely unnecessary. The information is fully replicated to all employees, with minimal time investment, allowing the group to lose or gain members without losing information. New members can be filled in verbally, with any existing member of the staff acting as mentor, requiring minimal on-boarding investment and allowing the new member to come up to speed quickly. As changes to this information occur, each change can be expressed once by the originating programmer and distributed by a gossip-protocol-like transfer between members. These small micro-updates are concise and focused, and carry only negligible cost. Any time knowledge is missing or forgotten, a simple shoulder-tap to any peer can refresh it.

Compare how this process works with current documentation practices. The original programmer creates the product, and then must spend significant amounts of time re-expressing the product in an alternate written form, likely leaving out some details. The written documentation is published somewhere, passively, possibly with an alert to the other staff that there is new content available to read. Other staff continue working, deferring the information acquisition process until the knowledge is needed, at which point they block on their work. They then spend a large amount of time attempting to find the answer to their specific problem, searching through a large knowledge set and losing the context of their original problem. If they don't find it, they must resort to verbal knowledge transference anyway.

However, this "shoulder tap" is not efficient, because it is not a first-class feature of the knowledge sharing process. The original programmer is the only person who reliably has all the knowledge, so they become the recipient of all "shoulder taps" instead of distributing that load across the rest of the engineering staff. They may end up with a large portion of their day consumed by such interruptions. To mitigate that, they then further interrupt their primary work to update the documentation to include the missing, inaccurate, or poorly expressed information.

The time spent writing and reading documentation, and the time high-value individuals are interrupted because they are the single source of truth, could have been spent improving the readability of the code or the design of the user experience, obviating the need for additional knowledge transmission. Unfortunately, they are never able to make such improvements because of the constant documentation updates they must undertake. As the product continues to develop, this problem gets worse and worse, ultimately either halting all forward progress or leaving customers of the product without vital information, and thus unable to derive value.

Does It Scale?

An initial reaction to the above proposal may be that this system could never scale to the level that popular software products with millions of customers would require. Documentation may seem to be the only reasonable way to transmit knowledge to the masses (or perhaps some other one-to-many knowledge capture tool, like in-person classes or video recordings). While scaling up will always be a challenge, I propose that a Distributed Direct Communication (DDC) system has better scaling performance than existing documentation-focused methodologies.

With a DDC-based system, every engineer holds a complete replica of the information. This can be transferred to a dedicated staff of customer support engineers, who function like a cache in modern computing infrastructure. This dedicated knowledge cache can serve orders of magnitude more customer inquiries. Frequent requests from a specific individual can be identified, and that person can be scheduled for a full knowledge transfer via an in-person class.
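
As a sketch of the cache analogy (Python; the names and questions are invented), a support tier behaves like a memoizing lookup that only escalates to an engineer on a miss:

    # A toy sketch of the support-tier-as-cache analogy; names invented.
    answer_cache = {}    # the support team's replicated knowledge

    def ask_engineer(question):
        # the expensive path: interrupt the original programmer
        return f"(engineer explains: {question})"

    def ask_support(question):
        if question not in answer_cache:    # cache miss: escalate once
            answer_cache[question] = ask_engineer(question)
        return answer_cache[question]       # cache hit: engineer undisturbed

    ask_support("how do deploys work?")    # miss: one interruption
    ask_support("how do deploys work?")    # hit: served from the cache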

Once knowledgeable, that customer can serve as a community support resource for other customers (similar to the Microsoft MVP program). Those customers can form affinity groups and share knowledge with one another, allowing the information to propagate outside of the paid product team, reducing inbound requests over time. A nice feature of this system is that it becomes more efficient over time.

Conclusion

Documentation as it stands is just too primitive; it is too much an invitation to make a mess of one's product and one's code. One can regard the direct communication practices described above as bridling its use. I do not claim that the practices mentioned are exhaustive in the sense that they will satisfy all needs, but whatever practices are suggested (e.g., a verbal gossip protocol), they should satisfy the requirement that the customer experience is always a net gain, and that technical information is maintained in a helpful and manageable way.

Humans have been using the system of tribal verbal knowledge for eons… why does it need to change? It doesn't; rather, it needs to evolve to take advantage of our modern communication systems and of advances in our understanding of distributed systems.

Finally, there is one critical optimization of verbal communication that has been a feature of tribal knowledge sharing since its inception: the use of rhythmic speech and singing to facilitate understanding and retention. Whether it's the chanted mantras of Buddhist dharma transmission, or the singing and dancing that is part of the storytelling customs of ancient peoples in every culture, these techniques have been proven to provide a massive improvement to the process of verbal knowledge transfer. For this reason, it is recommended that any real-life implementation of DDC consider using rhyming couplets as a basic underlying data structure.

