Risk-First Analysis Framework
If we all had identical knowledge, there would be no need to do any communicating at all, and therefore no Communication Risk.
But, people are not all-knowing oracles. We rely on our senses to improve our Internal Models of the world. There is Communication Risk here - we might overlook something vital (like an on-coming truck) or mistake something someone says (like “Don’t cut the green wire”).
In 1948, Claude Shannon proposed this definition of communication:
“The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point.” - A Mathematical Theory Of Communication, Claude Shannon
And from this same paper, we get the diagram above: we move from top-left (“I want to send a message to someone”) to bottom left, clockwise, where we hope the message has been understood and believed. (I’ve added this last box to Shannon’s original diagram.)
One of the chief concerns in Shannon’s paper is the risk of error between Transmission and Reception. He creates a theory of information (measured in bits), the upper-bounds of information that can be communicated over a channel, and ways in which Communication Risk between these processes can be mitigated by clever Encoding and Decoding steps.
But it’s not just transmission. Communication Risk exists at each of these steps. Let’s imagine a human example, where someone, Alice, is trying to send a simple message to Bob:
Step | Potential Risk |
---|---|
Motivation | Alice might be motivated to send a message to tell Bob something, only to find out that he already knew it. |
Composition | Alice might mess up the intent of the message: instead of “Please buy chips” she might say, “Please buy chops”. |
Encoding | Alice might not speak clearly enough to be understood. |
Transmission | Alice might not say it loudly enough for Bob to hear. |
Reception | Bob doesn’t hear the message clearly (maybe there is background noise). |
Decoding | Bob might not decode what was said into a meaningful sentence. |
Interpretation | Assuming Bob has heard, will he correctly interpret which type of chips (or chops) Alice was talking about? |
Reconciliation | Does Bob believe the message? Will he reconcile the information into his Internal Model and act on it? Perhaps not, if Bob thinks that there are chips at home already. |
There is a symmetry about the steps going on in the diagram above, and we’re going to exploit this in order to break down Communication Risk into its main types.
To get inside Communication Risk, we need to understand Communication itself, whether between machines, people or products: we’ll look at each in turn. In order to do that, we’re going to examine four basic concepts in each of these settings:
And, as we look at these four areas, we’ll consider the Attendant Risks of each.
There are lots of different types of media for communicating (e.g. TV, Radio, DVD, Talking, Posters, Books, Phones, The Internet, etc. ) and they all have different characteristics. When we communicate via a given medium, it’s called a channel.
The channel characteristics depend on the medium, then. Some obvious ones are cost, utilisation, number of people reached, simplex or duplex (whether parties can transmit and receive at the same time), persistence (a play vs a book, say), latency (how long messages take to arrive) and bandwidth (the amount of information that can be transmitted in a period of time).
Channel characteristics are important: in a high-bandwidth, low-latency situation, Alice and Bob can check with each other that the meaning was transferred correctly. They can discuss what to buy, they can agree that Alice wasn’t lying or playing a joke.
The channel characteristics also imply suitability for certain kinds of messages. A documentary might be a great way of explaining some economic concept, whereas an opera might not be.
Shannon observes that no channel is perfect: there is always the risk of noise corrupting the signal. A key outcome from Shannon’s paper is that there is a tradeoff: within the capacity of the channel (the Bandwidth), you can either send lots of information with higher risk that it is wrong, or less information with lower risk of errors.
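To make this tradeoff concrete, here is a minimal sketch of a 3x repetition code, one of the simplest error-correcting schemes. It is not taken from Shannon’s paper, and the names `repeatEncode` and `majorityDecode` are just illustrative: we send each bit three times (so only a third of what goes over the channel is new information), and in return a single flipped bit per group is outvoted on arrival.

```typescript
// Hypothetical helpers illustrating a 3x repetition code.
type Bit = 0 | 1;

// Encode: send every bit three times (less information per symbol sent).
function repeatEncode(bits: Bit[]): Bit[] {
  const out: Bit[] = [];
  bits.forEach((bit) => { out.push(bit, bit, bit); });
  return out;
}

// Decode: take a majority vote over each group of three received bits.
function majorityDecode(received: Bit[]): Bit[] {
  const out: Bit[] = [];
  for (let i = 0; i + 2 < received.length; i += 3) {
    const ones = received[i] + received[i + 1] + received[i + 2];
    out.push(ones >= 2 ? 1 : 0);
  }
  return out;
}

const originalBits: Bit[] = [1, 0, 1];
const sentBits = repeatEncode(originalBits);    // nine bits on the channel for three bits of message
const noisyBits = sentBits.slice();
noisyBits[1] = 0;                               // noise flips one bit in the first group
const recoveredBits = majorityDecode(noisyBits); // the single flip is outvoted
```

Two flips inside the same group would still defeat this scheme: the risk of error is reduced, not removed.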
But channel risk goes wider than just this mathematical example: messages might be delayed or delivered in the wrong order, or not be acknowledged when they do arrive. Sometimes, a channel is just an inappropriate way of communicating. When you work in a different time-zone to someone else on your team, there is automatic Channel Risk, because instantaneous communication is only available for a few hours a day.
When channels are poor-quality, less communication occurs. People will try to communicate just the most important information. But it’s often impossible to know a priori what constitutes “important”. This is why Extreme Programming recommends the practice of Pair Programming and siting all the developers together: although you don’t know whether useful communication will happen, you are mitigating Channel Risk by ensuring high-quality communication channels are in place.
At other times, channels are crowded, and can contain so much information that we can’t hope to receive all the messages. In these cases, we don’t even observe the whole channel, just parts of it.
When we are talking about a product or a brand, mitigating Channel Risk is the domain of Marketing Communications. How do you ensure that the information about your (useful) project makes it to the right people? How do you address the right channels?
This works both ways. Let’s look at some of the Channel Risks from the point of view of a hypothetical software tool, D, which would be really useful in my software:
Internal Models don’t magically get populated with the information they need: they fill up gradually, as shown in the diagram above. Popular products and ideas spread, by word-of-mouth or other means. Part of the job of being a good technologist is to keep track of new Ideas, Concepts and Options, so as to use them as Dependencies when needed.
“A communication protocol is a system of rules that allow two or more entities of a communications system to transmit information.” - Communication Protocol, Wikipedia
In this section, I want to examine the concept of Communication Protocols and how they relate to Abstraction, which is implicated over and over again in different types of risk we will be looking at.
Abstraction means separating the definition of something from the use of something. It’s a widely applicable concept, but our example below will be specific to communication, and looking at the abstractions involved in loading a web page.
First, we need to broaden our terminology. Although so far we’ve talked about Senders and Receivers, we now need to talk from the point of view of who-depends-on-who. That is, Clients and Suppliers.
In order that a web browser (a client) can load a web-page from a server, they both need to communicate with shared protocols. In this example, this is going to involve (at least) six separate protocols, as shown in the diagram above.
Let’s examine each protocol in turn when I try to load the web page at the following address using a web browser:
http://google.com/preferences
The first thing that happens is that the name “google.com” is resolved by DNS. This means that the browser looks up the domain name “google.com” and gets back an IP address.
This is some Abstraction: instead of using the machine’s IP Address on the network, 216.58.204.78, I can use a human-readable address, google.com.
The address google.com doesn’t necessarily resolve to that same address each time: Google has multiple IP addresses for google.com, but as a user, I don’t have to worry about this detail.
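As a rough illustration of the abstraction, here is a toy, in-memory name resolver. It is a sketch only: real DNS involves caching, multiple record types and remote servers, and the host name and addresses below are placeholders from reserved documentation ranges rather than real records. The function `resolveName` and the rotation policy are assumptions made for the example.

```typescript
// Toy in-memory resolver. The name and addresses are placeholders,
// not real DNS records.
const addressBook: Record<string, string[]> = {
  "shop.example": ["192.0.2.10", "192.0.2.11"],
};
const nextIndex: Record<string, number> = {};

// Return one address for the name, rotating through the known addresses.
function resolveName(hostName: string): string {
  const addresses = addressBook[hostName];
  if (!addresses || addresses.length === 0) {
    throw new Error("unknown host: " + hostName);
  }
  const i = nextIndex[hostName] || 0;
  nextIndex[hostName] = (i + 1) % addresses.length;
  return addresses[i];
}

const firstLookup = resolveName("shop.example");
const secondLookup = resolveName("shop.example");
```

The caller only ever deals in names: which address comes back, and whether it is the same one as last time, stays hidden behind the lookup.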
But this hints at what is beneath the abstraction: although I’m loading a web-page, the communication to the Google server happens by IP Protocol - it’s a bunch of discrete “packets” (streams of binary digits). You can think of a packet as being like a real-world parcel or letter.
Each packet consists of two things:
But, even this concept of “packets” is an Abstraction. Although all the components of the network understand this protocol, we might be using Wired Ethernet cables, or WiFi, 4G or something else beneath that.
I ran this at home, using WiFi, which uses IEEE 802.11 Protocol, which allows my laptop to communicate with the router wirelessly, again using an agreed, standard protocol. But even this isn’t the bottom, because this is actually probably specifying something like MIMO-OFDM, giving specifications about frequencies of microwave radiation, antennas, multiplexing, error-correction codes and so on. And WiFi is just the first hop: after the WiFi receiver, there will be protocols for delivering the packets via the telephony system.
Another Abstraction going on here is that my browser believes it has a “connection” to the server. This is provided by the TCP protocol.
But, this is a fiction - my “connection” is built on the IP protocol, which as we saw above is just packets of data on the network. So there are lots of packets floating around which say “this connection is still alive” and “I’m message 5 in the sequence” and so on in order to maintain this fiction.
This all means that the browser can forget about all the details of packet ordering and so on, and work with the fiction of a connection.
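To give a feel for how the “I’m message 5 in the sequence” bookkeeping maintains the fiction, here is a heavily simplified sketch of a reorder buffer. It is not how TCP is actually implemented (real TCP numbers bytes, acknowledges and retransmits data, and manages flow control); the `SeqPacket` shape and `ReorderBuffer` class are hypothetical names for this example.

```typescript
// Hypothetical packet shape: a sequence number plus a text payload.
interface SeqPacket {
  seq: number;
  payload: string;
}

// Buffers packets that arrive early and releases payloads strictly in sequence order.
class ReorderBuffer {
  private expected = 0;
  private pending: Record<number, string> = {};

  // Accept one packet; return whatever payloads can now be delivered in order.
  receive(packet: SeqPacket): string[] {
    if (packet.seq < this.expected) {
      return []; // duplicate of something already delivered
    }
    this.pending[packet.seq] = packet.payload;
    const ready: string[] = [];
    while (this.pending[this.expected] !== undefined) {
      ready.push(this.pending[this.expected]);
      delete this.pending[this.expected];
      this.expected += 1;
    }
    return ready;
  }
}

const reorder = new ReorderBuffer();
const afterSecond = reorder.receive({ seq: 1, payload: "world" }); // arrives early, held back
const afterFirst = reorder.receive({ seq: 0, payload: "hello " }); // releases both, in order
```

The code consuming `afterFirst` never sees that the packets arrived out of order, which is exactly the detail the browser gets to forget about.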
If we examine what is being sent on the TCP connection, we see something like this:
> GET /preferences HTTP/1.1
> Host: google.com
> Accept: */*
>
This is now the HTTP protocol proper, and these 4 lines are sending information over the connection to the Google server, to ask it for the page. Finally, Google’s server gets to respond:
< HTTP/1.1 301 Moved Permanently
< Location: http://www.google.com/preferences
...
In this case, Google’s server is telling us that the web page has changed address. The 301 is a status code meaning the page has moved: instead of http://google.com/preferences, we want http://www.google.com/preferences.
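A client has to pull the status code and the Location header out of that text before it can follow the redirect. The sketch below shows one simplified way of doing so; `parseResponseHead` and `redirectTarget` are hypothetical helper names, and a real HTTP client handles many more cases (repeated headers, other status codes, bodies, and so on).

```typescript
// Result of parsing the head of an HTTP/1.1 response (hypothetical shape).
interface ResponseHead {
  statusCode: number;
  reason: string;
  headers: Record<string, string>;
}

// Parse the status line and headers; header names are lower-cased.
function parseResponseHead(text: string): ResponseHead {
  const lines = text.split(/\r?\n/);
  const match = /^HTTP\/\d\.\d (\d{3}) ?(.*)$/.exec(lines[0]);
  if (!match) {
    throw new Error("not an HTTP response: " + lines[0]);
  }
  const headers: Record<string, string> = {};
  for (let i = 1; i < lines.length && lines[i] !== ""; i++) {
    const colon = lines[i].indexOf(":");
    if (colon > 0) {
      const key = lines[i].slice(0, colon).trim().toLowerCase();
      headers[key] = lines[i].slice(colon + 1).trim();
    }
  }
  return { statusCode: parseInt(match[1], 10), reason: match[2], headers: headers };
}

// Return the redirect target for a 3xx response, or null otherwise.
function redirectTarget(head: ResponseHead): string | null {
  const isRedirect = head.statusCode >= 300 && head.statusCode < 400;
  return isRedirect && head.headers["location"] ? head.headers["location"] : null;
}

const parsedHead = parseResponseHead(
  "HTTP/1.1 301 Moved Permanently\r\nLocation: http://www.google.com/preferences\r\n\r\n"
);
const redirectUrl = redirectTarget(parsedHead);
```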
By having a stack of protocols, we are able to apply Separation Of Concerns, each protocol handling just a few concerns:
Protocol | Abstractions |
---|---|
HTTP | URLs, error codes, pages. |
DNS | Names of servers to IP Addresses. |
TCP | The concept of a “connection” with guarantees about ordering and delivery. |
IP | “Packets” with addresses and payloads. |
WiFi | “Networks”, 802.11 flavours, Transmitters, Antennas, error correction codes. |
HTTP “stands on the shoulders of giants”: not only does it get to use pre-existing protocols like TCP and DNS to make its life easier, it got 802.11 “for free” when this came along and plugged into the existing IP protocol. This is the key value of abstraction: you get to piggy-back on existing patterns, and use them yourself.
Hopefully, the above example gives an indication of the usefulness of protocols within software. But for every protocol we use, we have Protocol Risk. This is a problem in human communication protocols, but it’s really common in computer communication because we create protocols all the time in software.
For example, as soon as we define a Javascript function (called b here), we are creating a protocol for other functions (a here) to use it:
function b(a, b, c) {
    return a + b + c;
}

function a() {
    var bOut = b(1, 2, 3);
    return "something " + bOut; // returns "something 6"
}
If function b then changes, say:
function b(a, b, c, d /* new parameter */) {
    return a + b + c + d;
}
Then, a will instantly have a problem calling it: JavaScript passes d as undefined, the sum becomes NaN, and a returns “something NaN” instead of “something 6”.
Protocol Risk also occurs when we use Data Types: whenever we change the data type, we need to correct the usages of that type. Note above, I’ve given the JavaScript example, but I’m going to switch to TypeScript now:
interface BInput {
    a: string,
    b: string,
    c: string,
    d: string
}

function b(input: BInput): string {
    return input.a + input.b + input.c + input.d;
}

function a() {
    // The type checker rejects this call: property 'd' is missing (new parameter d)
    var bOut = b({a: "1", b: "2", c: "3"});
    return "something " + bOut;
}
By using a static type checker, we can identify issues like this, but there is a tradeoff. We mitigate Protocol Risk, because we define the protocols once only in the program, and ensure that usages all match the specification. The cost is (as we can see in the TypeScript code) more finger-typing, which means Codebase Risk in some circumstances.
Nevertheless, static type checking is so prevalent in software that clearly in most cases, the trade-off has been worth it: even languages like Clojure have been retro-fitted with type checkers.
Let’s look at some further types of Protocol Risk:
The people you find it easiest to communicate with are your friends and family, those closest to you. That’s because you’re all familiar with the same protocols. Someone from a foreign country, speaking a different language and having a different culture, will essentially have a completely incompatible protocol for spoken communication to you.
Within software, there are also competing, incompatible protocols for the same things, which is maddening when your protocol isn’t supported. Although the world seems to be standardizing, there used to be hundreds of different image formats. Photographs often use TIFF, RAW or JPEG, whilst we also have SVG for vector graphics, GIF for images and animations and PNG for other bitmap graphics.
Even when systems are talking the same protocol, there can be problems. When we have multiple, different systems owned by different parties, on their own upgrade cycles, we have Protocol Versioning Risk: the risk that either client or supplier could start talking in a version of the protocol that the other side hasn’t learnt yet. There are various mitigating strategies for this. We’ll look at two now: Backwards Compatibility and Forwards Compatibility.
Backwards Compatibility mitigates Protocol Versioning Risk. Quite simply, this means supporting the old format until it falls out of use. If a supplier is pushing for a change in protocol, it must either ensure that it is Backwards Compatible with the clients it is communicating with, or make sure they are upgraded concurrently. When building web services, for example, it’s common practice to version all APIs so that you can manage the migration. Something like this:
/api/v1/something
/api/v2/something
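One way to keep old clients working while a newer version is rolled out is to serve both versioned paths side by side, with the old version kept as a thin adapter over the new one. The sketch below assumes hypothetical response shapes (`V1Response`, `V2Response`) and a hypothetical `handleRequest` dispatcher; only the two paths come from the example above.

```typescript
// Hypothetical response shapes: v2 splits the single v1 "name" field in two.
interface V1Response { name: string; }
interface V2Response { firstName: string; lastName: string; }

// The current (v2) implementation.
function somethingV2(): V2Response {
  return { firstName: "Alice", lastName: "Smith" };
}

// v1 is kept alive as a thin adapter over v2, so old clients keep working.
function somethingV1(): V1Response {
  const current = somethingV2();
  return { name: current.firstName + " " + current.lastName };
}

// Route table keyed by versioned path.
const routes: Record<string, () => V1Response | V2Response> = {
  "/api/v1/something": somethingV1,
  "/api/v2/something": somethingV2,
};

// Dispatch a request path to its handler, or return null if it is unsupported.
function handleRequest(path: string): V1Response | V2Response | null {
  if (!Object.prototype.hasOwnProperty.call(routes, path)) {
    return null;
  }
  return routes[path]();
}
```

Once no client calls the v1 path any more, the adapter and its route can be deleted without touching v2.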
HTML and HTTP provide “graceful failure” (a form of Forwards Compatibility) to mitigate Protocol Risk: while it’s expected that all clients can parse the syntax of HTML and HTTP, it’s not necessary for them to be able to handle all of the tags, attributes and rules they see. The specification for both these standards is that if you don’t understand something, ignore it. Designing with this in mind means that old clients can always at least cope with new features, but it’s not always possible.
JavaScript can’t support this, because the meaning of the next instruction will often depend on the result of the previous one.
Do human languages support this? To some extent! New words are added to our languages all the time. When we come across a new word, we can either ignore it, guess the meaning, ask or look it up. In this way, human language has Forward Compatibility features built in.
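The “if you don’t understand something, ignore it” rule described above for HTML and HTTP can be sketched in a few lines. The key=value message format here is invented purely for illustration (it is not HTML or HTTP), as are the `KnownSettings` type and `readSettings` function: the point is only that an old reader skips fields added by a newer sender instead of rejecting the whole message.

```typescript
// Fields this (old) client understands; anything else is skipped, not rejected.
interface KnownSettings {
  title: string;
  width: number;
}

// Read "key=value" lines, keeping known keys and ignoring unknown ones.
function readSettings(text: string): KnownSettings {
  const settings: KnownSettings = { title: "", width: 0 };
  text.split("\n").forEach((line) => {
    const eq = line.indexOf("=");
    if (eq < 0) { return; } // not a key=value line: ignore it
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).trim();
    if (key === "title") {
      settings.title = value;
    } else if (key === "width") {
      settings.width = parseInt(value, 10);
    }
    // Any other key comes from a newer protocol version and is ignored.
  });
  return settings;
}

// A message from a newer sender that adds a "theme" field this client has never seen.
const fromNewerSender = "title=Preferences\ntheme=dark\nwidth=80";
const understood = readSettings(fromNewerSender);
```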
A second aspect of Protocol Risk exists in heterogeneous computing environments, where protocols have been independently implemented based on standards. For example, there are now so many different browsers, all supporting variations of HTTP, HTML and JavaScript, that it becomes impossible to test comprehensively over all the different versions. To mitigate as much Protocol Risk as possible, generally we test web sites in a subset of browsers, and use a lowest-common-denominator approach to choosing protocol and language features.
Although Shannon’s Communication Theory is about transmitting Messages, messages are really encoded Ideas and Concepts, from an Internal Model. Let’s break down some of the risks associated with this:
When we construct messages in a conversation, we have to make judgements about what the other person already knows. For example, if I want to tell you about a new JDBC Driver, this pre-assumes that you know what JDBC is: the message has a dependency on prior knowledge. Or, when talking to children, it’s often hard work because they assume that you have knowledge of everything they do.
This is called Theory Of Mind: the appreciation that your knowledge is different to other people’s, and adjusting your messages accordingly. When teaching, this is called The Curse Of Knowledge: teachers have difficulty understanding students’ problems because they already understand the subject.
A second, related problem is actually Dependency Risk, which is covered more thoroughly in a later section. Often, to understand a new message, you have to have followed everything up to that point already.
The same Message Dependency Risk exists for computer software: if there is replication going on between instances of an application, and one of the instances misses some messages, you end up with a “Split Brain” scenario, where later messages can’t be processed because they refer to an application state that doesn’t exist. For example, a message saying:
Update user 53's surname to 'Jones'
only makes sense if the application has previously processed the message
Create user 53 with surname 'Smith'
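Here is a minimal sketch of that dependency, using the two messages above. The `UserMessage` type and `UserReplica` class are hypothetical; a real replication system would also need ordering, retries and some way to recover a replica that has fallen behind.

```typescript
// Hypothetical replication messages, mirroring the two examples above.
type UserMessage =
  | { kind: "create"; id: number; surname: string }
  | { kind: "update"; id: number; surname: string };

// A replica's state: surname by user id.
class UserReplica {
  private surnames: Record<number, string> = {};

  // Apply one message; refuse updates whose prerequisite create was never seen.
  apply(message: UserMessage): boolean {
    const exists = this.surnames[message.id] !== undefined;
    if (message.kind === "update" && !exists) {
      return false; // depends on state this replica does not have
    }
    if (message.kind === "create" && exists) {
      return false; // duplicate create
    }
    this.surnames[message.id] = message.surname;
    return true;
  }

  surnameOf(id: number): string | undefined {
    return this.surnames[id];
  }
}

const healthy = new UserReplica();
healthy.apply({ kind: "create", id: 53, surname: "Smith" });
const healthyOk = healthy.apply({ kind: "update", id: 53, surname: "Jones" });

const lagging = new UserReplica(); // this instance missed the create message
const laggingOk = lagging.apply({ kind: "update", id: 53, surname: "Jones" });
```

The same update message succeeds on one instance and fails on the other: its meaning depends entirely on which messages came before it.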
For people, nothing exists unless we have a name for it. The world is just atoms, but we don’t think like this. The name is the thing.
“The famous pipe. How people reproached me for it! And yet, could you stuff my pipe? No, it’s just a representation, is it not? So if I had written on my picture “This is a pipe”, I’d have been lying!” - Rene Magritte, of The Treachery of Images
People don’t rely on rigorous definitions of abstractions like computers do; we make do with fuzzy definitions of concepts and ideas. We rely on Abstraction to move between the name of a thing and the idea of a thing.
This brings about Misinterpretation Risk: names are not precise, and concepts mean different things to different people. We can’t be sure that other people have the same meaning for a name that we have.
Another cost of Abstraction is Invisibility Risk. While abstraction is a massively powerful technique (as we saw above, Protocols allow things like the Internet to happen), it lets the function of a thing hide behind the layers of abstraction and become invisible.
Invisibility Risk is risk due to information not sent. Because humans don’t need a complete understanding of a concept to use it, we can cope with some Invisibility Risk in communication, and this saves us time when we’re talking. It would be painful to have conversations if, say, the other person needed to understand everything about how cars worked in order to discuss cars.
For people, Abstraction is a tool that we can use to refer to other concepts, without necessarily knowing how the concepts work. This divorcing of “what” from “how” is the essence of abstraction and is what makes language useful.
The debt of Invisibility Risk comes due when you realise that not being given the details prevents you from reasoning about it effectively. Let’s think about this in the context of a project status meeting, for example:
Invisibility Risk is everywhere in software. Let’s consider what happens when, in your program, you create a new function, f:
But something else also happens: by creating f, you are saying “I now have this operation. The details, I won’t mention again, but from now on, it’s called f.” Suddenly, the implementation of “f” hides and it is working invisibly. Things go on in f that people don’t necessarily need to understand. There may be some documentation, or tacit knowledge around what f is, and what it does, but it’s not necessarily right.
Referring to f is a much simpler job than understanding f.
We try to mitigate this via (for the most part) documentation, but this is a terrible deal: because we can’t understand the original, (un-abstracted) implementation, we now need to write some simpler documentation, which explains the abstraction, in terms of further abstractions, and this is where things start to get murky.
Invisibility Risk is mainly Hidden Risk. (Mostly, you don’t know what you don’t know.) But you can carelessly hide things from yourself with software:
When you build a software service, or even implement a thread, ask yourself: “How will I know next week that this is working properly?” If the answer involves manual work and investigation, then your implementation has just cost you in Invisibility Risk.
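One modest way to answer that question in code is to have the implementation keep a record of its own behaviour, so that next week you can inspect it rather than investigate. The wrapper below is only a sketch of the idea: `withHealth` and `HealthRecord` are hypothetical names, and in practice you would more likely feed a logging or metrics system than an in-memory object.

```typescript
// Health record kept for a wrapped operation (hypothetical shape).
interface HealthRecord {
  calls: number;
  failures: number;
  lastError: string | null;
}

// Wrap a function so that every call updates a health record that can be inspected later.
function withHealth<A, R>(fn: (arg: A) => R): { run: (arg: A) => R; health: HealthRecord } {
  const health: HealthRecord = { calls: 0, failures: 0, lastError: null };
  const run = (arg: A): R => {
    health.calls += 1;
    try {
      return fn(arg);
    } catch (err) {
      health.failures += 1;
      health.lastError = String(err);
      throw err; // still surface the failure to the caller
    }
  };
  return { run: run, health: health };
}

// Example: a parser that fails on bad input.
const parsePositive = withHealth((text: string): number => {
  const n = parseInt(text, 10);
  if (isNaN(n) || n <= 0) { throw new Error("not a positive integer: " + text); }
  return n;
});

parsePositive.run("42");
try { parsePositive.run("oops"); } catch (err) { /* the failure is recorded in health */ }
```

After these two calls, `parsePositive.health` shows two calls and one failure, which is information that would otherwise have stayed invisible.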
So finally, we are coming to the root of the problem: communication is about transferring ideas and concepts from one Internal Model to another.
The communication process so far has been fraught with risks, but we have a few more to come.
Although protocols can sometimes handle security features of communication (such as Authentication and preventing man-in-the-middle attacks), trust goes further than this. It is the flip-side of Agency Risk, which we will look at later: can you be sure that the other party in the communication is acting in your best interests?
Even if the receiver trusts the communicator, they may not believe the message. Let’s look at some reasons for that:
From the point-of-view of Marketing Communications, choosing the right message is part of the battle. You are trying to communicate your idea in such a way as to mitigate Trust & Belief Risk.
If the messages we are receiving force us to update our Internal Model too much, we can suffer from the problem of “too steep a Learning Curve” or “Information Overload”, where the messages force us to adapt our Internal Model too quickly for our brains to keep up.
Commonly, the easiest option is just to ignore the information channel completely in these cases.
It has often been said that code is harder to read than to write:
“If you ask a software developer what they spend their time doing, they’ll tell you that they spend most of their time writing code. However, if you actually observe what software developers spend their time doing, you’ll find that they spend most of their time trying to understand code.” - When Understanding Means Rewriting, Coding Horror
By now it should be clear that it’s going to be both quite hard to read and write: the protocol of code is actually designed for the purpose of machines communicating, not primarily for people to understand. Making code human readable is a secondary concern to making it machine readable.
But now we should be able to see the reasons it’s harder to read than write too:
In this section, we’ve looked at Communication Risk itself, and broken it down into six sub-types of risk, as shown in the diagram above. Again, we are calling out patterns here: you can equally classify communication risks in other ways. However, concepts like Learning Curve Risk and Invisibility Risk we will need again. Also, note how these risks are, in a sense, opposite:
In the next section, we will address complexity head-on, and understand how Complexity Risk manifests in software projects.