Understanding the role of an API
Our world is hopelessly, hilariously complex. Both Thomas Thwaites in his quest to build a toaster[1] and AJ Jacobs in his quest to thank all the people responsible for his morning coffee[2] illustrate this complexity; it is essentially impossible to reason about how the world works together to produce the various miracles we experience on a day to day basis.
With software eating the world[3], we in the software engineering community are tasked with finding a way to represent this impossible complexity in our software ecosystem. Just as in meatspace[4], we cope with these complexities by creating a set of rules around certain things and treating those things as a kind of “black box”.
There are many things in life that we (or at least I) consume without any real knowledge of how they work. For example, I consume:
Cars through the steering wheel and pedals
Restaurants through the menu
Toilets through a handy “push button” interface
Police through the telephone
Computers via a keyboard, mouse and the occasional profane word
In all these examples there is a hidden complexity to the objects that we do not see. We do not need to see it, nor understand it, to gain utility from that service. So long as the interface is understandable and behaves in a predictable way, we do not worry too much about the underlying internals.
However, if you’ve ever travelled through an airport in a country with a bathroom culture significantly different from your own and been confronted with a toilet that behaves differently than you expect, you know that when these interfaces are unpredictable the experience is unpleasant.
Within software it is much the same. We access databases through structured query language (SQL), network services via the transmission control protocol (TCP) and underlying system resources through the portable operating system interface (POSIX). While at this point in my career thinking in terms of these interfaces is about as natural as breathing, these interfaces did not always exist. It is useful, when we design our own interfaces, to understand how such systems were designed, packaged and sold into the wider market to become the fundamental building blocks they are today.
It’s thus perhaps reasonable to dive into what an application programming interface (or API) is designed to accomplish, how to think about APIs when designing our own software and what makes a “good” API.
What is an API?
Before carrying on to work out what constitutes a good or bad API and how to design them, it’s worth establishing a common understanding of what an API is in the first place. Naively, we might equate “API” with HTTP API implementations such as:
https://abusiveexperiencereport.googleapis.com/v1/?key={YOUR_API_KEY}
https://kubernetes.io/docs/concepts/overview/kubernetes-api/
However, the definition of an API is somewhat more inclusive. Wikipedia defines it as:
… A set of subroutine definitions, communication protocols, and tools for building software. In general terms, it is a set of clearly defined methods of communication among various components.
It goes on to say:
A good API makes it easier to develop a computer program by providing all the building blocks, which are then put together by the programmer.
Given this definition, the aforementioned resources are indeed APIs. But beyond them there are vastly more, including:
The Hypertext Transfer Protocol (HTTP)
The Browser APIs used in JavaScript
The POSIX APIs
Indeed, I would go so far as to consider any structured interface defined to expose some software functionality to a consumer to be an API, regardless of whether that was an intended use of that interface or not.
What makes a “good” API?
When considering what makes a “good” API, it’s perhaps reasonable to look at some of the more successful APIs that have come before us. For our purposes we’ll look at perhaps the most successful API in history by consumption: the HTTP API.
The HTTP API
The HTTP protocol (0.9) was given life in 1991 by Tim Berners-Lee[5] and shortly thereafter succeeded the Gopher protocol[6] as the canonical way to request “hypermedia” data from other computers.
A simple example of this protocol is:
$ telnet google.com 80
Trying 172.217.168.238...
Connected to google.com.
Escape character is '^]'.
GET /about/
This produces the response:
HTTP/1.0 302 Found
Location: https://about.google/
Cache-Control: private
Content-Type: text/html; charset=UTF-8
... truncated for brevity ...
Connection closed by foreign host.
This simple protocol would go on to dominate user interactions with computers, helping form the “world wide web” and kicking off what has been referred to as the “new industrial revolution”[7].
In my opinion HTTP displays some of the most important properties of APIs. It is:
Predictable
HTTP was built on top of other, well defined and commonly understood standards. In particular, it was built on top of:
TCP
ASCII
The entire protocol was encapsulated in the telnet command above. It consists of the request format:
${REQUEST_VERB} ${URI} ${HTTP_VERSION}
${HEADER_KEY}:${HEADER_VALUE}
${BODY}
And response format:
${HTTP_VERSION} ${STATUS_CODE}
${HEADER_KEY}:${HEADER_VALUE}
${BODY}
Where:

${REQUEST_VERB} - Something like "GET", "PUT", "POST", "HEAD" etc. Used to indicate whether to send, retrieve or inquire about data
${URI} - The address of that data. Commonly, but not always, modelled after the filesystem
${HTTP_VERSION} - The HTTP protocol version
${STATUS_CODE} - A numeric constant and text reference for the status of the request
${HEADER_KEY}:${HEADER_VALUE} - Key / value pairs that hold request metadata
${BODY} - The payload of data being sent either way
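Filling those templates with a slightly more formal version of the earlier google.com exchange gives:

GET /about/ HTTP/1.0
Host: google.com

HTTP/1.0 302 Found
Location: https://about.google/
Content-Type: text/html; charset=UTF-8

... body truncated ...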
These primitives can be combined to create the exceedingly complex web that we see today. The particularly exciting part of this API is that no individual request is complex; each consists of only a small set of well defined parts, and both the request and the response are easy to understand on the wire.
However, that simple API allows us to generate the extremely complex interactive experiences we see today.
Reliable
The #1 and #2 fallacies of distributed computing[8] are:
The network is reliable
Latency is zero
These illustrate some of the harder problems to reason about in networked computing. Within networked systems one cannot guarantee that a message is delivered successfully exactly once[9]; messages are instead delivered either:
Repeatedly, until that delivery is acknowledged or
A single time, with no guarantee that the message was delivered at all
And yet, HTTP (on top of TCP) manages to overcome this inherent flakiness, serving as the underlying protocol for browser traffic, REST APIs, gRPC and a host of other network communication.
While networks are at best flaky, annoying messes, HTTP in conjunction with TCP (or, in the case of HTTP/3, QUIC) provides some safety by defining clear failure semantics for remote procedure calls (RPCs).
Given the scenarios of:
An upstream service being unavailable: HTTP will return the “503” service unavailable status code
The canonical service being unavailable: HTTP encourages (but does not require) timeout & retry
The upstream service fails: HTTP will return a “500” internal error status code
The upstream service is fine, but the request is bad: HTTP will return a 400 status code
The request is fine: HTTP will return a 200 status code.
The full list of conditions HTTP is set up to handle is perhaps best expressed via “HTTP status codes as dogs” (or, more officially, RFC 7231).
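These semantics are concrete enough to program against. A rough sketch of a client that leans on them (hypothetical Go, standard library only; the retry policy itself is an assumption, not something HTTP mandates):

package main

import (
	"fmt"
	"net/http"
	"time"
)

// fetchWithRetry leans on HTTP's failure semantics: 5xx responses
// signal server-side trouble worth retrying, 4xx signal a bad request
// that retrying will not fix, and 2xx signals success.
func fetchWithRetry(url string, attempts int) (*http.Response, error) {
	client := &http.Client{Timeout: 5 * time.Second} // never wait forever
	for i := 0; i < attempts; i++ {
		resp, err := client.Get(url)
		switch {
		case err != nil:
			// Network failure: fall through and retry.
		case resp.StatusCode >= 500:
			resp.Body.Close() // 500, 503 etc.: worth retrying
		case resp.StatusCode >= 400:
			resp.Body.Close()
			return nil, fmt.Errorf("client error, not retrying: %v", resp.Status)
		default:
			return resp, nil // 2xx or 3xx: success
		}
		time.Sleep(time.Duration(i+1) * time.Second) // crude linear backoff
	}
	return nil, fmt.Errorf("gave up on %v after %v attempts", url, attempts)
}

func main() {
	resp, err := fetchWithRetry("https://about.google/", 3)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}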
Consistent
Linus Torvalds is somewhat infamously quoted as saying[10]:
If a change results in user programs breaking, it’s a bug in the kernel. We never EVER blame the user programs.
The rest of the mail that quote is taken from drives this point home emphatically, if in a less than ideal way, but the point still stands: APIs should essentially never change.
HTTP maintained wire format backwards compatibility from “0.9” through “1.1”, retained the same semantic structure in HTTP/2 and will continue to do so in the upcoming HTTP/3. For application developers this has meant a largely smooth transition between all versions of the protocol, with nearly no changes required for applications that use it to keep working.
Interestingly, HTTP also demonstrates how certain practices, such as concatenating assets and spriting images, become “semi official APIs”, and how revising even these longstanding but never documented practices can cause significant friction.
Designing our own “good” APIs
While one might consider HTTP an unusual API to use as a benchmark of API success, I chose it deliberately because it’s so easy to forget we’re dealing with it on a daily basis. Languages, frameworks and other tooling hide the HTTP details from us such that we rarely inspect them save in the case of a particularly unusual bug. Indeed, I regularly see developers reimplement HTTP semantics on top of HTTP itself, recreating error conditions and so forth.
However, there are steps that we can take to ensure that the APIs we craft make our users happy and live long, healthy lives of their own.
To illustrate how to craft an API for a long life, we can take a look at the fledgling littleman.co project “bioprofile.co”.
Use an API specific DSL
As the earlier discussion of HTTP showed, our goal in API design is to be predictable. Perhaps the best way to be predictable is to reuse an existing model for API design. There are many different ways to model network APIs, including:
REST, often represented with an OpenAPI specification
gRPC, with services and messages described in Protobuf definitions
Using any of the above protocols means that a whole swathe of problems are immediately solved such as:
Documentation
Wire format
Error reporting
Interchangeability
Language library generation
Most importantly, anyone who consumes your API is likely to have used one or more of the formats described above.
Of the options defined above I prefer gRPC and a slavish adherence to the Google API design guide[11]. In addition to the properties defined above, gRPC uses the wire format “Protobuf”[12], which is opinionated, efficient, strongly typed and can be used to generate both client and server libraries in a number of different languages.
gRPC itself is implemented on top of HTTP/2.
Understanding the domain
In order to derive value from an API it must be possible to map it to some sort of human process. The problem is that modelling human interactions is hard. Really hard. There have been multitudes of attempts to express how humans work in software directly, or in a domain specific language (DSL).
Designing an API means defining the boundaries of a given process. The better the understanding of both that process and its boundaries with other processes, the better the API can be designed to reflect those relationships. It’s worth spending at least as much time understanding the domain as understanding the nature of programming more generally.
Consider the example in bioprofile where a user would like to submit their heart beat frequency to the bioprofile service. It poses some interesting questions, such as:
What is a heart rate?
A heart rate could be:
A metric sampled over a fixed, standardized period
A count and average over an arbitrary period
The former gives some insight into the current rate but risks being an inaccurate representation over time if samples are not structured. The latter is always accurate but loses granularity over the arbitrary period.
In this case the way I’d approach this is to count and average over standardized periods, à la Prometheus counter metrics.
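One way to sketch that decision in code (hypothetical Go, mirroring the fields of the HeartRateSample type defined later in this article): the raw count and window are stored, and the familiar beats-per-minute figure is derived on demand.

package main

import (
	"fmt"
	"time"
)

// HeartRateSample stores the raw observation: a beat count over a
// known window. The rate itself is derived, never stored, so no
// precision is lost to premature averaging.
type HeartRateSample struct {
	Start   time.Time // when the sample window began
	Seconds float64   // length of the window, in seconds
	Beats   int       // total beats observed in the window
}

// BPM derives the conventional beats-per-minute figure on demand.
func (s HeartRateSample) BPM() float64 {
	return float64(s.Beats) / s.Seconds * 60
}

func main() {
	s := HeartRateSample{Start: time.Now(), Seconds: 60, Beats: 72}
	fmt.Printf("%.0f bpm\n", s.BPM()) // prints "72 bpm"
}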
How does the user identify themselves?
The user might not be collecting their heart rate data while connected to the internet. How then do we know whether the data actually comes from the user? Further, how does the user even authenticate themselves presuming we can guarantee the data? How long do we trust the user is “that user” for?
In this case the way I’d approach this is with OpenID and tokens, and in future with better identity claims thanks to WebAuthn.
Modelling it
In the example above we have abstract, human problems and need to model them in software, particularly in this case over a networked API.
By using an existing authentication specification (OpenID) we can assume that:
Our user will attach a JSON web token (JWT) to the request that identifies who they are, and that can be verified against the authentication server’s public key
That JWT will contain a list of scopes that user is allowed to access
Accordingly, whenever a user makes a request we can verify that they either own or have access to the data by comparing user IDs, and can determine whether they should be able to view or modify that data through the scopes attached to the JWT.
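As a minimal sketch of that check, assuming the community github.com/golang-jwt/jwt library (the space-separated “scope” claim is an OAuth2 convention rather than an OpenID requirement, and heartrate.write is an invented scope name):

package auth

import (
	"strings"

	"github.com/golang-jwt/jwt/v5"
)

// canWriteHeartRate checks the JWT's signature against the
// authentication server's public key, then looks for a scope
// permitting writes.
func canWriteHeartRate(tokenString string, pubKey any) bool {
	token, err := jwt.Parse(tokenString, func(t *jwt.Token) (any, error) {
		return pubKey, nil // key used to verify the signature
	})
	if err != nil || !token.Valid {
		return false // bad signature, expired, or otherwise untrusted
	}
	claims, ok := token.Claims.(jwt.MapClaims)
	if !ok {
		return false
	}
	// OAuth2 convention: scopes arrive as one space-separated string.
	scope, _ := claims["scope"].(string)
	for _, s := range strings.Split(scope, " ") {
		if s == "heartrate.write" { // hypothetical scope name
			return true
		}
	}
	return false
}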
We must still model the actual request and response. In this case it’s likely the user will wish to submit a set of “heart rate samples” to the API at any given time. Accordingly, we should have both a “heart rate sample” type and a “heart rate sample list” type, as well as endpoints that allow submitting both of these types.
In protobuf the type definition might look something like:
syntax = "proto3";

package v1alpha1.types;

import "google/protobuf/timestamp.proto";

message HeartRateSample {
  // When the sample started
  google.protobuf.Timestamp start = 1;

  // The length of time of the sample, expressed in seconds
  float seconds = 2;

  // The total number of heart beats in the sample
  int32 beats = 3;
}

message HeartRateSampleList {
  repeated HeartRateSample samples = 1;
}
And the service definition might look something like:
syntax = "proto3";

package v1alpha1.services;

import "v1alpha1/types/heartrate.proto";
import "google/protobuf/empty.proto";

service HeartRateSampleService {
  // Push a single heart rate measurement to your profile
  //
  // Will overwrite other measurements started in the same second
  rpc PutHeartRate(v1alpha1.types.HeartRateSample) returns (google.protobuf.Empty) {
  }

  // Put a list of heart rate measurements
  rpc PutHeartRateList(v1alpha1.types.HeartRateSampleList) returns (google.protobuf.Empty) {
  }
}
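Assuming client stubs generated from these definitions via protoc (the generated import path below is hypothetical), a submission from Go might look something like:

package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/protobuf/types/known/timestamppb"

	// Hypothetical package produced by:
	// protoc --go_out=. --go-grpc_out=. v1alpha1/services/heartrate.proto
	pb "example.com/bioprofile/gen/v1alpha1"
)

func main() {
	// Plaintext for the sketch; a real deployment would use TLS credentials.
	conn, err := grpc.Dial("localhost:50051",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := pb.NewHeartRateSampleServiceClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// One minute of data: 72 beats over a 60 second window,
	// sampled a minute ago and submitted now.
	_, err = client.PutHeartRate(ctx, &pb.HeartRateSample{
		Start:   timestamppb.New(time.Now().Add(-time.Minute)),
		Seconds: 60,
		Beats:   72,
	})
	if err != nil {
		log.Fatal(err)
	}
}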
In this case, understanding that:
Time series data is inherently time specific, and the API may as well express that rather than hide it
Users will likely want to submit multiple samples in a single RPC call
Users will likely be submitting their heart rate samples at a different time than they’re sampled, and thus need to embed that data in the RPC
The RESTful methods are still a good model for managing this data
Allows us to craft an API that should make sense to implementers. Further, because the API deals only with the specific problem of sending and receiving heart rate data, and makes no assumptions about how that data will be generated or consumed at either end of the RPC, it should be flexible across a large range of use cases and require minimal maintenance over time.
In the context of bioprofile.co it’s likely that one user will upload another user’s data on their behalf. For example, a sports coach may upload profile data on behalf of their athlete. The API is not currently built to handle this, and should likely be adjusted to include a notion of “patient” or “athlete”. See the next sections for how to address these “unknown unknowns”.
Take it slow
One of the characteristics of new API systems is that they do not make significant departures from the existing design:
TCP was built after experience developing the PARC Universal Packet[13] and experience at ARPANET[14]
gRPC was developed with the experience of developing Stubby[15]
REST was designed based on an examination of the properties of the web[16]
If we accept that an API is a representation of the conceptual model of a piece of software, we should also accept that the clarity of that model depends on others’ ability to compare it to what they already know and understand.
It turns out that making things that others understand is exceedingly difficult; the entire field of UX design exists to create this understanding for users of commercial products. With APIs our audiences are smaller, but their understanding is no less critical.
One of the best ways to create something useful for others is to involve them in the process: the proverbial “release early and often”[17]. Unfortunately, this is in direct conflict with the aforementioned “never change APIs”! To address this, we split our API into a set of “versions” that provide limited guarantees.
APIs start at:
Alpha
An alpha API provides absolutely no guarantees about its stability or usage. It is essentially a design declaration, and may be useful for those who wish to understand where the software is going or to prototype their own solutions based on that API.
It should absolutely never be used for customer data.
Beta
Beta APIs are a sign that the API that emerged from the alpha period is nearing its final design, and that implementers may wish to start building and dogfooding[18] their implementations.
Beta APIs should come with some guarantees, such as:
3 month deprecation and removal period
No breaking changes without a version bump
A single version of backwards compatibility
An availability service level objective
Stable
A stable API is 100% complete, and no further backwards incompatible changes are ever expected of it. Stable APIs should essentially never change.
They should come with guarantees such as:
A 12–24 month deprecation period
An availability guarantee
By only guaranteeing an API once its design is well tested and understood by multiple implementers, the API stands a far greater chance of lasting a long time, and software implementers can construct their own designs on top of it.
In addition to the “alpha”, “beta” and “stable” levels, it may be worth floating “trial” APIs, published silently and shown only to a small number of partners, to see whether an API is valuable prior to committing design resources to it, even for an alpha.
Be Unopinionated
It is hard to draw the line between “too opinionated” and “not opinionated enough” in an API. Too much of the former and the API will be brittle and cost far too much to maintain; too little and the API provides no market value.
APIs that have just the “right” amount of opinion allow new constructs to be implemented on top of them. For example:
The Kubernetes “Deployment” object embeds a copy of the “Pod” object inside itself
Amazon Web Services allows users to build complex, user facing services on top of their APIs
Browser APIs allow users to create rich, interactive experiences without needing to understand browser internals
A general rule is to be “just opinionated enough” to provide some value with your API design. The Kubernetes approach of composing its own primitives into larger ones, and Amazon’s model of building services on top of its own services, are both examples of good primitive APIs that can be aggregated into higher order primitives, both internally within those companies and externally by consumers.
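A toy version of that composition pattern (illustrative Go types only, not the real Kubernetes objects):

package main

import "fmt"

// PodSpec is the lower-order primitive: one schedulable workload.
type PodSpec struct {
	Image string
}

// DeploymentSpec is a higher-order primitive. It embeds the Pod
// primitive wholesale rather than redefining its fields, just as the
// Kubernetes Deployment object embeds a Pod template.
type DeploymentSpec struct {
	Replicas int
	Template PodSpec
}

func main() {
	d := DeploymentSpec{Replicas: 3, Template: PodSpec{Image: "nginx:1.25"}}
	fmt.Printf("%d replicas of %s\n", d.Replicas, d.Template.Image)
}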
Be clear about failure
All software will fail eventually. Whether that's as a result of:
Bad data sent in by the user
Temporary conditions such as overload on a server
Fatal server conditions rendering a service unavailable
Network issues
Our APIs will not always work the way we intend them to. However, it’s possible to considerably reduce the pain by being clear about how and why the service failed, as well as what the user might do to address it.
Within our language specific DSLs there are usually ways to express errors:
HTTP Status codes
gRPC error codes
User facing errors
At minimum we should correctly set the status codes on our responses, or provide in our client libraries mechanisms to handle a lack of response from the upstream system.
However, generally speaking, users debugging our systems will not have the knowledge we do to translate an error condition into a conjecture about its cause. To address this, we can send back rich error information when users encounter one of these error conditions.
This information should include:
A code to reference the error. For example, “v1alpha1.foofield.length”
A human readable description of the error. For example, “The field FooField is longer than 10 characters”
A tip to try to address the issue directly. For example, “Try capping the length of the content in FooField”
A URL where the user might understand more about this particular class of errors. For example: “see e.api.bioprofile.co/v1alpha1.foofield.length”
This is a high burden to put on our developers for all types of errors. However, the burden on those attempting to consume the API is several orders of magnitude higher, and supplying good error information can spare implementers significant hardship and mistrust.
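In gRPC, much of this structure already has a home in the status package and the standard errdetails messages. A sketch of building such an error (hypothetical Go; the field name and help URL echo the examples above):

package service

import (
	"google.golang.org/genproto/googleapis/rpc/errdetails"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// errFooFieldTooLong returns the rich error described above: a status
// code, a human readable message, a tip and a documentation link.
func errFooFieldTooLong() error {
	st := status.New(codes.InvalidArgument, "The field FooField is longer than 10 characters")
	detailed, err := st.WithDetails(
		&errdetails.BadRequest{
			FieldViolations: []*errdetails.BadRequest_FieldViolation{{
				Field:       "FooField",
				Description: "Try capping the length of the content in FooField",
			}},
		},
		&errdetails.Help{
			Links: []*errdetails.Help_Link{{
				Description: "More about this class of errors",
				Url:         "https://e.api.bioprofile.co/v1alpha1.foofield.length",
			}},
		},
	)
	if err != nil {
		return st.Err() // attaching details failed; fall back to the bare status
	}
	return detailed.Err()
}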
In Conclusion
Our world is indeed hopelessly, hilariously complex, and we as software engineers must find a way to model that complexity in the far stricter world of computing. By being careful and deliberate in our design of APIs we can considerably increase their longevity and improve the experience of the users who consume them. Looking at the story behind, and implementation of, other successful APIs gives us good guidance for building our own APIs in future.
Bibliography
[1] T. Thwaites, “How I built a toaster — from scratch.” https://www.ted.com/talks/thomas_thwaites_how_i_built_a_toaster_from_scratch , Nov-2010.
[2] A. J. Jacobs, “My journey to thank all the people responsible for my morning coffee.” https://www.ted.com/talks/aj_jacobs_my_journey_to_thank_all_the_people_responsible_for_my_morning_coffee , Jun-2018.
[3] M. Andreessen, “Why software is eating the world.” https://a16z.com/2011/08/20/why-software-is-eating-the-world/ , Aug-2011.
[4] “Meatspace.” https://www.merriam-webster.com/words-at-play/what-is-meatspace .
[5] I. Grigorik, “A brief history of HTTP.” https://hpbn.co/brief-history-of-http/ , 2013.
[6] “Gopher (protocol).” https://en.wikipedia.org/wiki/Gopher_(protocol) .
[7] C. Mims, “Inside the new industrial revolution.” https://www.wsj.com/articles/inside-the-new-industrial-revolution-1542040187 , Nov-2018.
[8] P. Deutsch, “The fallacies of distributed computing.” https://web.archive.org/web/20160909234753/https://blogs.oracle.com/jag/resource/Fallacies.html .
[9] “Message delivery reliability.” https://stackoverflow.com/questions/44204973/difference-between-exactly-once-and-at-least-once-guarantees .
[10] L. Torvalds, “Re: [Regression w/ patch] Media commit causes user space to misbehave (was: Re: Linux 3.8-rc1).” https://lkml.org/lkml/2012/12/23/75 , Dec-2012.
[11] “API Design Guide.” https://cloud.google.com/apis/design/ , Apr-2019.
[12] “Protocol Buffers.” https://developers.google.com/protocol-buffers/ .
[13] “PARC Universal Packet.” https://en.wikipedia.org/wiki/PARC_Universal_Packet .
[14] NetworkT, “Which networking model came first: TCP/IP or OSI?” https://www.networkhunt.com/networking-model-came-first-tcp-ip-osi/ , Dec-2017.
[15] “Principles.” https://grpc.io/blog/principles/ .
[16] “Representational state transfer.” https://en.wikipedia.org/wiki/Representational_state_transfer .
[17] E. S. Raymond, “Release Early, Release Often.” http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/ar01s04.html , 2000.
[18] “Eating your own dog food.” https://en.wikipedia.org/wiki/Eating_your_own_dog_food .