API thinking vs. client thinking

Have an API? No? So obscure. Everybody has one these days; APIs were the foundation of online success in the last decade. But building a good API is hard. In fact, the mindset it requires is peculiar enough that it’s worth considering separating the people who build it from those who use it.

APIs, along with AJAX, were all the rage that kicked off Web 2.0. By allowing others to tap into the features and data of your application, you could spark a whole community of clients and mash-ups, making you the platform. Twitter is a well-known child of this era: the API was built first, and then Twitter’s own clients, as well as all the independent ones, were built on top of it.

APIs, APIs everywhere

This obviously takes control of the application’s future away from its creators and puts it into the hands of a broader community. Take Twitter again: features such as retweets were only added to the platform once they became widely used in independent clients. At some point Twitter decided to reclaim control of its brand and user experience, which had started to diverge between applications. Certain requirements were imposed on how tweets may be displayed and what functions should be available. Break those and you may be kicked off the API completely.

For system architects, APIs are the panacea for a multi-device world. With the variety of client applications being demanded – web, native, embedded, large-screen, tiny-screen, etc. – we want to keep complexity low by reusing as much code as possible. A properly written API can be shared between all clients and even allows for gracefully dropping support for a legacy generation, like a browser that’s becoming obsolete.

Trello makes excellent use of this graceful degradation pattern:

[T]he website is just a face that chats with the Trello API and that the iOS and Android apps are also just faces and you can make your own face.
(…)
[T]here’s a special face out there for people using Internet Explorer 9.

There’s the API shared by all official and unofficial clients, each one called a “face”, and there’s a special, older version of the web face that’s left to support the remaining users of Internet Explorer 9. Brilliant.

  • Yes! I want to build an API. How do I go about it?
  • With foresight and planning.

APIs are a special case of Separation of Concerns and here’s where I’m starting to think that APIs and clients should be built by different people:

  • clients are focused on their immediate needs: “I’m building feature X and need data A, B and C formatted this way”;
  • APIs cater to many clients and their different, often incompatible, needs.

If the same person writes the client and the API, or even if they’re separate people on one tightly knit team, they are much more likely to resolve the conflict by leaning towards the immediate need of the client, away from the broader needs of the ecosystem. Every subsequent client that comes along with its needs will receive its own special endpoints. Soon you’ll have an explosion of similar, oddly named methods for very specific use cases, with little reusability, where a simple change may require modifications to hundreds of lines of code. In other words, you’ll have built a monolith where “API” is merely a different name for the application’s model layer; and since that model is separate from the rest of the application, the complexity is even worse.

Then, once it’s in production, you’re dead in the water, because:

Public APIs are forever.

Joshua Bloch, How to Design a Good API

Anyone can use a public API and you’ll have to maintain backwards compatibility for a long, long time.

However, if you task different people with building the APIs and the clients, you’ll get a lot of conversations, often conflicts, and those are essential for getting the best result for the broadest range of use cases.

Make sure the API team consists of people with as wide a perspective as possible: people who think well beyond the immediate requests they receive, weigh those against all the similar requests of the past, and look forward into the future. What else could be required from this method later? What else might someone want to extract from this particular data set? Will it need filtering, sorting, paging?
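
To make that concrete, here’s a sketch of what such a broad method might look like, as opposed to one endpoint per client request. Every name here is hypothetical:

    import java.util.List;
    import java.util.Map;

    interface OrderApi {
        // One general-purpose query with filtering, sorting and paging
        // built in, instead of getOrdersForDashboard(),
        // getOrdersForMobileHome() and friends.
        List<Order> findOrders(Map<String, String> filters,
                               String sortBy,
                               int page,
                               int pageSize);
    }

    record Order(long id, String status) {}

A method like this displeases every individual client a little, but it serves all of them, including the ones that don’t exist yet.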

Building a good API means following guidelines that are not the ones usually proposed for client design:

  • violate YAGNI – think about what might be useful in the future, but leave out the things that are easy to add later, because removing anything is much harder;
  • give each method a broader feature set than usual, weighing the possible performance penalties against power;
  • displease everyone equally – clients will often need to curb their requirements to allow for broader reusability;
  • document extensively – your documentation becomes the guide to each method’s contract: what it expects and what it returns. Without it, you’ll be swamped by questions and complaints.

Joshua Bloch, creator of, among others, the Java Collections API, shares a number of excellent recommendations for building APIs in a Tech Talk he gave at Google years ago. It’s well worth the hour.

If the API is done right, it’s an investment that pays back the effort many times over: the multitude of clients that can use it, the flexibility to rapidly build features that weren’t previously thought of. For any regular software company it’s possibly the most complex task it will ever handle, so put your best, brightest people on it. And make sure they spark conflicts with all the developers building clients, because that means they’re having real conversations about how to build the best solution for everyone.

Coding is cheating

Programming is perhaps the only job where lying, cheating and deceiving will get you not only paid, but also praised for being innovative and creative. That’s because computers are severely limited. We’re literally fitting square pegs (the real world) into round holes (0’s and 1’s).

Assuming you already know that everything in the computer world is represented in binary – combinations of 0’s and 1’s – consider the simplest of examples, trying to store the value 0.2:

0.2 decimal = 0.00110011... binary

There is no precise representation in binary for the decimal 0.2; instead it’s the pattern 0011 repeating forever after the mark. To make matters worse, a computer will only store a limited number of digits for a number – say, 32 bits for a single-precision fraction like the one above. And due to the specifics of floating-point storage, only 24 of those bits are significant digits, so the recurring pattern of 00110011... is kept only up to the 26th binary place after the mark. The rest is cut off and gone as if it never existed. So the number a computer will actually store is:

0.199999988079071044921875

Close enough to round it off to 0.2, but still, what a cheat!
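
You can watch this cheat in action in a few lines. The exact digits differ between languages and precisions, but the flavor is the same; here in Java:

    // Ten "0.1"s don't quite add up to 1.0, because 0.1, like 0.2,
    // has no exact binary representation.
    public class FloatCheat {
        public static void main(String[] args) {
            double sum = 0.0;
            for (int i = 0; i < 10; i++) {
                sum += 0.1;
            }
            System.out.println(sum);        // 0.9999999999999999
            System.out.println(sum == 1.0); // false
        }
    }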

We continue to work around constraints, this time of memory. All computer memory is limited and we shouldn’t waste it unnecessarily. So when we want to refer to a value from several places in a program, we often won’t duplicate the actual value, but use a pointer instead:

value = "banana"
valuePointer = &value

The exact code will differ depending on the language used, but essentially it says:

  1. save the value banana under the name value, then
  2. assign to the name valuePointer the memory address of value (it points to where the original value is stored).

Pointer illustration

In consequence we’re using much less memory, because the banana is stored only once; but as a side effect (sometimes a desired one), if we later change value = "kiwi", then valuePointer will also suddenly return kiwi.
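
Java, for instance, exposes the same idea through references rather than raw pointers, but the kiwi surprise is identical; a minimal sketch:

    // Copying 'value' into 'alias' copies only the reference (the pointer),
    // not the data it points to.
    public class Aliasing {
        public static void main(String[] args) {
            StringBuilder value = new StringBuilder("banana");
            StringBuilder alias = value;   // both names point at the same object
            value.replace(0, value.length(), "kiwi");
            System.out.println(alias);     // prints "kiwi" - the side effect
        }
    }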

Let’s look at something more tangible – a sphere.

Sphere

You know what a sphere looks like; you can recognize one when you see it. But a computer is inherently incapable of producing a real sphere (though that’ll change once ray tracing goes mainstream; thanks, Marek). For reasons that require a university course to explain, 3D spheres are drawn with… triangles.

Sphere

There are just so many of them, and they’re so tiny, that you’re fooled into seeing a smooth surface. In the first 3D games that surfaced in the ’90s you could actually see the edgy, faceted surfaces. Nowadays computers have enough horsepower to draw millions of triangles without breaking a sweat.
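
For the curious, the corners of those triangles typically come from plain trigonometry: walk a grid of latitude and longitude angles and project each pair onto the sphere. A rough sketch, where the stacks and slices resolution parameters are arbitrary:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch: generate the grid points of a "UV sphere".
    // Neighbouring grid points are later stitched together into triangles.
    class SphereMesh {
        static List<float[]> gridPoints(float r, int stacks, int slices) {
            List<float[]> pts = new ArrayList<>();
            for (int i = 0; i <= stacks; i++) {
                double phi = Math.PI * i / stacks;           // latitude, 0..pi
                for (int j = 0; j <= slices; j++) {
                    double theta = 2 * Math.PI * j / slices; // longitude, 0..2pi
                    pts.add(new float[] {
                        (float) (r * Math.sin(phi) * Math.cos(theta)),
                        (float) (r * Math.cos(phi)),
                        (float) (r * Math.sin(phi) * Math.sin(theta)),
                    });
                }
            }
            return pts;
        }
    }

The more stacks and slices, the smaller the triangles and the smoother the illusion.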

Another funny concept is lazy loading. We usually store data in some kind of database, which makes it expensive to retrieve: there’s the time needed for a network call, the database engine reading files from disk, etc. It all adds up, so we want to make as few database calls as possible, lest users constantly stare at a screen saying “loading”.

Let’s say you want to open up a contract. We’ll represent it in code as an object that includes all the relevant information – ID, date of signing, ship-to and bill-to companies, etc. We’ll also tell you it has line items, which we may conveniently program as:

getLineItems()

where calling the above function will return the list of line items. However… we don’t really have those line items ready to display, because we purposefully didn’t ask the database for them just yet. You might not need them at all – just want to check some basic details of the contract. So only the moment you ask for line items explicitly, and getLineItems() is called, do we make the query to the database (and let you wait for it), then return and display the list.

Finally, some problems in computing are very hard and extremely expensive to calculate at scale, even for the beastly modern machines we have available. If you’re using any sort of map application, you’re looking at one such problem: calculating the best route between two points.

In order to perfectly calculate the best route – be that the shortest or the quickest one, whatever the criterion – the computer would have to calculate the distances and routes between every single point in the database. The number of calculations to perform would be on the order of the square of the number of points. Warsaw alone has thousands of addresses. Think how many points there would be on the route between, say, Warsaw and Berlin.

The trick we use in these hard cases is heuristics, which boils down to using whatever extra information we have and allowing for suboptimal results, provided the ones we deliver are good enough. For finding the best route on a map, we already know the locations (latitude and longitude) of all the points. We can use that information to limit the area in which we’ll calculate the routes, often to a shape resembling an ellipse:

Shortest path calculation area

We won’t consider points and roads outside of this area at all. That’s why, when trying to cross Warsaw from north (say, Marymont) to south (Ursynów), the GPS might offer you a straight line through the city center, while a quicker and more convenient route may lead along the city bypass. But the calculation is much faster.
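
The ellipse isn’t an accident: the set of points for which the detour through them is at most some fixed factor longer than the straight line is, by definition, an ellipse with the start and end points as its foci. A tiny sketch of such a filter, where distance() is a hypothetical straight-line distance helper:

    // Keep a point only if routing through it isn't much longer than
    // the straight line from start to end. All names are illustrative.
    static boolean worthConsidering(Point p, Point start, Point end, double slack) {
        double direct = distance(start, end);
        double detour = distance(start, p) + distance(p, end);
        return detour <= direct * (1 + slack); // e.g. slack = 0.3
    }

Everything that fails this test is simply never fed into the route calculation, which is where the speedup comes from.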

It’s all cheating. Bending and stretching the material we work with – computers – in order to deliver bigger, better and more vibrant experiences to users. We’re not sorry, not at all. It’s like solving elaborate puzzles every single day, while getting paid and praised for it. The joy of programming.

Just enough logging

Debugging is a lot like police forensics. You’re chasing the villain (the bug) by analyzing eyewitness accounts (users’ reports), inspecting the crime scene (the source code), and combing through what is often the most helpful resource: the CCTV recordings (application logs), provided their quality allows it.

I got upset lately, looking for a needle in a stack of log spam where everything was logged and only mildly structured, making it a nightmare to track the application flow. Clearly, just dumping data into a text file doesn’t make debugging easier.

The two most common faults with logging are:

  1. logging at the wrong level, e.g. everything is INFO, forcing someone to dig through tens of thousands of lines of text in megabyte- to gigabyte-sized files,
  2. collecting the wrong logs per environment, e.g. logging DEBUG in production, which slows down stressed systems and breaks them with “out of disk space” and similar errors.

The bigger your organization, the more important it becomes to get logging right. You can’t ignore the time wasted searching through overblown logs, nor can you simply throw more hardware at the problem. The BBC’s preparations for the London 2012 Olympics, for instance, revealed incorrect logging as one of the top performance killers:

  • Monitoring Thresholds
  • Verbose logging, everywhere
  • Timeouts
  • No data
  • Volumetrics
  • Unfair load balancing

Andrew Brockhurst, The BBC’s experience of the London 2012 Olympics

Most organizations’ findings would likely be identical. Clearly, we can do better.

Log the right things, right

With five standard levels of logging, it’s not always easy to choose the right one. There are good rules of thumb, though, and one of the best write-ups I’ve seen comes from StackOverflow:

  • error: the system is in distress, customers are probably being affected (or will soon be) and the fix probably requires human intervention. The “2AM rule” applies here – if you’re on call, do you want to be woken up at 2AM if this condition happens? If yes, then log it as “error”.

  • warn: an unexpected technical or business event happened, customers may be affected, but probably no immediate human intervention is required. On call people won’t be called immediately, but support personnel will want to review these issues asap to understand what the impact is. Basically any issue that needs to be tracked but may not require immediate intervention.

  • info: things we want to see at high volume in case we need to forensically analyze an issue. System lifecycle events (system start, stop) go here. “Session” lifecycle events (login, logout, etc.) go here. Significant boundary events should be considered as well (e.g. database calls, remote API calls). Typical business exceptions can go here (e.g. login failed due to bad credentials). Any other event you think you’ll need to see in production at high volume goes here.

  • debug: just about everything that doesn’t make the “info” cut… any message that is helpful in tracking the flow through the system and isolating issues, especially during the development and QA phases. We use “debug” level logs for entry/exit of most non-trivial methods and marking interesting events and decision points inside methods.

  • trace: we don’t use this often, but this would be for extremely detailed and potentially high volume logs that you don’t typically want enabled even during normal development. Examples include dumping a full object hierarchy, logging some state during every iteration of a large loop, etc.

ecodan, Logging levels – Logback – rule-of-thumb to assign log levels

While you’re still coding:

  • make sure you don’t flatten stack traces – let them spill into logs in their usual, multi-line form, which then creates a visual pattern that’s easy to scan for,
  • avoid concatenating log messages – the (computationally expensive) concatenation would execute even when the given level is not logged. Your logging library will often help you out; for instance, SLF4J lets you replace:

    log.debug("Loaded User: " + user);

    with

    log.debug("Loaded User: {}", user);

    and will compose the message only when the given level is actually logged (one caveat follows right after this list).
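
One caveat: even the parameterized form evaluates its arguments eagerly, so if building an argument is itself expensive – say, serializing a big object graph – guard the call explicitly. A sketch, where expensiveDump() is a hypothetical stand-in for any costly computation:

    // The level check skips the costly dump entirely when DEBUG is off.
    if (log.isDebugEnabled()) {
        log.debug("Full state: {}", expensiveDump());
    }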

Collect the right logs

Different application environments have their own logging needs. The general rules to follow are:

  1. the more traffic, the less logging, but longer persistence,
  2. the more active development, the more logging, but shorter persistence.

This translates into the following setup:

  • development/sandbox: DEBUG or even TRACE, with logs deleted within 24-48 hours.
  • testing: DEBUG most of the time, to efficiently follow up on bug reports, with logs deleted after 2-3 days.
  • staging: INFO, to match production closely, with an occasional fallback to DEBUG when you need to trace problems; logs last for as long as the staging phase takes.
  • production: INFO, or even WARN for a very high-traffic system, with logs stored for at least a week or longer.

The environment should also configure the correct appender:

  • use text files wherever possible. They’re open, easy to parse and can be written to disk without relying on intermediaries, like queues or database connections, to behave correctly (logs stored in a database won’t help much if it’s the database that’s failing);
  • log asynchronously. You really don’t want the application to wait until each and every log message is written. A sketch of both points follows below.
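
Assuming Logback is the backend, that setup could look something like this (the file name and pattern are just examples):

    <!-- A plain text file appender, wrapped in an async one. -->
    <configuration>
      <appender name="FILE" class="ch.qos.logback.core.FileAppender">
        <file>app.log</file>
        <encoder>
          <pattern>%d{ISO8601} [%thread] %-5level %logger - %msg%n</pattern>
        </encoder>
      </appender>

      <appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
        <appender-ref ref="FILE" />
      </appender>

      <root level="INFO">
        <appender-ref ref="ASYNC" />
      </root>
    </configuration>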

Bonus: Decide explicitly, or be decided for

Make sure you have explicit logging configuration everywhere, even if some module doesn’t use logging at all. Otherwise a newly pulled-in dependency might come with its own setup, which your logging library or other dependencies will happily accept. Just last week we spent a few hours tracking down why our integration test suite had suddenly started blasting out DEBUG-level logs from all libraries, slowing the run to a crawl. The culprit was a new mocking library we had added to the Maven dependency list, which brought its own logging configuration and overwrote ours.

 

Thanks to Tim Barnett for motivating me to write this post.