top of page
  • Petri Silen

Billion dollar mistakes #1 and #2 in software engineering

Updated: Sep 26, 2019

In my opinion, there are two billion dollar mistakes in software engineering. The first one you may have heard of, but how about the second one.

1. Null Reference

Null references are not bad by itself. It is how they are incorporated into the type system. Or actually the bad thing is when they are not incorporated into the type system.

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. -- Tony Hoare

There it says "because it was so easy to implement". We all can sympathize with him. There is always a temptation to create a feature if it seems to easy to implement. But we should think twice and see all the impacts that implementing an easy feature entails. Null references are easy to implement when references are pointers to a memory location. If the pointer is null i.e. it is zero, then it cannot be a valid reference, because no object can be stored at memory location zero.

But now everybody needs to check if a function can return a reference to an actual object or null reference. Similarly inside a function, you might want to receive a reference to an object, but the caller of the function might send you a null reference what you are not expecting.

The solution to this problem is and would have been to introduce optional types. Optional type is a composition of two types, the actual type of value and the type of not having a value at all. Nowadays in many modern languages, optional types exists. And we should use them. But the problem is still there in older languages and their legacy APIs. In modern languages, like typed JavaScript (Flow) or TypeScript when can express optional types very easily:

Some languages even offer a nice short-hand for optional types using a '?' character, for example '?Customer' means an optional Customer type.

2. Index numbering

Where do you think that off-by-one errors come? They come from numbering indexes to an array. I don't know who invented array index numbering, but I think he/she made another billion dollar mistake.

It is very unnatural that indexing start from zero instead of one. I know where this comes from! It is a similar case to null reference invention. Because it was easy to implement! Array variables are typically pointers to the beginning of array. And when you use index zero, it will point to the beginning of the array. So this is easy for the compiler but hard for the compiler user, the developer.

Most of software developers, easily write correct code when they need to get the first element of the array. They write, for example, customer[0]. Almost nobody writes accidentally customers[1], when they want to get the first customer.

How about getting the last customer? Most of us, software developers, easily write correct code: customers[customers.length - 1].

But problems come with more complex situations which are typical in any software. It is a rare case where you only access the first or last element. In more complex cases, your software development "flow" is interrupted by the fact that you have to stop writing the code and start to think how to solve this case. This is the moment where a bug may enter the software.

Let's take a simple example where we need to get next the customer in the customers list given that we have index of the current customer and if there are no more customers we get the last customer:

No we have to think that what is the last index, but we also have to think if currentIndex less or equal to what. So there is another possibility to write the code:

When your coding flow is interrupted and you try to solve this problem as quickly as possible to get back to flow, a bug can enter the software. Let's assume that you have thought both of above solutions, but accidentally write this:

And there you have a bug in the code! And this is a simple case, think of anything more complicated than this. The chance to introduce a bug is then much more likely.

Let's think about the above two solutions from unit testing point of view. You should always choose the simplest solution, i.e. the one where we are checking that currentIndex is less than "customer.length - 1". This is because now you need to test only two cases, but if you had use less or equal, you would have to test three different cases in order to find the possible bug. This would be quite hard, because testing the two cases would result in 100% test coverage and you could easily miss testing the "equal" case, because why would you test it, you already have 100% coverage?!

If we had indexing starting from one, we would be able to write this code easier and even in a fashion that does not interrupt the flow and thus does not introduce a possibility for an error. Let's also assume that in addition to length property we also introduce a lastIndex property that returns same value as length in case the numbering starts from 1.

Now the code is easy to read: if current index is less than last index, we can increment it by one, otherwise we can't.

If we think current programming languages, for example, JavaScript simply just adding a lastIndex property, that returns 'length - 1' to an array would help in many cases and make code easier to read.

What does this slice function call return? Well, we have here other problems than index numbering. The name of the function does not describe well enough what it does. We could guess that the first parameter is perhaps an index, but what about the second parameters? It could be an index or count. Let's assume that it is also an index. Now we have two index parameters for a function. Can we assume that function call returns indexes 3-5 ['camel', 'duck', 'elephant']? No, we cannot, because the last index is NOT inclusive index. It doesn't sound very logical that first parameter is inclusive index, but second parameter is not inclusive index. The function call returns ['camel', 'duck'].

Let's correct the naming of the function and use one-based indexing:

Now it is easy to read for everyone that we get the third and fourth element of the array: ['camel', 'duck']. You don't have to guess if one of the parameters is a count or index and you can safely assume that indexes are inclusive. We could even introduce a shorthand version of the function like this:

What does this Excel function return? Let's make it easier for you: Excel uses one-based indexing.

If we think that those numbers could index, where end index could be non-inclusive or the last value probably could a count also. There are following possibilities: 'p', 'pp' or 'ppl'. Correct answer is the last one, it extracts 3 characters starting from index 2. So the first parameter is index and second is count. It is hard to name such function correctly without making it very long: EXTRACT_COUNT_CHARS_STARTING_FROM_INDEX(..). I would rather redesign this API fully: EXTRACT_BETWEEN_INDEXES("apple", 2, 5). Now it is using only indexes, and if you want to think extracting number of characters, you can write it like this: EXTRACT_BETWEEN_INDEXES("apple", 2, 2 + 3). You can then use a calculator to calculate 2 + 3 = 5 and you end up with the same.

The off-by-one errors are result of bad index numbering (zero-based) and sometimes bad API design.

71 views0 comments
bottom of page