Testers live in a steady stream of bugs; it’s a natural part of software development. Bugs can cause developers and project managers to (figuratively) pull their hair out in frustration, while for testers they can be exciting, interesting, fulfilling, and frustrating too. When a project comes across our desk for testing it can feel like the beginning of an Easter egg hunt: bugs can hide in the most unexpected places, and discovering one can give a tester a ‘Eureka!’ moment. They range from minor problems that most people would never notice to severe errors that undermine the business and technical requirements of a project, and those can bring a feeling of dread and discouragement to a team, especially when schedules are tight. At times it helps to step back and recognise that even the biggest, best-established and best-resourced organisations in technology, such as Microsoft, NASA, IBM and Intel, have shipped monumentally bad bugs in their products. So let’s look at a couple of better-known software ‘Easter eggs’ from the past and present.
Apple goto fail;
This is perhaps one of the most widely reported bugs in recent times, in no small part due to its severity, but also because of Apple’s profile as a company and the pervasiveness of its products. A little over a month ago, Apple scrambled to release iOS 7.0.6, which patched a significant security flaw on any device running iOS 6 up to and including iOS 7.0.5. Think about that for a moment. iOS 6 was released on 19 September 2012 and iOS 7.0.6 on 21 February 2014. That’s 17 months that a serious security bug sat on your iPhone, iPad, and iPod before it was identified and fixed… by one of the top technology companies on this planet.
So what was the bug exactly? SSL/TLS (Secure Sockets Layer and Transport Layer Security) are cryptographic protocols that encrypt data sent over the Internet. The encryption prevents attacks such as man-in-the-middle (MITM), in which someone on a network you’re connected to intercepts, reads, or modifies data you communicate to another recipient. The bug caused a critical step of SSL/TLS signature verification to be skipped, so connections were reported as authenticated without actually being checked, effectively sending data over the network unsecured and allowing an attacker with a privileged network position to read or modify it.
The result of the bug is bad enough without even considering how it was introduced in the first place, so you will love this excellent example of how fickle software can be. The culprit of the bug was a single superfluous line of code, shown below on line 3.
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
    goto fail;
    goto fail;
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
    goto fail;
Failure to verify SSL authentication occurs because of the two consecutive goto fail; instructions on lines 2 and 3. The first is guarded by the if statement; the second executes unconditionally, so the program always jumps past the critical verification step on line 4 while err still holds a success value. Removing the unnecessary line 3 is all that was required to fix the bug.
If you’re not sure whether your device is vulnerable to this exploit, navigate its browser to https://gotofail.com. If the test fails you should update the device’s iOS version to 7.0.6 or later. However, if you own an iOS device older than an iPhone 3GS or iPod Touch (fourth generation) you will not be able to update any further than iOS 6.1.6, and should seriously consider buying a new device if you value the privacy and security of your data.
It’s just a matter of time
While security bugs like the one above from Apple are serious, they pale in comparison to the severity of others throughout history. Some have caused massive collateral damage, and others have even caused fatalities. But perhaps the granddaddy of all bugs in terms of monetary cost is the Y2K bug. Children and teenagers of the ’80s and ’90s (or should I say 1980s and 1990s?) will remember it well, but if you were living under a rock or too young to remember the late ’90s, the nature of the bug is worth explaining.
A common convention on computer systems was to abbreviate four-digit years to two digits, e.g. 1995 would be represented as 95 and 2000 as 00. This could confuse systems that assumed 00 meant 1900, or that in some cases even rendered the year 2000 as 19100. The errors that could have occurred in systems relying on accurate dates were hard to predict, so in the years leading up to the turn of the century the bug caused a stir in the media, which fanned the flames of a nervous public imagining nightmare scenarios of stock market crashes, planes falling from the sky, power outages and so on. The total cost of preparations, upgrades, and downtime to patch the Y2K bug has been estimated at anywhere between $300-400 billion. Once the 1st of January 2000 rolled around, the impact of Y2K was relatively low; however, it’s difficult to say how severe its fallout might have been without the years and months of preparation leading up to New Year’s Eve in 1999.
Thankfully we made it clear of such costly date-based bugs after the year 2000, right? Well… not just yet. Perhaps worse than Y2K is the 2038 problem in Unix systems. The 2038 problem is predicted to be potentially more dangerous because of Unix’s pervasive use in embedded systems; that is, systems with a dedicated function inside larger mechanical or electrical devices, such as anti-lock braking systems (ABS), electronic stability control, traction control, aircraft guidance systems and GPS receivers, communication systems (cell phone, Internet, telephone), and many more*.
Unix systems handle dates and times as the number of seconds since 00:00:00 UTC on Thursday, 1 January 1970. That count is stored as a signed 32-bit integer, i.e. a whole number represented in binary by 32 1s and 0s, one of which records the sign. Those 32 bits impose an upper limit on the biggest number they can represent: 2,147,483,647. The date and time 2,147,483,647 seconds after 1 January 1970 is exactly 03:14:07 UTC on Tuesday, 19 January 2038. One second later the counter runs out of usable binary bits and “wraps around” to its most negative value, which represents 20:45:52 UTC on 13 December 1901; that’s 2,147,483,648 seconds earlier than 1 January 1970. Similar to Y2K, computers using this date system will suddenly believe they’re operating 137 years in the past. What kinds of errors this could cause in the safety-critical software of embedded systems is difficult to predict, but what is easy to predict is a similar fear of catastrophic disaster scenarios as we approach the beginning of 2038.
But rest assured, there are a few solutions to the 2038 problem, one of which involves widening the Unix time value from 32 bits to 64 bits, giving a maximum of 9,223,372,036,854,775,807 (about 9.2 quintillion) seconds instead of a measly 2,147,483,647 (2.1 billion) seconds. However you might ask, “This will only push the same problem further down the track when we hit the upper limit of the 64-bit value, right?” and you would be correct. We will pass the new upper limit at 15:30:08 UTC on Sunday, 4 December 292,277,026,596… as mentioned on Wikipedia, “This is not anticipated to pose a problem, as this is considerably longer than the time it would take the Sun to theoretically expand to a red giant and swallow the Earth.”
In the meantime, I wonder what other software ‘Easter eggs’ are waiting to be found out there. Happy hunting.
* I don’t suggest you try this, in case it breaks your phone, but it’s interesting to note that since Android phones use an operating system based on Unix you can actually see this bug occur by setting the phone’s time and date just before 03:14:07 UTC 19 January 2038 and letting the time tick past it.