
6 Common Types of Bias in Software Engineering

By Eric Koyanagi

Studies show that we make choices based on emotion, even those of us who try to be Vulcan-like in our logic. Understanding common forms of bias is obviously relevant for data analysis, but it applies to software engineering, too.

Selection Bias

I once helped set up a fairly robust and elegant data pipeline that used a pixel-based technique to push data into a CDP -- a platform specialized in centralizing and analyzing large amounts of customer data. The powers that be wanted to build some models to help identify attributes shared by product purchasers. With Amazon-level scale, maybe this would have worked a bit better, but the problem was that our samples were not representative of the wider population. It was too narrow a slice of people to draw any real conclusion from.

Selection bias is when you try to make generalizations based on an invalid sample. It might be too small or too specific (not random enough).

In software design, selection bias is a common issue in machine learning, for example training a model on too specific a demographic, then trying to apply that model to the entire population. It also applies to the way technical leaders pick software stacks and enterprise products. Too narrow a sample (for example, usage of a product at FAANG companies) is often used as the basis for comparison. Just because something works well at Netflix doesn't mean it will work well for you or that it's the best choice for your priorities.
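To make that concrete, here's a minimal sketch of how calibrating on a narrow slice misleads you about the whole population. The data is simulated and the "conversion falls with age" relationship is invented purely for illustration:

    # A minimal sketch of selection bias. The data is simulated and the
    # "conversion falls with age" relationship is invented for illustration.
    import numpy as np

    rng = np.random.default_rng(42)

    # Simulated population: conversion probability declines with age.
    ages = rng.integers(18, 70, size=100_000)
    conversion_prob = np.clip(0.6 - 0.008 * (ages - 18), 0.05, 0.6)
    converted = rng.random(ages.size) < conversion_prob

    # Biased sample: only the users our pixel happened to reach (say, under 30).
    biased_rate = converted[ages < 30].mean()

    # True population rate, which we never see if we only keep the narrow slice.
    true_rate = converted.mean()

    print(f"Conversion rate in the biased sample: {biased_rate:.1%}")
    print(f"Conversion rate in the population:    {true_rate:.1%}")
    # Any model calibrated on the narrow slice will overestimate conversions
    # for the population as a whole.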

Historical Bias

Historical bias happens when the data you use no longer reflects current reality. For example, Covid was a time of huge change for some e-commerce sites: they boomed while the rest of the world had no choice but to shutter. I worked for one such company, and its "projections" suffered from historical bias because they were based on an outlier year that no longer reflected reality. Of course every monthly profit goal was a "miss" in that context.

This happens easily in software as legacy code crystallizes and engineers cement old habits. For example, an engineer might choose a cron job to schedule a recurring task because at one time that was the best (or only) way to solve the problem. That might work, but perhaps there's a more robust, modern job system in their ecosystem with better visibility and fewer risks. Or perhaps you're judging a technology based on an old version (either too favorably or too harshly) instead of how it exists today.
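As a sketch of that difference, here's the same nightly task scheduled as a bare crontab entry and then through a job framework. Celery is used purely as one illustration; the broker URL, task name, and retry settings are made up:

    # Hypothetical example: the same nightly task scheduled two ways.
    # Celery, the broker URL, and the task name are all illustrative.
    #
    # The old habit: a crontab entry with no retries and no visibility;
    # failures vanish unless someone goes looking for them.
    #
    #   0 2 * * * /usr/local/bin/python /opt/app/sync_orders.py
    #
    # One modern alternative in the Python ecosystem, sketched with Celery:
    from celery import Celery
    from celery.schedules import crontab

    app = Celery("app", broker="redis://localhost:6379/0")

    app.conf.beat_schedule = {
        "sync-orders-nightly": {
            "task": "tasks.sync_orders",
            "schedule": crontab(hour=2, minute=0),
        },
    }

    @app.task(name="tasks.sync_orders", bind=True, max_retries=3, default_retry_delay=60)
    def sync_orders(self):
        try:
            ...  # pull orders and push them downstream
        except Exception as exc:
            # Failed runs are retried and show up in monitoring, instead of
            # dying silently the way an unattended cron job can.
            raise self.retry(exc=exc)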

Confirmation Bias

Confirmation bias is wildly common in every corner of human life: it's the tendency to believe information that confirms what you already think. If, for example, you believe PHP is an inferior language, you're far less likely to consider something like Laravel as an option, even when it might be a perfect fit for your use case.

Anyone who's used social media or YouTube has experienced some form of confirmation bias, as their algorithms are designed to cater to your inherent biases and deliver content that affirms your values and beliefs.

Confirmation bias can affect our software in numerous ways beyond the obvious ones in machine learning. For example, if you're diagnosing a throughput issue and you believe the problem lies in the database layer, you'll likely look specifically for evidence of that in the logs, perhaps not giving other diagnostics the attention they deserve.

Availability Bias

Everyone's talking about AI, so a firm decides to integrate AI into a customer chatbot to save money and improve efficiency for customers. It backfires spectacularly, and everyone wonders why they did it when established tech worked much better.

We're living through generational AI hype, and we've yet to see exactly where it will go, but it definitely leads to availability bias. This is where there's simply a lot more data on a given topic or idea, and that shifts our perception of how popular or useful the idea really is. It also has to do with how readily available information is.

If you see an article with a headline like "Shopify is dying," it might lead you to conclude that Shopify isn't a healthy platform simply because you didn't investigate further. We don't always weigh information equally; sometimes we make snap judgements based on how easily available it is.

Outlier Bias

Performance is a big deal when it comes to e-commerce conversions. Imagine the common scenario where New Relic is measuring real-world page speeds and tells you that pages load in under a second for 90% of all users. Great! That might make you believe the site is performant, but then conversions aren't as good as they should be when you push campaigns, for example in rural areas where Internet isn't as reliable.

The point is that it's very easy to bury issues in averages and pretend that outliers aren't important when they sometimes are. If 90% of your customers are happy, do you conclude you're doing a good job, or do you want to know more about the 10% who aren't satisfied? Of course you should want to learn more, because even though that 10% is the outlier, they hold far more useful information to help you improve than the majority (and are the customers most likely to churn).
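Here's a minimal sketch of how that plays out with page-load times. The latencies are simulated with a log-normal distribution, not pulled from any real monitoring data:

    # A minimal sketch of how averages and "90% of users" can bury the tail.
    # The latencies are simulated (log-normal), not real monitoring data.
    import numpy as np

    rng = np.random.default_rng(7)

    # Most page loads are fast, but a long tail (e.g. slow rural connections)
    # drags out the worst experiences.
    latencies = rng.lognormal(mean=-1.2, sigma=0.9, size=100_000)  # seconds

    print(f"mean: {latencies.mean():.2f}s")
    print(f"p50:  {np.percentile(latencies, 50):.2f}s")
    print(f"p90:  {np.percentile(latencies, 90):.2f}s")
    print(f"p99:  {np.percentile(latencies, 99):.2f}s")

    # The mean and p90 look healthy; the p99 is the experience of exactly
    # the customers who are least likely to convert.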

This bias easily leaks into software engineering, which is exactly why we make a point of exploring "edge cases". It's natural to give in to outlier bias as you write code, catering to the most natural or obvious use cases and thinking it "will work for almost everyone, so it's good enough". Maybe that's true, but do you think your boss would be happy letting even a few percent of customers fail to convert just because you missed some edge cases? Doubtful!

This also applies to discussions around technical stacks and enterprise products. "Well, if 90% of major players use x, y, or z, that is the best choice for us!"

That might seem logical, but it ignores outliers that might be super relevant by burying them in averages.

Survivorship Bias

Ah, my favorite form of bias, if I'm allowed to have a favorite. If you've ever seen a click-bait article with a title like "10 habits of highly successful people", it's preying on survivorship bias. This bias is when we focus only on the "winners". For example, the idea that building a unicorn-level startup is simply a matter of hard work and smarts ignores the fact that ninety percent of startups fail. Focusing on the few college dropouts who went on to found wildly successful companies ignores the many more who followed a similar path and didn't succeed.

This form of bias is common when discussing software engineering methodologies. If companies x, y, and z run their tech smoothly with strict adherence to agile, well... maybe that's the best bet. That ignores the many other firms where strictly following every agile rule just doesn't work.

It's similar when thinking about frameworks or enterprise products. Of course we want to focus on the winners, the big names and most popular solutions...but that isn't always what you actually need.

It also applies when thinking about algorithms or techniques. You might believe your tech will succeed if you emulate the "winners", but that ignores the (many) others that failed doing the same thing; copying a technique or philosophy because winners use it will not always lead to success.

Why Bias is Important

No one is a Vulcan, living purely by some (highly flexible) notion of logic. We all have biases, and they affect us every single day. Yes, bias can be viewed in a technical way in how it affects our data or software, but it's important to remember that cognitive bias affects every facet of our lives.

Remembering this makes us better engineers and better human beings.


Written By
Eric Koyanagi

I've been a software engineer for over 15 years, working in both startups and established companies in a range of industries from manufacturing to adtech to e-commerce. Although I love making software, I also enjoy playing video games (especially with my husband) and writing articles.
