Building fail-safe software systems
Software systems are everywhere. And as their prevalence grows across every part of our lives, it’s essential that these systems are both trustworthy and fail-safe. We see this in specific industries, of course, like transportation and health care, where software failure can have deadly implications. But it also increasingly applies to other facets of modern life, where software’s dependability, accessibility and reliability are crucial.
With mechanical systems, we often have some warning before they fail – which can enable us to take action to minimize or prevent the impact. In the virtual space of software, it’s a different story.
When software fails, it often fails spectacularly, with significant impacts. AI and machine learning add a further complication: we often cannot even pinpoint the cause of a failure.
Building resilient systems
Malicious attacks and security breaches pose additional threats to software dependability and resilience. In December 2021, for example, the Canada Revenue Agency went offline because its administrators were worried about a potential security threat that was affecting organizations – including hospitals – around the world. The threat was a flaw in a common open-source logging tool used in cloud servers across industry and government. The flaw enabled hackers to access data, embed malware and engage in other nefarious activities.
As the use of software systems increases, so does the demand for software engineers who can design and implement resilient systems that will continue operating in the face of a failure or security breach.
The MEL in Dependable Software Systems is a graduate degree for software professionals with several years of work experience who want to upgrade their skills in this critical area.
Three of the required courses enable students to acquire new skills in the field, including software testing, dependable systems design and software security. When I taught the resilient systems class, students had to build a system and then simulate a variety of failures. They competed with each other to see which system would be the last one standing. Similar applied learning projects are required in the other software engineering courses.
Students in the program also complete a capstone project. In 2021, a team of two Dependable Software Systems students developed a Gradle plugin and a GitHub action for fuzz testing, which finds defects by running a program against large numbers of automatically generated inputs and is emerging as a popular approach to testing software. Fuzzing tools already exist for a variety of languages, including Java, and the students’ tools simplify the use of fuzzing in software development. The Gradle plugin is now available for everyone to use from the Gradle Plugins website, making the students’ work available to the entire software engineering community.
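To make the idea of fuzz testing concrete, here is a minimal sketch in Python. It is not the students' Gradle plugin or GitHub action, and it is far simpler than a real fuzzing tool. The function under test, parse_length_prefixed, and the fuzz helper are hypothetical names invented for this illustration, and the bug is planted on purpose.

```python
import random


def parse_length_prefixed(data: bytes) -> bytes:
    """Hypothetical function under test: the first byte gives the payload length."""
    if not data:
        raise ValueError("empty input")  # documented rejection of bad input
    length = data[0]
    payload = data[1:1 + length]
    # Planted bug (for illustration only): assumes the payload is complete.
    # Raises IndexError when the input is truncated or the length is zero.
    _last_byte = payload[length - 1]
    return payload


def fuzz(target, runs=1000, max_len=8, seed=0):
    """Call target with random byte strings and collect unexpected exceptions."""
    rng = random.Random(seed)  # fixed seed so a failing run can be reproduced
    crashes = []
    for _ in range(runs):
        size = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            continue  # expected rejection, not a defect
        except Exception as exc:  # anything else is treated as a crash
            crashes.append((data, exc))
    return crashes


if __name__ == "__main__":
    found = fuzz(parse_length_prefixed)
    print(f"{len(found)} crashing inputs found")
    if found:
        data, exc = found[0]
        print(f"example: {data!r} raised {type(exc).__name__}")
```

Production fuzzers typically go further than this sketch, for example by mutating known inputs and using code coverage to guide the search, but the loop of generating an input, running the target and recording unexpected failures is the same.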
Another team tackled the issue of how to cope with errors in training data used for machine learning. Most machine learning approaches today need a set of input and output pairs that correctly describe the behaviour of the system that we are trying to “learn.” Two students studied ensemble methods as a mechanism for coping with erroneous training data, assuming that some inputs had been paired with incorrect outputs during the training phase. Their study found that one could use a collection of machine learning models, combined through approaches such as voting, to reduce the impact of errors in training data. This is a promising result because it is expensive to obtain error-free training data sets.
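The voting idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the students' method or their results: the one-dimensional task, the 20 per cent label-flip rate, the simple threshold models and every function name below are invented for the example.

```python
import random
from collections import Counter


def true_label(x):
    """Assumed ground truth for this toy task."""
    return 1 if x > 0.5 else 0


def make_noisy_data(n, flip_rate, rng):
    """Generate (input, output) pairs in which some outputs are wrong."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = true_label(x)
        if rng.random() < flip_rate:
            y = 1 - y  # simulate an incorrect output linked to this input
        data.append((x, y))
    return data


def train_stump(data):
    """Choose the threshold t with the fewest training errors for 'x > t means 1'."""
    best_t, best_err = 0.0, len(data) + 1
    for t, _ in data:
        err = sum(1 for x, y in data if (1 if x > t else 0) != y)
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def majority_vote(votes):
    """Return the most common vote (use an odd number of voters to avoid ties)."""
    return Counter(votes).most_common(1)[0][0]


def accuracy(predict, points):
    """Fraction of points where the prediction matches the clean ground truth."""
    return sum(predict(x) == true_label(x) for x in points) / len(points)


def run_demo(seed=42, members=9):
    rng = random.Random(seed)
    noisy = make_noisy_data(300, 0.2, rng)
    # Each ensemble member is trained on its own bootstrap sample of the noisy data.
    thresholds = [train_stump(rng.choices(noisy, k=len(noisy))) for _ in range(members)]
    points = [i / 1000 for i in range(1000)]
    single = accuracy(lambda x: 1 if x > thresholds[0] else 0, points)
    ensemble = accuracy(
        lambda x: majority_vote([1 if x > t else 0 for t in thresholds]), points
    )
    return single, ensemble


if __name__ == "__main__":
    single_acc, ensemble_acc = run_demo()
    print(f"single model: {single_acc:.3f}, voting ensemble: {ensemble_acc:.3f}")
```

Because each model sees a different sample of the noisy data, their individual mistakes tend to differ, and the vote smooths them out. On any particular run the ensemble is not guaranteed to beat every individual member.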
The need for professionals with the ability to assess and design safety-critical and fail-safe systems will only continue to grow.
Our program meets that need, and the business courses – which make up about half the curriculum – enable students to deepen their skills in communication, business and leadership, positioning them to successfully transition into management and leadership positions.
Dependable Software Systems
Gain the technical, business and project management skills to design and maintain reliable software systems.