As I read the “we solve ransomware” emails in my inbox and saw comments on Twitter and Slack, I started to think about how to solve ransomware once and for all. It sounds like a difficult task, but I think it comes down to architecture: one that uses modern ideas and combines security with data protection. I have written about detecting ransomware before, but now we need a way to bring together everything we know so that institutions can prevent known attacks and recover quickly from new ones. This concept came to fruition at VeeamON 2017, and I briefly spoke about it on The Cube. Now it is time to put everything together.
This proposed architecture combines security capabilities with data protection capabilities to cover not only prevention but also detection of, and subsequent recovery from, unknown ransomware and even malware attacks. The goal is to keep the business running with little to no downtime. The key components of this architecture are security, data protection, and archival storage. However, it is data protection with a twist. Figure 1 offers a brief overview of this architecture. It is the same architecture I spoke about on The Cube.
Now let us break down the necessary components:
- Prevention is where existing and future security tools come into play. These tools prevent attacks either by quarantining web and email attachments or by exploding those attachments within a cyber-security range before delivery. In some cases, the attack does not arrive through email or web pages at all but comes as a direct attack through exposed ports (the approach WannaCry took). To address these cases, we need better firewalls and other controls to limit access. Once these rules are created, they need to be adopted as preventative measures.
- Detection overlaps security and data protection. Detection covers the ransomware cases not yet prevented. Using canary files, behavioral analytics, and rates of change for write activity, detection should flag issues either whenever the canary files are tested (at least every five minutes) or when a backup or replica is created (perhaps every five minutes for critical systems).
- Hold Logic is purely a data protection capability, as the hold only applies to full or synthetic full backup images of files or systems. The idea is that once ransomware is detected in the current recovery point, a hold is placed on a previous recovery point. That hold includes converting the previous recovery point to a full image and storing it on an immutable storage device or, if no immutable storage device is available, in an air-gapped location.
- Immutable Storage is the new player and the twist to data protection I mentioned earlier. Immutable storage provides versioned writes, where each version can no longer be changed once written. Some object stores are good candidates as long as neither previous nor current versions of a write can be modified afterward. Each versioned write is either a file or an image that was protected. This is not usually the data protection target but a replica of that target: a secondary set of storage that has a different control plane and interface from the data protection targets. This way, if standard targets are attacked, the immutable storage tier can be a recovery source. Some companies currently combine the data protection target and the immutable storage into one layer.
- Archive Storage is the traditional air-gapped archive tier of a data protection architecture. The goal of archive is to provide a way to recover the business when all else fails. As such, it must participate in an anti-ransomware architecture. Recovery could be to a cloud (DRaaS), a hot site, or even back to the same systems.
- Instant Recovery is another twist to data protection. While instant recovery does exist in many data protection systems, it is often a manual process. We need to change that to a workflow that is fired once ransomware is detected. The gap between detection and recovery needs to be as close to zero as possible. This does require one to trust one’s automation.
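To make the detection component a little more concrete, here is a minimal Python sketch of two of the signals mentioned above: canary files and rate of change. It is an illustration under my own assumptions, not a product implementation. The function names are hypothetical, the 30% change threshold is an arbitrary placeholder, and a real deployment would run the canary check on a schedule (at least every five minutes) and feed in change figures from the backup or replication job.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot_canaries(paths):
    """Record a known-good digest for each canary file."""
    return {Path(p): file_digest(Path(p)) for p in paths}


def tampered_canaries(baseline):
    """Return the canary files that are missing or whose contents changed.

    Ransomware often renames files as it encrypts them, so a missing
    canary is treated the same as a modified one.
    """
    tampered = []
    for path, digest in baseline.items():
        if not path.exists() or file_digest(path) != digest:
            tampered.append(path)
    return tampered


def change_rate_suspicious(changed_bytes, total_bytes, threshold=0.30):
    """Flag a recovery point whose changed fraction exceeds the threshold.

    The 0.30 default is an assumed placeholder; tune it per workload.
    """
    if total_bytes <= 0:
        return False
    return changed_bytes / total_bytes > threshold
```

A scheduler would call `tampered_canaries` against a baseline captured when the canaries were planted, and the data protection job would call `change_rate_suspicious` each time it creates a backup or replica. Either signal returning a positive result is what would kick off the hold logic and instant recovery.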
The big picture of this architectural approach to anti-ransomware also includes keeping the control planes for each subsystem segregated from each other. That way, if one subsystem is attacked, the others are not affected. Even so, an overarching workflow logic is required to ensure that instant recovery fires as expected, that the hold logic works, and so on. While the control planes are segregated, the workflow can use their APIs to pull it all together with as much automation as required.
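As a rough sketch of what that overarching workflow might look like, the Python below wires detection, hold logic, and instant recovery together while keeping each subsystem behind its own callable. Everything here is hypothetical: `RecoveryPoint`, `respond`, and the `place_hold` and `instant_recover` callables stand in for whatever APIs your data protection and storage products actually expose. I also assume the hold goes on the newest recovery point that detection did not flag.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class RecoveryPoint:
    name: str
    suspicious: bool  # verdict supplied by the detection subsystem


def last_clean_point(history: Sequence[RecoveryPoint]) -> Optional[RecoveryPoint]:
    """Return the newest recovery point that detection did not flag."""
    for point in reversed(history):
        if not point.suspicious:
            return point
    return None


def respond(
    history: Sequence[RecoveryPoint],
    place_hold: Callable[[RecoveryPoint], None],
    instant_recover: Callable[[RecoveryPoint], None],
) -> Optional[RecoveryPoint]:
    """Run the hold-then-recover workflow if the current point is flagged.

    `history` is ordered oldest to newest; the last entry is the current
    recovery point. Returns the point recovered from, or None if nothing
    was done.
    """
    if not history or not history[-1].suspicious:
        return None  # nothing detected, nothing to do
    clean = last_clean_point(history[:-1])
    if clean is None:
        return None  # no clean point on hand; fall back to the archive tier
    place_hold(clean)       # e.g. convert to full image, copy to immutable storage
    instant_recover(clean)  # e.g. start the instant recovery job
    return clean
```

Passing the subsystems in as separate callables mirrors the segregated control planes: the workflow only knows each subsystem's API, and a compromise of one does not hand over the others. The hold is placed before recovery starts so that the clean copy is safe even if the recovery itself goes wrong.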
Let me know your thoughts. What do you think we should do about ransomware? How do you provide anti-ransomware support?