CompTIA Security+: Resiliency and Automation Strategies ~ Constellations

This chapter covers a number of (you guessed it) resiliency and automation strategies that closely relate to previous chapters. In typical CompTIA Security+ guide style, certain themes get siloed out, like this one.

What is resiliency, and why do we want it? Resilient systems can quickly return to their normal state after some kind of disruption. Resilient systems have reduced risk associated with failure (and failure is inevitable).

This is a continuation of my blog post series on the CompTIA Security+ exam, where I share my studying and connect it to real-world events.

Automation and Scripting

The book defines automation within the context of systems administration:

The use of tools and methods to perform tasks otherwise performed manually by humans, thereby improving efficiency and accuracy, and reducing risk.

Automation is often done by scripts, which are “automated courses of action.” They offer several advantages over doing things by hand:

If they’re tested, prewritten scripts significantly lower the chance of user error.
Scripts can be chained together to automate complex commands.
They save a lot of time, since they’re running at machine speed vs human input speed.
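To make the "chained together" point concrete, here is a minimal Python sketch of my own (not from the book). The name run_chain and the step names are hypothetical. Each step is a small function that reports success or failure, and the chain stops at the first failure so that later steps never run against a half-finished system.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_chain(steps: List[Step]) -> List[str]:
    """Run each step in order and stop at the first one that fails.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, action in steps:
        if not action():
            break  # do not run later steps after a failure
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Hypothetical steps; real ones would call out to actual tooling.
    chain = [
        ("backup", lambda: True),
        ("patch", lambda: True),
        ("restart", lambda: True),
    ]
    print(run_chain(chain))
```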

The security world even has some standards and protocols related to automation for vulnerability management, like SCAP (Security Content Automation Protocol).

Continuous monitoring describes a system that has monitoring built into it, rather than monitoring being an external event or action. This ties in with automation because automated dashboards and responses can be used in conjunction with monitoring.

Likewise, there is configuration validation, which can also make use of automation. When you start using a system, you should have already validated its configuration against security standards. You want the system to do what it is supposed to do, and only that (extra ports, services, etc. are disabled). Automated testing can make this validation process less error prone, and also easier to scale.
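As a rough illustration of automated configuration validation, the sketch below compares what a system actually exposes against an approved baseline. The baseline values, the function name find_violations, and the idea of passing in the observed ports and services as plain lists are all my own assumptions. A real check would gather that data from the host and compare it against your organization's security standard.

```python
# Hypothetical baseline: the only ports and services this server should expose.
BASELINE = {
    "ports": {22, 443},
    "services": {"sshd", "nginx"},
}


def find_violations(open_ports, running_services, baseline=BASELINE):
    """Return anything that is present on the system but not in the baseline."""
    return {
        "ports": sorted(set(open_ports) - baseline["ports"]),
        "services": sorted(set(running_services) - baseline["services"]),
    }


if __name__ == "__main__":
    # Observed state would normally come from a scan of the host.
    print(find_violations([22, 80, 443, 3306], ["sshd", "nginx", "mysqld"]))
```

Because the check is just a function, it can run on every build or on a schedule across many machines, which is where the "easier to scale" benefit comes from.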

Templates and Backups

Templates are “master recipes” used to build servers, programs, systems, and so on. They are a crucial part of Infrastructure as a Service (IaaS). These automated, preconfigured templates are what allow services like Digital Ocean to offer a one-click install of a LAMP stack. Much like automation, templates allow for “rapid, error-free creation of configurations, connection of services, testing, deployment and more.”
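Here is a toy version of the template idea using Python's standard string.Template. The field names, default package list, and ports are hypothetical and are not how any particular provider defines its images. The point is only that one tested recipe plus a few parameters yields the same configuration every time.

```python
from string import Template

# Hypothetical "master recipe" for a web server definition.
SERVER_TEMPLATE = Template(
    "hostname: $hostname\n"
    "packages: $packages\n"
    "open_ports: $ports\n"
)


def render_server(hostname, packages=("apache2", "mysql-server", "php"), ports=(22, 80, 443)):
    """Fill in the template; substitute() raises KeyError if a field is missing."""
    return SERVER_TEMPLATE.substitute(
        hostname=hostname,
        packages=", ".join(packages),
        ports=", ".join(str(p) for p in ports),
    )


if __name__ == "__main__":
    print(render_server("web01"))
```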

Templates rely on master images, which are pre-made, fully patched images of a given system. They also provide a clean backup of operating systems, applications, and so on (everything except the data).

If you have a non-persistent system, you don’t need backups (as much). Non-persistence is when a change to a system isn’t permanent, so once a user logs out or the system is rebooted, all new files (and any malware) are removed.

Snapshots are instantaneous ‘save points’ in time, typically for virtual machines. Apple’s Time Machine backup is another example. Snapshots are a form of backup that let you restore a system to a previous point in time. They’re usually much faster than normal recovery methods, but keeping many of them can consume a lot of storage.

By reverting to an older snapshot, you are reverting to a known state. Similarly, there’s also rollback to a known configuration, but the focus is more on returning to a known good configuration (vs a certain point in time).
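The snapshot-and-revert idea can be modeled in a few lines. This is a simplified sketch of my own, with a hypothetical SnapshotStore class standing in for a hypervisor. It also shows why storage becomes an issue: every snapshot here is a full copy of the state, whereas real systems usually store only the changes.

```python
import copy


class SnapshotStore:
    """Toy model of a system whose state can be saved and restored."""

    def __init__(self, state):
        self.state = state
        self._snapshots = {}

    def snapshot(self, label):
        # Deep copy so later changes to the live state do not alter the snapshot.
        self._snapshots[label] = copy.deepcopy(self.state)

    def revert(self, label):
        # Restore a copy, so the stored snapshot stays reusable.
        self.state = copy.deepcopy(self._snapshots[label])


if __name__ == "__main__":
    vm = SnapshotStore({"files": ["report.docx"]})
    vm.snapshot("known-good")
    vm.state["files"].append("malware.exe")
    vm.revert("known-good")
    print(vm.state)
```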

You can also store a VM or OS on an external device. This is known as live boot media: a USB drive or optical disc that contains a complete bootable system. This can be handy for task-specific OSs (forensics, incident response, etc.).

Elasticity and Scalability

These two terms are closely related, but not quite the same. Elasticity is the ability to dynamically increase the workload capacity of a system using on-demand hardware resources to scale up (and, in practice, to release those resources again when demand drops). Scalability is a design element that allows a system to accommodate larger workloads by either scaling up (better hardware), or scaling out (more nodes).
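One way to picture elasticity is as a function that is re-evaluated as load changes. The sketch below is my own simplification and assumes capacity is added by adjusting the node count. The function name, the per-node capacity figure, and the min/max limits are hypothetical. The limits are the scalability part: the design decides how far the system is able to grow.

```python
import math


def desired_nodes(current_load, capacity_per_node, min_nodes=1, max_nodes=10):
    """Return how many nodes are needed for the load, clamped to the design limits."""
    needed = math.ceil(current_load / capacity_per_node)
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    # Load in requests per second, assuming each node handles 100 of them.
    for load in (0, 250, 5000):
        print(load, desired_nodes(load, capacity_per_node=100))
```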

The book then goes through a number of related ideas:

Distributive allocation is the “transparent allocation of requests across a range of resources.” This directly addresses the availability aspect of security for a given system.
Redundancy is “the use of multiple, independent elements to perform a critical function, so that if one fails, there is another one that can take over the work.” Organizations might want to consider redundant ISPs and other services, in addition to redundant hardware.
High availability is a measure of a system’s ability to provide uninterrupted access to data and services, even in the event of a fault or disruption. Fault tolerance is a design objective to achieve high availability.
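These three ideas fit together nicely in a small example. The sketch below is a hypothetical round-robin pool of my own, not something from the book. Requests are spread across nodes (distributive allocation), there is more than one node (redundancy), and a failed node is skipped so that service continues (fault tolerance in support of high availability).

```python
from itertools import cycle


class RoundRobinPool:
    """Hand out nodes in turn, skipping any that are marked as down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._order = cycle(self.nodes)

    def mark_down(self, node):
        self.healthy.discard(node)

    def mark_up(self, node):
        if node in self.nodes:
            self.healthy.add(node)

    def next_node(self):
        if not self.healthy:
            raise RuntimeError("no healthy nodes available")
        # One full pass over the ring is enough to find a healthy node.
        for _ in range(len(self.nodes)):
            node = next(self._order)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


if __name__ == "__main__":
    pool = RoundRobinPool(["node-a", "node-b", "node-c"])
    pool.mark_down("node-b")
    print([pool.next_node() for _ in range(4)])
```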

RAID

Lastly, the chapter talks about Redundant Array of Independent Disks, or RAID. This is a way of taking data normally stored in one disk location and spreading it out among several disks. There are several different RAID options: RAID 0 through 5 are covered individually, while RAID 6 and RAID 10 are mentioned briefly. These options are different combinations of how the data is spread across disks, and where/how parity bits and error checking are handled.
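The parity idea is easier to see with a tiny example. The sketch below is my own illustration of XOR parity, the principle behind parity-based levels such as RAID 5. The function name and the four-byte "disks" are made up for the demo. Real arrays work on fixed-size stripes, and RAID 5 rotates the parity across the disks. XORing the surviving blocks with the parity block rebuilds the missing one.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    # Three data "disks" plus one parity block computed from them.
    disks = [b"ABCD", b"EFGH", b"IJKL"]
    parity = xor_blocks(disks)

    # Simulate losing the second disk and rebuilding it from the rest.
    rebuilt = xor_blocks([disks[0], disks[2], parity])
    print(rebuilt == disks[1])
```

A single parity block only protects against one lost disk at a time, which is part of why the other RAID levels exist.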