Over the years, I have come to realize that context is everything in cybersecurity. Far too often, we rely on a single piece of information (or very few) to make decisions about risk, security posture, or the nature of security events. In fact, the ability to take contextual factors on board is well beyond the reach of most tools out there. I have experienced that firsthand. I have worked for a number of cybersecurity vendors covering three main domains of cybersecurity: identify, protect, and detect (referring to the functions of the NIST Cybersecurity Framework here), and I have seen how the lack of context inevitably leads to significant blind spots and crude approximations.
This is hardly surprising: contextual awareness has always been crucial in human learning and interactions. Machine learning (deep learning, AI, or whatever else you want to call it) has brought some of that and can incorporate limited context, but it still relies on the generic context provided by the dataset the model is trained on. That is to say, a model cannot take on board previously unseen conditions, or even subtle nuances in existing ones, and would likely generate unpredictable outputs in those cases. These models are also quite far from getting there anytime soon. On top of that, efforts to introduce more context might bring other issues, such as amplified bias, which could make the models unusable across environments. But that’s more of a modeling issue and a story for another time!
So what context are we talking about here?
Most advancements in context awareness in the cybersecurity industry have focused on data augmentation, where the input to modeling is supplemented with additional data from other sources (generally third parties). So beyond the question of whether the modeling can take a given piece of context on board, or which pieces of information actually count as relevant context for a given task, this is primarily a data availability issue. Here is a list of crucial contextual data that security tools are generally missing and that would improve their modeling and decision-making:
- Assets’ whereabouts and criticality: security solutions should be able to see the context (e.g., owner, type of data hosted, network view, business and technical impact, cost of failure) necessary to dynamically prioritize high-value assets. I haven’t seen accurate representations of any of these in current security tools, at least not in those I’m familiar with.
- Software posture: visibility into what’s running in the environment, and awareness of third-party apps in general, is poor. This increases the attack surface and inevitably leaves avenues open for threat actors to exploit.
- Threat landscape: understanding the attack and the attacker, exploits and techniques, and how threat actors operate to carry out attacks and get past defenses, as well as staying up to date, is fundamental to both stopping attacks and preventing them. This is context data that should not be taken out of context though! Meaning, dynamically cutting through the noise to what is relevant for you is a huge challenge. Here again, we still don’t know how to do it properly.
- Co-occurrence/sequence of events: collected telemetry and logging have their limits, and they are generally reactive. But how do we know what we need to know? I’d suggest making it random and working it out from there!
- Others: the setup of other security controls, the industry, location and geopolitics, news and social context, risk appetite, and historical events. I’m certainly missing some, as this ultimately depends on how different people and organizations think about risk in the first place.
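To make the first bullet a little more concrete, here is a minimal Python sketch of what enriching an alert with asset context could look like. Everything in it is an assumption made for illustration: the `AssetContext` fields, the `ASSET_INVENTORY` lookup, the `prioritize` function, and the scoring rule are hypothetical and do not reflect how any particular product works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetContext:
    """Hypothetical asset context; field names are illustrative only."""
    owner: str
    data_type: str        # e.g. "customer_pii", "public_docs"
    internet_facing: bool
    business_impact: int  # 1 (low) to 5 (critical), set by the organization


# Hypothetical inventory keyed by hostname.
ASSET_INVENTORY = {
    "db-prod-01": AssetContext("payments-team", "customer_pii", False, 5),
    "wiki-01": AssetContext("it-team", "public_docs", True, 2),
}


def prioritize(alert: dict) -> dict:
    """Return a copy of the alert with asset context and a priority attached."""
    ctx = ASSET_INVENTORY.get(alert["host"])
    if ctx is None:
        # Missing context is itself a finding: flag it rather than guess.
        return {**alert, "context": None, "priority": None}
    # Assumed scoring rule: raw severity weighted by business impact,
    # plus a fixed bump for internet-facing assets.
    score = alert["severity"] * ctx.business_impact
    if ctx.internet_facing:
        score += 5
    return {**alert, "context": ctx, "priority": score}


alerts = [
    {"host": "wiki-01", "severity": 4},
    {"host": "db-prod-01", "severity": 3},
    {"host": "unknown-77", "severity": 5},
]
enriched = [prioritize(a) for a in alerts]

# Rank only the alerts for which context was available.
ranked = sorted(
    (a for a in enriched if a["priority"] is not None),
    key=lambda a: a["priority"],
    reverse=True,
)
```

With these made-up numbers, the lower-severity alert on `db-prod-01` ends up ranked above the higher-severity one on `wiki-01`, and the alert on the unknown host is left without a priority rather than given a guessed one. The hard part in practice is the inventory itself, which is exactly the data availability issue described above.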
I believe attaining 100% clarity through context is not possible. Most vendors have neither the tools and technology nor the data to do it. However, incorporating any of the above will definitely change the picture and help the industry and the community deliver better solutions.