Cyber attacks are becoming harder to defend against as enterprise networks grow more complex. With increased connectedness come more serious types of attacks, as breached systems are taken over and used as a launch point behind the firewall for insidious behaviors such as data exfiltration or ransomware attacks. More workers are distributed across geographic locations, requiring extensions to the network. At the same time, those within the organization are connected in many more ways. For example, bring your own device (BYOD) policies allow individuals to connect their personal devices to the corporate network, creating potential security holes. In addition, multiple platforms are used within the organization for communications, expanding the entry points for more sophisticated, higher-volume cyber attacks.
In this environment, effectively supporting security information and event management (SIEM) analysts becomes significantly more difficult. Threat detection can take a significant amount of time, and often cannot be done until long after the breach has taken place. Developing more effective ways of analyzing and visualizing cyber threats is critical, and that is where graph analytics comes in.
Cyber security is effectively a graph problem: network traffic transactions link the external systems attempting to breach the environment with the internal systems that require protection. Each access attempt establishes a directed link between the source of the attempt and the target, and the details of the attempted access become the properties of the directed edge. You can capture this information using NetFlow, the network protocol that Cisco developed for collecting and monitoring IP traffic. NetFlow data is generated by the routers and forwarded to a central repository, where it can be stored for analysis. One approach to identifying breach risks and attackers combines different network analysis algorithms: ranking algorithms, which evaluate the relative “importance” of nodes within a graph, and clustering algorithms, which group systems in the graph based on network activity. Together they help characterize cyber security risks.
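To make the graph framing concrete, here is a minimal Python sketch of the idea, not the implementation used by any particular product. It assumes a simplified, hypothetical record layout of (source IP, destination IP, destination port, bytes); real NetFlow records carry more fields, and the addresses and values are made up. Each source/target pair becomes a directed edge whose properties summarize the flows, and a plain power-iteration PageRank stands in for the ranking step. Clustering is left out.

```python
# Hypothetical, simplified flow records: (source IP, destination IP, destination port, bytes).
# Real NetFlow records carry more fields; these values are invented for illustration.
FLOWS = [
    ("10.0.0.5", "10.0.0.9", 445, 1200),
    ("10.0.0.7", "10.0.0.9", 445, 800),
    ("10.0.0.8", "10.0.0.9", 443, 500),
    ("10.0.0.9", "10.0.0.5", 443, 300),
    ("203.0.113.7", "10.0.0.5", 22, 4000),
]


def build_graph(flows):
    """Aggregate flows into directed edges keyed by (source, target)."""
    edges = {}
    for src, dst, port, nbytes in flows:
        props = edges.setdefault((src, dst), {"flows": 0, "bytes": 0, "ports": set()})
        props["flows"] += 1
        props["bytes"] += nbytes
        props["ports"].add(port)
    return edges


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over the directed edges (edge properties are ignored)."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {node: base for node in nodes}
        for src, targets in out_links.items():
            if not targets:
                continue
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


graph = build_graph(FLOWS)
ranks = pagerank(graph)
for node, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{node:15s} {score:.3f}")
```

In this toy data set the host 10.0.0.9 receives traffic from the most sources and ends up with the highest score. At the volumes discussed in this article, the same computation would need a far more scalable engine than a single-threaded script.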
There are two key challenges associated with creating and analyzing this type of graph: volume and scale. In a reasonably sized network, the volume of NetFlow records to be analyzed is so great that it is difficult to identify threats in real time. For example, one hour of NetFlow collection in a corporate environment can generate tens, if not hundreds, of millions of NetFlow records. The number of links between the different systems over even short periods of time becomes so large that it is difficult to visually represent the graph, let alone have an analyst discover anything by viewing it. And while big data platforms such as Hadoop and Spark can be used for the analytics processing, these paradigms remain largely batch-oriented, which makes it hard to identify an attack while it is actually happening.
One approach being explored combines Blazegraph’s GPU-enabled graph management environment, for managing the graph and applying analytics, with Graphistry for visualization. Graphistry uses GPU acceleration to allow graphs containing huge numbers of edges to be visualized and manipulated by an analyst, while the ranking and clustering can be performed using applications written in Blazegraph’s DASL language.
The result is a platform for rapidly analyzing NetFlow data, with an interactive visualization environment that the SIEM analyst can use to quickly narrow down and identify suspicious activity. For example, by looking at a NetFlow graph with assigned PageRank scores, analysts were able to find edge cases of data exfiltration by spotting a node outside the boundary that was communicating with interior hosts and executing FTP transfers of files out of the network.
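The pattern in that example can be approximated with a simple rule, shown here as a hedged Python sketch. This is not how the analysts found it (they worked visually on a PageRank-scored graph), and every specific is an assumption: the interior range 10.0.0.0/8, the hypothetical (source, destination, destination port, bytes) record layout, the `min_hosts` threshold, and the use of the classic FTP ports 20 and 21, which misses passive-mode data connections on other ports.

```python
import ipaddress
from collections import defaultdict

INTERIOR = ipaddress.ip_network("10.0.0.0/8")  # assumed interior address range
FTP_PORTS = {20, 21}  # classic FTP ports only; passive-mode data ports are not covered


def is_interior(addr):
    """True if the address falls inside the assumed interior range."""
    return ipaddress.ip_address(addr) in INTERIOR


def ftp_exfiltration_candidates(flows, min_hosts=2):
    """Exterior hosts that receive FTP traffic from at least min_hosts interior hosts."""
    senders = defaultdict(set)
    volume = defaultdict(int)
    for src, dst, port, nbytes in flows:
        if port in FTP_PORTS and is_interior(src) and not is_interior(dst):
            senders[dst].add(src)
            volume[dst] += nbytes
    return {
        dst: {"interior_hosts": sorted(hosts), "bytes": volume[dst]}
        for dst, hosts in senders.items()
        if len(hosts) >= min_hosts
    }


# Invented sample flows using documentation-reserved exterior addresses.
sample_flows = [
    ("10.0.0.5", "198.51.100.20", 21, 52_000_000),
    ("10.0.0.9", "198.51.100.20", 21, 48_000_000),
    ("10.0.0.7", "192.0.2.44", 443, 12_000),
    ("10.0.0.8", "192.0.2.44", 21, 1_000),
]
suspects = ftp_exfiltration_candidates(sample_flows)
print(suspects)
```

Here only 198.51.100.20 is flagged, because two interior hosts send FTP traffic to it. A rule like this narrows the candidate set; it does not replace the analyst's judgment.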
Under the right circumstances, graph technology has the potential to transform the cyber security process. There may be tools today providing 99.9% accuracy, but with billions of communications, even a 0.1% error rate produces enough false positives to overwhelm human analysts, so increasing precision by reducing the “noise” will simplify the cyber analyst’s work. Since the challenge is the massive volume of cyber data, the graph methodology needs techniques that can take those massive volumes of NetFlow transactions, use graph analytics to find communication patterns that might represent anomalous behavior, and highlight them quickly for the user. The combination of Blazegraph, Graphistry, and GPU acceleration provides this capability.
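The arithmetic behind the false-positive concern is easy to check. The Python sketch below assumes a round figure of one billion communications (the text says only "billions") and, as a worst case, treats every misclassified event as a false alarm. The function name is illustrative.

```python
from fractions import Fraction


def expected_errors(events, accuracy):
    """Expected number of misclassified events; accuracy is a decimal string such as "0.999"."""
    # Fraction keeps the arithmetic exact and avoids floating-point rounding.
    return int(events * (1 - Fraction(accuracy)))


alerts = expected_errors(1_000_000_000, "0.999")  # assumed volume: one billion events
print(f"{alerts:,} misclassified events")  # prints "1,000,000 misclassified events"
```

Even at 99.9% accuracy, a billion events can leave on the order of a million alerts to review, which is why cutting the noise matters as much as raw accuracy.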