Networks face constant security threats from many sources, including botnets, worms, spam, phishing, and denial-of-service attacks. The state of the art in cyber security typically focuses on information collected at the individual host and IP address levels; e.g., which IP address has been seen sending out spam. A prime example is the commonly used host reputation systems or blacklists (RBLs) that collect and distribute information about such externally observed malicious activities associated with individual host IP addresses. These lists are routinely used in filtering and blocking policies adopted by network operators. However, the highly dynamic nature of IP addresses can severely limit their timeliness and accuracy, and tracking malicious IP addresses is not the same as tracking infected hosts.
We posit that information collected on individual hosts or IP addresses (be it malicious activities, network misconfigurations, or active threat instances) should be aggregated in an intelligent way, and that when viewed collectively, the totality of these data about a network exhibits fairly stable and thus predictive behavior over time. As a result, this information can be a strong indicator of the general cleanness of a network and of how much risk it faces, which in turn enables more proactive policy design. The stability arises because the factors influencing a network's cleanness or security posture generally vary on a relatively slow time scale. These factors include various policy-related issues, such as operating systems and patch levels, firewall policies, password strength checks, the expertise and training of IT personnel, and even user awareness levels.
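To make the idea of aggregation concrete, the following is a minimal sketch of rolling up per-IP blacklist observations to the network (prefix) level. It is illustrative only: the function name `aggregate_by_prefix`, the `(day, ip)` input format, and the choice of "fraction of a prefix's addresses listed per day" as the aggregate signal are all assumptions made for this example, not the metrics used in the project.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network


def aggregate_by_prefix(observations, prefixes):
    """Aggregate per-IP observations into a per-prefix daily signal.

    observations: iterable of (day, ip_string) pairs, e.g. blacklist hits
                  (hypothetical input format).
    prefixes:     list of non-overlapping CIDR strings, one per network.
    Returns {prefix: {day: fraction of the prefix's addresses listed that day}}.
    """
    networks = [ip_network(p) for p in prefixes]
    # prefix -> day -> set of distinct listed addresses
    listed = defaultdict(lambda: defaultdict(set))

    for day, ip in observations:
        addr = ip_address(ip)
        for net in networks:
            if addr in net:
                listed[str(net)][day].add(addr)
                break  # prefixes are assumed not to overlap

    return {
        prefix: {
            day: len(addrs) / ip_network(prefix).num_addresses
            for day, addrs in days.items()
        }
        for prefix, days in listed.items()
    }


# Example with documentation-only address ranges (RFC 5737).
observations = [
    (1, "192.0.2.1"),
    (1, "192.0.2.2"),
    (1, "192.0.2.1"),   # duplicate report on the same day, counted once
    (2, "192.0.2.1"),
    (1, "198.51.100.7"),
]
result = aggregate_by_prefix(observations, ["192.0.2.0/24", "198.51.100.0/24"])
```

Even if the individual addresses that appear on a list change from day to day, a per-prefix time series of this kind may remain comparatively stable, which is the property the argument above relies on.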
Under this project, we have started to build a global network reputation system that aims to provide the data and technology for accurate, quantitative assessment of security posture at the organizational level. There are two main domains where such an assessment framework can be extremely useful.
The challenges in building such a system are many. They include establishing a set of metrics to use for the assessment, developing techniques to process the data, determining the most effective way of using such an assessment in making security investment decisions, and designing reputation-aware security policies. Toward this end, we have built technology that allows us to collect a wide variety of Internet measurement data related to a network's malicious activities, mismanagement, and active threats, among others. These diverse data sources are then aggregated to provide a holistic and dynamic view of a network entity's behavior over time. We are particularly interested in two types of metrics: the first concerns a network as a standalone entity, irrespective of other networks in the same ecosystem; the second concerns a network as one of many interconnected networks. This second type is crucial because network security is interdependent and subject to externalities, i.e., what one network does affects others.
In a parallel effort, by applying advanced machine learning techniques to the aggregate data, we are able to assess risks and predict future security incidents for an organization. Such cyber incident forecasting has a completely different set of characteristics from detection techniques, which in turn enables entirely new classes of applications that are not feasible with the more commonly seen detection techniques alone. Recent data breaches, such as those at Target, JP Morgan Chase, and Home Depot, highlight the increasing social and economic impact of such cyber incidents and the importance of being able to make accurate forecasts.