DIMACS/RUCIA Workshop on Information Assurance in the Era of Big Data

February 6, 2014
DIMACS Center, CoRE Building, Rutgers University

Thu Nguyen, Rutgers University, tdnguyen at cs.rutgers.edu
Manish Parashar, Rutgers University, parashar at caip.rutgers.edu
Hoang Pham, Rutgers University, hopham at rci.rutgers.edu
Rebecca Wright, Rutgers University, rebecca.wright at rutgers.edu
Hui Xiong, Rutgers University, hxiong at rutgers.edu
Presented under the auspices of the DIMACS Special Focus on Algorithmic Foundations of the Internet, the DIMACS Special Focus on Cybersecurity, and The Rutgers Center for Information Assurance (RUCIA), with additional support from the National Science Foundation under grant number DGE-1241315.


Ramón Cáceres, AT&T Labs

Title: DP-WHERE: Differentially Private Modeling of Human Mobility

Models of human mobility have broad applicability in urban planning, ecology, epidemiology, and other fields. Starting with Call Detail Records from a cellular telephone network that have gone through a straightforward anonymization procedure, the WHERE modeling approach generates realistic sequences of locations and times for arbitrary numbers of synthetic people moving across metropolitan-scale regions. The accuracy of WHERE has been validated against billions of location samples for hundreds of thousands of cellphones in the New York and Los Angeles areas. In this work, we introduce DP-WHERE, which modifies WHERE by adding controlled noise to satisfy the rigorous requirements of differential privacy while preserving accuracy.
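The abstract does not say where or how DP-WHERE injects its noise. As a generic illustration of "adding controlled noise" only, here is a minimal sketch of the standard Laplace mechanism applied to a table of counts. The function names, the `cell` keys, and the assumption that one person changes at most one count by at most 1 (sensitivity 1) are all hypothetical and are not taken from DP-WHERE.

```python
import random


def laplace_noise(scale, rng):
    """Draw a zero-mean Laplace sample as the difference of two
    independent exponential samples with mean `scale`."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_counts(counts, epsilon, sensitivity=1.0, rng=None):
    """Return a copy of `counts` with Laplace noise added to every cell.

    Assumption (not from the abstract): adding or removing one person
    changes the whole table by at most `sensitivity` in L1 norm.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    scale = sensitivity / epsilon  # smaller epsilon means more noise
    return {cell: count + laplace_noise(scale, rng)
            for cell, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts of people observed per grid cell.
    observed = {"cell_a": 120, "cell_b": 35, "cell_c": 0}
    print(noisy_counts(observed, epsilon=0.5, rng=random.Random(7)))
```

The released values may be negative or fractional. Rounding or clipping them afterwards is post-processing and does not weaken the differential privacy guarantee.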

This is joint work with Darakhshan Mir, Sibren Isaacman, Margaret Martonosi, and Rebecca Wright. It was presented at the 2013 IEEE International Conference on Big Data.

Tina Eliassi-Rad, Rutgers University

Title: Affecting Dissemination on Graphs

Controlling the dissemination of an entity (e.g., a meme or a virus) on a large graph is an interesting problem in many disciplines, such as epidemiology, computer security, and marketing. Previous studies have focused mostly on removing or inoculating nodes to achieve the desired outcome. We shift the problem to the level of edges and ask: which edges should we add or delete in order to speed up or contain a dissemination? In this talk, I will describe our effective and scalable algorithms for solving these dissemination problems. I will also present experiments on real topologies of varying sizes. Lastly, I will describe some limitations of our approach and outline a promising solution based on functional roles of nodes and edges.
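The abstract does not describe the algorithms themselves. As a hedged illustration of the edge-level framing only, the sketch below assumes a common proxy from the epidemic-threshold literature: the larger the leading eigenvalue of the adjacency matrix, the more easily something spreads. Under that assumption, each edge of an undirected graph is scored by the product of its endpoints' entries in the leading eigenvector, and the highest-scoring edges are proposed for deletion. The function names and the toy graph are hypothetical.

```python
import math


def leading_eigenvector(adj, iters=200):
    """Approximate the leading eigenvector of an undirected graph by
    power iteration on A + I (the shift keeps bipartite graphs from
    oscillating and leaves the eigenvectors unchanged)."""
    x = {v: 1.0 for v in adj}
    for _ in range(iters):
        y = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}
    return x


def edges_to_delete(adj, k):
    """Rank edges by the product of eigenvector entries at their
    endpoints and return the k highest-scoring edges."""
    u = leading_eigenvector(adj)
    # Node labels must be mutually comparable so each edge has one key.
    edges = {tuple(sorted((v, w))) for v in adj for w in adj[v]}
    ranked = sorted(edges, key=lambda e: u[e[0]] * u[e[1]], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy graph: a 4-clique on nodes 0..3 with a path 3-4-5 attached.
    graph = {
        0: [1, 2, 3],
        1: [0, 2, 3],
        2: [0, 1, 3],
        3: [0, 1, 2, 4],
        4: [3, 5],
        5: [4],
    }
    print(edges_to_delete(graph, 3))
```

On the toy graph the proposed deletions all touch node 3, the best-connected node of the dense cluster, rather than the sparse tail. Choosing edges to add in order to speed up dissemination would need a different scoring rule and is not shown.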

This is joint work with Hanghang Tong, B. Aditya Prakash, Michalis Faloutsos, Christos Faloutsos, and Long T. Le.

Murat Kantarcioglu, University of Texas at Dallas

Title: Risk Aware Data Processing over Hybrid Clouds

Organizations today collect and store large volumes of data that they would like to analyze for a multitude of purposes. Often the in-house computational capabilities of such organizations cannot easily support such complex data analyses. While such limitations were a serious impediment in the past, emerging cloud computing platforms (e.g., Amazon's EC2) offer a viable alternative. Despite numerous benefits, organizations, especially those that deal with potentially sensitive data (e.g., business secrets or medical records), hesitate to embrace the cloud model completely due to security concerns. A possible approach to overcoming the security challenge is for organizations to encrypt data prior to outsourcing it to the cloud and to perform data analysis over encrypted data in the cloud. Although the past decades of research have made significant progress on developing cryptographic approaches that allow limited computation over encrypted data, no general solution efficient and cost-effective enough for practical use has yet emerged.

As organizations shift towards a cloud strategy, an alternate approach that is gaining traction both in research and in practice is that of a hybrid cloud paradigm, which enables composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability. In this talk, we discuss how such a hybrid cloud architecture could be used to securely outsource data processing while managing risks. Furthermore, we present our vision for the formalization of the workload partitioning problem over hybrid clouds such that an end-user's requirements of performance, data security, and monetary costs are satisfied. Finally, we discuss how encrypted data processing capabilities could be integrated into our risk-aware data processing architecture.

Panagiotis Karras, Rutgers University

Title: Publishing Microdata with a Robust Privacy Guarantee

Today, the publication of microdata poses a privacy threat. Vast research has striven to define the privacy condition that microdata should satisfy before it is released, and to devise algorithms to anonymize the data so as to achieve this condition. Yet no method proposed to date explicitly bounds the percentage of information an adversary gains after seeing the published data for each sensitive value therein. This talk introduces β-likeness, an appropriately robust privacy model for microdata anonymization, along with two anonymization schemes designed for it, one based on generalization and the other based on perturbation. Our model postulates that an adversary's confidence in the likelihood of a certain sensitive-attribute (SA) value should not increase, in relative difference terms, by more than a predefined threshold. Our techniques aim to satisfy a given β threshold with little information loss. We experimentally demonstrate that (i) our model provides an effective privacy guarantee in a way that predecessor models cannot, (ii) our generalization scheme is more effective and efficient in its task than methods adapting algorithms for the k-anonymity model, and (iii) our perturbation method outperforms a baseline approach. Moreover, we discuss in detail the resistance of our model and methods to attacks proposed in previous research.
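The relative-difference condition stated above can be sketched as a simple check. This is an illustration under assumptions and not the formal definition from the talk: the adversary's prior confidence is taken to be the frequency of an SA value in the whole table, the posterior confidence is its frequency inside one published group of records, and the predefined threshold is called `beta`. All function and variable names are hypothetical.

```python
from collections import Counter


def max_relative_gain(table_values, group_values):
    """Largest relative increase in the frequency of any SA value when
    moving from the whole table (prior) to one group (posterior)."""
    prior = Counter(table_values)
    posterior = Counter(group_values)
    n, m = len(table_values), len(group_values)
    worst = 0.0
    for value, count in posterior.items():
        if prior[value] == 0:
            raise ValueError(f"{value!r} does not occur in the table")
        p = prior[value] / n
        q = count / m
        if q > p:  # only increases in confidence count against the bound
            worst = max(worst, (q - p) / p)
    return worst


def within_threshold(table_values, groups, beta):
    """True if no group raises confidence in any SA value by more than
    `beta` in relative terms."""
    return all(max_relative_gain(table_values, g) <= beta for g in groups)


if __name__ == "__main__":
    # Hypothetical SA column of a 10-record table, split into two groups.
    table = ["flu"] * 6 + ["hiv"] * 2 + ["cold"] * 2
    group_1 = ["flu"] * 4 + ["cold"]
    group_2 = ["flu"] * 2 + ["hiv"] * 2 + ["cold"]
    print(max_relative_gain(table, group_2))
    print(within_threshold(table, [group_1, group_2], beta=0.5))
```

In this toy split, the second group doubles the frequency of "hiv" (from 0.2 to 0.4), so it would violate a threshold of 0.5 and the groups would have to be formed differently.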

Chung-sheng Li, IBM Research

Title: Provenance Technologies for Continuous Assurance in the Big Data Era

Business Integrity Management formalizes a new layer of abstraction for the business and closes the loop between the abstraction and the reality of the business. Contemporary enterprises face the challenges of being increasingly governed by various regulations (Sarbanes-Oxley Act, HIPAA, Basel II, etc.) and are being held accountable for meeting the high expectations set by boards, investors, regulators, and other stakeholders, as business functions and decisions are increasingly dependent on the insights derived from big data. Significant losses caused by inadequate risk management and controls have plagued the industry over the past decade, with a significant increase in the number of firms involved in large failures over the past few years. Costs associated with compliance are significantly impacting enterprises. However, the costs associated with non-compliance, fraud, improper risk management, and loss of business integrity are even more significant. In all of these circumstances, Business Integrity Management becomes crucially important.

The view of integrity is itself evolving. The evolution from transactional integrity to business integrity is underway as businesses evolve to develop a consistent and interdependent view of their policies, processes, and core entities both internally and externally. Managing Business Integrity requires a holistic view of governing policies, business processes, and core entities. At present these are at best managed independently.

In this talk, I will focus on the technology implications of managing business integrity in the big data era.

John-Francis Mergen, Raytheon BBN Technologies

Title: Opportunities and Challenges for Internet-derived Big Data

For commercial enterprises, the use of big data in adding value to a customer product is becoming a well-developed practice. However, the use of Internet-derived big data to strengthen the internal value chain for industrial activities is less mature. Opportunities exist for the use of big data techniques to realize substantial improvements in industrial processes such as software development, large-scale sensing, identification and elimination of system inefficiencies, and improvements in large-scale control of industrial processes.

Making use of widely collected and dissimilar data presents problems in building sufficiently reliable and stable results that can be used in industrial processes. An industrial activity typically needs solid, well-characterized inputs to provide economic returns. Much Internet-derived big data does not fit this description. One of the major challenges to the industrial use of big data is creating systems that are responsive and that can integrate disparate information. Once the data can be integrated, a platform is established that allows the industrial organization to optimize or eliminate redundant, manual processes.

This talk provides examples of ways that researchers at Raytheon BBN Technologies are attempting to deal with these challenges.

Ramendra Sahoo, PWC

Title: Bridging the Regulatory and Compliance Gap through Data Driven Strategies

The convergence of emerging regulations highlights some key organizational challenges that many financial services firms face today. To begin with, they must update their data transparency measures and introduce new processes to meet the growing regulatory demand for granular information. Next, they must address an array of qualitative and quantitative assessments that are designed to evaluate the strength of various internal processes as well as the feasibility of their strategic plans. Timely reporting of risk-weighted assets, exposures, and key performance and control indicators; reckoning with difficult regulatory mandates and their impact on model validations; and managing risk and finance integrations are just some of the major challenges today's top executives confront every day.

This talk highlights that an end-to-end data- and analytics-driven framework, beginning with underlying data needs and extending to independent model validation and reporting at the highest level of operations, can align the interests of regulators and bank management.

By rationalizing processes, internal workflows, and overall decision-making through the application of data analytics, organizations can drive efficiency in areas such as capital management, capital diagnostics, and capital availability, leading to a reduction in capital waste.

Shambhu J. Upadhyaya, State University of New York at Buffalo

Title: Insider Threat Analysis and Countermeasures

Threat management based on misuse signatures is a first step in dealing with insider attacks, but there are still several fundamental challenges, beginning with the understanding of the insider threat. In fact, any good model or assessment methodology would already be a significant advance. In this talk, we will first look into the challenges and examine some of the recent attempts to address them. These include a new threat assessment methodology by which specific and targeted countermeasures can be deployed against stealthy attacks for which no effective solutions currently exist. We briefly outline this scheme, demonstrate a proof-of-concept prototype, and show how this scheme can be used to assess insider activities and harden the network against insider attacks. This research has been funded by DARPA.

Document last modified on November 13, 2014.