DIMACS Workshop on Algorithmic Information Fusion and Data Mining (WAIFDM)

September 19 - 20, 2013
DIMACS Center, CoRE Building, Rutgers University

Organizers:
Frank Hsu, Fordham University, hsu at cis.fordham.edu
Fred Roberts, DIMACS, froberts at dimacs.rutgers.edu
Alexis Tsoukiàs, University of Paris and LAMSADE (CNRS), tsoukias at lamsade.dauphine.fr
Presented under the auspices of the Special Focus on Algorithmic Decision Theory and the Special Focus on Information Sharing and Dynamic Data Analysis, and in partnership with the European Consortium ALGODEC.

Abstracts:


Miguel Couceiro, University Paris Dauphine

Title: Qualitative Learning through "Fitness"

We will consider the problem of interpolating points in a space by lattice polynomials. This lattice variant differs in many respects from the classical interpolation problem, where points are interpolated by real polynomials. For instance, existence of solutions is not guaranteed and, even when solutions exist, they are usually not unique. Hence, we will present necessary and sufficient conditions that guarantee the existence of interpolating lattice polynomials, and provide complete descriptions of such solutions when they exist. Apart from its theoretical interest, this lattice version of the interpolation problem constitutes a tool in fields pertaining to decision making and artificial intelligence. If time allows, we will discuss its potential use in preference learning within the qualitative setting of multicriteria decision making.

Some of the material presented constitutes ongoing research in collaboration with Tamás Waldhauser (University of Szeged, Hungary), Didier Dubois and Henri Prade (IRIT, France).

Bio:

Miguel Couceiro received his Ph.D. in Mathematics from the University of Tampere, Finland, in 2006. He was a postdoctoral fellow at the University of Luxembourg (2007-2012). He is currently an Associate Professor at Université Paris Dauphine and holds two honorary positions of Docent (Adjunct Professor): in Mathematics at the University of Tampere (since December 2007) and in Discrete Mathematics at Tampere University of Technology (since March 2008). His research interests lie in discrete mathematics, theoretical computer science, and decision mathematics. His earlier work focused on function theory, including aggregation theory, clone theory, multiple-valued logic, and the theory of Boolean and pseudo-Boolean functions. His recent work pertains to decision making, in particular preference modeling and learning in the qualitative setting (with particular emphasis on aggregation, decomposition and reconstruction techniques). The tools used are borrowed from universal algebra, relational theory (mainly order theory), combinatorics and the theory of functional equations. He has more than 70 papers in international journals and conference proceedings, and he has co-organized several international conferences and colloquia.


Xiaoxu Han, Fordham University

Title: Disease Network Marker Query from Next Generation Sequencing (NGS) Data

As a typical form of big data, next generation sequencing (NGS) data pose acute challenges in bioinformatics and data mining. Although several methods have been proposed to conduct differential expression analysis from different viewpoints, there has been no previous investigation of biomarker discovery from NGS data, which can play an essential role in personalized medicine. In this study, we propose a novel disease network marker discovery algorithm called NGS-Marker to query differentially expressed network markers from NGS data, along with a purely data-driven feature selection algorithm. Compared with the network markers from other omics data (e.g. microarray), the disease network markers demonstrated high diagnostic accuracy and reproducibility, in addition to their effective gene marker inference capability. To the best of our knowledge, this is the first study in NGS network marker discovery; our work not only bridges transcriptomics and systems biology, but also contributes to clinical diagnostics, large scale biological data mining, and information fusion.

This is joint work with Dr. Frank Hsu and Henrique Valim at Fordham University.

Bio:

Dr. Henry Han is an Associate Professor in the Department of Computer and Information Science at Fordham University. He received his Ph.D. in Computational Science from the University of Iowa in 2004. He has published more than thirty papers in leading journals and conferences in bioinformatics, data mining, and graphics, such as BMC Systems Biology, BMC Bioinformatics, Bioinformatics, ACM Transactions in Computational Biology and Bioinformatics, Journal of Bioinformatics and Computational Biology, SIAM Journal on Computing, SDM, CSB, and RECOMB. His current interests include bioinformatics, machine learning, cyber security, and financial informatics.


Frank Hsu, Fordham University

Title: Cognitive Diversity vs. Statistical Correlation in the Analytics and Fusion of Big Data

The concept of correlation, first defined in 1888 by Francis Galton to quantify the statistical relationship between two sets of data values using anthropometric data, has been used widely in the small-data world. Recently, correlations have been used in the big-data environment to search for possible cause-effect relations or to identify useful proxies. In a big-data world, data with characteristics of volume, velocity and variety have to be analyzed and fused in order to provide value for stakeholders in scientific discovery, technology innovation, business sustainability, social analysis, and knowledge management. Big data analytics, more data-driven than hypothesis-driven, requires a paradigm change that blends techniques and methods from computing, mathematics, and statistics, as well as informatics (including machine learning, data mining, information fusion, and knowledge discovery). My talk will cover: (1) the concept of cognitive diversity (CD) as opposed to, and as a complement to, statistical correlation and (2) the use of CD to measure the diversity among, and to combine, multiple scoring systems (e.g., multiple feature systems, multiple classifier systems, multiple neural nets, multiple data mining systems, and ensembles of multiple models). Examples are drawn from various domains in the big-data landscape including bioinformatics, virtual screening, target tracking, cognitive neuroscience, affective computing, and corporate revenue prediction.

Bio:

Frank Hsu is the Clavius Distinguished Professor of Science and a professor of computer and information science at Fordham University in New York City. He is Vice Chair of the New York Chapter of the IEEE Computational Intelligence Society.

Hsu's research interests include combinatorial methods; network interconnection and communications; and computing, informatics and analytics. Combinatorial fusion analysis (CFA), an algorithmic information fusion paradigm proposed and developed by Hsu and colleagues, has been used in a variety of domains including bioinformatics, virtual screening, target tracking, information retrieval, financial informatics, and cognitive neuroscience.

Dr. Hsu has served on several editorial boards including Journal of Interconnection Networks, Pattern Recognition Letters, IEEE Transactions on Computers, Networks, International Journal of Foundations of Computer Science, Journal of Advanced Mathematics and Applications, and the book series "Health Information Science" (Springer). He received an M.S. degree from the University of Texas at El Paso and a Ph.D. degree from the University of Michigan. Hsu is a Fellow of the New York Academy of Sciences (NYAS), the Institute of Combinatorics and Applications (ICA), and the International Institute of Cognitive Informatics and Cognitive Computing (ICIC). He is a Senior Member of the IEEE.


Melvin F. Janowitz, DIMACS

Title: Generalized Oligarchies

In cases of a medical, terrorist, or natural emergency there is often a need to simultaneously reach multiple but possibly related decisions relating to public safety. A recent newsworthy event involves the explosions at the Boston Marathon. This suggests a study of the direct product of oligarchies involving the same collection of agents, but analyzing different but possibly related issues. The talk will relate conditions that normally involve social networks with conditions that have arisen in developing the fundamentals of the structure of finite lattices. A simultaneous analysis of related conditions is needed because a solution to one problem may adversely affect the possible solutions to some other related problem.

Bio:

Retired as a Full Professor after 33 years of service from University of Massachusetts at Amherst. Included therein was a 4-year term as Assistant Dean for the College of Natural Sciences and Mathematics. Appointed Associate Director of DIMACS in 2000, and still serving in that role. Published 3 books, and over 90 refereed research papers. President of Classification Society 2006-7. Organized many workshops and two annual meetings of CS. Served on Editorial Board of Mathematical Social Sciences as well as Journal of Classification. Member of several professional societies.


Howard Li, University of New Brunswick, Canada

Title: Information Fusion in Robotics and Unmanned Vehicles

Data-centric algorithms have been developed and widely used in machine learning. In this talk, a few well-known algorithms will be introduced first:

Then, we will present some real-world applications of the above algorithms. Results of our current research in robotics and unmanned vehicles will be presented. Unmanned vehicles and robots are usually deployed in situations involving hazardous environments or repetitive and menial tasks. There is growing demand for, and interest in, the sensing, perception and navigation control of unmanned ground vehicles (UGVs), unmanned aerial vehicles (UAVs) and autonomous underwater vehicles (AUVs). Information fusion is a fundamental issue in data acquisition, state estimation, perception and navigation for robotic systems.

Bio:

Howard Li is an Associate Professor in the Department of Electrical and Computer Engineering, University of New Brunswick, Canada. He is a registered professional engineer in the Province of Ontario, Canada. He is a senior member of IEEE. He received the Ph.D. degree from the University of Waterloo, Canada. He worked with Atlantis Systems International, Defence Research and Development Canada, and Applied AI Systems Inc. to develop unmanned ground vehicles, unmanned aerial vehicles, autonomous underwater vehicles and mobile robots for both domestic and military applications. His research interests include sensor fusion, state estimation, algorithms, control theory, unmanned vehicles, mechatronics, robotics, multi-agent systems, artificial intelligence, motion planning, and simultaneous localization and mapping.


Chris Mesterharm, Applied Communication Sciences

Title: Improving Repeated Labeling for Crowdsourced Data Annotation

The emergence of crowdsourcing resources such as Amazon Mechanical Turk has made it possible to obtain labels for data more cheaply than ever before, but often with significant unreliability. To improve reliability, we can view Turk as a noisy oracle that can be queried multiple times to label an item. The most common approach for doing this, consensus labeling, uses the majority vote of a fixed number of labels as the final label for an item. We present a new approach for repeated labeling, Beat-by-k, which repeatedly asks for labels from the oracle until the number of labels from one category outnumbers the other by some fixed amount. We show through both theoretical and experimental results that Beat-by-k requires fewer calls to the oracle to reach a given level of accuracy compared to consensus labeling and other annotation strategies.

We also show that Beat-by-k is a better alternative in the context of generating labels for machine learning algorithms. We give upper bounds on the performance of a simple empirical risk minimization algorithm and show how Beat-by-k can be used to optimize the bound in terms of the cost to label data. We compare these results with more practical algorithms on real-world datasets.
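
The two labeling rules described in this abstract can be sketched as follows. This is only an illustrative reading of the description above for the binary case, not the speaker's implementation; the function names and the `noisy_oracle` helper (with its `error_rate` parameter) are assumptions introduced for the example.

```python
import random


def beat_by_k(oracle, k):
    """Query oracle() until one binary label leads the other by k.

    Returns (final_label, number_of_queries).
    """
    lead = 0      # (# of 1-labels) minus (# of 0-labels) seen so far
    queries = 0
    while abs(lead) < k:
        lead += 1 if oracle() == 1 else -1
        queries += 1
    return (1 if lead > 0 else 0), queries


def consensus(oracle, n):
    """Majority vote over a fixed number n of labels (n assumed odd)."""
    ones = sum(oracle() for _ in range(n))
    return (1 if 2 * ones > n else 0), n


def noisy_oracle(true_label, error_rate, rng):
    """Hypothetical oracle: returns the wrong binary label with probability error_rate."""
    def oracle():
        return true_label if rng.random() >= error_rate else 1 - true_label
    return oracle


if __name__ == "__main__":
    rng = random.Random(0)
    oracle = noisy_oracle(true_label=1, error_rate=0.3, rng=rng)
    print(beat_by_k(oracle, k=3))
    print(consensus(oracle, n=7))
```

Unlike the fixed-size vote, the number of oracle calls made by `beat_by_k` varies from item to item: items on which the labels agree early stop after few queries.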

Bio:

Chris Mesterharm has been a senior research scientist at Applied Communication Sciences since March of 2012. Previously, he was a research scientist at Rutgers University working on computer science topics including active learning and computational advertisement, and a visiting professor at Fordham University in the Computer and Information Science department. He completed a computer science Ph.D. in 2007 at Rutgers University and was previously a research assistant working on machine learning at NEC Research Institute with Nick Littlestone. His research has been on machine learning with a particular focus on on-line learning, a type of inductive learning that can adapt to a changing environment. He has written extensively on this and other machine learning topics.


Fred Roberts, Rutgers University

Title: Consensus List Coloring of Graphs

In graph coloring, one assigns a color to each vertex of a graph so that neighboring vertices get different colors. We shall talk about a consensus problem relating to graph coloring and discuss a variety of applications arising from traffic phasing, channel assignment, scheduling, routing, fleet maintenance, and DNA physical mapping. In many applications of graph coloring, one gathers data about the acceptable colors at each vertex. A list coloring is a graph coloring so that the color assigned to each vertex belongs to the list of acceptable colors associated with that vertex. We consider the situation where a list coloring cannot be found. If the data contained in the lists associated with each vertex are made available to individuals associated with the vertices, it is possible that the individuals can modify their lists through trades or exchanges until the group of individuals reaches a set of lists for which a list coloring exists. We describe several models under which such a consensus set of lists might be attained and connect them to the applications of interest.
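
As a small illustration of the definitions in this abstract, the sketch below searches for a list coloring by backtracking and reports when none exists. It is a generic textbook-style search, not one of the consensus models discussed in the talk; the function name, the graph representation (a dict of neighbor lists), and the example lists are assumptions made for the example.

```python
def list_coloring(adjacency, lists):
    """Return a dict vertex -> color with each color taken from that
    vertex's list and neighbors colored differently, or None if no
    list coloring exists.

    adjacency: dict mapping each vertex to an iterable of its neighbors.
    lists:     dict mapping each vertex to its list of acceptable colors
               (colors are assumed not to be None).
    """
    # Try the most constrained vertices (shortest lists) first.
    vertices = sorted(adjacency, key=lambda v: len(lists[v]))
    coloring = {}

    def extend(i):
        if i == len(vertices):
            return True
        v = vertices[i]
        for color in lists[v]:
            if all(coloring.get(u) != color for u in adjacency[v]):
                coloring[v] = color
                if extend(i + 1):
                    return True
                del coloring[v]
        return False

    return dict(coloring) if extend(0) else None


if __name__ == "__main__":
    triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
    # Three mutually adjacent vertices sharing the same two colors: no list coloring.
    print(list_coloring(triangle, {"a": [1, 2], "b": [1, 2], "c": [1, 2]}))
    # After one vertex changes its list, a list coloring exists.
    print(list_coloring(triangle, {"a": [1, 2], "b": [1, 2], "c": [3]}))
```

The second call mimics, in the simplest possible way, a modified set of lists for which a list coloring can be found.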


Gökçe Sargut, Governors State University

Title: Organizations and Environmental Complexity

Not only are organizations complex entities by definition, but conventional wisdom also suggests that they have to survive and thrive in environments that are characterized by increasing complexity. This presentation provides a brief introduction to conceptual tools and perspectives that organizational scholars, strategists, and social scientists have developed in the ongoing exploration of how organizations can manage environmental complexity. We argue that in order to facilitate organizational adaptation, managers need to make fundamental changes in how they make strategic decisions. This requires renewed emphasis on alternative forecasting methods, mitigating risks, determining tradeoffs, and ensuring the diversity of thought. Some of the concepts that will be covered include unintended consequences, rare events, irreversibility, loose-coupling and redundancies, anticipation and resilience, and organizational sensemaking.

Bio:

Gökçe Sargut is an assistant professor in the College of Business and Public Administration at Governors State University. He received his Ph.D. from Columbia University. He specializes in the areas of organization theory and strategic management. His current research focuses on social capital, trust in inter-organizational alliances, managing in complex environments, and cultural production.


Christina Schweikert, St. John's University

Title: Bioinformatics Application of Information Fusion: ChIP-seq Peak Finding

In the bioinformatics domain, there are many areas in which information fusion can enhance existing methods and algorithms. In this example, we use information fusion to improve the performance of ChIP-seq peak finding algorithms by fusing pairs of systems. Sequencing data is used to analyze genome-wide protein-DNA interactions, and since this is an emerging field, there are a large number of computational and statistical techniques for locating protein binding sites. In our fusion approach, we define methods to merge and rescore the regions that have been identified by two peak detection systems. We analyze the performance based on average precision and coverage of transcription start sites. Based on the performance of the merged systems, we can show that in this case ChIP-seq peak finding can be improved by applying score or rank combination. This approach of system combination and fusion analysis can be of benefit to other applications in bioinformatics where scoring algorithms are utilized; for example, an experiment where genes, proteins, or coding regions are scored by a variety of algorithms.
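
A minimal sketch of score combination and rank combination for two scoring systems is given below. It assumes the regions from the two peak finders have already been merged into a common set of items and that the scores are on comparable scales; the averaging rule, the function names, and the example values are assumptions for illustration and are not taken from the talk.

```python
def ranks(scores):
    """Map each item to its rank, where rank 1 is the highest score."""
    ordered = sorted(scores, key=lambda item: -scores[item])
    return {item: position for position, item in enumerate(ordered, start=1)}


def score_combination(a, b):
    """Average the two systems' scores per item (higher is better)."""
    return {item: (a[item] + b[item]) / 2 for item in a}


def rank_combination(a, b):
    """Average the two systems' ranks per item (lower is better)."""
    ra, rb = ranks(a), ranks(b)
    return {item: (ra[item] + rb[item]) / 2 for item in a}


if __name__ == "__main__":
    # Hypothetical scores assigned to three merged regions by two systems.
    system_a = {"region1": 0.9, "region2": 0.5, "region3": 0.1}
    system_b = {"region1": 0.2, "region2": 0.8, "region3": 0.4}
    print(score_combination(system_a, system_b))
    print(rank_combination(system_a, system_b))
```

The two combinations can order items differently, which is why both are worth evaluating against a measure such as average precision.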

Bio:

Christina Schweikert is a Clare Boothe Luce Assistant Professor of Computer Science at St. John's University in the Division of Computer Science, Mathematics, and Science. Dr. Schweikert completed her undergraduate degrees in Computer Science and General Science at Fordham University, her M.S. in Computer Science at New York Institute of Technology, and her Ph.D. in Computer Science at the Graduate Center of the City University of New York. Her research interests include programming languages, bioinformatics, and medical informatics. Dr. Schweikert has also taught at Fordham University, the State University of New York, and the City University of New York. She serves as the Assistant Vice President and Webmaster for the Global Business and Technology Association, as well as Information Technology track chair for the association's annual international conference. Dr. Schweikert is a member of the Association for Computing Machinery (ACM) and the IEEE Computer Society and Computational Intelligence Society.


Alexis Tsoukiàs, CNRS - LAMSADE, Université Paris Dauphine

Title: Social Choice Inspired Ordinal Measurement

The presence of uncertain and/or imprecise information often leads to the use of ordinal measures when some complex reality needs to be represented (typically for some decision making purposes). This is all the more true when different sources of information need to be taken into account: the ordinal scale then results from putting together many different attributes (in turn potentially ordinally measured). In the presentation we discuss how social choice inspired procedures can be used to construct "multidimensional ordinal measures" generalising well-known rating methods. We also review how the different parameters of such procedures can be learned through examples.

Bio:

Alexis Tsoukiàs (Greece, 1959) is a CNRS research director at LAMSADE, Université Paris Dauphine. He holds a PhD (1989) in Computer Science and Systems Engineering from Politecnico di Torino (Italy), where he also completed his engineering studies. His research interests include multiple criteria decision analysis, non-conventional preference modelling, policy analytics, applied non-classical logics, ordinal mathematical programming, artificial intelligence and decision theory. He is the co-author of two books and more than 70 journal articles and book contributions. He has been vice-president of ROADEF (the French OR society) as well as President of EURO (the European association of OR societies). Beginning in 2007 he was coordinator of the European COST Action IC0602, Algorithmic Decision Theory, funded within the FP7. He has served the Research Administration in several positions, the latest being the National Committee of the CNRS (elected member since 2008). Presently he is the Director of LAMSADE. Besides teaching several post-graduate classes in Paris and worldwide, he occasionally practices decision support, mainly in the area of public policy. He has been invited to more than 30 universities worldwide. He is a member of several editorial boards, and has edited special issues of journals and conference volumes. He has been involved in the Programme and Organisation Committees of several conferences in Decision Analysis, OR and AI. Personal web page: http://www.lamsade.dauphine.fr/~tsoukias.


Zhi-Hua Zhou, Nanjing University

Title: Multi-View Learning, Ensemble Methods and Diversity

There are many real-world tasks where the data have multiple views, i.e., multiple feature sets, and each instance is described by multiple feature vectors in different feature spaces simultaneously. Rather than simply concatenating the features into a single feature vector, multi-view learning explicitly exploits the view split. Such techniques are particularly effective in learning with unlabeled data (i.e., semi-supervised learning, active learning). Ensemble learning is another branch of machine learning, where multiple learners are trained to solve the same task. It is well known that an ensemble is usually significantly more accurate than a single learner. Representative ensemble methods include Boosting and Bagging. In this talk, we will briefly introduce some recent advances in multi-view learning, showing that the key to this learning paradigm does not really lie in the existence of multiple views (although we can do more if there really are multiple views); rather, "diversity" plays a fundamental role. It is interesting to note that "diversity" is also well accepted as the key to ensemble methods. Diversity thus connects two active machine learning branches, "learning with unlabeled data" and "ensemble learning", which have developed almost separately because of different research philosophies, although both try to improve generalization. We will also briefly introduce some recent explorations of diversity from an information-theoretic perspective.

Bio:

Zhi-Hua Zhou is a Cheung Kong professor at Nanjing University. His research interests mainly include machine learning, data mining, pattern recognition and multimedia information retrieval. He has published more than 100 papers, authored the book "Ensemble Methods: Foundations and Algorithms" (2012), and holds 12 patents. He is the recipient of the IEEE CIS Outstanding Early Career Award, the Fok Ying Tung Young Professorship Award, the Microsoft Young Professorship Award, and various other awards including nine international journal/conference paper or competition awards. He serves or has served as Executive Editor-in-Chief of "Frontiers of Computer Science" (Springer), and as Associate Editor or Editorial Board member of "ACM TIST", "IEEE TKDE" and many other journals. He is the Founder of ACML, and a Steering Committee member of PAKDD and PRICAI. He has served as Area Chair or PC member for almost all top conferences in his areas. He is the Chair of the AI&PR Technical Committee of the China Computer Federation, Chair of the Machine Learning Technical Committee of the China Association of AI, Vice Chair of the Data Mining Technical Committee of the IEEE Computational Intelligence Society, and Chair of the IEEE Computer Society Nanjing Chapter. He is a Fellow of the IAPR and a Fellow of the IEEE.


Document last modified on September 10, 2013.