Title: On the expressiveness of comparison queries.

Speaker: **Shay Moran**, IAS

Date: Wednesday, January 24, 2018 11:00am-12:00pm

Location: CoRE Bldg, Room 301, Rutgers University, Busch Campus, Piscataway, NJ

Abstract:

Comparisons are a classical and well studied algorithmic tool that is used in a variety of contexts and applications.

I will present three examples that manifest the expressiveness of these queries in information theory, machine learning, and complexity theory.

**20 (simple) questions.** A basic combinatorial interpretation of Shannon's entropy function is via the "20 questions" game. This cooperative game is played by two players, Alice and Bob: Alice picks a distribution π over the numbers {1,...,n}, and announces it to Bob. She then chooses a number x according to π , and Bob attempts to identify x using as few Yes/No queries as possible, on average. An optimal strategy for the "20 questions" game is given by a Huffman code for π: Bob's questions reveal the codeword for x bit by bit. This strategy finds x using fewer than H(π)+1 questions on average. However, the questions asked by Bob could be arbitrary. We investigate the following question: Are there restricted sets of questions that match the performance of Huffman codes? We show that for every distribution π, Bob has a strategy that uses only questions of the form "x<c?" and "x=c?", and uncovers x using at most H(π)+1 questions on average, matching the performance of Huffman codes in this sense.

**Active classification with comparison queries.** This part concerns an extension of active learning in which the learning algorithm may ask the annotator to compare the distances of two examples from the boundary of their label-class. For example, in a recommendation system application (say for restaurants), the annotator may be asked whether she liked or disliked a specific restaurant (a label query); or which one of two restaurants did she like more (a comparison query). We focus on the class of half-spaces, and show that under natural assumptions, such as large margin or bounded bit-description of the input examples, it is possible to reveal all the labels of a sample of size n using approximately O(logn) queries.

**Nearly optimal linear decision trees for k-SUM and related problems.** We use the above active classification framework to construct near optimal linear decision trees for a variety of decision problems in combinatorics and discrete geometry. For example, for any constant k, we construct linear decision trees that solve the k-SUM problem on n elements using O(n log n) linear queries. Moreover, the queries we use are comparison queries, which compare the sums of two k-subsets; when viewed as linear queries, comparison queries are 2k-sparse and have only {−1,+1} coefficients. We give similar constructions for sorting sumsets A+B and for solving the SUBSET-SUM problem, both with optimal number of queries, up to poly-logarithmic terms.

Based on joint works with Yuval Dagan, Yuval Filmus, Ariel Gabizon, Daniel Kane, Shachar Lovett, and Jiapeng Zhang.

See: https://sites.google.com/view/dimacs-theory-seminar/home