2023 DIMACS REU Projects

DIMACS is very proud for the achievements of the 2023 DIMACS REU students during their summer research experience. This page includes the abstracts of the final papers that the students submitted to the program, summarizing their summer research.

Yakov Burton (Rutgers University-New Brunswick NJ)
Some studies on topics relevant to cohomology rings of flag varieties
Mentor: Anders Buch, Mathematics

Abstract: This project was focused on flag varieties and computations in cohomology rings. I studied symmetric polynomials and covered different basis for this family of functions, as well as computations in one of these basis, the set of Schur functions. I also studied root systems, Weyl Groups and the classification of irreducible root systems. Finally, I worked on some actual cohomology rings and how they connect to root systems and multiplication of Schur functions.

Iris Chang (Columbia University in the City of New York NY)
Predicting Correlation of Bagging Estimators in Logistic Regression
Mentor: Pierre Bellec, Statistics

Abstract: Logistic regression is used to make a prediction between two different outcomes based on a data set. Typically, when the sample size is much larger compared to a fixed dimension, maximum likelihood estimation is used to estimate the parameters of the model, in which case the estimate has many convenient properties such as being unbiased. It was shown by Sur and Candes (2018) as well as Salehi et al. (2019) that these assumptions do not hold in the high dimensional regime where the sample size and dimension are proportional. While these two groups were able to find adequate methods of characterizing model performance in high dimensions, there is an absence of work on the performance and impact of bagging on high dimensional logistic regression models. In our case, bagging refers to the method of dividing the larger data set into two or more overlapping subsets and fitting a logistic regression model on each subset before aggregating the parts together. This work aims to show a single scalar that would be able to predict the correlation between the two unaggregated estimates. Drawing from previous results in linear regression and the unbagged setting, we were able to successfully infer this result.

Jordan Chong (New York University NY) & Matt Lu (Washington University in St Louis MO)
Truth Learning in a Social Setting
Mentor: Jie Gao, Computer Science

Abstract: We are interested in designing social networks which support the spreading of true information while limiting the ability for disinformation to spread. Having access more information doesn't always lead to discovering the truth as phenomena known as "information cascades" can lead to herd mentality propagating false information. Representing social networks with graphs, we studied two areas: high degree and low degree graphs. High degree graphs have the ability to quickly aggregate information and decipher truth into a few high value nodes with many neighbors. One example we studied was the preferential attachment model by Albert-Barabasi. In low or constant degree graphs, a more nuanced approach is needed to allow truth learning to succeed. We studied two examples of these: an adjusted Connected Caveman graph and a grid structure. Moreover, in large-scale networks such as news portals or content-sharing platforms, having a solid grasp of the underlying network structure and ordering helps optimize the process of dissemination and improve the accuracy of information spreading. In modern society, misinformation is a pervasive issue that can have large societal consequences. As we strive to combat the spread of false information, studying the ways information spreads in a variety of networks becomes crucial.

Ondrej Chwiedziuk (Charles University (Prague, Czech Republic)) & Tomas Cizek (Charles University (Prague, Czech Republic)) & Volodymyr Kuznietsov (Charles University (Prague, Czech Republic)) & Ondrej Sladky (Charles University (Prague, Czech Republic))
Division Game
Mentor: Bhargav Narayanan, Mathematics

Abstract: We examined a combinatorial game called Division game. There are 2n coins and two players, Alice and Bob, who want to split the coins so that each player has exactly n coins. In the first turn, Alice picks a coin and Bob decides who gets it. Then, the roles swap and the game continues until one of the players gets n coins. The other player then obtains the remaining coins and the player with the larger sum of coins wins the game. We conjecture that Bob always has a non-losing strategy which we proved for a small number of coins and for special classes of games.

Barbora Dohnalova (Charles University (Prague, Czech Republic)) & Jiri Kalvoda (Charles University (Prague, Czech Republic)) & Gaurav Kucheriya (Charles University (Prague, Czech Republic))
KT Orientation in Graphs
Mentor: Sophie Spirkl, Department of Combinatorics and Optimization, University of Waterloo

Abstract: In this article, we study the problem of classifying graphs that admit `KT orientations'. A~directed graph $G$ is said to have a KT orientation if there is at most one directed path between any pair of vertices of $G$. We extend the known examples of classes of graphs that admit a KT orientation and those graph families which do not. We show that the problem of determining whether a given graph admits a KT orientation is NP-complete, in particular this also applies if we restrict ourselves to planar graphs. Moreover we provide an algorithm to decide if a digraph with max-degree at most 3 has a KT orientation, while for graphs with max-degree $4$, this remains NP-complete. Finally we construct a graph family with small independence number (sub-linear in the number of vertices), and thus has unbounded fractional chromatic number, that admits a KT orientation.

Colton Fitzjarrald (University of Missouri-St Louis MO)
Uncovering the process of ribosome formation
Mentor: Wilma Olson, Chemistry and Chemical Biology

Abstract: The ribosome is responsible for creating proteins. Understanding how the ribosome itself folds into its final 3-dimensional structure is critical to deepening our understanding of the mechanisms behind proteins, and thus life itself. There is much research that has been done in understanding the 3-dimensional structure of ribosomes, however many of these studies have fallen short in giving an intuitive understanding of the 3D structure. In this project, we used computational models to simulate large subunit ribosomal RNA of Escherichia Coli.

Guillermo Gamboa (Charles University (Prague, Czech Republic)) & Josef Matejka (Charles University (Prague, Czech Republic)) & Jakub Petr (Charles University (Prague, Czech Republic))
On Steiner Trees of the Regular Simplex
Mentor: Karthik Srikanta, Computer Science

Abstract: In this project, we studied Steiner minimal trees for the points given by the corners of the regular simplex. We provide an explicit formula for the coordinates of the Steiner points of a conjectured Steiner minimal tree for simplexes of dimension 2k-1. Also, we explore the function APTC(T), a notion used to characterize trees with respect to the sum of all distances between the terminals, and proved that the conjectured topology for Steiner minimal trees for the regular simplex minimizes this function over all tree with n terminals and n-2 interior points.

Alexandro Garces (Massachusetts Institute of Technology MA)
Tight Bounds for Approximate Minimum Cuts in Insertion-Only Streaming Model
Mentor: Sepehr Assadi, Computer Science

Abstract: Finding the minimum cut of a graph has been studied extensively on its own as a fundamental graph problem and as a tool for understanding other important problems such as connectivity and network flow. We studied the minimum-cut problem in the insertion-only semi-streaming model, where edges of G are received one at a time in an input stream and space usage is confined to Õ(n) = O(n poly log n) bits. As such, we can't afford to store the entire graph at once, so we are forced to compute an approximation to the minimum cut. We present improved upper bound and lower bound results for the (1+ε)- approximation to the minimum cut problem in the insertion-only semi-streaming model. We achieve new results that are optimal in their dependence on ε.

Lily Gebhart (Occidental College CA)
Adjustments for Kurtosis and Continuity on the Prentice Test
Mentor: John Kolassa, Statistics and Biostatistics

Abstract: Nonparametric statistics is a sub-field of statistics involving minimal assumptions about the distribution of data, making it applicable to the analysis of real-world phenomena. The test of Prentice is a non-parametric statistical test for the two-way analysis of variance using ranks. The null distribution of this test is approximated using the Chi-square distribution. However, the exact null distribution deviates from the Chi-square approximation in certain cases commonly found in applications, motivating adjustments to the distribution. This summer, we presented adjustments to this null distribution, and that of related tests with non-polynomial scoring systems, correcting for continuity, skewness, and kurtosis in the multivariate case.

Marcus Gozon (University of Michigan-Ann Arbor MI)
Grid Multi-Robot Path Planning with the Corner-Following Constraint: Intractability and Algorithms
Mentor: Jingjin Yu, Computer Science

Abstract: Grid Multi-Robot Path Planning (MRPP) has been extensively studied, but much less is known about the corner-following constraint (CFC) variant, which has applications ranging from automated garages to grocery packing warehouses. On an m₁ x m₂ grid, the problem has a tight upper and lower bound for the makespan objective when there are Θ(m₁m₂) escorts, but there is a gap when there are only Θ(m₁+m₂) or Θ(1) escorts. In addition, nothing is known about the intractability of this problem or its related variants. In this work, we close the gap for an arbitrary number of escorts, finding an expected constant factor algorithm when there are at most min(m₁,m₂) escorts and a high probability constant factor algorithm when there are more. We also show that the problem is NP-hard under the makespan objective, and by the same reduction, standard grid MRPP is NP-hard also, unifying the two grid MRPP makespan hardness results. In addition, we show that when there is a single escort, the two-colored variant and partial variant are both NP-hard, which significantly impacts the design of efficient systems using the MRPP with CFC paradigm. These results contribute to our understanding of the design of efficient algorithms from both an upper and lower bound perspective.

Mattie Ji (Brown University RI)
Simplifying explicit equations of fake projective planes
Mentor: Lev Borisov, Mathematics

Abstract: A fake projective plane is a complex surface with the same Betti numbers as CP² but not biholomorphic to it. We studied the fake projective plane P²_fake = (a = 7, p = 2, ∅, D₃2₇) in the Cartwright-Steger classification. In this summer, we exploit the large symmetries given by Aut(P²_fake) = C₃xC₇ to construct a simpler embedding of this surface into CP⁵ as a system of 56 sextics with coefficients in Q(\sqrt{-7}). For each torsion line bundle T ∈ Pic(P²_fake), we also compute and study the linear systems |nH + T| with small n, where H is an ample generator of the Neron-Severi group. Finally, we constructed explicit equations of the fake projective plane $(a = 7, p = 2, ∅, D₃X₇), a closely related fake projective plane to P²_fake.

Kyrylo Karlov (Charles University (Prague, Czech Republic)) & Ashwin Padaki (Columbia University in the City of New York NY) & Stepan Zharkov (Stanford University CA)
Inapproximability of Minimum Diameter Clustering for Few Clusters
Mentor: Karthik Srikanta, Computer Science

Abstract: We consider the problem of approximate clustering with the diameter objective function. When the number of clusters is allowed to be unbounded, the inapproximability ratio is fairly well understood. However, this problem has not been well-studied when the number of clusters is constant. In this work, we show improved hardness bounds when there are a fixed number of clusters. Specifically, we show that the problem is hard to approximate within a factor of 1.5 in Hamming space, and within 1.304 in Euclidean space. Additionally, we prove several barrier results to known methods of proving hardness, and we provide a polynomial time algorithm for the problem when the number of dimensions is constant.

Sahil Kuchlous (Harvard University MA)
Algorithms for Streaming Tournaments
Mentor: Prantar Ghosh, DIMACS

Abstract: We present the first (non-trivial) Ω(n²) lower bound on streaming algorithms for tournaments on n vertices, demonstrating that (exactly) solving the minimum feedback arc set (FAS) problem on tournaments is hard. This complements recent work on upper bounds for the (approximate) FAS problem on tournaments, and shows that even though acyclicity testing can be solved in near-linear memory on tournaments, some important related questions remain hard. We also investigate a number of fundamental graph problems on tournaments: we prove new upper and lower bounds for s-t distance and reachability, and settle the streaming complexity of acyclicity testing. Finally, we settle the streaming complexity of sink finding in general directed graphs.

George Li (University of Maryland-College Park MD)
Exact planar emulators
Mentor: Zihan Tan, DIMACS

Abstract: We study vertex sparsification for distances in planar graphs: given a weighted planar graph G and a subset of k terminal vertices, the goal is to construct an emulator, which is a smaller (weighted) planar graph G' that contains the terminals and exactly preserves pairwise distances between the terminals. We construct exact planar emulators of size O(k^2) in the special where all terminals lie on exactly 2 faces.

Molly MacDonald (University of Notre Dame IN)
Sorting Probability for Linear Extensions
Mentor: Swee Hong Chan, Mathematics

Abstract: The 1/3?2/3 Conjecture states that in every finite partially ordered set that is not totally ordered, there exists a pair of elements x and y with the property that at least 1/3 and at most 2/3 of the linear extensions of the partial order place x earlier than y. This conjecture has become one of the more prominent conjectures in set theory, and while no one has succeeded to prove such a result for all partially ordered sets, there are partial results for posets with certain properties. Our goal this summer was to prove that this conjecture holds for a specific set P_3,n. which consists of {x₁, y₁, z₁, x₂, y₂, z₂, ..., x_n, y_n, z_n} where x_i≤x_i+1, y_i≤y_i+1, z_i≤z_i+1, and x_i≤y_i≤z_i, for all i.

Elm Markert (Smith College MA)
Genomic Data-Guided Computational Modeling of Cancer
Mentor: Subhajyoti De, Rutgers Cancer Institute of New Jersey

Abstract: Tumors grow from somatic cells inside the human body, often undetected without any major symptoms. But tumors shed cell-free tumor DNA (cfDNA), as well as proteins, intact tumor cells, and other molecules into the blood and other bodily fluids. As cfDNA travels through the bloodstream, it is degraded by various nucleases, salt, etc. such that when blood is drawn from patients for liquid biopsy for diagnostic purposes, the cfDNA has been fragmented into smaller pieces. The methylation, length, and molecular signatures of these cfDNA fragments have the ability to provide information about the location and type of cancer. However, the methods through which different types of cfDNA are degraded remain a mystery, and there is currently no broad, all-encompassing cancer diagnostic tool that utilizes this data. We utilize a mathematical technique called non-negative matrix factorization (NMF) to identify predominant characteristics of end-sequences of cfDNA from cancer patients with multiple types of cancer.

Martha-Victoria Parizot (Harvey Mudd College CA)
Visual Utility of Differentially Private Scatterplots under US Census data
Mentor: Anand Sarwate, Electrical and Computer Engineering

Abstract: In data-driven research fields like healthcare and technology, access to sensitive individual information is crucial for generating valuable insights. However, privacy regulations pose challenges for publishing such data, hindering researchers' access. Differential privacy has emerged as a promising solution, allowing researchers to learn from sensitive data while protecting individual privacy. This research focuses on generating differentially private scatterplots, a common data visualization tool, while retaining visual utility. The approach combines strategies of partitioning data, adding calibrated noise, and post-processing to suppress noise. The study explores factors like ε (privacy level), algorithms, bin size, data distribution, and sample size to understand their impact on visual utility. Two small datasets are used, derived from the US Census and World Health Organization, with direct application for Differential Privacy. The results highlight the trade-offs between privacy and visualization integrity, aiding researchers in making informed decisions to balance privacy and data visualization accuracy. Future directions include exploring more sophisticated point regeneration techniques and integrating differentially private heatmaps techniques to improve visual accuracy

Jay Patwardhan (Rutgers University-New Brunswick NJ) & Zheheng Xiao (Columbia University in the City of New York NY)
Concordance Invariants of Satellite Knots
Mentor: Kristen Hendricks, Mathematics

Abstract: Satellite knots are a method of producing new knots from existing ones, utilizing a pattern knot P inside a solid torus and a companion knot K. Two knots are said to be concordant if they form the boundary of an embedded annulus in S³ x [0,1]. P. Ozsvath and Z. Szabo defined an invariant of the concordance class of a knot, called the tau-invariant, and in 2014, A. Levine computed tau for satellite knots with Mazur patterns, which loop twice clockwise around the torus before turning around and looping once counterclockwise and joining back. We determined a formula for tau for generalized Mazur patterns with m clockwise turns and n counterclockwise turns by computing the box tensor product of the CFA- and CFD modules associated to the pattern and exterior of the companion knot, respectively.

Ethan Regal (Gannon University PA)
Developing a Data Driven Method to Predict Overheating in Powder Bed Fusion
Mentor: Weihong 'Grace' Guo, Industrial and Systems Engineering

Abstract: Powder Bed Fusion (PBF) is an extremely lucrative additive manufacturing technique that allows for highly customizable parts, increased resource efficiency, and reduced cost. However, PBF is prone to overheating which results in part defects. One possible solution to this deficiency is to employ a machine learning based, data driven method to predict overheating before it occurs. Such a method must predict process behaviors with high accuracy, provide explanations and clarification for the prediction, and obey physics principles. In this paper, methods of meeting these three conditions are explored, and a resulting technique is proposed.

Jemma Schroder (Massachusetts Institute of Technology MA)
Lagrangian Fillings of (CP¹)ⁿ
Mentor: Christopher Woodward, Mathematics

Abstract: Given a Legendrian submanifold Λ of a circle-fibred contact manifold Z over a symplectic base, one can ask whether Λ admits a Lagrangian filling L. In this paper, we disprove the existence of such a filling for the Legendrian given by the n-fold power of the Riemman Sphere CP¹. Wehrheim and Woodward proved a dimension requirement of the image of H(Λ) in H(L). We use this obstruction and a combinatorial count of relations between elements in H₁(Λ) and H_n-1(Λ) induced by the filling to show an obstruction for fillings of (CP¹) ⁿ for n>2.

Daniella Shomshonov (SUNY at Binghamton NY)
Prevalence and Associations of ARID2 Mutations in Melanoma
Mentor: Jian Cao, Rutgers Cancer Institute of New Jersey

Abstract: In this study we focused on the prevalence of ARID2 mutations in publicly available datasets and their association with known melanoma genetic features. Leveraging publicly available data from cBioPortal, we conducted comprehensive mutation analysis and statistical tests to explore the associations between ARID2 mutations and other melanoma-associated mutations, including BRAF, NF1, NRAS, KRAS, and HRAS. Our findings shed light on the potential functional significance of ARID2 mutations in melanoma biology

Russell Stetson (Rutgers University-New Brunswick NJ)
Characterizing LEF and Sofic Groups
Mentor: Simon Thomas, Mathematics

Abstract:

For each positive integer n, let S_n be the symmetric group on { 1,2, ..., n }. Then a group G is said to be locally embeddable into finite groups if for every finite subset F ⊂ G, there exists an injection φ : F → S_n for some n ≥ 1 such that whenever g, h, gh ∈ F, then φ(gh) = φ(g)φ(h). In this case, we say that G is an LEF group.

In the group theoretic literature, LEF groups are usually characterized in terms of embeddings into ultraproducts of finite symmetric groups. It is natural to ask whether there is a characterization in terms of the more concrete notion of a reduced product of finite symmetric groups. In more detail, let P = ∏_{n ≥ 1} S_n be the full direct product and let N be the normal subgroup of elements (π_n) ∈ P such that π_n = 1 for all but finitely many positive integers n. Then the reduced product is the quotient P₀ = P/N.

We have shown that it is neither provable nor disprovable using the classical ZFC axioms of set theory that if G is a group such that |G| ≤ 2^ℵ₀, then G is an LEF group if and only if G embeds into P₀. We have obtained results concerning characterizations in terms of nonprincipal ultrafilters. Certain subtle questions about the nature of the independence remain. We have obtained analogous results for sofic groups.

Sumi Vora (Pomona College CA)
Existence of the MLE in High-Dimensional Multinomial Logistic Regression
Mentor: Pierre Bellec, Statistics

Abstract: Logistic regression is a supervised classification algorithm that predicts the probability of an event given a set of features. This technique relies on maximum likelihood estimation (MLE) to estimate the regression parameters for the log-odds of the observed data. The MLE, however, does not always exist. In particular, if there exists a linear decision boundary separating the classes, then the MLE will not converge. In 2018, Candes and Sur proved a theoretical phase transition curve for the existence of the MLE in binary logistic regression parametrized by the ratio of features to number of observations p/n and the norm of the regression coefficients. In our project, we are interested in finding an empirical and theoretical phase transition curve for the existence of the MLE in unregularized multinomial regression for K classes (K>2).

Return to REU Home Page
DIMACS Home Page
DIMACS Contact Information
Page last modified on August 22, 2023.