# Honors Research Projects Options

## 2023 Research Projects

### Please select your top three project choices by using the drop down boxes provided at the bottom of the page. The projects are listed by number. Do not select the same project more than once.

### The group size for each project will typically be 3 or 4 students and/or counselors, except for Project 16 where any number of students and/or counselors can sign up.

**1. Alex White, “Introduction to Multivariate Data Science”**

This project/course introduces multivariate statistical methods. The topics include data visualization, data wrangling, basic concepts of statistical inference, statistical methodologies for simple and multiple regression, assessment of model fit, and criteria for selection of optimal models. Working in teams of 3-4, students will “wrangle” and analyze moderately large data and messy data sets to model multivariate relationships. Possible data sets include COVID 19, video analysis in educational settings, Spotify User data and others. We will use free statistical software R and R Studio.

R Studio is a user interface built for R, so it requires R to be installed. R is available online at *www.r-project.org*, and R Studio Desktop (Open Source License) can be downloaded from *http://www.rstudio.com/products/rstudio/download*. No prior knowledge of R is required.

**2. Anton Dochtermann, “Groups and graphs”**

Graphs are discrete objects that model connectivity in a network, and groups are algebraic objects with a binary operation that is `invertible'. What happens when we combine the two? A natural instance of this comes from the automorphism group of a graph, where self maps of a graph can be `adjacent'. Inspired by this, as well as fancy notions from topology like Lie groups and G-bundles, in this project we will study sets that can be given both a graph and group structure in a compatible way. Students should be familiar with graphs and maybe some basic group theory (although this is not required), and coding skills are encouraged.

*Remote mentoring

**3. Anton Dochtermann, “Reconfiguration in digraphs”**

A directed graph (digraph) G is a collection of vertices with (directed) edges. A digraph homomorphism G --> H is a function of the vertex set that preserves the edges, generalizing the notion of a `coloring' of a graph. Given two digraphs G and H the set of all digraph homomorphisms can itself be given a graph structure, allowing us to 'reconfigure' one homomorphism to another. But what does this graph look like? For example if H is acyclic (has no directed cycles) then this space is always connected (or empty) no matter what G is. In this project we will study reconfiguration graphs for other classes of digraphs. Students should be familiar with graphs, and coding skills are encouraged.

* Remote mentoring

**4. Anton Dochtermann, “Rota's matroid theory conjectures"**

A famous conjecture in `matroid theory' due to Gian-Carlo Rota says that one should be able to rearrange the elements in any given collection of `bases' in a certain way. Translated to the special case of graph theory, the conjecture says that if the edges of a (multi)graph with n vertices are colored with n-1 colors so that each color class produces a spanning tree, then one should be able to find n-1 `rainbow spanning trees'. In this project we will study Rota's conjecture for special cases of graphs (eg paths and stars), and perhaps learn something about matroids along the way. Students should be familiar with graphs, and coding skills are encouraged.

* Remote mentoring

**5. ****Dan Tamir, “Combining Perception Considerations with Artificial Intelligence to Combat Bot User Interface”**

In this project, we will synthesize and analyze bot interface patterns mainly in the form of mouse and keyboard activities (e.g., keyboard clicks, mouse-path, and mouse traverse trajectories) performed by bots during interface with app-widgets. We will compare and contrast these patterns with human patterns of interaction with app-widgets. For example, we will analyze a typical trajectory of a mouse used by human for interacting with an online store versus a typical pattern of mouse trajectories performed by a bot attempting to interact with the online store. We will study the state of the art of AI inference, concentrating on Automated Reasoning, State Space Search Techniques, Markov Models, and Neural Networks. Additionally, we will study the state of the art in Perception-based Reasoning. Next, we will apply these methods to automatically generate human-like patterns of interaction and automatically detect bots’ patterns of interaction.

**6. Dan Tamir, “Investigating the Art Gallery Problem with Mobile Guards in Discrete Coordinates Systems”**

The Art Gallery Problem is a theoretical framework that can be used to identify the minimal (or sufficient) number of static and or dynamic guards required to protect an art gallery. Practical applications of the theory include designing security systems that deploy static guards and camera systems as well as mobile guards and drawn-based camera systems for safeguarding buildings, ports, campuses, and camps. Other applications include identifying collision free routes for mobile robots. The problem has been intensively investigated for galleries represented by polygons in continuous coordinate systems. The ramifications of addressing the problem in discrete coordinate systems, under the hypothesis that this constraint enables reducing the number of guards, has not been thoroughly explored. In this project we will explore algorithms for finding the minimal/sufficient number of guards in art galleries represented by polygons in discrete coordinate systems. Specifically, we will investigate modifying existing algorithms for continuous coordinates systems to fit discrete coordinate systems.

**7. ****Dan Tamir, “Speech Recognition Using Generative AI Bots.”**

In this project we will use generative AI drawing utilities such as Midjourney and DrawGPT to obtain images of spectral and or temporal representation of basic speech units (e.g., phonemes) and their features. Next, we will use the generated images to train a Neural Network such as Transformer to recognize these basic units.

**8. Dan Tamir, “Anomaly Detection in Civil Engineering Structures Using a Generative AI Bots.”**

In this project we will use generative AI drawing utilities such as Midjourney and DrawGPT to obtain images of spectral and or temporal representation of the output of sensors in sensor data networks mounted on civil engineering structures such as buildings and bridges. Next, we will use the generated images to train a Neural Network such as Transformer to detect structure anomalies via the sensed data.

**9. Ivan Ojeda-Ruiz, “Analysis of a spectrogram using matrix decomposition”**

A spectrogram is a form of representing sounds in a matrix. This data can be studied with various matrix decompositions, that is, Eigendecomposition, SVD, QR, and Cholesky. There are many applications such as music, linguistics, sonar, radar, speech processing, seismology, electrocardiograms (ECG), etc. We will dedicate our time to studying a dataset of this type and analyzing the low rank reconstruction of these decompositions.

**10. Ivan Ojeda-Ruiz, “Analysis of Texture Description for Image Segmentation”**

Segmentation of an image is usually performed by looking at the intensity of pixels to describe their relationship. Texture is seldom taken into account when performing this task. We shall study how to generate affinity matrices that describe the texture of an image so that segmentation algorithms can detect this feature in images. The time dedicated to this project will revolve around describing the so-called intervening contours and multiscale aggregation.

**11. Ivan Ojeda-Ruiz, “Singular Value Decomposition to measure 3D grain sizes in Crystals”**

Grain shapes become irregular as Crystals grow. As crystals become larger, their microstructures compete for growth space. This makes it difficult to measure grain sizes accurately. A common method to compute crystallization rates it to use the so-called crystal size distribution(CSD). A recent, novel approach to achieve this, is to use an algorithm developed by Nima Moshtagh that employs the Khachiyan Algorithm along with the SVD to obtain an ellipsoid that encloses an arbitrary number of points (which is defined by the user). Common shapes of crystals that have been studied with this algorithm are a prism, a plate, and a cuboid. We shall dedicate our time to study some unorthodox shapes

**12. ****Ivan Ojeda-Ruiz, “Tackling Biases in Unsupervised Machine Learning Methods”**

The PageRank algorithm is a well-known method to perform Web search but unfortunately, it has its biases. These biases can cause inequality and/or inequity in social networks. The PageRank problem uses the largest left eigenvalues of the Google Matrix to find its optimal solution. We seek to change what the optimal solution is so that it improves fairness. In order to do this, we shall construct a constraint matrix that will take a priori information to avoid certain situations in the optimal solution.

Homophily and Heterophily are very good predictors of the inequity in these algorithms. These properties of the network are not usually easy to control so we will develop a method to scrutinize the results of these algorithms and provide improved fairness in the result. If time permits we shall do a similar study on the Who-to-Follow (WTF) algorithm.

**13. ****Lucas Rusnak, “Signed Data Analytics”**

Oriented hypergraphs and signed graphs provide a method to translate matrix-based data analysis to localized sentiment analysis. We will examine the combinatorial and algebraic differences between ordinary graphs, signed graphs, and oriented hypergraphs and difficulties in achieving consensus-driven outcomes.

This project has applied, theoretical, and algorithmic aspects that will be discussed by the group as we narrow our focus on topics of greatest impact and personal interest. Applied problems may include (1) analyzing election patterns at the county level, and building a greater resolution model through ArcGis; (2) study resilience patterns of decision-based data sets (of your choice) and introduce new metrics to quantify data anomalies; and (3) examine the trends of eigenvalues as negative edges deteriorate spectral clustering methods and provide a new clustering algorithm. Theoretical problems, broadly, will involve examining how sentiment analysis can be used to re-interpret famous un-proven conjectures. Algorithmic aspects will be discussed based on the team's familiarity with coding, but can include anything from streamlining existing code, to simple data scraping and merging, to algorithmic decision-making.

Experience in one of the areas of linear algebra, graph theory, combinatorics, statistics, or programming (C++, Python, R) is highly beneficial.

**14. Rodion Podorozhny, “Verification of concurrent mission critical control systems”**

Systems with high penalty for failure are considered mission critical systems. It is important to verify them against safety properties before putting them in operation. A common example of such a property is a safety property that, say, an elevator control system does not allow an elevator to move with its doors open.

Concurrent systems have multiple threads of control. A great deal of modern control systems are concurrent. Concurrency potentially introduces a kind of errors due to different speeds of thread execution. They are hard to detect. We will use program analysis methods that are capable of detecting such errors.

In this project we will apply and evaluate advanced program analysis techniques to concurrent mission critical systems such as a cruise control system or distributed drone task planning system.

*Knowledge of Java is helpful

**15. ****Rodion Podorozhny, “Automatic proof of software systems properties.”**

In this project we will apply and evaluate an automatic prover (Alloy), for verification of software systems implementing complex data structures. Alloy is a satisfiability-based automatic prover that uses a set-theoretic model extended with the first order logic for modeling a software system and its properties.

*Knowledge of Java is helpful

**16. Rodion Podorozhny, “Runtime verification of program slices”**

Reliable and efficient concurrent software systems are key to utilization of multiple core CPUs. There is already a noticeable number of existing concurrent implementations of basic data structures. It is important to verify that these implementations are reliable. The project will focus on methods of efficient verification of concurrent data structures by runtime verification, in particular combination of slicing (automatic reduction of the analyzed code relevant to the property verified) and model checking algorithms. We can use existing slicing tool by IBM, Wala and some existing model checking tool (SLAM by Microsoft, Java Pathfinder by NASA Ames Center etc.)

*Knowledge of Java is helpful

**17. Rodion Podorozhny, “Robotic distributed mapping and localization.”**

In this project we will design and evaluate a distributed version of Simultaneous Localization and Mapping algorithm on a pre-existing prototype

*Knowledge of Java is helpful

**18. ****Suho Oh, “Puzzles, trianguloids and flips”**

Ever played sliding puzzles? Imagine a triangular grid where we have triangle and rhombus tiles. A flip is a move where whenever you have a triangle adjacent to a rhombus, it 'flips' by merging them, and then separating them to a different pair of triangle + rhombus. Now there is an n-dimensional version of this puzzle coming from an important object in mathematics called the product of simplices. Galashin-Nenashev-Postnikov came up with an amazing and beautiful model describing the possible configurations in the n-dimensional version of this puzzle called 'trianguloids'. I have a conjecture that a flip is exactly rotating one cell in the trianguloid. We will attempt to confirm/prove this conjecture. A lot of computation and n-dimensional headaches are expected.

**19. Thomas Keller, “Prime graphs of finite groups”**

The subject of this project are prime graphs of finite groups with a focus on groups that are very close to solvable groups. There are several open questions on these graphs which are purely combinatorial in nature. In the realm of solvable groups, an important notion here is the one of minimal prime graphs. A simple graph is called a minimal prime graph if it is connected, has at least two vertices, and its complement is triangle-free and has chromatic number 3, and removing any edge results in a graph whose complement has a triangle or chromatic number 4. How can this notion be extended to non-solvable groups which are still close to solvable? Finding such extensions and studying their properties will be problems to be explored in this project.

**20. Wade Hindes, “Irreducible polynomials in dynamical semigroups”**

Let S be a a set of polynomials with rational coefficients and let M(S) be the set of all finite compositions of elements of S (i.e., the semigroup generated by S under composition). In this project, we will be interested in studying the following question: what properties of the original polynomials in S ensure that a large subset of the semigroup M(S) consists of irreducible polynomials (i.e., polynomials that cannot be factored non-trivially)? With a little thought, one can see that the generating set S should contain at least one irreducible polynomial to start, but is that enough?

We will begin by investigating this question for sets of the form

S = {x^{d_1}+c_1,…, x^{d_s}+c_s}

where the c_i are rational numbers and the d_i > 1. In particular, we hope to prove that if such S contain at least 2 irreducible polynomials in Q[x], then M(S) does indeed contain a large (and explicit) subset of new irreducible polynomials. To do this, we will use many modern tools from arithmetic geometry and arithmetic dynamics.

**21. XiaoXi Shen, “Estimation of Genetic Heritability”**

Genetic heritability measures the proportion of phenotypic variations explained by genetic variations. The advancement of genome-wide association studies have been used to estimate genetic heritability but provide underestimated results. This leads to a problem of missing heritability. This project involves proposing a novel approach in estimating genetic heritability based on kernel methods and linear mixed effect models, developing algorithms and R package to realize the proposed approach and compare it with the current state-of-the-art methods in estimating genetic heritability.

**22. ****Young Ju Lee, “Fast solver for Logistic regression”**

Logistic regression is a valuable tool for predicting the likelihood of an event. It helps determine the probabilities between two or more classes. A very simple example is an email spam folder, for which, the Logistic regression can classify emails as spam or regular, and thus it can direct them to their respective inboxes. In recent years, the Logistic regression has become an invaluable tool in the increasingly popular field of Machine Learning. In particular, for collecting and analyzing data, it is easier to implement, interpret, and train than other machine learning models, such as neural networks. Therefore, logistic regression can be considered to be a foundation of machine learning and it has been used in many areas, which include social sciences, such as Credit scoring, Medicine, Text editing, Hotel Booking, gaming, the acceptance rate for the school, just to cite a few. Logistic regression sets a basis for modern deep learning algorithms as well. On the other hand, it is well-known that logistic regression is difficult to solve. The most well-known solver is known as scikit learn, which is based on the Newton methods. However, since Newton method relies on the computation of the Hessian of the objective functional of the Logistic regression, it is very expensive and slow, in general. Thus, the timely research should be devoted to the development of a fast and efficient solver. In the summer project, we will propose to develop a fast solution technique for logistic regression based on conjugate gradient methods.

Prerequisite: Matlab programming skills and linear algebra. Students storngly encouraged to study Logistic regression before coming to join in the program in Summer.

**23. ****I choose not to participate in a research project**