insilicoSV: a framework for structural variant simulation - [GitHub]
A main focus of my work at the Broad Institute has been the extension of the lab's structural variant calling framework [described below] to complex variants. To create the datasets needed to develop this model, I've been working on a genome simulator that is fully customizable in the types of variants and genome contexts it can simulate. This has been perfect for our need to design training and evaluation genomes containing known variants associated with different genome elements. While it was initially intended as a utility for this complex variant work, it has ultimately grown to a scale that has been useful for people working on a wide range of SV-related applications. The simulator has now been released for public use, and I hope it fills a necessary gap in the field of SV simulation with its broad support for simple, complex, and custom variant types.
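As a minimal, self-contained sketch of the core operation behind SV simulation, the example below applies deletions, inversions, and tandem duplications at chosen coordinates of a toy reference sequence. The function `simulate_svs` and its variant-tuple format are invented for illustration and are not insilicoSV's actual interface:

```python
def simulate_svs(reference, svs):
    """Apply structural variants, listed as (kind, start, end) tuples in
    0-based, end-exclusive reference coordinates, to a reference sequence.
    Variants are applied right-to-left so earlier coordinates stay valid."""
    seq = reference
    for kind, start, end in sorted(svs, key=lambda v: v[1], reverse=True):
        segment = seq[start:end]
        if kind == "DEL":            # deletion: drop the segment
            seq = seq[:start] + seq[end:]
        elif kind == "INV":          # inversion: reverse-complement in place
            comp = {"A": "T", "T": "A", "C": "G", "G": "C"}
            rc = "".join(comp[b] for b in reversed(segment))
            seq = seq[:start] + rc + seq[end:]
        elif kind == "DUP":          # tandem duplication: repeat the segment
            seq = seq[:start] + segment + segment + seq[end:]
    return seq

reference = "ACGTAACCGGTTACGT"
altered = simulate_svs(reference, [("DEL", 0, 4), ("INV", 8, 12)])
print(altered)  # AACCAACCACGT
```

A real simulator additionally has to emit the ground-truth records (e.g. a VCF) alongside the altered haplotype, which is what makes the simulated genomes usable as labeled training data.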
A deep learning framework for structural variant discovery and genotyping - [Nature Methods]
During my first year at the Broad Institute I contributed to the lab's first major publication, which proposes a deep learning structural variant caller. The problem of detecting mutations in the genome is widely approached with tools that implement a set of hand-crafted heuristics for what sequencing reads should look like under various mutations. Given the limitations of these rule-based systems and the paucity of validated structural variant datasets, this tool makes an important step toward shifting the field to data-driven approaches that are extensible to different kinds of events and sequencing technologies. For more information, here is the recording of a talk my supervisor and I gave about the framework at Broad's Models, Inference and Algorithms seminar.
Dynamical Systems Reduction in Experimental Brain Modeling - [GitHub]
Throughout my Master's I did research at the Institute for Brain Science at Brown studying methods of dynamical systems reduction in models of neural spiking behavior. Dimensionality reduction is especially important for modeling neural phenomena because brain activity is typically measured in such a high-dimensional space that simulating the dynamics of the full system can be impractical. To simplify the problem, researchers look for reduced systems that exhibit the same intrinsic dynamics observed in the data, and conduct simulations on the reduced model as a proxy for the original one. My work focused on dimensionality reduction techniques using recurrent neural networks, and became the subject of my capstone project for the program.
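To make the idea concrete, here is a small illustrative sketch (not taken from the capstone itself) in which a two-dimensional oscillation is embedded in a 100-dimensional "recording" and then recovered with PCA, the simplest technique in this family; the simulated data and all parameters are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent 2-D oscillator: a slow rotation sampled at 500 time points.
t = np.linspace(0, 8 * np.pi, 500)
latent = np.stack([np.sin(t), np.cos(t)], axis=1)        # shape (500, 2)

# Embed the latent trajectory in a 100-dimensional "recorded" space
# with a random linear readout plus observation noise.
readout = rng.normal(size=(2, 100))
activity = latent @ readout + 0.1 * rng.normal(size=(500, 100))

# PCA via SVD: the top principal components recover the intrinsic dynamics.
centered = activity - activity.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
explained = S**2 / np.sum(S**2)
reduced = centered @ Vt[:2].T                            # 2-D trajectory

# The top 2 PCs carry nearly all the variance of the 100-D recording.
print(round(float(explained[:2].sum()), 3))
```

PCA can only recover dynamics that sit in a linear subspace; the appeal of RNN-based reduction is capturing the nonlinear case where the latent dynamics do not.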
Neural Network Parameter Reduction Using Pruning and Matrix Decomposition - [GitHub]
For this project I was inspired by a lecture by Jonathan Frankle on his work on The Lottery Ticket Hypothesis. I'm such a fan of this paper because I'm a strong believer in focusing less on what can be accomplished with a huge, clumsy neural network, and more on what underlying subtlety makes simple networks so powerful. In my project, I implemented experiments comparing basic magnitude-based pruning to other methods based on matrix decomposition and dimensionality-reduction techniques applied to the different layers of a network.
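As a flavor of the comparison, here is a minimal sketch of the two families of techniques applied to a single dense layer's weight matrix: magnitude-based pruning versus a low-rank SVD factorization. The matrix, sparsity level, and rank below are arbitrary illustrative choices, not the settings used in the project:

```python
import numpy as np

rng = np.random.default_rng(42)
W = rng.normal(size=(256, 128))   # stand-in for a dense layer's weights

# Magnitude-based pruning: zero out the 90% of weights smallest in |value|.
threshold = np.quantile(np.abs(W), 0.9)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)
sparsity = np.mean(W_pruned == 0)

# Low-rank decomposition: keep the top-k singular vectors, so the layer
# factors into two thin matrices with (256 + 128) * k parameters
# instead of 256 * 128.
k = 16
U, S, Vt = np.linalg.svd(W, full_matrices=False)
W_lowrank = (U[:, :k] * S[:k]) @ Vt[:k]

print(f"pruned sparsity: {sparsity:.2f}")
print(f"rank-{k} params: {(256 + 128) * k} vs dense: {256 * 128}")
```

The interesting part of the experiments is what happens to accuracy after fine-tuning each compressed network, which this sketch does not cover.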
What's a Gaussian Process? - [Post]
As part of a statistics and machine learning class I took in my Master's program, I wrote a blog post about Gaussian process regression, a topic that sheds light on an important philosophical point of statistical machine learning: the ability to preserve information about the uncertainty with which we make predictions. Gaussian processes are at the core of a lot of really interesting machine learning research, and while they can come off as intimidating, they're fundamentally intuitive objects that leverage Gaussian assumptions to give a highly interpretable and programmable family of models.
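To show how mechanically simple the model is, here is a small NumPy sketch of GP regression with a squared-exponential kernel, conditioning the joint Gaussian on a handful of observations; the training points, hyperparameters, and noise level are arbitrary illustrative choices:

```python
import numpy as np

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel: covariance decays with distance."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

# Noisy observations of an underlying function.
x_train = np.array([-4.0, -2.0, 0.0, 1.5, 3.0])
y_train = np.sin(x_train)
x_test = np.linspace(-5, 5, 9)
noise = 1e-4

# GP posterior: condition the joint Gaussian on the training data.
K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
K_s = rbf(x_train, x_test)
K_ss = rbf(x_test, x_test)
mean = K_s.T @ np.linalg.solve(K, y_train)            # posterior mean
cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
std = np.sqrt(np.clip(np.diag(cov), 0, None))         # predictive uncertainty

# Uncertainty collapses near observed points and grows away from them:
# x_test[4] = 0.0 is a training point, x_test[0] = -5.0 is not.
print(std[4] < std[0])  # True
```

That `std` vector is exactly the uncertainty-preservation point: the model reports not just a prediction but how much to trust it at every input.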
Predicting Voting Behavior of Supreme Court Nominees - [PDF]
In my junior year I took a great course in decision theory, which culminated in this group project in which two friends and I set out to predict the voting patterns of Supreme Court nominees. This project was a substantial data-engineering undertaking, involving the processing of raw text from the judges' confirmation hearings along with demographic data available at the time of confirmation. In the paper we compare logistic regression and random forest models trained on combinations of these real-world data sources.
Elliptic Curves and Cryptography - [PDF]
In my sophomore year algebra course I had the chance to write an open-ended paper, for which I took inspiration from the controversial FBI order that Apple build a cryptographic backdoor into its phones. In this survey paper I write about the use of elliptic curves in pseudo-random number generators, the discrete logarithm problem, and the Diffie-Hellman key exchange.
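To illustrate the key exchange the paper surveys, here is a toy sketch of elliptic-curve Diffie-Hellman over the small textbook curve y² = x³ + 2x + 2 mod 17; the curve, base point, and secret scalars are purely illustrative and far too small for real security:

```python
# Toy curve y^2 = x^3 + 2x + 2 over F_17 with base point G = (5, 1).
p, a = 17, 2
G = (5, 1)
O = None  # point at infinity (the group identity)

def add(P, Q):
    """Add two points on the curve using the chord-and-tangent rule."""
    if P is O: return Q
    if Q is O: return P
    if P[0] == Q[0] and (P[1] + Q[1]) % p == 0:
        return O                           # P + (-P) = identity
    if P == Q:                             # tangent slope for doubling
        s = (3 * P[0] ** 2 + a) * pow(2 * P[1], -1, p) % p
    else:                                  # chord slope
        s = (Q[1] - P[1]) * pow(Q[0] - P[0], -1, p) % p
    x = (s * s - P[0] - Q[0]) % p
    return (x, (s * (P[0] - x) - P[1]) % p)

def mul(k, P):
    """Scalar multiplication by double-and-add; inverting it is the
    discrete logarithm problem that the scheme's security rests on."""
    R = O
    while k:
        if k & 1: R = add(R, P)
        P, k = add(P, P), k >> 1
    return R

# Diffie-Hellman: each party keeps a secret scalar, publishes secret * G.
alice_secret, bob_secret = 3, 9
alice_public, bob_public = mul(alice_secret, G), mul(bob_secret, G)
shared_a = mul(alice_secret, bob_public)   # 3 * (9 * G)
shared_b = mul(bob_secret, alice_public)   # 9 * (3 * G)
print(shared_a == shared_b)  # True: both derive (3 * 9) * G
```

Agreement follows because scalar multiplication commutes through the group structure, while an eavesdropper who only sees the public points faces the discrete logarithm problem.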