Andres Potapczynski

  • PhD in Data Science, NYU
  • MSc in Data Science, Columbia University
  • BSc in Applied Mathematics, ITAM
  • BA in Economics, ITAM
[CV]

Research & Projects

Training Flexible Models of Genetic Variant Effects from Functional Annotations using Accelerated Linear Algebra
We propose WASP, a neural network model that predicts variant effects from functional annotations. WASP leverages fast linear algebra techniques for scalability and outperforms the standard and widely used LD score regression method. -- 42nd International Conference on Machine Learning (ICML 2025).
Topics: Computational Biology, Numerical Methods, Machine Learning.
[PDF] [CODE]

Customizing the Inductive Biases of Softmax Attention using Structured Matrices
The core component of attention is the scoring function, which transforms the inputs into low-dimensional queries and keys and takes the dot product of each pair. While the low-dimensional projection improves efficiency, it causes information loss for certain tasks that have intrinsically high-dimensional inputs. Additionally, attention uses the same scoring function for all input pairs, without imposing a locality bias for neighboring tokens in the sequence. In this work, we address these shortcomings by proposing new scoring functions based on computationally efficient structured matrices with high ranks, including Block Tensor-Train (BTT) and Multi-Level Low Rank (MLR) matrices. -- 42nd International Conference on Machine Learning (ICML 2025).
Topics: Numerical Methods, Scaling Laws, Machine Learning.
[PDF] [CODE]
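
A minimal PyTorch sketch (not the released implementation) of the idea above: a bilinear score q^T W k where W combines a global low-rank term with independent low-rank diagonal blocks, a simplified two-level stand-in for the MLR structure. The class name, block count, and ranks are illustrative choices.

    import torch

    class TwoLevelLowRankScore(torch.nn.Module):
        def __init__(self, dim: int, num_blocks: int = 4, rank: int = 4):
            super().__init__()
            assert dim % num_blocks == 0
            self.dim, self.num_blocks, self.block = dim, num_blocks, dim // num_blocks
            # Level 0: one global low-rank factor pair.
            self.U0 = torch.nn.Parameter(torch.randn(dim, rank) / dim**0.5)
            self.V0 = torch.nn.Parameter(torch.randn(dim, rank) / dim**0.5)
            # Level 1: an independent low-rank factor pair per diagonal block.
            self.U1 = torch.nn.Parameter(torch.randn(num_blocks, self.block, rank) / dim**0.5)
            self.V1 = torch.nn.Parameter(torch.randn(num_blocks, self.block, rank) / dim**0.5)

        def forward(self, q, k):
            # q, k: (batch, seq, dim); returns (batch, seq, seq) attention scores.
            global_score = (q @ self.U0) @ (k @ self.V0).transpose(-1, -2)
            qb = q.unflatten(-1, (self.num_blocks, self.block))
            kb = k.unflatten(-1, (self.num_blocks, self.block))
            qU = torch.einsum("bsnd,ndr->bsnr", qb, self.U1)
            kV = torch.einsum("bsnd,ndr->bsnr", kb, self.V1)
            local_score = torch.einsum("bsnr,btnr->bst", qU, kV)
            return (global_score + local_score) / self.dim**0.5

    scores = TwoLevelLowRankScore(64)(torch.randn(2, 10, 64), torch.randn(2, 10, 64))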

Effectively Leveraging Exogenous Information across Neural Forecasters
Research on neural networks for time series has mostly focused on models that learn patterns in the target signal without using additional auxiliary or exogenous information. In applications such as selling products on a marketplace, the target signal is strongly influenced by such covariates, so leveraging exogenous information matters. We develop a decoder that captures the time structure of exogenous information through structured state-space model layers and learns relationships between the variables through MLPs. We show that this decoder can be applied to a wide variety of models such as NBEATS, NHITS, PatchTST, and S4, yielding notable performance improvements across different datasets. -- 38th Conference on Neural Information Processing Systems (NeurIPS TSALM 2024).
Topics: Neural Forecasters, Time-series, Machine Learning.
[PDF] [CODE]
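
A rough sketch of the ingredients described above, assuming a toy diagonal linear state-space recurrence and a small MLP; the module name, sizes, and the additive way it adjusts the base forecast are illustrative choices, not the paper's architecture.

    import torch

    class ExogenousSSMDecoder(torch.nn.Module):
        def __init__(self, n_exog: int, hidden: int = 32):
            super().__init__()
            self.a = torch.nn.Parameter(torch.rand(n_exog) * 0.5)  # diagonal transition
            self.mlp = torch.nn.Sequential(
                torch.nn.Linear(n_exog, hidden), torch.nn.ReLU(), torch.nn.Linear(hidden, 1)
            )

        def forward(self, base_forecast, exog):
            # base_forecast: (batch, horizon); exog: (batch, horizon, n_exog).
            a = torch.tanh(self.a)                   # keep the recurrence stable
            h, states = torch.zeros_like(exog[:, 0]), []
            for t in range(exog.shape[1]):           # h_t = a * h_{t-1} + x_t
                h = a * h + exog[:, t]
                states.append(h)
            states = torch.stack(states, dim=1)      # (batch, horizon, n_exog)
            return base_forecast + self.mlp(states).squeeze(-1)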

Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices
Going beyond prior works that study hand-crafted structured matrices on a case-by-case basis, we introduce a continuous parameterization over the space of all structured matrices expressible as Einsums. Using this parameterization, we gather the following insights: (a) compute-optimal scaling laws of Einsums are primarily governed by the amount of parameter sharing and the rank of the structure; (b) existing structured matrices do not outperform dense layers in the compute-optimal setting; (c) Mixture-of-Experts over structured matrices is more efficient than standard MoE over entire FFNs. -- 38th Conference on Neural Information Processing Systems (NeurIPS 2024).
Topics: Numerical Methods, Scaling Laws, Machine Learning.
[PDF] [CODE]
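
As a small illustration of the "structured matrices as Einsums" viewpoint, the snippet below writes a dense, a low-rank, and a Kronecker-structured linear map as einsum expressions over small factors; the shapes are arbitrary and this is not the paper's full parameterization.

    import torch

    x = torch.randn(8, 64)                       # batch of inputs, d_in = 64

    # Dense layer: the trivial one-factor Einsum.
    W = torch.randn(64, 64)
    y_dense = torch.einsum("bi,io->bo", x, W)

    # Low-rank: W = U @ V with rank 8, applied as a two-factor Einsum.
    U, V = torch.randn(64, 8), torch.randn(8, 64)
    y_lowrank = torch.einsum("bi,ir,ro->bo", x, U, V)

    # Kronecker-structured: contract each axis of the input, viewed as an
    # 8 x 8 grid, with its own small factor.
    A, B = torch.randn(8, 8), torch.randn(8, 8)
    y_kron = torch.einsum("bij,ik,jl->bkl", x.reshape(8, 8, 8), A, B).reshape(8, -1)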

Compute Better Spent: Replacing Dense Layers with Structured Matrices
We run a systematic study of different sub-quadratic structures as alternatives to the ubiquitous quadratic dense linear layer. We show how to initialize these diverse structures and scale their learning rates to achieve their best performance, based on the theory of muP. Moreover, we propose a novel structure (BTT) that achieves better scaling laws than dense layers. -- 41st International Conference on Machine Learning (ICML 2024).
Topics: Numerical Methods, Scaling Laws, Machine Learning.
[PDF] [CODE]
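
The sketch below conveys the flavor of the muP-style prescriptions mentioned above in the plain dense case: width-dependent initialization and per-group (Adam) learning rates. The constants and the exact rules are placeholders rather than the paper's tuned recipe.

    import torch

    base_width, width, base_lr = 64, 1024, 3e-3
    model = torch.nn.Sequential(
        torch.nn.Linear(32, width), torch.nn.ReLU(),
        torch.nn.Linear(width, width), torch.nn.ReLU(),
        torch.nn.Linear(width, 10),
    )
    for layer in model:
        if isinstance(layer, torch.nn.Linear):
            # Initialization variance shrinks with fan-in.
            torch.nn.init.normal_(layer.weight, std=layer.in_features ** -0.5)
            torch.nn.init.zeros_(layer.bias)

    # Matrix-like parameters get a learning rate scaled down with width.
    matrices = [p for p in model.parameters() if p.ndim == 2]
    biases = [p for p in model.parameters() if p.ndim == 1]
    opt = torch.optim.Adam([
        {"params": matrices, "lr": base_lr * base_width / width},
        {"params": biases, "lr": base_lr},
    ])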

CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra
We propose a simple but general framework for large-scale linear algebra problems in machine learning, named CoLA (Compositional Linear Algebra). By combining a linear operator abstraction with compositional dispatch rules, CoLA automatically constructs memory and runtime efficient numerical algorithms. -- 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
Topics: Numerical Methods, Machine Learning.
[PDF] [CODE]
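
A toy example of the compositional-dispatch idea (this is not CoLA's actual API): a Kronecker operator answers a linear solve by delegating to solves with its two factors instead of ever densifying the product.

    import numpy as np

    class Dense:
        def __init__(self, A):
            self.A = A
        def solve(self, b):
            return np.linalg.solve(self.A, b)

    class Kronecker:
        """Represents A ⊗ B without forming the full matrix."""
        def __init__(self, A, B):
            self.A, self.B = A, B
        def solve(self, b):
            n, m = self.A.A.shape[0], self.B.A.shape[0]
            R = b.reshape(n, m)        # un-vectorize (row-major)
            S = self.A.solve(R)        # A^{-1} R via the left factor
            X = self.B.solve(S.T).T    # (A^{-1} R) B^{-T} via the right factor
            return X.reshape(-1)

    # Quick check against the explicit Kronecker product.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4)); A = A @ A.T + 4 * np.eye(4)
    B = rng.standard_normal((3, 3)); B = B @ B.T + 3 * np.eye(3)
    b = rng.standard_normal(12)
    x = Kronecker(Dense(A), Dense(B)).solve(b)
    assert np.allclose(np.kron(A, B) @ x, b)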

Simple and Fast Group Robustness by Automatic Feature Reweighting
We propose Automatic Feature Reweighting (AFR), an extremely simple and fast method for updating a model to reduce its reliance on spurious features. AFR retrains the last layer of a standard ERM-trained base model with a weighted loss that emphasizes the examples where the ERM model predicts poorly, automatically upweighting the minority group without group labels. With this simple procedure, we improve upon the best reported results among competing methods trained without spurious attributes on several vision and natural language classification benchmarks, using only a fraction of their compute. -- 40th International Conference on Machine Learning (ICML 2023).
Topics: Group Robustness, Spurious Features, Last-layer retraining.
[PDF] [CODE]
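
A minimal sketch of the two-stage recipe, assuming penultimate-layer features from the ERM model have already been extracted; the exponential weighting and hyperparameters are plausible illustrative choices, not necessarily the exact ones from the paper.

    import torch

    def afr_style_last_layer(features, labels, erm_probs, num_classes, gamma=4.0, steps=500):
        # features: (N, D) ERM embeddings; erm_probs: (N,) ERM probability of the true label.
        weights = torch.exp(-gamma * erm_probs)      # upweight poorly predicted examples
        weights = weights / weights.sum()
        head = torch.nn.Linear(features.shape[1], num_classes)
        opt = torch.optim.SGD(head.parameters(), lr=1e-2)
        for _ in range(steps):
            losses = torch.nn.functional.cross_entropy(head(features), labels, reduction="none")
            loss = (weights * losses).sum()          # weighted last-layer retraining loss
            opt.zero_grad(); loss.backward(); opt.step()
        return head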

A Stable and Scalable Method for Solving Initial Value PDEs with Neural Networks
We propose Neural-IVP, a method for approximating solutions to high-dimensional initial value PDEs with neural networks. Our method is scalable, well-conditioned, and runs in time linear in the number of parameters of the neural network. -- 11th International Conference on Learning Representations (ICLR 2023).
Topics: Inductive Biases, Partial Differential Equations, Numerical Linear Algebra.
[PDF] [CODE]

PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
We develop a compression approach based on quantizing neural network parameters in a random linear subspace, substantially improving previous state-of-the-art generalization bounds and showing how these tight bounds can help us understand the role of model size, equivariance, and implicit biases in optimization. -- 36th Conference on Neural Information Processing Systems (NeurIPS 2022).
Topics: Random Subspaces, Quantization, Equivariance, PAC-Bayes bounds.
[PDF] [CODE]
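
The snippet below sketches the two ingredients named above, a fixed random linear subspace and quantization of the subspace coordinates, with illustrative sizes and a simple uniform codebook; it is not the paper's exact compression pipeline.

    import torch

    d_full, d_sub, levels = 10_000, 250, 7
    torch.manual_seed(0)
    P = torch.randn(d_full, d_sub) / d_sub**0.5   # fixed random projection
    theta0 = torch.randn(d_full) * 0.01           # fixed random initialization
    z = torch.randn(d_sub, requires_grad=True)    # the only trainable coordinates

    theta = theta0 + P @ z                        # weights actually used by the network

    # Quantize z: each coordinate snaps to the nearest of a few codebook levels.
    with torch.no_grad():
        codebook = torch.linspace(z.min().item(), z.max().item(), levels)
        idx = (z[:, None] - codebook[None, :]).abs().argmin(dim=1)
        z_quant = codebook[idx]
    # Describing the model now costs roughly d_sub * log2(levels) bits plus the codebook.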

Low-Precision Arithmetic for Fast Gaussian Processes
We study the different failure modes that can occur when training Gaussian processes (GPs) in half precision. To circumvent these failure modes, we propose a multi-faceted approach involving conjugate gradients with re-orthogonalization, mixed precision, and preconditioning. -- 38th Conference on Uncertainty in Artificial Intelligence (UAI 2022).
Topics: Gaussian Processes, Quantization, Numerical Linear Algebra.
[PDF] [CODE]
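
In the same spirit (though not the paper's code), the sketch below runs conjugate gradients with matrix-vector products in low precision, accumulation in float32, and residuals re-orthogonalized against earlier ones to fight round-off.

    import torch

    def cg_mixed_precision(A, b, iters=50, tol=1e-4):
        low = torch.float16                              # use torch.bfloat16 on CPUs without fp16 matmul
        matvec = lambda v: (A.to(low) @ v.to(low)).float()
        x = torch.zeros_like(b)
        r = b - matvec(x)
        p = r.clone()
        basis = [r / r.norm()]                           # stored residual directions
        for _ in range(iters):
            Ap = matvec(p)
            alpha = (r @ r) / (p @ Ap)
            x = x + alpha * p
            r_new = r - alpha * Ap
            for q in basis:                              # re-orthogonalize the residual
                r_new = r_new - (r_new @ q) * q
            if r_new.norm() < tol:
                break
            beta = (r_new @ r_new) / (r @ r)
            p = r_new + beta * p
            r = r_new
            basis.append(r / r.norm())
        return x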

Bias-Free Scalable Gaussian Processes via Randomized Truncations
We identify the biases introduced by approximate methods for scaling Gaussian processes and eliminate them via randomized truncation estimators. -- 38th International Conference on Machine Learning (ICML 2021).
Topics: Gaussian Processes, Russian-Roulette estimators, Kernel Approximations, Numerical Linear Algebra.
[PDF] [CODE]
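
The core trick fits in a few lines: truncate a series at a random point and reweight the surviving terms by their inclusion probabilities, which keeps the estimate unbiased. The geometric stopping rule and the toy series below are for illustration only.

    import numpy as np

    def rr_estimate(term, max_terms=100, stop_prob=0.2, rng=np.random.default_rng(0)):
        # term(k) returns the k-th increment of the series we want to sum.
        estimate, reach_prob = 0.0, 1.0
        for k in range(max_terms):
            estimate += term(k) / reach_prob      # reweight by P(we got this far)
            if rng.random() < stop_prob:          # flip the Russian-roulette coin
                break
            reach_prob *= 1.0 - stop_prob
        return estimate

    # Estimate sum_k 0.5^k = 2 without ever computing the full series.
    vals = [rr_estimate(lambda k: 0.5**k, rng=np.random.default_rng(s)) for s in range(2000)]
    print(np.mean(vals))                          # close to 2.0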

Invertible Gaussian Reparameterization: Revisiting the Gumbel-Softmax
We introduce a family of continuous relaxations that is more flexible, extensible, and better performing than the Gumbel-Softmax. -- 34th Conference on Neural Information Processing Systems (NeurIPS 2020).
Topics: Generative modeling, VAEs, Normalizing Flows, Continuous Relaxations.
[PDF] [CODE]
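
One illustrative member of such a family (the exact map in the paper may differ) pushes a reparameterized Gaussian sample through an invertible softmax-like transform onto the simplex, with a temperature controlling how sharp the relaxation is.

    import torch

    def gaussian_simplex_relaxation(mu, log_sigma, tau=0.5, delta=1.0):
        # mu, log_sigma: (batch, K-1) parameters of the underlying Gaussian.
        y = mu + log_sigma.exp() * torch.randn_like(mu)    # reparameterized Gaussian sample
        expy = torch.exp(y / tau)
        denom = expy.sum(dim=-1, keepdim=True) + delta     # the "+ delta" keeps the map invertible
        first = expy / denom                               # first K-1 simplex coordinates
        last = delta / denom                               # remaining probability mass
        return torch.cat([first, last], dim=-1)            # (batch, K) point on the simplex

    soft_onehot = gaussian_simplex_relaxation(torch.zeros(4, 9), torch.zeros(4, 9))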

Nowcasting with Google Trends
I propose an alternative kernel bandwidth selection algorithm and show which Google searches are relevant for predicting unemployment, influenza outbreaks, and spikes in violence in Mexico. The text is in English (after the acknowledgments); the most relevant pages are 4, 26, 36, 43, and 48. -- Undergraduate Thesis.
[PDF] [CODE]

Classifying Webpages Based on Their Menus
By modifying Word2Vec, we recover an embedding that helps cluster clients based on the content of their webpage menus. -- Capstone Project.
[PDF]