Abstract: Understanding how neural networks learn features, or relevant patterns in data, for prediction is necessary for their reliable use in technological and scientific applications. In this work, we presented a unifying mathematical mechanism, known as average gradient outer product (AGOP), that characterized feature learning in neural networks. We provided empirical evidence that AGOP captured features learned by various neural network architectures, including transformer-based language models, convolutional networks, multilayer perceptrons, and recurrent neural networks. Moreover, we demonstrated that AGOP, which is backpropagation-free, enabled feature learning in machine learning models, such as kernel machines, that a priori could not identify task-specific features. Overall, we established a fundamental mechanism that captured feature learning in neural networks and enabled feature learning in general machine learning models.
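As a rough illustration of the quantity named in this abstract, the sketch below computes an average gradient outer product for a scalar predictor, assuming the usual definition: the mean over data points of the outer product of the input gradient with itself. The helper name `agop`, the finite-difference gradient, and the toy predictor are all assumptions made for illustration; they are not taken from the talk.

```python
import numpy as np


def agop(f, X, eps=1e-5):
    """Average gradient outer product of a scalar function f over the rows of X.

    Gradients are approximated with central finite differences (an
    illustrative choice, not something specified in the abstract).
    """
    n, d = X.shape
    G = np.zeros((d, d))
    for x in X:
        grad = np.zeros(d)
        for j in range(d):
            e = np.zeros(d)
            e[j] = eps
            grad[j] = (f(x + e) - f(x - e)) / (2 * eps)
        G += np.outer(grad, grad)  # outer product of the gradient with itself
    return G / n


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))


def predictor(x):
    # Hypothetical predictor that depends only on the first coordinate
    return np.sin(x[0])


M = agop(predictor, X)
```

In this toy setting the resulting matrix has its mass in the entry for the first coordinate, which is the sense in which such a matrix can highlight the input directions a predictor actually uses.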
Abstract: Algebraic statistics is a relatively new field of research in which tools from algebraic geometry, combinatorics and commutative algebra are used to solve statistical problems. A key area of research in this field is Gaussian graphical models, where the dependence structure between jointly normal random variables is determined by a graph. In this talk, I will explain the algebraic perspectives on Gaussian graphical models and present some of my key results on understanding the defining equations of these models. At the end, I will talk about the problems of structural identifiability and causal discovery and how algebraic techniques can be used to tackle them.
Abstract: Consider a well-shuffled deck of cards that contains several distinct types of cards, each with the same multiplicity. Consider the following card-guessing game: At each step, the player (guesser) guesses the topmost card in the deck. After each guess, the topmost card is shown to the player and removed from the deck. The game continues until the deck is exhausted. This game is often referred to as a complete feedback game. The player's goal in the complete feedback game is to maximize the number of correct guesses. Diaconis and Graham showed that the greedy strategy maximizes the total number of correct guesses in expectation. In this talk, we give a leisurely survey of many interesting questions arising from this setup and present a central limit theorem for the total number of correct guesses under the greedy strategy. This is based on joint work with Ottolini.
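The game described above can be simulated in a few lines. The sketch below assumes the greedy strategy means "guess a type with the most copies still in the deck"; the function names, the tie-breaking rule (smallest type label), and the parameters `n_types` and `multiplicity` are illustrative assumptions rather than notation from the talk.

```python
import random
from collections import Counter


def play_greedy(deck):
    """Play one complete feedback game and return the number of correct guesses."""
    # Build the counts from the sorted deck so that tie-breaking depends only
    # on the type labels and never on the (hidden) order of the shuffled deck.
    remaining = Counter(sorted(deck))
    correct = 0
    for top in deck:
        # Greedy guess: a type with the most copies still in the deck
        guess = max(remaining, key=remaining.get)
        if guess == top:
            correct += 1
        # Complete feedback: the top card is revealed and removed
        remaining[top] -= 1
    return correct


def simulate(n_types, multiplicity, trials, seed=0):
    """Average number of correct greedy guesses over independently shuffled decks."""
    rng = random.Random(seed)
    deck = [t for t in range(n_types) for _ in range(multiplicity)]
    total = 0
    for _ in range(trials):
        rng.shuffle(deck)
        total += play_greedy(deck)
    return total / trials
```

For example, `simulate(13, 4, 10_000)` estimates the expected score for a deck with 13 types and 4 copies of each; a histogram of `play_greedy` over many shuffles gives an empirical view of the fluctuations that a central limit theorem describes.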
Abstract: The speaker will begin with the motivation behind motivic homotopy theory and its applications to cohomology theories of schemes and vector bundles on affine schemes. Later, he will present ongoing research with Neeraj Deshmukh on computing motivic invariants of stacky curves and using those invariants to compute the Brauer group of the moduli stack of curves of genus 1.
Abstract: The design and analysis of finite element methods for the incompressible Navier–Stokes equations that are robust and accurate for a wide range of Reynolds numbers remain a challenging problem. In this talk, we will first consider implicit-explicit (IMEX) time discretizations using equal-order interpolation for the incompressible Oseen equations at high Reynolds numbers and discuss their stability and convergence.
Then, in the case of high Reynolds numbers or inviscid flow problems, we show how the implicit-explicit method can naturally be written as a split method based on Poisson pressure projection steps without splitting error. The analysis remains valid for the splitting scheme and satisfies close to optimal error estimates. Finally, the validation of the proposed stabilization scheme and verification of the derived estimates are presented with appropriate numerical experiments.
Abstract: The question of which functions acting entry-wise preserve positive semi-definiteness has a long history, beginning with the Schur product theorem [Crelle, 1911], which implies that absolutely monotonic functions (i.e., power series with nonnegative coefficients) preserve positivity on matrices of all dimensions. A famous result of Schoenberg and of Rudin [Duke Math. J. 1942, 1959] shows the converse: there are no other such functions.
Motivated by modern applications, Guillot and Rajaratnam [Trans. Amer. Math. Soc. 2015] classified the entry-wise positivity preservers in all dimensions, which act only on the off-diagonal entries. These two results are at "opposite ends", and in both cases the preservers have to be absolutely monotonic. We complete the classification of positivity preservers that act entry-wise except on specified "diagonal/principal blocks", in every case other than the two above. (In fact we achieve this in a more general framework.) This yields the first examples of dimension-free entry-wise positivity preservers, with certain forbidden principal blocks, that are not absolutely monotonic.
Abstract: Given a group G and two Gelfand subgroups H and K of G, associated to an irreducible representation \pi of G, there is a notion of H and K being correlated with respect to \pi in G (introduced by Benedict Gross in 1991). We discuss this theme and give some details in some specific examples.
Abstract:
I plan to talk about the notion of Hamiltonian reduction from the point of view of physics (specifically classical mechanics) and mathematics (specifically symplectic geometry), focusing on a concrete example involving the Calogero-Moser space. Time permitting, I will motivate the notion of quantum Hamiltonian reduction and its applications to representation theory.
Abstract:
Weyl's law is a fundamental result that governs the asymptotics of the eigenvalues of the Laplacian. Put simply, it states that the number of eigenvalues of the Laplacian less than or equal to t is asymptotically equal to a dimensional constant times the volume of the domain times t^{d/2}, where d is the dimension of the domain.
As with any asymptotic problem, it is natural to investigate the error term. After important work by Courant in the 1920s, it was shown by Hörmander in 1968 that the error term is of order O(t^{d/2 - 1/2}) (for manifolds without boundary). It can also be shown that this error term is sharp for the round sphere.
In 2019, Iosevich and Wyman showed that the error term can be improved for a product of spheres. Their proof involves a highly nontrivial estimate for lattice points inside an ellipsoid.
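For reference, the leading term and the error bound described in words above can be written in one display. The notation is an assumption introduced here, not taken from the abstract: $N(t)$ counts the eigenvalues $\lambda_j$ of $-\Delta$ on a compact $d$-dimensional Riemannian manifold $M$ without boundary, and $\omega_d$ is the volume of the unit ball in $\mathbb{R}^d$.

```latex
\[
  N(t) \;=\; \#\{\, j : \lambda_j \le t \,\}
       \;=\; \frac{\omega_d \,\mathrm{Vol}(M)}{(2\pi)^d}\, t^{d/2}
       \;+\; O\!\left(t^{(d-1)/2}\right)
  \qquad \text{as } t \to \infty .
\]
```

Here the exponent $(d-1)/2$ is the same as $d/2 - 1/2$ in the text, and the improvement for products of spheres refers to a smaller error exponent than this one.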
We have recently extended the idea of Iosevich and Wyman to projective spaces. This is joint work with Anupam Pal Choudhury and Sai Sriharsha.
Abstract:
To accurately quantify landslide hazard in a region of Turkey, we develop new marked point process models within a Bayesian hierarchical framework for the joint prediction of landslide counts and sizes. We leverage mark distributions justified by extreme-value theory and specifically propose "subasymptotic" distributions to flexibly model landslide sizes from low to high quantiles. The use of intrinsic conditional autoregressive priors, and a customized adaptive Markov chain Monte Carlo algorithm, allow for fast fully Bayesian inference. We show that subasymptotic mark distributions provide improved predictions of large landslide sizes, and use our model for risk assessment and hazard mapping. Furthermore, within the general modeling framework, a submodel known as the areal model is utilized when data are aggregated at a coarser slope unit resolution. We applied this framework to jointly model Wenchuan landslide counts and sizes data, highlighting the benefit of the joint modeling approach in the landslide literature for hazard and risk assessment.
Abstract:
Let R(G) denote the category of smooth complex representations of G(F), where G is a connected reductive group defined over a non-archimedean local field F. The Bernstein decomposition expresses R(G) as a product of indecomposable subcategories called Bernstein blocks. Each Bernstein block is equivalent to the module category of the "Hecke algebra" associated with that "type". I will go over the basic theory mentioned above. To each Bernstein block, the theory of Moy and Prasad associates a number called the depth. I will describe a result, part of work in progress with Jeff Adler, Jessica Fintzen and Kazuma Ohara, which states that each Bernstein block is equivalent to a depth-zero Bernstein block of a certain subgroup of G, when the residue characteristic is not too small.
Abstract:
With advancements in technology for measuring environmental variables, there has been a significant increase in the volume and accessibility of data available to hydrologists and climate scientists.
Statistics plays an important role in the analysis of these datasets for various applications such as risk assessment, forecasting and identification of drivers of extreme events. The objective of this talk is to introduce some of these applications and to discuss the challenges involved in applying statistical techniques in hydrology and climate sciences which arise from the nature of problems in the domain and the characteristics of available datasets. These issues will be discussed in the context of three problems: (1) spatiotemporal analysis of extreme events, (2) attribution of physical mechanisms of extreme events and (3) quantifying the contributions of natural and anthropogenic processes to changes in the water cycle.
Abstract:
Large-scale assessment surveys typically collect data via tests about cognitive or socio-emotional skills from a heterogeneous sample. The methods or approaches for scoring these tests vary significantly. This leaves applied researchers uncertain about the assumptions and computational methods involved in scoring such tests. In this presentation, we will delve into three primary methods of scoring and using the scores in subsequent analyses: (1) test scores (which provide point estimates of individual ability), (2) structural equation modeling (SEM), and (3) plausible values (PV). We will explore the biases inherent in each approach and present findings from a simulation study comparing the three methods under conditions typical of socio-emotional skill and personality assessments. Our results demonstrate that while different test scores may exhibit high correlation, the resulting bias in regression coefficients can vary significantly.
Abstract:
In this talk, I will present some of my work on designing and analyzing algorithms for learning in large and structured environments, where the state and action spaces are huge or even infinite. I will focus on two main topics: (1) Bayesian optimization for hyperparameter tuning in large-scale machine learning models, and (2) Policy optimization for language models using human feedback. For the first topic, I will introduce the Gaussian process optimization framework and design multi-armed-bandit algorithms for hyperparameter optimization. I will show sublinear regret bounds for the proposed algorithms that depend on the information complexity of the objective function to be optimized. Along the way, I will present a self-normalized concentration inequality for vector-valued martingales of arbitrary, possibly infinite, dimension, and discuss some applications of this concentration bound. For the second topic, I will talk about the effects of noisy preference data that can negatively impact language model alignment. I will propose a robust loss function for language model policy optimization in the presence of random preference flips. I will show that the proposed language model policy is provably tolerant to noise and characterize its sub-optimality gap as a function of noise rate, dimension of the policy parameter, and sample size. I will also demonstrate the empirical performance of the proposed policy on various tasks, such as dialogue generation and sentiment analysis. I will conclude with some open problems and future directions of research in large scale machine learning.
Abstract:
Handling high-dimensional data poses a familiar challenge for many machine learning applications, such as cancer genomics. We first discuss traditional regression models that induce shrinkage, such as LASSO, and their Bayesian variants. We then look at spike and slab priors that have become popular for high-dimensional Bayesian modeling and how we use them to model shrinkage in deep neural networks, aiming at feature selection. Lastly, we discuss computational challenges, a scalable solution based on variational Bayes methods, and (briefly) some work on the theoretical guarantees of the variational posterior.
Abstract:
In this talk, we define Eisenstein cycles in the first homology groups of quotients of hyperbolic 3-space as linear combinations of Cremona symbols (a generalization of Manin symbols) for an imaginary quadratic field. They generate the Eisenstein part of the homology groups. We also discuss the Eisenstein part of the cohomology groups. As an application, we find an asymptotic dimension formula (in the level aspect) for the cuspidal cohomology groups of congruence subgroups of a certain form inside the full Bianchi groups. This is joint work with Pranjal Vishwakarama.
Abstract:
Research in machine learning and data science is increasingly entering the realm of staggeringly large Multiview data collections (concurrent measurements (views) collected on the same subjects from multiple sources). Fueled by an explosion in recent high-throughput and AI technologies, we are now ready to enter the world of personalized medicine and individualized solutions, where clinical or other non-therapeutic interventions can be custom-tailored to individuals to achieve better outcomes based on their Multiview profiles. Although analyses of such multimodal datasets have the potential to provide new insights into the underlying mechanistic processes that cannot be inferred with a single modality, the integration of very large, complex, multimodal data represents a considerable statistical and computational challenge. An understanding of the principles of data integration and visualization methods is thus necessary to determine which methods are best applied to a particular integration problem. In this talk, I will discuss open challenges in multimodal integration, including methodological issues that must be resolved to establish the resources needed to move beyond incremental advances toward translational intervention while keeping machine learning and data science at the forefront of the next generation of Multiview research.
Abstract:
Fix an odd prime p. Let f be a p-ordinary newform of weight k and h be a normalized cuspidal p-ordinary Hecke eigenform of weight l < k. In this talk we will discuss the p-adic L-function and the structure of the p∞-Greenberg Selmer group of the Rankin-Selberg convolution of f and h. In the special cases when the residual representation of h at p is reducible, we also discuss certain congruences between the associated characteristic ideal of the dual Selmer group and the p-adic Rankin-Selberg L-function. This is joint work with Somnath Jha and Ravitheja Vangala.
Abstract:
The parity of Selmer ranks for elliptic curves defined over the rational numbers $\mathbb{Q}$ with good ordinary reduction at an odd prime $p$ has been studied by Shekhar. The proof of Shekhar relies on proving a parity result for the $\lambda$-invariants of Selmer groups over the cyclotomic $\mathbb{Z}_p$-extension $\mathbb{Q}_\infty$ of $\mathbb{Q}$. This has been further generalized to elliptic curves with supersingular reduction at $p$ by Hatley and to modular forms by Hatley--Lei.
In this talk, we will present a parity result for the $\lambda$-invariants of Selmer groups over $\mathbb{Q}_\infty$ for the symmetric square representations associated to two modular forms, both ordinary at $p$ with congruent residual irreducible Galois representations.
Abstract:
There has been a tremendous amount of work in the past few decades in arithmetic statistics and especially about counting number fields. In this talk, after giving a general view of the sort of results available in the literature, I will report on the results of an ongoing joint work with Qiyao (Vivian) Yu. We consider the sub-class of totally imaginary number fields and ask the question: how likely is such a number field to contain a CM subfield? A CM field is a totally imaginary quadratic extension of a totally real number field. We show that about 67% of quartic totally imaginary fields do not contain a CM subfield. I will also discuss the case of sextic fields, where the complexity of the problem in the general case becomes apparent.
Abstract:
The classical Diophantine problem of determining which integers can be expressed as a sum of two rational cubes has a long history; it includes works of Sylvester, Selmer, Satgé, Lieman and the recent work of Alpöge-Bhargava-Shnidman-Burungale-Skinner. In this talk, we will use Selmer groups of elliptic curves and integral binary cubic forms to study some cases of the rational cube sum problem. This talk is based on joint work with D. Majumdar, P. Shingavekar and B. Sury.
Abstract:
Neeman recently settled a conjecture by Antieau, Gepner and Heller on the existence of bounded t-structures on the derived category of perfect complexes. We prove a triangulated categorical generalisation of that theorem. In particular, we will show that the existence of a bounded t-structure implies that the singularity category, appropriately defined, vanishes. To achieve this, we also introduce the notion of finitistic dimension for a classically generated triangulated category. Finally, we also show that all t-structures on the completion under these hypotheses are equivalent. This proves that all bounded t-structures on the bounded derived category of a Noetherian finite dimensional scheme are equivalent, generalising a result by Neeman. This is joint work with Rudradip Biswas, Hong Xing Chen, Chris J. Parker, and Junhua Zheng. (https://arxiv.org/abs/2401.00130)
Abstract:
We study random walks on a d-dimensional torus by affine expanding maps. Assuming an irrationality condition on their translation parts, we prove that the Haar measure is the unique stationary measure. From this, we deduce uniform distribution modulo 1 of almost every orbit in certain self-similar sets in R^d. As this conclusion amounts to normality of numbers in the one-dimensional case, we obtain a version of Borel's theorem on normal numbers for a class of fractals in R, for instance, Cantor-type sets. The talk is based on joint work with Yiftach Dayan and Barak Weiss.