Nov 17, 2014

# Student Puzzle Corner 7 (deadline January 15)

The Student Puzzle Corner contains one or two problems in statistics or probability. Sometimes, solving the problems may require a literature search. Current student members of the IMS are invited to submit solutions electronically (to bulletin@imstat.org with subject “Student Puzzle Corner”). The deadline is January 15, 2015. The names and affiliations of (up to) the first 10 student members to submit correct solutions, and the answer(s) to the problem(s), will be published in the next issue of the Bulletin. The Editor’s decision is final.

## Student Puzzle Corner 7

Inspired by real scientific problems in a wide variety of natural sciences, RMT (Random Matrix Theory) has now entered a fantastically flourishing and sophisticated stage. One of the earliest major results was the Marchenko-Pastur law, which described the limiting form of the empirical distribution of the singular values of large order random matrices, with some assumptions on the moments of the entries of the matrix. The Tracy Widom distribution on the edge of the spectrum is another rather early classic result. Nearly unimaginably powerful mathematics has of late been used in RMT in opening up and clearing uncharted paths, dealing with universality, sparsity, perturbation, heavy tails, dependence, discrete entries, and so on. Quite interestingly, deceptively simple problems in RMT can sometimes be extremely difficult. Here is an extremely simple case of such a deceptive but difficult problem.

Let $A$ be a $3\times 3$ random matrix with iid entries and suppose each of the nine entries is a two-valued random variable with the distribution $P(a_{11} = 1) = p, P(a_{11} = -1) = 1-p, 0 < p < 1$; here $p$ can be thought of as a parameter. Let $f(p)$ denote the probability that $A$ is singular. Prove that $f(p)$ is a polynomial in $p$ of degree $6$, and evaluate exactly $\int _{0}^1 f(p)dp$ and $f(\frac{1}{2})$. For extra credit, prove that $f(p)$ is minimized at $p = \frac{1}{2}$. It is conjectured that in the case of a general $n$, $f(\frac{1}{2}) \to 0$ at the rate $f(\frac{1}{2}) = (\frac{1}{2} + o(1))^n$; but, so far, it remains an unsolved problem. A superb reference is Terry Tao's American Mathematical Society text Topics in Random Matrix Theory; even more current is the American Mathematical Society monograph Modern Aspects of Random Matrix Theory, edited by Van Vu, where you can find four extremely informative survey articles and latest references and techniques. RMT is naturally useful in modern multivariate analysis, smoothing, shrinkage, regularization, clustering, and data compression; among many others, you can search for work of, for example, Z D Bai, Peter Bickel, A. Bose, Tony Cai, Emmanuel Candes, David Donoho, Alan Edelman, N. El-Karoui, V. L. Girko, F. Götze, Alice Guionnet, Len Haff, T. Hastie, J. Huang, Iain Johnstone, P. Krishnaiah, M. Krishnapur, E. Levina, Ingram Olkin, D. Paul, Mark Rudelson, Jack Silverstein, A. Soshnikov, Charles Stein, R. Tibshirani and Ofer Zeitouni on RMT in statistics.

## Solution to Student Puzzle 6

Anirban DasGupta, IMS Bulletin Editor, explains:

The data used were the atomic numbers of the first fifty elements and the true quantum used was $q = 2$; the error distribution was a uniform on $[-1, 1]$. You were not given any of this information. Testing for existence of a quantum or estimating a quantum are awfully challenging, because one doesn’t know what multiples of the quantum the observed dimensions are. A similar sort of problem is that of the binomial $N$, one where both parameters of the binomial distribution are unknown. If you knew $N$, estimating $p$ is simple; if you knew $p, N$ can be estimated quite well. But they are nearly unestimable when both are unknown. Among others, work of Feldman, Olkin, Petkau and Zidek, Carroll and Lombard, Peter Hall, Herman Rubin (with a coauthor), Wasserman and Lavine, and Raftery have studied the serious difficulties of the binomial $N$ problem.

The most well known early work on quantum estimation is by Broadbent (1955, 1956), followed by highly original later work of David Kendall. Kendall introduced the cosine quantogram defined as
$f_n(q) = \frac{1}{\sqrt{n}}\,\sum_{i = 1}^n \cos (2\pi \frac{x_i}{q})$,
where $x_i$ are the observed data values. At the true quantum, the trajectory of $f_n(q)$ will peak and elsewhere, the quantogram will be small due to cancellations. So you could look at the point of peak of the quantogram and use it to decide if there is really a quantum,
and simultaneously, to estimate it.

The mathematical difficulty with the quantogram is that computing a $P$-value for the maximum of the process $f_n(q)$ is difficult, even asymptotically. However, tail probabilities, namely $P(\sup_{q_1 < q < q_2}\,f_n(q) > y)$ can be upper bounded analytically; extreme value theory tells us how to do that. Two references I recommend are Simeon Berman’s Sojourns and Extremes of Stochastic Processes, and Robert Adler’s The Geometry of Random Fields. Also useful is Theorem $17.8$ on pp $578$ in Probability for Statistics and Machine Learning.

A more data analytic approach is to try omnibus methods, like least squares. You would now look at a plot of
$ss(q) = \sum_{i = 1}^n (x_i – q\, [\frac{x_i}{q}])^2$
and look both at its oscillation and the global and near global minima; here $[ \frac{x_i}{q}]$ denotes the integer part of $\frac{x_i}{q}$ – they are proxies for the unobserved $n_i$’s. This is graphical, but the plot is usually very oscillatory and you don’t feel confident using its global minimum. However, near the true quantum, there is often a local minimum. Quantum testing and estimation remain a pretty open challenge.

## Welcome!

Welcome to the IMS Bulletin website! We are developing the way we communicate news and information more effectively with members. The print Bulletin is still with us (free with IMS membership), and still available as a PDF to download, but in addition, we are placing some of the news, columns and articles on this blog site, which will allow you the opportunity to interact more. We are always keen to hear from IMS members, and encourage you to write articles and reports that other IMS members would find interesting. Contact the IMS Bulletin at bulletin@imstat.org

## What is “Open Forum”?

In the Open Forum, any IMS member can propose a topic for discussion. Email your subject and an opening paragraph (to bulletin@imstat.org) and we'll post it to start off the discussion. Other readers can join in the debate by commenting on the post. Search other Open Forum posts by using the Open Forum category link below. Start a discussion today!

## About IMS

The Institute of Mathematical Statistics is an international scholarly society devoted to the development and dissemination of the theory and applications of statistics and probability. We have about 4,500 members around the world. Visit IMS at http://imstat.org

Latest Issue