Algebraic Statistics in the Alleghenies at The Pennsylvania State University
Algebraic statistics exploits algebraic geometry and related fields to solve problems in statistics and its applications. Methods from algebraic statistics have been successfully applied to address many problems including construction of Markov bases, theoretical study of phylogenetic mixture models, ecological inference, identifiability problems for graphical models, Bayesian integrals and singular learning theory, social networks, and coalescent theory. In addition to algebraic statistics' successes in solving statistical problems, its research objectives have driven theoretical developments in algebra.
The committee welcomes contributions in methods and applications of algebraic statistics broadly defined, including but not limited to the above topics.
The workshop will have tutorial lectures Saturday and Sunday, with the main conference Monday through Friday including both the longer plenary lectures below and short talks contributed by invitation.
We would like to thank our sponsors: the NSF, the Packard Foundation, and the Department of Mathematics, the Department of Statistics, and the Eberly College of Science at Penn State.
Plenary Speakers
 Elizabeth Allman
 Miki Aoyagi
 Jan Draisma
 Mathias Drton
 Alex Engstrom
 Stephen Fienberg
 Takayuki Hibi
 Lior Pachter
 Eva Riccomagno
 Bernd Sturmfels
 Caroline Uhler
 Henry Wynn
 Ruriko Yoshida
 Information Geometry  Shunichi Amari
 Coalescent Theory  Laura Kubatko
 Tensors in Statistics  Giorgio Ottaviani
 Symbolic Computation for Statistics  Luis David GarciaPuente
 Andrew Critch
 James Degnan
 Jesús FernándezSánchez
 Roberto Fontana
 Elizabeth Gross
 Vishesh Karwa
 Franz Király
 Anna Klimova
 Shaowei Lin
 Luigi Malago
 Mateusz Michalek
 Guido Montufar
 Kenta Nishiyama
 Patrik Norén
 Luke Oeding
 Hidefumi Ohsugi
 Fabio Rapallo
 Laura Silverstein
 Jing Xi
 Piotr Zwiernik
 Giovanni Pistone
 John Rhodes
 Don Richards
 Bernd Sturmfels
 Akimichi Takemura
Schedule
Day  Event  

Friday, June 8  Arrival  
Saturday, June 9 
Tutorials (Part 1) in Life Sciences Berg Auditorium
 
Sunday, June 10 
Tutorials (Part 2) in Life Sciences Berg Auditorium
 
Monday, June 11 
Life Sciences Berg Auditorium
 
Tuesday, June 12  Talks 9am5pm, Life Sciences Berg Audit; poster session 6pm9pm, Life Sciences Bridge.
 
Wednesday, June 13  Life Sciences Berg Auditorium
 
Thursday, June 14  Life Sciences Berg Auditorium
 
Friday, June 15  Life Sciences Berg Auditorium

Abstracts
Tutorial Abstracts
Introduction to Information Geometry
Shunichi Amari, RIKEN Brain Science Institute.
We give fundamentals of information geometry and its applications. We often treat a family of probability distributions for understanding stochastic phenomena in the world. When such a family includes n free parameters, it is parameterized by a real vector of n dimensions. This is regarded as a manifold, where the parameters play a role of a coordinate system. A natural question arises: What is the geometrical structure to be introduced in such a manifold. Geometrical structure gives, for example, a distance measure between two distributions and a geodesic line connecting two distributions. The second question is to know how useful the geometry is for understanding properties of statistical inference and designing new algorithms for inference.
The first question is answered by the invariance principle such that the geometry should be invariant under coordinate transformations of random variables. More precisely, it should be invariant by using sufficient statistics as random variables. It is surprising that this simple criterion gives a unique geometrical structure, which consists of a Riemannian metric and a family of affine connections which define geodesics.
The unique Riemannian metric is proved to be the Fisher information matrix. The invariant affine connections are not limited to the Riemannian (LeviCivita) connection but include the exponential and mixture connections, which are dually coupled with respect to the metric. The connections are dually flat in typical cases such as exponential and mixture families.
A dually flat manifold has a canonical divergence function, which in our case is the KullbackLeibler divergence. This implies that the KLdivergence is induced from the geometrical flatness. Moreover, there exist two affine coordinate systems, one is the natural or canonical parameters and the other is the expectation parameters in the case of an exponential family. They are connected by the Legendre transformation. A generalized Pythagorean theorem holds with respect to the canonical divergence and the pair of dual geodesics. A generalized projection theorem is derived from it.
These properties are useful for elucidating and designing algorithms of statistical inference. They are used not only for evaluating the higherorder characteristics of estimation and testing, but also for elucidating machine learning, pattern recognition and computer vision. We further study the procedures of semiparametric estimation together with estimating functions. It is also applied to the analysis of neuronal spiking data by decomposing the firing rates of neurons and their higherorder correlations orthogonally.
The dually flat structure is useful for optimization of various kinds. A manifold needs not be connected with probability distributions. The invariance criterion does not work in such cases. However, a convex function plays a fundamental role in such a case, and we obtain a Riemannian metric together with two dually coupled flat affine connections, connected by the Legendre transformation. The Pythagorean theorem and projection theorem play again a fundamental role in applications of information geometry.
Symbolic Computation for Statistics
Luis David GarciaPuente, Sam Houston State University.
Algebraic statistics is concerned with the development of techniques in algebraic geometry, commutative algebra and combinatorics to address problems in statistics and its applications. Algebra has seen many applications in statistics, but it is only rather recently that advanced techniques in algebraic geometry and related fields have been used to study statistics. This connection is based on the fact that most statistical models are defined either parametrically or implicitly via polynomial equations. This tutorial is intended to give the reader a first glimpse of the computational aspects of algebraic statistics. We do not pretence to teach algebraic statistics, we will merely talk about it. In this regard, we will follow a pedagogical approach suggested by Igor Shafarevich in his book [Basic notions of algebra, Encyclopedia of Mathematical Sciences, 11. Algebra, I. SpringerVerlag, Berlin, 2005] to introduce most mathematical and statistical notions by discussing a small sample of the basic examples. For instance, we will not introduce the formal notion of irreducible decomposition of a variety, but we will show several examples that motivate this notion and its significance in statistics. We start with a short introduction to computational algebraic geometry featuring many statistically motivated examples. Our discussion of algebraic geometry will be very elementary at the level of [D. A. Cox, J. B. Little and D. B. O'Shea, Ideals, varieties, and algorithms, third edition, Undergrad. Texts Math., Springer, New York, 2007]. In particular, we will use the language of affine varieties and we will only mention briefly the concepts of semialgebraic sets and projective varieties. We will also discuss the notion of a Gröbner basis, since it is the cornerstone of all computations algebraic statistics. We will finish our introduction to computational algebraic geometry with a discussion of the Implicitization Problem and its applications to statistics. The Implicitization Problem consists in finding polynomial equations satisfied by the image of a polynomial mapping. For example, let's consider the map that sends a parameter t to the pair $(t,t^2)$. The image of this map satisfies the equation $y = x^2$. The concept of an (algebraic) statistical model is the central object in this notes. It is worth to point out that there are several competing definitions for a statistical model. Our definition is not as general as the one introduced by Drton and Sullivant in [Algebraic statistical models. Statist. Sinica 17 (2007), no. 4, 1273–1297.], but it will serve well for our purposes. We will spend most of remaining time talking about the main examples of algebraic statistical models, namely, independence and graphical models for discrete and Gaussian random variables. Along the way we will discuss mixture models and their relation to secant varieties. The interested reader should look at the excellent monograph [Drton, Mathias; Sturmfels, Bernd; Sullivant, Seth. Lectures on algebraic statistics. Oberwolfach Seminars, 39. Birkhäuser Verlag, Basel, 2009] written by three of the leading figures in the area. These notes were partially based on this book and also on the introductory notes on algebraic statistics by Serkan Hoşten and Suela Ruffa [Introductory notes to algebraic statistics. Rend. Istit. Mat. Univ. Trieste 37 (2005), no. 12, 39–70 (2006).].
Tutorial on Coalescent Theory
Laura Kubatko, Ohio State
This tutorial will provide an introduction to the topic of coalescent theory. Beginning with common population genetics models (e.g., the WrightFisher model and the Moran model), the coalescent approximation will be developed and its properties discussed. The tutorial will then describe the application of the coalescent in a phylogenetic setting, with attention given to aspects of the coalescent process for which algebraic approaches are useful.
Tensors in statistics
Giorgio Ottaviani, Università di Firenze.
Tensors are mathematical objects that are ubiquitous in the description and modelling of linear phenomena. There are several point of views on tensors, which can be seen as multidimensional matrices (array), as multilinear maps or as elements in a vector space endowed with a group action. Supersymmetric tensors can be seen as multivariate homogeneous polynomials.
Tensors can look quite different after the choice of a coordinate system , but some features have an intrinsic meaning. The most important feature of a tensor is its rank. Decomposable tensors are exactly the tensors of rank one, and they correspond in the statistical modelling to data with independent sources. Higher rank tensors have similar statistical interpretations (mixture models, marginal distributions,...) and have been studied in the geometrical chapter of the so called secant varieties. We will start by discussing several point of views on tensors and their rank, their meaning in statistics and their usefulness.
For bidimensional matrices, the rank can be easily computed, for exact data by gaussian elimination and for approximate data by singular value decomposition. For multidimensional matrices there is an additional feature, the (canonical) decomposition of a tensors as a sum of rank one tensors, known as candecomp/parafac decomposition. In the Markov models and in other statistical settings the uniqueness of this decomposition is very important because it corresponds to the identifiability of the model. Unfortunately, both a full theoretical understanding and an efficient computation of this decomposition are still missing in general, indeed the canonical decomposition is known to exist only for tensors of small rank.
In the last 10 years, several tools from algebraic statistics have been successfully applied in the study of tensor decomposition. As a basic tool, the flattening of a tensor may help to reduce multidimensional problems to two dimensional problems, and to construct invariants that mimic the usual minors, but only up to some extent. We will lecture on these topics and on some applications of tensor decomposition to statistics, ending with the phylogenetic trees, after Allman, Rhodes and others. Phylogenetic trees describe the mutation of DNA in species evolution.
Plenary Abstracts
A semialgebraic description of the general Markov model on trees
Elizabeth S. Allman, University of Alaska Fairbanks.
An interesting class of statistical models are those given by finite state Markov processes on trees. Examples include hidden naive Bayes models and phylogenetic models. A general theorem of real algebraic geometry tells us there exists a characterization of the probability distributions of such a model, using polynomial equations and inequalities. However, it does not provide a useful means of making this semialgebraic model description explicit.
In this talk, we examine the kstate general Markov model on trees, and give a semialgebraic description of the probability distributions arising from it. Along the way, we review some results from the past 60 years on identifying parameters and characterizing distributions from this model class. In particular, we make explicit connections to work of Lazarsfeld, Pearl and Tarsi, and the recent works of Zwiernik and Smith (2011) and Klaere and Liebscher (preprint).
Learning coefficient and singular fluctuation in statistical learning theory
Miki Aoyagi, Department of Mathematics, College of Science & Technology, Nihon University.
In this talk, we consider learning coefficient and singular fluctuation in statistical learning theory. In recent studies, Watanabe showed that the generalization and training errors are subjected to a universal law by using algebraic geometry, and defined the model selection method WAIC as generalized AIC. WAIC is available even for singular learning models, such as neural networks, normal mixtures, reduced rank regressions, Bayes networks, binomial mixtures, Boltzmann machines, and hidden Markov models, while AIC cannot be applied to such models. These models have singular Fisher metric which is not always approximated by any quadratic form. Therefore, it is difficult to analyze the asymptotic behavior of these generalization and training errors. WAIC needs the values of the learning coefficient and the singular fluctuation, which are both birational invariants. The learning coefficient is known as the log canonical threshold of Kullback function. These values can be obtained by Hironaka's Theorem, however, it is still difficult to obtain them in learning theory for several reasons, such as degeneration with respect to their Newton polyhedrons and nonisolation of their singularities. Moreover, in algebraic geometry and algebraic analysis, these studies are usually done over an algebraically closed field. We cannot therefore apply results over an algebraically closed field to our cases, directly. In this talk, we first explain Watanabe's Theorem and next show our several recent results.
Boundedrank tensors and groupbased models.
Jan Draisma, Eindhoven University of Technology.
A phylogenetic tree model is an algebraic variety whose points parameterise probability distributions on $\{A,C,G,T\}^n$ that are compatible with a hypothetical tree with n leaves corresponding to extant species. The variety for a general tree can be reconstructed from the varieties for its subtrees consisting of one vertex plus all its neighbours—socalled star trees (or claw trees). These startree varieties, in turn, consist of boundedrank tensors. I will discuss recent progress on the algebra and geometry of these latter varieties.
Bayesian information criterion for singular models
Mathias Drton, University of Chicago.
This talk is concerned with approximate Bayesian model choice for singular models, using reduced rank regression and onedimensional finite mixture models as motivating examples. Such models do not obey the regularity conditions underlying the derivation of the usual Bayesian Information Criterion (BIC) and the penalty structure in BIC does not accurately reflect the frequentist largesample behavior of their marginal likelihood. While largesample theory for the marginal likelihood of singular models has been developed recently, the resulting approximations depend on the true parameter value and lead to a paradox of circular reasoning. I will discuss a resolution to this problem and give a practical extension of BIC to singular models.
Using Graph Theory in Algebraic Statistics
Alexander Engström, Aalto University.
Many rings and ideals studied in algebraic statistics are constructed from graphs and some extra data. For several constructions of that type it is known how to relate graph properties with algebraic properties. For both graphs and ideals there are methods to build new ones from old, and to determine properties of them by study maps from or into them. I will discuss how to move these methods between graph theory and algebra in the context of algebraic statistics. Examples will be drawn from joint projects with Kahle, Norén, and Sullivant.
Some Aspects of Graphical Models for Network Data
Stephen E. Fienberg, Carnegie Mellon University.
Graphs play an important role as a representation for two dual representations of statistical modelsone where the nodes correspond to units and the edges to variables that relate them to on another, and the other where the nodes are variables and the edges represent relationships among them. In the former units are inherently dependent and relationships may or may not be, whereas in the later the units are inherently independent and the focus is on independence relationships among the variables. In this talk I describe both types of representations involving discrete relationships or variables, and there interplay, focusing on likelihood estimation for Exponential Random Graph Models and related algebraic representations. [Based on joint work with Sonja Petrovic, Alessandro Rinaldo, and Xiaolin Yang]
An introduction to the holonomic gradient descent
Takayuki Hibi, Osaka University.
The holonomic gradient descent introduced in [NNNOSTT] is a new variation of the gradient descent for finding local minima or local maxima numerically. Its algorithm is based on the symbolic computations with Gröbner basis theory in Dmodules. The holonomic gradient descent is a totally new and generalpurpose method which goes beyond the limit of the conventional technique and which is valuable for numerical evaluations of normalizing constants as well as for maximum likelihood estimation for a broad class of distributions in statistics. The holonomic gradient descent is one of the most distinguished results created in the research project ``Harmony of Gröbner Bases and the Modern Industrial Society'' achieved by JST CREST Hibi team (http://www.math.jst.go.jp/en/scientists/teamhibi/index.html) and offers many challenging problems both to statistics and to Dmodule theory. In my talk a quick introduction to the holonomic gradient descent will be given. No special knowledge on Gröbner bases, Dmodules and statistics will be required to understand my talk. [NNNOSTT] Hiromasa Nakayama, Kenta Nishiyama, Masayuki Noro, Katsuyoshi Ohara, Tomonari Sei, Nobuki Takayama, and Akimichi Takemura, Holonomic gradient descent and its application to the FisherBingham integral, Advances in Applied Mathematics 47 (2011),639658.
Latent allocation models and applications to highthroughput genomics
Lior Pachter, U.C. Berkeley.
We will discuss a latent allocation model that has become central to the analysis of highthroughput sequence data in genomics. Specifically, we will explain RNASeq which consists of highthroughput sequencing of transcripts and explain how ambiguously mapped reads are used to infer transcript abundances via a latent allocation model. We will discuss convergence of some maximum likelihood estimation algorithms and possible connections to the maximum likelihood degree of the model.
Combining algebraic statistical models: two examples.
Eva Riccomagno, University of Genova.
In this talk I present three not unrelated projects on which I am currently involved. Structural metaanalysis combines evidence about conditional independence relationships between variables from studies or experiments carried out independently under partially comparable conditions. We discuss some algebraic aspects behind a combination of two statistical Gaussian models. (Joint work with Sofia Massa). Bayesian networks find application in decision theory when the expected utility function reflects the factorization of probabilities implied by the Bayesian network. We outline some algebraic aspects that could lead to structural combination of expert judgment from different panels. (Joint work with Manuele Leonelli and Jim Smith). We generalize standard results on evaluation of expectation of a large class of polynomial function as linear combination of some observed values in some particular settings. (Joint work with Claudia Fassino and Giovanni Pistone).
Commutative Algebra of Statistical Ranking
Bernd Sturmfels, U.C. Berkeley.
A model for statistical ranking is a family of probability distributions whose states are orderings of a fixed finite set of items. We represent the orderings as maximal chains in a graded poset. The most widely used ranking models are parameterized by rational function in the model parameters, so they define algebraic varieties. We study these varieties from the perspective of combinatorial commutative algebra. One of our models, the PlackettLuce model, is nontoric. Five others are toric: the Birkhoff model, the ascending model, the Csiszar model, the inversion model, and the BradleyTerry model. For these models we examine the toric algebra, its lattice polytope, and its Markov basis. This is joint work with Volkmar Welker.
Geometry of faithfulness assumption in causal inference
Caroline Uhler, IST Austria
Algorithms for inferring causality are heavily based on the Faithfulness assumption. The Faithfulness condition alone, however, is not sufficient to guarantee uniform consistency. In the context of structural equation models StrongFaithfulness has been proposed to overcome this problem. In contrast to the original Faithfulness assumption, the set of distributions satisfying StrongFaithfulness does not have measure one. We study the (Strong) Faithfulness condition from a geometric point of view and give upper and lower bounds on the proportion of unfaithful distributions for various classes of graphs.
Differential cumulants, hierachical models and monomial ideals
Henry P Wynn, London School of Economics.
Algebraic methods can be imported into continuous hierarchical statistiscal models via a duality between local (differential) cumulants and monomial ideals. First the theory of local cumulants is developed out of a definition of local moments. Then, independence, conditional independence and graphical models are defined by requiring every member of certain collections of local cumulants to zero. Finally, a natural duality between monomial and differential ideals is used to import various ideal structures and conditions such as the 2linear property and shellability. One outcome is the introduction of new conditions and distribution families into statistical hierarchical models. (joint with Daniel Bruynooghe)
Normality of the threestate toric homogeneous Markov chain model
Ruriko Yoshida, University of Kentucky.
A discrete time Markov chain, a stochastic process with Markov property, has been applied in many fields, such as physics, chemistry, information sciences, economics, finances, mathematical biology, social sciences, and statistics. A discrete time Markov chain is often used as a statistical model from a random physical process to fit the observe the data. A timehomogeneous Markov chain is a process that each transition probability from a state to a state does not depend on time. It is important to test if the observed data set fits the assumption of the timehomogeneity of the chain and in 2011, Hara and Takemura suggested a Markov Chain Monte Carlo (MCMC) approach to a goodnessoffit test for the timehomogeneity of the chain using Markov bases. In this talk we focus on a timehomogeneous Markov chain (THMC) model with three states and without any selfloops, i.e., the probability from a state to itself is zero, and we study algebraic and polyhedral geometric properties of Markov bases for the THMC model with any time $T > 0$. More specifically, we showed the number of facets of the convex hull generated by the columns of the design matrix for the threestate THMC model without loops for any time $T \geq 5$ is $24$, and we showed its explicite hyperplane representation of the convex hull for any $T \geq 4$. Using this result we then showed that the semigroup generated by the columns of the design matrix is normal for any $T > 0$ and finally we showed that a Markov basis for the toric ideal associate with the design matrix for the threestate THMC model without loops consists of binomials of degree less than or equal to $d= 6$. We end this talk with conjectures on further analyses on Markov bases for the THMC model. This is joint work with D. Haws, A., Mart\'in del Campo, and A. Takemura.
Contributed Talk Abstracts
A new parametrization for binary hidden Markov models
Andrew Critch, U.C. Berkeley.
The statistical applications of hidden Markov models have been extremely diverse and successful, including natural language processing, gesture recognition, gene sequencing, and Kalman filtering of physical measurements. HMM are highly nonlinear models, and just as linear models are amenable to linear algebraic techniques, nonlinear models are amenable to commutative algebra and algebraic geometry. Since these tools are much younger than linear algebra, much work remains to fully understand the geometry of HMM. This paper examines closely those HMM in which all the random variables, called nodes, are binary. Its main contributions are (1) minimal defining equations for the 4node model, comprising 21 quadrics and 29 cubics, which were computed using Grobner bases in the cumulant coordinates of Sturmfels and Zwiernik [2011], and (2) a birational parametrization for every binary HMM, with an explicit inverse for recovering the hidden parameters in terms of observables. The new model parameters in (2) are hence rationally identifiable in the sense of Sullivant, GarciaPuente, and Spielvogel [2010], and each model's Zarizki closure is therefore a rational projective variety. Grobner basis computations for the model and its graph are found to be considerably faster using these parameters. Together, (1) and (2) provide a nearly instantaneous test for whether an observed distribution is due to a binary hidden Markov process, in comparison with a previous algorithm of Schonhuth [2011] involving row reduction of exponentially large matrices. Dening equations such as (1) have been used successfully in model selection problems by Casanellas and FernandezSanchez [2006] and Eriksson [2008], and one can hope for similar applications in the case of HMM.
Phylogenetic mixtures for equivariant models
Jesús FernándezSánchez, Universitat Politècnica de Catalunya.
The selection of the most suitable evolutionary model to analyze the given molecular data is usually left to biologist's choice. J. Felsenstein suggested that certain linear equations satisfied by the expected probabilities of patterns observed at the leaves of a phylogenetic tree could be used for model selection. It remained an open question whether these equations were sufficient to characterize the evolutionary model. In this talk, we will show that, for some equivariant models of evolution, the space of distributions satisfying these linear equations equals the space of distributions arising from mixtures of trees on a set of taxa. In particular, we prove that an alignment is produced from a mixture of phylogenetic trees under an equivariant evolutionary model if and only if the distribution of patterns at its columns satisfies the linear equations mentioned above. Moreover, for each equivariant model, we provide an algorithm to obtain the equations defining this space for any number of taxa. These equations have been successfully used for model selection. Conclusions related to the identifiability of phylogenetic mixtures are derived.
$2$factor saturated designs: some applications
Roberto Fontana, Politecnico de Torino.
We consider $2$factor designs. A condition that characterizes
the estimability of the independence model for all saturated fractions is provided.
We also compute the number of all the saturated designs and we discuss some classification criteria.
Finally, we present a method for effects testing in multifactorial designs, that is based on nonisomorphic orthogonal arrays (Basso et al., 2000; Arboretti et al., in press) and we study the application of the same method to saturated designs.
Joint work with Fabio Rapallo and Maria Piera Rogantin; a preliminary version of the work has been submitted to the 46th Scientific Meeting of the Italian Statistical Society, Rome, 20 – 22 June 2012.
Arboretti, R., Fontana, R., Ragazzi, S., in press. Nonisomorphic orthogonal arrays for the statistical analysis of unreplicated experiments. Communications in Statistics – Theory and Methods.
Basso, D., Salmaso, L. , Evangelaras, H. , Koukouvinos, C., 2004. Nonparametric testing for main effects on inequivalent designs. In: mODa 7Advances in modeloriented design and analysis, Physica, Heidelberg.
Toric Ideals of Hypergraphs
Elizabeth Gross, University of Illinois, Chicago.
The ideal of an edge subring of a graph is an object that appears when studying statistical models parameterized by the edges of a graph. There are many results that tell us the same beautiful story: we can understand these ideals, if we understand the combinatorics of the underlying graph. A natural extension is to consider the defining ideal of an edge subring of a hypergraph. In this talk we give some recent results on the toric ideals of hypergraphs, including how to tell when the ideal is generated in a fixed degree. This is joint work with Sonja Petrovic.
Ideal regression
Franz Király, TU Berlin.
Ideal regression is the statistical estimation task of fitting a structured and simple set of polynomial equations to noisy sets of input equations; rephrased algebraically, this amounts to fitting a parametric ideal to a noisy set of input ideals, and generalizes for example ordinary or reduced rank regression. In the talk, we formally define the problem of ideal regression, examine a consistent estimator for its solution, and show how general identifiability results can be obtained from the theory of complete intersections and their moduli. Then, we demonstrate some real world applications in machine learning.
Birch’s Theorem and a Generalized IPF in Curved Exponential Families on Contingency Tables
Anna Klimova, University of Washington.
When relational models for probability distributions on contingency tables do not include the overall effect, they are curved exponential families. We give a generalization of Birch’s theorem for this case and explore the geometry of the equivalence classes of distributions having the same MLE under such models. We propose a generalization of the iterative proportional fitting procedure and illustrate our algorithm with an example. (Joint work with Tamas Rudas)
The Algebraic Geometry of Singular Learning Theory
Shaowei Lin, UC Berkeley.
Singular statistical models occur frequently in machine learning and computational biology. An important problem in the learning theory of singular models is determining the asymptotic behavior of integrals as the sample size grows large. In this talk, we give a brief introduction to Sumio Watanabe's Singular Learning Theory, as outlined in his book "Algebraic Geometry and Statistical Learning Theory". We will also explore the rich algebraic geometry and combinatorics that arise from studying integral asymptotics.
General groupbased models and their generalizations
Mateusz Michalek, Polish Academy of Sciences, Universite Joseph Fourier.
In our talk we focus on specific phylogenetic models given by a group acton. We present general groupbased models and their generalizations  Gmodels. We give many interactions with algebraic geometry, toric geometry and combinatorics. We present recent results concerning ideals of phylogenetic invariants, in relation to the conjecture of Sturmfels and Sullivant. We hope that our methods can be applied to other, not necessarily groupbased, models.
On Secants of Exponential Families
Guido Montufar, Penn State.
We elaborate on convex subsets of discrete multivariate exponential families and relate them to the convex support polytopes. We show that ``The intersections of lines and exponential families encode portions of their support set lattices and their convex subfamilies.'' In particular, an intersection index encodes the existence of foliations of exponential families into simplices of a certain dimension, and an exponential family which intersects a generic line at $(d+1)$ points contains every $[d/2]$dimensional face of the probability simplex, resp. equals the full probability simplex. For example, any binary independence model intersects a line at either $0,1,2$ or $\infty$ points and is an $n$ruled manifold. The 2bit independence model is not a minimal surface.
Holonomic Gradient Descent for the FisherBingham Distribution on the $n$dimensional Sphere
Kenta Nishiyama, Graduate School of Information Science and Technology, Osaka University.
The holonomic gradient descent (HGD) is recently introduced in our paper "Holonomic Gradient Descent and its Application to the FisherBingham Integral" to find local minima or maxima of holonomic functions. Its first application is a maximum likelihood estimation (MLE) for the FisherBingham distribution on the $2$dimensional sphere. In order to perform the MLE, we need an approximate value of the normalizing constant. In case of $n = 2$, the normalizing constant is expressed in terms of the Bessel function and there are several approaches for the MLE in the directional statistics. However, there are a few studies for the case of $n > 2$. Among them, Kume and Wood proposed a method to evaluate the normalizing constant by utilizing the Laplace approximation of the integral for $n > 2$. In this talk, we show that the HGD can be applied to the MLE on the $n$dimensional sphere. In order to apply the HGD, we derive the integrable connection satisfied by the normalization constant for the general $n$ via "Gröbner basis computation by hand" and derive series expansion of the normalization constant. The rank of the diagonal connection is $2n+2$ and we will present that the HGD works well numerically up to $n = 7$ for some class of problems by utilizing them. (This is a joint work with Tamio Koyama, Hiromasa Nakayama and Nobuki Takayama.)
Polytopes from subgraph statistics
Patrik Norén, Aalto University.
The Trifocal Variety
Luke Oeding, UC Berkeley.
In Computer Vision the multiview variety is constructed by considering several cameras in general position in space, all taking pictures of the same threedimensional object. I will show how this construction gives rise to certain tensor, and we may ask for a given tensor, if it arose from such a construction. This question can be answered by finding implicit defining equations for the multiview variety. Also of interest is the more refined algebraic problem to find the minimal generators of the defining ideal of the multiview variety. The case of 3 cameras, the trifocal variety, is already interesting. In this talk I will explain how symbolic and numerical computations aided by Representation Theory and Numerical Algebraic Geometry were used to find the ideal of the trifocal variety. Our work builds on the work of others (such as HartleyZisserman, AlzatiTortora and PapadopouloFaugeras) who have already considered this problem settheoretically. This is joint work with Chris Aholt (Washington).
The normality of cut polytopes
Hidefumi Ohsugi, Rikkyo University.
Sturmfels and Sullivant conjectured that the cut polytope of a graph is normal if and only if the graph has no K_5 minor. We show that the normality of cut polytopes of graphs is a minor closed property. By using this result, we have large classes of normal cut polytopes. As application, it follows that the toric ring of binary graph model of a graph is normal if and only if the graph has no K_4 minor. (Sullivant classified normal toric rings of binary graph model, too.)
Outliers and patterns of outliers in contingency tables with Algebraic Statistics
Fabio Rapallo, Dipartimento DISIT  Universita' del Piemonte Orientale.
In this talk we provide a definition of pattern of outliers in contingency tables within a modelbased framework. In particular, we make use of loglinear models and exact goodnessoffit tests to specify the notions of outlier and pattern of outliers. The language and some techniques from Algebraic Statistics are essential tools to make the definition clear and easily applicable. The theory is motivated by several numerical examples.
Homotopies for Maximum Likelihood Estimation
Jose Rodriguez, University of California Berkeley.
The MLEproblem is to optimize a likelihood function on a discrete statistical model for given data. To solve this problem, we compute the critical points of a function on the Zariski closure of a statistical model for given data. These critical points are candidates for solutions to the MLE problem. By doing the expensive computation of computing critical points once for generic data, we then use numerical algebraic algebraic geometry to construct a homotopy that will solve the MLEproblem for arbitrary data quickly.
Maximal Semigraphoids
Laura Silverstein, San Diego State University.
Semigraphoids are combinatorial structures arising from models of probabilistic conditional independence. We investigate the classification of maximal semigraphoids on n elements. By studying the construction of semigraphoids from generating sets, we establish two quite different classes of maximal semigraphoids. The semigraphoids on up to five elements are computed and their maximal semigraphoids classified highlighting these results.
The characteristic imset polytope for diagnosis models
Jing Xi, University of Kentucky.
In 2010, M. Studen\'y, J. Vomlel, and R. Hemmecke explored a new algebraic description of graphical models, characteristic imsets. Compare with standard imsets, characteristic imsets have several advantages: they are still unique vector representative of CI structures, they are 01 vectors, and they are more intuitive in terms of graphs than standard imsets. After defining characteristic imset polytope as the convex hull of all characteristic imsets, they also showed that model selection in graphical models, which essentially is a problem of maximizing a quality criterion, can be converted into an integer programming problem on the characteristic imset polytope. However, this integer programming problem is very hard in general. Therefore, here we focus on diagnosis models which can be described by Bipartite graphs, and their characteristic imset polytope. In this talk, firstly, we will show that the characteristic imsets for diagnosis models have very nice properties and with these properties we are able to find a combinatorial description of all edges of the characteristic imset polytopes for diagnosis models. Then we prove that these characteristic imset polytopes are simple polytopes. Finally, we end the talk with further questions in this topic.
Graphical Gaussian models and their groups
Piotr Zwiernik, TU Eindhoven.
Let $G$ be an undirected graph with $n$ nodes defining the graphical Gaussian model $M(G)$ parametrized by the set $P(G)$ of all concentration (symmetric, positive definite) matrices with zeros corresponding to nonedges of $G$. We describe the stabilizer in $GL_n(\mathbb{R})$ of the model, in the natural action of $GL_n(\mathbb{R})$ on symmetric matrices. This group gives the representation of graphical Gaussian models as composite transformation families. This has important consequences for the study of the maximum likelihood inference for this model class which I hope to discuss in the second part of the talk. This is a joint work with Jan Draisma and Sonja Kuhnt.
Poster Abstracts
Polyhedral Combinatorics of UPGMA Cones
Ruth Davidson, North Carolina State University.
Tree agglomeration methods such as UPGMA continue to play a significant role in phylo genetic research. Polyhedral combinatorics and linear optimization offer a robust toolkit for analyzing the natural subdivision of Euclidean space induced by classifying the input vectors according to tree topologies returned by the algorithm. Our analysis shows connections between the cones associated to UPGMA trees and other combinatorial objects. We give a closed form for the extreme rays of UPGMA cones on n taxa using the partition lattice. We also compute volumes of the UPGMA cones for small n. This is joint work with Seth Sullivant.
Mixed graphs and their Gaussian causal interpretation
Christopher Fox, University of Chicago.
A mixed graph with directed and bidirected edges encodes a Gaussian statistical model. The nodes of the graph correspond to random variables that are related via a system of linear equations with normally distributed noise terms. The form of the equations is determined by the directed edges, while bidirected edges indicate potential nonzero correlations among noise terms. We present results on the causal interpretation of mixed graphs, focusing on the problem of finding an acyclic digraph whose associated Gaussian hidden variable model is precisely equal to a given Gaussian mixed graph model. In particular, we examine chordless bidirected cycles of length 4 or longer and prove that such Gaussian model equality cannot be achieved.
Cactus length of a form and Gorenstein algebras
Anthony Iarrobino, Northeastern University.
Work of K. Ranestad F.O. Schreyer, and A. BernardiK. Ranesad relates the rank/cactus rank of a generic form of degree j in r variables to the dimension of families of Artinian Gorenstein algebras of socle degree j in r1 variables, having a fixed stratified Hilbert function.
Symbolic computing with multivariate polynomials in R
David Kahle, Baylor University.
Since its inception 20 years ago, the open source statistical computing environment R has garnered a huge following both inside and outside the statistical community and is increasingly the lingua franca of academic statisticians. Unlike computer algebra systems like Mathematica and Maple, however, R has very little native capabilities for symbolic manipulation. The mpoly package (addon) is a basic general purpose collection of tools for symbolic computing with multivariate polynomials in R. In addition to basic polynomial arithmetic, mpoly can take derivatives of polynomials, compute Grobner bases of collections of polynomials, and convert polynomials into a functional form to be evaluated. It is hoped that mpoly might form the basis of future efforts into symbolic computing with multivariate polynomials in R.
Algebraic Ecological Inference
Vishesh Karwa, Penn State.
Ecological inference is the process of inferring individual level behavior from aggregate data. The problem of ecological inference occurs frequently in many applied sciences. In the present work, we present an algebraic solution to the problem of ecological inference using tools from computational algebra. This method can handle multidimensional contingency tables, deterministically respects bounds and can incorporate information from many different sources. We illustrate the method by a few examples and provide R code.
Groupbased models, BerensteinZelevinsky triangles and the invariance of the Ehrhart polynomial
Kaie Kubjas, Freie Universität Berlin.
Buczynska and Wisniewski proved that the HilbertEhrhart polynomial of the JukesCantor binary model depends only on the number of leaves and not on the shape of a 3valent tree. I have shown that this result does not hold for the Kimura 3parameter model. DontenBury and Michalek have shown it for several other general groupbased models. However, the result of Buczynska and Wisniewski also appears in the work of Manon on conformal block algebras, where it corresponds to the case sl(2). Furthermore, the triangle complex of BerensteinZelevinsky triangles dual to a 3valent tree gives the polytope of the JukesCantor binary model and the 3valent tree. I show that if we generalize this construction to the triangle complexes of sl(3) BerensteinZelevinsky triangles then the Ehrhart polynomial depends only on the number of leaves and not on the shape of the dual 3valent trees.
Asymmetry Models for Square Contingency Tables and Exact Tests
Sonja Kuhnt, Faculty of Statistics, TU Dortmund University, Germany.
Square contingency tables with the same row and column classification occur frequently in a wide range of statistical applications, e.g. whenever the members of a matched pair are classified on the same scale, which is usually ordinal. Such tables are analysed by choosing an appropriate loglinear model. We focus on the models of symmetry, triangular, diagonal and ordinal quasi symmetry. The fit of a specific model is tested by the chisquared test or the likelihoodratio test, where pvalues are calculated from the asymptotic chisquare distribution of the test statistic or, if this seems unjustified, from the exact conditional distribution. Since the calculation of exact pvalues is often not feasible, we propose alternatives based on algebraic statistics combined with MCMC methods. Krampe, A., Kateri, M., and Kuhnt, S. (2011) “Asymmetry Models for Square Contingency Tables: Exact Tests via Algebraic Statistics”, Statistics and Computing 21, 5567.
The algebraic geometry of Bayesian network of integrated decision makers
Manuele Leonelli, University of Warwick.
The algebraic geometry of Bayesian network of integrated decision makers. Bayesian networks are the most graphical representation used for decisionmaking problems. However in practice it is often the case that different panels of expert deliver probabilistic models whose output random variable or random vector associated with a particular vertex. The joint probability model of the whole decision support system is then composed as a function of the beliefs of the component panels – see for example Faria & Smith,(1996,97). Thus to constrict a coherent system each expert panel is tasked to deliver only the conditional distributions of its vertex given the values of its parents in the graph. The composite decision making body then takes this Bayesian Network as representing the beliefs of the body. To be rational in a Bayesian sense then demands that the maximum of the expectation of a multiattribute utility function  whose attributes are typically monomial functions of these attributes  should then determine the behaviour of this composite body. In this poster, we demonstrate how the sorts of new technologies developed to study the probabilistic implications of Bayesian Networks e.g. in the Gaussian case as reported by Garcia et all (2005) and Sullivant (2006) can also be applied to this novel domain where the object of interest is not the joint distribution but the expected utility associated with various decision rules.
Conditional Testing with Graver Basis of the Beta Model of Random Graphs
Mitsunori Ogawa, University of Tokyo.
In this poster we discuss conditional testing of the beta model of random graphs. We give an explicit description of the Graver basis for an underlying graph of the beta model and apply the basis to testing the beta model via Markov chain Monte Carlo method. Connectivity by a subset of the Graver basis is also discussed.
On holonomic gradient method for the distribution function of the largest root of a Wishart matrix
Yasuhide Numata, The university of Tokyo.
The exact distribution function of the largest root of a Wishart matrix can be written as the product of the exponential function and the hypergeometric function $1F_{1}(a,c;X)$ of matrix argument. The hypergeometric function of matrix argument is a function of eigenvalues of the square matrix $X$, and is written as the infinite summation of zonal polynomials. It is known that there exists a holonomic system of partial differential equations such that the hypergeometric function is the unique symmetric function satisfying the system. We discuss the numerical method which we call holonomic gradient method.
Conditional Gaussian Distributions with linear restrictions and Classification
Gonzalo Pérez de la Cruz, National University of Mexico, UNAM.
The use of concentration matrices with linear restrictions is introduced in classification of observations. We consider Gaussian models for two populations and linear restrictions on each of the two concentration matrices. Those linear restrictions on a concentration matrix have been introduced by Anderson (1970). Graphical Gaussian models (Lauritzen, 1996) and Coloured Graphical Gaussian models (Hϕjsgaard and Lauritzen, 2005, 2007 y 2008) are particular instances of models with concentration matrices with linear restrictions. We additionally consider equalities between corresponding parameters of the two concentration matrices. Two aspects have to be considered: i) how to obtain estimates assuming that the linear structure of the concentration matrices is given, and ii) how to find the structure of the concentration matrices when it is unknown. The iterative partial maximization algorithm as described in Jensen et al. (1991) is adapted in Perez and Eslava (2011) to solve the first aspect. We present an algorithm to find the zeroes and the colorations of Coloured graphical Gaussian models for just one population. This algorithm could be the base to develop another one to find the structure of the two concentration matrices. We maximize a penalized loglikelihood function as suggested in Gehrmann (2011) using a generalized gradient descent algorithm as in Danaher et al. (2011) and using the path algorithm described in Tibshirani and Taylor (2011) in one of the iterative steps. We present the performance of the algorithm executing it for the Mathematics marks data set (Mardia et al. 1979) and comparing the results with the selected models presented in Gehrmann (2011) and Hϕjsgaard and Lauritzen (2008).
Toric algebra of hypergraphs
Despina Stasi, University of Illinois at Chicago.
Edge subrings of uniform hypergraphs are monomial algebras parametrized by monomials of fixed degree. When this degree is $d=2$, the defining ideals of edge subrings of graphs are toric ideals that are wellknown in the commutative algebra community, and popular in the algebraic statistics community. In this work, we make first steps to extend the theory for these toric ideals from graphs to hypergraphs. We study presentation ideals of monomial subalgebras parametrized by squarefree monomials of degree $d>2$. We provide first combinatorial results describing generators of these ideals, generalizing wellknown results for graphs. We also describe universal Gr\""obner bases and Graver bases in terms of balanced walks on hypergraphs. One of the motivations for studying toric ideals of hypergraphs comes from algebraic statistics, where generating sets of toric ideals play an important role in testing how well a model fits the given data. The edge subring of a $d$uniform hypergraph corresponds to any exponential family model whose joint probabilities are parametrized by monomials of degree $d$. The generators of the toric ideal give a basis for random walks on fibers of the model, and understanding their structure gives insight into the model geometry.
Exact Tests of Hardy Weinberg Equilibrium for MultiAllelic Markers and Triad Designs: Algebraic Statistics Route
Subramaniam Venkatesan, University Of Cincinnati.
For checking HardyWeinberg Equilibrium for multiallelic markers, one uses a chisquared test. The chisquared test is an asymptotic test and it needs fulfillment of several conditions. If the conditions are not met, one could explore developing an exact test. In this presentation, we develop a test a la Fisher. We identify the fiber of all genotype datasets with the same allele frequencies and we also identify a Markovbase for the fiber. If the fiber is very large, we present a Markov Chain Monte Carlo simulation procedure to estimate the exact pvalue. These ideas are extended for triad designs in genetics.
A nonparametric method of searching for outlying gene tree topologies in a dataset
Grady Weyenberg, University of Kentucky.
The distribution of gene histories within a phylogeny is of considerable mathematical and biological interest. Gene histories which are "outliers" with respect to this distribution may be indicative of Biologically interesting phenomena, e.g. selective pressure on the gene. We present a nonparametric method of searching for outlying gene tree topologies in a dataset. Using a number of popular tree distance metrics we define kernels in the space of trees, and then use empirical gene trees to estimate the density. Employing our method on a realworld dataset allowed us to quickly identify several sequences which have clear issues with annotation or alignment, as well as several more genes that we are currently investigating more closely.
Participants
Name  Affiliation  

Elizabeth Allman  Faculty  University of Alaska Fairbanks 
Shunichi Amari  Faculty  RIKEN Brain Science Institute 
Miki Aoyagi  Faculty  Department of Mathematics, College of Science & Technology, Nihon University 
Maryam Farahmand Asil  Student  University of california Berkeley 
Natth Bejraburnin  Student  Dept of Mathematics, UC Berkeley 
Brandon W Bock  Student  North Carolina State University 
Brielin Brown  Student  UC Berkeley Department of Computer Science 
Marta Casanellas  Faculty  Universitat Politecnica de Catalunya 
Saksham Chandra  Student  Penn State University 
Sabyasachi Chatterjee  Student  Yale University 
Harry Crane  Student  University of Chicago 
Andrew Critch  Student  UC Berkeley 
Ruth Davidson  Student  North Carolina State University 
Matthew DeMichele  Postdoc  Penn State 
Walter Dempsey  Student  University of Chicago 
Manfred Denker  Faculty  Penn State 
Jan Draisma  Faculty  Eindhoven University of Technology 
Mathias Drton  Faculty  University of Chicago 
John Dziak  Faculty  Methodology Center, Penn State 
Alex Engstrom  Faculty  Aalto University 
Jesús FernándezSánchez  Postdoc  Universitat Politècnica de Catalunya 
Stephen Fienberg  Faculty  Carnegie Mellon University 
Roberto Fontana  Faculty  Politecnico di Torino 
Christopher Fox  Student  University of Chicago 
Luis David GarciaPuente  Faculty  Sam Houston State University 
François Greer  Student  Stanford University 
Elizabeth Gross  Student  University of Illinois, Chicago 
Bertrand Haas  Postdoc  Harvard University 
Hisayuki Hara  Faculty  Niigata University 
Samuel Hetterich  Student  GoetheUniversity Frankfurt, Germany 
Takayuki Hibi  Faculty  Osaka University 
Serkan Hosten  Faculty  San Francisco State University 
Anthony Iarrobino  Faculty  Northeastern University 
David Kahle  Faculty  Baylor University 
Vishesh Karwa  Student  Penn State University 
Franz Király  Postdoc  TU Berlin 
Anna Klimova  Student  University of Washington 
Laura Kubatko  Faculty  Ohio State 
Kaie Kubjas  Student  Freie Universität Berlin 
Sonja Kuhnt  Faculty  Faculty of Statistics, TU Dortmund University, Germany 
Andre Kuney  Student  Oberlin College 
Manuele Leonelli  Student  University of Warwick 
Nan Li  Student  MIT 
Junli Lin  Student  PSU 
Shaowei Lin  Postdoc  UC Berkeley 
Abraham Martin del Campo  Student  Texas A&M University 
Petra Elisabeth Meyer  Student  Goethe Universität Frankfurt 
Mateusz Michalek  Student  Polish Academy of Sciences, Universite Joseph Fourier 
Guido Montufar  Postdoc  Pennsylvania State University 
Jason Morton  Faculty  Penn State 
Uwe Nagel  Faculty  University of Kentucky 
Duy Nguyen  Student  Texas Christian University; University of WisconsinMadison 
Kenta Nishiyama  Postdoc  Graduate School of Information Science and Technology, Osaka University 
Patrik Norén  Student  Aalto University 
Yasuhide Numata  Postdoc  The university of Tokyo 
Augustine O'Keefe  Student  Tulane University 
Mike O'Sullivan  Faculty  San Diego State University 
Luke Oeding  Postdoc  University of California, Berkeley 
Mitsunori Ogawa  Student  University of Tokyo 
Hidefumi Ohsugi  Faculty  Rikkyo University 
Giorgio Ottaviani  Faculty  Università di Firenze 
Lior Pachter  Faculty  U.C. Berkeley 
Mavis Pararai  Faculty  Indiana University of Pennsylvania 
Gonzalo Pérez de la Cruz  Student  National University of Mexico, UNAM 
Sonja Petrovic  Faculty  Penn State 
Fabio Rapallo  Faculty  Dipartimento DISIT  Universita' del Piemonte Orientale  Italy 
Qingchun Ren  Student  UC Berkeley 
John Rhodes  Faculty  University of Alaska Fairbanks 
Eva Riccomagno  Faculty  University of Genova 
Don Richards  Faculty  Penn State 
Jose Rodriguez  Student  U.C. Berkeley 
Edward Roualdes  Student  University of Kentucky 
Sepideh Shafiei  Student  Northeastern University 
Junfeng Sheng  Faculty  Bowling Green State University 
Laura Silverstein  Student  San Diego State University 
Erik Sjöland  Student  Aalto University 
Aleksandra Slavkovic  Faculty  Penn State 
Bonnie Smith  Postdoc  University of Kentucky 
Eftychia Solea  Student  Penn State 
Despina Stasi  Student  University of Illinois at Chicago 
Bernd Sturmfels  Faculty  U.C. Berkeley 
Seth Sullivant  Faculty  North Carolina State University 
Lucia Tabacu  Student  Penn State 
Andrea Toreti  Faculty  JustusLiebig University of Giessen, Department of Climatology, Climate Dynamics and Climate Change 
Jacob Turner  Student  Penn State 
Caroline Uhler  Faculty  IST Austria 
Subramaniam Venkatesan  Student  University Of Cincinnati 
Mina Vora  Faculty  East Georgia College 
James Voss  Student  Ohio State University 
Lynn Waterhouse  Student  Penn State 
Grady Weyenberg  Student  University of Kentucky 
Henry Wynn  Faculty  London School of Economics 
Jing Xi  Student  University of Kentucky 
Xiaolin Yang  Student  Carnegie Mellon University 
Ruriko Yoshida  Faculty  University of Kentucky 
Cheng You  Student  Penn State 
Piotr Zwiernik  Postdoc  TU Eindhoven 