Schedule for Spring 2013
Informatics Seminar, Perspectives in Informatics 5
Unless noted otherwise, all talks 16:30  18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus

 David Rappaport (Queen's), Triangles and T. rex: Digital technology in support of museum services, hosted by David Avis

 Antoine Deza (McMaster), Colourful linear programming and simplicial depth, hosted by David Avis

 Sanjeev Arora (Princeton), Is Machine Learning Tractable? — Three Vignettes, hosted by Kazuo Iwama
 May 2 Thursday, regular replacement day for April 29 (Mon)
 Alejandro Ribes (EDF R&D), Image Spectrometers for the scanning of fineart paintings, hosted by Xuefeng Liang
 May 9 Thursday, Irregular day
 Mia Persson (Malmö University), A fast parallel algorithm for minimumcost small integral flows, hosted by Kazuo Iwama

 Stanko Trifkovic (Kyoto U), Spatial Distribution and Hidden Trees in Bitterlich’s AngleCount Sampling
 Marco Cuturi (Kyoto U), Mean Reversion with a Variance Threshold

 Aapo Hyvärinen (University of Helsinki, ATR), Analysing brain waves by unsupervised learning methods, hosted by Akihiro Yamamoto

 Xuefeng Liang (Kyoto U), Segmentation of multiple interdependent motions using potential surface
 Adam Jatowt (Kyoto U), Estimating focus time of documents

 Francois Le Gall (University of Tokyo), Matrix Multiplication and Graph Algorithms, hosted by Akihiro Yamamoto

 Toby Hocking (Tokyo Institute of Technology), Learning penalties for changepoint detection using maxmargin interval regression. hosted by Marco Cuturi

 Ryuhei Uehara (JAIST) , Common developments of different polyhedra, hosted by Xuefeng Liang

 The lecture is moved to July 25.

 JeanPhilippe Vert (Mines ParisTech, Institut Curie), Learning with structured sparsity in computational biology, hosted by Marco Cuturi

 Teruko Takada (Graduate School of Business, Osaka City University), Robust big data analysis of financial bubbles, hosted by Akihiro Yamamoto and Marco Cuturi

 Shinichi Nakajima (Nikon Corporation), Global Solution and Theoretical Guarantee of Variational Bayesian PCA, hosted by Marco Cuturi
 July 25 Thursday, Irregular day
 Sourav S Bhovmick (Nanyang Technological University), Towards In Silico Networkdriven Combination Drug Therapy: A
Big“Small” Data Problem, hosted by Adam Jatowt
Abstracts and Detailed Information
April 8
16:30  18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
David Rappaport, Queen's University
Title: Triangles and T. rex: Digital technology in support of museum services
Abstract: Research Casting International (RCI) is one of the world's largest providers of museum technical services, which include specimen preparation, specimen restoration, specimen casting, specimen molding, specimen mounting, exhibit fabrication and exhibit moving.
RCI has a large digital collection of its specimens. These 3D data files can be used to create models using 3D printing or CNC milling. The digital models are also useful for planning poses for museum exhibits. Since January 2011 we have been working with RCI on various projects providing software solutions related to their 3D data collection.
We have provided software for mesh compression, post processing of meshes to ensure required geometric properties for printing and milling, as well as ancillary programs to streamline the workflow for their technicians. Ongoing projects involve mesh simplification, and mesh partitioning in support of the CNC milling process.
I will describe our activities with a focus on technical issues that we have resolved.
April 15
16:30  18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Antoine Deza, McMaster University
Title: Colourful linear programming and simplicial depth
Abstract: We present recent results and open questions dealing with a generalization of linear programming introduced by Bárány and Onn in 1997, and the associated generalization of the Carathéodory's Theorem proven by Bárány in 1982. In particular, we present recent generalizations of the Colourful Carathéodory Theorem due to Arocha et al. and Holmsen et al. and our strengthening of these. We provide a lower bound for the colourful version of the simplicial depth improving the earlier bounds of Bárány and Matoušek and of Stephen and Thomas, and verify that the conjectured lower bound is tight for dimension 4. Computational approaches for small dimensions are discussed.
Slides
April 22
16:30  18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Sanjeev Arora, Princeton University
Title: Is Machine Learning Tractable? — Three Vignettes
Abstract: Many tasks in machine learning (especially unsupervised learning) are provably intractable: NPcomplete or worse. Nevertheless, researchers have developed heuristic algorithms to try to solve these tasks in practice. In most cases, these algorithms are heuristics with no provable guarantees on their running time or on the quality of solutions they return. Can we change this state of affairs?
This talk will suggest that the answer is yes, and describe three of our recent works as illustration. a) A new algorithm for learning topic models. (It applies to Linear Dirichlet Allocations of Blei et al. and also to more general topic models. It provably works under some reasonable assumptions and in practice is up to 50 times faster than existing software like Mallet. It relies upon a new procedure for nonnegative matrix factorization.) b) What classifiers are worth learning? (Can theory illuminate the contentious question of what binary classifier to learn: SVM, Decision tree, etc.?) c) Provable ICA with unknown gaussian noise. (An algorithm to provably learn a “manifold” with small number of parameters but exponentially many “interesting regions.”)
The talk will be selfcontained.
(Based upon joint works with Rong Ge, Ravi Kannan, Ankur Moitra, Sushant Sachdeva.)
May 2
This seminar takes place on Thursday
16:30  18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Alejandro Ribes, Department SINETICS, EDF R&D, France
Title: Image Spectrometers for the scanning of fineart paintings
Abstract: The concept of a Spectral Image is fundamental to this presentation. Such an image contains a spectral reflectance per pixel (image element) instead of the traditional three values representing color. Acquiring a spectral image requires an image spectrometer, which can be considered as an advanced type of digital camera. A straightforward property of spectral images is that highcolor fidelity is easily obtained. There are some applications where highfidelity color is crucial. One clear example is when capturing an image of a fineart painting in a museum. In this case, the people that will look at the image (in printed form or in a screen) are curators, arthistorians or other professionals that are used to perceiving small subtleties in color. Furthermore applications such as the generation of underdrawings, virtual restoration of paintings or pigment identification are feasible using this acquisition technology.
In this presentation I will describe the technology and signal processing methods involved in spectral imaging. I will also present examples of systems used in actual museums. Most examples images, such as multispectral scan of the Mona Lisa by Leonardo da Vinci, come for the CRISATEL European project and from the Louvre Museum in Paris.
May 9
This seminar takes place on Thursday
16:30  18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Mia Persson Department of Computer Science, Malmö University, Sweden
Title: A fast parallel algorithm for minimumcost small integral flows (joint work with Andrzej Lingas)
Abstract: We present a new approach to the minimumcost
integral flow problem for small values of the flow.
It reduces the problem to the tests of simple multivariate
polynomials over a finite field of characteristic two
for nonidentity with zero. In effect, we show that
a minimumcost flow of value $k$ in a network
with $n$ vertices, a sink and a source,
integral edge capacities and positive integral
edge costs polynomially bounded in $n$
can be found by a randomized PRAM, with errors
of exponentially small probability in $n,$ running
in $O(k\log (kn)+\log^2 (kn))$ time and using
$2^{k}(kn)^{O(1)}$ processors. Thus, in particular,
for the minimumcost flow of value $O(\log n),$ we obtain
an $RNC^2$ algorithm.
Slides
May 13
16:30  18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Stanko Trifkovic Social Informatics, GSI
Title: Spatial Distribution and Hidden Trees in Bitterlich’s AngleCount Sampling
Abstract: Bitterlich’s method which he called anglecount sampling goes by a variety of names, including horizontal point sampling. This method has been extensively studied in the past and is widely used in estimating relative basalarea of trees in forests. Although concerns regarding the use of anglecount sampling in highly diverse forests still remain, our understanding of influences caused by different spatial distributions of trees is still limited. This study presents sampling simulations applying anglecount sampling in virtual forests with regular, random and clustered spatial distributions of trees, and with normal or exponential distributions of tree diameters. The findings suggest that the counts with anglecount sampling fitting to a widely known Poisson distribution can be used to indicate uniformly random spatial distributions of trees. Indices based on the counts do not require any distances to be measured and are likely to be independent from tree diameter distribution or applied basalarea factor. Moreover, this study demonstrates that the clustering induces a significant increase in numbers of the “hidden trees”, which usually cause surveyors to move sideways to check whether they should be counted or not. Clustering of the trees is also likely to highly reduce a statistical confidence to basalarea estimates; the estimates are significantly less precise in clustered than in random or in regular populations. Therefore, indexing spatial distributions of trees in forests should be a regular practice in forest inventories. The Bitterlich’s anglecount method can be one of the practical choices.
Marco Cuturi Intelligence Science & Technology, GSI
Title: Mean Reversion with a Variance Threshold
Abstract: Starting from a sample path of a multivariate stochastic process, we study several techniques to isolate linear combinations of the variables with a maximal amount of mean reversion, while constraining the variance of the combination to be larger than a given threshold. We show that many of the optimization problems arising in this context can be solved exactly using semidefinite programming and a variant of the Slemma. In finance, these methods can be used to isolate statistical arbitrage opportunities, i.e. mean reverting baskets with enough variance to overcome market friction. In a more general setting, mean reversion and its generalizations can also be used as a proxy for stationarity, while variance simply measures signal strength.
May 20
16:30  18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Aapo Hyvärinen University of Helsinki, ATR
Title: Analysing brain waves by unsupervised learning methods
Abstract: Unsupervised machine learning methods have been succesful in analysing EEG and MEG data (“brain waves”) in two ways. First, they can remove technical artifacts from the data, and second, they can even decompose brain activity into more understandable components. Such methods are of particular interest in analysing brain activity when there is no particular stimulation and the brain is more or less at rest. In this talk, I will present some basic results an how unsupervised learning methods can be uses to analyse such brain waves. The method used are centered around independent component analysis, further including Bayesian networks or structural equation models.
May 27
16:30  18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Xuefeng Liang Intelligence Science & Technology, GSI
Title: Segmentation of multiple interdependent motions using potential surface
Abstract: A challenge in motion segmentation is that different motions are often mixed up and interdependent in the real data. Since the 2D representation of the dependent motions is obscure, it leads the segmentation difficult. In this talk, we will discuss a motion segmentation and recovery method for complex scenarios. Theoretically speaking, 2D representation of interdependent motions cannot be well segmented in 2D space, we thus first transform the 2D motion vector field into the 3D potential surface. In which, different motions are placed onto different layers so that they can be segmented much easier. By applying the surface fitting, the potential surfaces of global and local motions are then estimated. Finally, the recovered motions are obtained by projecting the segmented potential surfaces back to motion field. Using potential surface makes our method able to deal with both independent and dependent, rigid and nonrigid motions without prior knowledge. We will also explore its application in video alignment and action recognition.
Adam Jatowt Social Informatics department
Title: Estimating focus time of documents
Abstract: Temporality is an important characteristic of text documents. While some documents are clearly atemporal, many can be mapped to certain time periods. In this talk, we will introduce the problem of estimating focus time of documents. Document focus time is defined as the time period to which the content of a document refers to and is considered as a complementary dimension to its creation time or timestamp. We will describe several estimators of focus time by utilizing external knowledge bases such as news article collections which contain explicit temporal references. We will then show the effectiveness of our methods on diverse datasets of documents about historical events in several countries and discuss how focus time can be combined with document timestamp for making possible issuing complex temporal queries in longitudinal document collections. The talk will be concluded by the reference to our research of automatically estimating focus time of images.
June 3
16:30  18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Francois Le Gall University of Tokyo
Title: Matrix Multiplication and Graph Algorithms
Abstract: In this talk I will describe how algorithms for matrix multiplication can be used to design asymptotically fast algorithms for several graphtheoretic tasks. I will start with the wellknown application of algorithms for the usual matrix product of two real matrices, which has been the subject of intensive research since the discovery of the first truly subcubictime algorithm by Strassen in 1969, to the problem of computing the allpairs shortest paths in a graph. Then I will present other kinds of matrix products, and in particular matrix products over algebraic structures known as “semirings” (such as the Boolean matrix product and the distance matrix product) and explain how they can be applied to solve several graphtheoretical problems as well. Finally, I will describe new quantum algorithms for these matrix products over semirings, which are faster than the best known classical algorithms.
June 10
16:30  18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Toby Hocking Tokyo Institute of Technology
Title: Learning penalties for changepoint detection using maxmargin
interval regression
Note: In this interactive talk I will require a volunteer from that audience to create a database of changepoint annotations from which we will learn a penalty function!
Abstract: In segmentation models, the number of changepoints is typically chosen using a penalized cost function. In this work, we propose to learn the penalty and its constants in databases of signals with weak changepoint annotations. We propose a convex relaxation for the resulting interval regression problem, and solve it using accelerated proximal gradient methods. We show that this method achieves stateoftheart changepoint detection in a database of annotated DNA copy number profiles from neuroblastoma tumors.
June 17
16:30  18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Ryuhei Uehara, Japan Advanced Institute of Science and Technology
Title: Common developments of different polyhedra
Abstract: What do you imagine when you hear a word “development” or “net” of a polyhedron? A typical development would be a socalled “Latincross” that is obtained by cutting some edges of a cube. However, you can also obtain a tetrahedron from the Latincross by folding different lines. Including this one, you can obtain 23 different polyhedra from just one Latincross in 85 different folding ways. In this talk, I will demonstrate some polygons that can fold to two or more convex polyhedra, and give some related open problems.
July 1
16:30  18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
JeanPhilippe Vert, Mines ParisTech, Institut Curie
Title: Learning with structured sparsity in computational biology
Abstract: Learning with structured sparsity has emerged as a powerful approach to estimate predictive models with complex or structured data. By defining specific nonsmooth convex regularisers, one can encode specific prior knowledge into computationally efficient procedures. In this talk, I will describe several such approaches we have developed for applications in computational biology. Applications include the inference of prognostic models in cancer from genomic data, the detection of breakpoints in noisy DNA profiles, or the quantification of RNA isoforms from highthoughput sequencing data.
July 8
16:30  18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Teruko Takada, Osaka City University
Title: Robust big data analysis of financial bubbles
Abstract:
Reduction in the damage caused by financial bubbles is clearly a socially critical issue. However, the analysis of financial bubble is difficult in that we need to explain complex system given limited available information. We tackle this difficulty in the following two ways: (1) robust and efficient information extraction by developing new statistical methods and (2) increasing information itself by using big data such as highfrequency financial data and the SNS data from the web. Newly developed statistical methods and several new findings from the big data analysis will be presented in this talk. The possibility of designing the policy and system will also be discussed.
July 22
16:30  18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Shinichi Nakajima, Nikon Corporation
Title: Global Solution and Theoretical Guarantee of Variational Bayesian PCA
Abstract:
For statistical models in which the rigorous Bayesian learning is computationally intractable, the variational Bayesian (VB) approximation is a good alternative with its efficient computation and automatic relevance determination (or model selection) property. This talk starts with a general explanation on the VB approximation, including the standard procedure to derive a tractable iterative algorithm, and its advantage over the MAP estimation. Then, I'll show our recent theoretical results on VB in the probabilistic PCA or matrix factorization with no missing entry. These results include an analyticform global solution, bounds of the noise variance estimator, and a theoretical guarantee for perfect dimensionality recovery. The analyticform solution not only provides a stable and fast algorithm for VBPCA, but also forms a building block of efficient algorithms for more complicated models.
References: C. M. Bishop, Variational Principal Components. ICANN1999. http://research.microsoft.com/enus/um/people/cmbishop/downloads/BishopVPCAICANN99.pdf
S. Nakajima, M. Sugiyama, S. D. Babacan, Ryota Tomioka, “Global Analytic Solution of Fullyobserved Variational Bayesian Matrix Factorization,” Journal of Machine Learning Research, vol.14, pp.137, 2013.
S. Nakajima, R. Tomioka, M. Sugiyama, S. D. Babacan, “Perfect Dimensionality Recovery by Variational Bayesian PCA,” TwentySixth Annual Conference on Neural Information Processing Systems (NIPS2012).
S. Nakajima, M. Sugiyama, S. D. Babacan, “Variational Bayesian Sparse Additive Matrix Factorization,” Machine Learning (Special Issue of Selected Papers of ACML 2012), 2013.
July 25
This seminar takes place on Thursday
16:30  18:00, Room 213, Faculty of Engineering Integrated Research Bldg, Main Campus
Sourav S Bhowmick, Nanyang Technological University
Title: Towards In Silico Networkdriven Combination Drug Therapy: A Big “Small” Data Problem
Abstract: Interest in combination drug treatment arising from positive clinical experience in diseases, such as
cancer, has sparked renewed research efforts in combination drugs. A key benefit of drug combinations
is drug synergy where the overall therapeutic effect is greater than the sum of the individual drug effect.
However, not all drug combinations produce such synergy. Hence, there is a need for a rational approach in
developing synergistic drug combinations. Recently, several in silico approaches based on combinatorial
optimization have been proposed for identifying drug combinations. However, these approaches are
prohibitively expensive especially for problems with high degrees of freedom. Additionally, they are not
synergistic and oblivious to offtarget effects which play critical role in determining efficacy and safety
of the drug combinations. This talk consists of two parts. First, for the benefit of the audience who
are not familiar with biology, the fundamental biological concepts related to this talk will be introduced.
Second, we give an overview of our journey towards the grand goal of designing and implementing a
novel networkdriven fast in silico framework for identifying drug combinations that are synergistic by
analysing the molecular interactions at a systems level.