Statistics seminar

  • The Statistics group of the Mathematical Institute meets weekly on Mondays, 13.00-13.45.
  • Due to the coronavirus, the seminar is held online until further notice.

Schedule

29-06-2020 | Maarten Kampert

Improved Strategies for Clustering Objects on Subsets of Attributes

Cluster discovery in high-dimensional settings is challenging when objects do not cluster on all attributes, or on a single common subset, but rather on different subsets of attributes. To reveal such a clustering structure, the COSA (Clustering Objects on Subsets of Attributes) procedure was proposed, which produces a representative distance matrix by finding differential attribute weights. This COSA distance matrix can subsequently be analyzed by a variety of distance-based methods, such as hierarchical clustering or multidimensional scaling. We propose a series of improvements to the original procedure by a) making one of the tuning parameters superfluous, b) allowing for variable selection via zero-valued attribute weights, and c) adjusting the COSA distance so as to better separate objects belonging to different clusters. In addition, we implement a more general regularization strategy for the attribute weights, which allows for user-specified initialization and leads to improved group extraction. We demonstrate the performance of the improved COSA by comparing it to the original version, and to a number of other state-of-the-art methods, using both simulated and real omics data sets.

Past meetings

15-06-2020 | Willem Heiser
Cancelled

08-06-2020 | Valentina Masarotto
When data are continuous: geometry and inference on covariance operators

25-05-2020 | Laura Zwep
Cancer Treatment Efficacy: Mechanistic modelling in a high-dimensional setting

18-05-2020 | Bart Eggen
Mathematical statistical theory for causal inference

11-05-2020 | Lasse Vuursteen
Distributed nonparametric hypothesis testing with (very) limited communication

20-04-2020 | Geerten Koers
Statistical challenges in modern day astronomy

06-04-2020 | Rianne de Heide
Bayesian best-arm identification

30-03-2020 | Dirk van der Hoeven
Adaptive methods in online learning.

02-03-2020 | Tim van Erven
The many faces of exponential weights in online learning.

24-02-2020 | Botond Szabó
Variational Bayes for high-dimensional linear regression with sparse priors.

17-02-2020 | Magnus Münch
Predicting drug sensitivity from cell lines informed by external data.

10-02-2020 | Aad van der Vaart
Tolerance regions.

03-02-2020 | Stefan Franssen
The Bernstein-von Mises theorem for the Pitman-Yor process.

27-01-2020 | Amine Hadji
Distributed methods for Bayesian regression: contraction rate & uncertainty quantification.

20-01-2020 | Marta Fiocco
Marginal structural models with joint-exposure for counterfactual histological response.

13-01-2020 | Magnus Münch
Cancelled

09-12-2019 | Richard Gill
Lies, damned lies, and statistics. The case of Ben Geen. An English Lucia de B. New data, new insights.

02-12-2019 | Kevin Duisters
On frequentist coverage of Bayesian credible sets for estimation of the mean under constraints.

25-11-2019 | Peter Grünwald
Safe statistics.

18-11-2019 | Anja Rüten-Budde
Assessment of predictive accuracy of an intermittently observed binary time-dependent marker.

04-11-2019 | Thomas Nagler
Vine copula models – past, present, and future.

28-10-2019 | Jeanne Nguyen
CDRodeo: Greedy selection of multivariate bandwidths for kernel conditional density estimation.