Probabilistic and Distance-based Approaches for Computing Highest-Density Regions (Talk @ EcoSta2023)

Abstract

Many statistical problems require estimating a density function, say f, from data samples. Multivariate highest-density regions (HDRs), i.e., minimum-volume sets containing a given probability mass, are considered here; they are typically computed using a density quantile approach. The density estimation task, however, is far from trivial, especially in higher dimensions and when data are sparse and exhibit complex structures (e.g., multimodality or particular dependencies). This challenge is addressed by exploring alternative approaches to building HDRs that avoid direct multivariate density estimation. First, the density quantile method, currently implementable based on a consistent density estimator, is generalized to neighbourhood measures, i.e., measures that preserve the order induced in the sample by f. Second, several suitable probabilistic- and distance-based measures, such as the k-neighbourhood Euclidean distance, are elaborated and evaluated. Third, motivated by the ubiquitous role of copula modelling in modern statistics, its use in probabilistic-based measures is explored. By separately modelling the marginals and their (potentially complex) dependence structure, that is, the copula, the multivariate density estimation task can be relaxed and data specificities can be better captured. Finally, a comprehensive comparison among the introduced measures is provided, and their implications for computing HDRs in real-world problems are discussed.
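
To make the quantile idea concrete, the following is a minimal sketch, not the implementation presented in the talk. It assumes the negated Euclidean distance to the k-th nearest neighbour as a stand-in neighbourhood measure (a smaller distance suggests a denser region) and keeps the sample points whose measure exceeds the empirical (1 - coverage) quantile. The function names, the choice k = 10, the 90% coverage level, and the simulated bimodal sample are all illustrative assumptions.

```python
import numpy as np


def knn_distance(points, k):
    """Euclidean distance from each point to its k-th nearest neighbour in the sample."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k holds the distance to the k-th nearest neighbour.
    return np.sort(dists, axis=1)[:, k]


def hdr_membership(points, coverage=0.9, k=10):
    """Boolean mask of sample points assigned to the estimated HDR.

    Assumption: the negated k-NN distance plays the role of the
    order-preserving measure; it is not a density estimate.
    """
    score = -knn_distance(points, k)  # larger score = denser neighbourhood
    threshold = np.quantile(score, 1.0 - coverage)
    return score >= threshold


# Illustrative bimodal sample (simulated data, not from the talk).
rng = np.random.default_rng(0)
sample = np.vstack([
    rng.normal(-3.0, 1.0, size=(300, 2)),
    rng.normal(3.0, 1.0, size=(300, 2)),
])
inside = hdr_membership(sample, coverage=0.9, k=10)
```

The mask `inside` flags roughly 90% of the sample; because only the ordering of the scores matters, any other order-preserving measure could be swapped in for `knn_distance` without changing the thresholding step.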

Date
Aug 3, 2023 12:00 AM
Location
Waseda University, Tokyo

Related