Thompson sampling

Modeling considerations when optimizing adaptive experiments under the reinforcement learning framework (Invited Talk @ ICSDS2023)

Artificial intelligence tools powered by machine learning have shown considerable improvements in a variety of experimental domains, from education to healthcare. In particular, the reinforcement learning (RL) and the multi-armed bandit (MAB) …

On the finite-sample and asymptotic validity of an allocation-probability test for adaptively-collected data (Invited Talk @ StaTalk2023)

Response-adaptive designs, either based on simple rules, urn models, or bandit problems, are of increasing interest among both theoretical and practical communities. In particular, regret-optimising bandit algorithms like Thompson sampling hold the …

Efficient Inference Without Trading-off Regret in Bandits. An Allocation Probability Test for Thompson Sampling (Invited Talk @ JSM2023)

Using bandit algorithms to conduct adaptive randomised experiments can minimise regret, but it poses major challenges for statistical inference. Recent attempts to address these challenges typically impose restrictions on the exploitative nature of …
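The Thompson sampling setting these abstracts refer to can be sketched as follows. The sketch assumes Bernoulli arms with independent Beta(1, 1) priors. The function names `thompson_sampling` and `allocation_probabilities` and the simulation parameters are hypothetical. `allocation_probabilities` gives a Monte Carlo estimate of the probability that Thompson sampling assigns the next participant to each arm. This is one natural reading of "allocation probability" under Thompson sampling; it is not the allocation-probability test presented in the talks.

```python
import random


def allocation_probabilities(successes, failures, n_draws=10_000, rng=None):
    """Monte Carlo estimate of the probability that Thompson sampling would
    allocate the next participant to each arm, assuming Beta(1, 1) priors
    and Bernoulli outcomes (hypothetical helper, for illustration only)."""
    rng = rng if rng is not None else random.Random(0)
    k = len(successes)
    counts = [0] * k
    for _ in range(n_draws):
        # One posterior draw per arm; the arm with the largest draw "wins".
        draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        counts[max(range(k), key=draws.__getitem__)] += 1
    return [c / n_draws for c in counts]


def thompson_sampling(true_probs, horizon, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling on arms whose success
    probabilities are `true_probs` (hypothetical simulation parameters)."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    pulls = []
    for _ in range(horizon):
        # Sample each arm's Beta posterior and play the arm with the largest sample.
        draws = [rng.betavariate(1 + successes[a], 1 + failures[a]) for a in range(k)]
        arm = max(range(k), key=draws.__getitem__)
        # Observe a Bernoulli reward and update that arm's counts.
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls.append(arm)
    return pulls, successes, failures


if __name__ == "__main__":
    pulls, s, f = thompson_sampling([0.4, 0.6], horizon=500, seed=1)
    print("share of pulls on arm 1:", pulls.count(1) / len(pulls))
    print("allocation probabilities after 500 steps:", allocation_probabilities(s, f))
```

In such a simulation, allocation tends to concentrate on the better arm. This is what keeps regret low, but it also leaves few observations on the inferior arm and makes the sample size itself data-dependent. That is one reason standard tests applied to adaptively collected data can be unreliable.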