Towards Reliable Item Sampling for Recommendation Evaluation

Since Rendle and Krichene argued that commonly used sampling-based evaluation
metrics are ``inconsistent'' with respect to the global metrics (even in
expectation), there have been a few studies on sampling-based recommender
system evaluation. Existing methods either map the sampling-based metrics to
their global counterparts or, more generally, learn the empirical rank
distribution to estimate the top-$K$ metrics. However, despite existing
efforts, there is still a lack of rigorous theoretical understanding of the
proposed metric estimators, and basic item sampling also suffers from the
``blind spot'' issue, i.e., the error in recovering the top-$K$ metrics
when $K$ is small can still be rather substantial. In this paper, we provide an
in-depth investigation into these problems and make two innovative
contributions. First, we propose a new item-sampling estimator that explicitly
optimizes the error with respect to the ground truth, and theoretically
highlight its subtle difference from prior work. Second, we propose a new
adaptive sampling method that aims to address the ``blind spot'' problem, and we
demonstrate that the expectation-maximization (EM) algorithm can be generalized
to this setting. Our experimental results confirm our statistical analysis
and the superiority of the proposed methods. This study helps lay the theoretical
foundation for adopting item sampling metrics for recommendation evaluation,
and provides strong evidence towards making item sampling a powerful and
reliable tool for recommendation evaluation.
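As a toy illustration of the inconsistency that motivates this work, the following minimal sketch contrasts a global Hit-Rate@$K$ with its item-sampled counterpart. All quantities (catalog size, user count, number of sampled negatives) are arbitrary assumptions, and the global ranks are simulated rather than produced by a real recommender; the sketch is not the paper's estimator, only a demonstration that the two metrics can disagree sharply.

    import numpy as np

    rng = np.random.default_rng(0)

    n_items = 10_000   # size of the full item catalog (hypothetical)
    n_users = 5_000    # number of evaluation users (hypothetical)
    n_sample = 100     # sampled negatives per user in the sampled protocol
    K = 10             # cutoff for the top-K metric

    # Hypothetical global rank of each user's held-out relevant item among all
    # n_items items (rank 1 = best). In practice these come from model scores.
    global_ranks = rng.integers(1, n_items + 1, size=n_users)

    # Global Hit-Rate@K: fraction of users whose relevant item is in the top K
    # of the full catalog ranking.
    global_hr = np.mean(global_ranks <= K)

    # Sampled Hit-Rate@K: the relevant item is ranked only against n_sample
    # random negatives. If r is the global rank, each sampled negative outranks
    # the relevant item with probability (r - 1) / (n_items - 1), so the
    # sampled rank is 1 plus a Binomial draw.
    p_beat = (global_ranks - 1) / (n_items - 1)
    sampled_ranks = 1 + rng.binomial(n_sample, p_beat)
    sampled_hr = np.mean(sampled_ranks <= K)

    print(f"global  HR@{K}: {global_hr:.4f}")
    print(f"sampled HR@{K}: {sampled_hr:.4f}")  # typically far larger than the global value

Running the sketch shows the sampled metric reporting a much higher hit rate than the global one, which is the gap that mapping-based and rank-distribution-based estimators, including the ones proposed here, try to close.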