On the Reliability of Sampling Strategies in Offline Recommender Evaluation

Preliminaries

Core Idea

Data Setup

Data Simulation


Comparative Experiments

Resolution


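Note: In my opinion, Resolution may be a somewhat biased evaluation criterion. Full ranking discriminates poorly between models largely because the task is too hard, which can lead many models to produce comparably poor results.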
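To make the "too hard" point in the note concrete, here is a minimal sketch contrasting full ranking with sampled evaluation. It assumes uniform negative sampling and uses Hit@k as the metric. The helper names (`rank_of_target`, `sampled_rank`, `hit_at_k`) and the random scores are hypothetical and not taken from the paper, and the sketch does not compute the paper's Resolution measure.

```python
import random


def rank_of_target(scores, target, candidates):
    """1-based rank of `target` among `candidates` (higher score is better)."""
    better = sum(1 for i in candidates if i != target and scores[i] > scores[target])
    return 1 + better


def sampled_rank(scores, target, num_negatives, rng):
    """Rank of `target` against uniformly sampled negatives (an assumption here)."""
    negatives = [i for i in range(len(scores)) if i != target]
    sampled = rng.sample(negatives, num_negatives)
    return rank_of_target(scores, target, sampled + [target])


def hit_at_k(rank, k):
    """Hit@k for a single held-out target item."""
    return 1.0 if rank <= k else 0.0


# Toy setup: random scores stand in for a model's predictions (illustrative only).
rng = random.Random(0)
num_items = 1000
scores = [rng.random() for _ in range(num_items)]
target = 0

full_rank = rank_of_target(scores, target, range(num_items))
samp_rank = sampled_rank(scores, target, 99, rng)

print("full rank:", full_rank, "Hit@10:", hit_at_k(full_rank, 10))
print("sampled rank:", samp_rank, "Hit@10:", hit_at_k(samp_rank, 10))
```

Because the sampled candidates are a subset of all items, the sampled rank can never be worse than the full rank. In that sense full ranking is the harder task, and under it many models may score close to zero on top-k metrics.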

Fidelity


Robustness


Predictive Power


Conclusion

References

  1. Krichene W. and Rendle S. On Sampled Metrics for Item Recommendation. KDD, 2020.
  2. Pereira B. L., Said A. and Santos R. L. T. On the Reliability of Sampling Strategies in Offline Recommender Evaluation. arXiv, 2025.