Physics of Language Models: Part 3.1, Knowledge Storage and Extraction

Core Idea

Setting

Dataset

Anya Briar Forger was born on October 2, 1996. She spent her early years in Princeton, NJ. She received mentorship and guidance from faculty members at Massachusetts Institute of Technology. She completed her education with a focus on Communications. She had a professional role at Meta Platforms. She was employed in Menlo Park, CA.

Anya Briar Forger is a renowned social media strategist and community manager. She is currently working as a Marketing Manager at Meta Platforms. She completed her graduation from MIT with a degree in Communications. She was born on 2nd October 1996 in Princeton, NJ and was brought up in the same city. She later moved to Menlo Park in California to be a part of Facebook’s team. She is an avid reader and loves traveling.
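The first biography above reads as if it were assembled from one fixed sentence per attribute. Below is a minimal sketch of that kind of generator. It assumes a single hypothetical template per attribute. `TEMPLATES`, `make_biography` and the attribute keys are illustrative names, not the paper's code. Sampling among several templates per attribute would add the diversity a real dataset needs.

```python
# Hypothetical templates: one sentence per attribute, in a fixed order.
TEMPLATES = {
    "birth_date": "{name} was born on {value}.",
    "birth_city": "{pronoun} spent {possessive} early years in {value}.",
    "university": "{pronoun} received mentorship and guidance from faculty members at {value}.",
    "major": "{pronoun} completed {possessive} education with a focus on {value}.",
    "company": "{pronoun} had a professional role at {value}.",
    "company_city": "{pronoun} was employed in {value}.",
}


def make_biography(person: dict) -> str:
    """Fill each attribute template in order; only the first sentence uses the full name."""
    female = person["gender"] == "female"
    pronoun, possessive = ("She", "her") if female else ("He", "his")
    sentences = [
        template.format(name=person["name"], pronoun=pronoun,
                        possessive=possessive, value=person[attr])
        for attr, template in TEMPLATES.items()
    ]
    return " ".join(sentences)


ANYA = {
    "name": "Anya Briar Forger",
    "gender": "female",
    "birth_date": "October 2, 1996",
    "birth_city": "Princeton, NJ",
    "university": "Massachusetts Institute of Technology",
    "major": "Communications",
    "company": "Meta Platforms",
    "company_city": "Menlo Park, CA",
}

print(make_biography(ANYA))
```

With these templates, `make_biography(ANYA)` reproduces the first example biography word for word.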


Training Strategy

Mixed Training Enables Knowledge Extraction


Note: first-token accuracy is the accuracy of predicting the first token of the answer. Generation accuracy is the accuracy of generating the entire attribute value correctly.
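Both metrics can be written down directly. The sketch below makes two assumptions. First, answers are already tokenized into integer ids. Second, each generated answer has been cut at its end-of-answer marker. The function names are hypothetical.

```python
from typing import Sequence


def first_token_accuracy(preds: Sequence[Sequence[int]],
                         golds: Sequence[Sequence[int]]) -> float:
    """Share of questions whose first predicted answer token equals the gold first token."""
    hits = sum(1 for p, g in zip(preds, golds) if p and g and p[0] == g[0])
    return hits / len(golds)


def generation_accuracy(preds: Sequence[Sequence[int]],
                        golds: Sequence[Sequence[int]]) -> float:
    """Share of questions whose generated answer matches the whole gold attribute exactly."""
    hits = sum(1 for p, g in zip(preds, golds) if list(p) == list(g))
    return hits / len(golds)


preds = [[5, 6, 7], [5, 9], [1, 2]]
golds = [[5, 6, 7], [5, 6], [3, 2]]
print(first_token_accuracy(preds, golds))  # 2 of 3 first tokens match
print(generation_accuracy(preds, golds))   # only the first answer matches fully
```

For non-empty answers, a fully correct answer must also start with the right token. Generation accuracy therefore never exceeds first-token accuracy, which makes the first-token number the more lenient of the two.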

Model Fails to Extract Knowledge After BIO Pretrain

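Mixed training and BIO pretraining followed by QA fine-tuning differ only in how the same data is scheduled. The sketch below shows how the two streams could be built. It assumes that individuals are split in half: QA pairs for one half are used in training, and QA pairs for the other half are held out for evaluation. The question templates and function names are hypothetical.

```python
import random


def qa_pairs(person: dict) -> list:
    # Hypothetical question templates; only two attributes shown for brevity.
    return [
        f"What is the birth date of {person['name']}? Answer: {person['birth_date']}.",
        f"Which university did {person['name']} attend? Answer: {person['university']}.",
    ]


def build_schedules(people: list, seed: int = 0):
    """Return (mixed stream, two-stage stages, held-out QA) for the same individuals."""
    rng = random.Random(seed)
    shuffled = people[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    train_people, test_people = shuffled[:half], shuffled[half:]

    bios = [p["bio"] for p in people]
    train_qa = [q for p in train_people for q in qa_pairs(p)]
    test_qa = [q for p in test_people for q in qa_pairs(p)]

    # Mixed training: biographies and training-half QA interleaved in a single stream.
    mixed = bios + train_qa
    rng.shuffle(mixed)

    # BIO pretrain then QA fine-tune: stage 1 sees only biographies, stage 2 only QA.
    two_stage = {"pretrain": bios, "finetune": train_qa}
    return mixed, two_stage, test_qa


people = [
    {"name": f"Person {i}", "birth_date": f"January {i + 1}, 1990", "university": "MIT",
     "bio": f"Person {i} was born on January {i + 1}, 1990."}
    for i in range(4)
]
mixed, two_stage, test_qa = build_schedules(people)
print(len(mixed), len(two_stage["pretrain"]), len(two_stage["finetune"]), len(test_qa))
```

Both schedules show the model the same examples. The only difference is whether QA examples are present while the biographies are being memorized.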

Knowledge Augmentation

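Knowledge augmentation rewrites each biography so that the same facts appear in more varied forms. The sketch below assumes three simple augmentations:

- several copies per person,
- a random sentence order in each copy,
- pronouns replaced by the full name.

The helper names are hypothetical. Here the copies differ only in sentence order. Regenerating each copy from freshly sampled templates would be a closer fit to the idea of multiple distinct biographies.

```python
import random
import re


def split_sentences(bio: str) -> list:
    # Naive split after a period followed by whitespace; adequate for template biographies.
    return re.split(r"(?<=\.)\s+", bio.strip())


def use_full_name(sentences: list, name: str) -> list:
    # Replace a sentence-initial pronoun with the person's full name.
    return [re.sub(r"^(She|He)\b", name, s) for s in sentences]


def augment(bio: str, name: str, copies: int, rng: random.Random) -> list:
    """Return `copies` variants of `bio`, each with full names and a shuffled sentence order."""
    sentences = use_full_name(split_sentences(bio), name)
    variants = []
    for _ in range(copies):
        order = sentences[:]
        rng.shuffle(order)
        variants.append(" ".join(order))
    return variants


bio = ("Anya Briar Forger was born on October 2, 1996. She spent her early years in Princeton, NJ. "
       "She received mentorship and guidance from faculty members at Massachusetts Institute of Technology. "
       "She completed her education with a focus on Communications. "
       "She had a professional role at Meta Platforms. She was employed in Menlo Park, CA.")
for variant in augment(bio, "Anya Briar Forger", copies=2, rng=random.Random(0)):
    print(variant)
```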

Position-Based Probing

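Position-based probing reads hidden states at fixed positions in a biography. It then asks whether a linear classifier can recover an attribute from them. The sketch below covers only the position-selection step, under two assumptions:

- The probe positions are the tokens immediately preceding the first occurrence of each attribute value.
- A crude regex tokenizer stands in for the model's real tokenizer.

`tokenize` and `probe_positions` are hypothetical names. The hidden states at the returned indices would then be passed to a linear classifier.

```python
import re


def tokenize(text: str) -> list:
    # Stand-in tokenizer: words and punctuation marks as separate tokens.
    return re.findall(r"\w+|[^\w\s]", text)


def probe_positions(tokens: list, attributes: dict) -> dict:
    """Map each attribute to the index of the token just before its value's first occurrence."""
    positions = {}
    for attr, value in attributes.items():
        value_tokens = tokenize(value)
        n = len(value_tokens)
        for i in range(1, len(tokens) - n + 1):
            if tokens[i:i + n] == value_tokens:
                positions[attr] = i - 1
                break
    return positions


bio = ("Anya Briar Forger was born on October 2, 1996. She spent her early years in Princeton, NJ. "
       "She received mentorship and guidance from faculty members at Massachusetts Institute of Technology. "
       "She completed her education with a focus on Communications. "
       "She had a professional role at Meta Platforms. She was employed in Menlo Park, CA.")
attributes = {
    "birth_date": "October 2, 1996",
    "birth_city": "Princeton, NJ",
    "university": "Massachusetts Institute of Technology",
    "major": "Communications",
    "company": "Meta Platforms",
    "company_city": "Menlo Park, CA",
}
tokens = tokenize(bio)
for attr, pos in probe_positions(tokens, attributes).items():
    print(attr, pos, tokens[pos])
```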

Query-Based Probing

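This sketch assumes a particular form of query-based probing. The model is fed only the person's name, and a linear classifier is trained on the hidden state at the end of the name to predict each attribute. Only the classifier side is shown. A multiclass perceptron stands in for the linear probe; any linear classifier would serve. Small hand-made vectors stand in for the frozen hidden states. `LinearProbe` is a hypothetical name.

```python
from typing import List, Sequence


class LinearProbe:
    """Multiclass perceptron: one weight vector (plus bias) per attribute value."""

    def __init__(self, dim: int, num_classes: int):
        self.w = [[0.0] * (dim + 1) for _ in range(num_classes)]

    @staticmethod
    def _with_bias(x: Sequence[float]) -> List[float]:
        return list(x) + [1.0]

    def predict(self, x: Sequence[float]) -> int:
        xb = self._with_bias(x)
        scores = [sum(wi * xi for wi, xi in zip(w, xb)) for w in self.w]
        return max(range(len(scores)), key=scores.__getitem__)

    def fit(self, xs, ys, epochs: int = 50) -> "LinearProbe":
        for _ in range(epochs):
            mistakes = 0
            for x, y in zip(xs, ys):
                pred = self.predict(x)
                if pred != y:
                    xb = self._with_bias(x)
                    # Pull the true class toward x and push the wrongly predicted class away.
                    self.w[y] = [wi + xi for wi, xi in zip(self.w[y], xb)]
                    self.w[pred] = [wi - xi for wi, xi in zip(self.w[pred], xb)]
                    mistakes += 1
            if mistakes == 0:  # every training point is classified correctly
                break
        return self


# Toy "end-of-name hidden states" for six people and their (hypothetical) major labels.
xs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
      [0.0, 1.0, 0.0], [0.1, 0.9, 0.0],
      [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
ys = [0, 0, 1, 1, 2, 2]
probe = LinearProbe(dim=3, num_classes=3).fit(xs, ys)
print([probe.predict(x) for x in xs])
```

The probe is kept linear on purpose. If a simple linear map can read an attribute off the name's hidden state, the attribute is already stored in an easily extractable form at that position.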

Celebrity Can Help Minority

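The heading refers to a setting where only part of the population is augmented. The sketch below shows one way such a corpus could be assembled, under these assumptions:

- "Celebrities" get several shuffled copies of their biography.
- "Minorities" get a single unaugmented biography.
- QA fine-tuning uses celebrities only.
- Extraction is then tested on minorities.

The function and field names are hypothetical.

```python
import random


def build_celebrity_minority_corpus(people: list, celebrity_fraction: float,
                                    copies: int, seed: int = 0):
    """Return (pretraining corpus, names used for QA fine-tuning, names held out for testing)."""
    rng = random.Random(seed)
    shuffled = people[:]
    rng.shuffle(shuffled)
    k = int(len(shuffled) * celebrity_fraction)
    celebrities, minorities = shuffled[:k], shuffled[k:]

    corpus = []
    for person in celebrities:
        # Celebrities: several copies, each with its own sentence order.
        for _ in range(copies):
            order = person["sentences"][:]
            rng.shuffle(order)
            corpus.append(" ".join(order))
    for person in minorities:
        # Minorities: exactly one biography in its original order.
        corpus.append(" ".join(person["sentences"]))

    finetune_names = [p["name"] for p in celebrities]
    test_names = [p["name"] for p in minorities]
    return corpus, finetune_names, test_names


people = [
    {"name": f"Person {i}",
     "sentences": [f"Person {i} was born in {1980 + i}.", f"Person {i} studied math."]}
    for i in range(10)
]
corpus, finetune_names, test_names = build_celebrity_minority_corpus(
    people, celebrity_fraction=0.5, copies=3)
print(len(corpus), len(finetune_names), len(test_names))
```

Minorities never receive extra data in this setup. Any gain in their extraction accuracy as `copies` grows would therefore come from what the model learned on celebrities.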

Knowledge Storage for Bidirectional Models


References

  1. Allen-Zhu Z., and Li Y. Physics of Language Models: Part 3.1, Knowledge Storage and Extraction. ICML, 2024.