I am a researcher at OpenAI.
I study agents.
Selected papers
- τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
Shunyu Yao, Noah Shinn, Pedram Razavi, Karthik Narasimhan
paper | repo | blog
- Language Agents: From Next-Token Prediction to Digital Automation
Shunyu Yao
PhD Thesis
paper | slides | talk
- SWE-agent: Agent Computer Interfaces Enable Software Engineering Language Models
John Yang*, Carlos E. Jimenez*, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press
paper | repo | tweet | project
- SWE-bench: Can Language Models Resolve Real-World Github Issues?
Carlos E. Jimenez*, John Yang*, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, Karthik Narasimhan
ICLR 2024 (Oral)
paper | repo | tweet | project
- Cognitive Architectures for Language Agents
Shunyu Yao*, Theodore Sumers*, Karthik Narasimhan, Thomas L. Griffiths
TMLR 2024
paper | repo | tweet
- InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback
John Yang, Akshara Prabhakar, Karthik Narasimhan, Shunyu Yao
NeurIPS 2023 Datasets and Benchmarks Track
paper | repo | tweet | project
- Reflexion: Language Agents with Verbal Reinforcement Learning
Noah Shinn, Federico Cassano, Beck Labash, Ashwin Gopinath, Karthik Narasimhan, Shunyu Yao
NeurIPS 2023
paper | repo | tweet
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Thomas L. Griffiths, Yuan Cao, Karthik Narasimhan
NeurIPS 2023 (Oral)
paper | repo | tweet
- ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
ICLR 2023 (Oral, top 5%)
paper | repo | tweet | project | Google AI blogpost
- WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
Shunyu Yao*, Howard Chen*, John Yang, Karthik Narasimhan
NeurIPS 2022
paper | repo | tweet | project | demo | Quanta Magazine
Online talks
- Language Agents: From Next-Token Prediction to Digital Automation
- On Formulating and Evaluating Language Agents
- 从语言模型到语言智能体 (From Language Models to Language Agents)
- Re-thinking Reinforcement Learning in the Era of Large Language Models
Recent readings
- The Double Helix (James Watson)
- Lectures on General Relativity (David Tong)
- What Babies Know (Elizabeth Spelke)
- The Art of Doing Science and Engineering (Richard Hamming)
(last updated: Aug 2024)