Resources
Curated research papers, frameworks, and tools for understanding and building AI agents.
Research Papers
Foundational
ReAct: Synergizing Reasoning and Acting in Language Models
Yao et al. (2023) · ICLR 2023
Introduces the ReAct paradigm combining reasoning traces with actions.
Toolformer: Language Models Can Teach Themselves to Use Tools
Schick et al. (2023) · NeurIPS 2023
Demonstrates self-supervised tool use learning in LLMs.
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Wei et al. (2022) · NeurIPS 2022
Foundational work on prompting LLMs for step-by-step reasoning.
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Yao et al. (2023) · NeurIPS 2023
Extends CoT with exploration of multiple reasoning paths.
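The ReAct pattern interleaves a reasoning step with a tool call and feeds the tool's observation into the next step. Below is a minimal, hypothetical sketch of that loop. `scripted_model` stands in for a real LLM. The `Thought:`/`Action:`/`Final Answer:` text format and the `lookup` tool are assumptions made for illustration, not the paper's exact prompt format.

```python
import re
from typing import Callable

# Hypothetical tool registry: tool name -> function taking one string argument.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup": lambda q: {"capital of France": "Paris"}.get(q, "no result"),
}

def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM call: returns the next Thought/Action step."""
    if "Observation:" not in transcript:
        return "Thought: I should look this up.\nAction: lookup[capital of France]"
    return "Thought: The observation answers the question.\nFinal Answer: Paris"

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")

def react(question: str, model: Callable[[str], str], max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)  # reasoning trace plus a proposed action
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(step)
        if match is None:
            transcript += "Observation: could not parse an action\n"
            continue
        name, arg = match.groups()
        tool = TOOLS.get(name)
        observation = tool(arg) if tool else f"unknown tool {name}"
        # The observation is appended so the model sees it on the next step.
        transcript += f"Observation: {observation}\n"
    return "no answer within step budget"

print(react("What is the capital of France?", scripted_model))
```

The step budget matters in practice. Without it, a model that never emits a final answer would loop forever.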
Memory & Learning
Generative Agents: Interactive Simulacra of Human Behavior
Park et al. (2023) · UIST 2023
Agents with memory for believable social simulation.
MemGPT: Towards LLMs as Operating Systems
Packer et al. (2023) · arXiv
Hierarchical, OS-style memory management that extends an LLM's effective context beyond its fixed window.
Reflexion: Language Agents with Verbal Reinforcement Learning
Shinn et al. (2023) · NeurIPS 2023
Agents that learn from self-reflection and memory.
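Memory-based agents rank stored memories before putting a few of them into the prompt. The sketch below loosely follows the recency, importance, and relevance scoring described in the Generative Agents paper, but it is not a faithful reimplementation. The toy two-dimensional embeddings, the 0.995-per-hour decay, the equal weights, and the min-max normalization are all illustrative assumptions.

```python
import math
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    embedding: list[float]     # toy vector; a real agent would use an embedding model
    importance: float          # e.g. a 1-10 rating, assumed precomputed
    hours_since_access: float  # time since the memory was last retrieved

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def min_max(values: list[float]) -> list[float]:
    """Rescale to [0, 1]; a constant component contributes nothing."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def retrieve(memories: list[Memory], query: list[float], k: int = 2,
             decay: float = 0.995) -> list[str]:
    recency = min_max([decay ** m.hours_since_access for m in memories])
    importance = min_max([m.importance for m in memories])
    relevance = min_max([cosine(m.embedding, query) for m in memories])
    # Equal weights on the three normalized components (an assumption).
    scores = [r + i + v for r, i, v in zip(recency, importance, relevance)]
    ranked = sorted(zip(scores, memories), key=lambda pair: pair[0], reverse=True)
    return [m.text for _, m in ranked[:k]]

memories = [
    Memory("Had coffee with Maria", [1.0, 0.0], importance=3, hours_since_access=1),
    Memory("Maria is planning a party", [0.9, 0.1], importance=8, hours_since_access=5),
    Memory("Watered the plants", [0.0, 1.0], importance=2, hours_since_access=0.5),
]
print(retrieve(memories, query=[1.0, 0.0]))
```

Normalizing each component first keeps any single signal from dominating only because it lives on a larger numeric scale.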
Multi-Agent Systems
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Wu et al. (2023) · arXiv
Framework for multi-agent conversation and collaboration.
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
Hong et al. (2023) · arXiv
Role-based multi-agent system for software development.
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate
Liang et al. (2023) · arXiv
Multiple agents debate to improve reasoning quality.
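Debate-style setups run several rounds. In each round, every agent sees the others' latest answers before it responds, and a judge settles the outcome at the end. The sketch below uses plain functions as hypothetical stand-ins for LLM-backed debaters. It also substitutes a simple majority vote for the LLM-based judge that Liang et al. describe, so treat it as an illustration of the round structure only.

```python
from collections import Counter
from typing import Callable

# A debater maps (question, other agents' latest answers) to its new answer.
Debater = Callable[[str, list[str]], str]

def debate(question: str, agents: list[Debater], rounds: int = 2) -> str:
    # Opening round: every agent answers independently.
    answers = [agent(question, []) for agent in agents]
    for _ in range(rounds):
        # Each agent responds after seeing the others' previous answers.
        answers = [
            agent(question, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    # Judge: majority vote; on a tie, the answer encountered first wins.
    return Counter(answers).most_common(1)[0][0]

# Hypothetical stand-ins for LLM-backed debaters.
def stubborn(answer: str) -> Debater:
    return lambda question, others: answer

def persuadable(initial: str) -> Debater:
    # Adopts the most common answer among the other agents, if there are any.
    return lambda question, others: (
        Counter(others).most_common(1)[0][0] if others else initial
    )

agents = [stubborn("42"), stubborn("42"), persuadable("41")]
print(debate("What is 6 * 7?", agents))
```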
RAG & Retrieval
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Lewis et al. (2020) · NeurIPS 2020
Original RAG paper combining retrieval with generation.
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Asai et al. (2023) · arXiv
Agents that decide when and what to retrieve.
Corrective Retrieval Augmented Generation
Yan et al. (2024) · arXiv
Self-correcting retrieval with web search fallback.
From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Edge et al. (2024) · arXiv
Knowledge graph-based RAG for complex queries.
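The retrieve-then-generate pattern these papers build on can be sketched without any model. First, score documents against the query. If nothing scores well enough, fall back to another source, and then build the prompt. The bag-of-words cosine scoring, the 0.2 threshold, and the `web_search` callable are assumptions for illustration. The fallback is only loosely inspired by Corrective RAG, which uses a trained retrieval evaluator rather than a fixed similarity threshold.

```python
import math
import re
from collections import Counter
from typing import Callable

def bow(text: str) -> Counter:
    """Lowercased bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[tuple[float, str]]:
    q = bow(query)
    scored = [(cosine(q, bow(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

def build_prompt(query: str, docs: list[str],
                 web_search: Callable[[str], list[str]],
                 threshold: float = 0.2) -> str:
    hits = retrieve(query, docs)
    context = [d for score, d in hits if score >= threshold]
    if not context:
        # Corrective step: local retrieval looked irrelevant, so use the fallback source.
        context = web_search(query)
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

docs = [
    "ReAct interleaves reasoning traces with tool actions.",
    "Toolformer teaches models to call APIs.",
    "Graph RAG builds a knowledge graph over the corpus.",
]
fake_web = lambda q: [f"(web result for: {q})"]  # hypothetical search stub
print(build_prompt("How does ReAct use tool actions?", docs, fake_web))
print(build_prompt("Weather in Oslo tomorrow", docs, fake_web))
```

A production system would replace `bow` with dense embeddings and pass the prompt to a model. The control flow of retrieving, checking relevance, and optionally correcting stays the same.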
Benchmarks & Evaluation
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Jimenez et al. (2024) · ICLR 2024
Benchmark for evaluating code agents on real issues.
WebArena: A Realistic Web Environment for Building Autonomous Agents
Zhou et al. (2024) · ICLR 2024
Benchmark for web navigation and interaction.
GAIA: A Benchmark for General AI Assistants
Mialon et al. (2023) · arXiv
Multi-step reasoning benchmark for AI assistants.
AgentBench: Evaluating LLMs as Agents
Liu et al. (2023) · arXiv
Multi-environment evaluation of agent capabilities.
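Most of these benchmarks reduce to the same loop: run the agent on each task, check the outcome with a task-specific checker, and report the fraction solved. SWE-bench, for instance, reports the percentage of issues resolved. The harness below is a generic, hypothetical sketch. The `Task` fields and the arithmetic toy agent are assumptions, and real harnesses add sandboxing, timeouts, and repeated trials.

```python
import operator
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    task_id: str
    prompt: str
    check: Callable[[str], bool]  # True if the agent's output solves the task

def evaluate(agent: Callable[[str], str], tasks: list[Task]) -> dict:
    per_task: dict[str, bool] = {}
    for task in tasks:
        try:
            per_task[task.task_id] = bool(task.check(agent(task.prompt)))
        except Exception:
            # A crash counts as a failure instead of aborting the whole run.
            per_task[task.task_id] = False
    solved = sum(per_task.values())
    return {
        "solved": solved,
        "total": len(tasks),
        "success_rate": solved / len(tasks) if tasks else 0.0,
        "per_task": per_task,
    }

# Hypothetical toy agent that evaluates "a<op>b" arithmetic prompts.
OPS = {"+": operator.add, "*": operator.mul, "/": operator.truediv}

def toy_agent(prompt: str) -> str:
    for symbol, fn in OPS.items():
        if symbol in prompt:
            left, right = prompt.split(symbol)
            return str(fn(int(left), int(right)))
    return ""

tasks = [
    Task("add", "2+3", lambda out: out == "5"),
    Task("mul", "4*5", lambda out: out == "20"),
    Task("div", "1/0", lambda out: out == "inf"),  # the agent raises here
]
report = evaluate(toy_agent, tasks)
print(f"{report['solved']}/{report['total']} solved ({report['success_rate']:.0%})")
```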
Frameworks & Libraries
Agent Orchestration
LangGraph
Python
Library for building stateful, multi-actor applications with LLMs. Graph-based control flow.
AutoGen
Python
Microsoft framework for multi-agent conversations. Supports diverse agent types.
CrewAI
Python
Framework for orchestrating role-playing AI agents. Focus on collaboration.
Semantic Kernel
Python/C#/Java
Microsoft SDK for building AI agents. Multi-language support.
Microsoft Agent Framework
Python/C#
Open-source framework unifying AutoGen and Semantic Kernel for multi-agent workflows.
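Graph-based orchestration, as in LangGraph, models an agent workflow as nodes that update a shared state and edges that choose the next node. The framework-free sketch below only illustrates that idea. It does not use LangGraph's API, and the `Graph` class, node names, state keys, and routing rule are all hypothetical.

```python
from typing import Callable

State = dict
Node = Callable[[State], State]
Router = Callable[[State], str]

END = "__end__"

class Graph:
    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.edges: dict[str, Router] = {}

    def add_node(self, name: str, fn: Node) -> None:
        self.nodes[name] = fn

    def add_edge(self, src: str, router: Router) -> None:
        # The router inspects the updated state and names the next node (or END).
        self.edges[src] = router

    def run(self, start: str, state: State, max_steps: int = 20) -> State:
        current = start
        for _ in range(max_steps):
            state = self.nodes[current](state)
            current = self.edges[current](state)
            if current == END:
                return state
        raise RuntimeError("step budget exhausted (possible cycle)")

# Hypothetical draft-and-review loop: revise until the reviewer approves.
def draft(state: State) -> State:
    return {
        **state,
        "draft": state.get("draft", "") + "x",
        "revisions": state.get("revisions", 0) + 1,
    }

def review(state: State) -> State:
    return {**state, "approved": len(state["draft"]) >= 3}

g = Graph()
g.add_node("draft", draft)
g.add_node("review", review)
g.add_edge("draft", lambda s: "review")
g.add_edge("review", lambda s: END if s["approved"] else "draft")
final = g.run("draft", {})
print(final)
```

Because the routers are ordinary functions of state, conditional branches and cycles such as revise-until-approved need no special machinery. The step budget guards against cycles that never terminate.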
LLM Tooling
Evaluation & Testing
Protocols & Standards
Model Context Protocol (MCP)
Multi-language
Open protocol from Anthropic for connecting AI applications to external tools and data sources.
MCP Apps
Multi-language
MCP extension for interactive UI components rendered directly in AI conversations.
Agent2Agent Protocol (A2A)
Multi-language
Google protocol for agent-to-agent communication and discovery.
Agent Payments Protocol (AP2)
Multi-language
Google protocol for secure agent authentication and payment transactions.
Universal Commerce Protocol (UCP)
Multi-language
Open standard by Google and Shopify for agentic commerce from discovery to purchase.
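MCP messages use JSON-RPC 2.0. The sketch below builds a `tools/call` request and extracts the text from a result using only the standard library. The tool name `get_weather`, its arguments, and the sample response are hypothetical. A real client would first perform the `initialize` handshake and would usually rely on an official MCP SDK rather than hand-built messages.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session

def tools_call_request(name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request that invokes an MCP tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })

def result_text(raw_response: str) -> str:
    """Join the text content items of a tools/call response, raising on errors."""
    msg = json.loads(raw_response)
    if "error" in msg:  # protocol-level JSON-RPC error
        raise RuntimeError(msg["error"].get("message", "unknown error"))
    result = msg["result"]
    texts = [c["text"] for c in result.get("content", []) if c.get("type") == "text"]
    if result.get("isError"):  # tool-level failure reported inside the result
        raise RuntimeError("; ".join(texts) or "tool reported an error")
    return "\n".join(texts)

# Hypothetical tool and response, for illustration only.
req = tools_call_request("get_weather", {"city": "Oslo"})
print(req)
sample = ('{"jsonrpc": "2.0", "id": 1, "result": '
          '{"content": [{"type": "text", "text": "Cloudy, 4 C"}], "isError": false}}')
print(result_text(sample))
```

Keeping protocol errors (`error`) separate from tool errors (`isError`) lets a client tell a malformed request apart from a tool that ran and failed.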
Tools & Products
AI-Assisted Development
Claude Code
Anthropic CLI for AI-assisted software development with full codebase context.
Cursor
AI-first code editor with integrated agent capabilities and multi-file editing.
GitHub Copilot
AI pair programmer for code suggestions, chat, and workspace understanding.
Cody
Sourcegraph AI coding assistant with codebase-aware context.
Aider
Open-source AI pair programming in your terminal. Git-aware.