Anthropic article
https://www.anthropic.com/research/building-effective-agents
https://github.com/anthropics/anthropic-cookbook/tree/main/patterns/agents
VERDICT: A Library for Scaling Judge-Time Compute
https://arxiv.org/pdf/2502.18018
Agent Example 1: Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet
https://huggingface.co/datasets/princeton-nlp/SWE-bench_Lite
https://github.com/SWE-bench/SWE-bench
https://swe-agent.com/latest/
https://github.com/SWE-agent/SWE-agent/blob/main/docs/usage/hello_world_output.txt
https://www.anthropic.com/research/swe-bench-sonnet
Agent Example 2: Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku
https://www.anthropic.com/news/3-5-models-and-computer-use
https://docs.anthropic.com/en/docs/agents-and-tools/computer-use
https://www.youtube.com/watch?v=ODaHJzOyVCQ&ab_channel=Anthropic
Agent Example 3: OpenAI Operator
https://help.openai.com/en/articles/10421097-operator
Understanding the planning of LLM agents: A survey
https://arxiv.org/pdf/2402.02716