agentscope-ai/PawBench — GitHub trending stats & insights | Trendshift
agentscope-ai/PawBench
Topic: AI agent
A benchmark for evaluating LLM × harness performance.
Python
59
5
5 contributors
Apache License 2.0
Social mentions
Recent discussions about this repository across the web
So how should you build a harness? PawBench distills 4 principles: > Inform Fully — tell the model where cwd, workspace, outputs, and SKILL.md live > Equip on Demand — match the toolset to the…
@agentscope_ai · x.com
🤖 PawBench ⭐ 48 stars LLMs can chat, but how well do they handle a control harness? Put your AI agents to the test with this precision performance benchmark. 🔗 #AI #MachineLearning
@Marco_Ramilli · x.com
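Alibaba's Tongyi Lab has released PawBench v1.0, an agent evaluation benchmark that for the first time brings the base model and the runtime framework into a single evaluation system. The evaluation cross-tests 9 large models against three frameworks (Hermes, OpenClaw, and QwenPaw) and comprises 150 real-world tasks and 4,050 test units. The results show that the design of the runtime framework directly determines whether an agent's capabilities are delivered reliably. With the model held constant, the three frameworks show a clear spread in performance, with QwenPaw scoring…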
@0xLogicrw · x.com
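The figures quoted in the mention above (9 models, 3 frameworks, 150 tasks, 4,050 test units) are consistent with a full cross product, since 9 × 3 × 150 = 4,050. The sketch below only checks that arithmetic. It assumes one test unit per (model, framework, task) combination, which the mention does not state explicitly; the model and task names are hypothetical placeholders, and nothing here reflects PawBench's actual code or data layout.

```python
from itertools import product

# Framework names come from the mention above. Model and task names are
# hypothetical placeholders, since the source does not list them.
frameworks = ["Hermes", "OpenClaw", "QwenPaw"]
models = [f"model-{i}" for i in range(1, 10)]      # 9 models
tasks = [f"task-{i:03d}" for i in range(1, 151)]   # 150 tasks

# Assumption: one test unit per (model, framework, task) combination.
test_units = list(product(models, frameworks, tasks))

print(len(test_units))  # 4050
```

Holding the model fixed and comparing across the three frameworks, as the mention describes, would correspond to slicing this grid along the framework axis.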
No trending activity
This repository has not yet been featured on GitHub Trending
Repository activities
Daily and monthly repository activity across stars, forks, merged PRs, issues, and closed issues