TIGER-AI-Lab/ClawBench — GitHub trending stats & insights

TIGER-AI-Lab/ClawBench

Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

Python

320

4 contributors

Apache License 2.0

Social mentions

Recent discussions about this repository across the web

Most agent demos hide the hard parts. This is useful because it exposes the workflow instead of pretending the model is magic. Open-source benchmark for browser AI agents on 153 everyday online tasks…

@AIDailyGems · x.com

Best browser agents fail 2 out of 3 everyday web tasks. ClawBench tested them on 153 tasks across 144 live websites. Top score: 33.3%. → 5-layer recording. DOM-match + LLM judge. arXiv paper. The…

@agentxagi · x.com

No trending activity

This repository has not yet been featured on GitHub Trending

Repository activities

The repository's daily and monthly activity across stars, forks, merged PRs, issues, and closed issues.