claw-eval/claw-eval — GitHub trending stats & insights | Trendshift
claw-eval/claw-eval
#AI agent
Claw-Eval is a harness for evaluating LLMs as agents. All tasks are verified by humans.
Python
649
58
8 contributors
Social mentions
Recent discussions about this repository across the web
Agent benchmarks keep using synthetic tasks. Claw-Eval uses 300 human-verified ones. Pass^3 methodology: a model must pass each task 3x independently. No lucky runs. Referenced by Meta, Kimi, Qwen,…
@agentxagi · x.com
AI security is becoming its own engineering discipline. And most builders are wildly underprepared. This repo makes that painfully obvious. Awesome LLM SecOps 👇 A curated open-source collection…
@RoyAmal · x.com
AI agent security just had its "oh sh*t" moment. Microsoft researchers showed something wild: a single prompt injection could turn an AI agent into remote code execution. Yes, actual RCE. That…
@RoyAmal · x.com
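The Pass^3 rule quoted in the first mention above (a task counts only if the model passes it in three independent runs) can be sketched as follows. This is a minimal illustration under stated assumptions, not Claw-Eval's actual code: the function name `pass_k`, the dictionary layout, and the task IDs are all hypothetical.

```python
from typing import Dict, List


def pass_k(results: Dict[str, List[bool]], k: int = 3) -> float:
    """Return the fraction of tasks whose first k independent runs all passed.

    `results` maps a task ID to its per-run outcomes (hypothetical layout).
    """
    if not results:
        raise ValueError("results must contain at least one task")

    solved = 0
    for task_id, runs in results.items():
        if len(runs) < k:
            raise ValueError(f"task {task_id!r} has fewer than {k} runs")
        # A single failed run disqualifies the task, so one lucky pass
        # cannot inflate the score.
        if all(runs[:k]):
            solved += 1
    return solved / len(results)


# Made-up outcomes for four illustrative tasks.
example = {
    "task-a": [True, True, True],
    "task-b": [True, False, True],
    "task-c": [True, True, True],
    "task-d": [False, False, False],
}
score = pass_k(example)
```

Here `task-b` passes two of three runs but still counts as unsolved, which is the difference between this stricter rule and a best-of-three score.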
No trending activity
This repository has not yet been featured on GitHub Trending
Repository activities
Daily and monthly repository activity across stars, forks, merged PRs, issues, and closed issues.