claw-eval/claw-eval — GitHub trending stats & insights | Trendshift

© 2026 Trendshift, created by Julian Li with ❤️

claw-eval/claw-eval

Claw-Eval is a harness for evaluating LLMs as agents. All tasks are verified by humans.

Python

649

58

8 contributors

Social mentions

Recent discussions about this repository across the web

Agent benchmarks keep using synthetic tasks. Claw-Eval uses 300 human-verified ones. Pass^3 methodology: a model must pass each task 3x independently. No lucky runs. Referenced by Meta, Kimi, Qwen,…

@agentxagi · x.com
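
The Pass^3 idea in the mention above (a task counts only if the model passes it in three independent runs) can be sketched roughly as follows. This is a minimal illustration under assumptions, not code from the Claw-Eval repository: the function name `pass_k_score`, the dictionary layout of the results, and the task IDs are all hypothetical, and the strict "all k runs must pass" reading is taken from the description above.

```python
from typing import Dict, List


def pass_k_score(results: Dict[str, List[bool]], k: int = 3) -> float:
    """Return the fraction of tasks whose first k independent runs all passed.

    `results` maps a (hypothetical) task ID to a list of per-run outcomes.
    A single failed run among the k disqualifies the task, so one lucky
    success does not count.
    """
    if not results:
        raise ValueError("results must contain at least one task")
    solved = 0
    for task_id, runs in results.items():
        if len(runs) < k:
            raise ValueError(f"task {task_id!r} has fewer than {k} runs")
        if all(runs[:k]):
            solved += 1
    return solved / len(results)


# Hypothetical outcomes for four tasks, three runs each.
example = {
    "task-a": [True, True, True],     # counts
    "task-b": [True, False, True],    # one failure, does not count
    "task-c": [True, True, True],     # counts
    "task-d": [False, False, False],  # does not count
}
score = pass_k_score(example)
print(score)
```

Under this reading, a model that passes a task in two of three runs gets no credit for it, which is what makes the metric stricter than a single-run pass rate.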

AI security is becoming its own engineering discipline. And most builders are wildly underprepared. This repo makes that painfully obvious. Awesome LLM SecOps 👇 A curated open-source collection…

@RoyAmal · x.com

AI agent security just had its "oh sh*t" moment. Microsoft researchers showed something wild: a single prompt injection could turn an AI agent into remote code execution. Yes, actual RCE. That…

@RoyAmal · x.com

No trending activity

This repository has not yet been featured on GitHub Trending

Repository activities

Daily and monthly repository activity across stars, forks, merged PRs, issues, and closed issues
