avbiswas/text-albumentations — GitHub trending stats & insights | Trendshift
avbiswas/text-albumentations
Topics: NLP, Synthetic data
A simple library for generating instruction tuning datasets locally
Python
79
15
2 contributors
MIT License
Social mentions
Recent discussions about this repository across the web
Watch this 35 min visual guide on post-training Tiny Language Models. How to prepare preference datasets, finetune LMs to pick better trajectories, and evaluate their diversity + quality. Training…
@neural_avb · x.com
The paper-instructions dataset now comes with a subset of reasoning traces. This is an awesome training dataset, curated with deepseek-v4-flash and qwen3.6-35B-A3B using text-albumentations. Cost me…
@neural_avb · x.com
Upgrading the paper-instructions dataset: 300,000 examples taken from ~3K full-text papers and ~10K abstracts... Instruction-out pairs on a bunch of IR tasks: bullet extraction, qa, rephrasing,…
@neural_avb · x.com
Diversity comes from the input data and prompts. There are 3000+ full-text papers and 20K abstracts, and multiple tasks like qa pairs, summaries, rephrases, KG extracts, counterfactuals, retrieval, etc on…
@neural_avb · x.com
So cool that this papers dataset got ~270 downloads last month on HF. It has surpassed ~1000 lifetime downloads. I am working on v2 with more high-quality data, as well as more diverse tasks. I will also be…
@neural_avb · x.com
Btw this is all the code you need to generate high-quality instruction/reasoning training data from any text. Point to a model, input a text passage, specify a task list (or create custom ones). Done.…
@neural_avb · x.com
Watch this 45 min video to learn how to create synthetic datasets and train tiny (100M params) local language models that specialize in narrow tasks. Code, datasets, models, harnesses all in comments.
@neural_avb · x.com
No trending activity
This repository has not yet been featured on GitHub Trending
Repository activities
Daily and monthly repository activity across stars, forks, merged PRs, issues, and closed issues