## 𝌠 μDSV

A [faster](#performance) CSV parser in [5KB (min)](https://github.com/leeoniya/uDSV/tree/main/dist/uDSV.iife.min.js) _(MIT Licensed)_

---
### Introduction

uDSV is a fast JS library for parsing well-formed CSV strings, either from memory or incrementally from disk or network.
It is mostly [RFC 4180](https://datatracker.ietf.org/doc/html/rfc4180) compliant, with support for quoted values containing commas, escaped quotes, and line breaks¹.
The aim of this project is to handle the 99.5% use-case without adding complexity and performance trade-offs to support the remaining 0.5%.

¹ Line breaks (`\n`,`\r`,`\r\n`) within quoted values must match the row separator.

---
### Features

What does uDSV pack into 5KB?

- [RFC 4180](https://datatracker.ietf.org/doc/html/rfc4180) compliant
- Incremental or full parsing, with optional accumulation
- Auto-detection and customization of delimiters (rows, columns, quotes, escapes)
- Schema inference and value typing: `string`, `number`, `boolean`, `date`, `json`
- Defined handling of `''`, `'null'`, `'NaN'`
- Whitespace trimming of values & skipping empty lines
- Multi-row header skipping and column renaming
- Multiple outputs: arrays (tuples), objects, nested objects, columnar arrays

Of course, _most_ of these are table stakes for CSV parsers :)

---
### Performance

Is it Lightning Fast™ or Blazing Fast™?

No, those are too slow! uDSV has [Ludicrous Speed™](https://www.youtube.com/watch?v=ygE01sOhzz0); it's faster than the parsers you recognize and faster than those you've never heard of.

Most CSV parsers have one happy/fast path -- the one without quoted values, without value typing, and only when using the default settings & output format.
Once you're off that path, you can generally throw any self-promoting benchmarks in the trash.
In contrast, uDSV remains fast with any datasets and all options; its happy path is _every path_.

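To make the "happy path" point concrete, here is a minimal sketch (not uDSV's implementation; `naiveParse` is a hypothetical helper written only for this illustration) of a split-based shortcut that is correct only when no value is quoted:

```js
// Hypothetical illustration, not part of uDSV:
// a split-based "fast path" that ignores quoting entirely.
function naiveParse(csvStr) {
  return csvStr.split('\n').map(row => row.split(','));
}

const plain  = 'a,b\n1,2';
const quoted = 'a,b\n"1,5",2';

// unquoted input parses as expected
console.log(naiveParse(plain));  // [ [ 'a', 'b' ], [ '1', '2' ] ]

// the comma inside the quoted value is treated as a column separator,
// so the second row ends up with 3 fields instead of 2
console.log(naiveParse(quoted)); // [ [ 'a', 'b' ], [ '"1', '5"', '2' ] ]
```

A parser that stays correct on the second input has to track quotes while scanning, which is exactly the work that a quote-free benchmark never exercises.
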
On a Ryzen 7 ThinkPad, Linux v6.14.7, and NodeJS v24.1.0, a diverse set of benchmarks show a 2x-5x performance boost relative to the [popular](https://github.com/search?q=csv+parser&type=repositories&s=stars&o=desc), [proven-fast](https://leanylabs.com/blog/js-csv-parsers-benchmarks/), [Papa Parse](https://www.papaparse.com/).

**Parsing to arrays of strings**

<pre>
customers-100000.csv (17 MB, 12 cols x 100K rows) (parsing to strings)
┌────────────────────────┬────────┬────────────────────┐
│ Name                   │ Rows/s │ Throughput (MiB/s) │
├────────────────────────┼────────┼────────────────────┤
│ csv-simple-parser      │ 2.21M  │ 366                │
│ uDSV                   │ 2M     │ 330                │
│ but-csv                │ 1.15M  │ 190                │
│ PapaParse              │ 1.13M  │ 186                │
│ ACsv                   │ 1.12M  │ 185                │
│ tiddlycsv              │ 1.11M  │ 183                │
│ d3-dsv                 │ 939K   │ 155                │
│ csv-rex                │ 884K   │ 146                │
│ achilles-csv-parser    │ 856K   │ 141                │
│ csv42                  │ 807K   │ 133                │
│ arquero                │ 541K   │ 89.4               │
│ node-csvtojson         │ 478K   │ 78.9               │
│ comma-separated-values │ 469K   │ 77.4               │
│ CSVtoJSON              │ 447K   │ 73.8               │
│ SheetJS                │ 411K   │ 67.8               │
│ @vanillaes/csv         │ 396K   │ 65.4               │
│ csv-parser (neat-csv)  │ 278K   │ 45.9               │
│ dekkai                 │ 211K   │ 34.8               │
│ @gregoranders/csv      │ 198K   │ 32.6               │
│ csv-js                 │ 193K   │ 31.9               │
│ csv-parse/sync         │ 153K   │ 25.3               │
│ jquery-csv             │ 153K   │ 25.3               │
│ @fast-csv/parse        │ 106K   │ 17.6               │
│ utils-dsv-base-parse   │ 68.9K  │ 11.4               │
└────────────────────────┴────────┴────────────────────┘
</pre>

**Parsing to arrays with types**

Note: `date` in the Types column means the lib created 100,000 `Date` objects; not all libs do.

<pre>
customers-100000.csv (17 MB, 12 cols x 100K rows) (parsing with types)
┌────────────────────────┬────────┬────────────────────┬────────────────────┐
│ Name                   │ Rows/s │ Throughput (MiB/s) │ Types              │
├────────────────────────┼────────┼────────────────────┼────────────────────┤
│ uDSV                   │ 967K   │ 160                │ date,number,string │
│ csv42                  │ 712K   │ 118                │ number,string      │
│ csv-simple-parser      │ 697K   │ 115                │ date,number,string │
│ csv-rex                │ 629K   │ 104                │ number,string      │
│ achilles-csv-parser    │ 560K   │ 92.6               │ number,string      │
│ comma-separated-values │ 471K   │ 77.7               │ number,string      │
│ arquero                │ 459K   │ 75.9               │ date,number,string │
│ PapaParse              │ 454K   │ 75                 │ number,string      │
│ CSVtoJSON              │ 425K   │ 70.1               │ number,string      │
│ d3-dsv                 │ 380K   │ 62.8               │ date,number,string │
│ @vanillaes/csv         │ 302K   │ 49.9               │ NaN,number,string  │
│ csv-parser (neat-csv)  │ 260K   │ 43                 │ number,string      │
│ csv-js                 │ 229K   │ 37.9               │ number,string      │
│ dekkai                 │ 213K   │ 35.1               │ number,string      │
│ csv-parse/sync         │ 101K   │ 16.7               │ date,number,string │
│ SheetJS                │ 70.8K  │ 11.7               │ number,string      │
└────────────────────────┴────────┴────────────────────┴────────────────────┘
</pre>

**Parsing quote-heavy CSV to arrays with types**

Note: `object` in the Types column means the lib called `JSON.parse()` 34,000 times; not all libs do.

<pre>
uszips.csv (6 MB, 18 cols x 34K rows) (parsing with types)
┌────────────────────────┬────────┬────────────────────┬───────────────────────────────────┐
│ Name                   │ Rows/s │ Throughput (MiB/s) │ Types                             │
├────────────────────────┼────────┼────────────────────┼───────────────────────────────────┤
│ uDSV                   │ 537K   │ 96                 │ boolean,null,number,object,string │
│ csv-simple-parser      │ 445K   │ 79.6               │ boolean,null,number,object,string │
│ achilles-csv-parser    │ 420K   │ 75.1               │ boolean,null,number,object,string │
│ CSVtoJSON              │ 270K   │ 48.2               │ number,string                     │
│ d3-dsv                 │ 266K   │ 47.6               │ null,number,string                │
│ comma-separated-values │ 261K   │ 46.6               │ number,string                     │
│ csv-rex                │ 255K   │ 45.6               │ boolean,null,number,object,string │
│ dekkai                 │ 248K   │ 44.3               │ NaN,number,string                 │
│ arquero                │ 245K   │ 43.8               │ null,number,string                │
│ csv42                  │ 235K   │ 42                 │ number,object,string              │
│ csv-js                 │ 232K   │ 41.4               │ boolean,number,string             │
│ csv-parser (neat-csv)  │ 191K   │ 34.2               │ boolean,null,number,object,string │
│ PapaParse              │ 176K   │ 31.4               │ boolean,null,number,string        │
│ @vanillaes/csv         │ 170K   │ 30.4               │ NaN,number,string                 │
│ SheetJS                │ 102K   │ 18.3               │ boolean,number,string             │
│ csv-parse/sync         │ 92.2K  │ 16.5               │ number,string                     │
└────────────────────────┴────────┴────────────────────┴───────────────────────────────────┘
</pre>

For _way too many_ synthetic and real-world benchmarks, head over to [/bench](/bench)...and don't forget your coffee!

---
### Installation

```
npm i udsv
```

or

```html
<script src="./dist/uDSV.iife.min.js"></script>
```

---
### API

A 150 LoC [uDSV.d.ts](https://github.com/leeoniya/uDSV/blob/main/dist/uDSV.d.ts) TypeScript def.


---
### Basic Usage

```js
import { inferSchema, initParser } from 'udsv';

let csvStr = 'a,b,c\n1,2,3\n4,5,6';

let schema = inferSchema(csvStr);
let parser = initParser(schema);

// native format (fastest)
let stringArrs = parser.stringArrs(csvStr); // [ ['1','2','3'], ['4','5','6'] ]

// typed formats (internally converted from native)
let typedArrs  = parser.typedArrs(csvStr);  // [ [1, 2, 3], [4, 5, 6] ]
let typedObjs  = parser.typedObjs(csvStr);  // [ {a: 1, b: 2, c: 3}, {a: 4, b: 5, c: 6} ]
let typedCols  = parser.typedCols(csvStr);  // [ [1, 4], [2, 5], [3, 6] ]

let stringObjs = parser.stringObjs(csvStr); // [ {a: '1', b: '2', c: '3'}, {a: '4', b: '5', c: '6'} ]
let stringCols = parser.stringCols(csvStr); // [ ['1', '4'], ['2', '5'], ['3', '6'] ]
```

Sometimes you may need to render the unmodified string values (like in an editable grid), but want to sort/filter using the typed values (e.g. number or date columns).
uDSV's `.typed*()` methods additionally accept the untyped string-tuples array returned by `parser.stringArrs(csvStr)`:

```js
let schema = inferSchema(csvStr);
let parser = initParser(schema);

// raw parsed strings for rendering
let stringArrs = parser.stringArrs(csvStr);
// typed values for sorting/filtering
let typedObjs = parser.typedObjs(stringArrs);
```

Need a custom or user-defined parser for a specific column? No problem!

```js
const csvStr = `a,b,c\n1,2,a-b-c\n4,5,d-e`;

let schema = inferSchema(csvStr);
schema.cols[2].parse = str => str.split('-');

let parser = initParser(schema);

let rows = parser.typedObjs(csvStr);

/*
[
  {a: 1, b: 2, c: ['a', 'b', 'c']},
  {a: 4, b: 5, c: ['d', 'e']},
]
*/
```

Nested/deep objects can be re-constructed from column naming via `.typedDeep()`:

```js
// deep/nested objects (from column naming)
let csvStr2 = `
_type,name,description,location.city,location.street,location.geo[0],location.geo[1],speed,heading,size[0],size[1],size[2]
item,Item 0,Item 0 description in text,Rotterdam,Main street,51.9280712,4.4207888,5.4,128.3,3.4,5.1,0.9
`.trim();

let schema2 = inferSchema(csvStr2);
let parser2 = initParser(schema2);

let typedDeep = parser2.typedDeep(csvStr2);

/*
[
  {
    _type: 'item',
    name: 'Item 0',
    description: 'Item 0 description in text',
    location: {
      city: 'Rotterdam',
      street: 'Main street',
      geo: [ 51.9280712, 4.4207888 ]
    },
    speed: 5.4,
    heading: 128.3,
    size: [ 3.4, 5.1, 0.9 ],
  }
]
*/
```

**CSP Note:**

uDSV uses dynamically-generated functions (via `new Function()`) for its `.typed*()` methods.
These functions are lazy-generated and use `JSON.stringify()` [code-injection guards](https://github.com/leeoniya/uDSV/commit/4e7472a7015c0a7ae5ae76e41f282bd4bdcf0c67), so the risk should be minimal.
Nevertheless, if you have strict [CSP headers](https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP) without `unsafe-eval`, you won't be able to take advantage of the typed methods and will have to do the type conversion from the string tuples yourself.

---
### Incremental / Streaming

uDSV has no inherent knowledge of streams.
Instead, it exposes a generic incremental parsing API to which you can pass sequential chunks.
These chunks can come from various sources, such as a [Web Stream](https://css-tricks.com/web-streams-everywhere-and-fetch-for-node-js/) or [Node stream](https://nodejs.org/api/stream.html) via `fetch()` or `fs`, a [WebSocket](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API), etc.

Here's what it looks like with Node's [fs.createReadStream()](https://nodejs.org/api/fs.html#fscreatereadstreampath-options):

```js
import fs from 'fs';
import { inferSchema, initParser } from 'udsv';

let stream = fs.createReadStream(filePath);

let parser = null;
let result = null;

stream.on('data', (chunk) => {
  // convert from Buffer
  let strChunk = chunk.toString();
  // on first chunk, infer schema and init parser
  parser ??= initParser(inferSchema(strChunk));
  // incremental parse to string arrays
  parser.chunk(strChunk, parser.stringArrs);
});

stream.on('end', () => {
  result = parser.end();
});
```

...and Web streams [in Node](https://nodejs.org/api/webstreams.html), or [Fetch's Response.body](https://developer.mozilla.org/en-US/docs/Web/API/Response/body):

```js
import fs from 'fs';
import Stream from 'stream';
import { inferSchema, initParser } from 'udsv';

let stream = fs.createReadStream(filePath);

let webStream = Stream.Readable.toWeb(stream);
let textStream = webStream.pipeThrough(new TextDecoderStream());

let parser = null;

for await (const strChunk of textStream) {
  parser ??= initParser(inferSchema(strChunk));
  parser.chunk(strChunk, parser.stringArrs);
}

let result = parser.end();
```

The above examples show accumulating parsers -- they will buffer the full `result` into memory.
This may not be something you need (or want), for example with huge datasets where you're looking to get the sum of a single column, or want to filter only a small subset of rows.
To bypass this auto-accumulation behavior, simply pass your own handler as the third argument to `parser.chunk()`:

```js
// ...same as above

let sum = 0;

// sums fourth column
let reducer = (row) => {
  sum += row[3];
};

for await (const strChunk of textStream) {
  parser ??= initParser(inferSchema(strChunk));
  parser.chunk(strChunk, parser.typedArrs, reducer); // typedArrs + reducer
}

parser.end();
```

Building on the non-accumulating example, Node's [Transform stream](https://nodejs.org/api/stream.html#implementing-a-transform-stream) will be something like:

```js
import { Transform } from "stream";

class ParseCSVTransform extends Transform {
  #parser = null;
  #push   = null;

  constructor() {
    super({ objectMode: true });

    this.#push = parsed => {
      this.push(parsed);
    };
  }

  _transform(chunk, encoding, callback) {
    let strChunk = chunk.toString();
    this.#parser ??= initParser(inferSchema(strChunk));
    this.#parser.chunk(strChunk, this.#parser.typedArrs, this.#push);
    callback();
  }

  _flush(callback) {
    this.#parser.end();
    callback();
  }
}
```

---
### TODO?

- handle #comment rows
- emit empty-row and #comment events?