AI prompts
base on cryo is the easiest way to extract blockchain data to parquet, csv, json, or python dataframes # āļøš§ cryo š§āļø
[![Rust](https://github.com/paradigmxyz/cryo/actions/workflows/build_and_test.yml/badge.svg)](https://github.com/paradigmxyz/cryo/actions/workflows/build_and_test.yml) [![Telegram Chat](https://img.shields.io/badge/Telegram-join_chat-blue.svg)](https://t.me/paradigm_data)
`cryo` is the easiest way to extract blockchain data to parquet, csv, json, or a python dataframe.
`cryo` is also extremely flexible, with [many different options](#cryo-help) to control how data is extracted + filtered + formatted
*`cryo` is an early WIP, please report bugs + feedback to the issue tracker*
*note that `cryo`'s default settings will slam a node too hard for use with 3rd party RPC providers. Instead, `--requests-per-second` and `--max-concurrent-requests` should be used to impose ratelimits. Such settings will be handled automatically in a future release*.
to discuss cryo, check out [the telegram group](https://t.me/paradigm_data)
## Contents
1. [Example Usage](#example-usage)
2. [Installation](#installation)
3. [Data Schema](#data-schemas)
4. [Code Guide](#code-guide)
5. [Documentation](#documentation)
1. [Basics](#cryo-help)
2. [Syntax](#cryo-syntax)
3. [Datasets](#cryo-datasets)
## Example Usage
use as `cryo <dataset> [OPTIONS]`
| Example | Command |
| :- | :- |
| Extract all logs from block 16,000,000 to block 17,000,000 | `cryo logs -b 16M:17M` |
| Extract blocks, logs, or traces missing from current directory | `cryo blocks txs traces` |
| Extract to csv instead of parquet | `cryo blocks txs traces --csv` |
| Extract only certain columns | `cryo blocks --include number timestamp` |
| Dry run to view output schemas or expected work | `cryo storage_diffs --dry` |
| Extract all USDC events | `cryo logs --contract 0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48` |
For a more complex example, see the [Uniswap Example](./examples/uniswap.sh).
`cryo` uses `ETH_RPC_URL` env var as the data source unless `--rpc <url>` is given
## Installation
The simplest way to use `cryo` is as a cli tool:
#### Method 1: install from source
```bash
git clone https://github.com/paradigmxyz/cryo
cd cryo
cargo install --path ./crates/cli
```
This method requires having rust installed. See [rustup](https://rustup.rs/) for instructions.
#### Method 2: install from crates.io
```bash
cargo install cryo_cli
```
This method requires having rust installed. See [rustup](https://rustup.rs/) for instructions.
Make sure that `~/.cargo/bin` is on your `PATH`. One way to do this is by adding the line `export PATH="$HOME/.cargo/bin:$PATH"` to your `~/.bashrc` or `~/.profile`.
### Python Installation
`cryo` can also be installed as a python package:
#### Installing `cryo` python from pypi
(make sure rust is installed first, see [rustup](https://www.rust-lang.org/tools/install))
```bash
pip install maturin
pip install cryo
```
#### Installing `cryo` python from source
```bash
pip install maturin
git clone https://github.com/paradigmxyz/cryo
cd cryo/crates/python
maturin build --release
pip install --force-reinstall <OUTPUT_OF_MATURIN_BUILD>.whl
```
## Data Schemas
Many `cryo` cli options will affect output schemas by adding/removing columns or changing column datatypes.
`cryo` will always print out data schemas before collecting any data. To view these schemas without collecting data, use `--dry` to perform a dry run.
#### Schema Design Guide
An attempt is made to ensure that the dataset schemas conform to a common set of design guidelines:
- By default, rows should contain enough information in their columns to be order-able (unless the rows do not have an intrinsic order).
- Columns should usually be named by their JSON-RPC or ethers.rs defaults, except in cases where a much more explicit name is available.
- To make joins across tables easier, a given piece of information should use the same datatype and column name across tables when possible.
- Large ints such as `u256` should allow multiple conversions. A `value` column of type `u256` should allow: `value_binary`, `value_string`, `value_f32`, `value_f64`, `value_u32`, `value_u64`, and `value_d128`. These types can be specified at runtime using the `--u256-types` argument.
- By default, columns related to non-identifying cryptographic signatures are omitted by default. For example, `state_root` of a block or `v`/`r`/`s` of a transaction.
- Integer values that can never be negative should be stored as unsigned integers.
- Every table should allow a `chain_id` column so that data from multiple chains can be easily stored in the same table.
Standard types across tables:
- `block_number`: `u32`
- `transaction_index`: `u32`
- `nonce`: `u32`
- `gas_used`: `u64`
- `gas_limit`: `u64`
- `chain_id`: `u64`
- `timestamp`: `u32`
#### JSON-RPC
`cryo` currently obtains all of its data using the [JSON-RPC](https://ethereum.org/en/developers/docs/apis/json-rpc/) protocol standard.
|dataset|blocks per request|results per block|method|
|-|-|-|-|
|Blocks|1|1|`eth_getBlockByNumber`|
|Transactions|1|multiple|`eth_getBlockByNumber`, `eth_getBlockReceipts`, `eth_getTransactionReceipt`|
|Logs|multiple|multiple|`eth_getLogs`|
|Contracts|1|multiple|`trace_block`|
|Traces|1|multiple|`trace_block`|
|State Diffs|1|multiple|`trace_replayBlockTransactions`|
|Vm Traces|1|multiple|`trace_replayBlockTransactions`|
`cryo` use [ethers.rs](https://github.com/gakonst/ethers-rs) to perform JSON-RPC requests, so it can be used any chain that ethers-rs is compatible with. This includes Ethereum, Optimism, Arbitrum, Polygon, BNB, and Avalanche.
A future version of `cryo` will be able to bypass JSON-RPC and query node data directly.
## Code Guide
- Code is arranged into the following crates:
- `cryo_cli`: convert textual data into cryo function calls
- `cryo_freeze`: core cryo code
- `cryo_python`: cryo python adapter
- `cryo_to_df`: procedural macro for generating dataset definitions
- Do not use panics (including `panic!`, `todo!`, `unwrap()`, and `expect()`) except in the following circumstances: tests, build scripts, lazy static blocks, and procedural macros
## Documentation
1. [cryo help](#cryo-help)
2. [cryo syntax](#cryo-syntax)
3. [cryo datasets](#cryo-datasets)
#### cryo help
(output of `cryo help`)
```
cryo extracts blockchain data to parquet, csv, or json
Usage: cryo [OPTIONS] [DATATYPE]...
Arguments:
[DATATYPE]... datatype(s) to collect, use cryo datasets to see all available
Options:
--remember Remember current command for future use
-v, --verbose Extra verbosity
--no-verbose Run quietly without printing information to stdout
-h, --help Print help
-V, --version Print version
Content Options:
-b, --blocks <BLOCKS>... Block numbers, see syntax below
--timestamps <TIMESTAMPS>... Timestamp numbers in unix, overridden by blocks
-t, --txs <TXS>... Transaction hashes, see syntax below
-a, --align Align chunk boundaries to regular intervals,
e.g. (1000 2000 3000), not (1106 2106 3106)
--reorg-buffer <N_BLOCKS> Reorg buffer, save blocks only when this old,
can be a number of blocks [default: 0]
-i, --include-columns [<COLS>...] Columns to include alongside the defaults,
use `all` to include all available columns
-e, --exclude-columns [<COLS>...] Columns to exclude from the defaults
--columns [<COLS>...] Columns to use instead of the defaults,
use `all` to use all available columns
--u256-types <U256_TYPES>... Set output datatype(s) of U256 integers
[default: binary, string, f64]
--hex Use hex string encoding for binary columns
-s, --sort [<SORT>...] Columns(s) to sort by, `none` for unordered
--exclude-failed Exclude items from failed transactions
Source Options:
-r, --rpc <RPC> RPC url [default: ETH_RPC_URL env var]
--network-name <NETWORK_NAME> Network name [default: name of eth_getChainId]
Acquisition Options:
-l, --requests-per-second <limit> Ratelimit on requests per second
--max-retries <R> Max retries for provider errors [default: 5]
--initial-backoff <B> Initial retry backoff time (ms) [default: 500]
--max-concurrent-requests <M> Global number of concurrent requests
--max-concurrent-chunks <M> Number of chunks processed concurrently
--chunk-order <CHUNK_ORDER> Chunk collection order (normal, reverse, or random)
-d, --dry Dry run, collect no data
Output Options:
-c, --chunk-size <CHUNK_SIZE> Number of blocks per file [default: 1000]
--n-chunks <N_CHUNKS> Number of files (alternative to --chunk-size)
--partition-by <PARTITION_BY> Dimensions to partition by
-o, --output-dir <OUTPUT_DIR> Directory for output files [default: .]
--subdirs <SUBDIRS>... Subdirectories for output files
can be `datatype`, `network`, or custom string
--label <LABEL> Label to add to each filename
--overwrite Overwrite existing files instead of skipping
--csv Save as csv instead of parquet
--json Save as json instead of parquet
--row-group-size <GROUP_SIZE> Number of rows per row group in parquet file
--n-row-groups <N_ROW_GROUPS> Number of rows groups in parquet file
--no-stats Do not write statistics to parquet files
--compression <NAME [#]>... Compression algorithm and level [default: lz4]
--report-dir <REPORT_DIR> Directory to save summary report
[default: {output_dir}/.cryo/reports]
--no-report Avoid saving a summary report
Dataset-specific Options:
--address <ADDRESS>... Address(es)
--to-address <address>... To Address(es)
--from-address <address>... From Address(es)
--call-data <CALL_DATA>... Call data(s) to use for eth_calls
--function <FUNCTION>... Function(s) to use for eth_calls
--inputs <INPUTS>... Input(s) to use for eth_calls
--slot <SLOT>... Slot(s)
--contract <CONTRACT>... Contract address(es)
--topic0 <TOPIC0>... Topic0(s) [aliases: event]
--topic1 <TOPIC1>... Topic1(s)
--topic2 <TOPIC2>... Topic2(s)
--topic3 <TOPIC3>... Topic3(s)
--event-signature <SIG>... Event signature for log decoding
--inner-request-size <BLOCKS> Blocks per request (eth_getLogs) [default: 1]
--js-tracer <tracer> Event signature for log decoding
Optional Subcommands:
cryo help display help message
cryo help syntax display block + tx specification syntax
cryo help datasets display list of all datasets
cryo help <DATASET(S)> display info about a dataset
```
#### cryo syntax
(output of `cryo help syntax`)
```
Block specification syntax
- can use numbers --blocks 5000 6000 7000
- can use ranges --blocks 12M:13M 15M:16M
- can use a parquet file --blocks ./path/to/file.parquet[:COLUMN_NAME]
- can use multiple parquet files --blocks ./path/to/files/*.parquet[:COLUMN_NAME]
- numbers can contain { _ . K M B } 5_000 5K 15M 15.5M
- omitting range end means latest 15.5M: == 15.5M:latest
- omitting range start means 0 :700 == 0:700
- minus on start means minus end -1000:7000 == 6001:7001
- plus sign on end means plus start 15M:+1000 == 15M:15.001M
- can use every nth value 2000:5000:1000 == 2000 3000 4000
- can use n values total 100:200/5 == 100 124 149 174 199
Timestamp specification syntax
- can use numbers --timestamp 5000 6000 7000
- can use ranges --timestamp 12M:13M 15M:16M
- can use a parquet file --timestamp ./path/to/file.parquet[:COLUMN_NAME]
- can use multiple parquet files --timestamp ./path/to/files/*.parquet[:COLUMN_NAME]
- can contain { _ . m h d w M y } 31_536_000 525600m 8760h 365d 52.143w 12.17M 1y
- omitting range end means latest 15.5M: == 15.5M:latest
- omitting range start means 0 :700 == 0:700
- minus on start means minus end -1000:7000 == 6001:7001
- plus sign on end means plus start 15M:+1000 == 15M:15.001M
- can use n values total 100:200/5 == 100 124 149 174 199
Transaction specification syntax
- can use transaction hashes --txs TX_HASH1 TX_HASH2 TX_HASH3
- can use a parquet file --txs ./path/to/file.parquet[:COLUMN_NAME]
(default column name is transaction_hash)
- can use multiple parquet files --txs ./path/to/ethereum__logs*.parquet
```
#### cryo datasets
(output of `cryo help datasets`)
```
cryo datasets
āāāāāāāāāāāāā
- address_appearances
- balance_diffs
- balance_reads
- balances
- blocks
- code_diffs
- code_reads
- codes
- contracts
- erc20_balances
- erc20_metadata
- erc20_supplies
- erc20_transfers
- erc20_approvals
- erc721_metadata
- erc721_transfers
- eth_calls
- four_byte_counts (alias = 4byte_counts)
- geth_calls
- geth_code_diffs
- geth_balance_diffs
- geth_storage_diffs
- geth_nonce_diffs
- geth_opcodes
- javascript_traces (alias = js_traces)
- logs (alias = events)
- native_transfers
- nonce_diffs
- nonce_reads
- nonces
- slots (alias = storages)
- storage_diffs (alias = slot_diffs)
- storage_reads (alias = slot_reads)
- traces
- trace_calls
- transactions (alias = txs)
- vm_traces (alias = opcode_traces)
dataset group names
āāāāāāāāāāāāāāāāāāā
- blocks_and_transactions: blocks, transactions
- call_trace_derivatives: contracts, native_transfers, traces
- geth_state_diffs: geth_balance_diffs, geth_code_diffs, geth_nonce_diffs, geth_storage_diffs
- state_diffs: balance_diffs, code_diffs, nonce_diffs, storage_diffs
- state_reads: balance_reads, code_reads, nonce_reads, storage_reads
use cryo help <DATASET> to print info about a specific dataset
```
", Assign "at most 3 tags" to the expected json: {"id":"7080","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"