
BioNeMo Framework: For building and adapting AI models in drug discovery at scale

<div align="center">
  <h1>BioNeMo Framework</h1>
  <h4>GPU-optimized recipes & toolkits for training transformer models at scale with biological data</h4>
</div>

<div align="left">

[![Click here to deploy.](https://uohmivykqgnnbiouffke.supabase.co/storage/v1/object/public/landingpage/brevdeploynavy.svg)](https://console.brev.dev/launchable/deploy/now?launchableID=env-2pPDA4sJyTuFf3KsCv5KWRbuVlU)
[![Docs Build](https://img.shields.io/github/actions/workflow/status/NVIDIA/bionemo-framework/pages/pages-build-deployment?label=docs-build)](https://nvidia.github.io/bionemo-framework)
[![Test Status](https://github.com/NVIDIA/bionemo-framework/actions/workflows/unit-tests.yml/badge.svg)](https://github.com/NVIDIA/bionemo-framework/actions/workflows/unit-tests.yml)
[![Latest Tag](https://img.shields.io/github/v/tag/NVIDIA/bionemo-framework?label=latest-version)](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/containers/bionemo-framework/tags)
[![codecov](https://codecov.io/gh/NVIDIA/bionemo-framework/branch/main/graph/badge.svg?token=XqhegdZRqB)](https://codecov.io/gh/NVIDIA/bionemo-framework)

<div align="left">

NVIDIA BioNeMo Framework is a comprehensive suite of programming tools, libraries, and models designed for digital biology. It accelerates the most time-consuming and costly stages of building and adapting biomolecular AI models by providing domain-specific, optimized model recipes and tooling that are easily integrated into GPU-based computational resources with state-of-the-art performance.

> [!NOTE]
> A core use-case of the BioNeMo Framework is to help digital biology scientists accelerate and scale their model training onto a compute cluster. This repository contains 3 categories of modules for this use-case:
>
> 1\.
> Models using **fully-sharded-data-parallel (FSDP)**, which is possible with a number of different implementations including [PyTorch’s FSDP2/FSDP1](https://docs.pytorch.org/tutorials/intermediate/FSDP_tutorial.html) and [NVIDIA megatron-FSDP](https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/core/distributed/fsdp/src). Sharding a model with FSDP typically requires only a few lines of code changes. You can find models and ready-to-run recipes parallelized with megatron-FSDP and accelerated with [NVIDIA TransformerEngine (TE)](https://github.com/NVIDIA/TransformerEngine) in [`bionemo-recipes`](./bionemo-recipes/).
>
> <details>
> <summary><b>(Click to expand) <code>bionemo-recipes</code> support matrix </b></summary>
> <small>
>
> | Directory | Description | Support Status | 5D Parallel | Megatron-FSDP | TE | Sequence Packing | FP8 | Context Parallelism |
> | --- | --- | --- | --- | --- | --- | --- | --- | --- |
> | `models/`<br>`amplify` | TE accelerated protein BERT, pushed to HuggingFace | ✅ Active | ❌ | ✅ | ✅ | 🚧 WIP | ✅ | 🚧 WIP |
> | `models/`<br>`esm2` | TE accelerated protein BERT, pushed to HuggingFace | ✅ Active | ❌ | ✅ | ✅ | ✅ | ✅ | 🚧 WIP |
> | `models/`<br>`geneformer` | TE accelerated single-cell BERT | 🚧 WIP | ❌ | ✅ | 🚧 WIP | 🚧 WIP | 🚧 WIP | 🚧 WIP |
> | `recipes/`<br>`esm2_accelerate_te` | Recipe for ESM2 TE + HF Accelerate | ✅ Active | ❌ | 🚧 WIP | ✅ | ❌ | ✅ | 🚧 WIP |
> | `recipes/`<br>`esm2_native_te` | Recipe for ESM2 TE + native PyTorch | ✅ Active | ❌ | ✅ | ✅ | ✅ | ✅ | 🚧 WIP |
> | `recipes/`<br>`geneformer_native_te_mfsdp_fp8` | Recipe for Geneformer HF model | 🚧 WIP | ❌ | ✅ | ✅ | ❌ | ✅ | 🚧 WIP |
> | `recipes/`<br>`vit` | Recipe for Vision Transformer | 🚧 WIP | ❌ | ✅ | ✅ | ❌ | ✅ | 🚧 WIP |
>
> \[1\]: End-of-life; to be merged with `esm2_native_te` recipe. <br/>
> </small>
>
> </details>
>
> 2\.
> Models using explicit **5D parallelism** (tensor parallel, pipeline parallel, context parallel, etc.), for which NVIDIA provides accelerated support with [NeMo](https://github.com/NVIDIA-NeMo/NeMo) and [Megatron-Core](https://github.com/NVIDIA/Megatron-LM). 5D parallelism requires explicit modification of the model code to make it shardable along different dimensions. The models for this style of acceleration and parallelism can be found in the `sub-packages` directory. While it is possible to pip install the models, we strongly suggest using our [Docker image](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/containers/bionemo-framework) that comes with NeMo and Megatron-Core pre-installed.
>
> <details>
> <summary><b>(Click to expand) <code>sub-packages</code> models support matrix</b></summary>
> <small>
>
> | Directory | Description | Support | 5D Parallel | Megatron-FSDP | TE | Sequence Packing | FP8 | Context Parallel |
> | --- | --- | --- | --- | --- | --- | --- | --- | --- |
> | `bionemo-amplify` | 5D parallel model | 🔧 Maintenance | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ |
> | `bionemo-core` | Model Config/test data utils | ✅ Active | ✅ | N/A | ✅ | ❌ | N/A | N/A |
> | `bionemo-esm2` | 5D parallel model | ✅ Active | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ |
> | `bionemo-evo2` | 5D parallel model | ✅ Active | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ |
> | `bionemo-example_model` | Example 5D parallel model | 🔧 Maintenance | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ |
> | `bionemo-fw` | Meta package to pull other packages | ✅ Active | ✅ | N/A | N/A | ❌ | ✅ | N/A |
> | `bionemo-geneformer` | 5D parallel model | 🔧 Maintenance | ✅ | ❌ | ✅ | ❌ | ✅ | ✅ |
> | `bionemo-llm` | 5D parallel base model (BioBert) | ✅ Active | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ |
> | `bionemo-testing` | Testing Utilities | ✅ Active | ✅ | N/A | N/A | N/A | N/A | N/A |
>
> </small>
> </details>
>
> 3\.
> Tooling for dataloading and in-the-training-loop processing, which are lightweight and individually pip installable. These are also in the `sub-packages` directory adjacent to the 5D parallel models.
>
> <details>
> <summary><b>(Click to expand) <code>sub-packages</code> tooling support matrix</b></summary>
> <small>
>
> | Directory | Description | Support | 5D Parallel | Megatron-FSDP | TE | Sequence Packing | FP8 | Context Parallel |
> | --- | --- | --- | --- | --- | --- | --- | --- | --- |
> | `bionemo-moco` | Molecular Co-design tools | ✅ Active | ❌ | N/A | N/A | N/A | N/A | N/A |
> | `bionemo-noodles` | Python API to fast FASTA file I/O | 🔧 Maintenance | ❌ | N/A | N/A | N/A | N/A | N/A |
> | `bionemo-scspeedtest` | Single Cell Dataloading benchmark tests | ✅ Active | N/A | N/A | N/A | N/A | N/A | N/A |
> | `bionemo-size-aware-batching` | Memory consumption aware batching | 🔧 Maintenance | N/A | N/A | N/A | N/A | N/A | N/A |
> | `bionemo-scdl` | Modular Single Cell Data Loader | ✅ Active | ✅ Compatible | N/A | N/A | N/A | N/A | N/A |
> | `bionemo-webdatamodule` | PyTorch Lightning module to use WebDataset | 🔧 Maintenance | N/A | N/A | N/A | N/A | N/A | N/A |
>
> </small>
> </details>

BioNeMo Framework is part of a larger ecosystem of NVIDIA Biopharma products. Get notified of new releases, bug fixes, critical security updates, and more for biopharma. [Subscribe.](https://www.nvidia.com/en-us/clara/biopharma/product-updates/)

## Documentation Resources

- **Official Documentation:** Contents of `sub-packages` including user guides, API references, and troubleshooting, are documented on our [official documentation](https://docs.nvidia.com/bionemo-framework/latest/).
  Nightly builds of this documentation are available on [BioNeMo Framework GitHub Pages](https://nvidia.github.io/bionemo-framework/).
- **🚧 In-Progress Documentation 🚧:** `bionemo-recipes` documentation is currently a work in progress. However, the recipes are meant to be self-documented and easy to understand. We suggest you throw them into your favorite genai code assistant!

## Getting Started with BioNeMo Framework

Full documentation on using the BioNeMo Framework is provided in our documentation: <https://docs.nvidia.com/bionemo-framework/latest/user-guide/>.

To simplify the integration of optimized third-party dependencies, BioNeMo is primarily distributed as a containerized library. You can download the latest released container for the BioNeMo Framework from [NGC](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/containers/bionemo-framework).

To launch a pre-built container, you can use the brev.dev launchable [![ Click here to deploy.](https://uohmivykqgnnbiouffke.supabase.co/storage/v1/object/public/landingpage/brevdeploynavy.svg)](https://console.brev.dev/launchable/deploy/now?launchableID=env-2pPDA4sJyTuFf3KsCv5KWRbuVlU) or execute the following command:

```bash
docker run --rm -it \
  --gpus=all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
  nvcr.io/nvidia/clara/bionemo-framework:nightly \
  /bin/bash
```

### Setting up a local development environment

#### Initializing 3rd-party dependencies as git submodules

The NeMo and Megatron-LM dependencies are included as git submodules in bionemo2. The pinned commits for these submodules represent the "last-known-good" versions of these packages that are confirmed to be working with bionemo2 (and those that are tested in CI).
To initialize these sub-modules when cloning the repo, add the `--recursive` flag to the git clone command:

```bash
git clone --recursive git@github.com:NVIDIA/bionemo-framework.git
cd bionemo-framework
```

To download the pinned versions of these submodules within an existing git repository, run

```bash
git submodule update --init --recursive
```

Different branches of the repo can have different pinned versions of these third-party submodules. Ensure submodules are automatically updated after switching branches or pulling updates by configuring git with:

```bash
git config submodule.recurse true
```

**NOTE**: this setting will not download **new** or remove **old** submodules with the branch's changes. You will have to run the full `git submodule update --init --recursive` command in these situations.

#### Build the Docker Image Locally

With a locally cloned repository and initialized submodules, build the BioNeMo container using:

```bash
docker buildx build . -t my-container-tag
```

If you see an error message like `No file descriptors available (os error 24)`, add the option `--ulimit nofile=65535:65535` to the docker build command.

#### VSCode Devcontainer for Interactive Debugging

We distribute a [development container](https://devcontainers.github.io/) configuration for vscode (`.devcontainer/devcontainer.json`) that simplifies the process of local testing and development. Opening the bionemo-framework folder with VSCode should prompt you to re-open the folder inside the devcontainer environment.

> [!NOTE]
> The first time you launch the devcontainer, it may take a long time to build the image. Building the image locally
> (using the command shown above) will ensure that most of the layers are present in the local docker cache.

### Quick Start

See the [tutorials pages](https://docs.nvidia.com/bionemo-framework/latest/user-guide/examples/bionemo-esm2/pretrain/) for example applications and getting started guides.
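
As a follow-up to the submodule setup above: since `git config submodule.recurse true` does not download new submodules, it can help to check the submodule state before building the image. The following is a minimal sketch, not part of the BioNeMo tooling. It relies on the documented `git submodule status` output, where each line starts with `-` for an uninitialized submodule, `+` when the checked-out commit differs from the pinned one, and `U` for merge conflicts. The function names `classify_submodule_line` and `check_submodules` are hypothetical helpers introduced only for illustration.

```bash
# Classify one line of `git submodule status` output by its first character.
#   '-' : submodule is not initialized
#   '+' : checked-out commit differs from the commit pinned by the repository
#   'U' : submodule has merge conflicts
#   anything else (normally a space): checked out at the pinned commit
classify_submodule_line() {
    case "$1" in
        -*) echo "uninitialized" ;;
        +*) echo "out-of-sync" ;;
        U*) echo "conflict" ;;
        *)  echo "ok" ;;
    esac
}

# Hypothetical helper: report every submodule that is not at its pinned commit.
# Returns 0 if all are in sync, 1 if any needs attention, 2 if git itself fails.
# Run it from the root of the cloned repository.
check_submodules() {
    cs_out=$(git submodule status --recursive) || return 2
    cs_rc=0
    while IFS= read -r cs_line; do
        [ -n "$cs_line" ] || continue
        cs_state=$(classify_submodule_line "$cs_line")
        if [ "$cs_state" != "ok" ]; then
            # Drop the one-character status prefix before printing.
            echo "$cs_state: ${cs_line#?}"
            cs_rc=1
        fi
    done <<EOF
$cs_out
EOF
    return "$cs_rc"
}
```

If `check_submodules` reports anything, running `git submodule update --init --recursive` as described above should bring the submodules back to their pinned commits.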
