# Tiny CUDA Neural Networks ![](https://github.com/NVlabs/tiny-cuda-nn/workflows/CI/badge.svg)
This is a small, self-contained framework for training and querying neural networks. Most notably, it contains a lightning fast ["fully fused" multi-layer perceptron](https://raw.githubusercontent.com/NVlabs/tiny-cuda-nn/master/data/readme/fully-fused-mlp-diagram.png) ([technical paper](https://tom94.net/data/publications/mueller21realtime/mueller21realtime.pdf)), a versatile [multiresolution hash encoding](https://raw.githubusercontent.com/NVlabs/tiny-cuda-nn/master/data/readme/multiresolution-hash-encoding-diagram.png) ([technical paper](https://nvlabs.github.io/instant-ngp/assets/mueller2022instant.pdf)), as well as support for various other input encodings, losses, and optimizers.
## Performance
![Image](data/readme/fully-fused-vs-tensorflow.png)
_Fully fused networks vs. TensorFlow v2.5.0 w/ XLA, measured on multi-layer perceptrons that are 64 (solid line) and 128 (dashed line) neurons wide, on an RTX 3090. Generated by `benchmarks/bench_ours.cu` and `benchmarks/bench_tensorflow.py` using `data/config_oneblob.json`._
## Usage
Tiny CUDA neural networks have a simple C++/CUDA API:
```cpp
#include <tiny-cuda-nn/common.h>

#include <iostream>

// Configure the model
nlohmann::json config = {
	{"loss", {
		{"otype", "L2"}
	}},
	{"optimizer", {
		{"otype", "Adam"},
		{"learning_rate", 1e-3},
	}},
	{"encoding", {
		{"otype", "HashGrid"},
		{"n_levels", 16},
		{"n_features_per_level", 2},
		{"log2_hashmap_size", 19},
		{"base_resolution", 16},
		{"per_level_scale", 2.0},
	}},
	{"network", {
		{"otype", "FullyFusedMLP"},
		{"activation", "ReLU"},
		{"output_activation", "None"},
		{"n_neurons", 64},
		{"n_hidden_layers", 2},
	}},
};

using namespace tcnn;

auto model = create_from_config(n_input_dims, n_output_dims, config);

// Train the model (batch_size must be a multiple of tcnn::BATCH_SIZE_GRANULARITY)
GPUMatrix<float> training_batch_inputs(n_input_dims, batch_size);
GPUMatrix<float> training_batch_targets(n_output_dims, batch_size);

for (int i = 0; i < n_training_steps; ++i) {
	generate_training_batch(&training_batch_inputs, &training_batch_targets); // <-- your code

	float loss;
	model.trainer->training_step(training_batch_inputs, training_batch_targets, &loss);
	std::cout << "iteration=" << i << " loss=" << loss << std::endl;
}

// Use the model
GPUMatrix<float> inference_inputs(n_input_dims, batch_size);
generate_inputs(&inference_inputs); // <-- your code

GPUMatrix<float> inference_outputs(n_output_dims, batch_size);
model.network->inference(inference_inputs, inference_outputs);
```
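As noted in the comment above, `batch_size` must be a multiple of `tcnn::BATCH_SIZE_GRANULARITY`. A minimal sketch of rounding a desired batch size up to a valid one (the `round_up` helper is illustrative, not part of the framework):

```cpp
#include <cstdint>

// Illustrative helper: round n up to the next multiple of granularity.
uint32_t round_up(uint32_t n, uint32_t granularity) {
	return ((n + granularity - 1) / granularity) * granularity;
}

// The padded tail of the batch still needs to be filled with valid data by your pipeline.
uint32_t batch_size = round_up(desired_batch_size, tcnn::BATCH_SIZE_GRANULARITY);
```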
## Example: learning a 2D image
We provide a sample application that learns an image function _(x,y) -> (R,G,B)_. It can be run via
```sh
tiny-cuda-nn$ ./build/mlp_learning_an_image data/images/albert.jpg data/config_hash.json
```
producing an image every couple of training steps. Each 1000 steps should take a bit over 1 second with the default configuration on an RTX 4090.
| 10 steps | 100 steps | 1000 steps | Reference image |
|:---:|:---:|:---:|:---:|
| ![10steps](data/readme/10.jpg) | ![100steps](data/readme/100.jpg) | ![1000steps](data/readme/1000.jpg) | ![reference](data/images/albert.jpg) |
## Requirements
- An __NVIDIA GPU__; tensor cores increase performance when available. All shown results come from an RTX 3090.
- A __C++14__ capable compiler. The following choices are recommended and have been tested:
- __Windows:__ Visual Studio 2019 or 2022
- __Linux:__ GCC/G++ 8 or higher
- A recent version of __[CUDA](https://developer.nvidia.com/cuda-toolkit)__. The following choices are recommended and have been tested:
- __Windows:__ CUDA 11.5 or higher
- __Linux:__ CUDA 10.2 or higher
- __[CMake](https://cmake.org/) v3.21 or higher__.
- The fully fused MLP component of this framework requires a __very large__ amount of shared memory in its default configuration. It will likely only work on an RTX 3090, an RTX 2080 Ti, or higher-end GPUs. On lower-end cards, reduce the `n_neurons` parameter or use the `CutlassMLP` (better compatibility but slower) instead; see the sketch below.
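A network configuration along the following lines swaps in the `CutlassMLP` (a sketch; the values are illustrative and should be tuned for your use case):

```json
{
	"network": {
		"otype": "CutlassMLP",
		"activation": "ReLU",
		"output_activation": "None",
		"n_neurons": 32,
		"n_hidden_layers": 2
	}
}
```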
If you are using Linux, install the following packages:
```sh
sudo apt-get install build-essential git
```
We also recommend installing [CUDA](https://developer.nvidia.com/cuda-toolkit) in `/usr/local/` and adding the CUDA installation to your PATH.
For example, if you have CUDA 11.4, add the following to your `~/.bashrc`:
```sh
export PATH="/usr/local/cuda-11.4/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH"
```
## Compilation (Windows & Linux)
Begin by cloning this repository and all its submodules using the following command:
```sh
$ git clone --recursive https://github.com/nvlabs/tiny-cuda-nn
$ cd tiny-cuda-nn
```
Then, use CMake to build the project (on Windows, this must be done in a [developer command prompt](https://docs.microsoft.com/en-us/cpp/build/building-on-the-command-line?view=msvc-160#developer_command_prompt)):
```sh
tiny-cuda-nn$ cmake . -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo
tiny-cuda-nn$ cmake --build build --config RelWithDebInfo -j
```
If compilation fails inexplicably or takes longer than an hour, you might be running out of memory. Try running the above command without `-j` in that case.
## PyTorch extension
__tiny-cuda-nn__ comes with a [PyTorch](https://github.com/pytorch/pytorch) extension that allows using the fast MLPs and input encodings from within a [Python](https://www.python.org/) context.
These bindings can be significantly faster than full Python implementations, particularly for the [multiresolution hash encoding](https://raw.githubusercontent.com/NVlabs/tiny-cuda-nn/master/data/readme/multiresolution-hash-encoding-diagram.png).
> The overheads of Python/PyTorch can nonetheless be extensive if the batch size is small.
> For example, with a batch size of 64k, the bundled `mlp_learning_an_image` example is __~2x slower__ through PyTorch than native CUDA.
> With a batch size of 256k and higher (default), the performance is much closer.
Begin by setting up a Python 3.X environment with a recent, CUDA-enabled version of PyTorch. Then, invoke
```sh
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
```
Alternatively, if you would like to install from a local clone of __tiny-cuda-nn__, invoke
```sh
tiny-cuda-nn$ cd bindings/torch
tiny-cuda-nn/bindings/torch$ python setup.py install
```
Upon success, you can use __tiny-cuda-nn__ models as in the following example:
```py
import commentjson as json
import tinycudann as tcnn
import torch

with open("data/config_hash.json") as f:
	config = json.load(f)

# Option 1: efficient Encoding+Network combo.
model = tcnn.NetworkWithInputEncoding(
	n_input_dims, n_output_dims,
	config["encoding"], config["network"]
)

# Option 2: separate modules. Slower but more flexible.
encoding = tcnn.Encoding(n_input_dims, config["encoding"])
network = tcnn.Network(encoding.n_output_dims, n_output_dims, config["network"])
model = torch.nn.Sequential(encoding, network)
```
See `samples/mlp_learning_an_image_pytorch.py` for an example.
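Since the resulting `model` is a regular `torch.nn.Module`, it can be trained with standard PyTorch tooling. A minimal sketch, assuming a hypothetical `generate_training_batch()` that returns input and target tensors on the GPU:

```py
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(n_training_steps):
	inputs, targets = generate_training_batch()  # <-- your code

	# The network output is typically half precision; cast before computing the loss.
	output = model(inputs).float()
	loss = torch.nn.functional.mse_loss(output, targets)

	optimizer.zero_grad()
	loss.backward()
	optimizer.step()
```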
## Components
Following is a summary of the components of this framework. [The JSON documentation](DOCUMENTATION.md) lists configuration options.
| Networks | | |
| :--- | :---------- | :-----
| Fully fused MLP | `src/fully_fused_mlp.cu` | Lightning fast implementation of small multi-layer perceptrons (MLPs).
| CUTLASS MLP | `src/cutlass_mlp.cu` | MLP based on [CUTLASS](https://github.com/NVIDIA/cutlass)' GEMM routines. Slower than fully-fused, but handles larger networks and still is reasonably fast.
| Input encodings | | |
| :--- | :---------- | :-----
| Composite | `include/tiny-cuda-nn/encodings/composite.h` | Allows composing multiple encodings. Can, for example, be used to assemble the Neural Radiance Caching encoding [[Müller et al. 2021]](https://tom94.net/); see the sketch after this table.
| Frequency | `include/tiny-cuda-nn/encodings/frequency.h` | NeRF's [[Mildenhall et al. 2020]](https://www.matthewtancik.com/nerf) positional encoding applied equally to all dimensions.
| Grid | `include/tiny-cuda-nn/encodings/grid.h` | Encoding based on trainable multiresolution grids. Used for [Instant Neural Graphics Primitives [Müller et al. 2022]](https://nvlabs.github.io/instant-ngp/). The grids can be backed by hashtables, dense storage, or tiled storage.
| Identity | `include/tiny-cuda-nn/encodings/identity.h` | Leaves values untouched.
| Oneblob | `include/tiny-cuda-nn/encodings/oneblob.h` | From Neural Importance Sampling [[Müller et al. 2019]](https://tom94.net/data/publications/mueller18neural/mueller18neural-v4.pdf) and Neural Control Variates [[Müller et al. 2020]](https://tom94.net/data/publications/mueller20neural/mueller20neural.pdf).
| SphericalHarmonics | `include/tiny-cuda-nn/encodings/spherical_harmonics.h` | A frequency-space encoding that is more suitable to direction vectors than component-wise ones.
| TriangleWave | `include/tiny-cuda-nn/encodings/triangle_wave.h` | Low-cost alternative to the NeRF's encoding. Used in Neural Radiance Caching [[Müller et al. 2021]](https://tom94.net/).
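A configuration sketch for the Composite encoding, in the spirit of the Neural Radiance Caching encoding (the splits and values are illustrative; consult [the JSON documentation](DOCUMENTATION.md) for the exact options):

```json
{
	"otype": "Composite",
	"nested": [
		{"otype": "TriangleWave", "n_dims_to_encode": 3, "n_frequencies": 12},
		{"otype": "OneBlob", "n_dims_to_encode": 5, "n_bins": 4},
		{"otype": "Identity"}
	]
}
```

Each nested encoding consumes `n_dims_to_encode` input dimensions; the last entry handles the remaining ones.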
| Losses | | |
| :--- | :---------- | :-----
| L1 | `include/tiny-cuda-nn/losses/l1.h` | Standard L1 loss.
| Relative L1 | `include/tiny-cuda-nn/losses/l1.h` | Relative L1 loss normalized by the network prediction.
| MAPE | `include/tiny-cuda-nn/losses/mape.h` | Mean absolute percentage error (MAPE). The same as Relative L1, but normalized by the target.
| SMAPE | `include/tiny-cuda-nn/losses/smape.h` | Symmetric mean absolute percentage error (SMAPE). The same as Relative L1, but normalized by the mean of the prediction and the target.
| L2 | `include/tiny-cuda-nn/losses/l2.h` | Standard L2 loss.
| Relative L2 | `include/tiny-cuda-nn/losses/relative_l2.h` | Relative L2 loss normalized by the network prediction [[Lehtinen et al. 2018]](https://github.com/NVlabs/noise2noise) (see the formula after this table).
| Relative L2 Luminance | `include/tiny-cuda-nn/losses/relative_l2_luminance.h` | Same as above, but normalized by the luminance of the network prediction. Only applicable when network prediction is RGB. Used in Neural Radiance Caching [[Müller et al. 2021]](https://tom94.net/).
| Cross Entropy | `include/tiny-cuda-nn/losses/cross_entropy.h` | Standard cross entropy loss. Only applicable when the network prediction is a PDF.
| Variance | `include/tiny-cuda-nn/losses/variance_is.h` | Standard variance loss. Only applicable when the network prediction is a PDF.
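For reference, the relative losses above normalize a standard metric by a stabilized function of the prediction or target. For example, the relative L2 loss takes the form (a sketch; $\epsilon$ is a small stabilizing constant and the exact normalization follows `relative_l2.h`)

$$\mathcal{L}(\hat{y}, y) = \frac{(\hat{y} - y)^2}{\hat{y}^2 + \epsilon},$$

where $\hat{y}$ is the network prediction and $y$ the training target.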
| Optimizers | | |
| :--- | :---------- | :-----
| Adam | `include/tiny-cuda-nn/optimizers/adam.h` | Implementation of Adam [[Kingma and Ba 2014]](https://arxiv.org/abs/1412.6980), generalized to AdaBound [[Luo et al. 2019]](https://github.com/Luolc/AdaBound).
| Novograd | `include/tiny-cuda-nn/optimizers/novograd.h` | Implementation of Novograd [[Ginsburg et al. 2019]](https://arxiv.org/abs/1905.11286).
| SGD | `include/tiny-cuda-nn/optimizers/sgd.h` | Standard stochastic gradient descent (SGD).
| Shampoo | `include/tiny-cuda-nn/optimizers/shampoo.h` | Implementation of the 2nd order Shampoo optimizer [[Gupta et al. 2018]](https://arxiv.org/abs/1802.09568) with home-grown optimizations as well as those by [Anil et al. [2020]](https://arxiv.org/abs/2002.09018).
| Average | `include/tiny-cuda-nn/optimizers/average.h` | Wraps another optimizer and computes a linear average of the weights over the last N iterations. The average is used for inference only (does not feed back into training).
| Batched | `include/tiny-cuda-nn/optimizers/batched.h` | Wraps another optimizer, invoking the nested optimizer once every N steps on the averaged gradient. Has the same effect as increasing the batch size but requires only a constant amount of memory.
| Composite | `include/tiny-cuda-nn/optimizers/composite.h` | Allows using several optimizers on different parameters.
| EMA | `include/tiny-cuda-nn/optimizers/ema.h` | Wraps another optimizer and computes an exponential moving average of the weights. The average is used for inference only (does not feed back into training).
| Exponential Decay | `include/tiny-cuda-nn/optimizers/exponential_decay.h` | Wraps another optimizer and performs piecewise-constant exponential learning-rate decay (see the sketch after this table).
| Lookahead | `include/tiny-cuda-nn/optimizers/lookahead.h` | Wraps another optimizer, implementing the lookahead algorithm [[Zhang et al. 2019]](https://arxiv.org/abs/1907.08610).
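Several of the optimizers above are wrappers that take a `nested` optimizer in their configuration. A sketch that wraps Adam in piecewise-constant exponential learning-rate decay (parameter names and values are illustrative; consult [the JSON documentation](DOCUMENTATION.md) for the exact options):

```json
{
	"otype": "ExponentialDecay",
	"decay_start": 10000,
	"decay_interval": 5000,
	"decay_base": 0.33,
	"nested": {
		"otype": "Adam",
		"learning_rate": 1e-3
	}
}
```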
## License and Citation
This framework is licensed under the BSD 3-clause license. Please see `LICENSE.txt` for details.
If you use it in your research, we would appreciate a citation via
```bibtex
@software{tiny-cuda-nn,
author = {M\"uller, Thomas},
license = {BSD-3-Clause},
month = {4},
title = {{tiny-cuda-nn}},
url = {https://github.com/NVlabs/tiny-cuda-nn},
version = {1.7},
year = {2021}
}
```
For business inquiries, please visit our website and submit the form: [NVIDIA Research Licensing](https://www.nvidia.com/en-us/research/inquiries/)
## Publications & Software
Among others, this framework powers the following publications:
> __Instant Neural Graphics Primitives with a Multiresolution Hash Encoding__
> [Thomas Müller](https://tom94.net), [Alex Evans](https://research.nvidia.com/person/alex-evans), [Christoph Schied](https://research.nvidia.com/person/christoph-schied), [Alexander Keller](https://research.nvidia.com/person/alex-keller)
> _ACM Transactions on Graphics (__SIGGRAPH__), July 2022_
> __[Website](https://nvlabs.github.io/instant-ngp/) / [Paper](https://nvlabs.github.io/instant-ngp/assets/mueller2022instant.pdf) / [Code](https://github.com/NVlabs/instant-ngp) / [Video](https://nvlabs.github.io/instant-ngp/assets/mueller2022instant.mp4) / [BibTeX](https://nvlabs.github.io/instant-ngp/assets/mueller2022instant.bib)__
> __Extracting Triangular 3D Models, Materials, and Lighting From Images__
> [Jacob Munkberg](https://research.nvidia.com/person/jacob-munkberg), [Jon Hasselgren](https://research.nvidia.com/person/jon-hasselgren), [Tianchang Shen](http://www.cs.toronto.edu/~shenti11/), [Jun Gao](http://www.cs.toronto.edu/~jungao/), [Wenzheng Chen](http://www.cs.toronto.edu/~wenzheng/), [Alex Evans](https://research.nvidia.com/person/alex-evans), [Thomas Müller](https://tom94.net), [Sanja Fidler](https://www.cs.toronto.edu/~fidler/)
> __CVPR (Oral)__, June 2022
> __[Website](https://nvlabs.github.io/nvdiffrec/) / [Paper](https://nvlabs.github.io/nvdiffrec/assets/paper.pdf) / [Video](https://nvlabs.github.io/nvdiffrec/assets/video.mp4) / [BibTeX](https://nvlabs.github.io/nvdiffrec/assets/bib.txt)__
> __Real-time Neural Radiance Caching for Path Tracing__
> [Thomas Müller](https://tom94.net), [Fabrice Rousselle](https://research.nvidia.com/person/fabrice-rousselle), [Jan Novák](http://jannovak.info), [Alexander Keller](https://research.nvidia.com/person/alex-keller)
> _ACM Transactions on Graphics (__SIGGRAPH__), August 2021_
> __[Paper](https://tom94.net/data/publications/mueller21realtime/mueller21realtime.pdf) / [GTC talk](https://gtc21.event.nvidia.com/media/Fully%20Fused%20Neural%20Network%20for%20Radiance%20Caching%20in%20Real%20Time%20Rendering%20%5BE31307%5D/1_liqy6k1c) / [Video](https://tom94.net/data/publications/mueller21realtime/mueller21realtime.mp4) / [Interactive results viewer](https://tom94.net/data/publications/mueller21realtime/interactive-viewer/) / [BibTeX](https://tom94.net/data/publications/mueller21realtime/mueller21realtime.bib)__
As well as the following software:
> __NerfAcc: A General NeRF Acceleration Toolbox__
> [Ruilong Li](https://www.liruilong.cn/), [Matthew Tancik](https://www.matthewtancik.com/about-me), [Angjoo Kanazawa](https://people.eecs.berkeley.edu/~kanazawa/)
> __https://github.com/KAIR-BAIR/nerfacc__
> __Nerfstudio: A Framework for Neural Radiance Field Development__
> [Matthew Tancik*](https://www.matthewtancik.com/about-me), [Ethan Weber*](https://ethanweber.me/), [Evonne Ng*](http://people.eecs.berkeley.edu/~evonne_ng/), [Ruilong Li](https://www.liruilong.cn/), Brent Yi, Terrance Wang, Alexander Kristoffersen, Jake Austin, Kamyar Salahi, Abhik Ahuja, David McAllister, [Angjoo Kanazawa](https://people.eecs.berkeley.edu/~kanazawa/)
> __https://github.com/nerfstudio-project/nerfstudio__
Please feel free to make a pull request if your publication or software is not listed.
## Acknowledgments
Special thanks go to the NRC authors for helpful discussions and to [Nikolaus Binder](https://research.nvidia.com/person/nikolaus-binder) for providing part of the infrastructure of this framework, as well as for help with utilizing TensorCores from within CUDA.