# DeTi*k*Zify<br><sub><sup>Synthesizing Graphics Programs for Scientific Figures and Sketches with Ti*k*Z</sup></sub>
[![OpenReview](https://img.shields.io/badge/View%20on%20OpenReview-8C1B13?labelColor=gray&logo=)](https://openreview.net/forum?id=bcVLFQCOjc)
[![arXiv](https://img.shields.io/badge/View%20on%20arXiv-B31B1B?logo=arxiv&labelColor=gray)](https://arxiv.org/abs/2405.15306)
[![Hugging Face](https://img.shields.io/badge/View%20on%20Hugging%20Face-blue?labelColor=gray&logo=)](https://huggingface.co/collections/nllg/detikzify-664460c521aa7c2880095a8b)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1hPWqucbPGTavNlYvOBvSNBAwdcPZKe8F)

Creating high-quality scientific figures can be time-consuming and challenging,
even though sketching ideas on paper is relatively easy. Furthermore,
recreating existing figures that are not stored in formats preserving semantic
information is equally complex. To tackle this problem, we introduce
[DeTi*k*Zify](https://github.com/potamides/DeTikZify), a novel multimodal
language model that automatically synthesizes scientific figures as
semantics-preserving [Ti*k*Z](https://github.com/pgf-tikz/pgf) graphics
programs based on sketches and existing figures. We also introduce an
MCTS-based inference algorithm that enables DeTi*k*Zify to iteratively refine
its outputs without the need for additional training.

https://github.com/potamides/DeTikZify/assets/53401822/203d2853-0b5c-4a2b-9d09-3ccb65880cd3

## News
* **2025-03-17**: We release
  [Ti*k*Zero](https://huggingface.co/nllg/tikzero-adapter) adapters which plug
  directly into
  [DeTi*k*Zify<sub>v2</sub> (8b)](https://huggingface.co/nllg/detikzify-v2-8b)
  and enable zero-shot text-conditioning, and
  [Ti*k*Zero+](https://huggingface.co/nllg/tikzero-plus-10b) with additional
  end-to-end fine-tuning.
  For more information, see our [paper](https://arxiv.org/abs/2503.11509) and
  usage examples [below](#usage).
* **2024-12-05**: We release
  [DeTi*k*Zify<sub>v2</sub> (8b)](https://huggingface.co/nllg/detikzify-v2-8b),
  our latest model, which surpasses all previous versions in our evaluation,
  and make it the new default model in our
  [Hugging Face Space](https://huggingface.co/spaces/nllg/DeTikZify). Check out
  the [model card](https://huggingface.co/nllg/detikzify-v2-8b-preview#model-card-for-detikzifyv2-8b)
  for more information.
* **2024-09-24**: DeTi*k*Zify was accepted at
  [NeurIPS 2024](https://neurips.cc/Conferences/2024) as a
  [spotlight paper](https://neurips.cc/virtual/2024/poster/94474)!

## Installation

> [!TIP]
> If you encounter difficulties with installation or inference on your own
> hardware, consider visiting our [Hugging Face
> Space](https://huggingface.co/spaces/nllg/DeTikZify) (please note that
> restarting the space can take up to 30 minutes). Should you experience long
> queues, you have the option to
> [duplicate](https://huggingface.co/spaces/nllg/DeTikZify?duplicate=true) it
> with a paid private GPU runtime or [run it
> locally](https://huggingface.co/spaces/nllg/DeTikZify?docker=true) with
> Docker. Additionally, you can try our demo on [Google
> Colab](https://colab.research.google.com/drive/1hPWqucbPGTavNlYvOBvSNBAwdcPZKe8F).
> However, setting up the environment there might take some time, and the free
> tier only supports inference for the 1b models.

The Python package of DeTi*k*Zify can be easily installed using
[pip](https://pip.pypa.io/en/stable):
```sh
pip install 'detikzify[legacy] @ git+https://github.com/potamides/DeTikZify'
```
The `[legacy]` extra is only required if you plan to use the
DeTi*k*Zify<sub>v1</sub> models. If you only plan to use
DeTi*k*Zify<sub>v2</sub>, you can remove it.

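After installing, a quick sanity check can help confirm that the package is importable and that the external programs DeTi*k*Zify depends on (TeX Live, ghostscript, and poppler) are on the `PATH`. The following is a minimal sketch and not part of DeTi*k*Zify: the function name `check_environment` is hypothetical, and the executable names (`pdflatex`, `gs`, `pdftoppm`) are assumptions about binaries typically shipped with those distributions.

```python
import importlib.util
import shutil

# Assumed executable names: typical binaries shipped with TeX Live,
# ghostscript, and poppler. DeTikZify itself may look for different ones.
EXTERNAL_TOOLS = {
    "TeX Live": "pdflatex",
    "ghostscript": "gs",
    "poppler": "pdftoppm",
}


def check_environment():
    """Return a mapping from requirement name to whether it was found."""
    # find_spec returns None when a top-level package is not installed
    status = {"detikzify": importlib.util.find_spec("detikzify") is not None}
    for name, executable in EXTERNAL_TOOLS.items():
        # shutil.which returns None when the executable is not on PATH
        status[name] = shutil.which(executable) is not None
    return status


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A `MISSING` entry only means the assumed executable was not found under that name; it does not by itself prove that the corresponding dependency is absent.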
If your goal is to run the included [examples](examples), it is easier to clone
the repository and install it in editable mode like this:
```sh
git clone https://github.com/potamides/DeTikZify
pip install -e 'DeTikZify[examples]'
```
In addition, DeTi*k*Zify requires a full
[TeX Live 2023](https://www.tug.org/texlive) installation,
[ghostscript](https://www.ghostscript.com), and
[poppler](https://poppler.freedesktop.org), which you have to install through
your package manager or via other means.

## Usage

> [!TIP]
> For interactive use and general [usage tips](detikzify/webui#usage-tips),
> we recommend checking out our [web UI](detikzify/webui), which can be started
> directly from the command line (use `--help` for a list of all options):
> ```sh
> python -m detikzify.webui --light
> ```

If all required dependencies are installed, the full range of DeTi*k*Zify
features such as compiling, rendering, and saving Ti*k*Z graphics, and
MCTS-based inference can be accessed through its programming interface:

<details open><summary>DeTi<i>k</i>Zify Example</summary>

```python
from operator import itemgetter

from detikzify.model import load
from detikzify.infer import DetikzifyPipeline

image = "https://w.wiki/A7Cc"
pipeline = DetikzifyPipeline(*load(
    model_name_or_path="nllg/detikzify-v2-8b",
    device_map="auto",
    torch_dtype="bfloat16",
))

# generate a single TikZ program
fig = pipeline.sample(image=image)

# if it compiles, rasterize it and show it
if fig.is_rasterizable:
    fig.rasterize().show()

# run MCTS for 10 minutes and generate multiple TikZ programs
figs = set()
for score, fig in pipeline.simulate(image=image, timeout=600):
    figs.add((score, fig))

# save the best TikZ program
best = sorted(figs, key=itemgetter(0))[-1][1]
best.save("fig.tex")
```

</details>

Through [Ti*k*Zero](https://huggingface.co/nllg/tikzero-adapter) adapters and
[Ti*k*Zero+](https://huggingface.co/nllg/tikzero-plus-10b) it is also possible
to synthesize graphics programs conditioned on text (cf.
our [paper](https://arxiv.org/abs/2503.11509) for details). Note that this is
currently only supported through the programming interface:

<details open><summary>Ti<i>k</i>Zero+ Example</summary>

```python
from detikzify.model import load
from detikzify.infer import DetikzifyPipeline

caption = "A multi-layer perceptron with two hidden layers."
pipeline = DetikzifyPipeline(*load(
    model_name_or_path="nllg/tikzero-plus-10b",
    device_map="auto",
    torch_dtype="bfloat16",
))

# generate a single TikZ program
fig = pipeline.sample(text=caption)

# if it compiles, rasterize it and show it
if fig.is_rasterizable:
    fig.rasterize().show()
```

</details>

<details><summary>Ti<i>k</i>Zero Example</summary>

```python
from detikzify.model import load, load_adapter
from detikzify.infer import DetikzifyPipeline

caption = "A multi-layer perceptron with two hidden layers."
pipeline = DetikzifyPipeline(
    *load_adapter(
        *load(
            model_name_or_path="nllg/detikzify-v2-8b",
            device_map="auto",
            torch_dtype="bfloat16",
        ),
        adapter_name_or_path="nllg/tikzero-adapter",
    )
)

# generate a single TikZ program
fig = pipeline.sample(text=caption)

# if it compiles, rasterize it and show it
if fig.is_rasterizable:
    fig.rasterize().show()
```

</details>

More involved examples, such as those for evaluation and training, can be found
in the [examples](examples) folder.

## Model Weights & Datasets

We upload all our DeTi*k*Zify models and datasets to the [Hugging Face
Hub](https://huggingface.co/collections/nllg/detikzify-664460c521aa7c2880095a8b)
(Ti*k*Zero models are available
[here](https://huggingface.co/collections/nllg/tikzero-67d1952fab69f5bd172de1fe)).
However, please note that for the public release of the
[DaTi*k*Z<sub>v2</sub>](https://huggingface.co/datasets/nllg/datikz-v2) and
[DaTi*k*Z<sub>v3</sub>](https://huggingface.co/datasets/nllg/datikz-v3)
datasets, we had to remove a considerable portion of Ti*k*Z drawings
originating from [arXiv](https://arxiv.org), as the [arXiv non-exclusive
license](https://arxiv.org/licenses/nonexclusive-distrib/1.0/license.html) does
not permit redistribution. We do, however, release our [dataset creation
scripts](https://github.com/potamides/DaTikZ) and encourage anyone to recreate
the full version of DaTi*k*Z themselves.

## Citation

If DeTi*k*Zify and Ti*k*Zero have been beneficial for your research or
applications, we kindly request you to acknowledge this by citing them as
follows:

```bibtex
@inproceedings{belouadi2024detikzify,
    title={{DeTikZify}: Synthesizing Graphics Programs for Scientific Figures and Sketches with {TikZ}},
    author={Jonas Belouadi and Simone Paolo Ponzetto and Steffen Eger},
    booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
    year={2024},
    url={https://openreview.net/forum?id=bcVLFQCOjc}
}
```

```bibtex
@misc{belouadi2025tikzero,
    title={{TikZero}: Zero-Shot Text-Guided Graphics Program Synthesis},
    author={Jonas Belouadi and Eddy Ilg and Margret Keuper and Hideki Tanaka and Masao Utiyama and Raj Dabre and Steffen Eger and Simone Paolo Ponzetto},
    year={2025},
    eprint={2503.11509},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2503.11509},
}
```

## Acknowledgments

The implementation of the DeTi*k*Zify model architecture is based on
[LLaVA](https://github.com/haotian-liu/LLaVA) and
[AutomaTikZ](https://github.com/potamides/AutomaTikZ) (v1), and [Idefics
3](https://huggingface.co/HuggingFaceM4/Idefics3-8B-Llama3) (v2). Our MCTS
implementation is based on
[VerMCTS](https://github.com/namin/llm-verified-with-monte-carlo-tree-search).
The Ti*k*Zero architecture draws inspiration from
[Flamingo](https://deepmind.google/discover/blog/tackling-multiple-tasks-with-a-single-visual-language-model/)
and [LLaMA
3.2-Vision](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices).