Based on Efficient framework-agnostic data loading

MLX Data
=========

MLX Data is a framework-agnostic data loading library brought to you by Apple machine learning research. It works with PyTorch, JAX, or [MLX](https://ml-explore.github.io/mlx/).

The goal of the project is to be efficient but also flexible, enabling, for instance, the loading and processing of 1,000s of images per second but also running arbitrary Python transformations on the resulting batches.

It can be used from Python, as shown in the following example, or from C++ with a very similar, intuitive API.

For more details see the [documentation](https://ml-explore.github.io/mlx-data/).

Example
=======

The following pipeline is taken from the `Caltech 101` benchmark found in `benchmarks/comparative/caltech101/mlx_data.py`.

```python
from pathlib import Path

import mlx.data as dx

# `root` (a Path to the dataset directory) and `batch_size` are assumed to be
# defined by the caller.


# A simple python function returning a list of dicts. All samples in MLX data
# are dicts of arrays.
def files_and_classes(root: Path):
    files = [str(f) for f in root.glob("**/*.jpg")]
    files = [f for f in files if "BACKGROUND" not in f]
    classes = dict(
        map(reversed, enumerate(sorted(set(f.split("/")[-2] for f in files))))
    )
    return [
        dict(image=f.encode("ascii"), label=classes[f.split("/")[-2]]) for f in files
    ]


dset = (
    # Make a buffer (finite length container of samples) from the python list
    dx.buffer_from_vector(files_and_classes(root))
    # Shuffle and transform to a stream
    .shuffle()
    .to_stream()
    # Implement a simple image pipeline. No random augmentations here but they
    # could be applied.
    .load_image("image")  # load the file pointed to by the 'image' key as an image
    .image_resize_smallest_side("image", 256)
    .image_center_crop("image", 224, 224)
    # Accumulate into batches
    .batch(batch_size)
    # Cast to float32 and scale to [0, 1]. We do this in python and we could
    # have done any transformation we could think of.
    .key_transform("image", lambda x: x.astype("float32") / 255)
    # Finally, fetch batches in background threads
    .prefetch(prefetch_size=8, num_threads=8)
)

# dset is a python iterable so one could simply
for sample in dset:
    # access sample["image"] and sample["label"]
    pass
```

## Contributing

Check out the [contribution guidelines](CONTRIBUTING.md) for more information on contributing to MLX Data. See the [docs](https://ml-explore.github.io/mlx-data/build/html/index.html) for more information on building from source and running tests.

We are grateful for all [our contributors](ACKNOWLEDGMENTS.md#Individual-Contributors). Special thanks to [David Koski](https://github.com/davidkoski) and [Tatiana Likhomanenko](https://github.com/tlikhomanenko/tlikhomanenko) for their [contributions](ACKNOWLEDGMENTS.md#Individual-Contributors) to MLX Data before it was open-sourced. If you contribute to MLX Data and wish to be acknowledged, please add your name to the list in your pull request.

## Citing MLX

The MLX software suite was initially developed with equal contribution by Awni Hannun, Jagrit Digani, Angelos Katharopoulos, and Ronan Collobert.
If you find MLX useful in your research and wish to cite it, please use the following BibTeX entry:

```
@software{mlx2023,
  author = {Awni Hannun and Jagrit Digani and Angelos Katharopoulos and Ronan Collobert},
  title = {{MLX}: Efficient and flexible machine learning on Apple silicon},
  url = {https://github.com/ml-explore},
  version = {0.0},
  year = {2023},
}
```

", Assign "at most 3 tags" to the expected json: {"id":"5741","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine 
"},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"