MLX Data: efficient framework-agnostic data loading
=========
MLX Data is a framework-agnostic data loading library brought to you by Apple
machine learning research. It works with PyTorch, JAX or
[MLX](https://ml-explore.github.io/mlx/).
The goal of the project is to be efficient but also flexible: it can, for
instance, load and process thousands of images per second while still
running arbitrary Python transformations on the resulting batches.
It can be used from Python, as shown in the following example, or from C++
with a very similar, intuitive API.
For more details see the [documentation](https://ml-explore.github.io/mlx-data/).
Example
=======
The following pipeline is taken from the `Caltech 101` benchmark found in
`benchmarks/comparative/caltech101/mlx_data.py`.
```python
from pathlib import Path

import mlx.data as dx

# `root` (a Path to the dataset directory) and `batch_size` are assumed to be
# defined by the caller.


# A simple Python function returning a list of dicts. All samples in MLX Data
# are dicts of arrays.
def files_and_classes(root: Path):
    files = [str(f) for f in root.glob("**/*.jpg")]
    files = [f for f in files if "BACKGROUND" not in f]
    # Map each sorted class (directory) name to an integer label
    classes = dict(
        map(reversed, enumerate(sorted(set(f.split("/")[-2] for f in files))))
    )
    return [
        dict(image=f.encode("ascii"), label=classes[f.split("/")[-2]]) for f in files
    ]


dset = (
    # Make a buffer (finite length container of samples) from the Python list
    dx.buffer_from_vector(files_and_classes(root))
    # Shuffle and transform to a stream
    .shuffle()
    .to_stream()
    # Implement a simple image pipeline. No random augmentations here but they
    # could be applied.
    .load_image("image")  # load the file pointed to by the 'image' key as an image
    .image_resize_smallest_side("image", 256)
    .image_center_crop("image", 224, 224)
    # Accumulate into batches
    .batch(batch_size)
    # Cast to float32 and scale to [0, 1]. We do this in Python and we could
    # have done any transformation we could think of.
    .key_transform("image", lambda x: x.astype("float32") / 255)
    # Finally, fetch batches in background threads
    .prefetch(prefetch_size=8, num_threads=8)
)

# dset is a Python iterable, so one can simply loop over it
for sample in dset:
    # access sample["image"] and sample["label"]
    pass
```
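To make the sample format concrete, here is a minimal sketch of what a function like `files_and_classes` returns, run on a throwaway directory tree. The directory and file names are hypothetical stand-ins for a Caltech 101 layout, and this variant differs slightly from the benchmark code: it sorts the file list and uses `Path(...).parent.name` instead of splitting on `/`, so that the output is deterministic and portable. It uses only the standard library and does not require MLX Data.

```python
import tempfile
from pathlib import Path


def files_and_classes(root: Path):
    # Collect image paths, skipping the background class as in the pipeline above.
    files = sorted(str(f) for f in root.glob("**/*.jpg"))
    files = [f for f in files if "BACKGROUND" not in f]
    # Each sorted class (parent directory) name gets a consecutive integer label.
    names = sorted({Path(f).parent.name for f in files})
    classes = {name: index for index, name in enumerate(names)}
    # One dict per sample: the path as ASCII bytes plus the integer label.
    return [
        dict(image=f.encode("ascii"), label=classes[Path(f).parent.name])
        for f in files
    ]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical file layout: <class name>/<image file>
    for rel in (
        "airplanes/image_0001.jpg",
        "airplanes/image_0002.jpg",
        "BACKGROUND_Google/image_0001.jpg",
        "camera/image_0001.jpg",
    ):
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    samples = files_and_classes(root)

# Pair each sample's class directory with the label it was assigned.
labels = [
    (Path(s["image"].decode("ascii")).parent.name, s["label"]) for s in samples
]
print(labels)
```

Each entry is a plain dict with an `image` key holding the encoded file path and a `label` key holding an integer, which is the shape of input the pipeline above passes to `dx.buffer_from_vector`.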
## Contributing
Check out the [contribution guidelines](CONTRIBUTING.md) for more
information on contributing to MLX Data. See the
[docs](https://ml-explore.github.io/mlx-data/build/html/index.html) for
more information on building from source and running tests.
We are grateful for all [our
contributors](ACKNOWLEDGMENTS.md#Individual-Contributors). Special thanks
to [David Koski](https://github.com/davidkoski) and [Tatiana
Likhomanenko](https://github.com/tlikhomanenko/tlikhomanenko) for their
[contributions](ACKNOWLEDGMENTS.md#Individual-Contributors) to MLX Data
before open-source. If you contribute to MLX Data and wish to be
acknowledged, please add your name to the list in your pull request.
## Citing MLX
The MLX software suite was initially developed with equal contribution by
Awni Hannun, Jagrit Digani, Angelos Katharopoulos, and Ronan Collobert. If
you find MLX useful in your research and wish to cite it, please use the
following BibTeX entry:
```
@software{mlx2023,
author = {Awni Hannun and Jagrit Digani and Angelos Katharopoulos and Ronan Collobert},
title = {{MLX}: Efficient and flexible machine learning on Apple silicon},
url = {https://github.com/ml-explore},
version = {0.0},
year = {2023},
}
```