<!--
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
-->

# Apache Auron (Incubating)

[![TPC-DS](https://github.com/apache/auron/actions/workflows/tpcds.yml/badge.svg?branch=master)](https://github.com/apache/auron/actions/workflows/tpcds.yml) [![master-ce7-builds](https://github.com/apache/auron/actions/workflows/build-ce7-releases.yml/badge.svg?branch=master)](https://github.com/apache/auron/actions/workflows/build-ce7-releases.yml)

<p align="center"><img src="./dev/auron-logo.png" /></p>

The Auron accelerator for big data engines (e.g., Spark, Flink) leverages native vectorized execution to accelerate query processing. It combines the power of the [Apache DataFusion](https://arrow.apache.org/datafusion/) library with the scale of distributed computing frameworks. Auron takes a fully optimized physical plan from the distributed computing framework, maps it into DataFusion's execution plan, and performs native plan computation.

The key capabilities of Auron include:

- **Native execution**: Implemented in Rust, eliminating JVM overhead and enabling predictable performance.
- **Vectorized computation**: Built on Apache Arrow's columnar format, fully leveraging SIMD instructions for batch processing.
- **Pluggable architecture**: Seamlessly integrates with Apache Spark while designed for future extensibility to other engines.
- **Production-hardened optimizations**: Multi-level memory management, compacted shuffle formats, and adaptive execution strategies developed through large-scale deployment.

Thanks to DataFusion's well-defined extensibility, Auron can be easily extended to support:

- Various object stores.
- Operators.
- Simple and aggregate functions.
- File formats.

We encourage you to extend [DataFusion](https://github.com/apache/arrow-datafusion)'s capabilities directly and add the corresponding support in Auron with simple modifications to plan-serde and extension translation.

## Build from source

To build Auron from source, follow the steps below:

1. Install Rust

   Auron's native execution lib is written in Rust. You need to install Rust (nightly) before compiling. We recommend using [rustup](https://rustup.rs/) for installation.

2. Install JDK

   Auron has been well tested with JDK 8, 11, and 17. Make sure `JAVA_HOME` is properly set and points to your desired version.

3. Check out the source code.

4. Build the project.

   You can build Auron either *locally* or *inside Docker* using one of the supported OS images via the unified script: `auron-build.sh`. Run `./auron-build.sh --help` to see all available options. A minimal end-to-end sketch follows this list.
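Putting the steps together, a local build might look like the sketch below. This is only illustrative: it assumes `rustup` manages the toolchain, the `JAVA_HOME` path is a placeholder for your own installation, and the concrete build flags depend on your environment (see `./auron-build.sh --help`).

```shell
# Clone the sources
git clone https://github.com/apache/auron.git
cd auron

# Install the nightly Rust toolchain (Auron's native library requires nightly)
rustup toolchain install nightly

# Point JAVA_HOME at a tested JDK (8, 11, or 17); the path below is illustrative
export JAVA_HOME=/usr/lib/jvm/java-17

# Inspect the available build options, then run the build with the ones you need
./auron-build.sh --help
```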
After the build completes, a fat JAR with all dependencies will be generated in either the `target/` directory (for local builds) or the `target-docker/` directory (for Docker builds), depending on the selected build mode.

## Run Spark Job with Auron Accelerator

This section describes how to submit and configure a Spark job with Auron support.

1. Move the Auron JAR to the Spark client classpath (normally `spark-xx.xx.xx/jars/`).

2. Add the following configs to the Spark configuration in `spark-xx.xx.xx/conf/spark-defaults.conf`:

   ```properties
   spark.auron.enable true
   spark.sql.extensions org.apache.spark.sql.auron.AuronSparkSessionExtension
   spark.shuffle.manager org.apache.spark.sql.execution.auron.shuffle.AuronShuffleManager
   spark.memory.offHeap.enabled false

   # suggested executor memory configuration
   spark.executor.memory 4g
   spark.executor.memoryOverhead 4096
   ```

3. Submit a query with spark-sql, or other tools like spark-thriftserver (a per-job configuration alternative is sketched after this list):

   ```shell
   spark-sql -f tpcds/q01.sql
   ```
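If you prefer not to edit `spark-defaults.conf`, the same settings can also be passed per job through standard Spark `--conf` flags. The sketch below assumes the Auron JAR is already on the classpath as in step 1; the query file path is the illustrative one from above.

```shell
# Same configuration as spark-defaults.conf, supplied per job
spark-sql \
  --conf spark.auron.enable=true \
  --conf spark.sql.extensions=org.apache.spark.sql.auron.AuronSparkSessionExtension \
  --conf spark.shuffle.manager=org.apache.spark.sql.execution.auron.shuffle.AuronShuffleManager \
  --conf spark.memory.offHeap.enabled=false \
  --conf spark.executor.memory=4g \
  --conf spark.executor.memoryOverhead=4096 \
  -f tpcds/q01.sql
```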
## Performance

TPC-DS 1TB benchmark results:

![tpcds-benchmark-echarts.png](./benchmark-results/tpcds-benchmark-echarts.png)

For methodology and additional results, please refer to the [benchmark documentation](https://auron.apache.org/documents/benchmarks.html). We also encourage you to benchmark Auron and share the results with us. 🤗

## Community

### Subscribe Mailing Lists

Mailing lists are the most recognized form of communication in the Apache community. Contact us through the following mailing list.

| Name | Scope | Subscribe | Unsubscribe |
|:-----|:------|:----------|:------------|
| [[email protected]](mailto:[email protected]) | Development-related discussions | [Subscribe](mailto:[email protected]) | [Unsubscribe](mailto:[email protected]) |

### Report Issues or Submit Pull Request

If you run into any issues, reach out to us or help fix them by submitting a 🔗[Pull Request](https://github.com/apache/auron/pulls).

## License

Auron is licensed under the Apache 2.0 License. A copy of the license [can be found here](LICENSE).