<img src="./docs/assets/images/uc-logo.png" width="600px" />

# Unity Catalog: Open, Multimodal Catalog for Data & AI

Unity Catalog is the industry’s only universal catalog for data and AI.

- **Multimodal interface supports any format, engine, and asset**
  - Multi-format support: It is extensible and supports Delta Lake, Apache Iceberg and Apache Hudi via UniForm, Apache Parquet, JSON, CSV, and many others.
  - Multi-engine support: With its open APIs, data cataloged in Unity can be read by many leading compute engines.
  - Multimodal: It supports all your data and AI assets, including tables, files, functions, and AI models.
- **Open source API and implementation**
  - OpenAPI spec and OSS implementation (Apache 2.0 license). It is also compatible with Apache Hive's metastore API and Apache Iceberg's REST catalog API. Unity Catalog is currently a sandbox project with LF AI and Data Foundation (part of the Linux Foundation).
- **Unified governance** for data and AI
  - Govern and secure tabular data, unstructured assets, and AI assets with a single interface.

The current roadmap is available at [Unity Catalog Roadmap](roadmap.md).

![UC Hero Image](./docs/assets/images/uc.png)

### Vibrant ecosystem

This is a community effort.
Unity Catalog is supported by

- [Amazon Web Services](https://aws.amazon.com/)
- [Confluent](https://www.confluent.io/)
- [Daft (Eventual)](https://github.com/Eventual-Inc/Daft)
- [dbt Labs](https://www.getdbt.com/)
- [DuckDB](https://duckdblabs.com/)
- [Fivetran](https://www.fivetran.com/)
- [Google Cloud](https://cloud.google.com/)
- [Granica](https://granica.ai/)
- [Immuta](https://www.immuta.com/)
- [Informatica](https://www.informatica.com/)
- [Kuzu](https://www.kuzudb.com/)
- [LanceDB](https://lancedb.com/)
- [LangChain](https://www.langchain.com/)
- [LlamaIndex](https://www.llamaindex.ai/)
- [Microsoft Azure](https://azure.microsoft.com)
- [NVIDIA](https://www.nvidia.com/)
- [Onehouse](https://www.onehouse.ai/)
- [PuppyGraph](https://www.puppygraph.com/)
- [Salesforce](https://www.salesforce.com/)
- [StarRocks (CelerData)](https://celerdata.com/)
- [Spice AI](https://github.com/spiceai/spiceai)
- [Tecton](https://www.tecton.ai/)
- [Unstructured](https://unstructured.io/)

Unity Catalog is proud to be hosted by the LF AI & Data Foundation.

<a href="https://lfaidata.foundation/projects">
  <img src="./docs/assets/images/lfaidata-project-badge-sandbox-color.png" width="200px" />
</a>

## Quickstart - Hello UC!

Let's take Unity Catalog for a spin. In this guide, we are going to do the following:

- In one terminal, run the UC server.
- In another terminal, we will explore the contents of the UC server using a CLI.
  An example project is provided to demonstrate how to use the UC SDK for various assets
  as well as provide a convenient way to explore the content of any UC server implementation.

> If you prefer to run Unity Catalog in Docker use `docker compose up`. See the [Docker Compose docs](./docs/docker_compose.md) for more details.

### Prerequisites

You have to ensure that your local environment has the following:

- Clone this repository.
- Ensure the `JAVA_HOME` environment variable in your terminal is configured to point to JDK17.
- Compile the project using `build/sbt package`

### Run the UC Server

In a terminal, in the cloned repository root directory, start the UC server.

```sh
bin/start-uc-server
```

For the remaining steps, continue in a different terminal.

### Operate on Delta tables with the CLI

Let's list the tables.

```sh
bin/uc table list --catalog unity --schema default
```

You should see a few tables. Some details are truncated because of the nested nature of the data.
To see all the content, you can add `--output jsonPretty` to any command.

Next, let's get the metadata of one of those tables.

```sh
bin/uc table get --full_name unity.default.numbers
```

You can see that it is a Delta table. Now, specifically for Delta tables, this CLI can
print a snippet of the contents of a Delta table (powered by the [Delta Kernel Java](https://delta.io/blog/delta-kernel/) project).
Let's try that.

```sh
bin/uc table read --full_name unity.default.numbers
```

### Operate on Delta tables with DuckDB

For operating on tables with DuckDB, you will have to [install it](https://duckdb.org/docs/installation/) (version 1.0).
Let's start DuckDB and install a couple of extensions. To start DuckDB, run the command `duckdb` in the terminal.
Then, in the DuckDB shell, run the following commands:

```sql
install uc_catalog from core_nightly;
load uc_catalog;
install delta;
load delta;
```

If you have installed these extensions before, you may have to run `update extensions` and restart DuckDB
for the following steps to work.

Now that we have DuckDB all set up, let's try connecting to UC by specifying a secret.

```sql
CREATE SECRET (
    TYPE UC,
    TOKEN 'not-used',
    ENDPOINT 'http://127.0.0.1:8080',
    AWS_REGION 'us-east-2'
);
```

You should see it print a short table saying `Success` = `true`. Then we attach the `unity` catalog to DuckDB.

```sql
ATTACH 'unity' AS unity (TYPE UC_CATALOG);
```

Now we are ready to query.
Try the following:

```sql
SHOW ALL TABLES;
SELECT * from unity.default.numbers;
```

You should see the tables listed and the contents of the `numbers` table printed.
To quit DuckDB, press `Ctrl`+`D` (if your platform supports it), press `Ctrl`+`C`, or use the `.exit` command in the DuckDB shell.

### Interact with the Unity Catalog UI

![UC UI](./docs/assets/images/uc-ui.png)

To use the Unity Catalog UI, start a new terminal and ensure you have already started the UC server (e.g., `./bin/start-uc-server`).

**Prerequisites**

- Node: https://nodejs.org/en/download/package-manager
- Yarn: https://classic.yarnpkg.com/lang/en/docs/install

**How to start the UI through yarn**

```
cd ui
yarn install
yarn start
```

## CLI tutorial

You can interact with a Unity Catalog server to create and manage catalogs, schemas and tables,
operate on volumes and functions from the CLI, and much more.
See the [cli usage](docs/usage/cli.md) for more details.

## APIs and Compatibility

- Open API specification: See the [Unity Catalog REST API](https://docs.unitycatalog.io/swagger-docs/).
- Compatibility and stability: The APIs are currently evolving and should not be assumed to be stable.

## Building Unity Catalog

Unity Catalog is built using [sbt](https://www.scala-sbt.org/).

To build UC (incl. [Spark Integration](./connectors/spark) module), run the following command:

```sh
build/sbt clean package publishLocal
```

Refer to [sbt docs](https://www.scala-sbt.org/1.x/docs/) for more commands.

## Deployment

- To create a tarball that can be used to deploy the UC server or run the CLI, run the following:
  ```sh
  build/sbt createTarball
  ```
  This will create a tarball in the `target` directory. See the full [deployment guide](docs/deployment.md) for more details.

## Compiling and testing

- Install JDK 17 by whatever mechanism is appropriate for your system, and set that version to be the default Java version (e.g.
via the env variable `JAVA_HOME`)
- To compile all the code without running tests, run the following:
  ```sh
  build/sbt clean compile
  ```
- To compile and execute tests, run the following:
  ```sh
  build/sbt -J-Xmx2G clean test
  ```
- To execute tests with coverage, run the following:
  ```sh
  build/sbt -J-Xmx2G jacoco
  ```
- To update the API specification, just update the `api/all.yaml` and then run the following:
  ```sh
  build/sbt generate
  ```
  This will regenerate the OpenAPI data models in the UC server and data models + APIs in the client SDK.
- To format the code, run the following:
  ```sh
  build/sbt javafmtAll
  ```

## Setting up IDE

IntelliJ is the recommended IDE to use when developing Unity Catalog. The below steps outline how to add the project to IntelliJ:

1. Clone Unity Catalog into a local folder, such as `~/unitycatalog`.
2. Select `File` > `New Project` > `Project from Existing Sources...` and select `~/unitycatalog`.
3. Under `Import project from external model` select `sbt`. Click `Next`.
4. Click `Finish`.

Java code adheres to the [Google style](https://google.github.io/styleguide/javaguide.html), which is verified via `build/sbt javafmtCheckAll` during builds.
In order to automatically fix Java code style issues, please use `build/sbt javafmtAll`.

### Configuring Code Formatter for Eclipse/IntelliJ

Follow the instructions for [Eclipse](https://github.com/google/google-java-format#eclipse) or
[IntelliJ](https://github.com/google/google-java-format#intellij-android-studio-and-other-jetbrains-ides) to install the **google-java-format** plugin (note the required manual actions for IntelliJ).

### Using more recent JDKs

The build script [checks for a lower bound on the JDK](./build.sbt#L14) but the [current SBT version](./project/build.properties)
imposes an upper bound.
Please check the [JDK compatibility](https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html) documentation for more information.

### Serving the documentation with mkdocs

For an overview of how to contribute to the documentation, please see our introduction [here](./docs/README.md).
For the official documentation, please take a look at [https://docs.unitycatalog.io/](https://docs.unitycatalog.io/).