<p align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="logo-dark.svg">
    <img width="800" src="logo-light.svg" alt="pg_duckdb logo" />
  </picture>
</p>
<p align="center">
  <strong>PostgreSQL extension for DuckDB</strong><br>
  DuckDB-powered Postgres for high-performance apps & analytics
</p>
<p align="center">
  <a href="https://hub.docker.com/r/pgduckdb/pgduckdb"><img src="https://img.shields.io/docker/pulls/pgduckdb/pgduckdb?style=flat-square&logo=docker" alt="Docker Pulls"></a>
  <a href="https://github.com/duckdb/pg_duckdb/releases"><img src="https://img.shields.io/github/v/release/duckdb/pg_duckdb?style=flat-square&logo=github" alt="GitHub Release"></a>
  <a href="https://github.com/duckdb/pg_duckdb/blob/main/LICENSE"><img src="https://img.shields.io/github/license/duckdb/pg_duckdb?style=flat-square" alt="License"></a>
</p>

---

# pg_duckdb: Official PostgreSQL Extension for DuckDB

**pg_duckdb** integrates DuckDB's columnar-vectorized analytics engine into PostgreSQL, enabling high-performance analytics and data-intensive applications. Built in collaboration with [Hydra][hydra] and [MotherDuck][motherduck].

## Key Features

- **Execute analytics queries without changes**: Run your existing SQL analytics queries as you normally would; with `duckdb.force_execution = true`, `pg_duckdb` automatically executes them with DuckDB's SQL engine.
- **Read/write data from data lakes**: Read and write Parquet, CSV, JSON, Iceberg, and Delta Lake from S3, GCS, Azure, and R2.
- **Integration with cloud analytics**: Out-of-the-box support for [MotherDuck](https://motherduck.com/) as a compute provider.

## How `pg_duckdb` works

`pg_duckdb` accelerates your existing analytical queries.

- **No special syntax needed:** For queries on your regular PostgreSQL tables, you don't need to change your SQL. Just run your `SELECT` statements as you normally would, and `pg_duckdb` will use DuckDB's engine to execute them.
- **No data export required:** You do not need to export your data to Parquet or any other format. `pg_duckdb` works directly with your existing PostgreSQL tables.

## See it in action

### Querying your existing PostgreSQL data

This is the most common and straightforward use case. If you have a standard PostgreSQL table, you can query it using standard SQL.

**Example:** Let's say you have a PostgreSQL table named `orders` (to create it, see the [syntax guide](docs/gotchas_and_syntax.md#create-a-table); a minimal sketch of one possible schema also follows the query below). To run an analytical query, you just write standard SQL, enable `duckdb.force_execution`, and `pg_duckdb` handles the rest.

```sql
SET duckdb.force_execution = true;

SELECT
    order_date,
    COUNT(*) AS number_of_orders,
    SUM(amount) AS total_revenue
FROM orders
GROUP BY order_date
ORDER BY order_date;
```
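For reference, here is one possible shape for the `orders` table used above — a minimal sketch with illustrative columns and sample rows, not the canonical schema from the syntax guide. The query above only relies on `order_date` and `amount` (a later example also uses `product_name`):

```sql
-- A minimal, illustrative shape for the orders table used in this README
CREATE TABLE orders (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    product_name text NOT NULL,
    order_date date NOT NULL,
    amount numeric(12, 2) NOT NULL
);

-- A few sample rows so the query above returns something
INSERT INTO orders (product_name, order_date, amount) VALUES
    ('widget', '2024-03-01', 19.99),
    ('gadget', '2024-03-01', 42.50),
    ('widget', '2024-03-02', 19.99);
```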
### Querying external data (your first data lake query)

`pg_duckdb` allows you to query external files (like Parquet or CSV) as if they were tables in your database. This is perfect for querying data lakes from `pg_duckdb`. To learn more about these functions, see the [read functions documentation](docs/functions.md#read-functions).

```sql
-- Set up S3 access in seconds, directly from SQL
SELECT duckdb.create_simple_secret(
    type   := 'S3',
    key_id := 'your_key',
    secret := 'your_secret',
    region := 'us-east-1'
);

SELECT
    r['product_name'],  -- 'r' is an alias for the row object returned by read_parquet()
    AVG(r['rating']) AS average_rating
FROM read_parquet('s3://your-bucket/reviews.parquet') r
GROUP BY r['product_name']
ORDER BY average_rating DESC;
```

### Combining PostgreSQL and DuckDB data

You can easily join your PostgreSQL tables with external data from your data lake.

```sql
-- Join a PostgreSQL table with a remote Parquet file
SELECT
    o.product_name,
    o.total_revenue,
    r.average_rating
FROM (
    -- First, aggregate our local orders data
    SELECT product_name, SUM(amount) AS total_revenue
    FROM orders
    GROUP BY product_name
) o
JOIN (
    -- Then, aggregate our remote reviews data
    SELECT
        r['product_name'] AS product_name,
        AVG(r['rating']) AS average_rating
    FROM read_parquet('s3://your-bucket/reviews.parquet') r
    GROUP BY r['product_name']
) r ON o.product_name = r.product_name
ORDER BY o.total_revenue DESC;
```

### Modern data lake formats

Work with modern data lake formats like DuckLake, Iceberg, and Delta Lake. To learn more, see the [extensions documentation](docs/extensions.md).

```sql
-- Query Apache Iceberg tables with time travel
SELECT duckdb.install_extension('iceberg');

SELECT *
FROM iceberg_scan('s3://warehouse/sales_iceberg', version := '2024-03-15-snapshot');

-- Process Delta Lake tables with schema evolution
SELECT duckdb.install_extension('delta');

SELECT *
FROM delta_scan('s3://lakehouse/user_events');
```

### MotherDuck integration (optional)

`pg_duckdb` integrates with [MotherDuck](https://motherduck.com/), a cloud analytics platform. This allows you to run your queries on MotherDuck's compute infrastructure while still using your existing PostgreSQL tables. To learn more, see the [MotherDuck documentation](docs/motherduck.md).

```sql
-- Connect to MotherDuck
CALL duckdb.enable_motherduck('<your_motherduck_token>');
```

```sql
-- Your existing MotherDuck tables appear automatically
SELECT region, COUNT(*)
FROM my_cloud_analytics_table
GROUP BY region;

-- Create cloud tables that sync across teams
CREATE TABLE real_time_kpis USING duckdb AS
SELECT
    date_trunc('day', created_at) AS date,
    COUNT(*) AS daily_signups,
    SUM(revenue) AS daily_revenue
FROM user_events
GROUP BY date;
```

## Quick Start

### Docker

Run PostgreSQL with pg_duckdb pre-installed in a Docker container:

```bash
docker run -d -e POSTGRES_PASSWORD=duckdb pgduckdb/pgduckdb:18-v1.1.1
```

With MotherDuck:

```bash
export MOTHERDUCK_TOKEN=<your_token>
docker run -d -e POSTGRES_PASSWORD=duckdb -e MOTHERDUCK_TOKEN pgduckdb/pgduckdb:18-v1.1.1
```

### Try with Hydra

You can also get started using [Hydra][hydra]:

```bash
pip install hydra-cli
hydra
```

### Package Managers

**pgxman (apt):**

```bash
pgxman install pg_duckdb
```

**Compile from source:**

```bash
git clone https://github.com/duckdb/pg_duckdb
cd pg_duckdb
make install
```

*See the [compilation guide](docs/compilation.md) for detailed instructions.*

## Configuration

See the [settings documentation](docs/settings.md) for complete configuration options; a short sketch of scoping the most common setting follows.
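As a quick illustration, `duckdb.force_execution` (the setting used throughout this README) follows standard PostgreSQL configuration mechanics, so it can be scoped per session or persisted per database. This is a sketch, with `analytics` as a hypothetical database name:

```sql
-- Per session: route eligible queries through DuckDB for this connection only
SET duckdb.force_execution = true;

-- Per database (hypothetical database name): persists for new connections,
-- using PostgreSQL's regular ALTER DATABASE ... SET mechanism
ALTER DATABASE analytics SET duckdb.force_execution = true;

-- Revert to PostgreSQL's own executor for the current session
SET duckdb.force_execution = false;
```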
## Documentation

| Topic | Description |
|-------|-------------|
| [Functions](docs/functions.md) | Complete function reference |
| [Syntax Guide & Gotchas](docs/gotchas_and_syntax.md) | Quick reference for common SQL patterns and things to know |
| [Types](docs/types.md) | Supported data types and advanced type usage |
| [MotherDuck](docs/motherduck.md) | Cloud integration guide |
| [Secrets](docs/secrets.md) | Credential management |
| [Extensions](docs/extensions.md) | DuckDB extension usage |
| [Transactions](docs/transactions.md) | Transaction behavior |
| [Compilation](docs/compilation.md) | Build from source |

**Note**: Advanced DuckDB types (STRUCT, MAP, UNION) require a DuckDB execution context. In most cases, use `duckdb.query()` for complex type operations and `TEMP` tables for DuckDB table creation. See the [types documentation](docs/types.md) for details, and the sketch below for an example.
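As a sketch of that pattern — the Parquet path and column names here are illustrative — `duckdb.query()` runs the whole statement inside DuckDB, so a STRUCT can be built and unpacked there before plain scalar results flow back to PostgreSQL:

```sql
SELECT * FROM duckdb.query($$
    SELECT
        customer_id,
        address.city,   -- STRUCT field access happens inside DuckDB
        address.zip
    FROM (
        SELECT
            customer_id,
            {'city': city, 'zip': zip} AS address  -- DuckDB STRUCT literal
        FROM read_parquet('s3://your-bucket/customers.parquet')
    )
    LIMIT 10
$$);
```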
## Performance

pg_duckdb excels at:

- **Analytical workloads**: Aggregations, window functions, complex JOINs
- **Data lake queries**: Scanning large Parquet/CSV files
- **Mixed workloads**: Combining OLTP (PostgreSQL) with OLAP (DuckDB)
- **ETL pipelines**: Transforming and loading data at scale

## Contributing

We welcome contributions! Please see:

- [Contributing Guidelines](CONTRIBUTING.md)
- [Code of Conduct](CODE_OF_CONDUCT.md)
- [Project Milestones][milestones] for upcoming features
- [Discussions][discussions] for feature requests
- [Issues][issues] for bug reports
- [Join the DuckDB Discord community](https://discord.duckdb.org/), then chat in [the #pg_duckdb channel](https://discord.com/channels/909674491309850675/1289177578237857802)

## Support

- **Documentation**: [Complete documentation][docs]
- **Community**: [DuckDB Discord #pg_duckdb channel](https://discord.com/channels/909674491309850675/1289177578237857802)
- **Issues**: [GitHub Issues][issues]
- **Commercial**: [Hydra][hydra] and [MotherDuck][motherduck] offer commercial support

## Requirements

- **PostgreSQL**: 14, 15, 16, 17, 18
- **Operating systems**: Ubuntu 22.04–24.04, macOS

## License

Licensed under the [MIT License](LICENSE).

---

<p align="center">
  <strong>Built with ❤️</strong><br>
  in collaboration with <a href="https://hydra.so">Hydra</a> and <a href="https://motherduck.com">MotherDuck</a>
</p>

[milestones]: https://github.com/duckdb/pg_duckdb/milestones
[discussions]: https://github.com/duckdb/pg_duckdb/discussions
[issues]: https://github.com/duckdb/pg_duckdb/issues
[docs]: https://github.com/duckdb/pg_duckdb/tree/main/docs
[hydra]: https://hydra.so/
[motherduck]: https://motherduck.com/