# pg_duckdb: Official Postgres extension for DuckDB

DuckDB-powered Postgres for high performance apps & analytics.
pg_duckdb is a Postgres extension that embeds DuckDB's columnar-vectorized analytics engine and features into Postgres. We recommend using pg_duckdb to build high performance analytics and data-intensive applications.
pg_duckdb was developed in collaboration with our partners, [Hydra](https://hydra.so) and [MotherDuck](https://motherduck.com).
## Goals
* Broad support for Postgres types and functions within an analytical context
* Query external datasets on S3, GCP and Azure (Parquet, Iceberg and Delta Lake)
* Join native Postgres and analytical data
* Direct access to data stored in MotherDuck
* Support installation of DuckDB extensions
## Installation
Pre-built binaries and additional installation options are coming soon.
To build pg_duckdb, you need:
* Postgres 16 or 17
* Ubuntu 22.04 or macOS
* Standard set of build tools for building Postgres extensions
* [Build tools that are required to build DuckDB](https://duckdb.org/docs/dev/building/build_instructions)
To build and install, run:
```sh
make install
```
Add `pg_duckdb` to the `shared_preload_libraries` in your `postgresql.conf` file:
```ini
shared_preload_libraries = 'pg_duckdb'
```
Next, create the `pg_duckdb` extension:
```sql
CREATE EXTENSION pg_duckdb;
```
**IMPORTANT:** DuckDB execution is usually enabled automatically when needed. It's enabled whenever you use DuckDB functions (such as `read_csv`), when you query DuckDB tables, and when running `COPY table TO 's3://...'`. However, if you want queries that only touch Postgres tables to use DuckDB execution, you need to run `SET duckdb.force_execution TO true`. This feature is _opt-in_ to avoid breaking existing queries. To avoid doing that for every session, you can configure it for a specific user with `ALTER USER my_analytics_user SET duckdb.force_execution TO true`.
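A minimal sketch of both settings (the role name `my_analytics_user` is just a placeholder):

```sql
-- Enable DuckDB execution for queries that only touch Postgres tables,
-- for the current session only:
SET duckdb.force_execution TO true;

-- Make it the default for a specific role (takes effect in new sessions):
ALTER USER my_analytics_user SET duckdb.force_execution TO true;

-- Turn it back off for the current session if needed:
SET duckdb.force_execution TO false;
```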
## Features
- `SELECT` queries executed by the DuckDB engine can directly read Postgres tables. (If you only query Postgres tables, you need to run `SET duckdb.force_execution TO true`; see the **IMPORTANT** note above for details.)
- Able to read [data types](https://www.postgresql.org/docs/current/datatype.html) that exist in both Postgres and DuckDB. The following data types are supported: numeric, character, binary, date/time, boolean, uuid, json, and arrays.
- If DuckDB cannot support the query for any reason, execution falls back to Postgres.
- Read and Write support for object storage (AWS S3, Cloudflare R2, or Google GCS):
- Read parquet and CSV files:
- `SELECT n FROM read_parquet('s3://bucket/file.parquet') AS (n int)`
- `SELECT n FROM read_csv('s3://bucket/file.csv') AS (n int)`
- You can pass globs and arrays to these functions, just like in DuckDB (see the sketch after this list)
- Enable the DuckDB Iceberg extension using `SELECT duckdb.install_extension('iceberg')` and read Iceberg files with `iceberg_scan`.
- Write a query — or an entire table — to parquet in object storage.
- `COPY (SELECT foo, bar FROM baz) TO 's3://...'`
- `COPY table TO 's3://...'`
- Read and write to Parquet format in a single query
```sql
COPY (
    SELECT count(*) AS count, name
    FROM read_parquet('s3://bucket/file.parquet') AS (name text)
    GROUP BY name
    ORDER BY count DESC
) TO 's3://bucket/results.parquet';
```
- Read and Write support for data stored in MotherDuck
- Query and `JOIN` data in object storage/MotherDuck with Postgres tables, views, and materialized views.
- Create indexes on Postgres tables to accelerate your DuckDB queries
- Install DuckDB extensions using `SELECT duckdb.install_extension('extension_name');`
- Toggle DuckDB execution on/off with a setting:
- `SET duckdb.force_execution = true|false`
- Cache remote objects locally for faster execution using `SELECT duckdb.cache('path', 'type');`, where
    - 'path' is the location of a remote httpfs/S3/GCS/R2 object
    - 'type' specifies the remote object type: 'parquet' or 'csv'
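A few of the items above, sketched as runnable statements — the bucket paths, table names, and column definitions are hypothetical placeholders:

```sql
-- Globs and arrays work just like in DuckDB (use a Postgres ARRAY literal):
SELECT n FROM read_parquet('s3://bucket/data/*.parquet') AS (n int);
SELECT n FROM read_parquet(ARRAY['s3://bucket/a.parquet', 's3://bucket/b.parquet']) AS (n int);

-- Install the Iceberg extension once, then scan Iceberg tables:
SELECT duckdb.install_extension('iceberg');
SELECT count(*) FROM iceberg_scan('s3://bucket/iceberg_table') AS (id bigint);

-- Cache a remote Parquet file locally for faster repeated reads:
SELECT duckdb.cache('s3://bucket/data/file.parquet', 'parquet');
```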
## Getting Started
pg_duckdb relies on DuckDB's vectorized execution engine to read and write data to an object storage bucket (AWS S3, Cloudflare R2, or Google GCS) and/or MotherDuck. The following two sections describe how to get started with these destinations.
### Object storage bucket (AWS S3, Cloudflare R2, or Google GCS)
Querying data stored in Parquet, CSV, and Iceberg format can be done with `read_parquet`, `read_csv`, and `iceberg_scan` respectively.
1. Add a credential to enable DuckDB's httpfs support.
```sql
-- Session Token is Optional
INSERT INTO duckdb.secrets
(type, key_id, secret, session_token, region)
VALUES ('S3', 'access_key_id', 'secret_access_key', 'session_token', 'us-east-1');
```
2. Copy data directly to your bucket - no ETL pipeline!
```sql
COPY (SELECT user_id, item_id, price, purchased_at FROM purchases)
TO 's3://your-bucket/purchases.parquet';
```
3. Perform analytics on your data.
```sql
SELECT SUM(price) AS total, item_id
FROM read_parquet('s3://your-bucket/purchases.parquet')
AS (price float, item_id int)
GROUP BY item_id
ORDER BY total DESC
LIMIT 100;
```
### Connect with MotherDuck
pg_duckdb also integrates with [MotherDuck][md]. To enable this support you first need to [generate an access token][md-access-token] and then add the following line to your `postgresql.conf` file:
```ini
duckdb.motherduck_token = 'your_access_token'
```
NOTE: If you don't want to store the token in your `postgresql.conf` file, you can also store it in the `motherduck_token` environment variable and then explicitly enable MotherDuck support in your `postgresql.conf` file:
```ini
duckdb.motherduck_enabled = true
```
If you installed `pg_duckdb` in a different Postgres database than the default one named `postgres`, then you also need to add the following line to your `postgresql.conf` file:
```ini
duckdb.motherduck_postgres_database = 'your_database_name'
```
After doing this (and possibly restarting Postgres), you can create tables in the MotherDuck database by using the `duckdb` [Table Access Method][tam] like this:
```sql
CREATE TABLE orders(id bigint, item text, price NUMERIC(10, 2)) USING duckdb;
CREATE TABLE users_md_copy USING duckdb AS SELECT * FROM users;
```
[tam]: https://www.postgresql.org/docs/current/tableam.html
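Once created, these tables can be read and written like regular Postgres tables. A minimal sketch reusing the hypothetical `orders` table from above:

```sql
-- Writes and reads are routed to MotherDuck via DuckDB:
INSERT INTO orders VALUES (1, 'milk', 3.49), (2, 'bread', 2.99);

SELECT item, SUM(price) AS total
FROM orders
GROUP BY item
ORDER BY total DESC;
```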
Any tables that you already had in MotherDuck are automatically available in Postgres. Since DuckDB and MotherDuck allow accessing multiple databases from a single connection and Postgres does not, we map database+schema in DuckDB to a schema name in Postgres.
This is done in the following way:
1. Each schema in your default MotherDuck database is simply merged with the Postgres schema of the same name.
2. Except for the `main` DuckDB schema in your default database, which is merged with the Postgres `public` schema.
3. Tables in other databases are put into dedicated DuckDB-only schemas. These schemas are of the form `ddb$<duckdb_db_name>$<duckdb_schema_name>` (including the literal `$` characters).
4. Except for the `main` schema in those other databases. That schema should be accessed using the shorter name `ddb$<db_name>` instead.
An example of each of these cases is shown below:
```sql
INSERT INTO my_table VALUES (1, 'abc'); -- inserts into my_db.main.my_table
INSERT INTO your_schema.tab1 VALUES (1, 'abc'); -- inserts into my_db.your_schema.tab1
SELECT COUNT(*) FROM ddb$my_shared_db.aggregated_order_data; -- reads from my_shared_db.main.aggregated_order_data
SELECT COUNT(*) FROM ddb$sample_data$hn.hacker_news; -- reads from sample_data.hn.hacker_news
```
[md]: https://motherduck.com/
[md-access-token]: https://motherduck.com/docs/key-tasks/authenticating-and-connecting-to-motherduck/authenticating-to-motherduck/#authentication-using-an-access-token
## Roadmap
Please see the [project roadmap][roadmap] for upcoming planned tasks and features.
## Contributing
pg_duckdb was developed in collaboration with our partners, [Hydra](https://hydra.so) and [MotherDuck](https://motherduck.com). We look forward to their continued contributions and leadership.
[Hydra](https://hydra.so) is a Y Combinator-backed database company, focused on DuckDB-Powered Postgres for app developers.
[MotherDuck](https://motherduck.com) is the cloud-based data warehouse that extends the power of DuckDB.
We welcome all contributions big and small:
- Vote on or suggest features for our roadmap.
- Open a PR.
- Submit a feature request or bug report.
## Resources
- Please see the [project roadmap][roadmap] for upcoming planned tasks and features.
- [GitHub Issues](https://github.com/duckdb/pg_duckdb/issues) for bugs and missing features
- [Discord discussion](https://discord.duckdb.org/) with the DuckDB community
- See our docs for more info and limitations
[roadmap]: https://github.com/orgs/duckdb/projects/10
", Assign "at most 3 tags" to the expected json: {"id":"11464","tags":[]} "only from the tags list I provide: [{"id":77,"name":"3d"},{"id":89,"name":"agent"},{"id":17,"name":"ai"},{"id":54,"name":"algorithm"},{"id":24,"name":"api"},{"id":44,"name":"authentication"},{"id":3,"name":"aws"},{"id":27,"name":"backend"},{"id":60,"name":"benchmark"},{"id":72,"name":"best-practices"},{"id":39,"name":"bitcoin"},{"id":37,"name":"blockchain"},{"id":1,"name":"blog"},{"id":45,"name":"bundler"},{"id":58,"name":"cache"},{"id":21,"name":"chat"},{"id":49,"name":"cicd"},{"id":4,"name":"cli"},{"id":64,"name":"cloud-native"},{"id":48,"name":"cms"},{"id":61,"name":"compiler"},{"id":68,"name":"containerization"},{"id":92,"name":"crm"},{"id":34,"name":"data"},{"id":47,"name":"database"},{"id":8,"name":"declarative-gui "},{"id":9,"name":"deploy-tool"},{"id":53,"name":"desktop-app"},{"id":6,"name":"dev-exp-lib"},{"id":59,"name":"dev-tool"},{"id":13,"name":"ecommerce"},{"id":26,"name":"editor"},{"id":66,"name":"emulator"},{"id":62,"name":"filesystem"},{"id":80,"name":"finance"},{"id":15,"name":"firmware"},{"id":73,"name":"for-fun"},{"id":2,"name":"framework"},{"id":11,"name":"frontend"},{"id":22,"name":"game"},{"id":81,"name":"game-engine "},{"id":23,"name":"graphql"},{"id":84,"name":"gui"},{"id":91,"name":"http"},{"id":5,"name":"http-client"},{"id":51,"name":"iac"},{"id":30,"name":"ide"},{"id":78,"name":"iot"},{"id":40,"name":"json"},{"id":83,"name":"julian"},{"id":38,"name":"k8s"},{"id":31,"name":"language"},{"id":10,"name":"learning-resource"},{"id":33,"name":"lib"},{"id":41,"name":"linter"},{"id":28,"name":"lms"},{"id":16,"name":"logging"},{"id":76,"name":"low-code"},{"id":90,"name":"message-queue"},{"id":42,"name":"mobile-app"},{"id":18,"name":"monitoring"},{"id":36,"name":"networking"},{"id":7,"name":"node-version"},{"id":55,"name":"nosql"},{"id":57,"name":"observability"},{"id":46,"name":"orm"},{"id":52,"name":"os"},{"id":14,"name":"parser"},{"id":74,"name":"react"},{"id":82,"name":"real-time"},{"id":56,"name":"robot"},{"id":65,"name":"runtime"},{"id":32,"name":"sdk"},{"id":71,"name":"search"},{"id":63,"name":"secrets"},{"id":25,"name":"security"},{"id":85,"name":"server"},{"id":86,"name":"serverless"},{"id":70,"name":"storage"},{"id":75,"name":"system-design"},{"id":79,"name":"terminal"},{"id":29,"name":"testing"},{"id":12,"name":"ui"},{"id":50,"name":"ux"},{"id":88,"name":"video"},{"id":20,"name":"web-app"},{"id":35,"name":"web-server"},{"id":43,"name":"webassembly"},{"id":69,"name":"workflow"},{"id":87,"name":"yaml"}]" returns me the "expected json"