AWS S3

AWS S3 is an object storage service offering industry-leading scalability, data availability, security, and performance. It is read-only and supports below file formats:

The S3 Wrapper allows you to read data of below formats from S3 within your Postgres database.

CSV - with or without header line
JSON Lines
Parquet

The S3 Wrapper also supports below compression algorithms:

gzip
bzip2
xz
zlib

Note for CSV and JSONL files: currently all columns in S3 files must be defined in the foreign table and their types must be text type.

Note for Parquet files: the whole Parquet file will be loaded into local memory if it is compressed, so keep the file size as small as possible.

Supported Data Types For Parquet File

The S3 Wrapper uses Parquet file data types from arrow_array::types, below are their mappings to Postgres data types.

Postgres Type	Parquet Type
boolean	BooleanType
char	Int8Type
smallint	Int16Type
real	Float32Type
integer	Int32Type
double precision	Float64Type
bigint	Int64Type
numeric	Float64Type
text	ByteArrayType
date	Date64Type
timestamp	TimestampNanosecondType

Preparation

Before you get started, make sure the wrappers extension is installed on your database:


_10create extension if not exists wrappers with schema extensions;

and then create the foreign data wrapper:


_10create foreign data wrapper s3_wrapper
_10  handler s3_fdw_handler
_10  validator s3_fdw_validator;

Secure your credentials (optional)

By default, Postgres stores FDW credentials inide pg_catalog.pg_foreign_server in plain text. Anyone with access to this table will be able to view these credentials. Wrappers is designed to work with Vault, which provides an additional level of security for storing credentials. We recommend using Vault to store your credentials.


_14-- Save your AWS credential in Vault and retrieve the `key_id`
_14insert into vault.secrets (name, secret)
_14values (
_14  'vault_access_key_id',
_14  '<access key id>'
_14)
_14returning key_id;
_14
_14insert into vault.secrets (name, secret)
_14values (
_14  'vault_secret_access_key',
_14  '<secret access key>'
_14)
_14returning key_id;

Connecting to S3

We need to provide Postgres with the credentials to connect to S3, and any additional options. We can do this using the create server command:

The full list of options are below:

aws_access_key_id (required) - Your access key
aws_secret_access_key (required) - Your secret key
aws_region (required) - The region of your bucket (if providing an endpoint URL with a region in it, make sure that they are the same)
endpoint_url (optional) - An optional URL to allow connection to S3-compliant providers (i.e. Wasabi, Cloudflare R2, Backblaze B2, DigitalOcean Spaces)

Connecting to S3-compliant Providers - Wasabi


_10create server s3_server
_10      foreign data wrapper s3_wrapper
_10      options (
_10        aws_access_key_id 'you_wasabi_access_key',
_10        aws_secret_access_key 'your_wasabi_secret_access_key',
_10        aws_region 'eu-central-1',
_10        endpoint_url 'https://s3.eu-central-1.wasabisys.com'
_10      );

Creating Foreign Tables

The S3 Wrapper supports data reads from S3.

Integration	Select	Insert	Update	Delete	Truncate
S3	✅	❌	❌	❌	❌

For example:


_13create foreign table s3_table_csv (
_13  name text,
_13  sex text,
_13  age text,
_13  height text,
_13  weight text
_13)
_13  server s3_server
_13  options (
_13    uri 's3://bucket/s3_table.csv',
_13    format 'csv',
_13    has_header 'true'
_13  );

One file in S3 corresponds a foreign table in Postgres. For CSV and JSONL file, all columns must be present in the foreign table and type must be text. You can do custom transformations, like type conversion, by creating a view on top of the foreign table or using a subquery.

For Parquet file, no need to define all columns in the foreign table but column names must match between Parquet file and its foreign table.

Foreign table options

The full list of foreign table options are below:

uri - S3 URI, required. For example, s3://bucket/s3_table.csv
format - File format, required. csv, jsonl, or parquet
has_header - If the CSV file has header, optional. true or false, default is false
compress - Compression algorithm, optional. One of gzip, bzip2, xz, zlib, default is no compression

Query Pushdown Support

This FDW doesn't support query pushdown.

Examples

Some examples on how to use S3 foreign tables.

Basic example

This will create some "foreign table" inside your Postgres database can read data from S3:


_75-- CSV file, no compression
_75create foreign table s3_table_csv (
_75  name text,
_75  sex text,
_75  age text,
_75  height text,
_75  weight text
_75)
_75  server s3_server
_75  options (
_75    uri 's3://bucket/s3_table.csv',
_75    format 'csv',
_75    has_header 'true'
_75  );
_75
_75-- JSON line file, no compression
_75create foreign table s3_table_jsonl (
_75  name text,
_75  sex text,
_75  age text,
_75  height text,
_75  weight text
_75)
_75  server s3_server
_75  options (
_75    uri 's3://bucket/s3_table.jsonl',
_75    format 'jsonl'
_75  );
_75
_75-- GZIP compressed CSV file
_75create foreign table s3_table_csv_gzip (
_75  name text,
_75  sex text,
_75  age text,
_75  height text,
_75  weight text
_75)
_75  server s3_server
_75  options (
_75    uri 's3://bucket/s3_table.csv.gz',
_75    format 'csv',
_75    has_header 'true',
_75    compress 'gzip'
_75  );
_75
_75-- Parquet file, no compression
_75create foreign table s3_table_parquet (
_75  id integer,
_75  bool_col boolean,
_75  bigint_col bigint,
_75  float_col real,
_75  date_string_col text,
_75  timestamp_col timestamp
_75)
_75  server s3_server
_75  options (
_75    uri 's3://bucket/s3_table.parquet',
_75    format 'parquet'
_75  );
_75
_75-- GZIP compressed Parquet file
_75create foreign table s3_table_parquet_gz (
_75  id integer,
_75  bool_col boolean,
_75  bigint_col bigint,
_75  float_col real,
_75  date_string_col text,
_75  timestamp_col timestamp
_75)
_75  server s3_server
_75  options (
_75    uri 's3://bucket/s3_table.parquet.gz',
_75    format 'parquet',
_75    compress 'gzip'
_75  );

Edit this page on GitHub

AWS S3

Supported Data Types For Parquet File#

Preparation#

Secure your credentials (optional)#

Connecting to S3#

Connecting to S3-compliant Providers - Wasabi#

Creating Foreign Tables#

Foreign table options#

Query Pushdown Support#

Examples#

Basic example#