Load Data from Amazon S3 using COPY

tip

Expected time: 5 minutes ⏱

Databend COPY can read object files (CSV or Parquet format) from Amazon S3 buckets. To understand Amazon S3's core concepts, please read the Amazon S3 documentation.

Prerequisites

To complete this Quickstart, you must meet the following prerequisites:

  • AWS Account: This Quickstart uses Amazon S3 and requires an AWS account's Access Key ID and Secret Access Key.
  • Databend: You will connect to the database and use COPY to pull data from your Amazon S3 bucket. Please see How to deploy Databend.
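
If you have not connected to Databend yet, you can do so with any MySQL-compatible client. The command below is a minimal sketch that assumes a local deployment listening on the default MySQL-compatible port 3307 with the root user; adjust the host, port, and user for your setup:

mysql -h 127.0.0.1 -P 3307 -u root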

Part 1: Creating an Amazon S3 Bucket and Adding a File

  1. On your local machine, create a text file with the following CSV contents and name it books.csv:
books.csv
Transaction Processing,Jim Gray,1992
Readings in Database Systems,Michael Stonebraker,2004

In this CSV file, the field delimiter is a comma (,) and the record delimiter is a newline (\n).

  2. In Amazon S3, create a bucket and upload books.csv to the bucket; the AWS CLI sketch below shows one way to do this.
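
If you prefer the command line to the AWS console, the following AWS CLI commands are one way to create the bucket and upload the file. The bucket name databend-bohu and the data/ prefix are only examples matching the URI used later in this Quickstart; substitute your own bucket and path:

aws s3 mb s3://databend-bohu
aws s3 cp books.csv s3://databend-bohu/data/books.csv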

Part 2: Creating a Database and Table, then Loading Data using COPY

Connect to Databend and run the following statements to create a database and a table that matches the columns of books.csv:

create database book_db;
use book_db;
create table books
(
    title VARCHAR(255),
    author VARCHAR(255),
    date VARCHAR(255)
);
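
As an optional sanity check, assuming the statements above ran without error, you can describe the new table to confirm its columns:

desc books;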

Now that the database and table have been created, you can load the books.csv file you uploaded to your bucket in Part 1. To load the data with COPY, you will need the following information:

  • The S3 URI (s3://bucket/to/path/), such as: s3://databend-bohu/data/
  • Your AWS account’s access keys, such as:
    • Access Key ID: your-access-key-id
    • Secret Access Key: your-secret-access-key

Using this URI and keys, execute the following statement, replacing the placeholder values with your own:

copy into books
from 's3://databend-bohu/data/'
credentials=(aws_key_id='<your-access-key-id>' aws_secret_key='<your-secret-access-key>')
pattern ='.*[.]csv'
file_format = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 0);

Now, query the table to verify that the data has been loaded:

mysql> select * from books;
+------------------------------+---------------------+------+
| title                        | author              | date |
+------------------------------+---------------------+------+
| Transaction Processing       | Jim Gray            | 1992 |
| Readings in Database Systems | Michael Stonebraker | 2004 |
+------------------------------+---------------------+------+
tip

If the file(s) is large and you only want to check whether the file format parses correctly, you can use the SIZE_LIMIT option to load just a few rows:

copy into books
from 's3://databend-bohu/data/'
credentials=(aws_key_id='<your-access-key-id>' aws_secret_key='<your-secret-access-key>')
pattern ='.*[.]csv'
file_format = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 0)
size_limit = 1; -- only load 1 row
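
If you ran such a size-limited load as a format check, you will probably want to clear the partially loaded rows before running the full COPY again. A minimal sketch, assuming a standard TRUNCATE TABLE statement:

truncate table books;

After truncating, rerun the COPY statement above without the size_limit option.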