Load Data from Amazon S3 using COPY
tip
Expected time: 5 minutes ⏱
Databend's COPY command can read object files (CSV or Parquet format) from Amazon S3 buckets. To understand Amazon S3's core concepts, please read the Amazon S3 documentation.
Prerequisites
To complete this Quickstart, you must meet the following prerequisites:
- AWS Account: This Quickstart uses Amazon S3 and requires an AWS account's access key id and secret access key.
- Databend: You will connect to the database and use COPY to pull data from your Amazon S3 bucket. Please see How to deploy Databend.
Part 1: Creating an Amazon S3 Bucket and Adding a File
- On your local machine, create a text file with the following CSV contents and name it books.csv:
Transaction Processing,Jim Gray,1992
Readings in Database Systems,Michael Stonebraker,2004
In this CSV file, the field delimiter is , and the record delimiter is \n.
- In S3, create a bucket and upload books.csv to the bucket.
Part 2: Creating a Database and Table using COPY
create database book_db;
use book_db;
create table books
(
title VARCHAR(255),
author VARCHAR(255),
date VARCHAR(255)
);
Now that the database and table have been created, the table is ready to receive the books.csv file you uploaded to your bucket in Part 1 of this Quickstart.
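If you want to double-check the table definition before loading, a quick inspection should work; this is a minimal sketch using Databend's MySQL-compatible DESC and SHOW TABLES statements:

desc books;
-- or list all tables in the current database
show tables;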
To use COPY to load the data, you will need the following information:
- The S3 URI (s3://bucket/to/path/), such as: s3://databend-bohu/data/
- Your AWS account’s access keys, such as:
- Access Key ID: your-access-key-id
- Secret Access Key: your-secret-access-key
Using this URI and keys, execute the following statement, replacing the placeholder values with your own:
copy into books
from 's3://databend-bohu/data/'
credentials=(aws_key_id='<your-access-key-id>' aws_secret_key='<your-secret-access-key>')
pattern ='.*[.]csv'
file_format = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 0);
Now, let's query the table to make sure the data has actually loaded:
mysql> select * from books;
+------------------------------+----------------------+-------+
| title | author | date |
+------------------------------+----------------------+-------+
| Transaction Processing | Jim Gray | 1992 |
| Readings in Database Systems | Michael Stonebraker | 2004 |
+------------------------------+----------------------+-------+
tip
If the file(s) are large and you want to check that the file format parses correctly, you can use the SIZE_LIMIT option:
copy into books
from 's3://databend-bohu/data/'
credentials=(aws_key_id='<your-access-key-id>' aws_secret_key='<your-secret-access-key>')
pattern ='.*[.]csv'
file_format = (type = 'CSV' field_delimiter = ',' record_delimiter = '\n' skip_header = 0)
size_limit = 1; -- only load 1 row
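If you run this trial load against the same books table, you may want to discard the test row before running the full COPY again. A minimal sketch, assuming you are happy to remove everything currently in books:

truncate table books;
-- then re-run the COPY statement above without size_limit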