Coder Social home page Coder Social logo

redshift-spectrum-demo's Introduction

redshift-spectrum-demo

How to get the parquet data in S3 via Redshift Spectrum based on the Glue Table.

Requirements

  • Terraform 0.14.9
  • Python 3.8.7
    • pipenv

Environment

  • macOS (Catalina)

Usage

First of all, change the S3 bucket name what to store the parquet data.

// terraform.tfvars
- s3_bucket_parquet = "redshift-spectrum-demo"
+ s3_bucket_parquet = "YOUR_BUCKET_NAME"

Check the necessary infra resources and deploy to AWS cloud.

$ terraform init
$ terraform plan
$ terraform apply

Create the parquet data for testing, and upload this to S3 bucket.

$ cd utils
$ pipenv --python 3.8.7
$ pipenv install --dev
$ pipenv run python generate_parquet.py
$ aws s3 cp ../data/sample.parq s3://YOUR_BUCKET_NAME/parquet/

Check the endpoint and port of Redshift Cluster, and try to access Redshift Cluster using psql command.

$ aws redshift describe-clusters --cluster-identifier demo-redshift | jq '.Clusters[].Endpoint'
{
  "Address": "demo-redshift.xxxxxxxxxxxx.ap-northeast-1.redshift.amazonaws.com",
  "Port": 5439
}
$ psql -h demo-redshift.xxxxxxxxxxxx.ap-northeast-1.redshift.amazonaws.com -p 5439 -U admin -d demo
Password for user admin:
psql (13.2, server 8.0.2)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

demo=#
demo=#\q

Let's run the Glue Crawler for create a Glue Table in the Glue Catalog based on parquet data what uploaded to the S3 bucket.

$ aws glue start-crawler --name demo-crawler
$ aws glue get-table --database-name demo-database --name demo_parquet

And then, create an external schema for the Redshift Spectrum, because of connect to the Redshift cluster with the Glue Table. Don't forget to replace YOUR_ACCOUNT_ID with your account ID in create-external-schema.sql.

$ psql -h demo-redshift.xxxxxxxxxxxx.ap-northeast-1.redshift.amazonaws.com -p 5439 -U admin -d demo < sql/create-external-schema.sql
Password for user admin:
CREATE SCHEMA

Finally, let's query the parquet data in S3 via Redshift Spectrum.

$ psql -h demo-redshift.xxxxxxxxxxxx.ap-northeast-1.redshift.amazonaws.com -p 5439 -U admin -d demo < sql/get-parquet.sql > result
$ cat result
 year |  make  |  model
------+--------+---------
 2014 | toyota | corolla
 2017 | nissan | sentra
 2018 | honda  | civic
 2020 | hyndai | nissan
(4 rows)

You have to delete AWS resources if the verification was completed.

$ terraform destroy

Author

Inseo Kim

License

MIT

redshift-spectrum-demo's People

Contributors

sano307 avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.