subhamay-cloudworks / 0052-agapanthus-cft

Project Agapanthus: Working with Glue Data Catalog and Running the Glue Crawler On Demand

A user or producer uploads a JSON and a CSV source file to an S3 bucket. A Glue crawler is run on demand to read the files and create tables, with their metadata, in a Glue database in the Glue Data Catalog.

Description

This project demonstrates how AWS Glue crawlers scan a data store and create metadata definitions in the Glue Data Catalog. A Glue crawler connects to a data store, progresses through a prioritized list of classifiers to extract the schema of your data and other statistics, and then populates the Glue Data Catalog with this metadata. Crawlers can run on a schedule to detect newly available data as well as changes to existing data, including table definition changes. Crawlers automatically add new tables, new partitions to existing tables, and new versions of table definitions. You can also define custom classifiers so that crawlers can classify your own file types.
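The classifier selection described above can be illustrated with a toy model in Python. This is not Glue's implementation: `json_certainty`, `csv_certainty`, the `0.9` score, and `choose_classifier` are hypothetical stand-ins. What the sketch does follow is the selection rule as AWS documents it: the first classifier that returns a certainty of 1.0 is used; otherwise the classifier with the highest certainty wins; if every classifier returns 0.0, the classification is `UNKNOWN`.

```python
import csv
import io
import json
from typing import Callable, List, Tuple

# Toy model only: a "classifier" is a name plus a function returning a certainty in [0.0, 1.0].
Classifier = Tuple[str, Callable[[str], float]]


def json_certainty(sample: str) -> float:
    """Full certainty if the first line parses as a JSON object or array."""
    lines = sample.strip().splitlines()
    if not lines:
        return 0.0
    try:
        parsed = json.loads(lines[0])
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return 0.0
    return 1.0 if isinstance(parsed, (dict, list)) else 0.0


def csv_certainty(sample: str) -> float:
    """Partial certainty if there are 2+ rows that all share the same field count (> 1)."""
    rows = [row for row in csv.reader(io.StringIO(sample.strip())) if row]
    if len(rows) < 2:
        return 0.0
    widths = {len(row) for row in rows}
    return 0.9 if len(widths) == 1 and widths.pop() > 1 else 0.0


# Hypothetical stand-ins for built-in classifiers; custom classifiers would go before them.
BUILT_IN: List[Classifier] = [("json", json_certainty), ("csv", csv_certainty)]


def choose_classifier(sample: str, classifiers: List[Classifier]) -> str:
    """First classifier with certainty 1.0, else the highest scorer, else UNKNOWN."""
    best_name, best_score = "UNKNOWN", 0.0
    for name, certainty in classifiers:
        score = certainty(sample)
        if score == 1.0:
            return name  # certain match: stop checking further classifiers
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

Prepending a custom entry such as `("pipe", lambda s: 1.0 if "|" in s else 0.0)` to `BUILT_IN` mimics how a custom classifier gets the first chance to claim a file.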

Project Agapanthus - Design Diagram

Project Agapanthus - Services Used

Getting Started

Dependencies

  • Create a customer managed KMS key in the Region where you want to create the stack.
  • Modify the key policy so that the IAM user can use the new KMS key to encrypt and decrypt any resource.
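As a minimal sketch, assuming a hypothetical account ID `111122223333` and IAM user name `agapanthus-user`, the Python below builds a key policy that keeps the default account-root statement and grants the user the usual encrypt and decrypt actions. The statement IDs and the exact action list are assumptions; adjust them to what your stacks actually need.

```python
import json


def build_key_policy(account_id: str, user_name: str) -> dict:
    """Return a KMS key policy document for a (hypothetical) account and IAM user."""
    root_arn = f"arn:aws:iam::{account_id}:root"
    user_arn = f"arn:aws:iam::{account_id}:user/{user_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Default statement: lets the account manage access to the key through IAM.
                "Sid": "EnableIAMUserPermissions",
                "Effect": "Allow",
                "Principal": {"AWS": root_arn},
                "Action": "kms:*",
                "Resource": "*",
            },
            {
                # Lets the named IAM user use the key for encryption and decryption.
                "Sid": "AllowUseOfTheKey",
                "Effect": "Allow",
                "Principal": {"AWS": user_arn},
                "Action": [
                    "kms:Encrypt",
                    "kms:Decrypt",
                    "kms:ReEncrypt*",
                    "kms:GenerateDataKey*",
                    "kms:DescribeKey",
                ],
                # In a key policy, "*" means the key this policy is attached to.
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical account ID and user name; replace with your own.
    print(json.dumps(build_key_policy("111122223333", "agapanthus-user"), indent=2))
```

Keeping the root statement matters, because a key policy without it can make the key difficult to manage. The printed JSON can be pasted into the key policy editor in the KMS console or applied with `aws kms put-key-policy --key-id <key-id> --policy-name default --policy file://policy.json`.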

Installing

  • Clone the repository.
  • Create an S3 bucket and make it public.
  • Create the folders 0052-agapanthus/cft/nested-stacks and 0052-agapanthus/cft/cross-stacks in the bucket.
  • Upload the following YAML templates to 0052-agapanthus/cft/nested-stacks
    • glue-crawler-stack.yaml
    • glue-database-stack.yaml
    • glue-iam-role-stack.yaml
    • s3-stack.yaml
  • Upload the following YAML template to 0052-agapanthus/cft/cross-stacks
    • custom-resource-lambda-stack.yaml
  • Upload the following YAML template to 0052-agapanthus/cft/
    • agapanthus-root-stack.yaml
  • Create the cross-stack from the template custom-resource-lambda-stack.yaml using its S3 URL, and pass the appropriate parameters. Note the cross-stack name; you need to pass it to the root stack.
  • Create the entire stack from the root stack template agapanthus-root-stack.yaml, providing the required parameters and the name of the cross-stack created in the previous step.
  • Use the ID of the KMS key you created in your account.
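The bucket layout from the steps above can be sketched in Python. Assuming virtual-hosted-style S3 URLs and a hypothetical bucket name and Region, `template_urls` returns the URL of each template, which is the form CloudFormation accepts in its "Amazon S3 URL" field or in a `TemplateURL` property. The mapping mirrors only the folders listed above; how the root stack locates its nested templates depends on its own parameters, which are not shown here.

```python
from typing import Dict, List

# Folder layout from the Installing steps: S3 prefix -> template file names.
TEMPLATE_LAYOUT: Dict[str, List[str]] = {
    "0052-agapanthus/cft/nested-stacks": [
        "glue-crawler-stack.yaml",
        "glue-database-stack.yaml",
        "glue-iam-role-stack.yaml",
        "s3-stack.yaml",
    ],
    "0052-agapanthus/cft/cross-stacks": ["custom-resource-lambda-stack.yaml"],
    "0052-agapanthus/cft": ["agapanthus-root-stack.yaml"],
}


def template_urls(bucket: str, region: str) -> Dict[str, str]:
    """Map each template name to a virtual-hosted-style S3 URL (assumed URL format)."""
    return {
        name: f"https://{bucket}.s3.{region}.amazonaws.com/{prefix}/{name}"
        for prefix, names in TEMPLATE_LAYOUT.items()
        for name in names
    }


if __name__ == "__main__":
    # Hypothetical bucket and Region; replace with your own.
    for name, url in template_urls("my-agapanthus-templates", "us-east-1").items():
        print(f"{name}: {url}")
```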

Executing program

  • Upload the sample CSV and JSON files to the S3 bucket.
  • Run the Glue crawler.
  • Check the two tables created in the Glue database.
  • Query the data using Athena.
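The run-and-check steps above can also be scripted. The sketch below assumes boto3-style Glue and Athena clients are passed in (for example `boto3.client("glue")` and `boto3.client("athena")`) and uses only the `start_crawler`, `get_crawler`, `get_tables`, `start_query_execution`, and `get_query_execution` calls. The crawler name, database name, and Athena output location are defined by your stacks and are placeholders; the polling intervals are arbitrary.

```python
import time
from typing import Callable, List


def run_crawler_and_wait(glue, crawler_name: str, poll_seconds: float = 30.0,
                         sleep: Callable[[float], None] = time.sleep) -> str:
    """Start a crawler, wait until it is READY again, and return the last crawl status."""
    glue.start_crawler(Name=crawler_name)
    while True:
        sleep(poll_seconds)  # wait before each poll; the crawler is RUNNING or STOPPING meanwhile
        crawler = glue.get_crawler(Name=crawler_name)["Crawler"]
        if crawler["State"] == "READY":
            # LastCrawl.Status is SUCCEEDED, CANCELLED or FAILED.
            return crawler.get("LastCrawl", {}).get("Status", "UNKNOWN")


def list_table_names(glue, database_name: str) -> List[str]:
    """Return every table name in a Glue database, following NextToken pagination."""
    names: List[str] = []
    kwargs = {"DatabaseName": database_name}
    while True:
        page = glue.get_tables(**kwargs)
        names.extend(table["Name"] for table in page["TableList"])
        if not page.get("NextToken"):
            return names
        kwargs["NextToken"] = page["NextToken"]


def run_athena_query(athena, sql: str, database_name: str, output_location: str,
                     poll_seconds: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Submit a query and return its final state (SUCCEEDED, FAILED or CANCELLED)."""
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database_name},
        ResultConfiguration={"OutputLocation": output_location},  # e.g. s3://bucket/prefix/
    )["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state
        sleep(poll_seconds)  # still QUEUED or RUNNING
```

Passing the clients and the `sleep` function in, rather than creating them inside the functions, keeps the polling logic testable without AWS credentials.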

Help

Post a message on my blog (https://blog.subhamay.com).

Authors

Contributors' names and contact info

Subhamay Bhattacharyya - [email protected]

Version History

  • 0.1
    • Initial Release

License

None

Acknowledgments

Inspiration, code snippets, etc.
