This project contains applications required to load Snowplow data into relational databases.
RDB Shredder is a Spark job which:
- Reads Snowplow enriched events from S3
- Extracts any unstructured event JSONs and context JSONs found
- Validates that these JSONs conform to their schemas
- Adds metadata to these JSONs to track their origins
- Writes these JSONs out to nested folders determined by their schema
It is designed to be run by the EmrEtlRunner immediately after the Spark Enrich job.
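
The shredding steps listed above can be sketched roughly as follows. This is a plain-Python illustration, not the actual Spark job: the event is modelled as a dict with hypothetical `event_id`, `unstruct_event` and `contexts` keys, validation is a stand-in that only checks the shape of the self-describing JSON (no schema lookup is performed), and both the `origin` metadata field and the `vendor/name/format/version` folder layout are assumptions made for the example.

```python
import json
import re

# Iglu schema URI: iglu:<vendor>/<name>/<format>/<version>
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[\w.-]+)/(?P<name>[\w-]+)/(?P<format>[\w-]+)/(?P<version>\d+-\d+-\d+)$"
)


def extract_jsons(event):
    """Collect self-describing JSONs from the (assumed) JSON-string fields."""
    found = []
    if event.get("unstruct_event"):
        # Assumed wrapper: {"schema": ..., "data": <one self-describing JSON>}
        found.append(json.loads(event["unstruct_event"])["data"])
    if event.get("contexts"):
        # Assumed wrapper: {"schema": ..., "data": [<self-describing JSON>, ...]}
        found.extend(json.loads(event["contexts"])["data"])
    return found


def is_valid(entity):
    """Stand-in check: a well-formed Iglu URI and an object payload."""
    return (
        isinstance(entity, dict)
        and isinstance(entity.get("schema"), str)
        and IGLU_URI.match(entity["schema"]) is not None
        and isinstance(entity.get("data"), dict)
    )


def shred(event):
    """Return {folder: [json_line, ...]} for one event; invalid JSONs are skipped."""
    output = {}
    for entity in extract_jsons(event):
        if not is_valid(entity):
            continue
        parts = IGLU_URI.match(entity["schema"]).groupdict()
        # Hypothetical folder layout derived from the schema URI.
        folder = "{vendor}/{name}/{format}/{version}".format(**parts)
        # Hypothetical metadata linking the JSON back to its parent event.
        record = dict(entity, origin={"root_id": event["event_id"]})
        output.setdefault(folder, []).append(json.dumps(record, sort_keys=True))
    return output
```

Calling `shred(event)` on a single event returns a mapping from folder name to serialized JSON lines, so each schema ends up with its own output location. A real run would do this per partition in Spark and write the results to S3.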
RDB Loader (previously known as StorageLoader) is a Scala application that runs as an AWS EMR step, discovering the data produced by RDB Shredder and loading it into one of the possible storage targets.
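
To give a feel for what "discovering data" might involve, here is a minimal sketch that groups a listing of object keys by schema folder. It is written in Python for brevity and is not the loader's actual code: the `discover` function, the `shredded_prefix` argument and the four-segment `vendor/name/format/version` folder layout are all assumptions made for illustration.

```python
from collections import defaultdict


def discover(keys, shredded_prefix):
    """Group object keys under shredded_prefix by their schema folder.

    Assumes a hypothetical layout of
    <shredded_prefix>/<vendor>/<name>/<format>/<version>/<file>.
    """
    prefix = shredded_prefix.rstrip("/") + "/"
    by_schema = defaultdict(list)
    for key in keys:
        if not key.startswith(prefix):
            continue  # outside the shredded output
        segments = key[len(prefix):].split("/")
        if len(segments) < 5:
            continue  # marker files and anything not inside a schema folder
        by_schema["/".join(segments[:4])].append(key)
    return dict(by_schema)
```

Each entry in the result pairs one schema folder with the files to load for it, which is the kind of information a loader would need before issuing load statements against a storage target.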
Technical Docs | Setup Guide | Roadmap & Contributing |
---|---|---|
Technical Docs | Setup Guide | coming soon |
Snowplow Relational Database Loader is copyright 2012-2017 Snowplow Analytics Ltd.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.