Coder Social home page Coder Social logo

bcgov / gdx-analytics-microservice Goto Github PK

View Code? Open in Web Editor NEW
1.0 7.0 0.0 915 KB

The collection of GDX-Analytics Python microservices used to load and process data between systems and services.

License: Apache License 2.0

Python 88.62% PLpgSQL 11.38%
citz python automation microservices pipenv gdx-analytics

gdx-analytics-microservice's Introduction

GDX Analytics Microservice

img


The GDX Analytics microservice repository is the working space for the suite of Python based microservices supporting data retrieval, processing, loading, and other handling as part of our workflow.

Features

This repository has been structured to support packaging and distrubution. Pipenv is required as a dependency manager, and Python 3.7 to build the virtual environments. Each microservice script is stored in a subdirectory, and each subdirectory contains a README file detailing how to run the microservice. A folder named lib provides shared component modules. Under each microservice, lib gets installed into the Pipenv as an editable package, and then is imported by its relative path.

Project Status

This project is currently under development and actively supported by the GDX Analytics Team.

Contents by Directory:

The S3 to Redshift microservice will read the config json to determine the input data location, it's content and how to process that (including column data types, content replacements, datetime formats), and where to output the results (the Redshift table). Each processed file will land in a <bucket>/processed/ folder in S3, which can be /processed/good/ or /processed/bad/ depending on the success or failure of processing the input file. The Redshift COPY command is performed as a single transaction which will not commit the changes unless they are successful in the transaction.

The CMS Lite Metadata microservice emerged from a specialized use case of the S3 to Redshift microservice which required additional logic to build Lookup tables and Dictionary tables, as indicated though input data columns containing nested delimiters. To do so, it processes a single input csv file containing metadata about pages in CMS Lite, to generate several batch CSV files as a batch process. It then runs the COPY command on all of these files as a single Redshift transaction. As with the S3 to Redshift Microservice, The json configuration files specify the expected form of input data and output options.

The Google API microservices are a collection of scripts to automate the loading of data collected through various Google APIs such as the Google My Business API for Location and Driving Direction insights; and the Google Search Console API for Search result analytics. Upon accessing the requested data, the Google API microservices build an output csv file containing that data, and stores it into S3. From there, the loading of data from S3 to Redshift follows very closely to the flow described in the S3 to Redshift microservice.

The /sfts folder contains the Secure File Transfer System microservice. This was configured first to support Performance Management and Reporting Program (PMRP) data exchange. This microservice is triggered to run after the successful transfer of PMRP date into Redshift. The microservice first generates an object in S3 from the output of a Redshift transaction modelling PMRP data with other GDX Analytics data, and then transfers that object from S3 to an upload location on BCGov's Secure File Transfer Service. The microservice is two scripts; one to generate the objets in S3 based on Redshift queries (redshift_to_s3.py) and one to transfer previously un-transferred files from S3 to SFTS (s3_to_sfts.py).

The /lib folder contains the common components. As our microservices grow we are aiming to create shared patterns of use across them, and then modularize those shared patterns as reusable code. Eventually the components package may comprise a packaged application.

Related Repositories

  • This is the central repository for work by the GDX Analytics Team.

Getting Help or Reporting an Issue

For inquiries about starting a new analytics account please contact the GDX Analytics Team.

How to Contribute

If you would like to contribute, please see our CONTRIBUTING guideleines.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

License

Copyright 2015 Province of British Columbia

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.

gdx-analytics-microservice's People

Contributors

bjenning avatar bradleyoldham avatar cnburnett avatar danpollock avatar dependabot[bot] avatar doughon avatar lclarko avatar taylorfries avatar veenupunyani avatar vikasvermacertain avatar

Stargazers

 avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

gdx-analytics-microservice's Issues

It's Been a While Since This Repository has Been Updated

This issue is a kind reminder that your repository has been inactive for 181 days. Some repositories are maintained in accordance with business requirements that infrequently change thus appearing inactive, and some repositories are inactive because they are unmaintained.

To help differentiate products that are unmaintained from products that do not require frequent maintenance, repomountie will open an issue whenever a repository has not been updated in 180 days.

  • If this product is being actively maintained, please close this issue.
  • If this repository isn't being actively maintained anymore, please archive this repository. Also, for bonus points, please add a dormant or retired life cycle badge.

Thank you for your help ensuring effective governance of our open-source ecosystem!

Add missing topics

TL;DR

Topics greatly improve the discoverability of repos; please add the short code from the table below to the topics of your repo so that ministries can use GitHub's search to find out what repos belong to them and other visitors can find useful content (and reuse it!).

Why Topic

In short order we'll add our 800th repo. This large number clearly demonstrates the success of using GitHub and our Open Source initiative. This huge success means it's critical that we work to make our content as discoverable as possible. Through discoverability, we promote code reuse across a large decentralized organization like the Government of British Columbia as well as allow ministries to find the repos they own.

What to do

Below is a table of abbreviation a.k.a short codes for each ministry; they're the ones used in all @gov.bc.ca email addresses. Please add the short codes of the ministry or organization that "owns" this repo as a topic.

add a topic

That's it, you're done!!!

How to use

Once topics are added, you can use them in GitHub's search. For example, enter something like org:bcgov topic:citz to find all the repos that belong to Citizens' Services. You can refine this search by adding key words specific to a subject you're interested in. To learn more about searching through repos check out GitHub's doc on searching.

Pro Tip ๐Ÿค“

  • If your org is not in the list below, or the table contains errors, please create an issue here.

  • While you're doing this, add additional topics that would help someone searching for "something". These can be the language used javascript or R; something like opendata or data for data only repos; or any other key words that are useful.

  • Add a meaningful description to your repo. This is hugely valuable to people looking through our repositories.

  • If your application is live, add the production URL.

Ministry Short Codes

Short Code Organization Name
AEST Advanced Education, Skills & Training
AGRI Agriculture
ALC Agriculture Land Commission
AG Attorney General
MCF Children & Family Development
CITZ Citizens' Services
DBC Destination BC
EMBC Emergency Management BC
EAO Environmental Assessment Office
EDUC Education
EMPR Energy, Mines & Petroleum Resources
ENV Environment & Climate Change Strategy
FIN Finance
FLNR Forests, Lands, Natural Resource Operations & Rural Development
HLTH Health
IRR Indigenous Relations & Reconciliation
JEDC Jobs, Economic Development & Competitiveness
LBR Labour Policy & Legislation
LDB BC Liquor Distribution Branch
MMHA Mental Health & Addictions
MAH Municipal Affairs & Housing
BCPC Pension Corporation
PSA Public Service Agency
PSSG Public Safety and Solicitor General
SDPR Social Development & Poverty Reduction
TCA Tourism, Arts & Culture
TRAN Transportation & Infrastructure

NOTE See an error or omission? Please create an issue here to get it remedied.

It's Been a While Since This Repository has Been Updated

This issue is a kind reminder that your repository has been inactive for 180 days. Some repositories are maintained in accordance with business requirements that infrequently change thus appearing inactive, and some repositories are inactive because they are unmaintained.

To help differentiate products that are unmaintained from products that do not require frequent maintenance, repomountie will open an issue whenever a repository has not been updated in 180 days.

  • If this product is being actively maintained, please close this issue.
  • If this repository isn't being actively maintained anymore, please archive this repository. Also, for bonus points, please add a dormant or retired life cycle badge.

Thank you for your help ensuring effective governance of our open-source ecosystem!

Lets use common phrasing

TL;DR ๐ŸŽ๏ธ

Teams are encouraged to favour modern inclusive phrasing both in their communication as well as in any source checked into their repositories. You'll find a table at the end of this text with preferred phrasing to socialize with your team.

Words Matter

We're aligning our development community to favour inclusive phrasing for common technical expressions. There is a table below that outlines the phrases that are being retired along with the preferred alternatives.

During your team scrum, technical meetings, documentation, the code you write, etc. use the inclusive phrasing from the table below. That's it - it really is that easy.

For the curious mind, the Public Service Agency (PSA) has published a guide describing how Words Matter in our daily communication. Its an insightful read and a good reminder to be curious and open minded.

What about the master branch?

The word "master" is not inherently bad or non-inclusive. For example people get a masters degree; become a master of their craft; or master a skill. It's generally when the word "master" is used along side the word "slave" that it becomes non-inclusive.

Some teams choose to use the word main for the default branch of a repo as opposed to the more commonly used master branch. While it's not required or recommended, your team is empowered to do what works for them. If you do rename the master branch consider using main so that we have consistency among the repos within our organization.

Preferred Phrasing

Non-Inclusive Inclusive
Whitelist => Allowlist
Blacklist => Denylist
Master / Slave => Leader / Follower; Primary / Standby; etc
Grandfathered => Legacy status
Sanity check => Quick check; Confidence check; etc
Dummy value => Placeholder value; Sample value; etc

Pro Tip ๐Ÿค“

This list is not comprehensive. If you're aware of other outdated nomenclature please create an issue (PR preferred) with your suggestion.

It's Been a While Since This Repository has Been Updated

This issue is a kind reminder that your repository has been inactive for 371 days. Some repositories are maintained in accordance with business requirements that infrequently change thus appearing inactive, and some repositories are inactive because they are unmaintained.

To help differentiate products that are unmaintained from products that do not require frequent maintenance, repomountie will open an issue whenever a repository has not been updated in 180 days.

  • If this product is being actively maintained, please close this issue.
  • If this repository isn't being actively maintained anymore, please archive this repository. Also, for bonus points, please add a dormant or retired life cycle badge.

Thank you for your help ensuring effective governance of our open-source ecosystem!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.