googlecloudplatform / emblem

239 stars · 21 watchers · 61 forks · 3.4 MB

Emblem Giving is a sample application that demonstrates a serverless architecture with continuous delivery and trouble recovery.

License: Apache License 2.0

Languages: Python 32.09%, Shell 17.02%, JavaScript 19.97%, HCL 15.96%, HTML 11.42%, Dockerfile 1.82%, CSS 1.72%
Topics: serverless, continuous-delivery, sample-app, recoverability, architecture, google-cloud, google-cloud-run, samples

emblem's People

Contributors

anguillanneuf, averikitsch, dependabot[bot], dinagraves, engelke, glasnt, grayside, iennae, jping0220, kelsk, matyifkbt, mco-gh, muncus, pattishin, raffomartini, renovate-bot, rogerthatdev, smeet07


emblem's Issues

Create Terraform Continuous Deployment

Proposed Flow

  • A merge-to-main trigger starts a build that applies Terraform to the staging environment, runs some tests, and then publishes the results to Pub/Sub.
  • If the tests fail, the Terraform changes are reverted. If they pass, a message goes to Pub/Sub to apply the configuration in prod.
  • A Cloud Build Pub/Sub trigger starts a build and waits for human approval to apply the changes in prod.
  • Changes are then abandoned or applied.
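The publish step in the flow above could be sketched as follows. The topic name, field names, and "apply-prod"/"revert" actions are all illustrative assumptions, not a schema the project has agreed on:

```python
import json

def build_result_message(commit_sha, tests_passed):
    """Build the Pub/Sub payload the staging build would publish.

    Field names and actions here are hypothetical placeholders.
    """
    payload = {
        "commit": commit_sha,
        "tests_passed": tests_passed,
        "action": "apply-prod" if tests_passed else "revert",
    }
    return json.dumps(payload).encode("utf-8")

# Publishing would use google-cloud-pubsub (requires credentials):
#   from google.cloud import pubsub_v1
#   publisher = pubsub_v1.PublisherClient()
#   topic = publisher.topic_path("my-project", "terraform-results")
#   publisher.publish(topic, build_result_message("abc123", True)).result()
```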

Terraform Pipeline

(Journey) SJ14: Ensure repeatability of common tasks

Description

As a technical practitioner, in order to ensure common operations are done the same way every time by any member of the team, I would like automation, a standing execution environment, and scripts or tools (such as Terraform) to manage a library of playbook actions such as deployment.

Phase 1 Requirements

  • All scripts are meant to run locally, for a personal dev project, or as part of a CI operation
  • A project represents an environment; the region might need to change for disaster recovery, and other things may be parameters.

Related Decision Records

  • What environment should run our "remote operations" that are not continuous?
  • Should all operations be centralized, or should component-specific operations be nested inside the component for more independence?

Possible Future Enhancements

TBD

Decision: Content API database infrastructure

This is a follow-up to #14

The Content API as prototyped in #14 uses Cloud Firestore as a persistence layer, but we should make a deliberate choice of database infrastructure. Earlier designs suggested Cloud SQL on the principle that organizations moving to Serverless have existing knowledge in MySQL or PostgreSQL, making it the boring choice.

Setup GitHub repository foundation for code & project management

We have a bare repository! Now we need to set it up for work.

  • Base directory structure
  • Baseline gitignore
  • Default branch protection
  • CODEOWNERS file to define review policies
  • Issue labels to facilitate triage and archeology
  • Issue templates for feature, bug, and journey
  • Initialize Decision Records log
  • Create "emblem-team" as collaborators

More items might be added to this list in the first ~2 weeks; after that they will be tracked as individual chore issues.

Decision: website frontend framework

We should decide which frontend framework we want to use for the website.

@grayside has previously proposed the Material framework as a starting point.

Given that we're not focusing on the frontend too much and Material gives us lots of built-in widgets, I'm on board with this suggestion.

@grant does this sound OK to you?

(Journey) AJ2: Campaign Discovery

Description

As a potential contributor, in order to find an appealing giving campaign, I want to review a list of all active campaigns and read campaign details to make a decision.

Phase 1 Requirements

Completing all requirements does not mean this journey can be considered "healthy" and closed, but it does indicate that we haven't identified other tasks to be done.

  • View a list of active campaigns
  • Click to view dedicated page with campaign details
  • Start a donation from campaign view in list or detail page
  • API can serve a list of campaigns on request

Related Decision Records

<This section will be filled out as this journey's tasks pass through design>

Possible Future Enhancements

  • Page through listing of many campaigns, MVP can assume a limited number of campaigns until Campaign Creation is enabled via the UI.
  • Search experience to discover campaigns by title
  • Sorting controls, such as alphabetical by title vs. donations to date

Write Terraform file to provision project

Terraform will be used to add resources to the project (e.g., Pub/Sub topics, databases, service accounts). Terraform will not be used for service deployments.

Goal: Whenever an edit to the Terraform project is merged to main, it should be executed and update the project.

Expand the structure of the Decision Record log & process

This decision is suggested by #24

Proposal

  1. Each decision has its own markdown file.
  2. We have a template based on https://adr.github.io/madr/ and a naming scheme.
  3. In the future we could add front-matter with tags if we want to put together topical "arcs"
  4. We add a "needs decision" label to PRs and issues to indicate a decision record is expected

Problems this will solve

The initial decision record log opts for terse explanations of decisions in a single large file. This provides limited space to explore the full impact of a decision. Further, we currently have an unstructured approach to when decisions are needed, with an initial practice of filing "Decision:" feature requests and a hazy idea on "significant changes" in CONTRIBUTING.md.


Create strawman implementation of the API

Create a strawman implementation of the API to illustrate our ideas for the API design.
No expectation of support for creation methods working: data may be faked.

Proposal: Release Process

This issue is created to propose a release process. I want to get some early agreement here, then will convert to a PR documenting the process in more detail.

  • Commits will follow Conventional Commit messages
  • Versioning will follow Semantic Versioning.
  • Changelog & release automation with release-please.
    • release-please creates a PR to update the changelog; we tie release smoke tests into this PR
    • Human reviews the release and double-checks we shouldn't wait for any PRs in flight.
  • Manual steps to modify the release (automation TBD):
    • Add a button to deploy the release's version
    • Link to each decision made since the last release
    • Link to each User Journey that changed since the last release
  • Releases to amplify
    • If there are other release materials such as video or blog, cross-link.
    • Tweet the blog post
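To illustrate the first bullet, a minimal Conventional Commits header check might look like this (a sketch; the real specification allows more types, scoped footers, and body text):

```python
import re

# Minimal Conventional Commits header pattern: type, optional (scope),
# optional "!" breaking-change marker, then ": description".
COMMIT_RE = re.compile(
    r"^(feat|fix|docs|chore|refactor|test|ci|build)(\([\w-]+\))?!?: .+"
)

def is_conventional(header):
    """Return True if a commit message header follows the convention."""
    return COMMIT_RE.match(header) is not None
```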

(Journey) SJ4: Deploy a change

Description

As an application developer, I want to know how to set up a continuous delivery pipeline from source control to my serverless environment and know how to detect problems before they affect my entire user base.

Phase 1 Requirements

Completing all requirements does not mean this journey can be considered "healthy" and closed, but it does indicate that we haven't identified other tasks to be done.

  • After first setting up the project, it is possible to deploy infrastructure or code updates manually
  • When the main branch is updated, we are able to deliver updates based on code change
  • Delivery of code changes includes testing to make sure the change did not create an outage

Related Decision Records

<This section will be filled out as this journey's tasks pass through design>

Possible Future Enhancements

  • Canary deployments

Emblem Troubleshooting Demo Script

We want an early-stage demo of Emblem to include in outreach efforts.

A minimal demo of troubleshooting:

  • The app is running in the Cloud
  • User can visit the app and attempt a donation
  • An error occurs (perhaps the API serves 5xx errors for donation?)
  • Putting on the developer hat, the user dives into the console to see the error and follows it through logging and tracing to triangulate the problem

This issue captures adding this demo script to the repo.

API: Markdown API docs

Migrate existing and create new API documentation in Markdown format, and maintain it inside the repository.

(Journey) SJ18: Authenticate End Users

Description

As an application developer, in order to authenticate contributors and campaign owners, I want to know the easiest path to implement and operate a registration and user authentication system.

Phase 1 Requirements

  • Users visiting the webapp have a registration and login/logout flow
  • Content API has a basis of trust that user actions are authorized.
  • If "real demo users" engage with Emblem, there is no risk of PII exposure from our system

Related Decision Records

#117: Choose an authorization option for the website

Possible Future Enhancements

TBD

Create `setup.sh`

Create a shell script that should:

  • Create new project
  • Create all the resources required to recreate the emblem application

It should use Terraform for project creation and resource allocation, and Cloud Build for deploying resources.

(Journey) SJ5: Rollback a bad change

Description

As a release manager, in order to recover from a bad deployment, I want to know how to revert my application to the previous known working release.

Phase 1 Requirements

  • If a deploy of the Website fails, we can automatically/quickly restore functionality by remotely executing a pre-defined operation.
  • If a deploy of the Content API is found to have created an outage, we can quickly restore functionality by remotely executing a pre-defined operation.
  • This process is extensible to add support for state/schema management
  • There is documentation on how & when to perform a rollback

Related Decision Records

Decisions to make include:

  • How will rollback happen for Cloud Run?
  • How will rollback happen for Cloud Functions?
  • What pieces are automatic? Is human action or approval always required?

Possible Future Enhancements

  • Add schema management to the rollback process
  • Add notifications, such as filing a GitHub issue about the rollback.

Create Cloud Functions (Emblem API) Continuous Deployment

Proposed Flow:

  • A merge-to-main trigger starts a build that runs tests, builds the image, and uploads it to the storage solution.
  • The upload to the storage solution publishes a message to Pub/Sub. This is built-in functionality for GCS and Artifact Registry.
  • A Cloud Build Pub/Sub trigger is subscribed to the storage topic. It starts a build that pushes to staging, runs integration tests, and sends the results to a new Pub/Sub topic, which we will call "push-to-prod". This is a custom message containing the location/name of the image and a percentage of traffic to send to prod with the new image.
  • A second Cloud Build Pub/Sub trigger deploys a "green" deployment and directs that percentage of traffic to the new instance. This is done via blue/green deployments, by updating an environment variable in the Cloud Run website. It then runs more tests and sends a message back to the same "push-to-prod" topic with the ID or tag of the revision and the new percentage of traffic to redirect (current % + 5).
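The custom "push-to-prod" message described above might be shaped like this. The field names are hypothetical; only the image location, revision, and traffic percentage are called for in the flow:

```python
import json

def next_rollout_message(image, revision, current_percent):
    """Build the message sent back to "push-to-prod" after tests pass.

    Each round shifts 5% more traffic to the new revision, capped at 100.
    """
    return json.dumps({
        "image": image,
        "revision": revision,
        "traffic_percent": min(current_percent + 5, 100),
    }).encode("utf-8")
```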


Create error-handling (Flask) routes

We should create routes for handling errors (404, 500, etc.) so we can display:

  • branded error-handling pages, and
  • "teachable moment"-type instructions on how to fix the error
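A minimal sketch of such routes using Flask's errorhandler decorator. Inline template strings stand in for the branded templates we would actually create:

```python
from flask import Flask, render_template_string

app = Flask(__name__)

@app.errorhandler(404)
def page_not_found(e):
    # In the real app this would render a branded template
    # with "how to fix it" instructions.
    return render_template_string("<h1>Page not found</h1>"), 404

@app.errorhandler(500)
def server_error(e):
    return render_template_string("<h1>Something went wrong</h1>"), 500
```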

Decision: Use an OpenAPI specification for API design

OpenAPI is the leading system to describe REST APIs, and would open options for the following:

  • Code generation of client & server code
  • Pretty API documentation
  • Future integration support with Cloud API Gateway (when it adds openapiv3)

General things we might do (not OpenAPI-specific):

  • API design is tracked in the code repository
  • Changes to the API design require multiple approvals to ensure we're all in sync
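For a sense of scale, here is a minimal OpenAPI 3.0 document, expressed as a Python dict for illustration. The path and fields are hypothetical, not the agreed API design; in practice this would live as a YAML file in the repository and be reviewed like code:

```python
# A minimal OpenAPI 3.0 description of one endpoint (illustrative only).
OPENAPI_SPEC = {
    "openapi": "3.0.3",
    "info": {"title": "Emblem Content API", "version": "0.1.0"},
    "paths": {
        "/campaigns": {
            "get": {
                "summary": "List active campaigns",
                "responses": {"200": {"description": "A JSON list of campaigns"}},
            }
        }
    },
}
```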

(Journey) SJ10: Create a local development environment

Description

As a technical practitioner, in order to run the application locally, I want to know how to set up my local development environment within 15 minutes of getting started.

Phase 1 Requirements

  • Direct command-line way to spin up all necessary services to run locally. This might mean using "feature toggles" to use in-memory storage or locally running backing services such as a database.
  • Method in VS Code to set up services, along the same lines as the CLI requirement
  • Out-of-box configuration for common git and VS Code behavior, including such things as recommended VS Code extensions for our tech stack and gitattributes
  • The local setup is clearly documented without assuming existing tech stack knowledge
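One way to realize the "feature toggles" idea above is a store selector keyed on an environment variable. This is a hedged sketch: EMBLEM_USE_MEMORY_STORE and the store interface are invented for illustration:

```python
import os

class MemoryStore:
    """In-memory stand-in for a database, for local development."""
    def __init__(self):
        self._data = {}

    def save(self, key, value):
        self._data[key] = value

    def load(self, key):
        return self._data.get(key)

def get_store():
    # Toggle via environment variable so local runs need no cloud credentials.
    if os.environ.get("EMBLEM_USE_MEMORY_STORE") == "1":
        return MemoryStore()
    from google.cloud import firestore  # deferred import: needs credentials
    return firestore.Client()
```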

Related Decision Records

  • Why use VS Code as our "default" IDE?

Possible Future Enhancements

TBD

Decision: Content API Hosting Environment

This is a follow-up to #14.

#14 uses Cloud Run as a convenient deployment strategy for a prototype, but we want to make a more deliberate choice in permanent hosting environment.

Expected candidates are Cloud Run and Cloud Functions.

(Journey) SJ17: Limit blast radius of a security exploit

Description

As an operations engineer, in order to ensure a security exploit will have limited impact on my Google Cloud resources, I want to know how to apply the principle of least privilege confidently in a confusing IAM landscape.

Phase 1 Requirements

  • Each component has at least one dedicated service account
  • Each service account has the minimum roles necessary to perform work
  • Editor/Owner roles should be avoided except where strictly necessary (such as with Terraform)
  • IAM permissions should be scoped to individual resources, for example: accessing a specific secret, invoking a specific service.

Related Decision Records

<This section will be filled out as this journey's tasks pass through design>

Possible Future Enhancements

  • Use Recommendations API (e.g., gcloud recommender) to routinely check for IAM over-scoping
  • Use inspec.io or other auditing tool to verify the default compute account isn't in use

(Journey) AJ1: User Donates

Description

As a contributor, I make a contribution to an identified giving campaign.

Note: No financial transaction will take place.

Phase 1 Requirements

Completing all requirements does not mean this journey can be considered "healthy" and closed, but it does indicate that we haven't identified other tasks to be done.

  • User can submit a form to create a donation to a campaign via the website
  • API allows creation of a donation that will be recorded to the database.
  • API has a basis to trust the named donor

Related Decision Records

<This section will be filled out as this journey's tasks pass through design>

Possible Future Enhancements

  • Recurring contribution
  • Deferred contribution
  • Payment authorization workflow

(Journey) DJ2: Simulate Incidents

Description

As a demo deliverer, in order to demonstrate realistic challenges and the value of good observability instrumentation, I will emulate traffic, inject faults, and sometimes disable effective logging, error reporting, and user messages.

Phase 1 Requirements

  • Requests can be modified by the client to create a server error. For example, a faulty-request: true HTTP header that, if present, causes a 5xx response
  • The UI allows demoing with a fault on command, such as by having an extra button for "Faulty Donation" that causes a failure
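The first requirement could be sketched as a small Flask hook. The header name follows the example above; the route is hypothetical:

```python
from flask import Flask, abort, request

app = Flask(__name__)

@app.before_request
def maybe_inject_fault():
    # Demo fault injection: any request carrying this header gets a 500.
    if request.headers.get("faulty-request") == "true":
        abort(500)

@app.route("/donate", methods=["POST"])
def donate():
    return {"status": "ok"}
```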

Related Decision Records

TBD

Possible Future Enhancements

TBD

Add code style support for Python

Proposal

Follow-up to #6 with code style support for Python. Per @dinagraves, use black or flake8.

  • Add a github action that runs lint & makes code suggestions for python code.
  • If we have local dev setup in place, add the same lint configuration to run locally.
  • For local dev, document the tools to use to align with our python style.

Problems this will solve

  • Keep consistent code style
  • Reduce human attention during reviews to address common readability challenges.

How we work: GitHub Edition / User Journeys

There are a few process challenges we need to address on how we get work done, review work, and make sure we're meeting our "business goals" (the user journeys defined in the PRD). GitHub provides enough tooling to achieve these things, but we still have some decisions to make.

  • Work To Be Done: Issues are a natural fit.
  • Releases: We'll use releases.
  • Release Planning: I think we should use Milestones to define the scope of current/next/future releases. Emphasis on using version numbers for release planning and text like "roadmap-2022Q1" for future roadmapping that we want to leave in the issue queue.
  • User Journey Progress: This is the hard one. Some user journeys can be "complete". Others can be improved but not perfected. Still others could be undone by the next change. For this reason, we need to be able to associate any given issue or PR to one or more user journeys.
  • Issue Metadata: Labels support type, priority, and component tagging.

Proposal

  1. Every user journey is a sort of "meta" issue
  2. Journeys have priority (e.g., a P0 Journey should be expedited for project health)
  3. Closing a journey requires a "business analysis" rather than a search for open issues
  4. Closed journeys can be reopened if something violates their requirements
  5. Issues will reference one or more journeys (probably 2+)
  6. When an issue is closed, that's a good moment to look in on related journeys and evaluate if there is unrecognized work to be done to "fix" the journey.

If this approach seems good I will capture some of these details into the CONTRIBUTING.md.

(Journey) SJ3: Deploy a New Instance

Description

As a technical practitioner, in order to experiment with the Emblem app, I want to easily and quickly deploy it using my own account. This should be completed within 15 minutes and set a clear expectation of wait time.

Note: 15 minutes is a ceiling and we should expect drop-off of completion or follow-up exploration beyond 5 minutes.

Phase 1 Requirements (v0.3.0)

  • Given only an empty project, a single script can be executed to bootstrap it.
  • Deployment completes within 15 minutes, prefer 5-10 minutes to ensure we have space in the future

Phase 2 Requirements

  • Deploy with Cloud Shell makes a push-button option available
  • Deployment completion provides clear prompts to next steps, such as directing to filtered log views or launching a NEOS tutorial
  • We have deployment telemetry indicating how many unique deployed instances have been launched

Related Decision Records

<This section will be filled out as this journey's tasks pass through design>

Possible Future Enhancements

  • TBD

Decision: Folder structure of the website

We should make a few decisions about the website's overall folder layout.

In #38, I propose the following:

  • View functions will be stored in separate views/*.py files
  • Flask Blueprints will be used to "import" views into the main app.py file.
  • NEW Templates will be stored in templates/<TEMPLATE_GROUP>/*.html files, where TEMPLATE_GROUP is a grouping of templates by purpose (e.g. errors, donations, users, etc.)

If we decide to take this route, we'll have to refactor the existing view functions. (I'm happy to do this once we agree on a path forward.)
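A minimal sketch of the Blueprint wiring, collapsed into one file here; in the proposed layout the Blueprint would live in its own views/*.py module:

```python
from flask import Blueprint, Flask

# views/donations.py would define a Blueprint like this...
donations = Blueprint("donations", __name__)

@donations.route("/donations")
def list_donations():
    return {"donations": []}

# ...and app.py would import and register it.
app = Flask(__name__)
app.register_blueprint(donations)
```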

Decision: CSS splitting

Summary

We should decide how we'll organize CSS rules for the website

Options

Site-wide CSS
Site-wide CSS is easy to cache, is a good approach performance-wise, and requires no additional tooling.

However, it requires that we put all styles in a single file - which can be hard for developers to navigate through.

Site-wide + Per-page CSS 👎
Combining site-wide and per-page CSS also wouldn't require additional tooling, and is easier for developers to navigate through.

However, it's not the most performant solution - as a separate CSS file must be downloaded (and cached) for each page.

Packaged CSS 👎
We could use some pre-processing/build system to package separate CSS files into one combined site-wide file. This is easy for developers to navigate through and reason about, so long as one CSS file does not collide with any others.

(To reduce the probability of such collisions, we should encourage use of as-selective-as-possible CSS. This is probably more of a "best practice" than a team-specific decision.)

However, this would require an additional layer of tooling.

Recommendations

Personally, I'm 👎 on the site-wide option and open to either of the other two - but I don't feel strongly about this, and would appreciate hearing others' opinions.

We've decided to stick to the site-wide CSS approach for now, to keep our implementation simple.

Research GitHub Actions + Code Suggester for linters

Linting is a great way to enforce standards, but the work churn created by linting mistakes is frustrating. Investigate how to use googleapis/code-suggester with GitHub Actions to automatically "suggest" lint fixes to PRs so contributors can accept corrections with a couple clicks.

I consider this a P1 because I want to establish this as a key element of project productivity early.

Create Cloud Run Continuous Deployment Pipeline

Proposed Flow

  • A merge-to-main trigger starts a build that runs tests, builds the image, and uploads it to the storage solution.
  • The upload to the storage solution publishes a message to Pub/Sub. This is built-in functionality for GCS and Artifact Registry.
  • A Cloud Build Pub/Sub trigger is subscribed to the storage topic. It starts a build that pushes to staging, runs integration tests, and sends the results to a new Pub/Sub topic, which we will call "push-to-prod". This is a custom message containing the location/name of the image and a percentage of traffic to send to prod with the new image.
  • A second Cloud Build Pub/Sub trigger pushes to prod and directs that percentage of traffic to the new instance. It then runs more tests and sends a message back to the same "push-to-prod" topic with the ID or tag of the revision and the new percentage of traffic to redirect (current % + 5).


http://go/emblem-pipeline-design#heading=h.py1hyb96xmrt

Add code style support for Terraform

Proposal

Follow-up to #6 with code style support for Terraform. #6 has the GitHub Action inline, so the CI work for this should be copy-and-paste, plus (if needed) GitHub token management.

  • Add a github action that runs lint & makes code suggestions for terraform code.
  • If we have local dev setup in place, add the same lint configuration to run locally.
  • For local dev, document the tools to use to align with terraform style.

Problems this will solve

  • Keep consistent code style
  • Reduce human attention during reviews to address common readability challenges.

(Journey) SJ2: Cloud-based Software Evolution

Description

As a technical practitioner, I want to understand how to manage software change to account for the evolution of Google Cloud Products, industry recommended practices, language & framework features, application requirements, and the people on my team.

Phase 1 Requirements

Completing all requirements does not mean this journey can be considered "healthy" and closed, but it does indicate that we haven't identified other tasks to be done.

  • Contributing guidelines on decision records are in CONTRIBUTING.md
  • An ADR Log is in place to capture decisions.
  • Our tech stack is clearly documented
  • Decide as a team the initial practice is sufficient and sustainable, or fast-follow to phase 2, perhaps pulling from the future enhancements.

Related Decision Records

<This section will be filled out as this journey's tasks pass through design>

Possible Future Enhancements

  • Adopt a more mature decision record practice such as MADR format (not the associated tool, which is archived) or https://github.com/thomvaill/log4brains
  • Build topical views of decision records to support exploring the evolution of a specific concept such as "security" or "delivery". This might also be in the form of retrospective blog posts.
  • Build automation around decision record creation, such as requiring them in PRs with/without a certain label or automatically filing a task to create a decision record if a PR is merged without one.
  • Automate the association of decision records with releases or User Journeys

(Journey) SJ1: Explore Realistic Code

Description

As a technical practitioner, in order to learn what to expect of my code, I want to explore, step debug, and read technical design documents on a more complex sample.

Phase 1 Requirements

Completing all requirements does not mean this journey can be considered "healthy" and closed, but it does indicate that we haven't identified other tasks to be done.

  • TBD

Related Decision Records

TBD

Possible Future Enhancements

TBD

(Journey) SJ11: Incident Investigation

Description

As a technical practitioner, in order to address errors and outages, I want to know how to investigate a "production" problem, diagnose the cause, and communicate next steps. I expect to use logs, metrics, traces, and error reports.

Phase 1 Requirements

  • Logging: Use structured logging (JSON to stderr). Include component labels, severity levels, and trace IDs for request correlation
  • Error Reporting: Errors that represent a Google Cloud or Emblem failure should use Error Reporting to file a problem. Errors that result from user action should be logged as needed but may be better represented later in metrics.
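A minimal emitter matching the Logging requirement: JSON to stderr with severity, component, and a trace field. The logging.googleapis.com/trace key is what Cloud Logging uses to correlate log entries with traces:

```python
import json
import sys

def log(severity, message, component, trace=None):
    """Emit one structured log line as JSON on stderr and return it."""
    entry = {"severity": severity, "message": message, "component": component}
    if trace:
        # e.g. "projects/PROJECT_ID/traces/TRACE_ID"
        entry["logging.googleapis.com/trace"] = trace
    line = json.dumps(entry)
    print(line, file=sys.stderr)
    return line
```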

Phase 2

  • Tracing: Requests to all external services should be wrapped in spans and propagate the trace ID.

Related Decision Records

  • How we assemble structured logs
  • How we work with trace ID (e.g., manual/library selection)

Possible Future Enhancements

  • Cross-over with SJ12 and SJ13 to use built-in metrics or custom/log-based metrics to surface unexpected technical or business behaviors. (E.g., everything is working as expected, but donations are way down/consistently recorded as zero)
