googlecloudplatform / emblem

239 stars · 21 watchers · 61 forks · 3.4 MB

Emblem Giving is a sample application that demonstrates a serverless architecture with continuous delivery and trouble recovery.

License: Apache License 2.0

Languages: Python 32.09%, Shell 17.02%, JavaScript 19.97%, HCL 15.96%, HTML 11.42%, Dockerfile 1.82%, CSS 1.72%
Topics: serverless, continuous-delivery, sample-app, recoverability, architecture, google-cloud, google-cloud-run, samples

emblem's People

Contributors

anguillanneuf, averikitsch, dependabot[bot], dinagraves, engelke, glasnt, grayside, iennae, jping0220, kelsk, matyifkbt, mco-gh, muncus, pattishin, raffomartini, renovate-bot, rogerthatdev, smeet07


emblem's Issues

Create Terraform Continuous Deployment

Proposed Flow

  • A merge-to-main trigger starts a build that applies Terraform to the staging environment, runs some tests, and then publishes the results to Pub/Sub.
  • If the tests fail, the Terraform changes are reverted. If they pass, a message goes to Pub/Sub to apply the configuration in prod.
  • A Cloud Build Pub/Sub trigger starts a build and waits for human approval to apply the changes in prod.
  • Changes are then abandoned or applied.
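The publish step in the flow above could be sketched as follows. The topic name, field names, and "apply-prod"/"revert" actions are all illustrative assumptions, not a schema the project has agreed on:

```python
import json

def build_result_message(commit_sha, tests_passed):
    """Build the Pub/Sub payload the staging build would publish.

    Field names and actions here are hypothetical placeholders.
    """
    payload = {
        "commit": commit_sha,
        "tests_passed": tests_passed,
        "action": "apply-prod" if tests_passed else "revert",
    }
    return json.dumps(payload).encode("utf-8")

# Publishing would use google-cloud-pubsub (requires credentials):
#   from google.cloud import pubsub_v1
#   publisher = pubsub_v1.PublisherClient()
#   topic = publisher.topic_path("my-project", "terraform-results")
#   publisher.publish(topic, build_result_message("abc123", True)).result()
```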

Terraform Pipeline

(Journey) SJ14: Ensure repeatability of common tasks

Description

As a technical practitioner, in order to ensure common operations are done the same way every time by any member of the team, I would like automation, a standing execution environment, and scripts or tools (such as Terraform) to manage a library of playbook actions such as deployment.

Phase 1 Requirements

  • All scripts are meant to run locally, for a personal dev project, or as part of a CI operation
  • A project represents an environment; the region might need to change for disaster recovery, and other things may be parameters.

Related Decision Records

  • What environment should run our "remote operations" that are not continuous?
  • Should all operations be centralized, or should component-specific operations be nested inside the component for more independence?

Possible Future Enhancements

TBD

Decision: Content API database infrastructure

This is a follow-up to #14

The Content API as prototyped in #14 uses Cloud Firestore as a persistence layer, but we should make a deliberate choice of database infrastructure. Earlier designs suggested Cloud SQL on the principle that organizations moving to Serverless have existing knowledge in MySQL or PostgreSQL, making it the boring choice.

Setup GitHub repository foundation for code & project management

We have a bare repository! Now we need to set it up for work.

  • Base directory structure
  • Baseline gitignore
  • Default branch protection
  • CODEOWNERS file to define review policies
  • Issue labels to facilitate triage and archeology
  • Issue templates for feature, bug, and journey
  • Initialize Decision Records log
  • Create "emblem-team" as collaborators

More items might be added to this list in the first ~2 weeks; after that they will be tracked as individual chore issues.

Decision: website frontend framework

We should decide which frontend framework we want to use for the website.

@grayside has previously proposed the Material framework as a starting point.

Given that we're not focusing on the frontend too much and Material gives us lots of built-in widgets, I'm on board with this suggestion.

@grant does this sound OK to you?

(Journey) AJ2: Campaign Discovery

Description

As a potential contributor, in order to find an appealing giving campaign, I want to review a list of all active campaigns and read campaign details to make a decision.

Phase 1 Requirements

Completing all requirements does not mean this journey can be considered "healthy" and closed, but it does indicate that we haven't identified other tasks to be done.

  • View a list of active campaigns
  • Click to view dedicated page with campaign details
  • Start a donation from campaign view in list or detail page
  • API can serve a list of campaigns on request

Related Decision Records

<This section will be filled out as this journey's tasks pass through design>

Possible Future Enhancements

  • Page through listing of many campaigns, MVP can assume a limited number of campaigns until Campaign Creation is enabled via the UI.
  • Search experience to discover campaigns by title
  • Sorting controls, such as alphabetical by title vs. donations to date

Write Terraform file to provision project

Terraform will be used to add resources to the project (e.g., Pub/Sub topics, databases, service accounts). Terraform will not be used for service deployments.

Goal: Whenever an edit to the Terraform project is merged to main, it should be executed and update the project.

Expand the structure of the Decision Record log & process

This decision is suggested by #24

Proposal

  1. Each decision has its own markdown file.
  2. We have a template based on https://adr.github.io/madr/ and a naming scheme.
  3. In the future we could add front-matter with tags if we want to put together topical "arcs"
  4. We add a "needs decision" label to PRs and issues to indicate a decision record is expected

Problems this will solve

The initial decision record log opts for terse explanations of decisions in a single large file. This provides limited space to explore the full impact of a decision. Further, we currently have an unstructured approach to when decisions are needed, with an initial practice of filing "Decision:" feature requests and a hazy idea on "significant changes" in CONTRIBUTING.md.


Create strawman implementation of the API

Create a strawman implementation of the API to illustrate our ideas for the API design.
No expectation of support for creation methods working: data may be faked.

Proposal: Release Process

This issue is created to propose a release process. I want to get some early agreement here, then will convert to a PR documenting the process in more detail.

  • Commits will follow Conventional Commit messages
  • Versioning will follow Semantic Versioning.
  • Changelog & release automation with release-please.
    • release-please creates a PR to update the changelog; we tie release smoke tests into this PR
    • Human reviews the release and double-checks we shouldn't wait for any PRs in flight.
  • Manual steps to modify the release (automation TBD):
    • Add a button to deploy the release's version
    • Link to each decision made since the last release
    • Link to each User Journey that changed since the last release
  • Releases to amplify
    • If there are other release materials such as video or blog, cross-link.
    • Tweet the blog post
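To illustrate the first bullet, a minimal Conventional Commits header check might look like this (a sketch; the real specification allows more types, scoped footers, and body text):

```python
import re

# Minimal Conventional Commits header pattern: type, optional (scope),
# optional "!" breaking-change marker, then ": description".
COMMIT_RE = re.compile(
    r"^(feat|fix|docs|chore|refactor|test|ci|build)(\([\w-]+\))?!?: .+"
)

def is_conventional(header):
    """Return True if a commit message header follows the convention."""
    return COMMIT_RE.match(header) is not None
```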

(Journey) SJ4: Deploy a change

Description

As an application developer, I want to know how to set up a continuous delivery pipeline from source control to my serverless environment and know how to detect problems before they affect my entire user base.

Phase 1 Requirements

Completing all requirements does not mean this journey can be considered "healthy" and closed, but it does indicate that we haven't identified other tasks to be done.

  • After first setting up the project, it is possible to deploy infrastructure or code updates manually
  • When the main branch is updated, we are able to deliver updates based on code change
  • Delivery of code changes includes testing to make sure the change did not create an outage

Related Decision Records

<This section will be filled out as this journey's tasks pass through design>

Possible Future Enhancements

  • Canary deployments

Emblem Troubleshooting Demo Script

We want an early-stage demo of Emblem to include in outreach efforts.

A minimal demo of troubleshooting:

  • The app is running in the Cloud
  • User can visit the app and attempt a donation
  • An error occurs (perhaps the API serves 5xx errors for donation?)
  • Putting on the developer hat, the user dives into the console to see the error and follows it through logging and tracing to triangulate the problem

This issue captures adding this demo script to the repo.

API: Markdown API docs

Migrate existing and create new API documentation in Markdown format, and maintain it inside the repository.

(Journey) SJ18: Authenticate End Users

Description

As an application developer, in order to authenticate contributors and campaign owners, I want to know the easiest path to implement and operate a registration and user authentication system.

Phase 1 Requirements

  • Users visiting the webapp have a registration and login/logout flow
  • Content API has a basis of trust that user actions are authorized.
  • If "real demo users" engage with Emblem, there is no risk of PII exposure from our system

Related Decision Records

#117: Choose an authorization option for the website

Possible Future Enhancements

TBD

Create `setup.sh`

Create a shell script that should:

  • Create new project
  • Create all the resources required to recreate the emblem application

It should use Terraform for project creation and resource allocation, and Cloud Build for deploying resources.

(Journey) SJ5: Rollback a bad change

Description

As a release manager, in order to recover from a bad deployment, I want to know how to revert my application to the previous known working release.

Phase 1 Requirements

  • If a deploy of the Website fails, we can automatically/quickly restore functionality by remotely executing a pre-defined operation.
  • If a deploy of the Content API is found to have created an outage, we can quickly restore functionality by remotely executing a pre-defined operation.
  • This process is extensible to add support for state/schema management
  • There is documentation on how & when to perform a rollback

Related Decision Records

Decisions to make include:

  • How will rollback happen for Cloud Run?
  • How will rollback happen for Cloud Functions?
  • What pieces are automatic? Is human action or approval always required?

Possible Future Enhancements

  • Add schema management to the rollback process
  • Add notifications, such as filing a GitHub issue about the rollback.

Create Cloud Functions (Emblem API) Continuous Deployment

Proposed Flow:

  • A merge-to-main trigger starts a build that runs tests, builds the image, and uploads it to the storage solution.
  • The upload to the storage solution publishes a message to Pub/Sub. This is built-in functionality for GCS and Artifact Registry.
  • A Cloud Build Pub/Sub trigger is subscribed to the storage topic. It starts a build that pushes to staging, runs integration tests, and sends the results to a new Pub/Sub topic, which we will call "push-to-prod". This is a custom message containing the location/name of the image and a percentage of traffic to send to prod with the new image.
  • A second Cloud Build Pub/Sub trigger deploys a "green" deployment and directs that percentage of traffic to the new instance. This is done via blue/green deployments, by updating an environment variable in the Cloud Run website. It then runs more tests and sends a message back to the same "push-to-prod" topic with the ID or tag of the revision and the new percentage of traffic to redirect (current % + 5).
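The custom "push-to-prod" message described above might be shaped like this. The field names are hypothetical; only the image location, revision, and traffic percentage are called for in the flow:

```python
import json

def next_rollout_message(image, revision, current_percent):
    """Build the message sent back to "push-to-prod" after tests pass.

    Each round shifts 5% more traffic to the new revision, capped at 100.
    """
    return json.dumps({
        "image": image,
        "revision": revision,
        "traffic_percent": min(current_percent + 5, 100),
    }).encode("utf-8")
```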


Create error-handling (Flask) routes

We should create routes for handling errors (404, 500, etc.) so we can display:

  • branded error-handling pages, and
  • "teachable moment"-type instructions on how to fix the error
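A minimal sketch of such routes using Flask's errorhandler decorator. Inline template strings stand in for the branded templates we would actually create:

```python
from flask import Flask, render_template_string

app = Flask(__name__)

@app.errorhandler(404)
def page_not_found(e):
    # In the real app this would render a branded template
    # with "how to fix it" instructions.
    return render_template_string("<h1>Page not found</h1>"), 404

@app.errorhandler(500)
def server_error(e):
    return render_template_string("<h1>Something went wrong</h1>"), 500
```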

Decision: Use an OpenAPI specification for API design

OpenAPI is the leading system to describe REST APIs, and would open options for the following:

  • Code generation of client & server code
  • Pretty API documentation
  • Future integration support with Cloud API Gateway (when it adds openapiv3)

General things we might do (not OpenAPI-specific):

  • API design is tracked in the code repository
  • Changes to the API design require multiple approvals to ensure we're all in sync
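For a sense of scale, here is a minimal OpenAPI 3.0 document, expressed as a Python dict for illustration. The path and fields are hypothetical, not the agreed API design; in practice this would live as a YAML file in the repository and be reviewed like code:

```python
# A minimal OpenAPI 3.0 description of one endpoint (illustrative only).
OPENAPI_SPEC = {
    "openapi": "3.0.3",
    "info": {"title": "Emblem Content API", "version": "0.1.0"},
    "paths": {
        "/campaigns": {
            "get": {
                "summary": "List active campaigns",
                "responses": {"200": {"description": "A JSON list of campaigns"}},
            }
        }
    },
}
```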

(Journey) SJ10: Create a local development environment

Description

As a technical practitioner, in order to run the application locally, I want to know how to set up my local development environment within 15 minutes of getting started.

Phase 1 Requirements

  • Direct command-line way to spin up all necessary services to run locally. This might mean using "feature toggles" to use in-memory storage or locally running backing services such as a database.
  • Method in VS Code to set up services, along the same lines as the CLI requirement
  • Out-of-box configuration for common git and VS Code behavior, including such things as recommended VS Code extensions for our tech stack and gitattributes
  • The local setup is clearly documented without assuming existing tech stack knowledge
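One way to realize the "feature toggles" idea above is a store selector keyed on an environment variable. This is a hedged sketch: EMBLEM_USE_MEMORY_STORE and the store interface are invented for illustration:

```python
import os

class MemoryStore:
    """In-memory stand-in for a database, for local development."""
    def __init__(self):
        self._data = {}

    def save(self, key, value):
        self._data[key] = value

    def load(self, key):
        return self._data.get(key)

def get_store():
    # Toggle via environment variable so local runs need no cloud credentials.
    if os.environ.get("EMBLEM_USE_MEMORY_STORE") == "1":
        return MemoryStore()
    from google.cloud import firestore  # deferred import: needs credentials
    return firestore.Client()
```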

Related Decision Records

  • Why use VS Code as our "default" IDE?

Possible Future Enhancements

TBD

Decision: Content API Hosting Environment

This is a follow-up to #14.

#14 uses Cloud Run as a convenient deployment strategy for a prototype, but we want to make a more deliberate choice in permanent hosting environment.

Expected candidates are Cloud Run and Cloud Functions.

(Journey) SJ17: Limit blast radius of a security exploit

Description

As an operations engineer, in order to ensure a security exploit will have limited impact on my Google Cloud resources, I want to know how to apply the principle of least privilege confidently in a confusing IAM landscape.

Phase 1 Requirements

  • Each component has at least one dedicated service account
  • Each service account has the minimum roles necessary to perform work
  • Editor/Owner roles should be avoided except where strictly necessary (such as with Terraform)
  • IAM permissions should be scoped to individual resources, for example: accessing a specific secret, invoking a specific service.

Related Decision Records

<This section will be filled out as this journey's tasks pass through design>

Possible Future Enhancements

  • Use Recommendations API (e.g., gcloud recommender) to routinely check for IAM over-scoping
  • Use inspec.io or other auditing tool to verify the default compute account isn't in use

(Journey) AJ1: User Donates

Description

As a contributor, I make a contribution to an identified giving campaign.

Note: No financial transaction will take place.

Phase 1 Requirements

Completing all requirements does not mean this journey can be considered "healthy" and closed, but it does indicate that we haven't identified other tasks to be done.

  • User can submit a form to create a donation to a campaign via the website
  • API allows creation of a donation that will be recorded to the database.
  • API has a basis to trust the named donor

Related Decision Records

<This section will be filled out as this journey's tasks pass through design>

Possible Future Enhancements

  • Recurring contribution
  • Deferred contribution
  • Payment authorization workflow

(Journey) DJ2: Simulate Incidents

Description

As a demo deliverer, in order to demonstrate realistic challenges and the value of good observability instrumentation, I will emulate traffic, inject faults, and sometimes disable effective logging, error reporting, and user messages.

Phase 1 Requirements

  • Requests can be modified by the client to create a server error. For example, a faulty-request: true HTTP header that, if present, causes a 5xx response
  • The UI allows demoing with a fault on command, such as by having an extra button for "Faulty Donation" that causes a failure
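The first requirement could be sketched as a small Flask hook. The header name follows the example above; the route is hypothetical:

```python
from flask import Flask, abort, request

app = Flask(__name__)

@app.before_request
def maybe_inject_fault():
    # Demo fault injection: any request carrying this header gets a 500.
    if request.headers.get("faulty-request") == "true":
        abort(500)

@app.route("/donate", methods=["POST"])
def donate():
    return {"status": "ok"}
```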

Related Decision Records

TBD

Possible Future Enhancements

TBD

Add code style support for Python

Proposal

Follow-up to #6 with code style support for Python. Per @dinagraves, use black or flake8.

  • Add a github action that runs lint & makes code suggestions for python code.
  • If we have local dev setup in place, add the same lint configuration to run locally.
  • For local dev, document the tools to use to align with our python style.

Problems this will solve

  • Keep consistent code style
  • Reduce human attention during reviews to address common readability challenges.

How we work: GitHub Edition / User Journeys

There are a few process challenges we need to address on how we get work done, review work, and make sure we're meeting our "business goals" (the user journeys defined in the PRD). GitHub provides enough tooling to achieve these things, but we still have some decisions to make.

  • Work To Be Done: Issues are a natural fit.
  • Releases: We'll use releases.
  • Release Planning: I think we should use Milestones to define the scope of current/next/future releases. Emphasis on using version numbers for release planning and text like "roadmap-2022Q1" for future roadmapping that we want to leave in the issue queue.
  • User Journey Progress: This is the hard one. Some user journeys can be "complete". Others can be improved but not perfected. Still others could be undone by the next change. For this reason, we need to be able to associate any given issue or PR to one or more user journeys.
  • Issue Metadata: Labels support type, priority, and component tagging.

Proposal

  1. Every user journey is a sort of "meta" issue
  2. Journeys have priority (e.g., a P0 Journey should be expedited for project health)
  3. Closing a journey requires a "business analysis" rather than a search for open issues
  4. Closed journeys can be reopened if something violates their requirements
  5. Issues will reference one or more journeys (probably 2+)
  6. When an issue is closed, that's a good moment to look in on related journeys and evaluate if there is unrecognized work to be done to "fix" the journey.

If this approach seems good I will capture some of these details into the CONTRIBUTING.md.

(Journey) SJ3: Deploy a New Instance

Description

As a technical practitioner, in order to experiment with the Emblem app, I want to easily and quickly deploy it using my own account. This should be completed within 15 minutes and set a clear expectation of wait time.

Note: 15 minutes is a ceiling and we should expect drop-off of completion or follow-up exploration beyond 5 minutes.

Phase 1 Requirements (v0.3.0)

  • Given only an empty project, a single script can be executed to bootstrap it.
  • Deployment completes within 15 minutes, prefer 5-10 minutes to ensure we have space in the future

Phase 2 Requirements

  • Deploy with Cloud Shell makes a push-button option available
  • Deployment completion provides clear prompts to next steps, such as directing to filtered log views or launching a NEOS tutorial
  • We have deployment telemetry indicating how many unique deployed instances have been launched

Related Decision Records

<This section will be filled out as this journey's tasks pass through design>

Possible Future Enhancements

  • TBD

Decision: Folder structure of the website

We should make a few decisions about the website's overall folder layout.

In #38, I propose the following:

  • View functions will be stored in separate views/*.py files
  • Flask Blueprints will be used to "import" views into the main app.py file.
  • NEW Templates will be stored in templates/<TEMPLATE_GROUP>/*.html files, where TEMPLATE_GROUP is a grouping of templates by purpose (e.g. errors, donations, users, etc.)

If we decide to take this route, we'll have to refactor the existing view functions. (I'm happy to do this once we agree on a path forward.)
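A minimal sketch of the Blueprint wiring, collapsed into one file here; in the proposed layout the Blueprint would live in its own views/*.py module:

```python
from flask import Blueprint, Flask

# views/donations.py would define a Blueprint like this...
donations = Blueprint("donations", __name__)

@donations.route("/donations")
def list_donations():
    return {"donations": []}

# ...and app.py would import and register it.
app = Flask(__name__)
app.register_blueprint(donations)
```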

Decision: CSS splitting

Summary

We should decide how we'll organize CSS rules for the website

Options

Site-wide CSS
Site-wide CSS is easy to cache, is a good approach performance-wise, and requires no additional tooling.

However, it requires that we put all styles in a single file - which can be hard for developers to navigate through.

Site-wide + Per-page CSS 👎
Combining site-wide and per-page CSS also wouldn't require additional tooling, and is easier for developers to navigate through.

However, it's not the most performant solution - as a separate CSS file must be downloaded (and cached) for each page.

Packaged CSS 👎
We could use some pre-processing/build system to package separate CSS files into one combined site-wide file. This is easy for developers to navigate through and reason about, so long as one CSS file does not collide with any others.

(To reduce the probability of such collisions, we should encourage use of as-selective-as-possible CSS. This is probably more of a "best practice" than a team-specific decision.)

However, this would require an additional layer of tooling.

Recommendations

Personally, I'm 👎 on the site-wide option and open to either of the other two - but I don't feel strongly about this, and would appreciate hearing others' opinions.

We've decided to stick to the site-wide CSS approach for now, to keep our implementation simple.

Research GitHub Actions + Code Suggester for linters

Linting is a great way to enforce standards, but the work churn created by linting mistakes is frustrating. Investigate how to use googleapis/code-suggester with GitHub Actions to automatically "suggest" lint fixes to PRs so contributors can accept corrections with a couple clicks.

I consider this a P1 because I want to establish this as a key element of project productivity early.

Create Cloud Run Continuous Deployment Pipeline

Proposed Flow

  • A merge-to-main trigger starts a build that runs tests, builds the image, and uploads it to the storage solution.
  • The upload to the storage solution publishes a message to Pub/Sub. This is built-in functionality for GCS and Artifact Registry.
  • A Cloud Build Pub/Sub trigger is subscribed to the storage topic. It starts a build that pushes to staging, runs integration tests, and sends the results to a new Pub/Sub topic, which we will call "push-to-prod". This is a custom message containing the location/name of the image and a percentage of traffic to send to prod with the new image.
  • A second Cloud Build Pub/Sub trigger pushes to prod and directs that percentage of traffic to the new instance. It then runs more tests and sends a message back to the same "push-to-prod" topic with the ID or tag of the revision and the new percentage of traffic to redirect (current % + 5).


http://go/emblem-pipeline-design#heading=h.py1hyb96xmrt

Add code style support for Terraform

Proposal

Follow-up to #6 with code style support for Terraform. #6 has the GitHub Action inline, so the CI work for this should be copy-and-paste, plus (if needed) GitHub token management.

  • Add a github action that runs lint & makes code suggestions for terraform code.
  • If we have local dev setup in place, add the same lint configuration to run locally.
  • For local dev, document the tools to use to align with terraform style.

Problems this will solve

  • Keep consistent code style
  • Reduce human attention during reviews to address common readability challenges.

(Journey) SJ2: Cloud-based Software Evolution

Description

As a technical practitioner, I want to understand how to manage software change to account for the evolution of Google Cloud Products, industry recommended practices, language & framework features, application requirements, and the people on my team.

Phase 1 Requirements

Completing all requirements does not mean this journey can be considered "healthy" and closed, but it does indicate that we haven't identified other tasks to be done.

  • Contributing guidelines on decision records are in CONTRIBUTING.md
  • An ADR Log is in place to capture decisions.
  • Our tech stack is clearly documented
  • Decide as a team the initial practice is sufficient and sustainable, or fast-follow to phase 2, perhaps pulling from the future enhancements.

Related Decision Records

<This section will be filled out as this journey's tasks pass through design>

Possible Future Enhancements

  • Adopt a more mature decision record practice such as MADR format (not the associated tool, which is archived) or https://github.com/thomvaill/log4brains
  • Build topical views of decision records to support exploring the evolution of a specific concept such as "security" or "delivery". This might also be in the form of retrospective blog posts.
  • Build automation around decision record creation, such as requiring them in PRs with/without a certain label or automatically filing a task to create a decision record if a PR is merged without one.
  • Automate the association of decision records with releases or User Journeys

(Journey) SJ1: Explore Realistic Code

Description

As a technical practitioner, in order to learn what to expect of my code, I want to explore, step debug, and read technical design documents on a more complex sample.

Phase 1 Requirements

Completing all requirements does not mean this journey can be considered "healthy" and closed, but it does indicate that we haven't identified other tasks to be done.

  • TBD

Related Decision Records

TBD

Possible Future Enhancements

TBD

(Journey) SJ11: Incident Investigation

Description

As a technical practitioner, in order to address errors and outages, I want to know how to investigate a "production" problem, diagnose the cause, and communicate next steps. I expect to use logs, metrics, traces, and error reports.

Phase 1 Requirements

  • Logging: Use structured logging (JSON to stderr). Include component labels, severity levels, and trace IDs for request correlation
  • Error Reporting: Errors that represent a Google Cloud or Emblem failure should use Error Reporting to file a problem. Errors that result from user action should be logged as needed but may be better represented later in metrics.
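A minimal emitter matching the Logging requirement: JSON to stderr with severity, component, and a trace field. The logging.googleapis.com/trace key is what Cloud Logging uses to correlate log entries with traces:

```python
import json
import sys

def log(severity, message, component, trace=None):
    """Emit one structured log line as JSON on stderr and return it."""
    entry = {"severity": severity, "message": message, "component": component}
    if trace:
        # e.g. "projects/PROJECT_ID/traces/TRACE_ID"
        entry["logging.googleapis.com/trace"] = trace
    line = json.dumps(entry)
    print(line, file=sys.stderr)
    return line
```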

Phase 2

  • Tracing: Requests to all external services should be wrapped in spans and propagate the trace ID.

Related Decision Records

  • How we assemble structured logs
  • How we work with trace ID (e.g., manual/library selection)

Possible Future Enhancements

  • Cross-over with SJ12 and SJ13 to use built-in metrics or custom/log-based metrics to surface unexpected technical or business behaviors. (E.g., everything is working as expected, but donations are way down/consistently recorded as zero)
