
doc-pipeline's Introduction

Google Cloud Platform document pipeline

doc-pipeline converts DocFX YAML to HTML. You can run it locally, but it is set up to run periodically to generate new docs in production/staging/dev.

doc-pipeline uses docker-ci-helper to facilitate running Docker. See the instructions below for how to test and run locally.

doc-pipeline also depends on docuploader to compress and upload a directory to Google Cloud Storage.

Using doc-pipeline

doc-pipeline is only for converting DocFX YAML to HTML suitable for cloud.google.com.

You can generate DocFX YAML using language-specific generators.

What to do during library releases

Here is how to use doc-pipeline. All of the steps except the credential setup should be automated/scripted as part of the release process.

  1. Fetch the credentials to be able to upload to the bucket. Add the following to your Kokoro build config:

    before_action {
       fetch_keystore {
          keystore_resource {
             keystore_config_id: 73713
             keyname: "docuploader_service_account"
          }
       }
    }
    
  2. Generate DocFX YAML. Usually, this is done as part of the library release process.

    • Note: You must include a toc.yml file. However, do not include a docfx.json file because doc-pipeline generates one for you.
    • You can check the HTML looks OK by running the pipeline locally (see below).
  3. Change to the directory that contains all of the .yml files.

  4. Create a docs.metadata file in the same directory as the YAML:

    docuploader create-metadata
    

    Add flags to specify the language, package, version etc. See docuploader.

  5. Upload the YAML with the docfx prefix:

    docuploader upload --staging-bucket docs-staging-v2-staging --destination-prefix docfx .
    

    There are also docs-staging-v2 (production) and docs-staging-v2-dev (development) buckets. Use the -staging bucket until your HTML format is confirmed to be correct.

  6. That's it! doc-pipeline periodically runs, generates the HTML for new docfx-* tarballs, and uploads the resulting HTML to the same bucket. The HTML has the same name as the DocFX tarball, except it doesn't have the docfx prefix.
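For illustration, here is a minimal Python sketch of that naming rule (the blob name is hypothetical):

# The HTML tarball keeps the DocFX tarball's name, minus the docfx- prefix.
docfx_blob = "docfx-python-my-pkg-1.0.0.tar.gz"  # hypothetical input name
html_blob = docfx_blob[len("docfx-"):]
print(html_blob)  # python-my-pkg-1.0.0.tar.gz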

Cross references

DocFX supports cross references using xrefmap files. Each file maps a UID to the URL for that object. The xref map files are automatically generated when DocFX generates docs. One generation job can refer to other xref map files to be able to link to those objects.

Here's how it works in doc-pipeline:

  1. When we convert the YAML to HTML, we upload two things:
    1. The resulting HTML content (in a tarball).
    2. The xref map file to the xrefs directory of the bucket. You can see them all using gsutil ls gs://docs-staging-v2/xrefs.
  2. You can use the xref-services argument for docuploader create-metadata to refer to cross reference services.
  3. If one package wants to use the xref map from another doc-pipeline package, you need to configure it. Use the xrefs argument of docuploader create-metadata to specify the xref map files you need. Use the following format:
    • devsite://lang/library[@version]: If no version is given, the SemVer latest is used. For example, devsite://dotnet/my-pkg@1.0.0 would lead to the xref map at gs://docs-staging-v2/xrefs/dotnet-my-pkg-1.0.0.tar.gz.yml. devsite://dotnet/my-pkg would get the latest version of my-pkg.
  4. doc-pipeline will then download and use the specified xref maps. If an xref map cannot be found, a warning is logged, but the build does not fail. Because of this, you can generate docs that depend on each other in any order: if a dependency doesn't exist yet, the next regeneration will pick it up.
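As a rough illustration, here is a minimal Python sketch of how a devsite:// spec maps to an xref map object in the bucket (the function name is made up; the latest-version lookup is left to the caller):

def xref_spec_to_blob_name(spec, latest_version=None):
    # "devsite://dotnet/my-pkg@1.0.0" -> "xrefs/dotnet-my-pkg-1.0.0.tar.gz.yml"
    path = spec[len("devsite://"):]
    lang, _, library = path.partition("/")
    name, _, version = library.partition("@")
    if not version:
        # With no version in the spec, the pipeline uses the SemVer-latest
        # version; in this sketch the caller supplies it.
        version = latest_version
    return f"xrefs/{lang}-{name}-{version}.tar.gz.yml"

xref_spec_to_blob_name("devsite://dotnet/my-pkg@1.0.0")
# -> "xrefs/dotnet-my-pkg-1.0.0.tar.gz.yml"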

How to regenerate the HTML

You can regenerate all HTML by setting FORCE_GENERATE_ALL=true when triggering the job.

You can regenerate the HTML for a single blob by setting SOURCE_BLOB=docfx-lang-pkg-version.tgz when triggering the job.

If you want to use a different bucket than the default, set SOURCE_BUCKET.

Deleting old content

You can delete old tarballs using the delete-blob job. Trigger the job with the BLOB_TO_DELETE environment variable set to the full name of the blob you want to delete, for example gs://my-bucket/lang-library-1.0.0.tar.gz.

Be sure to delete the docfx- and non-docfx- tarballs! Also, after deleting the tarball, be sure to delete the content on the site; it is not automatically deleted.

Development

Environment variables

See .trampolinerc for the canonical list of relevant environment variables.

  • TESTING_BUCKET: Set when running tests. See the Testing section.
  • SOURCE_BUCKET: The bucket to use for regeneration. See Running locally.
  • SOURCE_BLOB: A single blob to regenerate. Only the blob name - do not include gs:// or the bucket.
  • LANGUAGE: Regenerates all docs under the specified language. For example: LANGUAGE=dotnet
  • FORCE_GENERATE_ALL: Set to true to regenerate all docs.
  • FORCE_GENERATE_LATEST: Set to true to regenerate all latest versions of docs.
  • BLOB_TO_DELETE: Blob to delete from storage. Include full bucket and object name. For example: gs://my-bucket/docfx-python-test-tarball.tar.gz
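For orientation, a minimal Python sketch of reading these variables (the default bucket and any precedence between the variables are assumptions, not documented behavior):

import os

source_bucket = os.environ.get("SOURCE_BUCKET", "docs-staging-v2")  # default bucket assumed
source_blob = os.environ.get("SOURCE_BLOB")            # regenerate a single blob
language = os.environ.get("LANGUAGE")                  # regenerate one language
force_all = os.environ.get("FORCE_GENERATE_ALL") == "true"
force_latest = os.environ.get("FORCE_GENERATE_LATEST") == "true"
blob_to_delete = os.environ.get("BLOB_TO_DELETE")      # used by the delete-blob job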

Formatting and style

Formatting is done with black and style is verified with flake8.

You can check everything is correct by running:

black --check docpipeline tests
flake8 docpipeline tests

If a file is not properly formatted, you can fix it with:

black docpipeline tests

Testing

  1. Create a testing Cloud Storage bucket (my-bucket).
  2. Run the tests.
    • Docker:
      1. Copy a service account with permission to access my-bucket to /dev/shm/73713_docuploader_service_account.
      2. Run the following command, replacing my-bucket with your development bucket:
        TEST_BUCKET=my-bucket TRAMPOLINE_BUILD_FILE=./ci/run_tests.sh TRAMPOLINE_IMAGE=gcr.io/cloud-devrel-kokoro-resources/docfx TRAMPOLINE_DOCKERFILE=docfx/Dockerfile ci/trampoline_v2.sh
        
      3. To update goldens, add the UPDATE_GOLDENS=1 environment variable.
    • Local:
      1. pip install -e .
      2. pip install pytest
      3. black --check tests
        flake8 tests
        GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json TEST_BUCKET=my-bucket pytest tests
        
      4. To update goldens, add the --update-goldens flag:
        pytest --update-goldens tests
        

Running locally for one package

  1. Create a directory inside doc-pipeline (for example, my-dir).
  2. Create a docs.metadata file in my-dir. You can copy one from here.
  3. Move or copy the .yml files for one package to my-dir.
  4. Run the following command, replacing my-dir with your directory name:
    INPUT=my-dir TRAMPOLINE_BUILD_FILE=./generate.sh TRAMPOLINE_IMAGE=gcr.io/cloud-devrel-kokoro-resources/docfx TRAMPOLINE_DOCKERFILE=docfx/Dockerfile ci/trampoline_v2.sh
    
  5. The script runs docfx build over the package in my-dir, and places the resulting HTML inside a subdirectory in my-dir. The subdirectory is named after the package name found in the metadata.
  6. Note: running with this method skips xref processing.

Running locally with Cloud Storage bucket

  1. Create a Cloud Storage bucket and add a docfx-*.tgz file. For example:
    gsutil cp gs://docs-staging-v2-staging/docfx-nodejs-scheduler-2.1.1.tar.gz gs://my-bucket
    
  2. Copy a service account with permission to access my-bucket to /dev/shm/73713_docuploader_service_account.
  3. Run the following command, replacing my-bucket with your development bucket:
    SOURCE_BUCKET=my-bucket TRAMPOLINE_BUILD_FILE=./generate.sh TRAMPOLINE_IMAGE=gcr.io/cloud-devrel-kokoro-resources/docfx TRAMPOLINE_DOCKERFILE=docfx/Dockerfile ci/trampoline_v2.sh
    
  4. The script downloads the tarball, runs docfx build, and uploads the result.
  5. You can download the resulting HTML .tgz file, unpack it, inspect a few files (you should see <html devsite=""> at the top), and try staging it to confirm it looks OK.

doc-pipeline's People

Contributors

alicejli, coryan, dandhlee, dansaadati, dazuma, dependabot[bot], dzlier-gcp, eaball35, fhinkel, google-cloud-policy-bot[bot], jskeet, justinbeckwith, renovate-bot, tbpg


doc-pipeline's Issues

Use stem configured in docs.metadata

The xref base URL assumes the default stem. There may be other places.

def get_base_url(language, name):
    # The baseUrl must start with a scheme and domain. With no scheme, docfx
    # assumes it's a file:// link.
    base_url = f"https://cloud.google.com/{language}/docs/reference/" + f"{name}/"
    # Help packages should not include the version in the URL.
    if name != "help":
        base_url += "latest/"
    return base_url

Re-instantiate blobs if we run into 404 issues

@cojenco helped look into this issue: if a newer version of a file is re-uploaded between the time a Blob is instantiated and the time we download the tarball with blob.download_to_filename(), the download can fail. A generation parameter is included by default in the storage query. This can be avoided by re-instantiating the Blob when we run into a 404, which should minimize how often we fail with 404s.
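A minimal sketch of that approach with the google-cloud-storage client (an illustration, not the pipeline's actual code):

from google.api_core.exceptions import NotFound


def download_tarball(blob, destination):
    # `blob` may come from list_blobs(), which pins a specific generation.
    try:
        blob.download_to_filename(destination)
    except NotFound:
        # The pinned generation can be stale if the object was re-uploaded.
        # Re-instantiate the blob so the download targets the current generation.
        blob.bucket.blob(blob.name).download_to_filename(destination)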

Automatically file issues for failing tarballs

When the pipeline fails to process a tarball, we should automatically file an issue.

While we iron out the notifications, issues should be filed on this repo.

We can use flakybot to manage the issues.

Thoughts:

  • Does the docs.metadata include the repo?
  • We'd need to generate xUnit XML with a "fake" test case for each tarball.
  • Each tarball can be in a different repo. Do we need a different flakybot invocation for each repo, or can we update flakybot to magically handle that for us?
    • I'm leaning toward separate invocations -- the complexity stems from this repo and has not been needed on flakybot thus far (that I know of).
  • Can we depend on the fact tarballs are lazily generated? If a tarball fails to generate, can we always be sure that when it succeeds in the future, we'll tell flakybot about it?
    • What if we update the template, try to regenerate the HTML, and it fails? The pipeline won't automatically rebuild the tarball since the HTML already exists, so will it never self-heal and close the issue on its own? This could be really annoying for library owners.
  • What if the failure is caused by the pipeline? Should we notify the source repo? The doc-pipeline owners should be cc'd on all issues filed by this bot when the issue is filed on another repo.
    • If more than N tarballs fail to process, we should assume it's doc-pipeline's fault, not the tarball.

Only fetch the exact xrefmap files needed for the current build

Tarballs can specify the xrefmaps they need using the xrefs field in docs.metadata. Let's use that field to specify the exact xrefmap files needed for the current build, rather than downloading every xrefmap for every build.

@jskeet came up with:

devsite://dotnet/Google.Api.Gax/2.5.0

We can convert that to an xrefmap by removing devsite://, replacing the first and last / with -, and adding .tar.gz.yml at the end. It will be an error if that xrefmap does not exist.

Another benefit of this is that one library can have multiple versions. Each version will have its own xrefmap. If every xrefmap is pulled in, there will be multiple xrefmap files that register the same UIDs. Plus, when we support multiple versions, we'll need to use just the right version of the xrefmap as URLs will be different.

Finally, this will benefit libraries without xrefs because they won't need to download anything.

@jskeet will implement the change to the dotnet libraries. I will implement the change to doc-pipeline.

Build failing due to missing six dependency

ImportError while importing test module '/workspace/tests/test_generate.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/local/lib/python3.9/importlib/__init__.py:127: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/test_generate.py:25: in <module>
    from google.cloud import storage
/h/.local/lib/python3.9/site-packages/google/cloud/storage/__init__.py:35: in <module>
    from google.cloud.storage.batch import Batch
/h/.local/lib/python3.9/site-packages/google/cloud/storage/batch.py:27: in <module>
    import six
E   ModuleNotFoundError: No module named 'six'

Let's add an explicit dependency to fix the build for now.

Update published xref maps with correctly versioned URL

Once we have multi-version docs published, we should update our xref maps and regenerate content.

def get_base_url(language, name):
    # The baseUrl must start with a scheme and domain. With no scheme, docfx
    # assumes it's a file:// link.
    base_url = f"https://cloud.google.com/{language}/docs/reference/" + f"{name}/"
    # Help packages should not include the version in the URL.
    if name != "help":
        base_url += "latest/"
    return base_url

generate: many tests failed

Many tests failed at the same time in this package.

  • I will close this issue when there are no more failures in this package and
    there is at least one pass.
  • No new issues will be filed for this package until this issue is closed.
  • If there are already issues for individual test cases, I will close them when
    the corresponding test passes. You can close them earlier, if you prefer, and
    I won't reopen them while this issue is still open.

Here are the tests that failed:

  • docfx-java-google-cloud-aiplatform-0.3.0.tar.gz
  • docfx-java-google-cloud-assured-workloads-0.3.1.tar.gz
  • docfx-java-google-cloud-bigquery-1.127.5.tar.gz
  • docfx-java-google-cloud-bigquery-1.127.6.tar.gz
  • docfx-java-google-cloud-compute-0.119.6-alpha.tar.gz
  • docfx-java-google-cloud-dns-1.1.2.tar.gz
  • docfx-java-google-cloud-functions-1.0.8.tar.gz
  • docfx-java-google-cloud-gcloud-maven-plugin-0.1.1.tar.gz
  • docfx-java-google-cloud-logging-logback-0.120.2-alpha.tar.gz
  • docfx-java-google-cloud-memcache-1.0.1.tar.gz
  • docfx-java-google-cloud-networkconnectivity-0.2.0.tar.gz
  • docfx-java-google-cloud-networkconnectivity-0.2.1.tar.gz
  • docfx-java-google-cloud-notification-0.121.7-beta.tar.gz
  • docfx-java-google-cloud-resourcemanager-0.118.10-alpha.tar.gz
  • docfx-java-google-cloud-retail-0.2.0.tar.gz
  • docfx-java-google-cloud-spanner-5.0.0.tar.gz
  • docfx-java-google-cloud-storage-1.113.12.tar.gz
  • docfx-java-google-cloud-workflow-executions-0.1.6.tar.gz
  • docfx-java-google-cloud-workflows-0.2.1.tar.gz
  • docfx-java-proto-google-cloud-orgpolicy-v1-1.1.1.tar.gz
  • docfx-java-proto-google-iam-v1-1.0.10.tar.gz
  • docfx-java-proto-google-identity-accesscontextmanager-v1-1.0.14.tar.gz
  • docfx-java-pubsublite-kafka-0.2.2.tar.gz
  • docfx-java-pubsublite-kafka-0.6.3.tar.gz
  • docfx-java-pubsublite-spark-sql-streaming-0.3.1.tar.gz
  • docfx-nodejs-speech-4.5.0.tar.gz

commit: 32ba472
buildURL: Build Status, Sponge
status: failed

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Error type: undefined. Note: this is a nested preset so please contact the preset author if you are unable to fix it yourself.

Request user to delete tmp directory when running tests

Running the tests with an existing tmp directory in the doc-pipeline directory can cause flakiness and unknown behaviors.

Instead of potentially deleting the tmp folder prematurely, the tests should ask the user to remove it before running.
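A minimal sketch of such a guard (the directory name and error message are assumptions):

import pathlib
import sys

tmp_dir = pathlib.Path("tmp")  # assumed location of the leftover directory
if tmp_dir.exists():
    sys.exit(
        "A tmp/ directory already exists in the doc-pipeline checkout. "
        "Please delete it before running the tests."
    )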

Automatically rebuild HTML when templates or YAML update

Right now, if you update the templates or the YAML of a package, the HTML won't get regenerated automatically. What if our default job changed to:

  1. Get all blobs.
  2. For every YAML blob:
    1. If no HTML version exists, generate it.
    2. Else if an HTML version exists and it was updated before the YAML blob, regenerate it.
    3. Else if an HTML version exists and it was generated before the latest commit to doc-templates, regenerate it.
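A minimal sketch of that decision, assuming Cloud Storage blob timestamps and a templates_updated datetime obtained elsewhere:

def needs_regeneration(yaml_blob, html_blob, templates_updated):
    # yaml_blob and html_blob are google.cloud.storage blobs; html_blob is None
    # if the HTML has never been generated. templates_updated is the timestamp
    # of the latest doc-templates commit (how it is fetched is out of scope).
    if html_blob is None:
        return True
    if html_blob.updated < yaml_blob.updated:
        return True
    if html_blob.updated < templates_updated:
        return True
    return False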

Error preparing Java TOCs

docpipeline > Error processing docfx-java-pubsublite-kafka-0.2.0.tar.gz:

'dict' object has no attribute 'sort'

Reproduce locally:

$ gsutil cp gs://docs-staging-v2/docfx-java-pubsublite-kafka-0.2.0.tar.gz gs://my-bucket
$ SOURCE_BLOB=docfx-java-pubsublite-kafka-0.2.0.tar.gz SOURCE_BUCKET=my-bucket TRAMPOLINE_BUILD_FILE=./generate.sh TRAMPOLINE_IMAGE=gcr.io/cloud-devrel-kokoro-resources/docfx TRAMPOLINE_DOCKERFILE=docfx/Dockerfile ci/trampoline_v2.sh

Here is the .sort call:

# sort list of dict on dict key 'uid' value
toc.sort(key=lambda x: x.get("uid"))
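One possible defensive fix, assuming a lone dict should be treated as a one-element list (an assumption about the intended behavior, not the fix that shipped):

# Some Java TOCs apparently arrive as a single dict rather than a list of dicts.
if isinstance(toc, dict):
    toc = [toc]
toc.sort(key=lambda x: x.get("uid", ""))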

Allow parallel builds to work

The goal is to replicate the behavior of FORCE_GENERATE_ALL but make it run faster by running the builds for each language in parallel. It depends on #40 being finished first.
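One way to sketch per-language parallelism in-process (the build_language callable is hypothetical; the actual plan may instead launch separate builds per language):

from concurrent.futures import ThreadPoolExecutor


def build_all_languages(languages, build_language):
    # build_language(lang) is a hypothetical callable that regenerates all
    # docs for one language, i.e. FORCE_GENERATE_ALL restricted to LANGUAGE.
    with ThreadPoolExecutor(max_workers=max(len(languages), 1)) as executor:
        futures = {lang: executor.submit(build_language, lang) for lang in languages}
        for lang, future in futures.items():
            future.result()  # surface any per-language failure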

Handle normalized semver versions for latest version handling

Similar to how normalized semver versions for Python have been an issue in the pipeline, such versions are also not being picked up as the "latest" version when handling FORCE_GENERATE_LATEST, and probably for the xref handling as well. We should revert the Python versioning to be semver-compliant in order to find the latest.

The logic will be slightly complicated: we'll have to convert versions to be semver-compliant, while also keeping track of the original version string so we can pinpoint and pick up that exact version later if needed.
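A minimal sketch of that bookkeeping with packaging.version (the data shape is an assumption):

from packaging import version


def pick_latest(version_strings):
    # Compare semver-compliant parsed versions, but return the original string
    # so the matching blob can still be located later.
    return max(version_strings, key=version.parse)


pick_latest(["1.9.0", "1.10.0", "1.2.3"])  # -> "1.10.0"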

Test bucket does not seem to be getting cleaned up properly

All new dependency PRs seem to have run into the test buckets having too many blobs from previous runs, and is not getting cleaned up properly. I'm not sure if someone is using the test bucket, but if it's not getting cleaned up properly we should look into why this is happening.

Increase timeout from 10 hours

A FORCE_GENERATE_ALL build timed out after 10 hours, processing 781 blobs out of 3045. By rough math that's ~25% of the blobs in 10 hours, so I'll increase the timeout to 72 hours for now.

Dependency Dashboard

This issue contains a list of Renovate updates and their statuses.

This repository currently has no open or pending branches.


  • Check this box to trigger a request for Renovate to run again on this repository

Enable Flakybot for nightly tests

Empty output from generation jobs

Seeing some jobs with only the following files, which is missing the actual content:

docuploader > Sit tight, I'm tarring up your docs in ..
./
./docs.metadata
./xrefmap.yml
./_toc.yaml
