Comments (21)

AMZN-Dk commented on August 29, 2024

Based on feedback from the Testing SIG meeting, we are good to close this out and move forward. @jonawals, can you address @amzn-changml's question above outside of this?

jeremyong-az commented on August 29, 2024

Love the data! How will test time compare against build times in AR after this change? I'm assuming the long pole will continue to be the non-unity Linux build?

Kadino commented on August 29, 2024

Additional discussion between @FuzzyCarterAWS and @jonawals:

  • The aggregate data is good, though it would also be nice to see a case study on the impact of given changes: a low-scope change vs. a large-scope change, highlighting how much is saved in each scenario.
    • Leaf-ward nodes are where deselection shows the most benefit. Upstream dependencies gain less from deselecting tests, but do gain the benefit of verifying the right set of dependent risk.
    • May want to include data on safe mode vs. no-TIAF to verify stats on any overhead

jonawals commented on August 29, 2024

> Notable that OpenCPPCoverage is GPL 3.0 and thus any fork should also be. May be a snag for the TSC to buy in.

We do have a fork, but it inherits the GPL 3.0 license (we forked in order to open a PR to merge in the new coverage mode required by TIAF, but no PRs have been merged since summer 2020...). As we neither host nor distribute the source or binary, and only use the binary, both the LF and our internal legal teams have determined we're OK to use said binary 👍

A longer-term solution would be to implement our own coverage tool, as it would appear that OpenCppCoverage is a de facto dead project (despite new PRs and GHIs still being opened). If TIAF is deemed a necessary feature of O3DE, this would actually solve quite a few problems (cross-platform support, licensing, and the lack of maintenance for OpenCppCoverage).

> Is TIAF intended to be locally runnable? Would we be syncing its dependencies alongside O3DE's normal dependencies?

Not intended, per se, but as we are not distributing the source or binary with O3DE, users themselves can make the call as to whether the licensing restrictions are prohibitive or not for their own purposes.

jromnoa commented on August 29, 2024

I have been looking forward to this for a while and have been getting info on it wherever I can. Test speed and efficiency are things I focus on a lot with automated tests (especially the slower python ones) and this will help enhance that.

@jonawals
Couple questions:

  1. Are there any code previews or examples of the system itself we can see as to how its implementation looks so far? I know it's mostly on the pipeline side but just curious as to what it looks like.
  2. The explanation is thorough and I was able to get an idea of how it will work, but one thing I was confused about was the Indirect Source Coverage (ISC) functionality with components and Python tests. Since no code example exists, do you mean that if I had a Python test file registered via CMakeLists.txt like this:
    ly_add_pytest(
        NAME AutomatedTesting::Atom_Main_Null_Render_MaterialEditor
        TEST_SUITE main
        TEST_SERIAL
        TIMEOUT 700
        PATH ${CMAKE_CURRENT_LIST_DIR}/TestSuite_Main_Null_Render_MaterialEditor_01.py
        RUNTIME_DEPENDENCIES
            AssetProcessor
            AutomatedTesting.Assets
            MaterialEditor
        COMPONENT
            Atom
    )

It will run my test if the "Atom" component contains changes in the current PR?

Thanks again, can't wait for this feature to go in.

Kadino commented on August 29, 2024

Definitely love the focus on reducing execution times! If we accept this test-skipping feature, it should still exist alongside other performance and stability improvement efforts that directly improve the long-running tests.

  • What is the ongoing infrastructural cost of adding this feature?
    • How much data is persisted, and how much does it cost to synchronize it on the fleet?
    • Do different platforms have different coverage data/formats?
    • Does the instrumentation increase build or test execution times? (separate from potential gains from test deselection)
  • Will there be a way to opt in to enabling TIAF deselection for a test module?
    • SIGs may want to onboard/offboard certain tests to be candidates for deselection

I have some concern about excluding tests based on the historical coverage of their call-graph. While this is a good approximation of what dependencies should exist for unit-scope tests, it is a less stable approximation for tests with a call graph magnitudes larger, particularly because that graph is under near-constant change from seemingly unrelated features, including changes that can vary dynamically at runtime. The purpose of the broader tests is to detect the subtle issues where an incoming change unintentionally introduces new broken dependencies, or breaks implicit dependencies such as assets, configuration files, and other environmental state. Most implicit dependencies appear not to be tracked, and would be nigh impossible to add coverage metrics for. Due to this, deselecting the larger tests would create a tradeoff that primarily finds new regressions within one feature's code, instead of between features and dependencies. (However, only reducing the set of unit-scope tests would greatly reduce the overall throughput improvement, so there's definitely desire to reduce the broader set!)

hultonha commented on August 29, 2024

Looks great @jonawals, I wanted to verify a couple of points

> it takes on average three attempts at AR

Do you have the data for this? Or is this based on anecdotal evidence? It'd be great if we could look back over the data and see this number.

> it takes on average 15 to 20 minutes

Same for this, can you point to the tracking data for this if possible? (This will be super useful to help compare the before/after for TIAF.)

I'm also curious to learn more about the other platform support (most notably Linux). Is it possible to swap out OpenCppCoverage (only supported on Windows) with something like gcov or lcov on Linux to achieve the same effect?

jonawals commented on August 29, 2024

> Love the data! How will test time compare against build times in AR after this change? I'm assuming the long pole will continue to be the non-unity Linux build?

Yeah, our long poles will still be the biggest drag on AR, but if we can, on average, a) reduce the time spent running tests and b) reduce the number of attempts to pass AR, then we should see some meaningful savings over time 👍

jonawals commented on August 29, 2024

> I have been looking forward to this for a while and have been getting info on it wherever I can. Test speed and efficiency are things I focus on a lot with automated tests (especially the slower Python ones) and this will help enhance that.
>
> @jonawals Couple questions:
>
> 1. Are there any code previews or examples of the system itself we can see as to how its implementation looks so far? I know it's mostly on the pipeline side but just curious as to what it looks like.
>
> 2. The explanation is thorough and I was able to get an idea of how it will work, but one thing I was confused about was the Indirect Source Coverage (ISC) functionality with components and Python tests. Since no code example exists, do you mean that if I had a Python test file registered via `CMakeLists.txt` like this:
>
>     ly_add_pytest(
>         NAME AutomatedTesting::Atom_Main_Null_Render_MaterialEditor
>         TEST_SUITE main
>         TEST_SERIAL
>         TIMEOUT 700
>         PATH ${CMAKE_CURRENT_LIST_DIR}/TestSuite_Main_Null_Render_MaterialEditor_01.py
>         RUNTIME_DEPENDENCIES
>             AssetProcessor
>             AutomatedTesting.Assets
>             MaterialEditor
>         COMPONENT
>             Atom
>     )
>
> It will run my test if the "Atom" component contains changes in the current PR?
>
> Thanks again, can't wait for this feature to go in.

Regarding 1), the runtimes for the C++ and Python tests live in Code\Tools\TestImpactFramework, and the scripts that glue it all together in AR live in scripts\build\TestImpactAnalysis.

Regarding 2), what will happen is that the Python Coverage Gem (AutomatedTesting\Gem\PythonCoverage) will listen for any components that are added to any entities in the Editor and log them. Where possible, these components are then used to backtrack to the Gems (if any) they belong to, and we infer that any source changes to those Gems mean the Python tests for which those components were enumerated are covering those component sources. So, in your example, should AutomatedTesting::Atom_Main_Null_Render_MaterialEditor add any components that can be backtracked to sources in this way, we will run your test if those sources change 👍
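
For illustration, here is a minimal sketch of that backtracking flow; the component names, the component-to-Gem mapping, and the function names are all hypothetical (the real plumbing lives in the Python Coverage Gem and the TIAF runtime):

    # Hypothetical sketch of Indirect Source Coverage (ISC): components
    # observed at runtime are backtracked to Gems, and source changes in
    # those Gems select the covering Python tests.
    from pathlib import Path

    # Components the Python Coverage Gem logged as added to entities while
    # the test ran (illustrative names).
    observed_components = ["EditorMaterialComponent"]

    # Assumed mapping from component to the Gem it belongs to.
    component_to_gem = {"EditorMaterialComponent": "Atom"}

    def gems_covered_by_test(components):
        """Backtrack observed components to the Gems (if any) they belong to."""
        return {component_to_gem[c] for c in components if c in component_to_gem}

    def should_select_test(changed_files, covered_gems):
        """Select the test if any changed source lives in a covered Gem."""
        return any(gem in Path(f).parts
                   for f in changed_files
                   for gem in covered_gems)

    covered = gems_covered_by_test(observed_components)
    print(should_select_test(["Gems/Atom/RPI/Code/Source/Model.cpp"], covered))  # True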

Kadino commented on August 29, 2024

Given the large scope, we likely should bring this in front of the TSC and not unilaterally accept from only SIG-Testing. After review and refinement with this group, this should seek wider approval.

Kadino commented on August 29, 2024

As long as there's an easy opt-in, then I see little reason to prevent other SIGs from using such a tool. Opt-in seems initially preferable to opt-out while onboarding to a new tool, though switching to opt-out (default in) may eventually be preferable in the long term.

Then this would only require SIG-Testing to document why someone would want to opt to always have a given test run, and leave the power in other SIGs' hands. For example: the benefit of running tests less often vs. the potential pain of a deselected test breaking, tangentially-related broken code shipping, and then AR blocking the next new changes to the test's feature area.

Kadino commented on August 29, 2024

Jenkins' test runtimes would become more spiky in the future with this feature enabled. I don't directly see this as an issue, but it may be slightly surprising if the AR suites grow to be able to spike up to, say, 5 hours for a core checkin. This should be less of an issue if all tests are configured to always run in the periodic builds. (Frankly, it's a separate issue if SIGs ever start allowing 5 total hours of tests into the smoke+main suites.)

Kadino commented on August 29, 2024

Notable that OpenCPPCoverage is GPL 3.0 and thus any fork should also be. May be a snag for the TSC to buy in.

Is TIAF intended to be locally runnable? Would we be syncing its dependencies alongside O3DE's normal dependencies?

jonawals commented on August 29, 2024

> As long as there's an easy opt-in, then I see little reason to prevent other SIGs from using such a tool. Opt-in seems initially preferable to opt-out while onboarding to a new tool, though switching to opt-out (default in) may eventually be preferable in the long term.
>
> Then this would only require SIG-Testing to document why someone would want to opt to always have a given test run, and leave the power in other SIGs' hands. For example: the benefit of running tests less often vs. the potential pain of a deselected test breaking, tangentially-related broken code shipping, and then AR blocking the next new changes to the test's feature area.

Opt-in sounds like a solid plan, although for the initial onboarding I would go so far as to say we should use safe mode, so that we can gather the data while still running the deselected tests. The data we generate from this can feed into stakeholder decisions as to whether or not they wish to opt in when we take safe mode off.

I'm drifting off-topic a bit here but this is all useful stuff we can bring to the discussion with the wider audience 👍

jonawals commented on August 29, 2024

> Then this would only require SIG-Testing to document why someone would want to opt to always have a given test run, and leave the power in other SIGs' hands. For example: the benefit of running tests less often vs. the potential pain of a deselected test breaking, tangentially-related broken code shipping, and then AR blocking the next new changes to the test's feature area.

@jeremyong-az, the chair of the TSC, replied that they will defer to the expertise of sig-testing and sig-build unless the discussion needs to be escalated 👍

AMZN-Dk commented on August 29, 2024

Looks good for us to move this into the final review stage; tentative agenda item for the 29/11/2022 SIG Testing meeting @Kadino.

jonawals commented on August 29, 2024

> Definitely love the focus on reducing execution times! If we accept this test-skipping feature, it should still exist alongside other performance and stability improvement efforts that directly improve the long-running tests.

> * What is the ongoing infrastructural cost of adding this feature?

In terms of infrastructure, there shouldn't be any additional costs so long as we aren't running both CTest and TIAF. Right now, we have our own internal AR pipeline dedicated to TIAF that runs in addition to the public AR for O3DE. For us internally, this is adding additional costs only because we are duplicating work. If and when we choose to move the LF AR over to TIAF, we should aim to do so in such a way that we minimize (or remove entirely) any transition periods that require the duplication of AR work.

>   * How much data is persisted, and how much does it cost to synchronize it on the fleet?
>   * Do different platforms have different coverage data/formats?
>   * Does the instrumentation increase build or test execution times? (separate from potential gains from test deselection)

The data for each AR run is stored in an S3 bucket (separated out by repo, platform, branch and build config) and is minuscule in size (kilobytes). The data need not persist for long, and there wouldn't be any problem with auto-culling it after so many days or weeks. The worst thing that can happen should the data be trashed is a re-seed of the coverage.

Right now, we only support Windows. This was originally due to the practicalities of prototyping for one platform only. Unfortunately, there is no cross-toolchain, cross-platform instrumentation tool, so this would need to be adapted for, say, Linux and Mac (the TIA framework can accommodate different instrumentation on a platform-by-platform basis, it just requires more work to write it).

As for the extra latency incurred by instrumentation: for Python tests the instrumentation comes for free, and for C++ tests, using a branch of OpenCppCoverage, we do under some circumstances add a bit more, but additional optimizations have brought this to near enough parity with uninstrumented test runs (the overall savings are still a net benefit). It would be worth looking into a "one size fits all" instrumentation solution for C++ tests, as OpenCppCoverage has its problems and is effectively a dead project (no PRs merged since May 2020).
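
As a concrete illustration of that storage model (not the actual implementation), the artifact layout and age-based culling could look something like the following; the bucket name, key scheme, and retention window are all assumptions:

    # Illustrative layout and age-based culling of TIAF coverage artifacts
    # in S3; bucket, key scheme, and retention period are assumptions.
    from datetime import datetime, timedelta, timezone
    import boto3

    BUCKET = "tiaf-coverage-data"                 # hypothetical bucket
    PREFIX = "o3de/Windows/development/profile/"  # repo/platform/branch/config
    CUTOFF = datetime.now(timezone.utc) - timedelta(days=30)

    s3 = boto3.client("s3")
    pages = s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=PREFIX)
    for page in pages:
        for obj in page.get("Contents", []):
            if obj["LastModified"] < CUTOFF:
                # Worst case after culling: the next run re-seeds coverage.
                s3.delete_object(Bucket=BUCKET, Key=obj["Key"])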

> * Will there be a way to opt in to enabling TIAF deselection for a test module?
>   * SIGs may want to onboard/offboard certain tests to be candidates for deselection

We have the ability right now to opt out of TIAF via a config file generated by CMake (tbh it's not the best approach, but the code pathways are there, at least). When a test target is excluded, it's simply not selected at all. What we could do is add an additional list for "test targets to exclude from selection but run anyway". That way, if stakeholders want to be sure their test targets are always run regardless of TIAF's test selection, they can do so.
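
To sketch what that could look like (the field names are hypothetical, not the actual schema of the CMake-generated config):

    # Hedged sketch of an opt-out config extended with the proposed
    # "exclude from selection but run anyway" list; illustrative names only.
    config = {
        "excluded_test_targets": ["SomeLegacyTests"],  # never selected at all
        "always_run_test_targets": [                   # bypass selection, always run
            "AutomatedTesting::Atom_Main_Null_Render_MaterialEditor",
        ],
    }

    def resolve(selected_targets, cfg):
        """Apply the opt-out and always-run lists to TIAF's selected set."""
        final = [t for t in selected_targets
                 if t not in cfg["excluded_test_targets"]]
        final += [t for t in cfg["always_run_test_targets"] if t not in final]
        return final

    print(resolve(["MathTests", "SomeLegacyTests"], config))  # always-run target appended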

> I have some concern about excluding tests based on the historical coverage of their call-graph. While this is a good approximation of what dependencies should exist for unit-scope tests, it is a less stable approximation for tests with a call graph magnitudes larger, particularly because that graph is under near-constant change from seemingly unrelated features, including changes that can vary dynamically at runtime. The purpose of the broader tests is to detect the subtle issues where an incoming change unintentionally introduces new broken dependencies, or breaks implicit dependencies such as assets, configuration files, and other environmental state. Most implicit dependencies appear not to be tracked, and would be nigh impossible to add coverage metrics for. Due to this, deselecting the larger tests would create a tradeoff that primarily finds new regressions within one feature's code, instead of between features and dependencies. (However, only reducing the set of unit-scope tests would greatly reduce the overall throughput improvement, so there's definitely desire to reduce the broader set!)

We have the option to run TIAF in "safe mode". What this does is select the tests to run, but then run the discarded tests afterwards as a separate run. We send all of this data off to wherever it needs to be stored for reporting purposes, and then we can analyse any misfires where TIAF discarded a test that turned out to detect legitimate regressions for the given changes in the PR. I think as part of the onboarding process for TIAF we should definitely consider using this mode for a while to gather enough data for us to decide whether the tradeoff is worth letting, say, nightly builds pick up any misfires (if any) etc. 👍
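
A minimal sketch of the kind of misfire analysis safe mode enables, assuming hypothetical result structures:

    # A "misfire" is a deselected test that failed in the safe-mode run,
    # i.e. TIAF discarded a test that caught a real regression for this PR.
    def find_misfires(selected_tests, discarded_results):
        return [test for test, passed in discarded_results.items()
                if test not in selected_tests and not passed]

    selected = {"MathTests", "PhysicsTests"}
    discarded = {"AtomRendererTests": False, "NetworkingTests": True}
    print(find_misfires(selected, discarded))  # ['AtomRendererTests']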

jonawals commented on August 29, 2024

> Looks great @jonawals, I wanted to verify a couple of points
>
> > it takes on average three attempts at AR
>
> Do you have the data for this? Or is this based on anecdotal evidence? It'd be great if we could look back over the data and see this number.
>
> > it takes on average 15 to 20 minutes
>
> Same for this, can you point to the tracking data for this if possible? (This will be super useful to help compare the before/after for TIAF.)

I'll reach out to my colleague who has this data handy and update here 👍

> I'm also curious to learn more about the other platform support (most notably Linux). Is it possible to swap out OpenCppCoverage (only supported on Windows) with something like gcov or lcov on Linux to achieve the same effect?

We can, and that would make perfect sense for the Linux and Mac platforms. All that is required is to make the modifications to the build configs to instrument the binaries accordingly and write a small adapter for the generated coverage data 👍
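
To give a rough sense of that adapter's scale, here is a minimal sketch that pulls covered source files out of an lcov tracefile; the flat output shape is an assumption about what TIAF would consume, not its actual coverage format:

    # lcov records each covered file in an "SF:<path>" line of its .info
    # tracefile, with each file's record terminated by "end_of_record".
    def covered_sources(info_text):
        return [line[3:].strip()
                for line in info_text.splitlines()
                if line.startswith("SF:")]

    sample = ("TN:AzCoreTests\n"
              "SF:/o3de/Code/Framework/AzCore/Math/Vector3.cpp\n"
              "end_of_record\n")
    print(covered_sources(sample))  # ['/o3de/Code/Framework/AzCore/Math/Vector3.cpp']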

hultonha commented on August 29, 2024

@jonawals

> I'll reach out to my colleague who has this data handy and update here 👍

Excellent, thanks!

> We can, and that would make perfect sense for the Linux and Mac platforms. All that is required is to make the modifications to the build configs to instrument the binaries accordingly and write a small adapter for the generated coverage data 👍

Great to hear. I think once things are settled on Windows, this is a logical next step to improve Linux test times.

jonawals commented on August 29, 2024

Thank you @Kadino for putting the discussion points of the SIG-Testing meeting on here!

amzn-changml commented on August 29, 2024

SIG-Build agrees with this implementation and there are no immediate issues we're seeing from our perspective (we defer to SIG-Testing on methodology). We are still concerned with the use of OpenCPPCoverage (for all the reasons mentioned in this thread), but as we discussed in prior meetings, we will assume that there will be an eventual offramp plan. For future goals, we may also want to further improve support for local and vendor-agnostic storage for the cache, as not all contributors may have the ability to use S3 in their use cases.

The data provided for test-time improvements is impressive! Total end-to-end runs should improve, but we agree with other statements about certain build targets being the long tail. SIG-Build does have upcoming improvements that might help, but we need further efforts to make compilation more efficient.

I do have a question about caching, however: you mentioned that cached artifacts will be culled on a regular basis, but does that use any LRU/MRU replacement strategies? Just checking whether we're maximizing the retention of the artifacts for first/clean runs.
