Coder Social home page Coder Social logo

Comments (11)

chvalean avatar chvalean commented on May 12, 2024

Thank you for reporting this!
A few questions:

  • does the VM start eventually and does it provide KVP information?
  • is this particular to RHEL 6.10 or are you seeing this on other releases as well?
  • are you running lisa remotely on another server? Does this repro by running lisa directly on the Hyper-V server?

from lis-test.

zqwzshm avatar zqwzshm commented on May 12, 2024

Thank you for reply!

does the VM start eventually and does it provide KVP information?

No, the VM did not start, it is still in saved state. If I try "start-vm" immediately after the exception raised, I am more likely to get the same exception again. But if I try start-vm after a few seconds, VM would be able to start normally.
Before this exception being raised, everything works fine, including get-intergrationservice.

is this particular to RHEL 6.10 or are you seeing this on other releases as well?

I reproduced this issue with RHEL7 just now, reproduced twice with 505 loops. It is not even a recently discovered issue. It has been existing for a long time. In my opinion, this is more like a pure host issue, I don't think guest os is even related.

are you running lisa remotely on another server? Does this repro by running lisa directly on the Hyper-V server?

I'm running LISA directly from local repo on server.
BTW, by "reproduce rate" I was referring to running the reproduce script in loop.

from lis-test.

zqwzshm avatar zqwzshm commented on May 12, 2024

We met this problem 2 more times.

------------ Log Excerpt 1 ---------------------
03:21:12 Info : DoPS1TestRunning(RHEL-7.5-20180315.0-x86_64-GEN2-A)
03:21:12 Info : GetTestData(VerifyIntegratedTimeSyncService)
03:21:12 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from PS1TestRunning to PS1TestCompleted
03:21:12 Info : DoPS1TestCompleted(RHEL-7.5-20180315.0-x86_64-GEN2-A)
03:21:12 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A Status for test VerifyIntegratedTimeSyncService = Passed
03:21:15 Info : SetTestResult(Functional,VerifyIntegratedTimeSyncService)
03:21:15 Info : GetTestID(VerifyIntegratedTimeSyncService)
03:21:15 Info : GetCurentSuite(Functional)
03:21:15 Info : GetTestData(VerifyIntegratedTimeSyncService)
03:21:15 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from PS1TestCompleted to DetermineReboot
03:21:15 Info : DoDetermineReboot(RHEL-7.5-20180315.0-x86_64-GEN2-A)
03:21:15 Info : GetNextText(RHEL-7.5-20180315.0-x86_64-GEN2-A)
03:21:15 Debug: vm.currentTest = VerifyIntegratedTimeSyncService
03:21:15 Debug: vm.suite = Functional
03:21:15 Info : GetTestData(VerifyIntegratedTimeSyncService)
03:21:15 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from DetermineReboot to ShutdownSystem
03:21:15 Info : DoShutdownSystem(RHEL-7.5-20180315.0-x86_64-GEN2-A)
03:21:15 Provider load failure
03:21:15 + CategoryInfo : NotSpecified: (:) [Get-VM], VirtualizationExcept
03:21:15 ion
03:21:15 + FullyQualifiedErrorId : Unspecified,Microsoft.HyperV.PowerShell.Commands
03:21:15 .GetVM
03:21:15 + PSComputerName : 2016-Auto
03:21:15
03:21:15 Warn : RHEL-7.5-20180315.0-x86_64-GEN2-A did not shutdown gracefully. Forcing VM TurnOff.
03:21:15 Provider load failure
03:21:15 + CategoryInfo : NotSpecified: (:) [Stop-VM], VirtualizationExcep
03:21:15 tion
03:21:15 + FullyQualifiedErrorId : Unspecified,Microsoft.HyperV.PowerShell.Commands
03:21:15 .StopVM
03:21:15 + PSComputerName : 2016-Auto
03:21:15
03:21:15 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from ShutdownSystem to ShuttingDown
03:21:15 Info : DoShuttingDown(RHEL-7.5-20180315.0-x86_64-GEN2-A)
03:21:15 Provider load failure
03:21:15 + CategoryInfo : NotSpecified: (:) [Get-VM], VirtualizationExcept
03:21:15 ion
03:21:15 + FullyQualifiedErrorId : Unspecified,Microsoft.HyperV.PowerShell.Commands
03:21:15 .GetVM
03:21:15 + PSComputerName : 2016-Auto
03:21:15
03:21:15 Error: ShutDownVM cannot find the VM RHEL-7.5-20180315.0-x86_64-GEN2-A on HyperV server 2016-Auto after testing
03:21:15 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from ShuttingDown to Disabled
03:21:15 Info : DoStateMachine() exiting
03:21:15
03:21:15 Test Results Summary
------------ Log Excerpt 1 ---------------------

------------ Log Excerpt 2 ---------------------
11:55:21 Info : DoPS1TestRunning(RHEL-7.5-20180315.0-x86_64-GEN2-A)
11:55:21 Info : GetTestData(VMHeartBeat_ISOInstallation)
11:55:21 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from PS1TestRunning to PS1TestCompleted
11:55:21 Info : DoPS1TestCompleted(RHEL-7.5-20180315.0-x86_64-GEN2-A)
11:55:21 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A Status for test VMHeartBeat_ISOInstallation = Passed
11:55:21 Info : SetTestResult(Functional,VMHeartBeat_ISOInstallation)
11:55:21 Info : GetTestID(VMHeartBeat_ISOInstallation)
11:55:21 Info : GetCurentSuite(Functional)
11:55:21 Info : GetTestData(VMHeartBeat_ISOInstallation)
11:55:21 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from PS1TestCompleted to DetermineReboot
11:55:21 Info : DoDetermineReboot(RHEL-7.5-20180315.0-x86_64-GEN2-A)
11:55:21 Info : GetNextText(RHEL-7.5-20180315.0-x86_64-GEN2-A)
11:55:21 Debug: vm.currentTest = VMHeartBeat_ISOInstallation
11:55:21 Debug: vm.suite = Functional
11:55:21 Info : GetTestData(VMHeartBeat_ISOInstallation)
11:55:21 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from DetermineReboot to ShutdownSystem
11:55:21 Info : DoShutdownSystem(RHEL-7.5-20180315.0-x86_64-GEN2-A)
11:55:28 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from ShutdownSystem to ShuttingDown
11:55:28 Info : DoShuttingDown(RHEL-7.5-20180315.0-x86_64-GEN2-A)
11:55:28 Info : GetTestData(VMHeartBeat_ISOInstallation)
11:55:28 Info : SetRunningTime(VMHeartBeat_ISOInstallation)
11:55:28 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A currentTest lasts 0 Hours, 0 Minutes, 47 seconds.
11:55:28 Info : GetCurentSuite(Functional)
11:55:28 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from ShuttingDown to SystemDown
11:55:28 Info : Entering DoSystemDown( RHEL-7.5-20180315.0-x86_64-GEN2-A )
11:55:28 Info : GetTestData(VMHeartBeat_ISOInstallation)
11:55:28 Info : GetNextText(RHEL-7.5-20180315.0-x86_64-GEN2-A)
11:55:28 Debug: vm.currentTest = VMHeartBeat_ISOInstallation
11:55:28 Debug: vm.suite = Functional
11:55:28 Info : GetTestData(Verify_LIS_Modules)
11:55:28 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A currentTest updated to Verify_LIS_Modules
11:55:28 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from SystemDown to ApplyCheckpoint
11:55:28 Info : Entering DoApplyCheckpoint( RHEL-7.5-20180315.0-x86_64-GEN2-A )
11:55:28 Info : GetTestData(Verify_LIS_Modules)
11:55:28 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A Over-riding default snapshotName from global section to ICABase
11:55:28 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A is being reset to snapshot ICABase
11:55:28 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from ApplyCheckpoint to StartSystem
11:55:28 Info : DoStartSystem( RHEL-7.5-20180315.0-x86_64-GEN2-A )
11:55:28 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A is being started
11:55:28 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from StartSystem to SystemStarting
11:55:28 Info : Entering DoSystemStarting( RHEL-7.5-20180315.0-x86_64-GEN2-A )
11:55:28 Debug: vm.ipv4 = 10.73.74.96
11:55:28 Debug: vm.ipv4 = 10.73.74.96 and ipv4 =
11:55:32 Info : Entering DoSystemStarting( RHEL-7.5-20180315.0-x86_64-GEN2-A )
11:55:32 Debug: vm.ipv4 = 10.73.74.96
11:55:32 Debug: vm.ipv4 = 10.73.74.96 and ipv4 =
11:55:35 Info : Entering DoSystemStarting( RHEL-7.5-20180315.0-x86_64-GEN2-A )
11:55:35 Debug: vm.ipv4 = 10.73.74.96
11:55:35 Debug: vm.ipv4 = 10.73.74.96 and ipv4 =
11:55:39 Info : Entering DoSystemStarting( RHEL-7.5-20180315.0-x86_64-GEN2-A )
11:55:39 Provider load failure
11:55:39 + CategoryInfo : NotSpecified: (:) [Get-VM], VirtualizationExcept
11:55:39 ion
11:55:39 + FullyQualifiedErrorId : Unspecified,Microsoft.HyperV.PowerShell.Commands
11:55:39 .GetVM
11:55:39 + PSComputerName : 2016-Auto
11:55:39
11:55:39 Error: RHEL-7.5-20180315.0-x86_64-GEN2-A SystemStarting entered state without being in a HyperV Running state - disabling VM
11:55:39 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from SystemStarting to ForceShutDown
11:55:39 Debug: vm.ipv4 = 10.73.74.96
11:55:39 Provider load failure
11:55:39 + CategoryInfo : InvalidOperation: (:) [Get-WmiObject], Managemen
11:55:39 tException
11:55:39 + FullyQualifiedErrorId : GetWMIManagementException,Microsoft.PowerShell.C
11:55:39 ommands.GetWmiObjectCommand
11:55:39 + PSComputerName : 2016-Auto
11:55:39
11:55:39 Debug: vm.ipv4 = 10.73.74.96 and ipv4 =
11:55:43 Info : DoForceShutdown(RHEL-7.5-20180315.0-x86_64-GEN2-A)
11:55:43 Info : GetTestData(done)
11:55:43 Info : SetRunningTime(done)
11:55:43 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A currentTest lasts 0 Hours, 0 Minutes, 17 seconds.
11:55:43 Info : GetCurentSuite(Functional)
11:55:43 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from ForceShutDown to SystemDown
11:55:43 Info : Entering DoSystemDown( RHEL-7.5-20180315.0-x86_64-GEN2-A )
11:55:43 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A currentTest updated to done
11:55:43 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from SystemDown to Finished
11:55:43 Info : SaveResultToXML to (Functional,TestResults\cases-20180316-195346)
11:55:43 Info : GetCurentSuite(Functional)
11:55:43 Info : DoStateMachine() exiting
------------ Log Excerpt 2 ---------------------

from lis-test.

zqwzshm avatar zqwzshm commented on May 12, 2024

And one more time:

07:53:37 Info : DoPS1TestRunning(RHEL-7.5-20180315.0-x86_64-GEN2-A)
07:53:37 Info : GetTestData(VHDxResizeOver2TB_IDE)
07:53:37 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from PS1TestRunning to PS1TestCompleted
07:53:37 Info : DoPS1TestCompleted(RHEL-7.5-20180315.0-x86_64-GEN2-A)
07:53:37 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A Status for test VHDxResizeOver2TB_IDE = Failed
07:53:37 Info : SetTestResult(Functional,VHDxResizeOver2TB_IDE)
07:53:37 Info : GetTestID(VHDxResizeOver2TB_IDE)
07:53:37 Info : GetCurentSuite(Functional)
07:53:37 Info : GetTestData(VHDxResizeOver2TB_IDE)
07:53:37 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from PS1TestCompleted to DetermineReboot
07:53:37 Info : DoDetermineReboot(RHEL-7.5-20180315.0-x86_64-GEN2-A)
07:53:37 Info : GetNextText(RHEL-7.5-20180315.0-x86_64-GEN2-A)
07:53:37 Debug: vm.currentTest = VHDxResizeOver2TB_IDE
07:53:37 Debug: vm.suite = Functional
07:53:37 Info : GetTestData(VHDxResizeOver2TB_IDE)
07:53:37 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from DetermineReboot to ShutdownSystem
07:53:37 Info : DoShutdownSystem(RHEL-7.5-20180315.0-x86_64-GEN2-A)
07:53:37 Provider load failure
07:53:37 + CategoryInfo : NotSpecified: (:) [Get-VM], VirtualizationExcept
07:53:37 ion
07:53:37 + FullyQualifiedErrorId : Unspecified,Microsoft.HyperV.PowerShell.Commands
07:53:37 .GetVM
07:53:37 + PSComputerName : 2016-Auto
07:53:37
07:53:37 Warn : RHEL-7.5-20180315.0-x86_64-GEN2-A did not shutdown gracefully. Forcing VM TurnOff.
07:53:37 Provider load failure
07:53:37 + CategoryInfo : NotSpecified: (:) [Stop-VM], VirtualizationExcep
07:53:37 tion
07:53:37 + FullyQualifiedErrorId : Unspecified,Microsoft.HyperV.PowerShell.Commands
07:53:37 .StopVM
07:53:37 + PSComputerName : 2016-Auto
07:53:37
07:53:37 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from ShutdownSystem to ShuttingDown
07:53:37 Info : DoShuttingDown(RHEL-7.5-20180315.0-x86_64-GEN2-A)
07:53:37 Provider load failure
07:53:37 + CategoryInfo : NotSpecified: (:) [Get-VM], VirtualizationExcept
07:53:37 ion
07:53:37 + FullyQualifiedErrorId : Unspecified,Microsoft.HyperV.PowerShell.Commands
07:53:37 .GetVM
07:53:37 + PSComputerName : 2016-Auto
07:53:37
07:53:37 Error: ShutDownVM cannot find the VM RHEL-7.5-20180315.0-x86_64-GEN2-A on HyperV server 2016-Auto after testing
07:53:37 Info : RHEL-7.5-20180315.0-x86_64-GEN2-A transitioned from ShuttingDown to Disabled
07:53:37 Info : DoStateMachine() exiting
07:53:37
07:53:37 Test Results Summary

from lis-test.

zqwzshm avatar zqwzshm commented on May 12, 2024

It seems multi tasking might be one factor that causes this problem. We were trying to reproduce this issue on another VM, and the running framework exited again.

I tried to run the reproduce script with a vm not booted into a OS ( staying on grub menu ), and I was able to reproduce this issue too. So I guess OS on VM is not related.

from lis-test.

chvalean avatar chvalean commented on May 12, 2024

Might be a long shot but worth testing - #894
In that case the memory consumption of the lisa PS session was too hight and reaching the limit.
Can you try to monitor and increase that value?

from lis-test.

zqwzshm avatar zqwzshm commented on May 12, 2024

I checked the memory limit, it is set to 2147483647 (max integer), which basically means unlimited, and the settings does not allow bigger value than that.

I ran 6 background jobs looping the reproduce script, the failure occurred simultaneously in the 6 jobs after about 600 loops. Multitasking might not be related according to the result this time, sorry for the misleading info before. No significant memory increase observed during the run.

We are actually not sure about the root cause of #894 . We were facing more issues than that ( slow backup etc. ), and we finally decided to reinstall host OS on that machine, and the problems disappeared. For this time, the issue exists on at least 2 hosts tested. So I guess it is not caused by same reason as #894 .

from lis-test.

chvalean avatar chvalean commented on May 12, 2024

Thanks for the info @zqwzshm
Let me also run the code snippet provided and see if I can repro this on our end.

from lis-test.

chvalean avatar chvalean commented on May 12, 2024

Tested this today on a WS2016 server with the latest updates and I cannot reproduce the mentioned issue.
Got the save loop to 3500 count and it's still running fine.

Here are my env versions:
WS2016 14393.2125
PS version: 5.1.14393.2125

VM has 4GB RAM and 4 cores, and it runs RHEL 7.3 with an upstream kernel.

Did some searching and found an old post that might help with troubleshooting this in your env, maybe it helps:
https://blogs.technet.microsoft.com/heyscriptingguy/2012/09/12/use-powershell-to-troubleshoot-provider-load-failure/

from lis-test.

zqwzshm avatar zqwzshm commented on May 12, 2024

Thank you very much for your help! And sorry for replying so late. I had been distracted by some other work.

I tried to reproduce this issue on 4 servers, and it turns out that it is not reproducible on all the servers:
Server 1: Reproduced
Hyper-V Server Core. There are about 30 VMs on the host, and 6 are running
Server 2: Reproduced
Hyper-V Server Core, There are about 30 VMs on the host, and 6 are running
Server 3: NOT Reporoduced
Hyper-V Server Core, rather idle, 5 VM on the host, 2 are running
Server 3: NOT Reporoduced
WS2016 Standard (with GUI), a local test machine, 8GB memory, 4 cores, no VM running.

Looks like work load is related, idle servers are less likely to encounter this issue.

Since I am only able to reproduce this issue on server core, which does not have GUI (and event viewer naturally), I actually struggled for a while to get the trace and debug events via powershell. Anyway, it logged heavily during the run (the save/start/get-vm script), and there are a lot of errors even before the "Provider Load Failure" exception occurred. However, I am not seeing any special error event at the exact point the exception occurred.

Did a little digging into the errors, looks like svchost process was trying to access WMI and got error return. Did not do further digging since this is getting into system level. We might look for some simple workaround like rerunning cases about save/start half-manually. Any better idea?

Thank you very much!

Got events via following powershell commands:
Get-WinEvent -ProviderName Microsoft-Windows-Hyper-V*
Get-WinEvent -ProviderName Microsoft-Windows-WMI-Activity -MaxEvents 10000
Get-WinEvent -LogName Microsoft-Windows-WMI-Activity/Trace -Oldest
Get-WinEvent -LogName Microsoft-Windows-WMI-Activity/Debug -Oldest

hv-events.txt
wmi-events.txt
wmi-trace.txt
wmi-debug.txt

from lis-test.

zqwzshm avatar zqwzshm commented on May 12, 2024

Looks like there's not much we can actually do about it. Auto rerun would be a straightforward solution. This is one reason of the feature request #963 . Since you are not hitting this issue, we'll do auto rerun locally.
Thank you very much for your help!

from lis-test.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.