microsoft / validate-dcb
Validator for RDMA Configuration and Best Practices
License: MIT License
Initiate.ps1 calls EnableExit, which ends the process before the test is complete. It also closes the PowerShell window you have open.
The current validation of the Configuration File Path on the "Save and Deploy" page of the wizard does not allow specifying a new file path. The wizard will only allow access to the "Export" button if an existing file is selected via the browse button. It does not allow manual paths, or selection of an empty folder.
Seeing an error running Validate-DCB on servers, not sure how to troubleshoot
`
[+] [SUT: servername]-[SMB Adapter: STORE-D]-[Noun: SMBServerNetworkInterface] SMB Client must report RDMA Capable 2ms
[-] Should have an Live Migration limit of 750 MBps 1ms
ParameterBindingException: A positional parameter cannot be found that accepts argument '1'.
at , C:\Program Files\WindowsPowerShell\Modules\Validate-DCB\20191128.2.2.82\tests\unit\modal.unit.tests.ps1: line 902
`
In the default test scope, the ContinueOnFailure parameter doesn't work because the condition
If ($GlobalResults.FailedCount -ne 0) {
doesn't account for the parameter.
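A minimal sketch of a condition that honors the switch, assuming the intent is to stop only when there are failures and the caller did not ask to continue. `Test-ShouldStop` is a hypothetical helper written for illustration and is not part of the module; only `FailedCount` and `ContinueOnFailure` come from the report above.

```powershell
# Hypothetical helper (not part of Validate-DCB): decides whether the run
# should stop after a test scope finishes.
function Test-ShouldStop {
    param(
        [int]    $FailedCount,        # e.g. $GlobalResults.FailedCount
        [switch] $ContinueOnFailure   # the user-supplied switch
    )
    # Stop only when there are failures AND the caller did not ask to continue.
    return ($FailedCount -ne 0) -and (-not $ContinueOnFailure)
}

Test-ShouldStop -FailedCount 3                      # True
Test-ShouldStop -FailedCount 3 -ContinueOnFailure   # False
Test-ShouldStop -FailedCount 0                      # False
```

In the module itself the same idea would presumably be written inline, by adding the `-and (-not $ContinueOnFailure)` clause to the existing `If`.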
Currently the load balancing mode is only checked if it is specified in the config file. The test should be updated to check for the recommended type when it is not specified in the config file.
[-] [SUT: BFICLHOST3]-[RDMAEnabledAdapters: SET01]-[Noun: NetQosDcbxSetting] interfaces DCBX 'Willing' option should be false 74ms
Expected: {false}
But was: {}
558: ($actNetQoSState.NetQosDcbxSettingInterfaces | Where-Object InterfaceAlias -like $thisRDMAEnabledAdapter.Name).Willing | Should Be 'false'
at <ScriptBlock>, C:\Program Files\WindowsPowerShell\Modules\Validate-DCB\20191128.2.2.82\tests\unit\modal.unit.tests.ps1: line 558
The Intel X722 does not appear to have this capability...?
Enabled : True
Capabilities : Hardware Current
-------- -------
MacSecBypass :
DcbxSupport :
NumTCs(Max/ETS/PFC) :
HardwareCapabilities :
CurrentCapabilities :
OperationalTrafficClasses : Not Available
OperationalFlowControl : Not Available
OperationalClassifications : Not Available
RemoteTrafficClasses : Not Available
RemoteFlowControl : Not Available
RemoteClassifications : Not Available
OperationalSettings :
RemoteSettings :
AdminStatus :
ifAlias : xxxx
InterfaceAlias : xxxx
ifDesc : Intel(R) Ethernet Connection X722 for 10GbE SFP+
Caption : MSFT_NetAdapterQosSettingData 'Intel(R) Ethernet Connection X722 for 10GbE SFP+'
Description : Intel(R) Ethernet Connection X722 for 10GbE SFP+
ElementName : Intel(R) Ethernet Connection X722 for 10GbE SFP+
InstanceID : {E69F7F95-0474-4AD7-BF71-5530AC4247A9}
InterfaceDescription : Intel(R) Ethernet Connection X722 for 10GbE SFP+
From PROSet PowerShell.
Get-IntelNetAdapterStatus -Name "*x722*" -Status DCB
Get-IntelNetAdapterStatus : The specified device does not support DCB, has DCB disabled, or Intel's implementation of DCB is not installed.
At line:1 char:1
+ Get-IntelNetAdapterStatus -Name "*x722*" -Status DCB
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Get-IntelNetAdapterStatus], Exception
+ FullyQualifiedErrorId : System.Exception,Intel.PowerShell.Network.Adapter.GetIntelNetAdapterStatus
If this is the case, which it seems to be, then Intel RNICs may need an exclusion to this test.
When using Microsoft Validate-DCB on Dell EMC Solutions for Microsoft Azure Stack HCI Solutions with Mellanox NICs, we see a recommendation to install a version of the Mellanox driver that has not been certified by Dell and is not even available for download from Dell.
https://github.com/microsoft/Validate-DCB/blob/master/helpers/drivers/drivers.psd1
@{ IHV = 'Mellanox' ; DriverFileName = 'mlx5.sys' ; MinimumDriverVersion = '2.60.21096.0' } # ConnectX-4
Is there a way to add a check that Dell EMC Solutions for Microsoft Azure Stack HCI use the driver version shown in the Support Matrix for Microsoft HCI Solutions?
For the cluster heartbeat (HB) setting:
A 1% cluster bandwidth percentage applies to 25 GbE NICs. If a 10 GbE NIC is present, the cluster bandwidth percentage should be 2%.
### Verify Cluster Bandwidth Percentage is equal to 1%
It "[Config File]-[NonNodeData.NetQos]-[Noun: NetQosTrafficClass] Cluster BandwidthPercentage must be 1%" {
($ConfigData.NonNodeData.NetQos.GetEnumerator().Where{ $_.Template -eq 'Cluster' }).BandwidthPercentage | Should be 1
}
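A rough sketch of how the expected value could depend on link speed instead of being fixed at 1. The helper name and the cut-off (2% at 10 Gbps or slower, 1% above that) are assumptions for illustration. The report above only states the 10 GbE and 25 GbE cases.

```powershell
# Hypothetical helper (not part of Validate-DCB): returns the expected
# cluster heartbeat bandwidth percentage for a given adapter speed.
function Get-ExpectedClusterBandwidthPercentage {
    param(
        [double] $LinkSpeedGbps   # adapter link speed in Gbps
    )
    # Assumption: 10 Gbps or slower reserves 2%; faster links reserve 1%.
    if ($LinkSpeedGbps -le 10) { return 2 }
    return 1
}

Get-ExpectedClusterBandwidthPercentage -LinkSpeedGbps 10   # 2
Get-ExpectedClusterBandwidthPercentage -LinkSpeedGbps 25   # 1
```

The existing assertion could then compare `BandwidthPercentage` against the returned value rather than the literal `1`.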
When executing Validate-DCB using the UI as the mechanism for parameter input, I have tried many different combinations, including renaming adapters, but I cannot overcome this error. I fear that it is also causing additional errors.
Configuration information resulting in the error:
Configuration attempt 1 to bypass the error:
Validate-DCB/Tests/unit/modal.unit.tests.ps1
Line 894 in fb27696
This concerns the test starting on line 256 of global.unit.tests.ps1.
### Verify each VMSwitch.RDMAEnabledAdapter includes the VLANID property from Get-NetAdapterAdvancedProperty
It "[Config File]-[AllNodes.VMSwitch.RDMAEnabledAdapters]-[Node: $($thisNode.NodeName)]-[Entry: $($thisRDMAEnabledAdapter.Name)]-[Noun: NetAdapterAdvancedProperty] Must include the VLANID property for each entry" {
    $thisRDMAEnabledAdapter.VLANID | Should not BeNullOrEmpty
}
Based on the example config files this is checking the pNIC for a VLANID. But shouldn't the pNIC be in TRUNK mode?
Based on this example:
RDMAEnabledAdapters = @(
    @{ Name = 'RoCE-01' ; VMNetworkAdapter = 'SMB01' ; VLANID = '101' ; JumboPacket = 9014 }
    @{ Name = 'RoCE-02' ; VMNetworkAdapter = 'SMB02' ; VLANID = '101' ; JumboPacket = 9014 }
)
The test runs against RoCE-01, which is the pNIC. The VLAN test should be run against the affinitized VMNetworkAdapter, SMB*.
This test fails on multi-node clusters since AllNodes is defined as an array in the config file.
$AllNodes = @()
$configData.AllNodes.GetType()
IsPublic IsSerial Name BaseType
True True Object[] System.Array
The test needs to look at the first element in the array. Or look through the array. Or check whether it's an array. Or some combination of the above.
Example fix:
### Verify configData contains the AllNodes HashTable
It "[Config File]-[AllNodes] Config File must contain the AllNodes Hashtable" {
    $configData.AllNodes[0] | Should BeOfType System.Collections.Hashtable
}
In the Data Center Bridging page of the wizard, there is a typo in the second paragraph.
"or are no using RDMA" should be replaced by "If you are not using RDMA" .
Additionally "Data Center Bridging is required network reliability" should read "Data Center Bridging is required for network reliability"
In Tests/unit/modal.unit.tests.ps1, for a solution whose network adapters have a connection speed greater than 10 GbE, the test still looks for a fixed maximum limit of 750 MBps. The article https://techcommunity.microsoft.com/t5/failover-clustering/optimizing-hyper-v-live-migrations-on-an-hyperconverged/ba-p/396609 is cited as justification. However, that article states "The testing conducted tested different bandwidth limits on a dual 10 Gpbs RDMA enabled NIC and measured failures under stress conditions and found that throttling live migration to 750 MB achieved the highest level of availability to the system. On a system with higher bandwidth, you may be able to throttle to a value higher than 750 MB." More specific guidance is provided by Microsoft at https://docs.microsoft.com/en-us/azure-stack/hci/concepts/host-network-requirements#traffic-bandwidth-allocation, which illustrates that a fixed 750 MBps is not always the right value.
Here is an example of the appropriate calculation (and settings):
`
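# Assumes $smbNIC1 and $smbNIC2 hold the SMB adapters (e.g. from Get-NetAdapter) and $server is the target host.
# Aggregate link speed in Gbps (TransmitLinkSpeed is reported in bits per second)
$aggregateLinkSpeed = ($smbNIC1.TransmitLinkSpeed + $smbNIC2.TransmitLinkSpeed) / 1000000000
$smbBandwidthAllocationPercent = .5
$smbBandwidthLimit = $aggregateLinkSpeed * $smbBandwidthAllocationPercent
# Live migration gets 29% of the SMB allocation (still in Gbps)
$liveMigrationBandwidthLimit = $smbBandwidthLimit * .29
$liveMigrationMaxMigrationLimit = 2
if ($liveMigrationBandwidthLimit -lt 5) {
    $migrationPerformanceOption = "Compression"
    Set-VMHost -VirtualMachineMigrationPerformanceOption $migrationPerformanceOption
}
else {
    $migrationPerformanceOption = "SMB"
    Set-VMHost -VirtualMachineMigrationPerformanceOption $migrationPerformanceOption
    # Convert Gbps to bytes per second (parentheses balanced so the whole expression is one argument)
    Set-SmbBandwidthLimit -CimSession $server -Category LiveMigration -BytesPerSecond (($liveMigrationBandwidthLimit / 8) * 1000000000)
}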
`
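To make the arithmetic concrete, here is a small sketch that runs the same calculation on plain numbers without touching any host settings. `Get-LiveMigrationPlan` is a hypothetical helper, and the 0.5 and 0.29 factors and the 5 Gbps threshold are simply carried over from the snippet above, not independently verified.

```powershell
# Hypothetical helper: mirrors the calculation above without calling
# Set-VMHost or Set-SmbBandwidthLimit.
function Get-LiveMigrationPlan {
    param(
        [double[]] $LinkSpeedsBps,               # per-adapter link speeds in bits per second
        [double]   $SmbAllocation      = 0.5,    # share of aggregate bandwidth given to SMB
        [double]   $LiveMigrationShare = 0.29    # share of the SMB allocation given to live migration
    )
    $aggregateGbps = ($LinkSpeedsBps | Measure-Object -Sum).Sum / 1e9
    $limitGbps     = $aggregateGbps * $SmbAllocation * $LiveMigrationShare
    if ($limitGbps -lt 5) { $option = 'Compression' } else { $option = 'SMB' }
    [pscustomobject]@{
        PerformanceOption = $option
        LimitGbps         = $limitGbps
        BytesPerSecond    = [math]::Round($limitGbps / 8 * 1e9)
    }
}

# Two 25 Gbps adapters: 50 * 0.5 * 0.29 is about 7.25 Gbps, so the SMB option applies
Get-LiveMigrationPlan -LinkSpeedsBps 25e9, 25e9

# Two 10 Gbps adapters: 20 * 0.5 * 0.29 is about 2.9 Gbps, so Compression applies
Get-LiveMigrationPlan -LinkSpeedsBps 10e9, 10e9
```

With two 25 Gbps adapters the computed limit is roughly 906 MB per second, which is above the fixed 750 MBps the test currently expects.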
Original issue tagged by Adi. They will need to be able to specify the location of the validate-dcb output report
File missing Initiate.ps1
If you select the back button when you are at the "Save and deploy" page of the wizard, the buttons at the bottom of the wizard do not update and continue to read "Back" and "Export" (instead of "Back" and "Next"). This breaks the wizard. You cannot move forward, and the only way to resolve the issue is to exit the wizard and start over again.
When executing Validate-DCB, I am receiving the following message:
[-] [SUT: hci01]-[SMB Adapter: Storage1]-[Noun: SMBServerNetworkInterface] SMB Client must report RDMA Capable 11ms
Expected $true, but got $null.
875: (($SMBServerNetworkInterface | Where-Object InterfaceIndex -eq $NetAdapter.IfIndex) | Select-Object -first 1).RdmaCapable | Should be $true
at <ScriptBlock>, C:\Program Files\WindowsPowerShell\Modules\Validate-DCB\20210802.2.2.117\tests\unit\modal.unit.tests.ps1: line 875
However, when inspecting the properties of the adapter, it appears the above is being incorrectly reported.
Get-NetAdapter
Name InterfaceDescription ifIndex Status MacAddress
---- -------------------- ------- ------ ----------
Compute-2 HPE Ethernet 10/25Gb 2-port 640FLR...#2 17 Up 04-09-73-...
Storage1 Hyper-V Virtual Ethernet Adapter 16 Up 00-15-5D-...
StorageReplica2 Hyper-V Virtual Ethernet Adapter #4 41 Up 00-15-5D-...
Embedded LOM 1 Port 3 HPE Ethernet 1Gb 4-port 331i Adapter 13 Disconnected B8-83-03-...
Compute-1 HPE Ethernet 10/25Gb 2-port 640FLR-S... 12 Up 04-09-73-...
Embedded LOM 1 Port 4 HPE Ethernet 1Gb 4-port 331i Adapter #3 10 Disconnected B8-83-03-...
Embedded LOM 1 Port 2 HPE Ethernet 1Gb 4-port 331i Adapter #4 8 Disconnected B8-83-03-...
Management-1 HPE Ethernet 1Gb 4-port 331i Adapter #2 7 Up B8-83-03-...
StorageReplica1 Hyper-V Virtual Ethernet Adapter #3 37 Up 00-15-5D-...
Storage2 Hyper-V Virtual Ethernet Adapter #2 33 Up 00-15-5D-...
Get-SmbClientNetworkInterface
Interface Index RSS Capable RDMA Capable Speed IpAddresses
--------------- ----------- ------------ ----- -----------
13 False False 0 bps {fe80::d03b:d964:2eea:f57f}
10 False False 0 bps {fe80::d4c0:2632:efc9:d40c}
8 False False 0 bps {fe80::45f4:fd34:f8b1:b679}
17 False False 25 Gbps {}
12 False False 25 Gbps {}
16 True True 25 Gbps {fe80::ddd9:560:bbe1:dff3, 172.100.0.4}
33 True True 25 Gbps {fe80::b8be:b1cb:f721:f5e6, 172.102.0.4}
37 True True 25 Gbps {fe80::2d42:f3e:5e69:97f3, 172.101.0.4}
41 True True 25 Gbps {fe80::a51a:264a:2743:56c7, 172.103.0.4}
9 False False 10 Gbps {fe80::4525:a4fc:7150:2a46, 169.254.1.184}
7 True False 1 Gbps {fe80::d1b7:daef:a181:9e88, 10.40.219.53, 10.40...
Hello,
I have seen that the script does not check BIOS settings. This can lead to situations where, after following the steps, the script reports no errors yet DCB is still not enabled.
Can you please include verification of BIOS settings? The image shows one example of this from Dell BIOS settings.
On the "Clusters and Nodes" page of the wizard, if you inadvertently enter an invalid or incorrect server or cluster name there is no way to remove the invalid entry except restarting the wizard.
If you use the UI in v2.1 and supply a FQDN for the cluster name, it will not resolve the cluster nodes.
If the driver output check fails, ideally the expected and actual values would be output in the results.
Windows Server 2022 needs to be included in the supported OSes.
Next button does not re-enable after clicking on the back button