Comments (33)
Could you please share the DNS settings from your cluster manifest, removing any sensitive information first?
Specifically, we are looking for a section that looks similar to the following:
<Section Name="DnsService">
<Parameter Name="IsEnabled" Value="true" />
</Section>
Which version of Service Fabric is your cluster running? Thanks!
from service-fabric-cli.
Fabric Version: 6.1.480.9494
<Section Name="DnsService">
<Parameter Name="InstanceCount" Value="1" />
<Parameter Name="IsEnabled" Value="True" />
</Section>
from service-fabric-cli.
I get this error:
Unhealthy event: SourceId='System.FabricDnsService', Property='Socket', HealthState='Warning', ConsiderWarningAsError=false.
DnsService UDP listener is unable to start. Please make sure there are no processes listening on the DNS port 53.
List of processes listening on the DNS port:
UDP 0.0.0.0:53 : 6212
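For anyone who wants to confirm the port conflict independently of the health report, here is a minimal Python sketch. It simply tries to bind a UDP socket to the port and reports whether that succeeds. The helper name `udp_port_is_free` is made up for illustration, and a failed bind can also mean missing privileges rather than a conflict, so treat the result as a hint only.

```python
import socket


def udp_port_is_free(port, host="0.0.0.0"):
    """Return True if a UDP socket can be bound to (host, port) right now.

    Hypothetical helper for illustration: a False result usually means another
    process already holds the port, but it can also mean the current user is
    not allowed to bind it.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True
```

Calling `udp_port_is_free(53)` on the affected machine would be expected to return False while the process listed above is still holding the port.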
from service-fabric-cli.
microsoft/service-fabric-issues#803
from service-fabric-cli.
Are you able to upgrade the cluster to a newer version? This is a known issue that has been fixed in newer releases (6.2 and up). If you are not able to update, you can use the mitigation with ICS mentioned in the thread. This issue only shows up in local clusters; it does not appear in cloud clusters. Please let me know if you run into issues applying those mitigations. Thanks!
from service-fabric-cli.
I've updated - now I am unable to start my cluster... Where can I go to find logs that explain why? The c:\sfcluster folder only has etl logs, which I have no way of understanding....
from service-fabric-cli.
My manifest now:
<?xml version="1.0" encoding="utf-8"?>
<ClusterManifest xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Name="DevCluster" Version="0" Description="This is a generated file. Do not modify." xmlns="http://schemas.microsoft.com/2011/01/fabric">
<NodeTypes>
<NodeType Name="NodeType0">
<Endpoints>
<ClientConnectionEndpoint Port="19000" />
<LeaseDriverEndpoint Port="19001" />
<ClusterConnectionEndpoint Port="19002" />
<HttpGatewayEndpoint Port="19080" Protocol="http" />
<HttpApplicationGatewayEndpoint Port="19081" Protocol="http" />
<ServiceConnectionEndpoint Port="19006" />
<ApplicationEndpoints StartPort="30001" EndPort="31000" />
</Endpoints>
<PlacementProperties>
<Property Name="NodeTypeName" Value="NodeType0" />
</PlacementProperties>
</NodeType>
</NodeTypes>
<Infrastructure>
<WindowsServer IsScaleMin="true">
<NodeList>
<Node NodeName="_Node_0" IPAddressOrFQDN="DESKTOP-ITQGDFK" IsSeedNode="true" NodeTypeRef="NodeType0" FaultDomain="fd:/0" UpgradeDomain="0" />
</NodeList>
</WindowsServer>
</Infrastructure>
<FabricSettings>
<Section Name="ApplicationGateway/Http">
<Parameter Name="IsEnabled" Value="true" />
</Section>
<Section Name="ClusterManager">
<Parameter Name="MinReplicaSetSize" Value="1" />
<Parameter Name="TargetReplicaSetSize" Value="1" />
<Parameter Name="UpgradeStatusPollInterval" Value="5" />
<Parameter Name="UpgradeHealthCheckInterval" Value="5" />
<Parameter Name="FabricUpgradeHealthCheckInterval" Value="5" />
</Section>
<Section Name="Diagnostics">
<Parameter Name="ProducerInstances" Value="ServiceFabricEtlFile,ServiceFabricPerfCtrFolder" />
<Parameter Name="MaxDiskQuotaInMB" Value="10240" />
<Parameter Name="EnableCircularTraceSession" Value="true" />
</Section>
<Section Name="DnsService">
<Parameter Name="InstanceCount" Value="-1" />
<Parameter Name="IsEnabled" Value="True" />
<Parameter Name="AllowMultipleListeners" Value="true" />
</Section>
<Section Name="FabricClient">
<Parameter Name="HealthReportSendInterval" Value="0" />
</Section>
<Section Name="Failover">
<Parameter Name="NodeUpRetryInterval" Value="1" />
<Parameter Name="SendToFMTimeout" Value="1" />
</Section>
<Section Name="FailoverManager">
<Parameter Name="ExpectedClusterSize" Value="1" />
<Parameter Name="IsSingletonReplicaMoveAllowedDuringUpgrade" Value="false" />
<Parameter Name="MinReplicaSetSize" Value="1" />
<Parameter Name="TargetReplicaSetSize" Value="1" />
<Parameter Name="ClusterStableWaitDuration" Value="0" />
<Parameter Name="PeriodicStateScanInterval" Value="1" />
<Parameter Name="ReconfigurationTimeLimit" Value="20" />
<Parameter Name="BuildReplicaTimeLimit" Value="20" />
<Parameter Name="CreateInstanceTimeLimit" Value="20" />
<Parameter Name="PlacementTimeLimit" Value="20" />
<Parameter Name="ServiceLocationBroadcastInterval" Value="1" />
<Parameter Name="ServiceLookupTableEmptyBroadcastInterval" Value="1" />
<Parameter Name="MinRebuildRetryInterval" Value="1" />
<Parameter Name="MaxRebuildRetryInterval" Value="1" />
</Section>
<Section Name="Federation">
<Parameter Name="NodeIdGeneratorVersion" Value="V4" />
<Parameter Name="ProcessAssertExitTimeout" Value="86400" />
<Parameter Name="UnresponsiveDuration" Value="0" />
</Section>
<Section Name="Hosting">
<Parameter Name="CacheCleanupScanInterval" Value="300" />
<Parameter Name="DeactivationGraceInterval" Value="2" />
<Parameter Name="DeactivationScanInterval" Value="600" />
<Parameter Name="DeploymentRetryBackoffInterval" Value="1" />
<Parameter Name="EnableProcessDebugging" Value="true" />
<Parameter Name="EndpointProviderEnabled" Value="true" />
<Parameter Name="RunAsPolicyEnabled" Value="true" />
<Parameter Name="ServiceTypeRegistrationTimeout" Value="20" />
</Section>
<Section Name="HttpGateway">
<Parameter Name="IsEnabled" Value="true" />
</Section>
<Section Name="ImageStoreService">
<Parameter Name="MinReplicaSetSize" Value="1" />
<Parameter Name="TargetReplicaSetSize" Value="1" />
</Section>
<Section Name="Management">
<Parameter Name="DisableChecksumValidation" Value="true" />
<Parameter Name="EnableDeploymentAtDataRoot" Value="true" />
<Parameter Name="ImageCachingEnabled" Value="false" />
<Parameter Name="ImageStoreConnectionString" Value="file:C:\SfDevCluster\Data\ImageStoreShare" />
</Section>
<Section Name="NamingService">
<Parameter Name="MinReplicaSetSize" Value="1" />
<Parameter Name="TargetReplicaSetSize" Value="1" />
<Parameter Name="PartitionCount" Value="1" />
</Section>
<Section Name="PlacementAndLoadBalancing">
<Parameter Name="MinLoadBalancingInterval" Value="300" />
<Parameter Name="QuorumBasedReplicaDistributionPerFaultDomains" Value="true" />
<Parameter Name="TraceCRMReasons" Value="false" />
</Section>
<Section Name="ReconfigurationAgent">
<Parameter Name="IsDeactivationInfoEnabled" Value="true" />
<Parameter Name="LocalHealthReportingTimerInterval" Value="5" />
<Parameter Name="MinimumIntervalBetweenRAPMessageRetry" Value="0.5" />
<Parameter Name="RAPMessageRetryInterval" Value="0.5" />
<Parameter Name="RAUpgradeProgressCheckInterval" Value="3" />
<Parameter Name="ServiceApiHealthDuration" Value="20" />
<Parameter Name="ServiceReconfigurationApiHealthDuration" Value="20" />
</Section>
<Section Name="Security">
<Parameter Name="ClusterCredentialType" Value="None" />
<Parameter Name="ServerAuthCredentialType" Value="None" />
</Section>
<Section Name="ServiceFabricEtlFile">
<Parameter Name="DataDeletionAgeInDays" Value="3" />
<Parameter Name="EtlReadIntervalInMinutes" Value="5" />
<Parameter Name="IsEnabled" Value="true" />
<Parameter Name="ProducerType" Value="EtlFileProducer" />
</Section>
<Section Name="ServiceFabricPerfCtrFolder">
<Parameter Name="DataDeletionAgeInDays" Value="3" />
<Parameter Name="FolderType" Value="ServiceFabricPerformanceCounters" />
<Parameter Name="IsEnabled" Value="true" />
<Parameter Name="ProducerType" Value="FolderProducer" />
</Section>
<Section Name="Setup">
<Parameter Name="FabricDataRoot" Value="C:\SfDevCluster\Data" />
<Parameter Name="FabricLogRoot" Value="C:\SfDevCluster\Log" />
<Parameter Name="SkipFirewallConfiguration" Value="true" />
</Section>
<Section Name="Trace/Etw">
<Parameter Name="Level" Value="4" />
</Section>
<Section Name="TransactionalReplicator">
<Parameter Name="CheckpointThresholdInMB" Value="64" />
</Section>
</FabricSettings>
</ClusterManifest>
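When only one section of a manifest like this is needed, it can be pulled out with a few lines of Python rather than pasting the whole file. This is a rough sketch using the standard library. The function name `section_parameters` is hypothetical; the namespace URI is the one declared on the ClusterManifest element above.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on the <ClusterManifest> root element.
FABRIC_NS = {"sf": "http://schemas.microsoft.com/2011/01/fabric"}


def section_parameters(manifest_xml, section_name):
    """Return {parameter name: value} for one FabricSettings section.

    Hypothetical helper: returns an empty dict if the section is absent.
    """
    root = ET.fromstring(manifest_xml)
    xpath = (
        "sf:FabricSettings/sf:Section[@Name='%s']/sf:Parameter" % section_name
    )
    return {p.get("Name"): p.get("Value") for p in root.findall(xpath, FABRIC_NS)}
```

Passing the manifest text and "DnsService" would return just the three DNS parameters, which is all that was asked for at the start of this thread.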
from service-fabric-cli.
Unfortunately, it's hard to tell just from that. Can you repro the issue and upload your trace (.etl) file from c:\sfcluster? Before uploading, run logman update FabricTraces -fd -ets to flush the traces to disk.
For local clusters, I've found that reinstalling the MSI and then creating a new cluster, or rebooting the machine, sometimes helps if you have no data you want to keep.
Thanks!
from service-fabric-cli.
Where do I upload it to?
Plus I've reinstalled, rebooted, and tried all sorts of things to get it working, to no avail.
from service-fabric-cli.
You can upload it to here or provide a link I can download it from.
from service-fabric-cli.
Thanks for the logs.
The nodes are unable to come up. They fail with an access denied error caused by an issue with the certificate. I will update this post with more details / a fix in a bit.
from service-fabric-cli.
Cheers much appreciated
from service-fabric-cli.
Thanks for your patience :) How did you deploy this cluster (including the upgrade)?
The specific error is this: "CertCreateSelfSignCertificate failed: E_ACCESSDENIED"
From an elevated PowerShell instance, can you run the following command?
(Get-Acl -Path ($env:ProgramData+'\Microsoft\Crypto\rsa\MachineKeys\')).Access
It should list at least these 2 entries:
- NT Authority\Network Service: Allow: Read, Execute and Synchronize
- Everyone: Allow: Read, Write, Synchronize
If either is missing, that would explain the failures reported below. The mitigation is to run the following (from an elevated PowerShell instance):
$sid="*S-1-1-0" # everyone
$path=$env:ProgramData+'\Microsoft\Crypto\Rsa\MachineKeys' # machine key store
$perms="(RX,W)" # read, write, and execute
icacls $path /grant $sid`:$perms
If that doesn't work, please let me know and we can try something else.
from service-fabric-cli.
I just used web platform installer to install it
from service-fabric-cli.
Still not working.
from service-fabric-cli.
Can you provide the output from the command (Get-Acl -Path ($env:ProgramData+'\Microsoft\Crypto\rsa\MachineKeys\')).Access (from an admin PowerShell) from earlier?
Thanks!
from service-fabric-cli.
FileSystemRights : Write, ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : Everyone
IsInherited : False
InheritanceFlags : None
PropagationFlags : None
FileSystemRights : FullControl
AccessControlType : Allow
IdentityReference : NT AUTHORITY\SYSTEM
IsInherited : True
InheritanceFlags : ContainerInherit, ObjectInherit
PropagationFlags : None
FileSystemRights : FullControl
AccessControlType : Allow
IdentityReference : BUILTIN\Administrators
IsInherited : True
InheritanceFlags : ContainerInherit, ObjectInherit
PropagationFlags : None
FileSystemRights : ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : BUILTIN\Users
IsInherited : True
InheritanceFlags : ContainerInherit, ObjectInherit
PropagationFlags : None
FileSystemRights : ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : Everyone
IsInherited : True
InheritanceFlags : ContainerInherit, ObjectInherit
PropagationFlags : None
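Output like this can be checked mechanically instead of by eye. Below is a rough Python sketch that parses the "Key : Value" listing into entries and collects the rights allowed for one identity. The function names are hypothetical, it assumes every entry starts with a FileSystemRights line as in the listing above, and it ignores Deny entries entirely, so it is a sanity check rather than a full ACL evaluation.

```python
def parse_acl_entries(text):
    """Split Get-Acl '.Access' text output into a list of dicts.

    Assumes each access rule starts with a 'FileSystemRights' line.
    """
    entries = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or unrelated lines
        key, value = key.strip(), value.strip()
        if key == "FileSystemRights":
            entries.append({})
        if entries:
            entries[-1][key] = value
    return entries


def granted_rights(entries, identity):
    """Union of rights in all Allow entries for the identity (case-insensitive)."""
    rights = set()
    for entry in entries:
        if entry.get("AccessControlType") != "Allow":
            continue
        if entry.get("IdentityReference", "").lower() != identity.lower():
            continue
        rights.update(r.strip() for r in entry.get("FileSystemRights", "").split(","))
    return rights
```

For the listing above, `granted_rights(entries, r"NT AUTHORITY\NETWORK SERVICE")` would come back empty, since no entry names that identity.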
from service-fabric-cli.
Thanks for the output!
Looks like Network Service doesn't have access. There should be an entry like this:
FileSystemRights : ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : NT AUTHORITY\NETWORK SERVICE
IsInherited : False
InheritanceFlags : None
PropagationFlags : None
Can you try running the following?
$sid="*S-1-5-20" # network service
$path=$env:ProgramData+'\Microsoft\Crypto\Rsa\MachineKeys' # machine key store
$perms="(RX,W)" # read, write, and execute
icacls $path /grant $sid`:$perms
This modifies the first line of the commands from earlier.
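Since the two mitigations differ only in the SID, the command line can be built from a small lookup. The Python sketch below does only that: it assembles the icacls arguments for either account and leaves running them to the caller. The function name and the lookup table are illustrative assumptions; the SIDs and the "(RX,W)" permission string are the ones used in the commands above.

```python
import subprocess

# SIDs as used in the PowerShell commands above ("*" prefix marks a SID for icacls).
WELL_KNOWN_SIDS = {
    "everyone": "*S-1-1-0",
    "network service": "*S-1-5-20",
}


def icacls_grant_args(path, account, perms="(RX,W)"):
    """Build the icacls argument list that grants perms on path to account.

    Hypothetical helper: raises KeyError for accounts not in the table.
    """
    sid = WELL_KNOWN_SIDS[account.lower()]
    return ["icacls", path, "/grant", "%s:%s" % (sid, perms)]


def grant(path, account):
    """Run the command. Windows only, and needs an elevated prompt."""
    subprocess.run(icacls_grant_args(path, account), check=True)
```

Building the argument list separately from running it makes it easy to print and review the exact command before anything on the key store is changed.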
from service-fabric-cli.
I got this output now:
FileSystemRights : Write, ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : Everyone
IsInherited : False
InheritanceFlags : None
PropagationFlags : None
FileSystemRights : Write, ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : NT AUTHORITY\NETWORK SERVICE
IsInherited : False
InheritanceFlags : None
PropagationFlags : None
FileSystemRights : FullControl
AccessControlType : Allow
IdentityReference : NT AUTHORITY\SYSTEM
IsInherited : True
InheritanceFlags : ContainerInherit, ObjectInherit
PropagationFlags : None
FileSystemRights : FullControl
AccessControlType : Allow
IdentityReference : BUILTIN\Administrators
IsInherited : True
InheritanceFlags : ContainerInherit, ObjectInherit
PropagationFlags : None
FileSystemRights : ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : BUILTIN\Users
IsInherited : True
InheritanceFlags : ContainerInherit, ObjectInherit
PropagationFlags : None
FileSystemRights : ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : Everyone
IsInherited : True
InheritanceFlags : ContainerInherit, ObjectInherit
PropagationFlags : None
from service-fabric-cli.
no luck with creating a 5 node cluster
Log.zip
from service-fabric-cli.
The good news is that the cert issue looks like it is gone! This now looks like it could be a firewall issue. I will get in touch with the correct people for this error and get back to you. Thanks!
from service-fabric-cli.
Some of these traces look pretty old. Can you try cleaning the cluster and re-deploying? You can also try uninstalling and reinstalling (maybe rebooting if possible, but you should not need to). If it still doesn't work after that, can you re-upload the traces again? The traces should be up to date then. Thanks!
from service-fabric-cli.
So I can't extract those logs any more. I had to force delete the sfcluster folder because something had removed access from me. I had to reboot in safe mode and delete the directory. Now when I try and zip up the folder using shell zip or zip using 7z it just says access denied on everything. So I'm not sure what is causing that, but it ain't helping
from service-fabric-cli.
Hi @no1melman, apologies for the late reply. Are you still having issues with this?
from service-fabric-cli.
Yeah, it just isn't working. I've updated Service Fabric, reinstalled, rebooted, and set the permissions as you said. What I found out is that the folder mentioned above just locks up and I can't remove it.
from service-fabric-cli.
Are you trying to delete/move the log files while the cluster is running? Can you check if FabricHostSvc is running when you get permission denied? If so, then can you stop FabricHostSvc and then Fabric.exe (if any)? Afterwards, can you try again to see if you can get the log files?
from service-fabric-cli.
I've managed to perform the logman command again, and zip up the new log files without issue
from service-fabric-cli.
It's the same certificate issue. Can you run (Get-Acl -Path ($env:ProgramData+'\Microsoft\Crypto\rsa\MachineKeys\')).Access (from an admin PowerShell) again to make sure that the permissions have not been removed, possibly by a domain policy? If they have not been, then let's set up a time to chat offline and get the cluster up and running.
Thanks!
from service-fabric-cli.
@Christina-Kang can we set up some time to get this cluster running - it still all looks good my end...
from service-fabric-cli.
Sounds good. Can you send me an email at [email protected]? Thanks!
from service-fabric-cli.
Thank you @no1melman for your time working with us on this!
The workaround below applies to Windows.
The startup issue was caused by Network Service losing its permission on the machine key store after it had been set, for a reason that is still unknown. Running the PowerShell commands did not work in this instance, but going to the directory C:\ProgramData\Microsoft\Crypto\RSA and changing the permissions on the MachineKeys folder allowed the cluster to come up correctly.
The permission was changed by right-clicking the MachineKeys folder, selecting the Security tab, selecting the NETWORK SERVICE group, and granting at minimum read, write, and execute permissions.
A root cause fix will be implemented in the Service Fabric runtime. No changes to sfctl are required for this issue.
from service-fabric-cli.