Comments (33)

Christina-Kang commented on July 21, 2024

Could you please share the DNS settings from your cluster manifest, making sure to remove any sensitive information?
Specifically, we are looking for a section that looks similar to the following:

<Section Name="DnsService">
      <Parameter Name="IsEnabled" Value="true" />
</Section>

What version of Service Fabric is your cluster running? Thanks!
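
For anyone who would rather pull this section out of a manifest programmatically, here is a minimal Python sketch using only the standard library. It assumes the manifest declares the http://schemas.microsoft.com/2011/01/fabric default namespace on its root element (a bare snippet without that namespace will not match). The section_parameters helper and the trimmed SAMPLE manifest are illustrative only and are not part of Service Fabric or sfctl.

```python
import xml.etree.ElementTree as ET

# Default namespace assumed to be declared on the ClusterManifest root element.
FABRIC_NS = {"sf": "http://schemas.microsoft.com/2011/01/fabric"}

# Trimmed, hypothetical manifest used only to exercise the helper.
SAMPLE = """<ClusterManifest xmlns="http://schemas.microsoft.com/2011/01/fabric" Name="DevCluster" Version="0">
  <FabricSettings>
    <Section Name="DnsService">
      <Parameter Name="InstanceCount" Value="1" />
      <Parameter Name="IsEnabled" Value="True" />
    </Section>
  </FabricSettings>
</ClusterManifest>"""


def section_parameters(manifest_xml, section_name):
    """Return {parameter name: value} for one settings section, or {} if absent."""
    root = ET.fromstring(manifest_xml)
    for section in root.iterfind(".//sf:Section", FABRIC_NS):
        if section.get("Name") == section_name:
            return {
                param.get("Name"): param.get("Value")
                for param in section.iterfind("sf:Parameter", FABRIC_NS)
            }
    return {}


if __name__ == "__main__":
    print(section_parameters(SAMPLE, "DnsService"))
```

To use it on a real manifest, read the file contents into a string (removing anything sensitive first) and pass it in place of SAMPLE.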

from service-fabric-cli.

no1melman commented on July 21, 2024

Fabric Version: 6.1.480.9494

<Section Name="DnsService">
  <Parameter Name="InstanceCount" Value="1" />
  <Parameter Name="IsEnabled" Value="True" />
</Section>

no1melman commented on July 21, 2024

I get this error:

Unhealthy event: SourceId='System.FabricDnsService', Property='Socket', HealthState='Warning', ConsiderWarningAsError=false.
DnsService UDP listener is unable to start. Please make sure there are no processes listening on the DNS port 53.
List of processes listening on the DNS port:
UDP 0.0.0.0:53 : 6212

no1melman commented on July 21, 2024

microsoft/service-fabric-issues#803

Christina-Kang commented on July 21, 2024

Are you able to upgrade the cluster to a newer version? This is a known issue that has been fixed in newer releases (6.2 and up). If you are not able to upgrade, you can use the ICS mitigation mentioned in the linked thread. This issue also does not appear in cloud clusters; it shows up only in local clusters. Please let me know if you run into issues applying those mitigations. Thanks!
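
As a quick way to apply the "6.2 and up" rule to a reported Fabric version string, here is a small Python sketch. The has_dns_fix name is made up for illustration, and the check only encodes the threshold stated in this comment; it assumes the version string has at least a major and a minor component separated by dots.

```python
def has_dns_fix(fabric_version):
    """Return True if the version is 6.2 or newer, per the guidance in this thread."""
    # Only the major and minor components matter for the 6.2 threshold.
    major, minor = (int(part) for part in fabric_version.split(".")[:2])
    return (major, minor) >= (6, 2)


if __name__ == "__main__":
    # The version reported earlier in this thread.
    print(has_dns_fix("6.1.480.9494"))
```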

no1melman commented on July 21, 2024

I've updated, and now I am unable to start my cluster. Where can I go to find logs that explain why? The c:\sfcluster folder only has etl logs, which I have no way of understanding.

no1melman commented on July 21, 2024

My manifest now:

<?xml version="1.0" encoding="utf-8"?>
<ClusterManifest xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Name="DevCluster" Version="0" Description="This is a generated file. Do not modify." xmlns="http://schemas.microsoft.com/2011/01/fabric">
  <NodeTypes>
    <NodeType Name="NodeType0">
      <Endpoints>
        <ClientConnectionEndpoint Port="19000" />
        <LeaseDriverEndpoint Port="19001" />
        <ClusterConnectionEndpoint Port="19002" />
        <HttpGatewayEndpoint Port="19080" Protocol="http" />
        <HttpApplicationGatewayEndpoint Port="19081" Protocol="http" />
        <ServiceConnectionEndpoint Port="19006" />
        <ApplicationEndpoints StartPort="30001" EndPort="31000" />
      </Endpoints>
      <PlacementProperties>
        <Property Name="NodeTypeName" Value="NodeType0" />
      </PlacementProperties>
    </NodeType>
  </NodeTypes>
  <Infrastructure>
    <WindowsServer IsScaleMin="true">
      <NodeList>
        <Node NodeName="_Node_0" IPAddressOrFQDN="DESKTOP-ITQGDFK" IsSeedNode="true" NodeTypeRef="NodeType0" FaultDomain="fd:/0" UpgradeDomain="0" />
      </NodeList>
    </WindowsServer>
  </Infrastructure>
  <FabricSettings>
    <Section Name="ApplicationGateway/Http">
      <Parameter Name="IsEnabled" Value="true" />
    </Section>
    <Section Name="ClusterManager">
      <Parameter Name="MinReplicaSetSize" Value="1" />
      <Parameter Name="TargetReplicaSetSize" Value="1" />
      <Parameter Name="UpgradeStatusPollInterval" Value="5" />
      <Parameter Name="UpgradeHealthCheckInterval" Value="5" />
      <Parameter Name="FabricUpgradeHealthCheckInterval" Value="5" />
    </Section>
    <Section Name="Diagnostics">
      <Parameter Name="ProducerInstances" Value="ServiceFabricEtlFile,ServiceFabricPerfCtrFolder" />
      <Parameter Name="MaxDiskQuotaInMB" Value="10240" />
      <Parameter Name="EnableCircularTraceSession" Value="true" />
    </Section>
    <Section Name="DnsService">
      <Parameter Name="InstanceCount" Value="-1" />
      <Parameter Name="IsEnabled" Value="True" />
      <Parameter Name="AllowMultipleListeners" Value="true" />
    </Section>
    <Section Name="FabricClient">
      <Parameter Name="HealthReportSendInterval" Value="0" />
    </Section>
    <Section Name="Failover">
      <Parameter Name="NodeUpRetryInterval" Value="1" />
      <Parameter Name="SendToFMTimeout" Value="1" />
    </Section>
    <Section Name="FailoverManager">
      <Parameter Name="ExpectedClusterSize" Value="1" />
      <Parameter Name="IsSingletonReplicaMoveAllowedDuringUpgrade" Value="false" />
      <Parameter Name="MinReplicaSetSize" Value="1" />
      <Parameter Name="TargetReplicaSetSize" Value="1" />
      <Parameter Name="ClusterStableWaitDuration" Value="0" />
      <Parameter Name="PeriodicStateScanInterval" Value="1" />
      <Parameter Name="ReconfigurationTimeLimit" Value="20" />
      <Parameter Name="BuildReplicaTimeLimit" Value="20" />
      <Parameter Name="CreateInstanceTimeLimit" Value="20" />
      <Parameter Name="PlacementTimeLimit" Value="20" />
      <Parameter Name="ServiceLocationBroadcastInterval" Value="1" />
      <Parameter Name="ServiceLookupTableEmptyBroadcastInterval" Value="1" />
      <Parameter Name="MinRebuildRetryInterval" Value="1" />
      <Parameter Name="MaxRebuildRetryInterval" Value="1" />
    </Section>
    <Section Name="Federation">
      <Parameter Name="NodeIdGeneratorVersion" Value="V4" />
      <Parameter Name="ProcessAssertExitTimeout" Value="86400" />
      <Parameter Name="UnresponsiveDuration" Value="0" />
    </Section>
    <Section Name="Hosting">
      <Parameter Name="CacheCleanupScanInterval" Value="300" />
      <Parameter Name="DeactivationGraceInterval" Value="2" />
      <Parameter Name="DeactivationScanInterval" Value="600" />
      <Parameter Name="DeploymentRetryBackoffInterval" Value="1" />
      <Parameter Name="EnableProcessDebugging" Value="true" />
      <Parameter Name="EndpointProviderEnabled" Value="true" />
      <Parameter Name="RunAsPolicyEnabled" Value="true" />
      <Parameter Name="ServiceTypeRegistrationTimeout" Value="20" />
    </Section>
    <Section Name="HttpGateway">
      <Parameter Name="IsEnabled" Value="true" />
    </Section>
    <Section Name="ImageStoreService">
      <Parameter Name="MinReplicaSetSize" Value="1" />
      <Parameter Name="TargetReplicaSetSize" Value="1" />
    </Section>
    <Section Name="Management">
      <Parameter Name="DisableChecksumValidation" Value="true" />
      <Parameter Name="EnableDeploymentAtDataRoot" Value="true" />
      <Parameter Name="ImageCachingEnabled" Value="false" />
      <Parameter Name="ImageStoreConnectionString" Value="file:C:\SfDevCluster\Data\ImageStoreShare" />
    </Section>
    <Section Name="NamingService">
      <Parameter Name="MinReplicaSetSize" Value="1" />
      <Parameter Name="TargetReplicaSetSize" Value="1" />
      <Parameter Name="PartitionCount" Value="1" />
    </Section>
    <Section Name="PlacementAndLoadBalancing">
      <Parameter Name="MinLoadBalancingInterval" Value="300" />
      <Parameter Name="QuorumBasedReplicaDistributionPerFaultDomains" Value="true" />
      <Parameter Name="TraceCRMReasons" Value="false" />
    </Section>
    <Section Name="ReconfigurationAgent">
      <Parameter Name="IsDeactivationInfoEnabled" Value="true" />
      <Parameter Name="LocalHealthReportingTimerInterval" Value="5" />
      <Parameter Name="MinimumIntervalBetweenRAPMessageRetry" Value="0.5" />
      <Parameter Name="RAPMessageRetryInterval" Value="0.5" />
      <Parameter Name="RAUpgradeProgressCheckInterval" Value="3" />
      <Parameter Name="ServiceApiHealthDuration" Value="20" />
      <Parameter Name="ServiceReconfigurationApiHealthDuration" Value="20" />
    </Section>
    <Section Name="Security">
      <Parameter Name="ClusterCredentialType" Value="None" />
      <Parameter Name="ServerAuthCredentialType" Value="None" />
    </Section>
    <Section Name="ServiceFabricEtlFile">
      <Parameter Name="DataDeletionAgeInDays" Value="3" />
      <Parameter Name="EtlReadIntervalInMinutes" Value="5" />
      <Parameter Name="IsEnabled" Value="true" />
      <Parameter Name="ProducerType" Value="EtlFileProducer" />
    </Section>
    <Section Name="ServiceFabricPerfCtrFolder">
      <Parameter Name="DataDeletionAgeInDays" Value="3" />
      <Parameter Name="FolderType" Value="ServiceFabricPerformanceCounters" />
      <Parameter Name="IsEnabled" Value="true" />
      <Parameter Name="ProducerType" Value="FolderProducer" />
    </Section>
    <Section Name="Setup">
      <Parameter Name="FabricDataRoot" Value="C:\SfDevCluster\Data" />
      <Parameter Name="FabricLogRoot" Value="C:\SfDevCluster\Log" />
      <Parameter Name="SkipFirewallConfiguration" Value="true" />
    </Section>
    <Section Name="Trace/Etw">
      <Parameter Name="Level" Value="4" />
    </Section>
    <Section Name="TransactionalReplicator">
      <Parameter Name="CheckpointThresholdInMB" Value="64" />
    </Section>
  </FabricSettings>
</ClusterManifest>

Christina-Kang commented on July 21, 2024

Unfortunately, it's hard to tell just from that. Can you reproduce the issue and upload your trace (.etl) file from c:\sfcluster? Before uploading, run logman update FabricTraces -fd -ets to flush the latest traces to disk.

For local clusters, I've found that reinstalling the MSI and then creating a new cluster, or rebooting the machine, sometimes helps if you have no data you want to keep.

Thanks!

no1melman commented on July 21, 2024

Where do I upload it to?

Plus I've reinstalled, rebooted, and tried all sorts to get it working, to no avail.

Christina-Kang commented on July 21, 2024

You can upload it here or provide a link I can download it from.

no1melman commented on July 21, 2024

Log.zip

Christina-Kang commented on July 21, 2024

Thanks for the logs.

The nodes are unable to come up because of an access denied error caused by an issue with the certificate. I will update this post with more details or a fix shortly.

no1melman commented on July 21, 2024

Cheers much appreciated

Christina-Kang commented on July 21, 2024

Thanks for your patience :) How did you deploy this cluster (including the upgrade)?

The specific error is this: "CertCreateSelfSignCertificate failed: E_ACCESSDENIED"

From an elevated PowerShell instance, can you run the following command?

(Get-Acl -Path ($env:ProgramData+'\Microsoft\Crypto\rsa\MachineKeys\')).Access

It should list at least these 2 entries:
- NT Authority\Network Service: Allow: Read, Execute and Synchronize
- Everyone: Allow: Read, Write, Synchronize

If either is missing, that would explain the failures you are seeing. The mitigation is to run the following (from an elevated PowerShell instance):

$sid="*S-1-1-0" # everyone
$path=$env:ProgramData+'\Microsoft\Crypto\Rsa\MachineKeys' # machine key store
$perms="(RX,W)" # read, write, and execute
icacls $path /grant $sid`:$perms

If that doesn't work, please let me know and we can try something else.
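
A note on the snippet above: the backtick before the colon keeps PowerShell from reading $sid: as a scope-qualified or drive-qualified variable name, so icacls ends up receiving a single argument of the form *SID:(perms). The Python sketch below only shows what that final argument list looks like. The icacls_grant_args helper and the WELL_KNOWN_SIDS table are names made up for illustration, and the example path assumes ProgramData lives at its default location on C:.

```python
# Well-known SIDs used in this thread; the leading "*" added below tells icacls
# that the account is given as a SID rather than as a name.
WELL_KNOWN_SIDS = {
    "everyone": "S-1-1-0",
    "network service": "S-1-5-20",
}


def icacls_grant_args(path, account, perms="(RX,W)"):
    """Build (but do not run) the icacls argument list for a /grant call."""
    sid = WELL_KNOWN_SIDS[account.lower()]
    return ["icacls", path, "/grant", f"*{sid}:{perms}"]


if __name__ == "__main__":
    # Assumed default location of the machine key store.
    machine_keys = r"C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys"
    print(icacls_grant_args(machine_keys, "Everyone"))
```

The helper deliberately stops at building the list. Actually running it (for example with subprocess.run from an elevated prompt on Windows) changes permissions on the machine key store, so it is worth checking the printed arguments first.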

no1melman commented on July 21, 2024

I just used the Web Platform Installer to install it.

no1melman commented on July 21, 2024

Log.zip

no1melman commented on July 21, 2024

Still hasn't worked.

Christina-Kang commented on July 21, 2024

Can you provide the output from the command (Get-Acl -Path ($env:ProgramData+'\Microsoft\Crypto\rsa\MachineKeys\')).Access (from an admin PowerShell) from earlier?

Thanks!

no1melman commented on July 21, 2024

FileSystemRights  : Write, ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : Everyone
IsInherited       : False
InheritanceFlags  : None
PropagationFlags  : None

FileSystemRights  : FullControl
AccessControlType : Allow
IdentityReference : NT AUTHORITY\SYSTEM
IsInherited       : True
InheritanceFlags  : ContainerInherit, ObjectInherit
PropagationFlags  : None

FileSystemRights  : FullControl
AccessControlType : Allow
IdentityReference : BUILTIN\Administrators
IsInherited       : True
InheritanceFlags  : ContainerInherit, ObjectInherit
PropagationFlags  : None

FileSystemRights  : ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : BUILTIN\Users
IsInherited       : True
InheritanceFlags  : ContainerInherit, ObjectInherit
PropagationFlags  : None

FileSystemRights  : ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : Everyone
IsInherited       : True
InheritanceFlags  : ContainerInherit, ObjectInherit
PropagationFlags  : None

Christina-Kang commented on July 21, 2024

Thanks for the output!

It looks like Network Service doesn't have access. There should be an entry like this:

FileSystemRights  : ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : NT AUTHORITY\NETWORK SERVICE
IsInherited       : False
InheritanceFlags  : None
PropagationFlags  : None

Can you try running the following?

$sid="*S-1-5-20" # network service
$path=$env:ProgramData+'\Microsoft\Crypto\Rsa\MachineKeys' # machine key store
$perms="(RX,W)" # read, write, and execute
icacls $path /grant $sid`:$perms

These are the same commands as earlier, with only the first line (the SID) changed.
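
Spotting a missing identity in a long Get-Acl listing by eye is easy to get wrong, so here is a rough Python sketch that checks a pasted listing for a required entry. It assumes the output is in the "Name : Value" list format shown in this thread, with entries separated by blank lines. The parse_acl_listing and missing_rights helpers and the shortened SAMPLE_LISTING are illustrative only. Rights are compared by exact name (apart from FullControl), so, for instance, ReadAndExecute is not treated as implying Read.

```python
# Shortened listing in the same format as the Get-Acl output pasted in this thread.
SAMPLE_LISTING = r"""
FileSystemRights  : Write, ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : Everyone
IsInherited       : False

FileSystemRights  : FullControl
AccessControlType : Allow
IdentityReference : NT AUTHORITY\SYSTEM
IsInherited       : True
"""


def parse_acl_listing(text):
    """Split a 'Name : Value' listing into one dict per blank-line-separated entry."""
    entries, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                entries.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def missing_rights(entries, identity, required):
    """Return the required rights that no Allow entry grants to the identity."""
    granted = set()
    for entry in entries:
        same_identity = entry.get("IdentityReference", "").lower() == identity.lower()
        if same_identity and entry.get("AccessControlType") == "Allow":
            rights = entry.get("FileSystemRights", "")
            granted.update(r.strip() for r in rights.split(","))
    if "FullControl" in granted:
        return set()
    return set(required) - granted


if __name__ == "__main__":
    acl = parse_acl_listing(SAMPLE_LISTING)
    print(missing_rights(acl, r"NT AUTHORITY\NETWORK SERVICE", {"ReadAndExecute", "Synchronize"}))
```

An empty set back from missing_rights means the identity already has every right that was asked for; anything else is the list of rights still to be granted.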

no1melman commented on July 21, 2024

I got this output now:

FileSystemRights  : Write, ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : Everyone
IsInherited       : False
InheritanceFlags  : None
PropagationFlags  : None

FileSystemRights  : Write, ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : NT AUTHORITY\NETWORK SERVICE
IsInherited       : False
InheritanceFlags  : None
PropagationFlags  : None

FileSystemRights  : FullControl
AccessControlType : Allow
IdentityReference : NT AUTHORITY\SYSTEM
IsInherited       : True
InheritanceFlags  : ContainerInherit, ObjectInherit
PropagationFlags  : None

FileSystemRights  : FullControl
AccessControlType : Allow
IdentityReference : BUILTIN\Administrators
IsInherited       : True
InheritanceFlags  : ContainerInherit, ObjectInherit
PropagationFlags  : None

FileSystemRights  : ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : BUILTIN\Users
IsInherited       : True
InheritanceFlags  : ContainerInherit, ObjectInherit
PropagationFlags  : None

FileSystemRights  : ReadAndExecute, Synchronize
AccessControlType : Allow
IdentityReference : Everyone
IsInherited       : True
InheritanceFlags  : ContainerInherit, ObjectInherit
PropagationFlags  : None

no1melman commented on July 21, 2024

No luck creating a 5-node cluster.
Log.zip

Christina-Kang commented on July 21, 2024

The good news is that the cert issue appears to be gone! This now looks like it could be a firewall issue. I will get in touch with the right people about this error and get back to you. Thanks!

Christina-Kang commented on July 21, 2024

Some of these traces look pretty old. Can you try cleaning the cluster and redeploying? You can also try uninstalling and reinstalling (and rebooting if possible, though you should not need to). If it still doesn't work after that, can you upload the traces again? They should be up to date then. Thanks!

no1melman commented on July 21, 2024

So I can't extract those logs any more. I had to force delete the sfcluster folder because something had removed my access to it. I had to reboot in safe mode and delete the directory. Now when I try to zip up the folder, using either the shell zip or 7z, it just says access denied on everything. So I'm not sure what is causing that, but it ain't helping.

Christina-Kang commented on July 21, 2024

Hi @no1melman, apologies for the late reply. Are you still having issues with this?

no1melman commented on July 21, 2024

Yeah, it just isn't working. I've updated Service Fabric, reinstalled, rebooted, and set the permissions as you said. What I found is that the folder mentioned above just locks up and I can't remove it.

Christina-Kang commented on July 21, 2024

Are you trying to delete/move the log files while the cluster is running? Can you check if FabricHostSvc is running when you get permission denied? If so, then can you stop FabricHostSvc and then Fabric.exe (if any)? Afterwards, can you try again to see if you can get the log files?

no1melman commented on July 21, 2024

I've managed to run the logman command again and zip up the new log files without issue.

Log.zip

Christina-Kang commented on July 21, 2024

It's the same certificate issue. Can you run (Get-Acl -Path ($env:ProgramData+'\Microsoft\Crypto\rsa\MachineKeys\')).Access (from an admin PowerShell) again to make sure the permissions have not been removed, possibly by a domain policy? If they have not been, then let's set up a time to chat offline and get the cluster up and running.

Thanks!

no1melman commented on July 21, 2024

@Christina-Kang can we set up some time to get this cluster running - it still all looks good my end...

Christina-Kang commented on July 21, 2024

Sounds good. Can you send me an email at [email protected]? Thanks!

Christina-Kang commented on July 21, 2024

Thank you @no1melman for your time working with us on this!

The workaround below applies to Windows.

The startup issue was caused by Network Service losing its permission after it had been set, for an unknown reason. Running the PowerShell commands did not work in this instance, but going to the directory C:\ProgramData\Microsoft\Crypto\RSA and changing the permissions on the MachineKeys folder allowed the cluster to come up correctly.

The permission was changed by right-clicking the MachineKeys folder, selecting the Security tab, selecting the NETWORK SERVICE group, and granting it at minimum read, write, and execute permissions.

A root cause fix will be implemented in the Service Fabric runtime. No changes to sfctl are required for this issue.
