Timeout on provision (service-fabric-cli, OPEN)

jfalameda commented on July 21, 2024
Timeout on provision

from service-fabric-cli.

Comments (8)

Christina-Kang commented on July 21, 2024

Hi @jfalameda, sorry to hear that you're running into this issue. To help us debug, can you share the output from running this command with the --debug option? Could you also share the cluster traces and the approximate time at which you ran the provision command? The traces would be in /etc/servicefabric/FabricDataRoot. Thanks!

jfalameda commented on July 21, 2024

Hi @Christina-Kang,

It seems that provisioning takes a few minutes, while the timeout for the provision call is 100 seconds.

You can download the traces from here:

https://drive.google.com/file/d/17cAPVzd6v3v7PULhgxyN2_h4SSGRIMmL/view?usp=sharing

Kind Regards,
/ José.

Christina-Kang commented on July 21, 2024

Hi @jfalameda,

Thanks for your patience! It looks like the actual timeout duration is not the issue here, but rather, an issue with permissions, which holds up the operation until it eventually times out.

The issue is that certain folders are not ACLed correctly. From your traces, it looks like the source is:
/home/ClusterDeployer/ClusterData/Data/N0030/Fabric/work/Applications/__FabricSystem_App4294967295/work/Store
and the destination is:
/home/ClusterDeployer/ClusterData/Data/N0030/Fabric/work/ImageBuilderProxy/

The same paths also exist under the other "nodes", with "N0010" in place of the "N0030" in the previous path, for example:
/home/ClusterDeployer/ClusterData/Data/N0010/Fabric/work/Applications/__FabricSystem_App4294967295/work/Store
The same applies to the destination path.

Can you check all of these paths (N0010, N0020, N0030) to make sure that the user has the correct permissions and, if not, chmod everything under those paths to give the user rwx permissions?

You should be able to retry the upload after this is done. You should not need to, but if the above doesn't work, try restarting fabric and see whether that, together with the permission changes, allows provisioning to continue. Please let me know if this mitigates the issue for you.
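
For anyone who prefers to script the permission check described above, here is a minimal Python 3 sketch. It is only an illustration, not part of sfctl or Service Fabric: the function names `missing_user_rwx` and `grant_user_rwx` are hypothetical, and it assumes the script runs as the user that owns the files (or as root). It skips symlinks and does nothing beyond the owner rwx bits.

```python
import os
import stat


def iter_tree(root):
    """Yield root, then every directory and file below it, top-down."""
    yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)


def missing_user_rwx(root):
    """Return paths whose owner lacks at least one of r, w, x.

    Directories the current user cannot read are reported themselves,
    but their contents are not visible to this check.
    """
    return [
        path
        for path in iter_tree(root)
        if not os.path.islink(path)
        and (os.stat(path).st_mode & stat.S_IRWXU) != stat.S_IRWXU
    ]


def grant_user_rwx(root):
    """Add owner rwx to everything under root; return the paths changed.

    Parents are fixed before os.walk descends into them, so a directory
    that was unreadable becomes traversable in the same pass.
    """
    changed = []
    for path in iter_tree(root):
        if os.path.islink(path):
            continue
        mode = os.stat(path).st_mode
        if (mode & stat.S_IRWXU) != stat.S_IRWXU:
            os.chmod(path, mode | stat.S_IRWXU)
            changed.append(path)
    return changed
```

You would call `missing_user_rwx(...)` on each source and destination path for N0010, N0020 and N0030 first, and only call `grant_user_rwx(...)` on the ones that report problems.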

This issue happens rarely, so if you were to use a cluster in Azure, or on a different dev machine, it most likely would not recur.

Hope this helps!
Christina

jfalameda commented on July 21, 2024

Hi @Christina-Kang ,

I can see the application provisioning now. I still get the timeout error, because the application takes a long time to provision.

It is odd that I need to fix permissions, as I am using the Docker-based Service Fabric.

Thanks.

Kind Regards,
/ José.

Christina-Kang commented on July 21, 2024

Hi @jfalameda,

Is it still timing out with the same error and stack trace as posted above, where the read timeout is 100 rather than the passed-in timeout?

Thanks!

jfalameda commented on July 21, 2024

Hi @Christina-Kang ,

Yes, the application times out the same way. I can see on SF that the application is still provisioning. When the provisioning is done, the application disappears without any error, so I cannot publish it.

PS: After checking the checksums, it shows "Unprovisioning" and then disappears from the applications list.

Kind Regards,
/ José.

jfalameda commented on July 21, 2024

Hi @Christina-Kang

Unfortunately, 7.1.0 didn't resolve this issue. The timeout does not occur while uploading the application, but while provisioning it. The application takes a few minutes to provision, while sfctl times out before provisioning has completed. By the time it is 100% provisioned, the application unprovisions itself automatically.

This is the error I get on the client. I can see the application is still provisioning in Service Fabric Explorer.

Error occurred in request., ReadTimeout: HTTPConnectionPool(host='localhost', port=19080): Read timed out. (read timeout=100)
Traceback (most recent call last):
  File "/home/jose/.local/lib/python2.7/site-packages/knack/cli.py", line 206, in invoke
    cmd_result = self.invocation.execute(args)
  File "/home/jose/.local/lib/python2.7/site-packages/sfctl/entry.py", line 81, in execute
    return super(SFInvoker, self).execute(args)
  File "/home/jose/.local/lib/python2.7/site-packages/knack/invocation.py", line 188, in execute
    cmd_result = parsed_args.func(params)
  File "/home/jose/.local/lib/python2.7/site-packages/knack/commands.py", line 105, in __call__
    return self.handler(*args, **kwargs)
  File "/home/jose/.local/lib/python2.7/site-packages/knack/commands.py", line 212, in _command_handler
    result = op(client, **command_args) if client else op(**command_args)
  File "/home/jose/.local/lib/python2.7/site-packages/sfctl/custom_app_type.py", line 117, in provision_application_type
    request, header_parameters, body_content_sorted)
  File "/home/jose/.local/lib/python2.7/site-packages/msrest/service_client.py", line 336, in send
    pipeline_response = self.config.pipeline.run(request, **kwargs)
  File "/home/jose/.local/lib/python2.7/site-packages/msrest/pipeline/__init__.py", line 197, in run
    return first_node.send(pipeline_request, **kwargs)  # type: ignore
  File "/home/jose/.local/lib/python2.7/site-packages/msrest/pipeline/__init__.py", line 150, in send
    response = self.next.send(request, **kwargs)
  File "/home/jose/.local/lib/python2.7/site-packages/msrest/pipeline/requests.py", line 72, in send
    return self.next.send(request, **kwargs)

Kind Regards,
/ Jose.

Christina-Kang commented on July 21, 2024

@jfalameda While the timeout may occur on the client side (this is a bug on our part, that the timeout can occur before the cluster times out), the cluster should continue its work with provisioning. There should be no unprovision happening, even on error, from the cluster side. If you are running the upload / provision / create process as part of a script, this may be causing the unprovisioning to happen. If this is not the case, are you available for a call with us so we can debug this faster? You can contact me at [email protected]
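
If a script drives the upload / provision / create sequence, one way to cope with the client-side timeout is to treat it as "still in progress" and poll the cluster until the application type is ready before moving on. The Python 3 sketch below only illustrates that idea: `wait_for_status` is a hypothetical helper, `get_status` is a callable you would supply yourself (how it queries the cluster is left open here), and the status strings "Available" and "Failed" are assumptions about what the cluster reports.

```python
import time


def wait_for_status(get_status, wanted="Available", failed=("Failed",),
                    timeout_s=600.0, interval_s=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns `wanted`.

    Raises RuntimeError if a status in `failed` is seen, and TimeoutError
    if `wanted` has not been reached once `timeout_s` has elapsed.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == wanted:
            return status
        if status in failed:
            raise RuntimeError("provisioning ended with status %r" % status)
        if clock() >= deadline:
            raise TimeoutError(
                "still %r after %.0f seconds" % (status, timeout_s))
        sleep(interval_s)
```

A script would catch the ReadTimeout from the provision call, then call `wait_for_status(...)` instead of running any cleanup or unprovision step, and continue to the create step only after it returns.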

Thanks!
