
Comments (8)

nchammas commented on June 13, 2024

Are you seeing in the AWS Console that the instances (especially the master instance) are still online when you try to SSH back in?

If SSH is timing out, then it seems like either the instance is down, or the security group rules changed somehow, or the SSH keys were changed somehow. There could be other possibilities of course, but of those three I feel like the first is most likely so that's where I would start.
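One way to narrow these possibilities down is to look at how the TCP connection to port 22 fails. The sketch below is only an illustration (the helper name `probe_ssh_port` is made up for this example and is not part of Flintrock). A timeout usually means packets are being dropped somewhere, for example by a security group or a network firewall, or that the host is down. A refused connection means the host is reachable but nothing is listening on that port. A key mismatch would normally show up as an authentication failure after the connection opens, not as a timeout.

```python
import socket


def probe_ssh_port(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt as 'open', 'timeout', 'refused', or 'error: ...'."""
    try:
        # create_connection performs the TCP handshake and raises on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer at all: packets are likely dropped, or the host is down.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but no service is listening on this port.
        return "refused"
    except OSError as exc:
        # Anything else, such as a DNS failure or an unreachable network.
        return f"error: {exc}"
```

For example, `probe_ssh_port("54.205.234.98")` could be run from the affected machine and again from a different network to see whether the result changes.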

from flintrock.

dorienh commented on June 13, 2024

Yes, I can see in the AWS console that the instances are still running.

I just rebooted my Mac and somehow I could log in again. Not sure whether SSH was blocking me. I hope it doesn't happen again.

nchammas commented on June 13, 2024

Perhaps it's something weird happening with your network? Anyway, feel free to reopen this issue if you have more resolution on where the problem might be coming from.

dorienh commented on June 13, 2024

Actually, the problem came back the next day, and this time I can't fix it. I am at university on Eduroam and also tried a mobile hotspot, with the same result.

Even if I log in with plain ssh, it times out. My instances are running and pass 2/2 status checks.

When I run flintrock launch cluster8, it now gives me:

opt/homebrew/lib/python3.10/site-packages/paramiko/transport.py:219: CryptographyDeprecationWarning: Blowfish has been deprecated
  "class": algorithms.Blowfish,
2023-03-31 09:28:20,913 - flintrock.ec2       - INFO  - Launching 4 instances...
2023-03-31 09:28:37,173 - flintrock.ec2       - DEBUG - 4 instances not in state 'running': 'i-09436562527109c97', 'i-09688b6b7e02ff6c0', 'i-0386eb57946f277df', ...
2023-03-31 09:28:42,171 - flintrock.ec2       - DEBUG - 4 instances not in state 'running': 'i-09436562527109c97', 'i-0386eb57946f277df', 'i-073c7c86259375c10', ...
2023-03-31 09:28:46,037 - flintrock.ec2       - DEBUG - 4 instances not in state 'running': 'i-09436562527109c97', 'i-0386eb57946f277df', 'i-073c7c86259375c10', ...

endlessly...

nchammas commented on June 13, 2024

What do you see in the console for these instances that don't seem to launch normally?
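If the console is hard to read, the same information can be pulled from the EC2 API. The sketch below is a rough illustration, not something Flintrock provides: `summarize_instance_states` is a hypothetical helper that takes a dictionary shaped like the response of EC2's DescribeInstances call and lists each instance's state together with its state reason, if one is reported.

```python
def summarize_instance_states(response: dict) -> list:
    """Return (instance_id, state_name, state_reason_message) tuples.

    `response` is assumed to follow the DescribeInstances response shape:
    a "Reservations" list, each holding an "Instances" list.
    """
    rows = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            rows.append((
                instance["InstanceId"],
                instance["State"]["Name"],
                # StateReason is optional, so fall back to an empty string.
                instance.get("StateReason", {}).get("Message", ""),
            ))
    return rows
```

With boto3 installed and credentials configured, the input could come from `boto3.client("ec2").describe_instances(InstanceIds=[...])`, passing the instance IDs shown in the Flintrock log.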

dorienh commented on June 13, 2024

Allow me a few days to see if it reappears. I ended up doing a full reset of the AWS environment (I was using a learner lab). Is there anything in particular I should be sure to check in the console?

dorienh commented on June 13, 2024

It just happened again.

flintrock login cluster
/opt/homebrew/lib/python3.10/site-packages/paramiko/transport.py:219: CryptographyDeprecationWarning: Blowfish has been deprecated
  "class": algorithms.Blowfish,
ssh: connect to host 54.205.234.98 port 22: Operation timed out

My ~/.aws/credentials are up to date.

Looking at the dashboard, I see nothing special. See screenshot here.

This is a fresh cluster. I only logged in once after launch, set up pydoop/yarn, then exited. A few days later I tried to log in again and it keeps timing out.

Anything else I can check or reports I can generate?

I also tried the AWSSupport-TroubleshootSSH runbook in Systems Manager, but the process got stuck and some steps failed:

Step ID | Step # | Step name | Action | Status | Start time | End time
-- | -- | -- | -- | -- | -- | --
cda7fc3a-cee3-4f29-9c4c-b04734b91d65 | 1 | assertInstanceIsManagedInstance | aws:assertAwsResourceProperty | Failed | Wed, 05 Apr 2023 01:38:00 GMT | Wed, 05 Apr 2023 01:38:00 GMT
3cb6b036-5b86-40ef-a54c-07dade48c315 | 2 | assertAllowOffline | aws:assertAwsResourceProperty | Success | Wed, 05 Apr 2023 01:38:01 GMT | Wed, 05 Apr 2023 01:38:01 GMT
f28f0344-e556-46bd-8634-00f96827e9c6 | 3 | assertActionIsFixAll | aws:assertAwsResourceProperty | Success | Wed, 05 Apr 2023 01:38:01 GMT | Wed, 05 Apr 2023 01:38:02 GMT
0dcc8230-715c-4514-84e3-7d81042ec3bf | 4 | assertSubnetId | aws:assertAwsResourceProperty | Success | Wed, 05 Apr 2023 01:38:02 GMT | Wed, 05 Apr 2023 01:38:02 GMT
b88be25f-d38f-446f-af2d-b6c308912c6d | 5 | describeSourceInstance | aws:executeAwsApi | Success | Wed, 05 Apr 2023 01:38:03 GMT | Wed, 05 Apr 2023 01:38:03 GMT
74f0b54d-c5ee-4ac0-9858-c82efd345bdc | 6 | troubleshootSSHOfflineWithSubnetId | aws:executeAutomation | Failed | Wed, 05 Apr 2023 01:38:03 GMT | Wed, 05 Apr 2023 01:38:12 GMT
f631870d-2d3b-40e7-b2df-0d7d4dc7419a | 7 | installEC2Rescue | aws:runCommand | Pending | - | -
28666d80-ecaa-437f-a668-f2cfece634d2 | 8 | troubleshootSSH | aws:runCommand | Pending | - | -
1822d358-a7e4-4208-a99f-ee1a6b109fbc | 9 | troubleshootSSHOffline | aws:executeAutomation | Pending | - | -

nchammas commented on June 13, 2024

The lockout after some days smells like a firewall or network issue that's independent of Flintrock. But I'm not sure how to debug this because it could be so many different things. Do you have an AWS admin who can help you investigate?
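One concrete thing to rule out on the firewall side is whether the cluster's security group still admits your current public IP address. This assumes, and it is only an assumption to verify against your own setup, that the inbound SSH rule was scoped to the address you had at launch time, so moving between networks such as Eduroam and a mobile hotspot would lock you out. The sketch below uses a hypothetical helper, `ssh_allowed_from`, that checks a list shaped like the `IpPermissions` field returned by EC2's DescribeSecurityGroups call. It looks only at IPv4 CIDR rules and ignores rules that reference other security groups.

```python
import ipaddress


def ssh_allowed_from(ip_permissions: list, client_ip: str, port: int = 22) -> bool:
    """Return True if any inbound rule admits TCP traffic on `port` from `client_ip`."""
    addr = ipaddress.ip_address(client_ip)
    for rule in ip_permissions:
        protocol = rule.get("IpProtocol")
        if protocol == "-1":
            # "-1" means all protocols, which also covers all ports.
            port_ok = True
        elif protocol == "tcp":
            port_ok = rule.get("FromPort", 0) <= port <= rule.get("ToPort", 65535)
        else:
            continue
        if not port_ok:
            continue
        for ip_range in rule.get("IpRanges", []):
            # strict=False tolerates CIDRs with host bits set.
            if addr in ipaddress.ip_network(ip_range["CidrIp"], strict=False):
                return True
    return False
```

If this returns False for the address you are connecting from, a timeout on port 22 is the expected symptom, and updating the inbound rule would be the next thing to try.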
