awslabs / ec2-spot-labs

Collection of tools and code examples to demonstrate best practices in using Amazon EC2 Spot Instances.

Home Page: https://aws.amazon.com/ec2/spot/

License: Other

Python 8.65% Shell 2.69% Dockerfile 0.19% Jupyter Notebook 87.76% Batchfile 0.03% Java 0.17% HCL 0.50%

ec2-spot-labs's Introduction

ec2-spot-labs

ec2-spot-labs is a collection of code examples and scripts that illustrates some of the best practices in using Amazon EC2 Spot Instances.

Issues

Please address any issues or feedback via issues.

ec2-spot-labs's People

Contributors

4pits, ashivadi, black-mirror-1, chakravn, horsfieldsa, hyandell, ivallhon, laxmank2908, michaelaw320, mperi, nadaahm, oak2278, obaallen, ranshn, ruecarlo, schmutze, senkinnar, shashankprasanna, shultzbs, wbenhal

ec2-spot-labs's Issues

price calculation error ?

I think something strange is happening in this script:

$ python get_spot_duration.py --region us-east-1 --product-description 'Linux/UNIX' --bids c3.2xlarge:0.42
Duration Instance Type Availability Zone
11.4 c3.2xlarge us-east-1a
1.0 c3.2xlarge us-east-1e
0.6 c3.2xlarge us-east-1d
0.4 c3.2xlarge us-east-1b
0.3 c3.2xlarge us-east-1c

The script reports that us-east-1a stayed under the $0.42 bid for about 11 hours, but the AWS Spot price history graph shows the price stable at $4.2 rather than $0.42.

screenshot at 16-54-00

Greetings and thanks!

Failed to update node registry: Unable to get first autoscaling.Group for [node-group-name]

Hello,
I am new to EKS. I am following the blog post below to create worker nodes using a combination of On-Demand and Spot Instances.

https://aws.amazon.com/blogs/compute/run-your-kubernetes-workloads-on-amazon-ec2-spot-instances-with-amazon-eks/

[ec2-user@ip-192-168-100-253 ec2-spot-eks-solution]$ kubectl get nodes
NAME                                               STATUS   ROLES    AGE   VERSION
ip-192-168-101-67.eu-central-1.compute.internal    Ready    <none>   13m   v1.13.8-eks-cd3eb0
ip-192-168-103-103.eu-central-1.compute.internal   Ready    <none>   13m   v1.13.8-eks-cd3eb0
ip-192-168-103-70.eu-central-1.compute.internal    Ready    <none>   13m   v1.13.8-eks-cd3eb0

While using Cluster Autoscaler, I am getting the errors below.

E0828 16:27:42.353452       1 static_autoscaler.go:168] Failed to update node registry: Unable to get first autoscaling.Group for [REDACTED]
I0828 16:27:42.895068       1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0828 16:27:44.905532       1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0828 16:27:46.915096       1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0828 16:27:48.924381       1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0828 16:27:50.934511       1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
I0828 16:27:52.353611       1 static_autoscaler.go:114] Starting main loop
E0828 16:27:52.450797       1 static_autoscaler.go:168] Failed to update node registry: Unable to get first autoscaling.Group for [REDACTED]

Here is the cluster-autoscaler policy attached to NodeInstanceRole:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:DescribeTags",
                "autoscaling:TerminateInstanceInAutoScalingGroup",
                "autoscaling:DescribeTags"
            ],
            "Resource": "*",
            "Effect": "Allow",
            "Sid": "K8NodeASGPerms"
        }
    ]
}

cluster-autoscaler-ds.yaml:

          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --nodes=1:3:[<REDACTED>]
            - --nodes=1:3:[<REDACTED>]
            - --nodes=1:3:[<REDACTED>]
            - --skip-nodes-with-system-pods=false
          env:
            - name: AWS_REGION
              value: eu-central-1

Am I missing something?

Encountered non numeric value for property VolumeSize

I tried to use the amazon-eks-nodegroup-with-spot.yaml to provision some EKS nodes. I entered all parameters correctly, but cluster creation fails with:

Encountered non numeric value for property VolumeSize

and

Encountered non numeric value for property MinSize

I quadruple-checked all values entered ARE indeed numeric.

What am I missing?

echoing pid is a static number

$ grep 'echo' /etc/init/spot-instance-termination-notice-handler.conf
echo 2791 > /var/run/spot-instance-notice-handler.pid

This is because the $$ isn't escaped like the dollar signs in the other files, so it is expanded at the moment the heredoc is written. Another option is to quote the EOF delimiter so that dollar signs aren't expanded on the fly (and therefore don't need to be escaped):

cat <<"EOF" > /etc/init/spot-instance-termination-notice-handler.conf
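The difference between the two heredoc forms can be checked locally. The following is a minimal sketch that writes to a caller-supplied path rather than the real /etc/init file; the function names `write_unquoted` and `write_quoted` are hypothetical and exist only for illustration.

```shell
#!/bin/bash
# Unquoted delimiter: $$ is expanded while the heredoc is being written,
# so the generated file contains a fixed number (the writer's PID).
write_unquoted() {
  cat <<EOF > "$1"
echo $$ > /var/run/spot-instance-notice-handler.pid
EOF
}

# Quoted delimiter: the body is copied verbatim, so $$ is only expanded
# later, when the generated script itself runs.
write_quoted() {
  cat <<"EOF" > "$1"
echo $$ > /var/run/spot-instance-notice-handler.pid
EOF
}
```

Running `write_quoted /tmp/handler.conf` and then `grep echo /tmp/handler.conf` should show a literal `$$` rather than a number.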

Breaking change in "spot-instance-termination-notice-handler" due to curl version

https://github.com/awslabs/ec2-spot-labs/blob/master/ecs-ec2-spot-fleet/ecs-ec2-spot-fleet.yaml#L442

[ec2-user@ip-172-31-7-218 ~]$ curl --version
curl 7.61.1 (x86_64-koji-linux-gnu) libcurl/7.61.1 OpenSSL/1.0.2k zlib/1.2.7 libidn2/2.3.0 libssh2/1.4.3 nghttp2/1.41.0
Release-Date: 2018-09-05
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz HTTP2 UnixSockets HTTPS-proxy Metalink 
[ec2-user@ip-172-31-7-218 ~]$ [ -z $(curl -Isf http://169.254.169.254/latest/meta-data/spot/termination-time) ];
[ec2-user@ip-172-31-7-218 ~]$ 
[ec2-user@ip-172-31-7-218 ~]$ curl --version
curl 7.76.1 (x86_64-koji-linux-gnu) libcurl/7.76.1 OpenSSL/1.0.2k-fips zlib/1.2.7 libidn2/2.3.0 libssh2/1.4.3 nghttp2/1.41.0
Release-Date: 2021-04-14
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp 
Features: alt-svc AsynchDNS GSS-API HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz Metalink NTLM NTLM_WB SPNEGO SSL UnixSockets
[ec2-user@ip-172-31-7-218 ~]$ [ -z $(curl -Isf http://169.254.169.254/latest/meta-data/spot/termination-time) ];
-bash: [: too many arguments

If a customer uses the spot-instance-termination-notice-handler.sh script in their environment, the change in curl's output format means the code always executes the else branch. As a result, every registered container instance is kept in the draining state indefinitely, regardless of whether the instance is On-Demand or Spot.
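The `too many arguments` failure is consistent with the unquoted command substitution being split into several words once curl prints more than one word. Below is a minimal sketch of that behaviour. `fake_curl` is a hypothetical stand-in for the metadata request so the example runs without network access, and the two check functions are likewise illustrative names.

```shell
#!/bin/bash
# Hypothetical stand-in for the curl call: prints the text it is given.
fake_curl() { printf '%s' "$1"; }

# Original form: the substitution is unquoted, so multi-word output is
# split into several arguments and `[` reports "too many arguments".
check_unquoted() {
  [ -z $(fake_curl "$1") ]
}

# Quoted form: the output always reaches `[` as a single argument.
check_quoted() {
  [ -z "$(fake_curl "$1")" ]
}
```

With empty output both forms succeed; with output such as `HTTP/1.1 404 Not Found` only the quoted form gives a clean true/false answer. Testing curl's exit status or HTTP status code, rather than its printed output, would avoid depending on the output format at all.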

No data

On a Mac.

MacBook-Pro:ec2-spot-duration XXXXXX$ python get_spot_duration.py --r us-east-1 --product-description 'Linux/UNIX' --bids c5.18xlarge:3
Duration	Instance Type	Availability Zone

I have tried different bids, with no results. I have also tried --region instead of --r.

AWS CLI is up to date, as is AWS Shell.

Not possible to bootstrap Spot Instance from userdata

Hi,

I'm trying to spin up ec2 spot instances and attach them to my EKS cluster, but your user-data instructions specified in https://github.com/awslabs/ec2-spot-labs/blob/master/ec2-spot-eks-solution/provision-worker-nodes/amazon-eks-nodegroup-with-spot.yaml are not working:

aws eks describe-cluster --region=us-east-1 --service-name=aws-eks-spot-serverless-demo-dev --query 'cluster.{certificateAuthorityData: certificateAuthority.data, endpoint: endpoint}' --debug
2018-09-23 17:40:03,396 - MainThread - awscli.clidriver - DEBUG - CLI version: aws-cli/1.16.13 Python/2.7.14 Linux/4.14.62-70.117.amzn2.x86_64 botocore/1.12.3
2018-09-23 17:40:03,397 - MainThread - awscli.clidriver - DEBUG - Arguments entered to CLI: ['eks', 'describe-cluster', '--region=us-east-1', '--service-name=aws-eks-spot-serverless-demo-dev', '--query', 'cluster.{certificateAuthorityData: certificateAuthority.data, endpoint: endpoint}', '--debug']
2018-09-23 17:40:03,397 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function add_scalar_parsers at 0x7f9b98c43758>
2018-09-23 17:40:03,397 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2018-09-23 17:40:03,398 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function register_uri_param_handler at 0x7f9b92696ed8>
2018-09-23 17:40:03,398 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2018-09-23 17:40:03,398 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function inject_assume_role_provider_cache at 0x7f9b92663230>
2018-09-23 17:40:03,398 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2018-09-23 17:40:03,398 - MainThread - botocore.session - DEBUG - Loading variable credentials_file from defaults.
2018-09-23 17:40:03,399 - MainThread - botocore.session - DEBUG - Loading variable config_file from defaults.
2018-09-23 17:40:03,399 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2018-09-23 17:40:03,399 - MainThread - botocore.session - DEBUG - Loading variable metadata_service_timeout from defaults.
2018-09-23 17:40:03,399 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2018-09-23 17:40:03,399 - MainThread - botocore.session - DEBUG - Loading variable metadata_service_num_attempts from defaults.
2018-09-23 17:40:03,400 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2018-09-23 17:40:03,401 - MainThread - botocore.hooks - DEBUG - Event session-initialized: calling handler <function attach_history_handler at 0x7f9b91e501b8>
2018-09-23 17:40:03,401 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2018-09-23 17:40:03,401 - MainThread - botocore.session - DEBUG - Loading variable profile from defaults.
2018-09-23 17:40:03,401 - MainThread - botocore.session - DEBUG - Loading variable api_versions from defaults.
2018-09-23 17:40:03,402 - MainThread - botocore.loaders - DEBUG - Loading JSON file: /root/.aws/models/eks/2017-11-01/service-2.json
2018-09-23 17:40:03,418 - MainThread - botocore.hooks - DEBUG - Event service-data-loaded.eks: calling handler <function register_retries_for_service at 0x7f9b9383c7d0>
2018-09-23 17:40:03,418 - MainThread - awscli.clidriver - DEBUG - Exception caught in main()
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/awscli/clidriver.py", line 207, in main
    return command_table[parsed_args.command](remaining, parsed_args)
  File "/usr/lib/python2.7/site-packages/awscli/clidriver.py", line 341, in __call__
    service_parser = self._create_parser()
  File "/usr/lib/python2.7/site-packages/awscli/clidriver.py", line 381, in _create_parser
    command_table = self._get_command_table()
  File "/usr/lib/python2.7/site-packages/awscli/clidriver.py", line 326, in _get_command_table
    self._command_table = self._create_command_table()
  File "/usr/lib/python2.7/site-packages/awscli/clidriver.py", line 348, in _create_command_table
    service_model = self._get_service_model()
  File "/usr/lib/python2.7/site-packages/awscli/clidriver.py", line 334, in _get_service_model
    self._service_name, api_version=api_version)
  File "/usr/lib/python2.7/site-packages/botocore/session.py", line 540, in get_service_model
    service_description = self.get_service_data(service_name, api_version)
  File "/usr/lib/python2.7/site-packages/botocore/session.py", line 568, in get_service_data
    service_name=service_name, session=self)
  File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 356, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 228, in emit
    return self._emit(event_name, kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 211, in _emit
    response = handler(**kwargs)
  File "/usr/lib/python2.7/site-packages/botocore/handlers.py", line 278, in register_retries_for_service
    service_event_name = hyphenize_service_id(service_id)
  File "/usr/lib/python2.7/site-packages/botocore/utils.py", line 981, in hyphenize_service_id
    return service_id.replace(' ', '-').lower()
AttributeError: 'NoneType' object has no attribute 'replace'
2018-09-23 17:40:03,419 - MainThread - awscli.clidriver - DEBUG - Exiting with rc 255

'NoneType' object has no attribute 'replace'

I guess something is wrong with the model.

AMI (official from AWS documentation): ami-0440e4f6b9713faf6

aws --version
aws-cli/1.16.13 Python/2.7.14 Linux/4.14.62-70.117.amzn2.x86_64 botocore/1.12.3

Syntax errors in policy

Hi,

I have been following this blog to spawn multiple Spot Instances. However, while creating the policy using the command below,

aws iam create-policy \
    --policy-name ec2-permissions-dl-training  \
    --policy-document ec2-permissions-dl-training.json

I am facing the error:

An error occurred (MalformedPolicyDocument) when calling the CreatePolicy operation: Syntax errors in policy.

Could you suggest what the issue with this policy document might be?
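One possible cause, offered as an assumption because the policy file itself is not shown: the AWS CLI treats the value of `--policy-document` as the literal document text unless it is prefixed with `file://`, so a bare file name is parsed as (invalid) JSON. A minimal sketch follows; `policy_arg` and `create_policy` are hypothetical helper names.

```shell
#!/bin/bash
# Turn a local path into the file:// form that the AWS CLI uses to read
# a parameter value from a file.
policy_arg() {
  printf 'file://%s' "$1"
}

# Not invoked here: running it requires the AWS CLI and IAM credentials.
create_policy() {
  aws iam create-policy \
    --policy-name ec2-permissions-dl-training \
    --policy-document "$(policy_arg ec2-permissions-dl-training.json)"
}
```

If the error persists with the `file://` prefix, the JSON in the file itself would be the next thing to check.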

/meta-data/spot/ not supported anymore

The daemon set returns 404 because /meta-data/spot/ is not available. I checked in the EC2 console that the instance Lifecycle is spot.
Are there any other ways to get interruption notices?
This is what I get when I kubectl exec into the pod that runs the spot-sig:

/ # curl http://169.254.169.254/latest/meta-data/spot
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title>404 - Not Found</title>
 </head>
 <body>
  <h1>404 - Not Found</h1>
 </body>
</html>
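A 404 from this path may simply mean that no interruption is currently scheduled, since the spot metadata items are only present once a notice has been issued. The sketch below polls the `spot/instance-action` item and interprets the HTTP status code instead of the response body. The function names are hypothetical, and the mapping of 404 to "no interruption" is an assumption based on that behaviour.

```shell
#!/bin/bash
# Map an HTTP status code from the metadata endpoint to a state.
# Assumption: 200 means a notice is present, 404 means none is scheduled.
interpret_status() {
  case "$1" in
    200) echo "interruption-scheduled" ;;
    404) echo "no-interruption" ;;
    *)   echo "unknown" ;;
  esac
}

# Print only the HTTP status code of the metadata request.
# Meant to be called on an EC2 instance; it is not invoked here.
spot_status_code() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 2 \
    http://169.254.169.254/latest/meta-data/spot/instance-action
}
```

On an instance, `interpret_status "$(spot_status_code)"` would print one of the three states. If the instance enforces IMDSv2, the request also needs a session token, which this sketch omits.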

Reference - ec2-spot-allocation-strategies

Hi, I have a use case where I need to use Spot Fleet. I'm referring to the solution in "asg-capacity-optimized.json". How do I handle interruptions if I use this solution? Also, with this capacity-optimized Spot Fleet strategy, how can I use a combination of On-Demand and Spot Instances?

I'm going to try this from the kops launch configuration template to provision a self-managed Kubernetes cluster. I believe this should work. Please advise.

Thanks for making it clear. I've also watched your video at https://www.youtube.com/watch?v=whFb8YHjdFo; it's a pretty clear demonstration. We're planning to use this in a dev environment first. The plan is to keep away from the Spotinst tool.

Workshop and Builders Session Resource Materials

Thanks for making this list. There are so many wonderful sessions at re:Invent, it’s hard to get to all of them. Do you think the detailed walkthrough documentation that each Workshop and Builders Session has will be made public? They are fantastic learning resources.

Error while running ec2_spot_keras_training.py

Hi,

While trying to follow the example in this blog post, I keep getting this error in the ec2_spot_keras_training.py script: AttributeError: 'SpotTermination' object has no attribute 'on_train_batch_begin' (and the same for 'on_train_batch_end').

Please help with this.

Thanks and Regards,
Arpit

Use Fn::Base64 for template values

This is difficult to understand and provides no value:

        UserData:
          IyEvYmluL2Jhc2gKeXVtIHVwZGF0ZSAteQphbWF6b24tbGludXgtZXh0cmFzIGluc3RhbGwgLXkgbGFtcC1tYXJpYWRiMTAuMi1waHA3LjIgcGhwNy4yCnl1bSBpbnN0YWxsIC15IGh0dHBkIG1hcmlhZGItc2VydmVyCnN5c3RlbWN0bCBzdGFydCBodHRwZApzeXN0ZW1jdGwgZW5hYmxlIGh0dHBkCnVzZXJtb2QgLWEgLUcgYXBhY2hlIGVjMi11c2VyCmNob3duIC1SIGVjMi11c2VyOmFwYWNoZSAvdmFyL3d3dwpjaG1vZCAyNzc1IC92YXIvd3d3CmZpbmQgL3Zhci93d3cgLXR5cGUgZCAtZXhlYyBjaG1vZCAyNzc1IHt9IFw7CmZpbmQgL3Zhci93d3cgLXR5cGUgZiAtZXhlYyBjaG1vZCAwNjY0IHt9IFw7CmVjaG8gIjw/cGhwIHBocGluZm8oKTsgPz4iID4gL3Zhci93d3cvaHRtbC9waHBpbmZvLnBocA==

https://github.com/awslabs/ec2-spot-labs/blob/master/builder-sessions/ec2-asg-with-lt/create-asg-with-lt.yaml#L11

Please consider using the Fn::Base64 intrinsic function so we can understand what is happening there.
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/intrinsic-function-reference-base64.html
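Until the template is changed, the encoded value can be decoded locally to see what it does. This is a minimal sketch; `decode_user_data` is a hypothetical helper name, and the example input is only the first 16 characters of the UserData value quoted above.

```shell
#!/bin/bash
# Decode a base64-encoded UserData value so the script can be read.
decode_user_data() {
  printf '%s' "$1" | base64 --decode
}
```

For example, `decode_user_data 'IyEvYmluL2Jhc2gK'` prints `#!/bin/bash`, the first line of the embedded script. Passing the full value decodes the whole script.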

[stderr] require-dev.mikey179/vfsStream is invalid

When deploying the app with CodeDeploy, the deployment fails because vfsStream is written with an uppercase S.
The fix is simple: edit composer.json and change vfsStream to vfsstream.
I made the change locally and verified that it works. I will create a pull request with the fix soon.
