
AWS Research Workshops

This repo provides a managed Amazon SageMaker Jupyter notebook instance with a number of notebooks for hands-on workshops in data lakes, AI/ML, Batch, IoT, and Genomics.

Quickstart

To get the AWS Research Workshop Notebook up and running in your own AWS account, follow these steps (if you do not have an AWS account, please see How do I create and activate a new Amazon Web Services account?):

  1. Save research-env.yml to your local file system.
  2. Log in to the AWS console if you are not already logged in.
  3. Choose Launch Stack to open the AWS CloudFormation console and create a new stack.
  4. Continue through the CloudFormation wizard steps:
    1. Name your stack, e.g. ResearchWorkshopNotebook.
    2. Select "Upload a template file" and use research-env.yml as the template file.
    3. After reviewing, check the acknowledgement box allowing CloudFormation to create IAM resources.
  5. Choose Create stack. This will take ~20 minutes to complete.
  6. The output of the CloudFormation stack creation will provide a Notebook URL (in the Outputs section of your stack details).
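
If you prefer to script the stack creation instead of using the console wizard, the following boto3 sketch creates the same stack. It is a minimal illustration, not part of the workshop instructions: it assumes research-env.yml is saved in the working directory, your AWS credentials are configured, and the standard IAM capability is sufficient (use CAPABILITY_NAMED_IAM instead if the template names its IAM resources). The stack name is just an example.

import boto3

cfn = boto3.client('cloudformation')

with open('research-env.yml') as f:
    template_body = f.read()

# Create the stack; CAPABILITY_IAM corresponds to the IAM acknowledgement
# box checked in the console wizard.
cfn.create_stack(
    StackName='ResearchWorkshopNotebook',
    TemplateBody=template_body,
    Capabilities=['CAPABILITY_IAM'],
)

# Wait for creation to finish, then read the Notebook URL from the stack outputs.
cfn.get_waiter('stack_create_complete').wait(StackName='ResearchWorkshopNotebook')
outputs = cfn.describe_stacks(StackName='ResearchWorkshopNotebook')['Stacks'][0]['Outputs']
print(outputs)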

Workshops

Please review and complete all prerequisites before attempting these workshops.

  • Introduction to AWS Basics: Learn about core AWS services for compute, storage, database, and networking. This workshop has a hands-on lab where you will launch an auto-scaled Apache web server behind an Application Load Balancer, host the home page content in an S3 bucket, and define the appropriate IAM roles for each resource.
  • Building Data Lakes: In this series of hands-on workshops, you will learn how to understand what data you have, how to drive insights, and how to make predictions using purpose-built AWS services. Learn about the common pitfalls of building data lakes, and discover how to successfully drive analytics and insights from your data. Also learn how services such as Amazon S3, AWS Glue, Amazon Athena, and Amazon AI/ML services work together to build a serverless data lake for various roles, including data scientists and business users.
  • TensorFlow with Amazon SageMaker: Amazon SageMaker is a fully managed platform that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale. Amazon SageMaker removes the barriers that typically slow down developers who want to use machine learning. We will show you how to build and train an ML model on SageMaker, then how to deploy the inference endpoints to tools like AWS Greengrass or serverless applications.
  • Cost-effective Research leveraging AWS Spot: With Amazon Web Services (AWS), you can spin up EC2 compute capacity on demand with no upfront commitments. You can do this even more cost effectively by using Amazon EC2 Spot Instances to bid on spare Amazon EC2 computing capacity. This allows users to get up to 90% off On-Demand prices (often as little as 1 cent per core-hour) and has helped them run very large scale workloads cost effectively. For example, at USC a computational chemist spun up 156,000 cores in three days. Also, with the release of the Spot Fleet API, a researcher or scientist can easily access some of the most cost-effective compute capacity at very large scale. Learn how to effectively use these tools for your research needs.
  • AWS Batch on AWS: In this workshop you will set up an AWS Batch environment for processing FastQC files leveraging the 1000 Genomes dataset. Get started with AWS Batch by creating a job definition, compute environment, and a job queue for AWS Batch with the Python SDK.
  • AWS ParallelCluster on AWS: In this workshop you will set up an AWS ParallelCluster environment with a Slurm REST API endpoint. You will run Princeton's Athena++ MHD simulation on the cluster and visualize the results, all from Jupyter notebooks.
  • Introduction to Containers on AWS: In this workshop, we will introduce containers for researchers. You will learn the basics of containers and how to run your workloads with containers on AWS.

Prerequisites

AWS Account

In order to complete these workshops you'll need a valid, usable AWS Account with Admin permissions. The code and instructions in these workshops assume only one student is using a given AWS account at a time. If you try sharing an account with another student, you'll run into naming conflicts for certain resources.

Use a personal account or create a new AWS account to ensure you have the necessary access. This should not be an AWS account from the company you work for.

If you are doing this workshop as part of an AWS sponsored event, you will receive credits to cover the costs.

Browser

We recommend you use the latest version of Chrome or Firefox to complete this workshop.

Text Editor

For any workshop module that requires use of the AWS Command Line Interface, you will also need a plain text editor for writing scripts. Any editor that inserts Windows or other special characters can cause scripts to fail.

IAM Role for Notebook Instance

A new IAM Role will be required for the workshops. The Notebook Instance requires sagemaker.amazonaws.com and glue.amazonaws.com trust permissions and the AdministratorAccess policy to access the required services in the workshops. Follow the instructions below to create the role in Python, or use the research-env.yml file with CloudFormation to launch the notebook.

Python script instructions for creating the IAM Role (expand for details)

import json
import logging

import boto3
import botocore.exceptions

def create_role(iam, policy_name, assume_role_policy_document, inline_policy_name=None, policy_str=None):
    """Creates a new role if there is not already a role by that name"""
    if role_exists(iam, policy_name):
        logging.info('Role "%s" already exists. Assuming correct values.', policy_name)
        return get_role_arn(iam, policy_name)
    else:
        response = iam.create_role(RoleName=policy_name,
                                   AssumeRolePolicyDocument=assume_role_policy_document)
        
        if policy_str is not None:
            iam.put_role_policy(RoleName=policy_name,
                            PolicyName=inline_policy_name, PolicyDocument=policy_str)
        logging.info('response for creating role = "%s"', response)
        return response['Role']['Arn']

def role_exists(iam, role_name):
    """Checks if the role exists already"""
    try:
        iam.get_role(RoleName=role_name)
    except botocore.exceptions.ClientError:
        return False
    return True

def get_role_arn(iam, role_name):
    """Gets the ARN of role"""
    response = iam.get_role(RoleName=role_name)
    return response['Role']['Arn']

iam = boto3.client('iam')

role_doc = {
        "Version": "2012-10-17", 
        "Statement": [
            {"Sid": "", 
             "Effect": "Allow", 
             "Principal": {
                 "Service": [
                     "sagemaker.amazonaws.com",
                     "glue.amazonaws.com"
                 ]
             }, 
             "Action": "sts:AssumeRole"
        }]
    }

inline_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": [
                    "*",
                    "*"
                ],
                "Resource": [
                    "*"
                ],
                "Effect": "Allow"
            }
        ]
    }

# Example names for the role and inline policy (choose your own as needed).
notebook_role_name = 'ResearchWorkshopNotebookRole'
notebook_policy_name = 'ResearchWorkshopNotebookPolicy'

role_arn = create_role(iam, notebook_role_name, json.dumps(role_doc), notebook_policy_name, json.dumps(inline_policy))
print(role_arn)
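
The script above only attaches an inline policy. Since the role also needs the AdministratorAccess managed policy (as noted at the start of this section), a short follow-up sketch, not part of the original script, would be:

# Attach the AdministratorAccess managed policy referenced above.
# notebook_role_name is the example role name used when creating the role.
iam.attach_role_policy(
    RoleName=notebook_role_name,
    PolicyArn='arn:aws:iam::aws:policy/AdministratorAccess',
)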

Launching Research Notebook Instance

SageMaker provides hosted Jupyter notebooks that require no setup, so you can begin processing your training data sets immediately. With a few clicks in the SageMaker console, you can create a fully managed notebook instance, pre-loaded with useful libraries for machine learning.
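
If you would rather create the notebook instance programmatically than click through the console, a minimal boto3 sketch (my own illustration, not part of the workshop) is shown below; the instance name, type, volume size, and repository mirror the console values in the step-by-step instructions, and role_arn is assumed to be the ARN of the IAM role created in the previous section.

import boto3

sm = boto3.client('sagemaker')

sm.create_notebook_instance(
    NotebookInstanceName='aws-research-workshops-notebook',
    InstanceType='ml.t2.medium',
    VolumeSizeInGB=50,
    RoleArn=role_arn,  # ARN of the IAM role created above
    DefaultCodeRepository='https://github.com/aws-samples/aws-research-workshops',
)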

Step-by-step instructions (expand for details)

  1. In the upper-right corner of the AWS Management Console, confirm you are in the desired AWS region. Select a Region with SageMaker support.

  2. From the Services drop-down menu, type SageMaker to filter the list of services. This will bring you to the Amazon SageMaker console homepage.

Service Search

  3. On the left-hand side, click Notebook instances, then click the Create notebook instance button at the top of the browser window.

Notebook Instances

  4. In Notebook instance settings, type aws-research-workshops-notebook into the Notebook instance name text box, select ml.t2.medium for the Notebook instance type, and enter 50 for the Volume Size in GB, leaving the others as defaults.

Create Notebook Instance

  5. For IAM role, choose Create a new role. The role requires sagemaker.amazonaws.com and glue.amazonaws.com trust permissions and the AdministratorAccess policy to access the required services (see IAM Role for Notebook Instance above).

  6. In the Git repositories section, clone this repo so that it is included in the notebook instance.

Notebook Git

  7. Click Create notebook instance.

Accessing the Notebook Instance

  1. Wait for the server status to change to InService. This will take a few minutes.

Access Notebook

  2. Click Open. You will now see the Jupyter homepage for your notebook instance.

Open Notebook

License Summary

This sample code is made available under a modified MIT license. See the LICENSE file.


aws-research-workshops's Issues

S3 path to New York City Taxi and Limousine Commission (TLC) Trip Record Data

Describe the bug
In the ny-taxi-orchestration notebook, I try to execute the line to cp the data, and it comes back with a "does not exist" error:

!aws s3 cp s3://nyc-tlc/trip\ data/yellow_tripdata_2017-01.csv s3://$bucket/datalake/raw/yellow/
!aws s3 cp s3://nyc-tlc/trip\ data/yellow_tripdata_2017-02.csv s3://$bucket/datalake/raw/yellow/

To Reproduce
Steps to reproduce the behavior:

  1. Go to the notebook cell for "Copy Sample Data to S3 bucket"
  2. Click on 'Run.'
  3. Scroll down to results and see error below
  4. See error

fatal error: An error occurred (404) when calling the HeadObject operation: Key "trip/data/yellow_tripdata_2017-01.csv" does not exist
fatal error: An error occurred (404) when calling the HeadObject operation: Key "trip data/yellow_tripdata_2017-02.csv" does not exist

Expected behavior

I expect the data to be copied into the bucket created in the previous step. Most likely there is a problem in the path to the training data.

Additional context
I again feel like it's a simple path problem, but I don't know what that path is. This extends into the other notebooks with the taxi trip data as well.
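
As a quick diagnostic (my own sketch, not part of the notebook), you can list the prefix directly and compare the keys that actually exist against the path used in the cp commands above; this assumes the public nyc-tlc bucket still allows listing from your role.

import boto3

s3 = boto3.client('s3')

# List a few keys under the "trip data/" prefix of the public nyc-tlc bucket
# to confirm the exact key names before running the cp commands.
resp = s3.list_objects_v2(Bucket='nyc-tlc', Prefix='trip data/', MaxKeys=10)
for obj in resp.get('Contents', []):
    print(obj['Key'])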

Missing view file for notebook data_lake_sql_server_source

Describe the bug
In /aws-research-workshops/notebooks/building_data_lakes/db-scripts there is no sample-view.sql file. The notebook has a step that calls this file, and it errors out.

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on cell with 'run_sql_file('db-scripts/sample-view.sql', conn)'
  3. Scroll down to results
  4. See error
    FileNotFoundError: [Errno 2] No such file or directory: 'db-scripts/sample-view.sql'

Expected behavior

Once the file is there, I expect the command to complete successfully.

node --version gives "node: command not found"

Describe the bug
After executing the instructions to install node, I restart the kernel, and the node --version cell fails with

/bin/sh: node: command not found

Instructions for restarting the kernel are also incorrect.

Also kernel is consistently misspelled kernal.

To Reproduce
Steps to reproduce the behavior:

  1. Follow instructions to install nodejs
  2. Execute cell
  3. See error

Expected behavior
node --version should work.
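
For reference, one common way to get node onto a SageMaker notebook instance is to install it into the active conda environment; this is a hedged sketch of that approach (my assumption about the intended install path, not the notebook's exact instructions), run from notebook cells:

!conda install -y -c conda-forge nodejs
!node --version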

ParallelCluster Workshop Not Working for Graviton Cluster

Describe the bug
Currently the workshop's script to use the Graviton processors is not working. There are several changes the user must make before getting the head node up and running, and even then the Slurm REST API cannot be reached to submit jobs.

To Reproduce
Steps to reproduce the behavior:

  1. Download a fresh copy of https://github.com/aws-samples/aws-research-workshops as a zip file, then upload it to a SageMaker notebook

  2. Unzip the folder into your directory

  3. Navigate to the pcluster-athena++.ipynb notebook (aws-research-workshops-mainline -> notebooks -> parallelcluster -> pcluster-athena++.ipynb)

  4. Before starting, make the following modifications to the pcluster_athena.py script used for cluster creation later:

    1. Change to use hard-coded VPC
    vpc_filter = [{'Name':'isDefault', 'Values':['true']}]
    default_vpc = ec2_client.describe_vpcs(Filters=vpc_filter)
    
    ### On lines 146-147 ###
    ### CHANGE TO ###
    
    vpc_filter = [{'Name':'research-workshop', 'Values':['true']}]
    default_vpc = ec2_client.describe_vpcs(VpcIds=['<YOUR VPC ID HERE>',])

    2. Change to pull from custom VPCs
    if '-1a' in sn['AvailabilityZone']:
            subnet_id = sn['SubnetId']
    if '-1b' in sn['AvailabilityZone']:
            subnet_id2 = sn['SubnetId']
    
    ### On lines 154-157 ###
    ### CHANGE TO ###
        
    if 'public1' in sn['Tags'][0]['Value'] :
            subnet_id = sn['SubnetId']
    if 'public2' in sn['Tags'][0]['Value'] :
            subnet_id2 = sn['SubnetId'] 

    3. Add try/except for bucket creation
    try:
            self.my_bucket_name = workshop.create_bucket(self.region, self.session, bucket_prefix, False)
            print(self.my_bucket_name)
    except:
            pass
    ### On line 172 ###

    4. The given config-c6g.ini file does not work in ParallelCluster 3 ("ParallelCluster 3 requires configuration files to be valid YAML documents"). Update the config file to a YAML template:
    Region: ${REGION}
    Image:
      Os: alinux2
    SharedStorage:
      - Name: myebs
        StorageType: Ebs
        MountDir: /shared
        EbsSettings:
          VolumeType: gp2
          Size: 200
    HeadNode:
      InstanceType: c6g.medium
      Networking:
        SubnetId: ${SUBNET_ID}
        ElasticIp: true 
      Ssh:
        KeyName: ${KEY_NAME}
      CustomActions:
        OnNodeConfigured:
          Script: ${POST_INSTALL_SCRIPT_LOCATION}
          Args:
            - ${POST_INSTALL_SCRIPT_ARGS_1}
            - ${POST_INSTALL_SCRIPT_ARGS_2}
            - ${POST_INSTALL_SCRIPT_ARGS_3}
            - ${POST_INSTALL_SCRIPT_ARGS_4}
            - ${POST_INSTALL_SCRIPT_ARGS_5}
            - ${POST_INSTALL_SCRIPT_ARGS_6}
            - ${POST_INSTALL_SCRIPT_ARGS_7}
            - ${POST_INSTALL_SCRIPT_ARGS_8}
            - ${POST_INSTALL_SCRIPT_ARGS_9}
      Iam:
        AdditionalIamPolicies:
          - Policy: arn:aws:iam::aws:policy/SecretsManagerReadWrite
        S3Access:
          - EnableWriteAccess: true
            BucketName: '*'
    Scheduling:
      Scheduler: slurm
      SlurmQueues:
        - Name: q1
          CapacityType: ONDEMAND
          ComputeResources:
            - Name: cr1
              InstanceType: c6g.2xlarge
              MinCount: 0
              MaxCount: 20
              Efa:
                Enabled: false
          CustomActions:
            OnNodeConfigured:
              Script: ${POST_INSTALL_SCRIPT_LOCATION}
              Args:
                - ${POST_INSTALL_SCRIPT_ARGS_1}
                - ${POST_INSTALL_SCRIPT_ARGS_2}
                - ${POST_INSTALL_SCRIPT_ARGS_3}
                - ${POST_INSTALL_SCRIPT_ARGS_4}
                - ${POST_INSTALL_SCRIPT_ARGS_5}
                - ${POST_INSTALL_SCRIPT_ARGS_6}
                - ${POST_INSTALL_SCRIPT_ARGS_7}
                - ${POST_INSTALL_SCRIPT_ARGS_8}
                - ${POST_INSTALL_SCRIPT_ARGS_9}
          Iam:
            AdditionalIamPolicies:
              - Policy: arn:aws:iam::aws:policy/SecretsManagerReadWrite
            S3Access:
              - EnableWriteAccess: true
                BucketName: '*'
          Networking:
            SubnetIds:
              - ${SUBNET_ID}
            AssignPublicIp: true
            PlacementGroup:
              Enabled: true

    5. Change the `pcluster_athena.py` configuration to accommodate the new template:
ph = {
, '${REGION}':self.region 
, '${VPC_ID}': self.vpc_id 
, '${SUBNET_ID}': subnet_id 
, '${KEY_NAME}': self.ssh_key_name 
, '${POST_INSTALL_SCRIPT_LOCATION}': post_install_script_location 
, '${POST_INSTALL_SCRIPT_ARGS_1}': "'"+rds_secret['host']+"'"
, '${POST_INSTALL_SCRIPT_ARGS_2}': "'"+str(rds_secret['port'])+"'"
, '${POST_INSTALL_SCRIPT_ARGS_3}': "'"+rds_secret['username']+"'"
, '${POST_INSTALL_SCRIPT_ARGS_4}': "'"+rds_secret['password']+"'"
, '${POST_INSTALL_SCRIPT_ARGS_5}': "'"+self.pcluster_name+"'"
, '${POST_INSTALL_SCRIPT_ARGS_6}': "'"+self.region+"'"
, '${POST_INSTALL_SCRIPT_ARGS_7}': "'"+self.slurm_version+"'"
, '${POST_INSTALL_SCRIPT_ARGS_8}': "'"+self.dbd_host+"'"       
, '${POST_INSTALL_SCRIPT_ARGS_9}': "'"+self.federation_name+"'"
, '${BUCKET_NAME}': self.my_bucket_name
    }    
self.template_to_file("config/"+self.config_name+".ini", "build/"+self.config_name, ph)

### On lines ~250-270 ###
### CHANGE TO ###

ph = {
'${REGION}': self.region 
, '${VPC_ID}': self.vpc_id 
, '${SUBNET_ID}': subnet_id 
, '${KEY_NAME}': self.ssh_key_name 
, '${POST_INSTALL_SCRIPT_LOCATION}': post_install_script_location 
, '${POST_INSTALL_SCRIPT_ARGS_1}': rds_secret['host']
, '${POST_INSTALL_SCRIPT_ARGS_2}': "'"+str(rds_secret['port'])+"'" ### YAML must take integer values as strings
, '${POST_INSTALL_SCRIPT_ARGS_3}': rds_secret['username']
, '${POST_INSTALL_SCRIPT_ARGS_4}': rds_secret['password']
, '${POST_INSTALL_SCRIPT_ARGS_5}': self.pcluster_name
, '${POST_INSTALL_SCRIPT_ARGS_6}': self.region
, '${POST_INSTALL_SCRIPT_ARGS_7}': self.slurm_version
, '${POST_INSTALL_SCRIPT_ARGS_8}': self.dbd_host       
, '${POST_INSTALL_SCRIPT_ARGS_9}': "'"+self.federation_name+"'" ### federation name is empty, need quotes
, '${BUCKET_NAME}': self.my_bucket_name
    }
self.template_to_file("config/"+self.config_name+"-template.yaml", "build/"+self.config_name+".yaml", ph)
  5. Follow the first steps in the notebook to get nodejs installed
  6. Run the next two cells
  7. In the cell where you install the packages, comment out the default pcluster_name, config_name, and REGION values and uncomment the three variables for the Graviton cluster (pcluster_name, config_name, post_install_script_prefix), then run the cell
  8. Run the cell in "Load the PClusterHelper module"
  9. When running Create the parallel cluster, change !pcluster create-cluster --cluster-name $pcluster_helper.pcluster_name --rollback-on-failure False --cluster-configuration build/$config_name --region $pcluster_helper.region to !pcluster create-cluster --cluster-name $pcluster_helper.pcluster_name --rollback-on-failure False --cluster-configuration build/$config_name'.yaml' --region $pcluster_helper.region to pick up the YAML template version
  10. Run everything until Inspect the Slurm REST API Schema. The cluster will be created. When you run the Inspect the Slurm REST API Schema cell, you get a Connection Refused error. Here is the actual error: ConnectionError: HTTPConnectionPool(host='10.0.0.62', port=8082): Max retries exceeded with url: /openapi/v3 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f924d2bbf28>: Failed to establish a new connection: [Errno 111] Connection refused',))
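
As an additional diagnostic (my own sketch, not part of the workshop), slurmrestd can be probed directly from the head node to separate a networking problem from a dead service; the host, port, and path come from the error above, and the header names follow the Slurm REST API's JWT authentication. The SLURM_JWT environment variable (e.g. produced by scontrol token on the head node) and the user name are assumptions.

import os
import requests

# Assumption: a JWT generated on the head node has been exported as SLURM_JWT.
slurm_jwt = os.environ['SLURM_JWT']

headers = {
    'X-SLURM-USER-NAME': 'ec2-user',  # assumption: the user the token was issued for
    'X-SLURM-USER-TOKEN': slurm_jwt,
}

# Host and port taken from the ConnectionError above (10.0.0.62:8082).
resp = requests.get('http://10.0.0.62:8082/openapi/v3', headers=headers, timeout=10)
print(resp.status_code, resp.headers.get('Content-Type'))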

Expected behavior
What we expect to happen is a JSON response describing the schema; however, we are getting a connection refused error instead. It looks like this might be due to a networking error. I've contacted AWS support, but they have not found any issues on my end related to connectivity.

Additional context
pcluster version = 3.1.4
sinfo -V = slurm 21.08.8

Here is my slurm.conf

#
# Example slurm.conf file. Please run configurator.html
# (in doc/html) to build a configuration file customized
# for your environment.
#
#
# slurm.conf file generated by configurator.html.
#
# See the slurm.conf man page for more information.
#
# CLUSTER SETTINGS
ClusterName=mypc6g2
SlurmUser=slurm
SlurmctldPort=6820-6829
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/var/spool/slurm.state
SlurmdSpoolDir=/var/spool/slurmd
SwitchType=switch/none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ReconfigFlags=KeepPartState
#
# CLOUD CONFIGS OPTIONS
SlurmctldParameters=idle_on_node_suspend,power_save_min_interval=30,cloud_dns
CommunicationParameters=NoAddrCache
SuspendProgram=/opt/parallelcluster/scripts/slurm/slurm_suspend
ResumeProgram=/opt/parallelcluster/scripts/slurm/slurm_resume
ResumeFailProgram=/opt/parallelcluster/scripts/slurm/slurm_suspend
SuspendTimeout=120
ResumeTimeout=1800
PrivateData=cloud
ResumeRate=0
SuspendRate=0
#
# TIMERS
SlurmctldTimeout=300
SlurmdTimeout=180
UnkillableStepTimeout=180
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
MessageTimeout=60
#
# SCHEDULING, JOB, AND NODE SETTINGS
EnforcePartLimits=ALL
SchedulerType=sched/backfill
ProctrackType=proctrack/cgroup
MpiDefault=none
ReturnToService=1
TaskPlugin=task/affinity,task/cgroup
#
# TRES AND GPU CONFIG OPTIONS
GresTypes=gpu
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU
#
# LOGGING
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log
JobCompType=jobcomp/none
#
# WARNING!!! The slurm_parallelcluster.conf file included
# get updated by pcluster process, be careful
# when manually editing!
include slurm_parallelcluster.conf
# Enable jwt auth for Slurmrestd
AuthAltTypes=auth/jwt
#
## /opt/slurm/etc/slurm.conf
#
# ACCOUNTING
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
#
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=<IP ADDRESS> # cluster headnode's DNS
AccountingStorageUser=db_user
AccountingStoragePort=6839

Here is the output of sudo journalctl -u slurmrestd

-- Logs begin at Thu 2022-05-12 10:46:45 UTC, end at Thu 2022-05-26 03:00:38 UTC. --
May 25 22:30:25 ip-10-0-0-62 systemd[1]: Started Slurm restd daemon.
May 25 22:30:25 ip-10-0-0-62 slurmrestd[12872]: debug:  _establish_config_source: using config_file=/opt/slurm/etc/slurmrestd.conf (environment)
May 25 22:30:25 ip-10-0-0-62 slurmrestd[12872]: debug:  slurm_conf_init: using config_file=/opt/slurm/etc/slurmrestd.conf
May 25 22:30:25 ip-10-0-0-62 slurmrestd[12872]: debug:  Reading slurm.conf file: /opt/slurm/etc/slurmrestd.conf
May 25 22:30:25 ip-10-0-0-62 slurmrestd[12872]: debug:  NodeNames=q1-dy-cr1-[1-20] setting Sockets=8 based on CPUs(8)/(CoresPerSocket(1)/ThreadsPerCore(1))
May 25 22:30:25 ip-10-0-0-62 systemd[1]: slurmrestd.service: main process exited, code=killed, status=11/SEGV
May 25 22:30:25 ip-10-0-0-62 systemd[1]: Unit slurmrestd.service entered failed state.
May 25 22:30:25 ip-10-0-0-62 systemd[1]: slurmrestd.service failed.

Here is the output of sudo systemctl status slurmrestd

● slurmrestd.service - Slurm restd daemon
   Loaded: loaded (/etc/systemd/system/slurmrestd.service; disabled; vendor preset: disabled)
   Active: failed (Result: signal) since Wed 2022-05-25 22:30:25 UTC; 4h 31min ago
 Main PID: 12872 (code=killed, signal=SEGV)

May 25 22:30:25 ip-10-0-0-62 systemd[1]: Started Slurm restd daemon.
May 25 22:30:25 ip-10-0-0-62 slurmrestd[12872]: debug:  _establish_config_source: using config_file=/opt/slurm/etc/slurmrestd.conf (environment)
May 25 22:30:25 ip-10-0-0-62 slurmrestd[12872]: debug:  slurm_conf_init: using config_file=/opt/slurm/etc/slurmrestd.conf
May 25 22:30:25 ip-10-0-0-62 slurmrestd[12872]: debug:  Reading slurm.conf file: /opt/slurm/etc/slurmrestd.conf
May 25 22:30:25 ip-10-0-0-62 slurmrestd[12872]: debug:  NodeNames=q1-dy-cr1-[1-20] setting Sockets=8 based on CPUs(8)/(CoresPerSocket(1)/ThreadsPerCore(1))
May 25 22:30:25 ip-10-0-0-62 systemd[1]: slurmrestd.service: main process exited, code=killed, status=11/SEGV
May 25 22:30:25 ip-10-0-0-62 systemd[1]: Unit slurmrestd.service entered failed state.
May 25 22:30:25 ip-10-0-0-62 systemd[1]: slurmrestd.service failed.

Here is the output of systemctl status slurmctld

● slurmctld.service - Slurm controller daemon
   Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2022-05-25 22:30:25 UTC; 4h 33min ago
 Main PID: 12899 (slurmctld)
   CGroup: /system.slice/slurmctld.service
           ├─12899 /opt/slurm/sbin/slurmctld -D
           └─12903 slurmctld: slurmscriptd

May 26 00:00:01 ip-10-0-0-62 slurmctld[12899]: slurmctld: auth/jwt: auth_p_token_generate: created token for root for 1800 seconds
May 26 00:20:01 ip-10-0-0-62 slurmctld[12899]: slurmctld: auth/jwt: auth_p_token_generate: created token for root for 1800 seconds
May 26 00:40:01 ip-10-0-0-62 slurmctld[12899]: slurmctld: auth/jwt: auth_p_token_generate: created token for root for 1800 seconds
May 26 01:00:01 ip-10-0-0-62 slurmctld[12899]: slurmctld: auth/jwt: auth_p_token_generate: created token for root for 1800 seconds
May 26 01:20:01 ip-10-0-0-62 slurmctld[12899]: slurmctld: auth/jwt: auth_p_token_generate: created token for root for 1800 seconds
May 26 01:40:01 ip-10-0-0-62 slurmctld[12899]: slurmctld: auth/jwt: auth_p_token_generate: created token for root for 1800 seconds
May 26 02:00:01 ip-10-0-0-62 slurmctld[12899]: slurmctld: auth/jwt: auth_p_token_generate: created token for root for 1800 seconds
May 26 02:20:01 ip-10-0-0-62 slurmctld[12899]: slurmctld: auth/jwt: auth_p_token_generate: created token for root for 1800 seconds
May 26 02:40:01 ip-10-0-0-62 slurmctld[12899]: slurmctld: auth/jwt: auth_p_token_generate: created token for root for 1800 seconds
May 26 03:00:01 ip-10-0-0-62 slurmctld[12899]: slurmctld: auth/jwt: auth_p_token_generate: created token for root for 1800 seconds

Building Data Lake- S3 Bucket creation error in us-east-1

aws-research-workshops/notebooks/building_data_lakes/building_data_lakes.ipynb

session.resource('s3').create_bucket(Bucket=bucket, CreateBucketConfiguration={'LocationConstraint': region})
throws an error in us-east-1. Removing the constraint seems to address the issue; it looks like us-east-1 isn't a valid value for LocationConstraint.
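
A hedged sketch of the usual workaround, matching what the report describes (omit CreateBucketConfiguration entirely when the region is us-east-1):

import boto3

def create_bucket(session, bucket, region):
    """Create a bucket, omitting LocationConstraint for us-east-1."""
    s3 = session.resource('s3')
    if region == 'us-east-1':
        # us-east-1 is the default location and rejects an explicit constraint.
        return s3.create_bucket(Bucket=bucket)
    return s3.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={'LocationConstraint': region},
    )

# Usage (names are placeholders):
# session = boto3.session.Session(region_name=region)
# create_bucket(session, bucket, region)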
