
aws-solutions / efs-backup


The EFS backup solution performs backups from a source EFS to a destination EFS. It uses fpsync (fpart + rsync) for efficient incremental backups of the file system.

Home Page: https://aws.amazon.com/answers/infrastructure-management/efs-backup

License: Apache License 2.0

Languages: Python 65.18%, Shell 34.82%

efs-backup's Introduction

AWS EFS-to-EFS Backup Solution

Description

The EFS-to-EFS backup solution leverages Amazon CloudWatch and AWS Lambda to automatically create incremental backups of an Amazon Elastic File System (EFS) file system on a customer-defined schedule. The solution is easy to deploy and provides automated backups for data recovery and protection. For example, an organization can use this backup solution in a production environment to automatically create backups of their file system(s) on a daily basis and keep only a specified number of backups. For customers who do not have a mechanism for backing up their Amazon EFS file systems, this solution provides an easy way to improve data protection and recoverability.

Architectural Workflow

• The orchestrator Lambda function is first invoked by the customer-defined CloudWatch Events (start backup) schedule. The Lambda function creates a 'Stop Backup' CloudWatch Events rule and adds the orchestrator (itself) Lambda function as the target. It also updates the desired capacity of the Auto Scaling group (ASG) to 1 (one). The ASG launches an EC2 instance that mounts the source and destination EFS file systems and backs up the primary EFS.

• The orchestrator Lambda function writes backup metadata to the DynamoDB (DDB) table, with the backup ID as the primary key.

• Fifteen minutes before the end of the customer-defined backup window, the 'Stop Backup' CWE rule invokes the orchestrator Lambda function to change the desired capacity of the ASG to 0 (zero).

• A lifecycle hook CWE rule is triggered by the ASG event (EC2_Instance_Terminating). This rule invokes the orchestrator Lambda function, which uses the 'AWS-RunShellScript' document to make a send_command API call to the SSM service.

• During the lifecycle hook event, the EC2 instance gracefully stops and cleans up the rsync process, updates the DDB table with the KPIs, and uploads logs to the S3 bucket.

• Successful termination of the EC2 instance triggers another lifecycle hook event. This event invokes the orchestrator Lambda function to send the anonymous metrics and to notify the customer if the backup did not complete. (A minimal sketch of these orchestrator API calls follows this list.)
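
To make the workflow above concrete, here is a minimal, hedged sketch of the two orchestrator actions described in the bullets: scaling the ASG up to start a backup, and sending the termination-time shell script over SSM. This is illustrative only, not the solution's actual handler; the function names, group name, and script path are placeholders.

import boto3

def start_backup(asg_name):
    # Scale the Auto Scaling group to one instance; the launched EC2
    # instance mounts both EFS file systems and runs the backup.
    asg = boto3.client('autoscaling')
    asg.set_desired_capacity(AutoScalingGroupName=asg_name, DesiredCapacity=1)

def on_instance_terminating(instance_id):
    # During the EC2_Instance_Terminating lifecycle hook, run the
    # cleanup script on the instance via SSM before it terminates.
    ssm = boto3.client('ssm')
    ssm.send_command(
        InstanceIds=[instance_id],
        DocumentName='AWS-RunShellScript',
        Parameters={'commands': ['bash /tmp/ssm.sh']},  # placeholder script path
    )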

Setup

Run Unit Tests (pytest)

Note: Use sudo if necessary to install the Python dependencies.

$ bash deployment/run-unit-tests.sh

Build S3 Assets

  • Configure the build parameters.
export EFS_BACKUP_PATH=`pwd`
export DIST_OUTPUT_BUCKET=my-bucket-name # bucket where customized code will reside
export VERSION=my-version # version number for the customized code
export SOLUTION_NAME=efs-backup # solution name for the customized code

Note: You must create an S3 bucket named 'my-bucket-name-<aws_region>', because the Lambda functions retrieve their source code from that bucket; aws_region is the region where you are deploying the customized solution (e.g. us-east-1, us-east-2, etc.). An example bucket-creation command follows.
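
For example, assuming a deployment in us-east-1 and the variables exported above, the regional bucket could be created with the AWS CLI:

aws s3 mb s3://$DIST_OUTPUT_BUCKET-us-east-1 --region us-east-1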

  • Build the customized solution
cd $EFS_BACKUP_PATH/deployment
chmod +x ./build-s3-dist.sh
./build-s3-dist.sh $DIST_OUTPUT_BUCKET $SOLUTION_NAME $VERSION
  • Deploy the source code to an Amazon S3 bucket in your account. Note: You must have the AWS Command Line Interface installed, and you must create the Amazon S3 bucket in your account prior to copying the source code.
export AWS_REGION=us-east-1 # the AWS region where you will deploy the solution in your account
export AWS_PROFILE=default # the AWS Command Line Interface profile

aws s3 cp $EFS_BACKUP_PATH/deployment/global-s3-assets/ s3://$DIST_OUTPUT_BUCKET-$AWS_REGION/$SOLUTION_NAME/$VERSION/ --recursive --acl bucket-owner-full-control --profile $AWS_PROFILE
aws s3 cp $EFS_BACKUP_PATH/deployment/regional-s3-assets/ s3://$DIST_OUTPUT_BUCKET-$AWS_REGION/$SOLUTION_NAME/$VERSION/ --recursive --acl bucket-owner-full-control --profile $AWS_PROFILE

Deploying the customized solution

  • Get the links to the efs-to-efs-backup.template and efs-to-efs-restore.template files uploaded to your Amazon S3 bucket.
  • Deploy the EFS Backup solution to your account by launching a new AWS CloudFormation stack using the link to the efs-to-efs-backup.template (and the efs-to-efs-restore.template for restores); an illustrative CLI invocation is shown below.
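
As an illustration only, the backup stack could be launched from the AWS CLI as sketched below; the template's required parameters are omitted for brevity and must be supplied (consult the template for the parameter names):

aws cloudformation create-stack \
  --stack-name efs-backup \
  --template-url https://$DIST_OUTPUT_BUCKET-$AWS_REGION.s3.amazonaws.com/$SOLUTION_NAME/$VERSION/efs-to-efs-backup.template \
  --capabilities CAPABILITY_NAMED_IAM \
  --profile $AWS_PROFILE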

Collection of operational metrics

This solution collects anonymous operational metrics to help AWS improve the quality and features of the solution. For more information, including how to disable this capability, please see the implementation guide.


Copyright 2017-2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at

http://www.apache.org/licenses/LICENSE-2.0

or in the "license" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

efs-backup's People

Contributors

georgebearden, hyandell, shsenior, tomnight


efs-backup's Issues

Lambda functions using deprecated nodejs4.3

I have a client with this deployed in production, and they are receiving warning notices that the Lambda functions are using the nodejs4.3 runtime, which is now deprecated. Can the scripts be updated to 8.10?

Both the backup and restore templates contain nodejs4.3 references.

Problems with the restore script

I found the restore process very confusing with the field names (/Source EFS Id/ and /Backup EFS Id/), and I also couldn't get it to work due to the EC2 instance not having a Security Group which our existing EFS instances required.

I have addressed these issues in PR #4.

uuid lib

When deploying the CloudFormation stack and testing the Lambda function, I always get the same error message:
{ "errorMessage": "module initialization error" }

Does anyone know what the reason could be?
Thanks

Incomplete backups

Hello,

I see that this issue has been closed but I'm experiencing a very similar problem.

I deployed this solution last week (2018-08-09) and it initially ran successfully, but it now reports incomplete backups:

{
  "BackupId": "9d74c8a7",
  "BackupPrefix": "/",
  "BackupStartTime": "2018-08-13T16:08:17",
  "BackupStatus": "Incomplete",
  "BackupStopTime": "2018-08-13T16:19:10",
  "BackupWindow": "180",
  "CreateHardlinksStartTime": "2018-08-13T16:11:12",
  "CreateHardlinksStopTime": "2018-08-13T16:14:39",
  "DestinationEfsId": "fs-387c3591",
  "DestinationEfsSize": 54811144192,
  "DestinationPerformanceMode": "maxIO",
  "EC2Logs": "https://s3.amazonaws.com/nexus-efs-backup-efslogbucket-1uwzom8ipy8y0/ec2-logs/efs-backup-backup-20180813-1619.log",
  "ExpireItem": "1541952496",
  "InstanceType": "c5.xlarge",
  "IntervalTag": "daily",
  "Message": "The EFS backup was incomplete. The backup window expired before the full backup was completed.",
  "NumberOfFiles": 41948,
  "NumberOfFilesTransferred": 1785,
  "RemoveSnapshotStartTime": "2018-08-13T16:10:16",
  "RemoveSnapshotStopTime": "2018-08-13T16:11:12",
  "RetainPeriod": "7",
  "S3BucketSize": 7380943,
  "SourceBurstCreditBalance": 2308974418330,
  "SourceBurstCreditBalancePostBackup": 2308974418330,
  "SourceEfsId": "fs-35aae49c",
  "SourceEfsSize": 59084480512,
  "SourcePerformanceMode": "generalPurpose",
  "SourcePermittedThroughput": 104857600,
  "TotalFileSize": 60585862733,
  "TotalTransferredFileSize": 8997712481
}
I have the backup window set to 6 hours, although the scripts only run for a couple of minutes, which seems to contradict the error message.

SSM stderr:

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed

0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 10 100 10 0 0 16835 0 --:--:-- --:--:-- --:--:-- 10000
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed

0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 19 100 19 0 0 35514 0 --:--:-- --:--:-- --:--:-- 19000
kill: sending signal to 15243 failed: No such process
SSM stdout:

-- 2018-08-13T16:19:08 -- uploading cloud init logs
Completed 75.6 KiB/75.6 KiB (196.1 KiB/s) with 1 file(s) remaining
upload: var/log/cloud-init-output.log to s3://nexus-efs-backup-efslogbucket-1uwzom8ipy8y0/ec2-logs/efs-backup-backup-20180813-1619.log
-- 2018-08-13T16:19:09 -- upload ec2 cloud init logs to S3, status: 0
-- 2018-08-13T16:19:09 -- uploading backup (fpsync) logs
Completed 256.0 KiB/314.1 KiB (689.4 KiB/s) with 1 file(s) remaining
Completed 314.1 KiB/314.1 KiB (745.1 KiB/s) with 1 file(s) remaining
upload: tmp/efs-backup.log to s3://nexus-efs-backup-efslogbucket-1uwzom8ipy8y0/efs-backup-logs/efs-backup-backup-fpsync-20180813-1619.log
-- 2018-08-13T16:19:10 -- upload backup fpsync logs to S3 status: 0
-- 2018-08-13T16:19:10 -- uploading backup (rsync delete) logs
Completed 139 Bytes/139 Bytes (367 Bytes/s) with 1 file(s) remaining
upload: tmp/efs-backup-rsync.log to s3://nexus-efs-backup-efslogbucket-1uwzom8ipy8y0/efs-backup-logs/efs-backup-backup-rsync-delete-20180813-1619.log
-- 2018-08-13T16:19:10 -- upload rsync delete logs to S3 status: 0
-- 2018-08-13T16:19:10 -- fpsync foreground process-id: 15243
-- 2018-08-13T16:19:10 -- kill with SIGTERM, status: 0
-- 2018-08-13T16:19:10 -- exiting loop
-- 2018-08-13T16:19:10 -- Number of files: 41948
-- 2018-08-13T16:19:10 -- Number of files transferred: 1785
-- 2018-08-13T16:19:10 -- Total file size: 60585862733
-- 2018-08-13T16:19:10 -- Total transferred file size: 8997712481
-- 2018-08-13T16:19:10 -- source efs BurstCreditBalance after backup: 2.30897441833e+12
fpsyncStatus: 1
rsync delete status: 0
-- 2018-08-13T16:19:10 -- backup finish time: 2018-08-13T16:19:10
-- 2018-08-13T16:19:10 -- backup incomplete (id: 9d74c8a7)
-- 2018-08-13T16:19:11 -- dynamo db update status: 0
-- 2018-08-13T16:19:11 -- updating lifecycle hook
-- 2018-08-13T16:19:11 -- lifecycle hook update status: 0

efs-backup-backup-20180813-1619.log

EC2 log attached. Any advice or tips would be much appreciated, thanks!

Backups failed after stack update

Hey there, I did a stack update from 1.4 to 1.5, and all of a sudden the backups failed. The EC2 log showed the destination EFS could not be mounted. I traced it to the ingress rule in the security group on the ENI used by EFS. It had no ingress rules at all. Something to do with the stack update, I believe. Might hurt others...

The SSM script could not update the DynamoDB table with backup status, please check the logs in the S3 bucket for the details.

After launching the EFS-to-EFS backup solution, it appears to have failed on the first run.

The error message is: "The SSM script could not update the DynamoDB table with backup status, please check the logs in the S3 bucket for the details."

However, I was unable to find any logs in the S3 bucket after the failure. It appears empty.

I have double-checked the prerequisites listed in the solution, and it appears to me that everything is in order. I am not sure how to debug further.

I first launched the solution 2 days ago.
This backup is intended to back up an EFS Cloudstor volume used with Docker4AWS, which created the EFS volume through its CloudFormation stack. Docker4AWS also created the VPC and subnet within which this solution was run.

Here is the full output of the error message.

{
  "BackupId": "f4436f50",
  "BackupPrefix": "/",
  "BackupStartTime": "2018-10-17T05:00:07",
  "BackupStatus": "Unknown",
  "BackupWindow": "180",
  "DestinationEfsId": "fs-bf5a9ff5",
  "DestinationEfsSize": 6144,
  "DestinationPerformanceMode": "generalPurpose",
  "ExpireItem": "1547528406",
  "InstanceType": "c5.xlarge",
  "IntervalTag": "daily",
  "Message": "The SSM script could not update the DynamoDB table with backup status, please check the logs in the S3 bucket for the details.",
  "RetainPeriod": "14",
  "S3BucketSize": "0",
  "SourceBurstCreditBalance": 465542823779.75,
  "SourceEfsId": "fs-0d637f44",
  "SourceEfsSize": 2004533248,
  "SourcePerformanceMode": "generalPurpose",
  "SourcePermittedThroughput": 104857600
}

Happy to provide additional info as needed.

Make required parameters required

The CFn templates (both backup and restore) use parameters, and all of them are required according to the docs.

But you can deploy them with empty values for some parameters.
Restore's Restore Log Bucket parameter is one such example.

Once instances are terminated, there's no way to recover logs.

I would appreciate it if you could make the CFn templates more robust.

The EC2 instance was unable to find the mount IP OR mount EFS

I made a cloudformation template for the prerequisites, and here is the template:

AWSTemplateFormatVersion: '2010-09-09'
Description: Build Apache server, AutoScaling group, ELB and handle two different domain names
Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.40.0.0/16
      Tags:
      - Key: Application
        Value:
          Ref: AWS::StackId
      - Key: Name
        Value: FWagehPublicVPC
  InternetGateway:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
      - Key: Application
        Value:
          Ref: AWS::StackId
      - Key: Network
        Value: Public
  GatewayToInternet:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId:
        Ref: VPC
      InternetGatewayId:
        Ref: InternetGateway
  PublicSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId:
        Ref: VPC
      CidrBlock: 10.40.0.0/24
      AvailabilityZone: !Select 
        - 1
        - Fn::GetAZs: !Ref 'AWS::Region'
      Tags:
      - Key: Application
        Value:
          Ref: AWS::StackId
      - Key: Name
        Value: fwageh_PublicSubnet
  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId:
        Ref: VPC
      Tags:
      - Key: Application
        Value:
          Ref: AWS::StackId
      - Key: Name
        Value: PublicRouteTable
  PublicSubnetRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId:
        Ref: PublicSubnet
      RouteTableId:
        Ref: PublicRouteTable
  PublicRoute:
    Type: AWS::EC2::Route
    DependsOn: GatewayToInternet
    Properties:
      RouteTableId:
        Ref: PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId:
        Ref: InternetGateway
  PrivateSubnetMountOne:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId:
        Ref: VPC
      CidrBlock: 10.40.3.0/24
      AvailabilityZone: !Select 
        - 0
        - Fn::GetAZs: !Ref 'AWS::Region'
      Tags:
      - Key: Application
        Value:
          Ref: AWS::StackId
      - Key: Name
        Value: fwageh_PrivateSubnetMountOne
  PrivateSubnetMountTwo:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId:
        Ref: VPC
      CidrBlock: 10.40.4.0/24
      AvailabilityZone: !Select 
        - 1
        - Fn::GetAZs: !Ref 'AWS::Region'
      Tags:
      - Key: Application
        Value:
          Ref: AWS::StackId
      - Key: Name
        Value: fwageh_PrivateSubnetMountTwo
  NATGateway:
    DependsOn: GatewayToInternet
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId:
        Fn::GetAtt:
        - ElasticIP
        - AllocationId
      SubnetId:
        Ref: PublicSubnet
  ElasticIP:
    Type: AWS::EC2::EIP
    DependsOn: GatewayToInternet
    Properties:
      Domain: vpc
  PrivateRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId:
        Ref: VPC
  PrivateRouteToInternet:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId:
        Ref: PrivateRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId:
        Ref: NATGateway
  PrivateSubnetOneRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId:
        Ref: PrivateSubnetMountOne
      RouteTableId:
        Ref: PrivateRouteTable
  PrivateSubnetTwoRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId:
        Ref: PrivateSubnetMountTwo
      RouteTableId:
        Ref: PrivateRouteTable
  FileSystem:
    Type: 'AWS::EFS::FileSystem'
    Properties:
      Encrypted: true
      KmsKeyId: !GetAtt 
        - key
        - Arn
  key:
    Type: 'AWS::KMS::Key'
    Properties:
      KeyPolicy:
        Version: 2012-10-17
        Id: key-default-1
        Statement:
          - Sid: Allow administration of the key
            Effect: Allow
            Principal:
              AWS: !Join 
                - ''
                - - 'arn:aws:iam::'
                  - !Ref 'AWS::AccountId'
                  - ':root'
            Action:
              - 'kms:*'
            Resource: '*'

And the SNS email keeps telling me "The EFS backup was unsuccessful. The EC2 instance was unable to find the mount IP OR mount EFS"

And here is the ec2-log in my bucket:

Cloud-init v. 0.7.6 running 'init-local' at Mon, 30 Jul 2018 17:03:21 +0000. Up 6.78 seconds.
Cloud-init v. 0.7.6 running 'init' at Mon, 30 Jul 2018 17:03:21 +0000. Up 6.94 seconds.
ci-info: +++++++++++++++++++++++Net device info+++++++++++++++++++++++
ci-info: Device Up Address Mask Hw-Address
ci-info: lo True 127.0.0.1 255.0.0.0 .
ci-info: eth0 True 10.40.4.166 255.255.255.0 12:02:1a:0d:9b:d2
ci-info: ++++++++++++++++++++++++++++++Route info++++++++++++++++++++++++++++++
ci-info: Route Destination Gateway Genmask Interface Flags
ci-info: 0 0.0.0.0 10.40.4.1 0.0.0.0 eth0 UG
ci-info: 1 10.40.4.0 0.0.0.0 255.255.255.0 eth0 U
ci-info: 2 169.254.169.254 0.0.0.0 255.255.255.255 eth0 UH
Generating public/private rsa key pair.
Your identification has been saved in /etc/ssh/ssh_host_rsa_key.
Your public key has been saved in /etc/ssh/ssh_host_rsa_key.pub.
The key fingerprint is:
SHA256:vhv8fVQpM7DBmFLIXVDl/iaXTKB9uIzQK9xAWpvbJxQ root@ip-10-40-4-166
The key's randomart image is:
+---[RSA 2048]----+
| . +oBo.. |
| + + +. |
| .o E+o .|
| + +.=+o..|
| .S= + +++ |
| o. B + B .|
| ++ * * * |
| +..o = |
| o.. .. |
+----[SHA256]-----+
Generating public/private dsa key pair.
Your identification has been saved in /etc/ssh/ssh_host_dsa_key.
Your public key has been saved in /etc/ssh/ssh_host_dsa_key.pub.
The key fingerprint is:
SHA256:izZkrNY292nTQxNM/XUdOPRArX2TdpwxjquvWG/R9qs root@ip-10-40-4-166
The key's randomart image is:
+---[DSA 1024]----+
| o=o.o|
| .o+++|
| o +B|
| . oo B=|
| + S .+ +|
| = . . oo o |
| o B o +..o .|
| . o + .=o+. .|
| ooo+E...|
+----[SHA256]-----+
Generating public/private ecdsa key pair.
Your identification has been saved in /etc/ssh/ssh_host_ecdsa_key.
Your public key has been saved in /etc/ssh/ssh_host_ecdsa_key.pub.
The key fingerprint is:
SHA256:9JT5t0+TVY3nfs/irvWGs28V4frzuWmVzmKxismTQ8k root@ip-10-40-4-166
The key's randomart image is:
+---[ECDSA 256]---+
| |
| o o.|
| . + o =|
| . o . =.|
| S o . o =|
| E + +=|
| . . O+|
| .+o =+@o|
| +o.++*@@|
+----[SHA256]-----+
Cloud-init v. 0.7.6 running 'modules:config' at Mon, 30 Jul 2018 17:03:22 +0000. Up 7.60 seconds.
Loaded plugins: priorities, update-motd, upgrade-helper
1 package(s) needed (+0 related) for security, out of 3 available
Resolving Dependencies
--> Running transaction check
---> Package gnupg2.x86_64 0:2.0.28-2.31.amzn1 will be updated
---> Package gnupg2.x86_64 0:2.0.28-2.32.amzn1 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
Package Arch Version Repository Size

Updating:
gnupg2 x86_64 2.0.28-2.32.amzn1 amzn-updates 2.6 M

Transaction Summary

Upgrade 1 Package

Total download size: 2.6 M
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Updating : gnupg2-2.0.28-2.32.amzn1.x86_64 1/2
Cleanup : gnupg2-2.0.28-2.31.amzn1.x86_64 2/2
Verifying : gnupg2-2.0.28-2.32.amzn1.x86_64 1/2
Verifying : gnupg2-2.0.28-2.31.amzn1.x86_64 2/2

Updated:
gnupg2.x86_64 0:2.0.28-2.32.amzn1

Complete!
Cloud-init v. 0.7.6 running 'modules:final' at Mon, 30 Jul 2018 17:03:27 +0000. Up 13.28 seconds.
Loaded plugins: priorities, update-motd, upgrade-helper
Existing lock /var/run/yum.pid: another copy is running as pid 2061.
Another app is currently holding the yum lock; waiting for it to exit...
The other application is: yum
Memory : 87 M RSS (334 MB VSZ)
Started: Mon Jul 30 17:03:26 2018 - 00:02 ago
State : Running, pid: 2061
Examining /var/tmp/yum-root-uoTNk3/amazon-ssm-agent.rpm: amazon-ssm-agent-2.2.800.0-1.x86_64
Marking /var/tmp/yum-root-uoTNk3/amazon-ssm-agent.rpm as an update to amazon-ssm-agent-2.2.120.0-1.amzn1.x86_64
Resolving Dependencies
--> Running transaction check
---> Package amazon-ssm-agent.x86_64 0:2.2.120.0-1.amzn1 will be updated
---> Package amazon-ssm-agent.x86_64 0:2.2.800.0-1 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
Package Arch Version Repository Size

Updating:
amazon-ssm-agent x86_64 2.2.800.0-1 /amazon-ssm-agent 40 M

Transaction Summary

Upgrade 1 Package

Total size: 40 M
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
amazon-ssm-agent stop/waiting
Updating : amazon-ssm-agent-2.2.800.0-1.x86_64 1/2
Cleanup : amazon-ssm-agent-2.2.120.0-1.amzn1.x86_64 2/2
amazon-ssm-agent start/running, process 2183
Verifying : amazon-ssm-agent-2.2.800.0-1.x86_64 1/2
Verifying : amazon-ssm-agent-2.2.120.0-1.amzn1.x86_64 2/2

Updated:
amazon-ssm-agent.x86_64 0:2.2.800.0-1

Complete!
start: Job is already running: amazon-ssm-agent
--2018-07-30 17:03:36-- https://s3.amazonaws.com/solutions-reference/efs-backup/latest/efs-ec2-backup.sh
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.104.85
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.104.85|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3268 (3.2K) [binary/octet-stream]
Saving to: ‘/home/ec2-user/efs-ec2-backup.sh’

 0K ...                                                   100% 10.5M=0s

2018-07-30 17:03:36 (10.5 MB/s) - ‘/home/ec2-user/efs-ec2-backup.sh’ saved [3268/3268]

--2018-07-30 17:03:36-- https://s3.amazonaws.com/solutions-reference/efs-backup/latest/efs-backup-fpsync.sh
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.168.157
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.168.157|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5903 (5.8K) [binary/octet-stream]
Saving to: ‘/home/ec2-user/efs-backup-fpsync.sh’

 0K .....                                                 100% 11.2M=0.001s

2018-07-30 17:03:36 (11.2 MB/s) - ‘/home/ec2-user/efs-backup-fpsync.sh’ saved [5903/5903]

This is the master script to perform efs backup

input from user

_source_efs: fs-d0464798
_destination_efs: fs-5b7f7e13
_interval: daily
_retain: 7
_folder_label: fwageh-efs-backup
_backup_prefix: /
region is us-east-1
instance-id is i-0582c39e3a29996e5
-- 2018-07-30T17:03:38 -- resolving source efs address fs-d0464798.efs.us-east-1.amazonaws.com
10.40.4.202
-- 2018-07-30T17:03:38 -- src mount ip: 10.40.4.202
-- 2018-07-30T17:03:38 -- resolving backup efs address fs-5b7f7e13.efs.us-east-1.amazonaws.com
10.40.4.200
-- 2018-07-30T17:03:38 -- dst mount ip: 10.40.4.200
-- 2018-07-30T17:03:38 -- running EFS backup script

input from user

source: 10.40.4.202:/
destination: 10.40.4.200:/
interval: daily
retain: 7
efsid: fwageh-efs-backup
-- 2018-07-30T17:03:38 -- sudo yum -y update
Loaded plugins: priorities, update-motd, upgrade-helper
Resolving Dependencies
--> Running transaction check
---> Package amazon-ssm-agent.x86_64 0:2.2.800.0-1 will be updated
---> Package amazon-ssm-agent.x86_64 0:2.2.800.0-1.amzn1 will be an update
---> Package kernel.x86_64 0:4.14.55-62.37.amzn1 will be installed
---> Package kernel-tools.x86_64 0:4.14.47-56.37.amzn1 will be updated
---> Package kernel-tools.x86_64 0:4.14.55-62.37.amzn1 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
Package Arch Version Repository Size

Installing:
kernel x86_64 4.14.55-62.37.amzn1 amzn-updates 21 M
Updating:
amazon-ssm-agent x86_64 2.2.800.0-1.amzn1 amzn-updates 12 M
kernel-tools x86_64 4.14.55-62.37.amzn1 amzn-updates 124 k

Transaction Summary

Install 1 Package
Upgrade 2 Packages

Total download size: 33 M
Downloading packages:

Total 16 MB/s | 33 MB 00:02
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : kernel-4.14.55-62.37.amzn1.x86_64 1/5
Updating : kernel-tools-4.14.55-62.37.amzn1.x86_64 2/5
Updating : amazon-ssm-agent-2.2.800.0-1.amzn1.x86_64 3/5
Cleanup : kernel-tools-4.14.47-56.37.amzn1.x86_64 4/5
Cleanup : amazon-ssm-agent-2.2.800.0-1.x86_64 5/5
Verifying : amazon-ssm-agent-2.2.800.0-1.amzn1.x86_64 1/5
Verifying : kernel-tools-4.14.55-62.37.amzn1.x86_64 2/5
Verifying : kernel-4.14.55-62.37.amzn1.x86_64 3/5
Verifying : kernel-tools-4.14.47-56.37.amzn1.x86_64 4/5
Verifying : amazon-ssm-agent-2.2.800.0-1.x86_64 5/5

Installed:
kernel.x86_64 0:4.14.55-62.37.amzn1

Updated:
amazon-ssm-agent.x86_64 0:2.2.800.0-1.amzn1
kernel-tools.x86_64 0:4.14.55-62.37.amzn1

Complete!
-- 2018-07-30T17:04:08 -- sudo yum -y install nfs-utils
Loaded plugins: priorities, update-motd, upgrade-helper
Existing lock /var/run/yum.pid: another copy is running as pid 7603.
Another app is currently holding the yum lock; waiting for it to exit...
The other application is: yum
Memory : 35 M RSS (282 MB VSZ)
Started: Mon Jul 30 17:04:08 2018 - 00:00 ago
State : Running, pid: 7603
Package 1:nfs-utils-1.3.0-0.21.amzn1.x86_64 already installed and latest version
Nothing to do
-- 2018-07-30T17:04:10 -- sudo mkdir /backup
-- 2018-07-30T17:04:10 -- sudo mkdir /mnt/backups
-- 2018-07-30T17:04:10 -- sudo mount -t nfs -o nfsvers=4.1 -o rsize=1048576 -o wsize=1048576 -o timeo=600 -o retrans=2 -o hard 10.40.4.202:/ /backup
mount.nfs: Connection timed out
mount status for source efs: 32
-- 2018-07-30T17:08:33 -- sudo mount -t nfs -o nfsvers=4.1 -o rsize=1048576 -o wsize=1048576 -o timeo=600 -o retrans=2 -o hard 10.40.4.200:/ /mnt/backups
mount status for backup efs: 0
-- 2018-07-30T17:08:36 -- ERROR:efs_not_mounted
-- 2018-07-30T17:08:36 -- Backup script finished before the backup window, stopping the ec2 instance.
ci-info: no authorized ssh keys fingerprints found for user ec2-user.
Cloud-init v. 0.7.6 finished at Mon, 30 Jul 2018 17:08:37 +0000. Datasource DataSourceEc2. Up 322.79 seconds

I don't know where the issue is.

Run solution in VPC without Internet gateway and on-premises proxy

Hi,
we have a setup with an on-premises NTLM proxy solution reached through a VPN connection. We have no Internet gateway in our VPC. I have spent days trying to get the solution to work in this situation and have now solved all but one issue.
My biggest problem has been getting the proxy environment variables to work everywhere the AWS CLI commands run. For most things I have solved it by creating VPC endpoints in the VPC, but for the aws autoscaling command there is no endpoint available.

My final problem now is in the ssm.sh script that is sent to the EC2 instance when the backup is done. It can run everything but the last aws autoscaling command, so the instance is not shut down directly; it has to wait until the stop_backup rule is triggered. Not a major problem, but something I would love to solve.

The problem with the ssm.sh script seems to be that it's executed in a shell which doesn't pick up the proxy environment variables that I have configured in the UserData that is executed when the instance starts up. I have tried to insert the variables in /etc/profile, /etc/bashrc, /etc/bash.bashrc and so on, but it seems they are not picked up when ssm.sh is executed.

I can of course go into the Lambda function and edit the ssm.sh file manually, but I would like to automate the installation fully, and then that will not work since the file is downloaded from AWS when the solution is built; I can't see any way to edit it in the YAML file.

Please advise how I should solve this issue.

Best regards,
Staffan

Inconsistencies in scripts

I'm seeing inconsistencies when running this. Some days it runs fine, but some days I get weird errors like the following:

"Message": "The SSM script could not update the DynamoDB table with backup status, please check the logs in the S3 bucket for the details.",

I checked the logs, but they don't really provide a ton of information. It will run like this for a few days, still transferring files, and then will back up fine without any issues.

Backing up about 20 GB of data right now.

Read/Write Capacity Units for DDB

Is the RCU/WCU setting of 5 for DynamoDB a little high for this implementation?

Especially as the table is effectively per-stack and the system will only run at most daily with a handful of read/writes at the beginning/end of the backup execution.

Surely an RCU/WCU set to 1 is more than enough, especially with the burst capacity that DDB has.
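
For reference, the suggested change might look like the following in a CloudFormation template. The logical ID and key schema here are assumptions (the README states only that the backup ID is the primary key), not the solution's actual resource definition:

BackupMetadataTable:  # hypothetical logical ID
  Type: AWS::DynamoDB::Table
  Properties:
    AttributeDefinitions:
      - AttributeName: BackupId
        AttributeType: S
    KeySchema:
      - AttributeName: BackupId
        KeyType: HASH
    ProvisionedThroughput:
      ReadCapacityUnits: 1   # lowered from 5, as suggested above
      WriteCapacityUnits: 1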

Mount is timing out

One of the two NFS mounts gets a connection timeout when using this solution. Here is what the logs from the S3 bucket show in relation to the mounts.

-- 2018-08-08T21:52:11 -- sudo mkdir /backup
-- 2018-08-08T21:52:11 -- sudo mkdir /mnt/backups
-- 2018-08-08T21:52:11 -- sudo mount -t nfs -o nfsvers=4.1 -o rsize=1048576 -o wsize=1048576 -o timeo=600 -o retrans=2 -o hard 10.149.2.120:/ /backup
mount.nfs: Connection timed out
mount status for source efs: 32
-- 2018-08-08T21:56:34 -- sudo mount -t nfs -o nfsvers=4.1 -o rsize=1048576 -o wsize=1048576 -o timeo=600 -o retrans=2 -o hard 10.149.0.220:/ /mnt/backups
mount status for backup efs: 0
-- 2018-08-08T21:56:37 -- ERROR:efs_not_mounted

Support deployment in VPCs where AmazonProvidedDNS is not used

source/scripts/efs-ec2-backup.sh uses dig to resolve the FQDN of the EFS mount points. In our VPC it is necessary to set the DHCP options to point to internal DNS. Since we query our internal servers, we cannot resolve the IPs of the mount points. If the dig commands were modified in source/scripts/efs-ec2-backup.sh so that they queried the VPC DNS ("plus 2 IP"), the EFS Mount Points should be properly resolved.

For example:
dig @${_vpc_dns} ${_source_efs}.efs.${_region}.amazonaws.com +short
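
A hedged sketch of how _vpc_dns could be derived in the script, assuming the resolver sits at the VPC CIDR base address plus two (the standard AmazonProvidedDNS address); the variable names here are illustrative:

# assumption: _vpc_cidr_base holds the first address of the VPC CIDR, e.g. 10.40.0.0
_vpc_cidr_base=10.40.0.0
# the VPC resolver is the CIDR base address plus two (the "plus 2 IP")
_vpc_dns=$(echo ${_vpc_cidr_base} | awk -F. '{printf "%d.%d.%d.%d", $1, $2, $3, $4+2}')
dig @${_vpc_dns} ${_source_efs}.efs.${_region}.amazonaws.com +short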

Orchestrator-Role: Member must have length less than or equal to 64

In instances where this solution is used in a nested stack configuration, the RoleName will almost certainly exceed 64 characters:

HH:MM:SS UTC-0000	CREATE_FAILED	AWS::IAM::Role	OrchestratorRole	1 validation error detected: Value 'Orchestrator-Role-{{some-long-stack-name}}-BackupStack-1T6EMIU72SCD3-us-west-2' at 'roleName' failed to satisfy constraint: Member must have length less than or equal to 64 (Service: AmazonIdentityManagement; Status Code: 400; Error Code: ValidationError; Request ID: 5a918106-f766-11e8-94af-3578535a537a)

Is it imperative that the OrchestratorRole specify a RoleName, or can it be left empty to be auto-generated?
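
For what it's worth, RoleName can generally be omitted from an AWS::IAM::Role resource, in which case CloudFormation auto-generates a unique name within the 64-character limit. A hedged sketch, not the solution's actual role definition:

OrchestratorRole:
  Type: AWS::IAM::Role
  Properties:
    # RoleName intentionally omitted; CloudFormation generates a unique name
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal:
            Service: lambda.amazonaws.com
          Action: sts:AssumeRole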

Why using "cp -l" for rotating latest backup "0" to backup "1"

Hi,

maybe it's more a question about the script efs-backup-fpsync.sh.

Before this script triggers the creation of a new backup, it rotates all existing backups.

echo "create_snapshot_start:$(date -u +%FT%T)"
# copy first backup with hard links, then replace first backup with new backup
if sudo test -d /mnt/backups/$efsid/$interval.0 ; then
  echo "-- $(date -u +%FT%T) --  sudo cp -al /mnt/backups/$efsid/$interval.0 /mnt/backups/$efsid/$interval.1"
  sudo cp -al /mnt/backups/$efsid/$interval.0 /mnt/backups/$efsid/$interval.1
fi
echo "create_snapshot_stop:$(date -u +%FT%T)"

Why "cp -al" is used here and not just "cp -a" ? Why using hard links here?

Thx,

best
Falko
