Comments (7)
This is a good point. Thanks for diving deep into this!
Probably 12 seconds is not even safe enough: we want the babysit to see that $(get_agent_pid) is not empty, which requires waiting for a full stop plus start. In the worst case, shutdown can take 11 seconds, but the start could also take some time. What do you think?
from amazon-kinesis-agent.
After thinking more about this, I now realize that sleeping for any amount of time will not solve the underlying issue; it merely pushes back the point in time at which a race condition may still exist. For example, imagine if my restart were to occur 12 seconds after the babysit started: then we're back to the same situation.
I think I need to noodle on this some more ...
What do you think about maybe having the babysit make a status call to the agent's SysVinit script, with the do_status function wrapped around the same MUTEXFILE as the do_start() and do_stop() functions?
And then the babysit script can look more like the awslogs-nanny script (reproduced below):
#!/bin/sh
# Version: 1.1.2-rpm
# This script will restart the awslogs service if it has died
service awslogs status >/dev/null
status=$?
if [ "$status" -eq "1" -o "$status" -eq "2" ]; then
service awslogs restart
fi
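The proposal above, a status call guarded by the same MUTEXFILE as do_start() and do_stop(), could be sketched roughly as follows. This is only an illustration under stated assumptions: the paths, the with_mutex and status_unlocked helpers, and the MUTEX_TRIES setting are hypothetical, and the lock is taken with the shell's noclobber option, which may differ from how the agent's init script implements its mutex. The return codes follow the LSB convention for status actions (0 running, 1 dead but PID file exists, 3 not running, 4 unknown).

```shell
#!/bin/sh
# Hypothetical paths; the real init script defines its own.
MUTEXFILE="${MUTEXFILE:-/tmp/example-agent.mutex}"
PIDFILE="${PIDFILE:-/tmp/example-agent.pid}"
MUTEX_TRIES="${MUTEX_TRIES:-30}"

# Run a command while holding MUTEXFILE (hypothetical helper).
# With noclobber (set -C), '>' fails if the file already exists,
# so creating the file acts as an atomic test-and-set.
with_mutex() {
    tries=0
    until ( set -C; echo "$$" > "$MUTEXFILE" ) 2>/dev/null; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$MUTEX_TRIES" ]; then
            return 4    # lock not obtained: status unknown
        fi
        sleep 1
    done
    "$@"
    rc=$?
    rm -f "$MUTEXFILE"
    return "$rc"
}

# The status logic itself, without locking.
status_unlocked() {
    [ -f "$PIDFILE" ] || return 3    # no PID file: not running
    if kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        return 0                     # process is alive
    fi
    return 1                         # PID file exists but process is dead
}

# Status as seen by callers: it waits for any start or stop in progress.
do_status() {
    with_mutex status_unlocked
}
```

With something like this in place, a nanny-style babysit could restart only on status 1 and leave the agent alone on 3 or 4. A production version would also need to deal with a stale MUTEXFILE left behind by a killed process, which this sketch does not attempt.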
After some more testing, I believe the real underlying issue cannot be fixed by a mutex alone, since the daemonizing of start-aws-kinesis-agent continues in the background, outside of the mutexed region.
If another process then enters do_stop() while the initialization of start-aws-kinesis-agent is still in progress, a core dump is possible. See for example this real-world error from /tmp/aws-kinesis-agent.*.initlog:
Terminated (core dumped)
awk: (FILENAME=- FNR=3) warning: error writing standard output (Broken pipe)
However, if we can wrap the do_status call with the same MUTEXFILE, then we can at least prevent the babysit from falsely assuming the agent has unexpectedly died, and thus prevent a double restart of the agent.
Thanks @chris-gilmore for looking deep into this, I appreciate it. I will take a look at this next week :)
Actually, I don't remember any specific reason why I put the conditions separately. How about we simply check the PID file and the process at the same time? That would reduce the chance of a race condition too.
if [ -f
start_agent
fi
exit 0
What do you think?
I tested your proposal and confirmed that it does not help alleviate the problem.
The issue is with running the babysit check while another process is in the middle of running do_stop() from the sysvinit script. There is a small but perceptible amount of time within do_stop() in which the agent_pid has been killed while the PIDFILE has not yet been deleted. Therefore, we wish to prevent the babysit check from running at the same time as do_stop().
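To make the window described above concrete, here is a minimal sketch in which the stop path and the babysit check share one lock, so the check can never observe the intermediate "process killed, PIDFILE still present" state. Everything in it is an assumption for illustration: the paths, the lock_mutex, unlock_mutex and needs_restart helpers, and the rule that a restart is wanted only when the PIDFILE exists but its process is gone. It does not address the separate problem of the daemonizing continuing outside the mutexed region.

```shell
#!/bin/sh
# Hypothetical paths; the real scripts define their own.
MUTEXFILE="${MUTEXFILE:-/tmp/example-agent.mutex}"
PIDFILE="${PIDFILE:-/tmp/example-agent.pid}"

# Block until the lock file can be created atomically (noclobber).
# No timeout or stale-lock handling: this is a sketch only.
lock_mutex() {
    until ( set -C; : > "$MUTEXFILE" ) 2>/dev/null; do
        sleep 1
    done
}

unlock_mutex() {
    rm -f "$MUTEXFILE"
}

# Stop path: the kill and the PIDFILE removal form one critical section.
do_stop() {
    lock_mutex
    if [ -f "$PIDFILE" ]; then
        kill "$(cat "$PIDFILE")" 2>/dev/null
        rm -f "$PIDFILE"
    fi
    unlock_mutex
}

# Babysit check: returns 0 only if the PIDFILE exists but its process
# is dead. It inspects both under the lock, so it waits out any
# do_stop() that is in progress.
needs_restart() {
    lock_mutex
    result=1
    if [ -f "$PIDFILE" ] && ! kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        result=0
    fi
    unlock_mutex
    return "$result"
}
```

A babysit script built on this would call needs_restart and restart the agent only when it returns 0. After a deliberate do_stop() the PIDFILE is gone, so no restart is triggered.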
Makes sense. I'll test it out and merge the PR. Thanks!