
curator's Introduction

Curator

Have indices in Elasticsearch? This is the tool for you!

Like a museum curator manages the exhibits and collections on display, Elasticsearch Curator helps you curate, or manage, your indices.

ANNOUNCEMENT

Curator is moving to version-dependent releases. Curator 6.x will work with Elasticsearch 6.x, Curator 7.x will work with Elasticsearch 7.x, and, when it is released, Curator 8.x will work with Elasticsearch 8.x.

Watch this space for updates on when that is coming.

New Client Configuration

Curator now connects using the es_client Python module. This separation makes it much easier to update the client connection code separately from Curator. It is largely derived from the original Curator client configuration, but with important updates.

The updated configuration file structure requires elasticsearch at the root level:

---
elasticsearch:
  client:
    hosts: https://10.11.12.13:9200
    cloud_id:
    bearer_auth:
    opaque_id:
    request_timeout: 60
    http_compress:
    verify_certs:
    ca_certs:
    client_cert:
    client_key:
    ssl_assert_hostname:
    ssl_assert_fingerprint:
    ssl_version:
  other_settings:
    master_only:
    skip_version_test:
    username:
    password:
    api_key:
      id:
      api_key:

logging:
  loglevel: INFO
  logfile: /path/to/file.log
  logformat: default
  blacklist: []
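With the client and logging settings defined, Curator is then invoked by passing the configuration file alongside an action file (the paths here are examples):

curator --config /path/to/config.yml /path/to/action_file.yml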

Action File Configuration

The action file structure is unchanged, for now, though a few actions may have had their options modified slightly. A minimal sketch follows.
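For orientation, here is a minimal sketch of an action file in the established format; the prefix, timestring, and retention values below are placeholders:

---
actions:
  1:
    action: delete_indices
    description: Delete indices older than 30 days, based on index name
    options:
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 30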

curator's People

Contributors

afharo, alexef, arieb, basex, basster, bbuchacher, benjaminws, croccam, dustin-decker, ekamil, feltnerm, ferki, glenrsmith, gmoskovicz, heyitsmdr, honzakral, hydrapolic, igormiller, izekchen, jordansissel, jrodewig, kobuskc, monkey3199, nfsec, nickethier, rewiko, robinsmidsrod, steffo, tschroeder-zendesk, untergeek


curator's Issues

show-snapshots fails on Windows (no such file /dev/null)

Using Python 3.4 and curator-script.py 1.1.2 on Windows Server 2012.

Command:
curator show --show-snapshots --repository "Kibana_Repository"

Result:

Traceback (most recent call last):
  File "C:\Python34\Scripts\curator-script.py", line 9, in <module>
    load_entry_point('elasticsearch-curator==1.1.2', 'console_scripts', 'curator')()
  File "C:\Python34\lib\site-packages\curator\curator.py", line 643, in main
    stream=open(arguments.log_file, 'a') if arguments.log_file else sys.stderr)
FileNotFoundError: [Errno 2] No such file or directory: '/dev/null'

Could not find a valid timestamp from the index

Hi all,

I'm trying to use curator with my ES cluster, but whatever options I use, it always says:
2014-04-24T12:00:56.665 ERROR find_expired_indices:201 Could not find a valid timestamp from the index: [name of my indices]

My indices are named like this: collectd-logstash-DD.MM.YYYY

I've tried these options:
curator --host localhost -p collectd-logstash- -b 2 -c 4

but no luck.
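The likely cause is date order: the parser assumes year-first index names, so a day-first name like collectd-logstash-24.04.2014 cannot yield a valid timestamp. A minimal illustration (not curator's actual code), assuming a year-first %Y.%m.%d pattern:

from datetime import datetime

# Hypothetical illustration: a day-first date portion fails against
# a year-first pattern, which is what produces the error above.
date_part = '24.04.2014'
try:
    print(datetime.strptime(date_part, '%Y.%m.%d'))
except ValueError as err:
    print('could not find a valid timestamp:', err)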

Confirmation prompt for deletion

It would be nice to add a confirmation prompt before actually triggering deletion of indices: when the delete command is invoked, print the list of indices identified for deletion and let the end user answer Y/N before proceeding (just in case they mistyped the older-than value).
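A rough sketch of the requested flow (a hypothetical helper, not curator's API), using the elasticsearch-py client:

def confirm_delete(client, indices):
    """Print the matched indices and delete only after a Y/N confirmation."""
    for name in indices:
        print(name)
    answer = raw_input('Delete the %d indices above? [y/N] ' % len(indices))  # input() on Python 3
    if answer.strip().lower() == 'y':
        for name in indices:
            client.indices.delete(index=name)
    else:
        print('Aborted; nothing was deleted.')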

Delete with prefix doesn't work (on its own)

Hi,

Thanks for the wonderful tool. I'm trying to delete indices matching a pattern with this command:

curator -p .marvel- -d 0

Unfortunately, this doesn't work. The documentation seems to suggest that it would work without a prefix, i.e.,

curator --host blah -d 0

Could curator support prefixes?

Thank you.

-=david=-

Use a regexp to parse index date instead of split

I was planning to schedule a delete for old indices. Unfortunately, my index naming pattern does not include any separator (e.g. myindex_20140624), and an error is raised by the split call at line 288:

parts = unprefixed_object_name.split(separator)

Would it be possible to parse the date part of an index using a regexp instead of using a split?
Thanks in advance!
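A minimal sketch of the idea (a hypothetical helper, not a patch): extract the date with a regex and parse it directly, so no separator is needed.

import re
from datetime import datetime

def index_date(name, pattern=r'(\d{4})(\d{2})(\d{2})', fmt='%Y%m%d'):
    """Pull a date out of an index name with a regex instead of split()."""
    match = re.search(pattern, name)
    if match is None:
        raise ValueError('no date found in %r' % name)
    return datetime.strptime(''.join(match.groups()), fmt)

print(index_date('myindex_20140624'))  # 2014-06-24 00:00:00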

Update elasticsearch curator in pip

I've installed curator 0.6.1 via pip on CentOS 6, and I'm seeing a problem where the time_unit check for hours in find_expired_indices is incorrect: it expects hourly while the argument being passed to the script is hours.

    required_parts = 4 if time_unit == 'hourly' else 3

It's correct on GitHub, and it'd be nice if you could update the package on pip as well.

add support for more precise index pruning

Hi! It would be really cool if curator were able to pick which indices to perform its magic on based on a partial index name.

For example:

logstash-YYYY.MM.dd - keep these for 30 days
custom-app-YYYY.MM.dd - keep these for only 5 days

Cheers
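Until per-pattern retention exists, one possible workaround is to run curator once per prefix with a different retention value each time, e.g.:

curator --host localhost --prefix logstash- -d 30
curator --host localhost --prefix custom-app- -d 5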

1.0 API is not revealing closed indices, so they won't get deleted

Not sure what's going on yet, but closed indices are not recognized by the 1.0 API calls and so are not deleted. This behavior apparently did not exist with the 0.4.4 API calls for ES versions < 1.0.

Evidence:

Exhibit A:

$ ls -1 | grep logstash
logstash-2014.02.13
logstash-2014.02.15
logstash-2014.02.16
logstash-2014.02.17
logstash-2014.02.18
logstash-2014.02.19
logstash-2014.02.20
logstash-2014.02.21
logstash-2014.02.22

Exhibit B:

$ curator --host blackbox -D -d 6
2014-02-22T13:54:49.193 INFO                        main:333  Job starting...
2014-02-22T13:54:49.194 INFO                   _new_conn:257  Starting new HTTP connection (1): blackbox
2014-02-22T13:54:49.195 DEBUG              _make_request:374  Setting read timeout to 30
2014-02-22T13:54:49.197 DEBUG              _make_request:414  "GET / HTTP/1.1" 200 297
2014-02-22T13:54:49.198 INFO         log_request_success:49   GET http://blackbox:9200/ [status:200 request:0.004s]
2014-02-22T13:54:49.198 DEBUG        log_request_success:51   > None
2014-02-22T13:54:49.198 DEBUG        log_request_success:52   < {
  "status" : 200,
  "name" : "Blackbox",
  "version" : {
    "number" : "1.0.0",
    "build_hash" : "a46900e9c72c0a623d71b54016357d5f94c8ea32",
    "build_timestamp" : "2014-02-12T16:18:34Z",
    "build_snapshot" : false,
    "lucene_version" : "4.6"
  },
  "tagline" : "You Know, for Search"
}

2014-02-22T13:54:49.198 DEBUG                       main:346  Detected Elasticsearch version 1.0.0
2014-02-22T13:54:49.198 INFO                        main:359  Deleting indices older than 6 days...
2014-02-22T13:54:49.199 DEBUG              _make_request:374  Setting read timeout to 30
2014-02-22T13:54:49.202 DEBUG              _make_request:414  "GET /logstash-*/_settings HTTP/1.1" 200 1543
2014-02-22T13:54:49.202 INFO         log_request_success:49   GET http://blackbox:9200/logstash-*/_settings [status:200 request:0.003s]
2014-02-22T13:54:49.202 DEBUG        log_request_success:51   > None
2014-02-22T13:54:49.202 DEBUG        log_request_success:52   < {"logstash-2014.02.21":{"settings":{"index":{"uuid":"GMamNSN-TmKodXvgYkktmg","number_of_replicas":"1","number_of_shards":"5","refresh_interval":"5s","version":{"created":"1000099"}}}},"logstash-2014.02.18":{"settings":{"index":{"codec":{"bloom":{"load":"false"}},"uuid":"s9V9b2tIRxyanZJ4s0P5vQ","number_of_replicas":"1","analysis":{"analyzer":{"default":{"type":"standard","stopwords":"_none_"}}},"number_of_shards":"5","refresh_interval":"5s","version":{"created":"901199"}}}},"logstash-2014.02.19":{"settings":{"index":{"codec":{"bloom":{"load":"false"}},"uuid":"2k3v2gl2RROXYcD3vcMslQ","number_of_replicas":"1","analysis":{"analyzer":{"default":{"type":"standard","stopwords":"_none_"}}},"number_of_shards":"5","refresh_interval":"5s","version":{"created":"901199"}}}},"logstash-2014.02.20":{"settings":{"index":{"codec":{"bloom":{"load":"false"}},"uuid":"E6IlpHOqQauIbKMC0QjqEQ","number_of_replicas":"1","analysis":{"analyzer":{"default":{"type":"standard","stopwords":"_none_"}}},"number_of_shards":"5","refresh_interval":"5s","version":{"created":"901199"}}}},"logstash-2014.02.17":{"settings":{"index":{"codec":{"bloom":{"load":"false"}},"uuid":"eaWwVnnuQ-eoWoyu5Dyl4Q","number_of_replicas":"1","analysis":{"analyzer":{"default":{"type":"standard","stopwords":"_none_"}}},"number_of_shards":"5","refresh_interval":"5s","version":{"created":"901199"}}}},"logstash-2014.02.22":{"settings":{"index":{"uuid":"JU64q1s0TaWkO1hFsOaqkA","number_of_replicas":"1","number_of_shards":"5","refresh_interval":"5s","version":{"created":"1000099"}}}}}
2014-02-22T13:54:49.205 INFO        find_expired_indices:209  logstash-2014.02.17 is 1 day, 0:00:00 above the cutoff.
2014-02-22T13:54:49.206 INFO        find_expired_indices:209  logstash-2014.02.18 is 2 days, 0:00:00 above the cutoff.
2014-02-22T13:54:49.206 INFO        find_expired_indices:209  logstash-2014.02.19 is 3 days, 0:00:00 above the cutoff.
2014-02-22T13:54:49.206 INFO        find_expired_indices:209  logstash-2014.02.20 is 4 days, 0:00:00 above the cutoff.
2014-02-22T13:54:49.206 INFO        find_expired_indices:209  logstash-2014.02.21 is 5 days, 0:00:00 above the cutoff.
2014-02-22T13:54:49.206 INFO        find_expired_indices:209  logstash-2014.02.22 is 6 days, 0:00:00 above the cutoff.
2014-02-22T13:54:49.206 INFO                  index_loop:309  DELETE index operations completed.
2014-02-22T13:54:49.206 INFO                        main:379  Done in 0:00:00.015639.

As you can see, it clearly does not see the closed indices. We will need to correct this before officially releasing curator 1.0.

Support custom actions to be triggered

I'd like to see the ability for users to register their own actions to be run at given thresholds. Example API (rough draft):

@curator.action('close')
def close_index(client, index_name):
    client.indices.close(index=index_name)

Which can then be used via the CLI by specifying the 'close' action type.

The problem is how we do the discovery. We could ask users to wrap their code like this:

#!/usr/bin/env python
import curator
# ... actions here
if __name__ == '__main__':
    curator.main()

or maybe just provide an env variable with a list of python modules to be loaded before the run:

CURATOR_ACTIONS='myapp.curator_actions' curator.py -....

This way curator itself wouldn't have to support all the operations people might wish to run; it would instead focus on selecting the indices for those actions and dispatching the calls to them.

Off by one error in date calculations

We should be able to act on the previous day's index as soon as the day rolls over. The current settings disallow this:

$ date
Tue Apr 15 11:12:10 CDT 2014
$ ./curator.py -d 1 --dry-run
…
2014-04-15T11:12:14.167 INFO                        main:369  Deleting indices older than 1 days...
2014-04-15T11:12:14.172 INFO                  index_loop:294  Would have attempted deleting index logstash-2014.04.13 because it is 1 day, 0:00:00 older than the calculated cutoff.
2014-04-15T11:12:14.172 INFO        find_expired_indices:212  logstash-2014.04.14 is 0:00:00 above the cutoff.
2014-04-15T11:12:14.173 INFO        find_expired_indices:212  logstash-2014.04.15 is 1 day, 0:00:00 above the cutoff.
…

The message logstash-2014.04.14 is 0:00:00 above the cutoff also suggests that the index should be actionable.
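The boundary can be illustrated in a few lines (not curator's actual code): with a strict "older than" comparison, an index exactly on the cutoff is skipped, while an inclusive comparison would act on it.

from datetime import datetime, timedelta

cutoff = datetime(2014, 4, 15) - timedelta(days=1)  # -d 1 on April 15
index_day = datetime(2014, 4, 14)                   # logstash-2014.04.14
print(index_day < cutoff)    # False: the strict check skips the index
print(index_day <= cutoff)   # True: an inclusive check would act on it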

Delete by tag

It should be possible to go in and delete individual documents by tag, or by the absence of a tag. This would allow you to delete unimportant documents in your collection.

It could also be useful to allow the movement of "important" docs to a different index.
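For reference, Elasticsearch 1.x already exposes a delete-by-query API that could back this; a sketch against a hypothetical tags field:

curl -XDELETE 'http://localhost:9200/logstash-2014.05.05/_query' -d '{
  "query": { "term": { "tags": "unimportant" } }
}'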

Time-window alias management

Some logging use cases benefit from not having to compute what is a "daily" or "weekly" or whatever window of indices.

We could have curator, with its knowledge of index ages, allow users to manage aliases pointing at defined ranges of time. For example, we could configure a 'weekly' alias pointing always to the last 7 days of indices or a 'yesterday' alias pointing at, well, yesterday.
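Under the hood this would map onto the existing aliases API, roughly like the following (index and alias names are placeholders):

curl -XPOST 'http://localhost:9200/_aliases' -d '{
  "actions": [
    { "add":    { "index": "logstash-2014.04.15", "alias": "last-7-days" } },
    { "remove": { "index": "logstash-2014.04.08", "alias": "last-7-days" } }
  ]
}'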

I'm not sure how much value this provides, because it's not something I hear requested frequently.

Support weekly indices

Talking to @untergeek in IRC, he mentioned that curator doesn't yet support weekly indices, which I use in an app that isn't Logstash but uses the same index name formatting. It would be great to see the feature added!

Publish as egg

It would be quite convenient if one could simply pip install curator.
If there's interest, I could work up a pull request to do that.
(Note, however, that https://pypi.python.org/pypi/curator already exists and is an image gallery tool.)

TypeError: Tuple or struct_time argument required while executing logstash_index_cleaner.py

The error message received is:

  File "logstash_index_cleaner.py", line 63, in get_index_epoch
    return time.mktime([int(part) for part in year_month_day_optionalhour] + [0, 0, 0, 0, 0])
TypeError: Tuple or struct_time argument required

The command used is:

python logstash_index_cleaner.py -d 14

This is on a 64-bit Windows 2012 server with no internet access, so I'm not sure whether it has to do with how Windows represents the date format or whether it depends on internet connectivity.

Curator failing to optimize indexes

Curator 0.6.1 - installed with pip
Elasticsearch version 0.90.9
When trying to run an optimize task, curator fails immediately:

# /usr/local/bin/curator --host localhost --port 9201 -t 3600 -o 2 --max_num_segments 1 
2014-02-13T14:09:17.747 INFO                        main:325  Job starting...
2014-02-13T14:09:17.747 INFO                        main:360  Optimizing indices older than 2 days...
2014-02-13T14:09:17.748 INFO                   _new_conn:257  Starting new HTTP connection (1): localhost
2014-02-13T14:09:17.761 INFO         log_request_success:49   GET http://localhost:9201/_settings [status:200 request:0.013s]
2014-02-13T14:09:17.765 INFO                  index_loop:290  Attempting to optimize index logstash-2014.01.15 because it is 27 days, 0:00:00 older than cutoff.
2014-02-13T14:09:17.766 WARNING         log_request_fail:68   GET /_cluster/state/metadata/logstash-2014.01.15 [status:400 request:0.001s]
2014-02-13T14:09:17.766 INFO            log_request_fail:70   > None

Master branch doesn't work with Elasticsearch 1.0.1

Hi,

I can't run the latest master branch with Elasticsearch version 1.0.1. I installed the latest version by cloning the master branch and then running setup.py install, as suggested in a comment on this blog post.

This is the output:

python setup.py install
running install
running bdist_egg
running egg_info
writing requirements to elasticsearch_curator.egg-info/requires.txt
writing elasticsearch_curator.egg-info/PKG-INFO
writing top-level names to elasticsearch_curator.egg-info/top_level.txt
writing dependency_links to elasticsearch_curator.egg-info/dependency_links.txt
writing entry points to elasticsearch_curator.egg-info/entry_points.txt
reading manifest file 'elasticsearch_curator.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '__pycache__' found under directory '*'
warning: no previously-included files matching '*.py[co]' found under directory '*'
writing manifest file 'elasticsearch_curator.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/curator
copying build/lib/curator/curator.py -> build/bdist.linux-x86_64/egg/curator
copying build/lib/curator/__init__.py -> build/bdist.linux-x86_64/egg/curator
byte-compiling build/bdist.linux-x86_64/egg/curator/curator.py to curator.pyc
byte-compiling build/bdist.linux-x86_64/egg/curator/__init__.py to __init__.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying elasticsearch_curator.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying elasticsearch_curator.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying elasticsearch_curator.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying elasticsearch_curator.egg-info/entry_points.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying elasticsearch_curator.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying elasticsearch_curator.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating 'dist/elasticsearch_curator-1.0.0_dev-py2.6.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing elasticsearch_curator-1.0.0_dev-py2.6.egg
creating /usr/lib/python2.6/site-packages/elasticsearch_curator-1.0.0_dev-py2.6.egg
Extracting elasticsearch_curator-1.0.0_dev-py2.6.egg to /usr/lib/python2.6/site-packages
Adding elasticsearch-curator 1.0.0-dev to easy-install.pth file
Installing curator script to /usr/bin

Installed /usr/lib/python2.6/site-packages/elasticsearch_curator-1.0.0_dev-py2.6.egg
Processing dependencies for elasticsearch-curator==1.0.0-dev
Searching for elasticsearch>=1.0.0,<2.0.0
Reading http://pypi.python.org/simple/elasticsearch/
Best match: elasticsearch 1.0.0
Downloading https://pypi.python.org/packages/source/e/elasticsearch/elasticsearch-1.0.0.tar.gz#md5=ac087d3f7a704b2c45079e7e25b56b9f
Processing elasticsearch-1.0.0.tar.gz
Running elasticsearch-1.0.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-T98TOX/elasticsearch-1.0.0/egg-dist-tmp-1CKF3Z
error: /tmp/easy_install-T98TOX/elasticsearch-1.0.0/README.rst: No such file or directory

As you can see, the install quits with an error.

If I try to install the script with pip install . instead, it seems to work.

Unpacking /root/curator
  Running setup.py (path:/tmp/pip-Zn9sKP-build/setup.py) egg_info for package from file:///root/curator
    warning: no previously-included files matching '__pycache__' found under directory '*'
    warning: no previously-included files matching '*.py[co]' found under directory '*'
Downloading/unpacking elasticsearch>=1.0.0,<2.0.0 (from elasticsearch-curator==1.0.0-dev)
  Downloading elasticsearch-1.0.0-py2.py3-none-any.whl (47kB): 47kB downloaded
Downloading/unpacking urllib3>=1.5,<2.0 (from elasticsearch>=1.0.0,<2.0.0->elasticsearch-curator==1.0.0-dev)
  Downloading urllib3-1.7.1.tar.gz (67kB): 67kB downloaded
  Running setup.py (path:/tmp/pip_build_root/urllib3/setup.py) egg_info for package urllib3
Installing collected packages: elasticsearch, elasticsearch-curator, urllib3
  Running setup.py install for elasticsearch-curator
    warning: no previously-included files matching '__pycache__' found under directory '*'
    warning: no previously-included files matching '*.py[co]' found under directory '*'
    Installing curator script to /usr/bin
  Running setup.py install for urllib3
Successfully installed elasticsearch elasticsearch-curator urllib3
Cleaning up...

But now the curator command fails with the following error:

Traceback (most recent call last):
  File "/usr/bin/curator", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 2655, in <module>
    working_set.require(__requires__)
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 648, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 546, in resolve
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: elasticsearch>=1.0.0,<2.0.0

Last but not least, the installed Elasticsearch version:

curl -XGET 'localhost:9200' 
{
  "status" : 200,
  "name" : "Lucky Luke",
  "version" : {
    "number" : "1.0.1",
    "build_hash" : "5c03844e1978e5cc924dab2a423dc63ce881c42b",
    "build_timestamp" : "2014-02-25T15:52:53Z",
    "build_snapshot" : false,
    "lucene_version" : "4.6"
  },
  "tagline" : "You Know, for Search"
}

TypeError: string indices must be integers

Traceback (most recent call last):
  File "/usr/bin/curator", line 9, in <module>
    load_entry_point('elasticsearch-curator==1.0.0', 'console_scripts', 'curator')()
  File "/usr/lib/python2.6/site-packages/curator/curator.py", line 345, in main
    version_number = get_version(client)
  File "/usr/lib/python2.6/site-packages/curator/curator.py", line 163, in get_version
    version = client.info()['version']['number']
TypeError: string indices must be integers

elasticsearch 1.1.1

show command broken

$ curator show --show-indices
Traceback (most recent call last):
  File "/usr/bin/curator", line 9, in <module>
    load_entry_point('elasticsearch-curator==1.2.1', 'console_scripts', 'curator')()
  File "/usr/lib/python2.6/site-packages/curator/curator.py", line 704, in main
    if arguments.timestring:
AttributeError: 'Namespace' object has no attribute 'timestring'

This is related to the new timestring and time-unit arguments being missing from parser_show while they are always checked for on arguments. I'm not sure whether you'd prefer to add those args under parser instead of skipping show, or to work around it by setting the defaults differently.

source/origin date param

Hello guys! Thanks for your work.

I was thinking about a new feature.

If you want to re-import some data with Logstash because you lost something, you may want to remove the old data first and then re-import the new data. So, as a feature, could you add a source/origin date from which to apply the -d interval (for example), instead of using datetime.utcnow()? It might be exposed as a new command-line parameter.

Thanks for reading....

Future index preparation

Proposed roughly from a discussion in #logstash IRC:

Create a future index with dynamic settings, such as doing shard allocation based on some computed value, etc.

Installing elasticsearch-curator on Jenkins CI Fails

Installing elasticsearch-curator (as a dependency via pip) on Jenkins CI fails because setup.py uses the environment variable BUILD_NUMBER, which is set by default in Jenkins.

https://wiki.jenkins-ci.org/display/JENKINS/Building+a+software+project#Buildingasoftwareproject-JenkinsSetEnvironmentVariables

The workaround is to unset BUILD_NUMBER before the pip install step, but I think a better solution would be updating setup.py to check for a more specific environment variable (e.g. CURATOR_BUILD_NUMBER).

Thanks

Refactor common input patterns to methods

Per @jordansissel's comment:

@@ -96,7 +96,7 @@ def make_parser():
    parser_allocation.set_defaults(func=command_loop)
    parser_allocation.add_argument('-p', '--prefix', help='Prefix for the indices. Indices that do not have this prefix are skipped. Default: logstash-', default=DEFAULT_ARGS['prefix'])
    parser_allocation.add_argument('--timestring', help="Python strftime string to match your index definition, e.g. 2014.07.15 would be %%Y.%%m.%%d", type=str, default=None)
-    parser_allocation.add_argument('-T', '--time-unit', dest='time_unit', action='store', help='Unit of time to reckon by: [hours|days|weeks] Default: days', default=DEFAULT_ARGS['time_unit'], type=str)
+    parser_allocation.add_argument('-T', '--time-unit', dest='time_unit', action='store', help='Unit of time to reckon by: [hours|days|weeks|months] Default: days', default=DEFAULT_ARGS['time_unit'], type=str)

Don't worry about this right now, but long term it's probably worth refactoring some of these "shared" arguments into a single method:

add_common_arguments(parser_allocation)

def add_common_arguments(parser):
    parser.add_argument('-T'...)

Delete specific types

Hi,

Would it be possible to extend curator to delete not only specific indices like "logstash-2014.05.05" and "logstash-2014.05.06", but also specific types like "logstash-2014.05.05/syslog" or "logstash-2014.05.05/apache"? This would allow managing different lifecycles for different types.

Update curator.py for 1.0.1

Curator currently looks for 1.0.0 as the maximum version. Do we want to have to increase this for every minor release?

curator fresh install error

I installed curator via sudo pip install elasticsearch-curator on an AWS EC2 instance. After the installation completed, I ran "curator --help" and got the error below.

Is anything wrong with my setup?

Traceback (most recent call last):
  File "/usr/bin/curator", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 2655, in <module>
    working_set.require(__requires__)
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 648, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 546, in resolve
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: elasticsearch>=1.0.0,<2.0.0

Incompatible with Python 2.6

When running curator on python 2.6 I get the following issue:

Traceback (most recent call last):
  File "/usr/bin/curator", line 9, in <module>
    load_entry_point('elasticsearch-curator==1.1.0', 'console_scripts', 'curator')()
  File "/usr/lib/python2.6/site-packages/curator/curator.py", line 668, in main
    logging.debug("argdict = {}".format(argdict))
ValueError: zero length field name in format

It's simple enough to patch by numbering the format fields in the script (i.e. "{0}" instead of "{}", since Python 2.6 requires explicit field numbers). Afterwards it works like a champ.

Why not upgrade Python? My version of yum is tied to Python 2.6, so upgrading would break my OS.

Python 2.6 Logging Errors

I installed the latest version (1.0.0) with pip. No matter which curator command I run, I get these errors:

Traceback (most recent call last):
  File "/usr/local/lib/python2.6/dist-packages/logging/__init__.py", line 723, in emit
    msg = self.format(record)
  File "/usr/local/lib/python2.6/dist-packages/logging/__init__.py", line 609, in format
    return fmt.format(record)
  File "/usr/local/lib/python2.6/dist-packages/logging/__init__.py", line 402, in format
    s = self._fmt % record.__dict__
KeyError: 'funcName'

Ubuntu 10.04 LTS, Python 2.6.5.

Thank you.

tool doesn't check for closed indices

I want to add an expire-logs command to a cron job and run it every day.

python logstash_index_cleaner.py --host localhost --port 9200 -p logstash- -d 30 --keep-open-days 14

Running the above command fails because the tool doesn't check the status of an index before trying to close it.

python logstash_index_cleaner.py --host localhost --port 9200 -p logstash- -d 30 --keep-open-days 14
2014-01-13T23:34:20+0000 INFO                      main:178  Job starting...
2014-01-13T23:34:20+0000 INFO                      main:214  Index CLOSE operations commencing...
2014-01-13T23:34:20+0000 INFO                      main:225  Closing indices older than 14 days.
2014-01-13T23:34:20+0000 INFO                 _new_conn:257  Starting new HTTP connection (1): localhost
2014-01-13T23:34:20+0000 INFO       log_request_success:49   GET http://localhost:9200/_settings [status:200 request:0.015s]
2014-01-13T23:34:20+0000 INFO                      main:238  Attempting to close index logstash-2013.12.06 because it is 25 days, 0:34:20.625264 older than cutoff.
2014-01-13T23:34:20+0000 WARNING       log_request_fail:68   POST /logstash-2013.12.06/_close [status:500 request:0.011s]
2014-01-13T23:34:20+0000 INFO          log_request_fail:70   > None
Traceback (most recent call last):
  File "logstash_index_cleaner.py", line 255, in <module>
    main()
  File "logstash_index_cleaner.py", line 241, in main
    do_operation = IndicesClient.close(index_name)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 70, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/indices.py", line 105, in close
    params=params)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 269, in perform_request
    status, headers, raw_data = connection.perform_request(method, url, params, body, ignore=ignore)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 55, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py", line 83, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.TransportError: TransportError(500, u'RemoteTransportException[[savvy_es_chef_master_1][inet[/10.35.91.184:9300]][indices/close]]; nested: NullPointerException; ')

Permit staged expiration: open -> closed -> deleted

http://www.elasticsearch.org/guide/reference/api/admin-indices-open-close/

Closed indexes incur basically no run-time resource usage and reads/writes to them should fail.

Closing an index before deleting it allows you to make it unsearchable before you finally purge it. This can be useful in panicked situations where you might need that data you thought you deleted.

Scenario: Want 90 days of logs.

  • close logs older than 90 days
  • delete logs older than 120 days

This gives you a 30-day window to go "oops!" and recover anything closed if you need it; otherwise it only consumes disk space (but not other resources) until deleted.
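With the existing flags, this staging can already be approximated by running both operations on a schedule, e.g.:

curator --host localhost -c 90    # close indices older than 90 days
curator --host localhost -d 120   # delete indices older than 120 days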

delete future indexes

I occasionally run into an issue where something creates a bunch of future indexes (the time being set incorrectly on a host is the typical reason).

Since I trust the timestamp I receive (if I don't trust it, reindexing becomes impossible), I'd like to be able to delete the future-dated indexes with curator. Doing it by hand is not much fun.

Curator does not work with elasticsearch release candidate

Hello,

I use Elasticsearch 1.0.0.RC2 on Debian Wheezy from the deb package available on the Elasticsearch site.

When I use curator I get the following error:

xlogerais@virt-cha-xman ~ $ curator --host localhost -d5 -c2
2014-02-28T23:59:44.865 INFO main:332 Job starting...
2014-02-28T23:59:44.866 INFO _new_conn:257 Starting new HTTP connection (1): localhost
2014-02-28T23:59:44.871 INFO log_request_success:49 GET http://localhost:9200/ [status:200 request:0.005s]
Traceback (most recent call last):
  File "/usr/local/bin/curator", line 9, in <module>
    load_entry_point('elasticsearch-curator==0.6.2', 'console_scripts', 'curator')()
  File "/usr/local/lib/python2.7/dist-packages/curator/curator.py", line 344, in main
    version_number = get_version(client)
  File "/usr/local/lib/python2.7/dist-packages/curator/curator.py", line 148, in get_version
    return tuple(map(int, version.split('.')))
ValueError: invalid literal for int() with base 10: 'RC2'

It seems that the naming Elasticsearch uses for its release candidates causes problems for curator.

Thanks.

How do you install 0.6?

We don't have Elasticsearch 1.0; we have Elasticsearch 0.9x, and as per the readme, it's not compatible:
Expected Elasticsearch version range > 1.0.0 < 2.0.0
ERROR: Incompatible with version 0.90.3 of Elasticsearch. Exiting.

I tried to do a:
pip install elasticsearch-curator==0.6
but it ends up installing curator 1.0

I also tried doing a:
yolk -V elasticsearch-curator
And it only reports back:
elasticsearch-curator 1.0.0

Is there a way to install curator 0.6 via pip?
(Sorry if this a pip newbie question...)

Snapshot --delete-older-than uses get_index_time

--delete-older-than DELETE_OLDER_THAN
Delete snapshots older than n TIME_UNITs.

$ curator snapshot --repository example --prefix test --delete-older-than 30
...
2014-07-29 23:25:35,894 ERROR     Could not find a valid timestamp for test_snapshot with timestring %Y.%m.%d
...

I would expect this to delete snapshots older than 30 days with the given prefix. Instead, find_expired_data always uses find_index_time to check the timestring in the index name, but snapshot names do not necessarily correspond to index names. It would make more sense to use end_time on the snapshots themselves.

I can submit a patch if you would like, although it's not clear to me whether this is broken or just misleading.
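A sketch of the suggested fix (a hypothetical helper, not a patch), selecting expired snapshots by their own end_time via elasticsearch-py:

from datetime import datetime, timedelta

def expired_snapshots(client, repository, days):
    """Return names of snapshots whose end_time is older than the cutoff."""
    cutoff = datetime.utcnow() - timedelta(days=days)
    result = client.snapshot.get(repository=repository, snapshot='_all')
    return [s['snapshot'] for s in result['snapshots']
            if datetime.utcfromtimestamp(s['end_time_in_millis'] / 1000.0) < cutoff]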

Feature Request: Snapshot/Backup

With snapshots being part of the new features of ES 1.0, it makes sense for Curator to be able to capture snapshots and save them to a designated target (S3, for example). Bonus points if we get it to delete the old index after a successful snapshot/backup.
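For context, registering a snapshot repository is a single API call; the s3 type requires the AWS plugin, and the repository and bucket names below are placeholders:

curl -XPUT 'http://localhost:9200/_snapshot/my_s3_repo' -d '{
  "type": "s3",
  "settings": { "bucket": "my-example-bucket" }
}'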

Adding SSL + url_prefix + optparse compatibilty (python 2.6)

Hi there,

This is more of an inclusion request than an issue.
Here are my modifications to add ssl and url_prefix to the Elasticsearch() call; I also added optparse backward compatibility for old (but still used) systems with Python 2.6.
Anyway, thanks for this useful script :)

The help message now looks like this:

$ ./curator.py --help
Usage: curator.py [options]

Curator for Elasticsearch indices.  Can delete (by space or time), close,
disable bloom filters and optimize (forceMerge) your indices.

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  --host=HOST           Elasticsearch host. Default: localhost
  --url_prefix=URL_PREFIX
                        Elasticsearch http url prefix. Default: none
  --port=PORT           Elasticsearch port. Default: 9200
  --ssl                 Connect to Elasticsearch through SSL. Default: false
  -t TIMEOUT, --timeout=TIMEOUT
                        Elasticsearch timeout. Default: 30
  -p PREFIX, --prefix=PREFIX
                        Prefix for the indices. Indices that do not have this
                        prefix are skipped. Default: logstash-
  -s SEPARATOR, --separator=SEPARATOR
                        Time unit separator. Default: .
  -C CURATION_STYLE, --curation-style=CURATION_STYLE
                        Curate indices by [time, space] Default: time
  -T TIME_UNIT, --time-unit=TIME_UNIT
                        Unit of time to reckon by: [days, hours] Default: days
  -d DELETE_OLDER, --delete=DELETE_OLDER
                        Delete indices older than n TIME_UNITs.
  -c CLOSE_OLDER, --close=CLOSE_OLDER
                        Close indices older than n TIME_UNITs.
  -b BLOOM_OLDER, --bloom=BLOOM_OLDER
                        Disable bloom filter for indices older than n
                        TIME_UNITs.
  -g DISK_SPACE, --disk-space=DISK_SPACE
                        Delete indices beyond n GIGABYTES.
  --max_num_segments=MAX_NUM_SEGMENTS
                        Maximum number of segments, post-optimize. Default: 2
  -o OPTIMIZE, --optimize=OPTIMIZE
                        Optimize (Lucene forceMerge) indices older than n
                        TIME_UNITs.  Must increase timeout to stay connected
                        throughout optimize operation, recommend no less than
                        3600.
  -n, --dry-run         If true, does not perform any changes to the
                        Elasticsearch indices.
  -D, --debug           Debug mode
  -l LOG_FILE, --logfile=LOG_FILE
                        log file

Examples highlighting the new features:

$ curator.py --host foo.bar --port 443 --ssl --url_prefix "backend" --prefix "windows-" --delete 31 -n
2014-01-27T13:22:42.000 INFO                        main:333  Job starting...
2014-01-27T13:22:42.000 INFO                        main:352  Deleting indices older than 31 days...
2014-01-27T13:22:42.001 INFO                   _new_conn:635  Starting new HTTPS connection (1): foo.bar
2014-01-27T13:22:42.025 INFO         log_request_success:49   GET http://foo.bar:443/backend/_settings [status:200 request:0.024s]
[...]
2014-01-27T13:22:42.122 INFO        find_expired_indices:196  windows-2014.01.27 is 32 days, 0:00:00 above the cutoff.
2014-01-27T13:22:42.123 INFO                  index_loop:309  DELETE index operations completed.
2014-01-27T13:22:42.123 INFO                        main:372  Done in 0:00:00.124427.

Please find the following patch,
Cheers!

diff --git a/curator.py b/curator.py
index 11c8272..0dd3ff8 100755
--- a/curator.py
+++ b/curator.py
@@ -34,7 +34,6 @@
 import sys
 import time
 import logging
-import argparse
 from datetime import timedelta, datetime

 import elasticsearch
@@ -55,12 +54,21 @@ logger = logging.getLogger(__name__)

 def make_parser():
     """ Creates an ArgumentParser to parse the command line options. """
-    parser = argparse.ArgumentParser(description='Curator for Elasticsearch indices.  Can delete (by space or time), close, disable bloom filters and optimize (forceMerge) your indices.')
-
-    parser.add_argument('-v', '--version', action='version', version='%(prog)s '+__version__)
-
+    help_desc = 'Curator for Elasticsearch indices.  Can delete (by space or time), close, disable bloom filters and optimize (forceMerge) your indices.'
+    try:
+        import argparse
+        parser = argparse.ArgumentParser(description=help_desc)
+        parser.add_argument('-v', '--version', action='version', version='%(prog)s '+__version__)
+    except ImportError:
+        import optparse
+        parser = optparse.OptionParser(description=help_desc, version='%prog '+ __version__)
+        parser.parse_args_orig = parser.parse_args
+        parser.parse_args = lambda: parser.parse_args_orig()[0]
+        parser.add_argument = parser.add_option
     parser.add_argument('--host', help='Elasticsearch host. Default: localhost', default='localhost')
+    parser.add_argument('--url_prefix', help='Elasticsearch http url prefix. Default: none', default='')
     parser.add_argument('--port', help='Elasticsearch port. Default: 9200', default=9200, type=int)
+    parser.add_argument('--ssl', help='Connect to Elasticsearch through SSL. Default: false', action='store_true', default=False)
     parser.add_argument('-t', '--timeout', help='Elasticsearch timeout. Default: 30', default=30, type=int)

     parser.add_argument('-p', '--prefix', help='Prefix for the indices. Indices that do not have this prefix are skipped. Default: logstash-', default='logstash-')
@@ -332,8 +340,7 @@ def main():
         logger.error('Malformed arguments: {0}'.format(';'.join(check_args)))
         parser.print_help()
         return
-
-    client = elasticsearch.Elasticsearch('{0}:{1}'.format(arguments.host, arguments.port), timeout=arguments.timeout)
+    client = elasticsearch.Elasticsearch(host=arguments.host, port=arguments.port, url_prefix=arguments.url_prefix, timeout=arguments.timeout, use_ssl=arguments.ssl)

     # Delete by space first
     if arguments.disk_space:

Curate Index Shard Allocation

Curator would be a great place to implement time-based shard allocation for indices.

Example use case:
After n days, set index tags so that indices move to Elasticsearch instances with larger, slower disks.
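This maps onto Elasticsearch's shard allocation filtering; roughly, curator would apply a per-index setting like the following (the tag attribute and value are examples):

curl -XPUT 'http://localhost:9200/logstash-2014.02.13/_settings' -d '{
  "index.routing.allocation.require.tag": "slow_disks"
}'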

Not counting closed indices in curate by space

I run an ELK stack and keep the last 30 days' worth of indices open for searching with Kibana. I automatically close any indexes older than 30 days using Curator, which works fine.

However, I'm looking to build in automation for keeping on top of disk space use and I've found that when I run

 /usr/local/bin/curator --host localhost --prefix logstash- -C space -g 200

It seems to take only the currently open indices into account, which implies it will run into issues as I approach the 200 GB limit I've set, since the open indices alone will never be that large.

Is this intended behavior, or does something need altering so that closed indices are also counted when running a space-based cleanup?

logstash_index_cleaner generates a traceback

Hey there!

CentOS 6.5, EPEL repo installed via rpm and pip installed via EPEL.

Running the following command results in the error.

[root@logstash expire-logs]# python logstash_index_cleaner.py --host my-elasticsearch -d 14
2014-01-10T00:53:43+0000 INFO                      main:178  Job starting...
Traceback (most recent call last):
  File "logstash_index_cleaner.py", line 255, in <module>
    main()
  File "logstash_index_cleaner.py", line 182, in main
    h = logging.NullHandler()
AttributeError: 'module' object has no attribute 'NullHandler'

Multiple iterations of the command result in the same error. If I can provide any further information, please let me know!
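The likely cause is that logging.NullHandler was only added in Python 2.7, and this host runs Python 2.6. A common shim (an assumption about the fix, not a shipped patch):

import logging

try:
    NullHandler = logging.NullHandler
except AttributeError:
    class NullHandler(logging.Handler):
        """Minimal stand-in for logging.NullHandler on Python 2.6."""
        def emit(self, record):
            pass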
