terane's People

Contributors

msfrank

terane's Issues

fix segment rotation off-by-one behavior

currently a segment is rotated only after (segment rotation policy + 1) events have been written, instead of after exactly the number of events the policy specifies. need to look at the logic in terane/outputs/store.py and figure out why we are off by one.
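one common cause of this kind of bug is a strict comparison where an inclusive one is needed. the sketch below is hypothetical and is not the code in terane/outputs/store.py; `RotatingStore`, `rotation_policy` and the `buggy` flag are illustrative names only.

```python
class RotatingStore:
    """Hypothetical store that starts a new segment every `rotation_policy` events."""

    def __init__(self, rotation_policy, buggy=False):
        self.rotation_policy = rotation_policy
        self.buggy = buggy
        self.segments = [[]]  # the last segment is the current one

    def write(self, event):
        current = self.segments[-1]
        current.append(event)
        if self.buggy:
            # strict '>' lets the segment grow to policy + 1 events
            full = len(current) > self.rotation_policy
        else:
            # '>=' rotates as soon as the policy count is reached
            full = len(current) >= self.rotation_policy
        if full:
            self.segments.append([])


def segment_sizes(rotation_policy, event_count, buggy):
    store = RotatingStore(rotation_policy, buggy=buggy)
    for i in range(event_count):
        store.write(i)
    return [len(segment) for segment in store.segments]
```

with a policy of 3 and 8 events, the buggy variant fills segments with 4 events each, while the fixed variant rotates after 3.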

create interactive console application

an interactive console application should have the following initial features:

  • can be given a query to execute and display its results.
  • query results are scrollable.
  • query results can be expanded (to show all fields) and collapsed.
  • query results should wrap lines appropriately.
  • the screen should reflow appropriately when the window is resized.

use store workers to support multiple cores

cpython's global interpreter lock allows only one thread to execute python bytecode at a time, so a single process cannot take full advantage of multiple cores. we can get around this limitation by spawning multiple child worker processes, since berkeley db supports multiple concurrent processes.

compress db data

currently, 1 event on average takes up about 10kb of space on disk. we may be able to reduce this by adding data compression. there are a lot of different compression libraries available out there, but the current state-of-the-art seems to be LZMA/LZMA2:

http://www.7-zip.org/sdk.html

zlib or bzip2 are also obvious choices. both of these are much more likely to be present on a system than lzma, but lzma seems to be gaining in popularity. since the lzma sdk is small and public domain, we can also easily copy it into terane if needed.
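a quick way to compare the candidates is to compress a sample event with each one and look at the resulting sizes. this is a rough sketch assuming python 3, where `zlib`, `bz2` and `lzma` are all in the standard library; the sample event and its field names are made up, and real results depend heavily on what terane actually stores.

```python
import bz2
import json
import lzma
import zlib

# hypothetical event; real terane events may have different fields
event = {
    "ts": "2011-01-01T00:00:00Z",
    "hostname": "localhost",
    "message": "session opened for user root " * 20,
}
raw = json.dumps(event).encode("utf-8")

CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compressed_sizes(data):
    """Return {codec name: compressed size in bytes}, checking each round trip."""
    sizes = {}
    for name, (compress, decompress) in CODECS.items():
        packed = compress(data)
        if decompress(packed) != data:
            raise ValueError("round trip failed for " + name)
        sizes[name] = len(packed)
    return sizes


sizes = compressed_sizes(raw)
```

small records carry a fixed per-stream overhead in every format, so it is worth measuring on real events (and on batches of events) before picking one.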

implement 'iter' xmlrpc method

this method would be similar to search, except it will take a date range and an iter key as required parameters, and as such will not accept DATE or ID clauses in the search query. this method is used to walk through a time series of events in an efficient way.

terane console window management

window management in the terane console should be improved. there needs to be a window list, so the user can determine which windows are open. windows should be able to be jumped to if the window number is known.

colorize console output

we should add an option to colorize results in the console. this will necessitate building a more general framework for getting/setting options.

change timezone for search result timestamps

all timestamps in terane are stored in UTC. however, sometimes it's easier to work with timestamps when they are in the native timezone. we should add an option to convert UTC timestamps to the local timezone, or to any other specified timezone.
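the conversion itself is small. a minimal sketch, assuming timestamps come back as naive `datetime` objects that are known to be UTC (that representation is an assumption, not something confirmed here); `to_timezone` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone


def to_timezone(utc_timestamp, tz=None):
    """Convert a naive UTC datetime to an aware datetime in `tz`.

    With tz=None, astimezone() converts to the system's local timezone.
    """
    return utc_timestamp.replace(tzinfo=timezone.utc).astimezone(tz)


# fixed UTC-05:00 offset used for illustration; it does not handle DST
eastern = timezone(timedelta(hours=-5), "EST")
stored = datetime(2011, 3, 1, 14, 30, 0)
local = to_timezone(stored, eastern)
```

for named zones with daylight saving rules, the standard `zoneinfo` module (python 3.9+) can supply the `tz` argument instead of a fixed offset.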

fix bug when expanding search results in console

  1. start console
  2. perform a search. it can be any search, as long as there are results.
  3. press c to expand the results.

expected: should show expanded results.

result: causes unhandled exception.

implement phrase search

we should add term positional data to the index so search queries such as "foo bar" will match text "foo bar baz" but not "foo baz bar".
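to illustrate the idea, here is a small in-memory positional index. it is only a sketch: the whitespace tokenizer and the `term -> {doc_id: [positions]}` layout are assumptions, not terane's actual index format.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions where the term occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Return the ids of documents containing the terms of `phrase` consecutively."""
    terms = phrase.lower().split()
    if not terms:
        return set()
    matches = set()
    for doc_id, starts in index.get(terms[0], {}).items():
        for start in starts:
            # every following term must appear at the next position in the same doc
            if all(
                start + i in index.get(term, {}).get(doc_id, ())
                for i, term in enumerate(terms[1:], 1)
            ):
                matches.add(doc_id)
                break
    return matches


docs = {1: "foo bar baz", 2: "foo baz bar"}
index = build_positional_index(docs)
```

here `phrase_search(index, "foo bar")` matches only document 1, because in document 2 the two terms are not adjacent.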

handle file rotation correctly in file input

currently if a watched file is moved (its inode or vfs device number changes) or if its size shrinks, the file input instance terminates. we need to gracefully handle this situation.
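one way to handle this is to compare the device/inode pair and the size on every poll, then reopen or rewind as needed. the class below is a hypothetical sketch, not terane's file input; `WatchedFile` and its methods are made-up names.

```python
import os


class WatchedFile:
    """Hypothetical tail helper that survives rename-style rotation and truncation."""

    def __init__(self, path):
        self.path = path
        self._open()

    def _open(self):
        self.fh = open(self.path, "rb")
        st = os.fstat(self.fh.fileno())
        self.ident = (st.st_dev, st.st_ino)
        self.offset = 0

    def _read_from_offset(self):
        self.fh.seek(self.offset)
        lines = self.fh.readlines()
        self.offset = self.fh.tell()
        return lines

    def read_new_lines(self):
        lines = []
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return lines  # rotated away and not recreated yet; try again later
        if (st.st_dev, st.st_ino) != self.ident:
            # the path now names a different file: drain the old one, then reopen
            lines.extend(self._read_from_offset())
            self.fh.close()
            self._open()
        elif st.st_size < self.offset:
            # same file, but it shrank: start again from the beginning
            self.offset = 0
        lines.extend(self._read_from_offset())
        return lines
```

note that a file truncated and then rewritten past the old offset between two polls would not be detected by the size check alone.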

merge messages filter into syslog filter

conceptually, the messages filter is the same as the syslog filter. the only difference is that messages is meant to filter input from the file plugin, whereas the syslog filter is meant for the syslog input plugin. with a few heuristics, we can merge both filters into one.

parse postfix log events

postfix sends log messages to syslog. within the syslog message, there can be a well known structure to messages. the structured messages start with a hexadecimal number + ':', then consist of comma-separated name-value pairs.
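a minimal parser for that structure might look like the following. the sample message and the `queue_id` field name are illustrative assumptions, and the naive split on ", " will break on values that themselves contain a comma followed by a space.

```python
import re

# structured messages start with a hexadecimal number followed by ':'
QUEUE_ID = re.compile(r"^([0-9A-Fa-f]+): (.*)$")


def parse_postfix(message):
    """Return a dict of fields, or None if the message is not structured."""
    match = QUEUE_ID.match(message)
    if match is None:
        return None
    fields = {"queue_id": match.group(1)}
    for pair in match.group(2).split(", "):
        name, sep, value = pair.partition("=")
        if sep:  # skip fragments that are not name=value pairs
            fields[name.strip()] = value.strip()
    return fields


sample = "3F2A1B4C5D: from=<alice@example.com>, size=1024, nrcpt=1 (queue active)"
parsed = parse_postfix(sample)
```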

add XMLRPC method to describe an index

we need a way to programmatically get information about an index. some useful information (not an exhaustive list) would be:

  • size of the index (number of events)
  • index schema
  • last modified
  • last id

add ISearchableOutput method to poll for changes

the xmlrpc tail method would be more efficient if it could wait to be notified when new events arrive, instead of having the client continually poll for new events. This would also make the client seem more responsive.

allow terane-server to switch uid/gid

it would be better to run terane-server as a separate user, not root, but still be able to read files owned by other users. to do this we'll need to spawn child monitor processes with appropriate permissions for each file input which is owned by a different user, monitor the file, and write events over a pipe between the server and the monitor process.

add configuration option to disable database

certain installations may not write any events locally, instead choosing to forward all events to a remote server. in this case, DatabaseService does not need to be loaded, and libdb (a large package) would not need to be installed. perhaps we should move the terane.db package into terane.outputs.store, so it only is loaded if [plugin:output:store] is defined in the server configuration.

efficiently iterate through all events

A common search query is "* WHERE DATE ", in other words, give me every event that falls within the date range. We can handle this efficiently by returning an iterator to the documents database directly, since the document keys are ordered by date.
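the idea can be sketched with a sorted list standing in for the documents database. the `(timestamp, event_id)` key layout is an assumption made for illustration; with berkeley db the equivalent would be positioning a cursor with DB_SET_RANGE and walking forward until the end of the range.

```python
import bisect


def iter_date_range(keys, docs, start, end):
    """Yield documents whose key timestamp falls in [start, end).

    `keys` must be a sorted list of (timestamp, event_id) tuples.
    """
    # a 1-tuple sorts before every 2-tuple sharing the same timestamp
    lo = bisect.bisect_left(keys, (start,))
    hi = bisect.bisect_left(keys, (end,))
    for key in keys[lo:hi]:
        yield docs[key]


keys = [(1, "a"), (2, "b"), (2, "c"), (5, "d")]
docs = {key: "event-" + key[1] for key in keys}
```

no index lookup is needed at all: the two bisections find the range boundaries and everything in between is a match.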

normalize search query terms before searching

searching using terms like 'TTY' or 'foo-bar-baz' won't return any results, because the terms are normalized before inserting into the index but not normalized before searching.

we need to run search terms through a normalizing function, which will convert terms to lowercase, split complex terms into multiple simple terms joined with AND, etc.
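a minimal sketch of such a function follows. the splitting rule (break on anything that is not a lowercase letter or digit) is an assumption; whatever rule is used must match the normalization applied when events are inserted into the index.

```python
import re


def normalize_term(term):
    """Lowercase a term and split it into simple alphanumeric terms."""
    return [part for part in re.split(r"[^0-9a-z]+", term.lower()) if part]


def normalize_query_term(term):
    """Rewrite one query term as simple terms joined with AND."""
    return " AND ".join(normalize_term(term))
```

with this rule, 'TTY' becomes `tty` and 'foo-bar-baz' becomes `foo AND bar AND baz`.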

cooperative scheduling for long running tasks

certain tasks inside terane server are long running; for example, search queries could potentially take a long time if iterating over large ranges or executing particularly large plans. we can leverage the twisted Cooperator class to schedule these tasks fairly. other long running tasks (potentially infinitely long, in fact) are input processing loops, and route processing loops.
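twisted's Cooperator works by stepping iterators a little at a time. the sketch below imitates that idea in plain python so it can run without twisted; it is not twisted's API, and `run_cooperatively` and `search` are hypothetical names.

```python
from collections import deque


def run_cooperatively(tasks):
    """Round-robin over iterators, one step each, until all are exhausted."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)  # run the task up to its next yield
        except StopIteration:
            continue  # task finished; do not requeue it
        queue.append(task)


log = []


def search(name, steps):
    """Hypothetical long-running task that yields after each unit of work."""
    for i in range(steps):
        log.append((name, i))
        yield


run_cooperatively([search("long", 3), search("short", 1)])
```

the short task completes after its first turn instead of waiting for the long one to finish, which is the fairness property we want for search queries.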

add XMLRPC method to explain a query

useful information this method could return:

  • a normalized representation of the query
  • an estimate of how many events will be returned
  • all the fields that may be present
  • which indices will be used

input widget should accept long lines

currently if a user types more characters into the input widget than fit on one screen's width, the console crashes. we need to gracefully handle this condition in the same manner as vim or irssi, where the text in the widget shifts to the left and centers the end of the line in the middle of the screen.
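the scrolling rule can be expressed as a small pure function. this is a simplified, stateless sketch (`visible_window` is a hypothetical name); a real widget would probably keep its scroll offset between keystrokes so the view does not jump every time.

```python
def visible_window(text, cursor, width):
    """Return (visible_text, cursor_column) for a one-line input widget.

    While the cursor fits on screen nothing scrolls. Once it would pass the
    right edge, the text shifts left so the cursor sits mid-screen.
    """
    if cursor < width:
        offset = 0
    else:
        offset = cursor - width // 2
    return text[offset:offset + width], cursor - offset
```

for a 30-character line on a 20-column screen with the cursor at the end, this shows the last 10 characters with the cursor in column 10.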

fix stale segment and field counts in TOC

the current method of getting segment and field counts for an Index will possibly return out of date information, specifically a count that is too small. i think this depends on whether there has been a checkpoint, but it could be some other internal detail in libdb.

the current method uses DB->stat() with DB_FAST_STAT specified. the two ways this could be solved are

  1. use counter variables in the TOC to keep track of segment and field counts. this is what i did to count documents in a Segment. the disadvantage here is that a TOC cannot be opened twice, but i don't think that's a serious issue.
  2. continue to use DB->stat(), but don't specify DB_FAST_STAT. this is easier to implement, but i don't know the performance implications. in theory, segment and field counts should be small enough that a full stat() won't take too long.

implement authentication/authorization

currently, anyone who can access a server which is running the xmlrpc listener can execute any xmlrpc command. we should implement an auth system to verify users and restrict who has permission to execute xmlrpc commands.

optimize rotated segments

if optimize segments is set to true for a store output, then run Segment.optimize() on segments after they have been rotated.

add debug window to terane console

it would aid in debugging to have a window which displays the debug log while the console is still running. this window would be opened with the ":debug" command and closed with the normal ":close" command. it would have a configurable scrollback buffer.

give each index a UUID

generate a UUID for each index at creation time, and set the same UUID in each segment.

implement searching within a results window

the user should be able to search within a results window, like you would do inside of less or vim. the user would use a vi-like syntax, searching forward using '/regex' or just '/' to reuse the current regex, and searching backward using '?regex' or just '?'. it would be good to highlight the matching search term.

store db keys as json strings and use rich compare

the key format in the postings database is getting too ugly. in order to make keys any more complex (for example, to encode ITC stamps) it would be more intuitive to do a rich comparison of the data encoded in the key, not just a simple string comparison.

we should store the key as a string encoded using JSON, then unmarshal the JSON into a python object when comparing.
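the difference between the two comparisons shows up as soon as a key contains a number. in this sketch the `[field, term, doc_id]` key layout is a made-up example, not the real postings key format; in berkeley db the comparison function would be installed with DB->set_bt_compare().

```python
import json
from functools import cmp_to_key


def compare_keys(a, b):
    """Compare two JSON-encoded keys by their decoded values."""
    x, y = json.loads(a), json.loads(b)
    return (x > y) - (x < y)


# hypothetical key layout: [field, term, doc_id]
keys = [json.dumps(k) for k in (["message", "foo", 10], ["message", "foo", 9])]

by_string = sorted(keys)                               # plain string comparison
by_value = sorted(keys, key=cmp_to_key(compare_keys))  # rich comparison
```

the string sort places doc id 10 before 9 because '1' sorts before '9', while the rich comparison orders them numerically. the cost is a JSON decode on every comparison, which would need measuring.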

catch SIGHUP and reopen log files

if terane-server receives SIGHUP, it should reopen the log file, in order to properly support log file rotation (through logrotate or something similar).
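the basic mechanism is a handler that closes the file and opens the same path again, so writes land in the fresh file that logrotate leaves behind. this is a bare sketch using the standard `signal` module; `ReopenableLog` and `install_sighup_handler` are hypothetical names, and a twisted-based server may need to hook this into the reactor differently.

```python
import signal


class ReopenableLog:
    """Hypothetical log sink that can reopen its file on demand."""

    def __init__(self, path):
        self.path = path
        self.fh = open(path, "a")

    def write(self, line):
        self.fh.write(line + "\n")
        self.fh.flush()

    def reopen(self, signum=None, frame=None):
        """Close and reopen the log file; usable directly as a signal handler."""
        self.fh.close()
        self.fh = open(self.path, "a")


def install_sighup_handler(log):
    # must be called from the main thread; SIGHUP is not available on windows
    signal.signal(signal.SIGHUP, log.reopen)
```

until `reopen()` runs, writes keep going to the renamed file, since the open file handle follows the inode rather than the path.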

implement range search

we want to be able to match events that are greater than or less than a specified term.

add database configuration option for DB locking

The limits that you can configure are as follows:

  • The maximum number of lockers supported by the environment. This value is used by the environment when it is opened to estimate the amount of space that it should allocate for various internal data structures. By default, 1,000 lockers are supported. To configure this value, use the DB_ENV->set_lk_max_lockers() method.
  • The maximum number of locks supported by the environment. By default, 1,000 locks are supported. To configure this value, use the DB_ENV->set_lk_max_locks() method.
  • The maximum number of locked objects supported by the environment. By default, 1,000 objects can be locked. To configure this value, use the DB_ENV->set_lk_max_objects() method.

handle file input denial of service

if a file has changed significantly since the last stat(), such that the difference in st_size is very large (how large?), then the server could get stuck in the _tail() processing loop for a noticeable amount of time. during this time, the reactor won't be able to do any processing.

remove usage of DB->stat() to count DB items

when a DB gets large, DB->stat() requires lots of locks, and this doesn't scale well. instead of using DB->stat() to count segment documents, we should keep an internal counter.

store numbers and datetimes in db

if an output receives a field value that is not a string, it should try to store the value in the db in its native format. we can implement this by creating shadow fields, named something like ":". there also needs to be a way to specify the type in a query.
