grafana / loki

Like Prometheus, but for logs.

Home Page: https://grafana.com/loki

License: GNU Affero General Public License v3.0

Makefile 0.57% Shell 0.39% Go 94.73% Yacc 0.25% Dockerfile 0.15% Ruby 0.48% Jsonnet 2.20% CSS 0.01% JavaScript 0.05% HTML 0.10% Python 0.11% HCL 0.30% Ragel 0.01% Open Policy Agent 0.01% SCSS 0.01% Smarty 0.57% Nix 0.07%
loki grafana prometheus logging cloudnative hacktoberfest

loki's Introduction


Loki: like Prometheus, but for logs.

Loki is a horizontally-scalable, highly-available, multi-tenant log aggregation system inspired by Prometheus. It is designed to be very cost effective and easy to operate. It does not index the contents of the logs, but rather a set of labels for each log stream.

Compared to other log aggregation systems, Loki:

  • does not do full-text indexing on logs. By storing compressed, unstructured logs and only indexing metadata, Loki is simpler to operate and cheaper to run.
  • indexes and groups log streams using the same labels you’re already using with Prometheus, enabling you to seamlessly switch between metrics and logs.
  • is an especially good fit for storing Kubernetes Pod logs. Metadata such as Pod labels is automatically scraped and indexed.
  • has native support in Grafana (requires Grafana v6.0).

A Loki-based logging stack consists of 3 components:

  • promtail is the agent, responsible for gathering logs and sending them to Loki.
  • loki is the main server, responsible for storing logs and processing queries.
  • Grafana for querying and displaying the logs.

Loki is like Prometheus, but for logs: we prefer a multidimensional label-based approach to indexing, and want a single-binary, easy to operate system with no dependencies. Loki differs from Prometheus by focusing on logs instead of metrics, and delivering logs via push, instead of pull.

Getting started

Upgrading

Documentation

Commonly used sections:

Getting Help

If you have any questions or feedback regarding Loki:

Your feedback is always welcome.

Further Reading

Contributing

Refer to CONTRIBUTING.md

Building from source

Loki can be run in a single-host, no-dependencies mode using the following commands.

You need Go; we recommend using the version found in our build Dockerfile.

$ go get github.com/grafana/loki
$ cd $GOPATH/src/github.com/grafana/loki # GOPATH is $HOME/go by default.

$ go build ./cmd/loki
$ ./loki -config.file=./cmd/loki/loki-local-config.yaml
...

To build Promtail on non-Linux platforms, use the following command:

$ go build ./clients/cmd/promtail

On Linux, Promtail requires the systemd headers to be installed if Journal support is enabled. To enable Journal support, pass the go build tag promtail_journal_enabled.

With Journal support on Ubuntu, run with the following commands:

$ sudo apt install -y libsystemd-dev
$ go build --tags=promtail_journal_enabled ./clients/cmd/promtail

With Journal support on CentOS, run with the following commands:

$ sudo yum install -y systemd-devel
$ go build --tags=promtail_journal_enabled ./clients/cmd/promtail

Otherwise, to build Promtail without Journal support, run go build with CGO disabled:

$ CGO_ENABLED=0 go build ./clients/cmd/promtail

Adopters

Please see ADOPTERS.md for some of the organizations using Loki today. If you would like to add your organization, please open a PR to add it to the list.

License

Grafana Loki is distributed under AGPL-3.0-only. For Apache-2.0 exceptions, see LICENSING.md.

loki's People

Contributors

adityacs, ashwanthgoli, blockloop, chaudum, cstyan, cyriltovena, daixiang0, davkal, dependabot[bot], dylanguedes, jeschkies, jstickler, kavirajk, kmiller-grafana, liguozhong, masslessparticle, michelhollands, owen-d, paul1r, periklis, pracucci, red-gv, rfratto, salvacorts, sandeepsukhani, slim-bean, tomwilkie, trevorwhitney, vlad-diachenko, xperimental


loki's Issues

Error while installing ksonnet bundle

While following the instructions on https://github.com/grafana/loki/tree/master/ksonnet, I got to the step of installing loki to my cluster. However, running the following command in my app directory resulted in an error:

jb install github.com/grafana/loki/ksonnet/loki

The console output with error:

+ jb install github.com/grafana/loki/ksonnet/loki
Cloning into 'vendor/.tmp/jsonnetpkg-loki-master672665173'...
remote: Enumerating objects: 13, done.
remote: Counting objects: 100% (13/13), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 7301 (delta 4), reused 5 (delta 1), pack-reused 7288
Receiving objects: 100% (7301/7301), 9.34 MiB | 3.45 MiB/s, done.
Resolving deltas: 100% (2771/2771), done.
Already on 'master'
Your branch is up to date with 'origin/master'.
Cloning into 'vendor/.tmp/jsonnetpkg-ksonnet-util-master101695920'...
remote: Enumerating objects: 4872, done.
remote: Total 4872 (delta 0), reused 0 (delta 0), pack-reused 4872
Receiving objects: 100% (4872/4872), 13.41 MiB | 3.41 MiB/s, done.
Resolving deltas: 100% (1429/1429), done.
Already on 'master'
Your branch is up to date with 'origin/master'.
Cloning into 'vendor/.tmp/jsonnetpkg-consul-consul703605583'...
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 4875 (delta 0), reused 0 (delta 0), pack-reused 4874
Receiving objects: 100% (4875/4875), 13.41 MiB | 3.61 MiB/s, done.
Resolving deltas: 100% (1432/1432), done.
error: pathspec 'consul' did not match any file(s) known to git
jb: error: failed to install: failed to install package: exit status 1

I don't think I missed anything, but please let me know if I can provide any more details.

Time series for log volume

The UI shows a graph of the log message distribution over the selected time range, broken down by log level. Currently this is produced ad hoc from the returned rows (1000). It would be great to have this as a time series that exists independently but can still be broken down by log level.

Add metrics for per tenant usage

We need to track the following per-tenant metrics:

  • The number of active streams (currently being reported as 1)
  • Number of uncompressed bytes received
  • Number of uncompressed bytes queried
  • Number of log lines received
  • Number of log lines queried
  • Bytes of compressed data stored in ObjectStore

Deduping API

Deduping of logging is currently implemented in the browser. It would be great to examine how a backend solution might perform in comparison:
Browser deduping needs all data to be transferred to the browser, and it might then block the main thread, making the UI laggy. Backend deduping should result in less data being transferred while keeping the UI responsive. However, switching between deduping modes would need to issue a new request, introducing a whole roundtrip before results are seen.

Modes of deduping:

Let's consider some typical log lines.

level=info ts=2018-11-26T07:00:11.259316Z caller=head.go:488 component=tsdb msg="head GC completed" duration=2.377771ms
level=info ts=2018-11-26T09:00:13.8587Z caller=compact.go:398 component=tsdb msg="write block" mint=1543212000000 maxt=1543219200000 ulid=01CX7KYRPYW343SSNYKM9DT9GN
level=info ts=2018-11-26T09:00:13.869071Z caller=head.go:488 component=tsdb msg="head GC completed" duration=1.297539ms
level=info ts=2018-11-26T09:00:14.337816Z caller=compact.go:352 component=tsdb msg="compact blocks" count=3 mint=1543190400000 maxt=1543212000000 ulid=01CX7KYS5Q79P73ZAK7F7V2DHS sources="[01CX7050KNQE7D574MK52KDE10 01CX771CS377RYCJM9X3ZAK9DJ 01CX7D2YVYMZAX7SZZS46YXDN6]"
level=info ts=2018-11-26T09:00:14.809531Z caller=compact.go:352 component=tsdb msg="compact blocks" count=3 mint=1543147200000 maxt=1543212000000 ulid=01CX7KYSMR8CP3Z207N6R1F4HB sources="[01CX6AR258F3PGSBZEHCQ6JBM1 01CX7051MJX15JFZNV0R8EM8NM 01CX7KYS5Q79P73ZAK7F7V2DHS]"
level=info ts=2018-11-26T11:15:01.397752Z caller=compact.go:398 component=tsdb msg="write block" mint=1543219200000 maxt=1543226400000 ulid=01CX7VNJM26JE9X5EGY28DE2N0
level=info ts=2018-11-26T11:15:01.406863Z caller=head.go:488 component=tsdb msg="head GC completed" duration=1.037851ms

Ideally, log lines provide little breadcrumbs of what the software was doing and when. At the same time, lines repeat a lot, which is sometimes less useful to display as it adds a lot of noise. There are a couple of strategies we can use to identify a line as repeating and hence in need of deduping.

Exact matches:

Exact matches should be done on the whole line payload, except for dates. This allows deduping messages that carry the exact same information. The argument for excluding dates is that they are more likely a mandatory field than a useful distinguishing feature for the reader. As a first step, I would only strip ISO dates.
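
A minimal Go sketch of this strategy (the regex and helper names are illustrative, not Loki code):

package main

import (
	"fmt"
	"regexp"
)

// isoDate matches RFC3339-style timestamps such as 2018-11-26T07:00:11.259316Z.
var isoDate = regexp.MustCompile(`\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z?`)

// exactKey strips ISO dates so two otherwise-identical lines compare equal.
func exactKey(line string) string {
	return isoDate.ReplaceAllString(line, "")
}

func main() {
	a := `level=info ts=2018-11-26T07:00:11.259316Z caller=head.go:488 component=tsdb msg="head GC completed" duration=2.377771ms`
	b := `level=info ts=2018-11-26T09:00:13.869071Z caller=head.go:488 component=tsdb msg="head GC completed" duration=2.377771ms`
	fmt.Println(exactKey(a) == exactKey(b)) // true: identical except for the timestamp
}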

Ignore duration and IPs:

Numbers usually make up the variable parts of a log line, e.g., duration=1.037851ms. For display purposes we can choose not to care about repeated lines that have different durations. For this we can simply compare lines with numbers stripped. Numbers also appear in IP addresses; those would be stripped as well.
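
A rough sketch of this in Go (names are illustrative; a real implementation would likely be more careful about what counts as a number):

package main

import (
	"fmt"
	"regexp"
)

// digits collapses every run of digits, so durations (duration=1.037851ms) and
// IP addresses (10.15.0.35) no longer distinguish otherwise-identical lines.
var digits = regexp.MustCompile(`\d+`)

func numberlessKey(line string) string {
	return digits.ReplaceAllString(line, "#")
}

func main() {
	a := `level=info caller=head.go:488 component=tsdb msg="head GC completed" duration=1.297539ms`
	b := `level=info caller=head.go:488 component=tsdb msg="head GC completed" duration=1.037851ms`
	fmt.Println(numberlessKey(a) == numberlessKey(b)) // true: only the numbers differ
}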

Structure:

Structured logging can produce lines that could be considered repeated if their structure is the same, i.e., their keys are the same. Consider:

level=info ts=2018-11-26T09:00:14.809531Z caller=compact.go:352 component=tsdb msg="compact blocks" count=3 mint=1543147200000 maxt=1543212000000 ulid=01CX7KYSMR8CP3Z207N6R1F4HB sources="[01CX6AR258F3PGSBZEHCQ6JBM1 01CX7051MJX15JFZNV0R8EM8NM 01CX7KYS5Q79P73ZAK7F7V2DHS]"

Its structure would be: level ts caller component msg count mint maxt ulid sources. Keys might need to be sorted to be consistent.
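
As a sketch, assuming logfmt-style lines, the structure key could be computed like this (the regex and function names are illustrative):

package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// logfmtKey captures the key of each key=value pair in a logfmt line.
var logfmtKey = regexp.MustCompile(`(\w+)=`)

// structureKey returns the sorted key set; two lines dedupe if their keys match.
func structureKey(line string) string {
	var keys []string
	for _, m := range logfmtKey.FindAllStringSubmatch(line, -1) {
		keys = append(keys, m[1])
	}
	sort.Strings(keys)
	return strings.Join(keys, " ")
}

func main() {
	line := `level=info ts=2018-11-26T09:00:14.809531Z caller=compact.go:352 component=tsdb msg="compact blocks" count=3`
	fmt.Println(structureKey(line)) // caller component count level msg ts
}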

Signature:

Some log lines carry lots of variable data but confer little information, e.g., lines with alphanumeric IDs. For the most aggressive deduping, we could consider a line's signature, i.e., the line stripped of all letters and numbers so that only whitespace and punctuation remain. Consider:

level=info ts=2018-11-26T09:00:14.809531Z caller=compact.go:352 component=tsdb msg="compact blocks" count=3 mint=1543147200000 maxt=1543212000000 ulid=01CX7KYSMR8CP3Z207N6R1F4HB sources="[01CX6AR258F3PGSBZEHCQ6JBM1 01CX7051MJX15JFZNV0R8EM8NM 01CX7KYS5Q79P73ZAK7F7V2DHS]"

Its signature would be: = =--::. =.: = =" " = = = = ="[ ]".
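
A sketch of the signature computation in Go (illustrative, not Loki code):

package main

import (
	"fmt"
	"regexp"
)

// alnum removes every run of letters and digits, leaving only whitespace and punctuation.
var alnum = regexp.MustCompile(`[A-Za-z0-9]+`)

func signature(line string) string {
	return alnum.ReplaceAllString(line, "")
}

func main() {
	line := `level=info ts=2018-11-26T09:00:14.809531Z msg="compact blocks" count=3`
	fmt.Println(signature(line)) // = =--::. =" " =
}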

Latency discussion:

With 24h of Cassandra logs, signature deduping can easily strip 50% of messages. Quick benchmark:

Transfer:

  • 500 lines, 130KB take roughly 280ms
  • 1000 lines, 260KB take roughly 420ms

Deduping:

  • non-optimized signature deduping of 1000 lines down to ~550 takes 40ms in the browser

Rendering:

  • 1000 lines takes 300ms
  • 500 lines takes 120ms
  • this can be sped up with virtualised rendering though

Open questions:

  • should streams be deduped individually or merged? (Merging a couple of {app="cassandra"} logs together might lead to log lines interleaving that will be considered repeated even though they may come from different hosts)

minor typo

logcli output shows usage: logcpi instead of logcli

$ logcli
usage: logcpi [] [ ...]

The README.md has this too.

Port to arm64

The build scripts have several dependencies on x86 that would need to be addressed to port Loki to arm64.

The related gometalinter issue is alecthomas/gometalinter#566, which probably needs to be dealt with first.

Streaming Logs

All the backends support grpc streaming for queries, so we'd need to:

  • make ingesters understand that limit -1 means no limit
  • make ingesters sleep on a condition waiting for new logs of matching streams (and deal with new streams that match)
  • make the querier accept websocket connections and proxy the grpc requests through to them

Loki's configuration

hey,
i am trying to deploy loki using image grafana/loki:master
and this conf file:
https://github.com/grafana/loki/blob/master/docs/loki-local-config.yaml

and i’m getting errors:
caller=consul_client.go:182 msg=“error getting path” key=ring err=null
level=info ts=2018-12-13T16:07:43.715778784Z caller=lifecycler.go:360 msg=“entry not found in ring, adding with no tokens”
level=info ts=2018-12-13T16:07:43.716520284Z caller=lifecycler.go:290 msg=“auto-joining cluster after timeout”

the problem is in this section, from what I understand:

ring:
  store: inmemory

not sure what the correct conf is or what I am missing.

newtailer path err?

Problem:
I started Loki with docker-compose up in loki/docs, but I got this problem:

promtail_1_8e273bf154bd | level=info ts=2018-12-15T02:31:53.9581466Z caller=target.go:183 msg="start tailing file" filename=/var/log/var/log/loki7.log
promtail_1_8e273bf154bd | level=info ts=2018-12-15T02:31:53.9583949Z caller=target.go:180 msg="stopping tailing file" filename=/var/log/var/log/loki7.log

reason?
func newTailer(logger log.Logger, handler EntryHandler, positions *Positions, dir, name string) (*tailer, error) {
path := filepath.Join(dir, name)
}

some related code:
watcher, err := fsnotify.NewWatcher()
watcher.Add(path)
case event := <-t.watcher.Events:
tailer, err := newTailer(t.logger, t.handler, t.positions, t.path, event.Name)

When we give the watcher an absolute path, event.Name is already an absolute path, so we don't need to do path := filepath.Join(dir, name).
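
A small sketch of the fix this suggests (not the actual promtail code): only join when the event name is not already absolute.

package main

import (
	"fmt"
	"path/filepath"
)

// tailPath avoids double-joining: fsnotify event names are already absolute when
// the watcher was given an absolute path, which is how /var/log/var/log/loki7.log appears.
func tailPath(dir, name string) string {
	if filepath.IsAbs(name) {
		return name
	}
	return filepath.Join(dir, name)
}

func main() {
	fmt.Println(tailPath("/var/log", "/var/log/loki7.log")) // /var/log/loki7.log
	fmt.Println(tailPath("/var/log", "loki7.log"))          // /var/log/loki7.log
}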

How to add more location for scrape ?

Hi,
I run this in Docker on a clean Ubuntu 18.04.

Syslog works well. I installed nginx and tried to add it in scrape_configs, but I can't.

scrape_configs:
- job_name: system
  entry_parser: raw
  static_configs:
  - targets:
      - localhost
    labels:
      job: varlogs
      __path__: /var/log
      job: nginx
     __path__:/var/log/nginx

I tried to add a new job label with a path, but after restarting the promtail container, the container exited.
If I open an interactive shell with docker exec -it promtail, I can see my nginx folder under /var/log as expected.

How do I add a new log file to scrape?

promtail terminates when loki returns a 500 error from back-pressure

It appears promtail is terminating (and the pod is restarting) when it receives a 500 error from the loki server.
"Error sending batch: Error doing write: 500 - 500 Internal Server Error"
From a discussion on Slack, this occurs when the remote end is overloaded. Possibly this should be a more specific 503 slow down error?

Perhaps back-pressure from the remote end should be expected and handled by promtail, by retrying the request with a capped exponential backoff with jitter.

Additionally, promtail could expose a metric indicating its consumer lag, that is, the delta between the current head of the log file and what it has successfully processed and sent to the remote server. That could be used in Alertmanager to warn when there is a danger of losing logs (for example, in Kubernetes, nodes automatically rotate and delete log files as they grow).
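
A minimal Go sketch of the retry behaviour proposed here: capped exponential backoff with jitter around a hypothetical send function. The retry count and delays are illustrative, not promtail's actual configuration.

package main

import (
	"fmt"
	"math/rand"
	"time"
)

func sendWithBackoff(send func() error) error {
	const maxRetries = 10
	const maxBackoff = 30 * time.Second
	backoff := 500 * time.Millisecond

	var err error
	for i := 0; i < maxRetries; i++ {
		if err = send(); err == nil {
			return nil
		}
		// Sleep for the current backoff plus up to 100% jitter, then double it, capped.
		time.Sleep(backoff + time.Duration(rand.Int63n(int64(backoff))))
		if backoff *= 2; backoff > maxBackoff {
			backoff = maxBackoff
		}
	}
	return fmt.Errorf("giving up after %d retries: %w", maxRetries, err)
}

func main() {
	attempts := 0
	err := sendWithBackoff(func() error {
		attempts++
		if attempts < 3 {
			return fmt.Errorf("500 Internal Server Error")
		}
		return nil
	})
	fmt.Println(attempts, err) // 3 <nil>
}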

Unable to start with error "../loki-local-config.yaml: no such file or directory"

Hi,

I was following the Run locally using Docker instructions and hit this first issue of the file not being found.

$ docker run --name loki --network=loki -p 3100:3100 --volume "$PWD/docs:/etc/loki" grafana/loki:master -config.file=/etc/loki/loki-local-config.yaml
level=error ts=2018-12-12T09:54:40.816954563Z caller=main.go:28 msg="error loading config" filename=/etc/loki/loki-local-config.yaml err="Error reading config file: open /etc/loki/loki-local-config.yaml: no such file or directory"

Observation

When I cd into a locally cloned copy of the loki repo, I am able to start.
I was under the assumption that I could start from any place, like $HOME.

Helm Chart

Any chance of Helm charts for promtail and loki?

Ksonnet Library

Can't wait to get playing with this. Would you be interested in a ksonnet library a la prometheus-ksonnet, or does it already exist?

Set metadata/labels via log files (ingest time)

Most of our services (nginx, application servers, etc) emit JSON lines[1] like this:

{ "timestamp" : "2018-12-24T18:00:00", "user-id" : "abc", "request-id" : "123", "message" : "Parsing new user data" }
{ "timestamp" : "2018-12-24T18:01:00", "user-id: " "def", "request-id" : "456", "message" : "Updating user record" }
{ "timestamp" : "2018-12-24T18:02:00", "user-id: " "abc", "request-id" : "789", "message" : "Keep alive ping from client" }

We'd need a way to set labels via the promtail agent (basically json parsing and tagging/labeling at ingest-time). In this example that would be the user-id and request-id metadata. For troubleshooting and debugging we need a way to see the logs for a particular user or request.

[1] http://jsonlines.org/

(this is partly related to high cardinality labels, but I've split it out into two separate issues, the other one is here: #91)
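
A rough Go sketch of the ingest-time tagging being asked for here: parse a JSON line and promote selected fields to labels. The field names and helper are illustrative, not a promtail API.

package main

import (
	"encoding/json"
	"fmt"
)

// extractLabels parses a JSON log line and copies the requested fields into a label map.
func extractLabels(line string, fields ...string) (map[string]string, error) {
	var record map[string]interface{}
	if err := json.Unmarshal([]byte(line), &record); err != nil {
		return nil, err
	}
	labels := map[string]string{}
	for _, f := range fields {
		if v, ok := record[f].(string); ok {
			labels[f] = v
		}
	}
	return labels, nil
}

func main() {
	line := `{ "timestamp" : "2018-12-24T18:00:00", "user-id" : "abc", "request-id" : "123", "message" : "Parsing new user data" }`
	labels, _ := extractLabels(line, "user-id", "request-id")
	fmt.Println(labels) // map[request-id:123 user-id:abc]
}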

Rethink chunk idle semantics

In metrics, we know when to flush a chunk because we expect metrics to flow in at a constant rate. That is not the case with logs. A container might log only once every few minutes or even every hour. Flushing chunks in that case makes for bad compression and a super sparse index.

One way to fix this would be to make the agent heartbeat which streams still "exist".

Promtail Packages

Currently we don't have an easy way for anyone wanting to use standalone promtail to install it as a service on their host machine(s).

We should at least provide pre-built binaries, and ideally packages that can set up the services to run it at boot.

For starters we should provide .deb and .rpm packages.

Do the FORWARD vs BACKWARD direction client side

Doing it server-side is a little complicated and takes some extra effort. We compress in order and, when we read back, we just stream through the chunk bytes. Now if we want to support the backward direction, we need to first get all the data in memory in the forward direction and then loop backward.

Now this means we might be deflating and holding MBs more than we need to in the ingesters. This task can be effectively delegated to the client itself.
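
A tiny Go sketch of what delegating this to the client means: fetch in the ingesters' natural forward order and flip the slice before display.

package main

import "fmt"

// reverse flips a slice of log lines in place.
func reverse(lines []string) {
	for i, j := 0, len(lines)-1; i < j; i, j = i+1, j-1 {
		lines[i], lines[j] = lines[j], lines[i]
	}
}

func main() {
	lines := []string{"08:00 first", "09:00 second", "10:00 third"}
	reverse(lines)
	fmt.Println(lines) // [10:00 third 09:00 second 08:00 first]
}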

Scale out single binary mode

Currently the all-in-one (run locally) mode requires no X-Scope-OrgID header. I am not sure if that could be the case for the distributed mode too. Most users of Loki won't need multi-tenancy and not having a gateway would help.

Make it silly easy to deploy your own

We want a really easy onboarding process. One command. This probably means:

  • No ring in consul; use gossip instead.
  • A "local" chunk store.
  • A single binary with all the components in.

README.md incorrect for Docker startup

The Docker instructions suggest using x-local-config.yaml in the README.md when in fact they need to use x-docker-config.yaml otherwise they'll be attempting to contact "127.0.0.1" instead of "loki".

where to update loki info in grafana cloud

I logged into Grafana and used the cloud offering to try Loki.

I want to update below info:

Sending logs from a kubernetes cluster
You can use the following command to configure and install promtail on your kubernetes cluster:

curl -fsS https://raw.githubusercontent.com/grafana/loki/master/tools/promtail.sh | sh -s 1041 <Your Grafana.com API Key> logs-us-west1.grafana.net default | kubectl apply --namespace=default -f  -

since it only works for the default namespace.

But I do not know which file I need to update. Could someone help me?

Increase grpc limits

Seeing this on distributor logs:

level=warn ts=2018-12-06T15:46:43.673737673Z caller=gokit.go:43 msg="POST /api/prom/push (500) 73.943803ms Response: \"rpc error: code = Unknown desc = out of order sample\\n\" ws: false; Accept-Encoding: gzip; Authorization: Basic ***; Connection: close; Content-Length: 140456; Content-Type: application/x-protobuf; User-Agent: Go-http-client/2.0; Via: 1.1 google; X-Cloud-Trace-Context: 7b9223a0b9071768b79f1a3de078970a/2632702499859856938; X-Forwarded-For: 35.199.19.169, 35.190.19.129, 10.15.0.35; X-Forwarded-Proto: https; X-Scope-Orgid: 29; "
level=warn ts=2018-12-06T15:46:55.01588331Z caller=gokit.go:43 msg="POST /api/prom/push (500) 210.404432ms Response: \"rpc error: code = ResourceExhausted desc = grpc: received message larger than max (6842981 vs. 4194304)\\n\" ws: false; Accept-Encoding: gzip; Authorization: Basic ***; Connection: close; Content-Length: 859722; Content-Type: application/x-protobuf; User-Agent: Go-http-client/2.0; Via: 1.1 google; X-Cloud-Trace-Context: 73eb6754952f85d83d676d8181301a9e/8514407636735609549; X-Forwarded-For: 35.194.87.10, 35.190.19.129, 10.15.0.7; X-Forwarded-Proto: https; X-Scope-Orgid: 29; "

Does not start on Windows subsystem for Linux

Not able to test out loki on my Windows machine running Ubuntu under WSL.

Tried both the docker run commands and docker-compose up without luck:

Attaching to docs_grafana_1, docs_loki_1, docs_promtail_1
loki_1      | level=error ts=2018-12-13T15:43:37.3741138Z caller=main.go:28 msg="error loading config" filename=/etc/loki/loki-local-config.yaml err="Error reading config file: open /etc/loki/loki-local-config.yaml: no such file or directory"
promtail_1  | level=error ts=2018-12-13T15:43:37.5404898Z caller=main.go:29 msg="error loading config" filename=/etc/promtail/promtail-docker-config.yaml err="Eeror reading config file: open /etc/promtail/promtail-docker-config.yaml: no such file or directory"

I tried to open a shell interactively inside the loki container so that I could verify whether the volume mount was successful (docker run -v "$PWD/docs:/etc/loki" -it grafana/loki:master bash), but that does not work, probably because of the way you have used cmd/entrypoint in the Dockerfile. I was not able to find any Dockerfile in the repo to dig deeper.

The docker commands in the quick start also refer to config files in the docs directory of the repo; it should probably be clearer that you NEED to clone the repo even when running the docker run commands. Or better, provide some default config files or something. And config should also be overridable using environment variables, for example the way Grafana does it today.

Other than that I'm super stoked to try loki out soon!

Feature: multi-line logs

Currently, when a multi-line event is written to a logfile, promtail treats each row as its own entry and sends them to Loki separately.

It would be great if Loki could handle multi-line events such as stacktraces.

This might also be a bug, if promtail should already handle multi-line events.

make fails on linux

➜  loki git:(promtail-ext-lbl) ✗ make
docker run --rm --tty -i \
	-v /home/goutham/go/src/github.com/grafana/loki/.cache:/go/cache \
	-v /home/goutham/go/src/github.com/grafana/loki/.pkg:/go/pkg \
	-v /home/goutham/go/src/github.com/grafana/loki:/go/src/github.com/grafana/loki \
	grafana/loki-build-image pkg/parser/labels.go;
make: Entering directory '/go/src/github.com/grafana/loki'
make: 'pkg/parser/labels.go' is up to date.
make: Leaving directory '/go/src/github.com/grafana/loki'
docker run --rm --tty -i \
	-v /home/goutham/go/src/github.com/grafana/loki/.cache:/go/cache \
	-v /home/goutham/go/src/github.com/grafana/loki/.pkg:/go/pkg \
	-v /home/goutham/go/src/github.com/grafana/loki:/go/src/github.com/grafana/loki \
	grafana/loki-build-image cmd/loki/loki;
make: Entering directory '/go/src/github.com/grafana/loki'
go build -ldflags "-extldflags \"-static\" -linkmode=external -s -w" -tags netgo -i -o cmd/loki/loki ./cmd/loki
go build net: open /usr/local/go/pkg/linux_amd64/net.a: permission denied
Makefile:102: recipe for target 'cmd/loki/loki' failed
make: *** [cmd/loki/loki] Error 1
make: Leaving directory '/go/src/github.com/grafana/loki'
make: *** [Makefile:93: cmd/loki/loki] Error 2

Make log level a label

Making log level a label allows first-class filtering of streams by log level.

(Currently log level filtering is done in the UI, on the set of 1000 returned rows. It would be better to do this in Loki so that the returned rows are more relevant.)

Design questions

Hi, I have a few questions that would help me understand Loki better, and I didn't find the answers in the design document:

  1. Batching logs will allow better compression ratios and bigger blobs (which means lower per-operation costs), but must be balanced with the risk of data loss - what is the strategy here?

  2. Can Loki handle multi-line logs? Let's say a regex matches a line that is really part of a multi-line log - will the search then return all the related lines for that log?

  3. Labels are important for Loki, and I understand that the focus is on k8s first, with automatic labelling
    3.a. Exactly what labels are automatically assigned?
    3.b. Is there some mechanism to add your own labels, for example to group sources by operating system, or are you limited only to those labels assigned by Loki?
    3.c. If you are limited to labels assigned by Loki, how will you handle labelling as you expand out of k8s and accept logs from other sources (e.g. syslog)?

  4. Logs really have 2 timestamps associated with them: the time the log was generated, and the time the log was ingested by Loki - will Loki be able to parse the log generation time out of logs (where it's included, and it usually is, such as with syslog), or will it only use the log ingestion time for time range searches?

hosted log demo instance failed

I logged into Grafana to try the online demo, but I hit the issue below:

  • I can remove the Grafana instance, but not the logs or metrics ones
  • the logs instance does not work now:
[root@dx-app ~]# logcli labels app
https://logs-us-west1.grafana.net/api/prom/label/app/values
2018/12/14 15:12:10 Error doing request: Error response from server: too many failed ingesters
 (<nil>)

I am sure my promtail works, since I can get the app info below:

[root@dx-app ~]# logcli labels app
https://logs-us-west1.grafana.net/api/prom/label/app/values
helm
mongo
nginx
flask

Logspout integration for logging through Docker

A lot of (most?) application deployments happen through Docker. They use frameworks like Docker Swarm or Kubernetes to deploy. The deployment has little awareness of the underlying hardware/OS.

Any logging solution must integrate into Docker deployment frameworks.

A popular/generic solution is Logspout - https://github.com/gliderlabs/logspout

It is now adopted by a lot of log providers - https://github.com/logdna/logspout and https://www.loggly.com/docs/docker-logging-logspout/

Extract metrics in promtail

mtail is extremely powerful for extracting metrics from logs for systems that don't have good whitebox options. It would be nice if promtail could integrate mtail's parsing system and produce metrics based on the logs.
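
A rough sketch of the mtail-style idea in Go: run a regex over scraped lines and bump a counter per capture value. A real integration would expose these as Prometheus metrics; a plain map stands in for that here.

package main

import (
	"fmt"
	"regexp"
)

// countMatches increments a counter keyed by the first capture group of re.
func countMatches(lines []string, re *regexp.Regexp) map[string]int {
	counts := map[string]int{}
	for _, line := range lines {
		if m := re.FindStringSubmatch(line); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

func main() {
	level := regexp.MustCompile(`level=(\w+)`)
	lines := []string{
		`level=info msg="head GC completed"`,
		`level=warn msg="POST /api/prom/push (500)"`,
		`level=info msg="compact blocks"`,
	}
	fmt.Println(countMatches(lines, level)) // map[info:2 warn:1]
}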

Support standard protocols as alternatives

One of the great things about Prometheus is that it can use simple, standard protocols for transferring data. This means I can just use "curl" on any machine for testing and that anything that can accept HTTP requests can also trivially expose prometheus metrics, without adding a requirement for any specific library or dependency.

However, Loki currently appears to only support the comparatively obscure "snappy" compression and a (private) protobuf format for transferring logs. These are probably good technical choices, but they are also hurdles that complicate implementation and integration of Loki into other systems.

The low hurdle of Prometheus is IMO one of the reasons for its big success. Unlike other monitoring systems, there's no single big daemon or library that needs to support everyone's use cases, because you can just trivially create your own exporters. I think this is definitely something worth emulating.

I think it would be helpful to do the following to simplify agent implementation:

  • allow no compression and standard gzip as alternatives to snappy
  • allow a simple text format as alternative to protobuf
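
For illustration, a Go sketch of what an agent-side push could look like under this proposal: newline-delimited plain text, gzip-compressed with the standard library. The content type and Loki accepting this format are hypothetical; only the /api/prom/push path is taken from logs quoted elsewhere on this page.

package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"net/http"
)

// pushPlainGzip gzips newline-delimited log lines and POSTs them over plain HTTP.
func pushPlainGzip(url string, lines []string) error {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	for _, l := range lines {
		fmt.Fprintln(zw, l)
	}
	if err := zw.Close(); err != nil {
		return err
	}

	req, err := http.NewRequest(http.MethodPost, url, &buf)
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "text/plain")
	req.Header.Set("Content-Encoding", "gzip")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return fmt.Errorf("push failed: %s", resp.Status)
	}
	return nil
}

func main() {
	_ = pushPlainGzip("http://localhost:3100/api/prom/push", []string{"hello", "world"})
}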

host.docker.internal only works on mac/linux and is not a best practice

see

The host has a changing IP address (or none if you have no network access). From 18.03 onwards our recommendation is to connect to the special DNS name host.docker.internal, which resolves to the internal IP address used by the host. This is for development purpose and will not work in a production environment outside of Docker for Mac.

My recommendation is to use docker-compose to run grafana+promtail+loki. That way they can all talk to each other, and you need to run fewer commands.

More than one label in selector query returns a 400

Works:

{job="default/prometheus"}

Does not work:

{job="default/prometheus",namespace="default"}

Error:

Multiple errors: [rpc error: code = Unknown desc = syntax error rpc error: code = Unknown desc = syntax error rpc error: code = Unknown desc = syntax error]

High cardinality labels

For many logging scenarios it's useful to lookup/query the log data on high cardinality labels such as request-ids, user-ids, ip-addresses, request-url and so on.

While I realize that high cardinality will affect indices, I think it's worth discussing whether this is something that Loki can support in the future.

There are a lot of logging use-cases where people can easily ditch full-text search that ELK and others provide. But not having easy lookup on log metadata, such as user-id or request-id, is a problem.

I should note that this doesn't necessarily need to be "make labels high cardinality", it could also be the introduction of some type of log-line metadata or properties, that are allowed to be high cardinality and can be queried for.

Example

Our services (nginx, application servers, etc) emit JSON lines[1] like this:

{ "timestamp" : "2018-12-24T18:00:00", "user-id" : "abc", "request-id" : "123", "message" : "Parsing new user data" }
{ "timestamp" : "2018-12-24T18:01:00", "user-id: " "def", "request-id" : "456", "message" : "Updating user record" }
{ "timestamp" : "2018-12-24T18:02:00", "user-id: " "abc", "request-id" : "789", "message" : "Keep alive ping from client" }

We ingest these into our log storage (which we would like to replace with Loki), and here are some common types of tasks that we currently do:

  1. Bring up logs for a particular user. Usually to troubleshoot some bug they are experiencing. Mostly we know the rough timeframe (for example that it occurred during the past 2 weeks). Such a filter will usually bring up 5-200 entries. If there are more than a few entries we'll usually filter a bit more, on a stricter time interval or based on other properties (type of request, etc).

  2. Find the logs for a particular request, based on its request id. Again, we'd usually know the rough timeframe, say +/- a few days.

  3. Looking at all requests that hit a particular endpoint, basically filtering on 2-3 log entry properties.

All of these, which I guess are fairly common for a logging system, require high cardinality labels (or some type of metadata/properties that are high cardinality and can be queried).

[1] http://jsonlines.org/

Updates

A few updates, since this issue was originally opened.

  • This design proposal doesn't directly address high-cardinality labels, but solves some of the underlying problems.

    With LogQL you can grep over large amounts of data fairly easily, and find those needles. It assumes that the queries can run fast enough on large log corpora.

  • There is some discussion in this thread, on making it easier to find unique IDs, such as trace-id and request-id

Ingester crashes on exit

Spotted on one of the ingesters:

level=info ts=2018-12-14T09:13:39.379995933Z caller=gokit.go:36 msg="=== received SIGINT/SIGTERM ===\n*** exiting"
level=info ts=2018-12-14T09:13:39.382026879Z caller=loki.go:158 msg=stopping module=store
level=info ts=2018-12-14T09:13:39.382126813Z caller=loki.go:158 msg=stopping module=ingester
level=info ts=2018-12-14T09:13:39.382174645Z caller=lifecycler.go:443 msg="changing ingester state from" old_state=ACTIVE new_state=LEAVING
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x58 pc=0xbbfa8c]

goroutine 88517526 [running]:
github.com/grafana/loki/vendor/cloud.google.com/go/storage.(*Writer).open.func1(0xc077ed3860, 0xc15b1dc780, 0xc06cc148c0, 0xc231538280, 0x1, 0x1)
	/go/src/github.com/grafana/loki/vendor/cloud.google.com/go/storage/writer.go:118 +0xfc
created by github.com/grafana/loki/vendor/cloud.google.com/go/storage.(*Writer).open
	/go/src/github.com/grafana/loki/vendor/cloud.google.com/go/storage/writer.go:108 +0x471

PROPOSAL: fluentd output plugin

I think it'll be useful to have a Fluentd output plugin for Loki. Lots of people already have fluentd installs and sometimes even a complete pipeline with fluentd --> Kafka --> fluentd --> ELK and just having another output to Loki would help them use Loki without many changes.

We use promtail to tag the logs with the correct kubernetes labels but most people don't use kubernetes and still want to use Loki. While we can support Promtail for Kubernetes, I think it is best to just delegate everything else to fluentd.

how to use?

I'm interested in this package. I have some microservices that want to log data; how do I do that with Loki?

promtail binary or some standalone ?

Congrats on this project.
I tested locally with Docker and it works great.

I'm already using Prometheus and love the project :)

I have one bare-metal lab server with virtualization and I want to deploy promtail on all instances to test, but the problem is promtail: I don't want to use Docker on all instances.
Do you have a promtail binary?

Indeed, I'm using the ELK stack for logging about 50-60 instances and I want to switch to this when it becomes stable :)

Great job! Support from me!
