The alphagov.performanceplatform-collector-config from uk-gov-mirror

This repository holds all of the configuration for our instance of performanceplatform-collector.

Creating a new query
Query Reference

Query Reference

performanceplatform.collector.ga

performanceplatform.collector.ga.trending

performanceplatform.collector.ga.realtime

performanceplatform.collector.pingdom

performanceplatform.collector.webtrends.keymetrics

Valid options:

dataType
additionalfields
mappings
idMapping
plugins

After building up the basic data like so:

{
  dataType: "the dataType in options if present,
             the data-type in data-set if not",
  _timestamp: "the period start datetime",
  ...
  and any 'measures' from the data...
}

The options will be applied in order:

additionalfields will be added.
mappings will change keys at this point.
idMappings will generate a human and encoded id from the values of the keys in the given array.
plugins will run against the data at this point.

Notes:

You need to find out what keymetrics is calling unique visitors and then add this to mapping. In the following example "Visits" is used but this may be incorrect.

{
  "data-set": {
    "data-group": "national-appointments-scheme",
    # Currently we don't pass the schema of realtime because they are returning
    # floats instead of ints.
    # This means we need to use a type other than realtime.
    "data-type": "keymetrics"
  },
  "entrypoint": "performanceplatform.collector.webtrends.keymetrics",
  "options": {
    "additionalFields": {
      "stage": "whoop",
      # This is required to pass the realtime schema.
      # It may not be necessary if we don't use this.
      "for_url": "boop"
    },
    "idMapping": ["dataType", "_timestamp", "stage"],
    # We need to specify all of these as something in backdrop is finding the original
    # field names invalid.
    # This may just be a realtime schema problem again though.
    "mappings": {
     "Avg. Time on Site": "avg_time_on_site",
     "Page Views": "page_views",
     "Bounce Rate": "page_views",
     "Visits": "unique_visitors",
     "New Visitors": "new_visitors",
     "Page Views per Visit": "page_views_per_visit",
     "Avg. Visitors per Day": "avg_visitors_per_day"
    }
  },
  # Nothing is needed for query as all key metrics should be identical.
  "query": {
  },
  "token": "webtrends"
}

performanceplatform.collector.webtrends.reports

Valid options:

row_type_name
dataType
additionalfields
mappings
idMapping
plugins

After building up the basic data like so:

{
  <row_type_name_value>: "row dimension data",
  dataType: "the dataType in options if present,
             the data-type in data-set if not",
  _timestamp: "the period start datetime",
  ...
  and any 'measures' from the data...
}

The options will be applied in order:

additionalfields will be added.
mappings will change keys at this point.
idMappings will generate a human and encoded id from the values of the keys in the given array.
plugins will run against the data at this point.

Notes:

Version

The default API version used is version 2 and the collector will work with no further config for anything with 'v2_0' in the reports or keymetrics url in the credentials file. For things with 'v3' in these urls you must add "api_version": "v3" to the credentials file in order for the version 3 urls to work.

Measures

Due to the nature of reports - the fields are defined by the service pushing data - you must inspect the response to find the measures available and choose from these. e.g. for:

{
  "data":{
    "10/29/2014-10/29/2014":{
      "measures":{
        "Visits":17.0
      },
      "SubRows":{
        ...
      }
    }
  }
}

In this case the measure available is "Visits". All measures are collected by default under their original title. If you need to change any of these then use the mappings option.

row_type_name

"data": {
  "10/14/2014-10/15/2014": {
    "SubRows": {
      "Mozilla": {
        "Attributes": null,
        "SubRows": null,
        "measures": {
          "Visits": 1.0
        }
      },
      "Google Chrome": {
        "Attributes": null,
        "SubRows": null,
        "measures": {
          "Visits": 18.0
        }
      }
    },
    "measures": {
      "Visits": 19.0
    }
  }
}

In the above data part of the response you can see that measures are grouped by browser name. "row_type_name" is a non optional argument which will tell the collector what call the data in this key for each row. In this case row type name is going to be "browser". This will result in data like the following:

{
    "Visits": 5.0,
    ...
    "browser": "Mozilla",
}

Doing completion

If the start and end stages of the service are in separate reports that you can specify two collectors collecting into a single data set and use the additionalfields parameter to distinguish one as start and one as end.

Report id

The reports config needs a report_id specified in query as below. This is unique to a service and report.

{
  "data-set": {
    "data-group": "national-appointments-scheme",
    "data-type": "browsers"
  },
  "entrypoint": "performanceplatform.collector.webtrends.reports",
  "options": {
    "additionalFields": {
      "stage": "start"
    },
    "row_type_name": "browser",
    "idMapping": ["dataType", "_timestamp", "browser"],
    "mappings": {"old_key": "new_key"},
    "dataType": "govuk_visitors",
    "plugins": [
      # see performanceplatform-collector repo for all options
      "Comment('department computed from filtersets by collect-content-dashboard-table.py')",
      "ComputeIdFrom('_timestamp', 'timeSpan', 'dataType', 'department', )"
    ]
  },
  "query": {
    # This is required and must be the valid report id.
    "report_id": "def"
  },
  "token": "webtrends"
}

Testing locally:

Copy the credentials to the location of the current dummy credentials file.
Copy the token to the location of the dummy token file.
Change performanceplatform.json to reflect the real data endpoint locally (generally http://localhost:3039/data/).
In the vm go to /var/apps/pp-puppet/development.
bowl stagecraft
bowl backdrop_read
Create a data set with the token, group and type on your local stagecraft.
From the vm run:

venv/bin/python /var/apps/performanceplatform-collector/venv/bin/pp-collector \
-q /var/apps/performanceplatform-collector-config/queries/<path_to_query_file> \
-c /var/apps/performanceplatform-collector-config/credentials/<path_to_credentials_json> \
-t /var/apps/performanceplatform-collector-config/tokens/<path_to_token_json> \
-b /var/apps/performanceplatform-collector-config/performanceplatform.json \
--console-logging \
--start=2014-10-29 --end=2014-10-30

To check the result from the vm:

curl http://localhost:3038/data/<data_group>/<data_type>.

uk-gov-mirror / alphagov.performanceplatform-collector-config Goto Github PK

alphagov.performanceplatform-collector-config's Introduction

Creating a new query

Query Reference

performanceplatform.collector.ga

performanceplatform.collector.ga.trending

performanceplatform.collector.ga.realtime

performanceplatform.collector.pingdom

performanceplatform.collector.webtrends.keymetrics

performanceplatform.collector.webtrends.reports

Testing locally:

alphagov.performanceplatform-collector-config's People

Contributors

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent