Coder Social home page Coder Social logo

uk-gov-mirror / alphagov.performanceplatform-collector-config Goto Github PK

View Code? Open in Web Editor NEW

This project forked from alphagov/performanceplatform-collector-config

0.0 0.0 0.0 2.36 MB

Configuration for our instance of the performanceplatform-collector

License: MIT License

Shell 100.00%

alphagov.performanceplatform-collector-config's Introduction

Code Health

This repository holds all of the configuration for our instance of performanceplatform-collector.

Creating a new query

Add a new config file in /queries.

Query Reference

performanceplatform.collector.ga

performanceplatform.collector.ga.trending

performanceplatform.collector.ga.realtime

performanceplatform.collector.pingdom

performanceplatform.collector.webtrends.keymetrics

Valid options:

  • dataType
  • additionalfields
  • mappings
  • idMapping
  • plugins

After building up the basic data like so:

{
  dataType: "the dataType in options if present,
             the data-type in data-set if not",
  _timestamp: "the period start datetime",
  ...
  and any 'measures' from the data...
}

The options will be applied in order:

  1. additionalfields will be added.
  2. mappings will change keys at this point.
  3. idMappings will generate a human and encoded id from the values of the keys in the given array.
  4. plugins will run against the data at this point.

Notes:

You need to find out what keymetrics is calling unique visitors and then add this to mapping. In the following example "Visits" is used but this may be incorrect.

{
  "data-set": {
    "data-group": "national-appointments-scheme",
    # Currently we don't pass the schema of realtime because they are returning
    # floats instead of ints.
    # This means we need to use a type other than realtime.
    "data-type": "keymetrics"
  },
  "entrypoint": "performanceplatform.collector.webtrends.keymetrics",
  "options": {
    "additionalFields": {
      "stage": "whoop",
      # This is required to pass the realtime schema.
      # It may not be necessary if we don't use this.
      "for_url": "boop"
    },
    "idMapping": ["dataType", "_timestamp", "stage"],
    # We need to specify all of these as something in backdrop is finding the original
    # field names invalid.
    # This may just be a realtime schema problem again though.
    "mappings": {
     "Avg. Time on Site": "avg_time_on_site",
     "Page Views": "page_views",
     "Bounce Rate": "page_views",
     "Visits": "unique_visitors",
     "New Visitors": "new_visitors",
     "Page Views per Visit": "page_views_per_visit",
     "Avg. Visitors per Day": "avg_visitors_per_day"
    }
  },
  # Nothing is needed for query as all key metrics should be identical.
  "query": {
  },
  "token": "webtrends"
}

performanceplatform.collector.webtrends.reports

Valid options:

  • row_type_name
  • dataType
  • additionalfields
  • mappings
  • idMapping
  • plugins

After building up the basic data like so:

{
  <row_type_name_value>: "row dimension data",
  dataType: "the dataType in options if present,
             the data-type in data-set if not",
  _timestamp: "the period start datetime",
  ...
  and any 'measures' from the data...
}

The options will be applied in order:

  1. additionalfields will be added.
  2. mappings will change keys at this point.
  3. idMappings will generate a human and encoded id from the values of the keys in the given array.
  4. plugins will run against the data at this point.

Notes:

Version

The default API version used is version 2 and the collector will work with no further config for anything with 'v2_0' in the reports or keymetrics url in the credentials file. For things with 'v3' in these urls you must add "api_version": "v3" to the credentials file in order for the version 3 urls to work.

Measures

Due to the nature of reports - the fields are defined by the service pushing data - you must inspect the response to find the measures available and choose from these. e.g. for:

{
  "data":{
    "10/29/2014-10/29/2014":{
      "measures":{
        "Visits":17.0
      },
      "SubRows":{
        ...
      }
    }
  }
}

In this case the measure available is "Visits". All measures are collected by default under their original title. If you need to change any of these then use the mappings option.

row_type_name

"data": {
  "10/14/2014-10/15/2014": {
    "SubRows": {
      "Mozilla": {
        "Attributes": null,
        "SubRows": null,
        "measures": {
          "Visits": 1.0
        }
      },
      "Google Chrome": {
        "Attributes": null,
        "SubRows": null,
        "measures": {
          "Visits": 18.0
        }
      }
    },
    "measures": {
      "Visits": 19.0
    }
  }
}

In the above data part of the response you can see that measures are grouped by browser name. "row_type_name" is a non optional argument which will tell the collector what call the data in this key for each row. In this case row type name is going to be "browser". This will result in data like the following:

{
    "Visits": 5.0,
    ...
    "browser": "Mozilla",
}

Doing completion

If the start and end stages of the service are in separate reports that you can specify two collectors collecting into a single data set and use the additionalfields parameter to distinguish one as start and one as end.

Report id

The reports config needs a report_id specified in query as below. This is unique to a service and report.

{
  "data-set": {
    "data-group": "national-appointments-scheme",
    "data-type": "browsers"
  },
  "entrypoint": "performanceplatform.collector.webtrends.reports",
  "options": {
    "additionalFields": {
      "stage": "start"
    },
    "row_type_name": "browser",
    "idMapping": ["dataType", "_timestamp", "browser"],
    "mappings": {"old_key": "new_key"},
    "dataType": "govuk_visitors",
    "plugins": [
      # see performanceplatform-collector repo for all options
      "Comment('department computed from filtersets by collect-content-dashboard-table.py')",
      "ComputeIdFrom('_timestamp', 'timeSpan', 'dataType', 'department', )"
    ]
  },
  "query": {
    # This is required and must be the valid report id.
    "report_id": "def"
  },
  "token": "webtrends"
}

Testing locally:

  • Copy the credentials to the location of the current dummy credentials file.
  • Copy the token to the location of the dummy token file.
  • Change performanceplatform.json to reflect the real data endpoint locally (generally http://localhost:3039/data/).
  • In the vm go to /var/apps/pp-puppet/development.
  • bowl stagecraft
  • bowl backdrop_read
  • Create a data set with the token, group and type on your local stagecraft.
  • From the vm run:
venv/bin/python /var/apps/performanceplatform-collector/venv/bin/pp-collector \
-q /var/apps/performanceplatform-collector-config/queries/<path_to_query_file> \
-c /var/apps/performanceplatform-collector-config/credentials/<path_to_credentials_json> \
-t /var/apps/performanceplatform-collector-config/tokens/<path_to_token_json> \
-b /var/apps/performanceplatform-collector-config/performanceplatform.json \
--console-logging \
--start=2014-10-29 --end=2014-10-30
  • To check the result from the vm:
curl http://localhost:3038/data/<data_group>/<data_type>.

alphagov.performanceplatform-collector-config's People

Contributors

alexmuller avatar annapowellsmith avatar cliffshepp avatar easternbloc avatar edds avatar fawkesley avatar jabley avatar jcbashdown avatar jonnywyatt avatar leelongmore avatar leenagupte avatar lennym avatar mattrco avatar pwaller avatar robyoung avatar roc avatar shotclog avatar theotherurmy avatar timmow avatar tombooth avatar zilnhoj avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.