newrelic / newrelic-pcf-nozzle-tile

New Relic Firehose Nozzle for Pivotal Cloud Foundry

License: Apache License 2.0

Go 98.68% Shell 0.29% Makefile 1.03%
nrlabs nrlabs-data nrlabs-odp observability-data pcf tanzu

newrelic-pcf-nozzle-tile's Introduction


New Relic VMware Tanzu Nozzle Tile

The New Relic VMware Tanzu (PCF) Nozzle Tile is a Firehose nozzle that forwards metrics from the VMware Tanzu Loggregator to New Relic for visualization. The code can either be pushed as a regular VMware Tanzu application with cf push or installed in Ops Manager using the tile version. The tile is also available on the Pivotal Network, alongside documentation describing how to configure and use the nozzle. See our documentation for more details.

Compatibility

The New Relic VMware Tanzu Nozzle Tile is compatible with VMware Tanzu 2.4 and higher.

Changes From V1 to V2

The V2 release includes several additional features as well as breaking changes. Deployment configurations, alerts, and dashboards might require updates. Full details are available in the V2 notes.

Main updates

  • Reverse Log Proxy Gateway and V2 Envelope Format
  • Event type adjustments
  • Event attribute modifications
  • Event aggregation - metric type events
  • Multi-account event routing
  • Caching and rate limiting - VMware Tanzu API calls
  • Configuration variable changes
  • Log message filters
  • Metric type filters removed
  • Graceful shutdown

Application build and deploy

The application is prebuilt and can be pushed as an application or imported to Ops Manager. If you make any changes to the code, you can rebuild both the binary and the tile.

Build the binary

  1. Get dep to manage dependencies:
$ go get -u github.com/golang/dep/cmd/dep
  2. Generate the nr-fh-nozzle binary inside ./dist:
$ make build-linux

You can then deploy the application with cf push using the newly generated files, as described in the push as an application section.

Generate the tile

  1. Install bosh-cli and tile-generator.
  2. Generate the tile under ./product.
$ make release

You can use the generated tile right away and import it to Ops Manager.

Testing

Requirements

  • Access to a VMware Tanzu environment
  • VMware Tanzu API credentials with admin rights
  • VMware Tanzu UAA authorized client

Setup

  1. Set your environment variables as per the manifest.yml.sample file.
  2. Run go run main.go

To run tests and compile locally:
$ make build

Generate UAAC Client

You can create a new doppler.firehose enabled client instead of retrieving the default client:

$ uaac target https://uaa.[your cf system domain]
$ uaac token client get admin -s [your admin-secret]
$ uaac client add firehose-to-newrelic \
    --name firehose-to-newrelic \
    --secret [your_client_secret] \
    --authorized_grant_types client_credentials,refresh_token \
    --authorities doppler.firehose,cloud_controller.admin_read_only \
    --scope doppler.firehose

  • firehose-to-newrelic: your NRF_CF_CLIENT_ID env variable.
  • --secret: your NRF_CF_CLIENT_SECRET env variable.

Push as an application

When you push the nozzle as a regular application, you must edit manifest.yml first.

  1. Download the manifest.yml.sample file and the release from the repo.
  2. Unzip the release, rename manifest.yml.sample to manifest.yml and place the file in the dist directory.
  3. Modify the manifest file to match your environment.
  4. Deploy:
cf push -f <manifest file>

Make sure to assign proper values to all required environment variables. Any property values within angle brackets need to be changed to the correct value for your environment.

When you're pushing the nozzle as an app, the product and release folders are not required. Make sure to remove both folders to reduce the size of the upload, or use the .cfignore file.

Import as a tile in Ops Manager

Import the tile from releases to Ops Manager. Once imported, install the tile and follow the steps detailed in the Pivotal Partner Docs.

Import dashboard

A VMware Tanzu dashboard can be imported manually into New Relic dashboards using the Dashboard API. Follow this documentation for details on where to obtain the Admin user API key and how to use the API explorer.

  1. Go to the API Explorer.
  2. Use your Admin user API key.
  3. Copy the content of dashboard.json and paste it into the dashboard parameter of the request.
  4. Send the request.
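The steps above can be sketched in Go. This is a sketch only: the endpoint URL and X-Api-Key header follow the public legacy v2 Dashboard API, and the function name is hypothetical, not part of this repository.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// buildDashboardRequest prepares a POST to the legacy New Relic v2
// Dashboard API. The URL and X-Api-Key header follow the public v2 API
// docs (an assumption, not code from this repo); dashboardJSON should be
// the content of dashboard.json wrapped under the "dashboard" key, as the
// API explorer expects.
func buildDashboardRequest(apiKey string, dashboardJSON []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost,
		"https://api.newrelic.com/v2/dashboards.json",
		bytes.NewReader(dashboardJSON))
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-Api-Key", apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := buildDashboardRequest("YOUR_ADMIN_API_KEY", []byte(`{"dashboard": {}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
}
```

Sending the request with http.DefaultClient.Do(req) completes the import.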

Support

Should you need assistance with New Relic products, several diagnostic tools and support channels are available.

If the issue has been confirmed as a bug or is a feature request, file a GitHub issue.

Support Channels

Privacy

At New Relic we take your privacy and the security of your information seriously, and are committed to protecting your information. We must emphasize the importance of not sharing personal data in public forums, and ask all users to scrub logs and diagnostic information for sensitive information, whether personal, proprietary, or otherwise.

We define “Personal Data” as any information relating to an identified or identifiable individual, including, for example, your name, phone number, post code or zip code, Device ID, IP address, and email address.

For more information, review New Relic’s General Data Privacy Notice.

Contribute

We encourage your contributions to improve this project! Keep in mind that when you submit your pull request, you'll need to sign the CLA via the click-through using CLA-Assistant. You only have to sign the CLA one time per project.

If you have any questions, or to execute our corporate CLA (which is required if your contribution is on behalf of a company), drop us an email at [email protected].

A note about vulnerabilities

As noted in our security policy, New Relic is committed to the privacy and security of our customers and their data. We believe that providing coordinated disclosure by security researchers and engaging with the security community are important means to achieve our security goals.

If you believe you have found a security vulnerability in this project or any of New Relic's products or websites, we welcome and greatly appreciate you reporting it to New Relic through our bug bounty program.

If you would like to contribute to this project, review these guidelines.

To all contributors, we thank you! Without your contribution, this project would not be what it is today.

License

The project is released under version 2.0 of the Apache License.

newrelic-pcf-nozzle-tile's People

Contributors

ardias, bpecknr, davidgit, gsanchezgavier, miransar, mlong-nr, paologallinaharbur, shahramk


newrelic-pcf-nozzle-tile's Issues

Expose Cache

Is your feature request related to a problem? Please describe.

A customer complained about too many requests to the API. It may be related to the cache timeout being statically set to 30 minutes.

Feature Description

We have added a configuration to change the cache timeout/TTL, but we have not exposed it. We should expose this configuration so customers can adapt the timeout to their environment.

Also, there is an optimization that can be done when clearing the cache: instead of clearing everything, we could keep a lastUpdated timestamp per entry and only clear entries older than a certain threshold (5 minutes, or a percentage of the configured cache_timeout value; for example, if cache_timeout is 1 hour, clear entries older than 50% of that value, i.e. 30 minutes).


Additional context

Notes from the original Jira:
@gsanchezgavier "I suggest to clear from cache apps that hasn’t be used (consulted from the cache) in the last period of time defined by cache_timeout. "
@ardias "Brian also suggested removing from the cache all the applications that return 404 when updating."

Priority

Nice to Have

Misconfiguration in Insights authentication log flood

If authentication fails, multiple logs are created, one for each sample, and no mechanism is in place to reduce the logging rate as the posts fail.

xxxxxxxxxxx 1 Wed May 06 2020 14:10:24.702 time="2020-05-06T18:10:24Z" level=error msg="Failed to send insights events [1/3]. Will retry. Error: Insights Post: : bad response from Insights: 403 \n\t{}"

Expected behaviour:
The logs are still shown, but the interval between repeated error logs is reasonable.

Support for "logs in context" for BOSH Release VMs

Description

I am investigating ways to have log data pulled from this nozzle show up in the Logs UI view for a selected infrastructure host that is a BOSH-deployed VM hosting various jobs. I work at New Relic, so please reach out and I'm happy to share some screenshots.

Acceptance Criteria

Pass the 'bosh.hostname' of BOSH VMs/Hosts with logs.

Describe Alternatives

None known at this time. I am unable to find a way for customers to decorate firehose logs with their own attributes/tags. There may be another way to tackle this.

The goal is for the customer to be able to see firehose logs in the Host Logs UI when looking at a host related to a specific BOSH VM (i.e. VMs for diego, api, router, etc.) where the NR Infrastructure Agent is deployed.

Dependencies

Possibly the infrastructure team?


Behaviour change on VMware Tanzu RabbitMQ [VMs] 2.0.5-build.12

Originally reported here : https://newrelic.slack.com/archives/CP2EWSX7C/p1652918372937719

The customer runs TAS 2.12 and our PCF nozzle to ingest PCF metrics and logs.
The raw metric event on the PCF side of the integration has some rabbitmq metrics that come in as 'tags' example:
origin:"p.rabbitmq" eventType:ValueMetric timestamp:1651547607958710939 deployment:"service-instance_93ab5c31-e725-4e3b-9cd6-c2d17d411263" job:"rabbitmq-server" index:"28b0b60e-2cc8-40d2-af8e-aabf2e4fe5e2" ip:"10.223.80.54" tags:<key:"instance_id" value:"rabbit@28b0b60e-2cc8-40d2-af8e-aabf2e4fe5e2.rabbitmq-server.services.service-instance-93ab5c31-e725-4e3b-9cd6-c2d17d411263.bosh" > tags:<key:"queue" value:"betatest" > tags:<key:"source_id" value:"rabbit@localhost" > tags:<key:"vhost" value:"93ab5c31-e725-4e3b-9cd6-c2d17d411263" > valueMetric:<name:"and" value:0 unit:"" >

On the New Relic side, the PCF metric event that is generated does not have this tag data (tags:<key:"queue" value:"betatest">).
The New Relic translation just has valueMetric:<name:"rabbitmq_queue_messages_unacked" value:0.
The user wants to see the key queue and value betatest, i.e. they want the tags that decorate the PCF metric.

This behavior changed when the user upgraded from VMware Tanzu RabbitMQ [VMs] 1.21.7-build.12 to VMware Tanzu RabbitMQ [VMs] 2.0.5-build.12.

Dashboard code is outdated

The JSON will not import into the new platform interface.

Description

When trying to copy and paste the JSON code for the dashboard, it is not compatible with the new platform requirements.

Expected Behavior

Importing the JSON code should work.

Scheduled cache purging based on LastPull attribute is not executed.

Description

The switch case statement in cfclient/cfapps/cache.go that checks the LastPull time for each application in the cache to remove old entries does not get executed. Only UpdateInstances and adding new applications to the cache ever get executed.

This can cause:

  • If an application in the cache no longer exists, the nozzle will continue to attempt to update instance details for that application and trigger HTTP 404 errors.
  • If the rate limiter causes the initial FetchApp calls to timeout, an app.name attribute can get stuck with the value of WAITING FOR DATA. Since FetchApp is only called for new applications and the cached applications are never removed, the WAITING FOR DATA app.name would never be updated.
  • If VCAP details for an existing application are updated to enable multi-account event routing (RPM account ID / Insights insert key), those changes would not take effect until the nozzle is restarted.


Steps to Reproduce

A logging statement can be added to the case statement for time.NewTicker(cacheDuration * time.Minute).C to show that the logic isn't executed.

Expected Behavior

The scheduled cache purge should be corrected so that cached application data is only used for ~30 minutes and then FetchApp will be called to get updated application data when a ContainerMetric or LogMessage envelope type is received for that application GUID again.

PCFCapacity continues to report data for diego cells that no longer exist

Description

PCFCapacity events continue to include data for diego cells that no longer exist.

Steps to Reproduce

  1. Start the nozzle and allow data to start reporting.
  2. Scale down the number of diego cells.
  3. Confirm that PCFCapacity events still contain details on the diego cells that have been removed.

Expected Behavior

PCFCapacity events should no longer report data for diego cells that have been removed.

Additional context

Impact is limited to PCFCapacity events. PCFContainerMetric, PCFCounterEvent, PCFHttpStartStop, PCFLogMessage, and PCFValueMetric are not impacted.

Tile no longer sets NR_CF_API_Password

The tile no longer sets the CF UAA admin password, which causes issues if you try to reset the UAA password.

Description

We recently reset the PCF UAA admin password, but the NR firehose app is still referencing the old UAA admin password. This used to be configurable in the tile. It is causing an issue because the app requires the password at startup.

A support ticket has been opened with the NR team.

Custom Attributes are not being sent via Firehose

SUPPORT TICKET ID: 500Ph000007dNFF

We are having issues with data consistency with our New Relic firehose.

It seems like the firehose is not consistent in sending custom attributes for events.

Example event from the NR UI (values masked here for security, but assume each one has a value):

{
  "appId": ,
  "appName": "",
  "containerId": "",
  "daily_crawls":,
  "daily_errors": ,
  "date": "",
  "entityGuid": "",
  "host": "worker",
  "organization_id": "",
  "priority":,
  "products":,
  "realAgentId":,
  "retailer": "",
  "tags.account": "",
  "tags.accountId": "",
  "tags.trustedAccountId": "",
  "timestamp":,
  "weekly_crawls": ,
  "weekly_errors": 
},

Example event JSON from the firehose:

{
    "accountId":,
    "appName": "",
    "applicationId":,
    "containerId": "",
    "entityGuid": "",
    "eventType": "",
    "host": "",
    "nrNoWrite:customAttributes": 8,
    "organization_id": "",
    "priority": ,
    "realAgentId": ,
    "retailer": "",
    "tags.account": "",
    "tags.accountId": "",
    "tags.trustedAccountId": "",
    "timestamp": 
}

What is the nrNoWrite:customAttributes value? Is it masking our custom attributes like weekly_errors?

Our understanding is that we are using a legacy NR firehose connection, and we think this might be the cause of the issue.

Is there documentation for this firehose, or potentially a more up-to-date code path we could use?

Including spaces in nrf_enabled_envelope_types leads to dropped data types

Description

Including spaces in a comma or pipe separated value for nrf_enabled_envelope_types causes those envelope types to not be sent to the New Relic platform. Ideally we should make this case insensitive as well.

Expected Behavior

Spaces in the configuration value, and entries that don't match the expected case, should be accepted without impacting customers.

Steps to Reproduce

Set nrf_enabled_envelope_types to ContainerMetric, CounterEvent, ValueMetric and only ContainerMetric will be reported.
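A sketch of the tolerant parsing the issue asks for, assuming a hypothetical helper (not the nozzle's actual code): split on either separator, trim whitespace around each entry, and compare case-insensitively.

```go
package main

import (
	"fmt"
	"strings"
)

// parseEnvelopeTypes splits a comma- or pipe-separated config value,
// trimming whitespace and lowercasing each entry, so values like
// "ContainerMetric, CounterEvent | valuemetric" all match.
func parseEnvelopeTypes(raw string) map[string]bool {
	enabled := map[string]bool{}
	for _, field := range strings.FieldsFunc(raw, func(r rune) bool {
		return r == ',' || r == '|'
	}) {
		name := strings.ToLower(strings.TrimSpace(field))
		if name != "" {
			enabled[name] = true
		}
	}
	return enabled
}

func main() {
	types := parseEnvelopeTypes("ContainerMetric, CounterEvent | valuemetric")
	fmt.Println(types["valuemetric"]) // true
}
```

Lookups against this map would also be done with strings.ToLower on the envelope type name, making the whole comparison case-insensitive.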

The nozzle v1.1.20 is calling /v2/apps with deprecated parameter inline-relations-depth

  1. The customer deployed the New Relic nozzle tile downloaded from Pivotal (https://network.pivotal.io/products/nr-firehose-nozzle/).

  2. The installed nozzle apps poll CAPI once every minute:
    NOZZLE_APP_DETAIL_INTERVAL: 1
    The requests look like:
    "GET /v2/apps?inline-relations-depth=2&order-direction=asc&page=2&results-per-page=50

  3. The parameter inline-relations-depth has been deprecated for a while. The use of this parameter makes the API call more expensive, as explained by the CAPI team:

inline-relations-depth queries have been deprecated for a very long time due to performance issues like this. In particular, each time we add new features or database models, inline-relations-depth queries will now have to join (with or without an index!) those new models into each query.

so as our database model grows more complex, those queries get more and more expensive.

Please consider removing the use of this parameter.

Wayne
Pivotal Customer Support

Correct EU API endpoint configuration. Allow separate custom configuration of both the Logs and Event API endpoints.

Description

The default EU configuration (UI) does not send event data to EU New Relic accounts due to a misconfiguration in newrelic-client-go.

This can be fixed by setting a "Custom Insights URL", but this also overrides the Logging API endpoint, causing logging data to not be sent.
https://github.com/newrelic/newrelic-pcf-nozzle-tile/blob/master/newrelic/nrclients/nrclients.go#L192-L193

Expected Behavior

When selecting EU as the account region, the data should be sent to the EU.
When a Custom Insights URL is chosen, it should configure only the Event endpoint URL, not both the Event and Logs endpoints.

Additional context

I have opened a ticket to fix the default value for the New Relic client API endpoint:
newrelic/newrelic-client-go#663
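The separated configuration requested above could be resolved along these lines. The default URLs are the publicly documented US/EU Event and Log API endpoints; the type and function names are hypothetical, not from nrclients.go.

```go
package main

import "fmt"

// Endpoints holds independently configurable Event and Log API URLs, so a
// custom events URL no longer clobbers the logs endpoint.
type Endpoints struct {
	EventsURL string
	LogsURL   string
}

// resolveEndpoints picks region defaults, then applies each custom URL
// only to its own endpoint.
func resolveEndpoints(region, customEventsURL, customLogsURL string) Endpoints {
	e := Endpoints{
		EventsURL: "https://insights-collector.newrelic.com",
		LogsURL:   "https://log-api.newrelic.com/log/v1",
	}
	if region == "EU" {
		e.EventsURL = "https://insights-collector.eu01.nr-data.net"
		e.LogsURL = "https://log-api.eu.newrelic.com/log/v1"
	}
	if customEventsURL != "" {
		e.EventsURL = customEventsURL
	}
	if customLogsURL != "" {
		e.LogsURL = customLogsURL
	}
	return e
}

func main() {
	e := resolveEndpoints("EU", "https://example.test/events", "")
	fmt.Println(e.EventsURL) // the custom events URL
	fmt.Println(e.LogsURL)   // the EU logs endpoint, untouched
}
```

Two independent overrides keep the fix backward compatible: customers who only set a custom Insights URL today would see logs keep flowing to the correct regional endpoint.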

NR-App logging issue

After deploying the NR firehose we noticed that our PCF Metrics app "metrics-ingestor" started crashing with "Out of memory" errors. Looking at the NR firehose application (newrelic-firehose-nozzle-1.1.22) logs, the reason seems pretty obvious: a lot of junk data ('//////') is being logged non-stop, which is causing the ingestor to crash. Below is the log message:

value:\"sys.cfsb-1.gaig.com\" > logMessage:<message:\"\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\" message_type:OUT timestamp:1571687791461137400 app_id:\\\"379c2de7-22ac-4b82-92e5-26ca45bdbeeb\\\" source_type:\\\"APP/PROC/WEB\\\" source_instance:\\\"1\\\" > \" message_type:OUT timestamp:1571687794535489769 app_id:\"379c2de7-22ac-4b82-92e5-26ca45bdbeeb\" source_type:\"APP/PROC/WEB\" source_instance:\"0\" > " message_type:OUT timestamp:1571687798423133705 app_id:"379c2de7-22ac-4b82-92e5-26ca45bdbeeb" source_type:"APP/PROC/WEB" source_instance:"2" > " message_type:OUT timestamp:1571687800953327629 app_id:"379c2de7-22ac-4b82-92e5-26ca45bdbeeb" source_type:"APP/PROC/WEB" source_instance:"0" >
2019-10-21T15:56:45.639-04:00 [APP/PROC/WEB/0] [OUT] \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

I wonder if anyone else has experienced a similar issue? Even though the release notes for v1.1.21 mention a fix, we are seeing similar behavior in both v1.1.21 and v1.1.22. Thoughts?

Monitoring of NewRelic firehose nozzle

How would you go about monitoring the New Relic firehose nozzle in the event that it crashes or hangs and does not restart? We have had a couple of instances where, when we lose networking, our monitoring doesn't work and we have to go in and restart the nozzle manually.
