The blink from DoSomething

Move all saved searches to the new Kibana

Production

Blink: 401, renamed to Blink: Debug => HTTP 401
Blink: Broadcast Incoming Receipt, renamed to Blink: Twilio SMS Broadcast => Status Callback Webhook
Blink: Broadcast Relay expected skip, renamed to Blink: Twilio SMS Broadcast => Expected Skip
Blink: Broadcast Relay Retries, renamed to Blink: Twilio SMS Broadcast => Relay Retry
Blink: Broadcast Relay Success, renamed to Blink: Twilio SMS Broadcast => Relay Success
Blink: c.io profile updates, renamed to Blink: Customer.io => Profile Update => Success
Blink: Campaign Signup Posts -> Customer.io, renamed to Blink: Customer.io => Track Campaign Signup Post => Success
Blink: Campaign Signups -> Customer.io, renamed to Blink: Customer.io => Track Campaign Signup => Success
Blink: Crashes, renamed to Blink: Debug => Crashes
Blink: Customer.io -> Blink, renamed to Blink: Customer.io => Email Activity Webhook
Blink: Customer.io errors, renamed to Blink: Customer.io => All Errors
Blink: Dropped messages, renamed to Blink: Debug => Retry Manager => Limit Reached, Reject Message
Blink: Errors, renamed to Blink: Debug => All Errors
Blink: Gambit Retries, renamed to Blink: MobileCommons => Gambit Campaigns => Retry
Blink: Gambit successful proxy requests, renamed to Blink: MobileCommons => Gambit Campaigns => Success
~~Blink: Missed cio Backfill updates~~
Blink: MoCo message data, renamed to Blink: MobileCommons => Message Data Webhook
Blink: Production, renamed to Blink: Debug => All Log Messages
Blink: Validation errors, renamed to Blink: Debug => Validation Errors
Blink: Warnings, renamed to Blink: Debug => All Warnings

Staging

These could potentially be moved by exporting the production searches:

Blink: Staging
Blink: Staging Broadcast Relay Retries
Blink: Staging Broadcast Relay statuses expected to be skipped (non 'delivered').
Blink: Staging Broadcast Relay Success
Blink: Staging Cio Customer Update
Blink: Staging Incoming Broadcast Reciept
Blink: Staging Warnings

Return 202 Accepted instead of 200 OK

202 makes much more sense:

The request has been accepted for processing, but the processing has not been completed. The request might or might not eventually be acted upon, as it might be disallowed when processing actually takes place.
https://httpstatuses.com/202

Refactor RabbitManagement to use isomorphic-fetch instead of axios for consistency with gateway-js

To stay in line with the "Simplify the toolset" principle, use the fetch HTTP client library.
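A minimal sketch of a fetch-based RabbitManagement call, assuming the standard RabbitMQ management HTTP API (`GET /api/queues/{vhost}/{name}` with basic auth). The class shape and the `getQueueInfo` name are hypothetical, not Blink's code. The fetch function is passed in through the constructor, so the client does not depend on a global `fetch` and can be stubbed in tests.

```javascript
'use strict';

// Hypothetical sketch, not Blink's actual RabbitManagement class.
// The HTTP client is injected: any WHATWG-fetch-compatible function works.
class RabbitManagement {
  constructor({ fetch, baseUrl, username, password }) {
    this.fetch = fetch;
    this.baseUrl = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
    const token = Buffer.from(`${username}:${password}`).toString('base64');
    this.headers = { Authorization: `Basic ${token}` };
  }

  // RabbitMQ management API: GET /api/queues/{vhost}/{name}.
  // The default vhost "/" must be URL-encoded as %2F.
  async getQueueInfo(vhost, queueName) {
    const url = `${this.baseUrl}/api/queues/${encodeURIComponent(vhost)}/${encodeURIComponent(queueName)}`;
    const response = await this.fetch(url, { headers: this.headers });
    if (!response.ok) {
      throw new Error(`RabbitMQ management API responded with ${response.status}`);
    }
    return response.json();
  }
}

module.exports = RabbitManagement;
```

In the app, the constructor could receive `require('node-fetch')` (v2) or Node's built-in `fetch`; tests pass a stub instead.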

Feature: MoCo webhooks

TBD

Only post inbound Mobile Commons messages to Conversations API

Refs #102, https://dosomething.slack.com/archives/C2C8NLNAY/p1503003860000137

The Conversations API could be responsible for determining whether a /receive-message payload is inbound or outbound, but ideally the gambit-message-data-relay should only make a Conversations API request for a Twilio message when the From number is not equal to our Twilio number (for staging, it's +14069981855).
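A sketch of that check, assuming the relay sees Twilio's standard `From` request parameter. The `isInboundMessage` name and passing our number in as a parameter are assumptions; only the staging number comes from this issue.

```javascript
'use strict';

// Hypothetical helper: a Twilio message is inbound (and should be relayed to
// the Conversations API) only when it was NOT sent from our own number.
// The staging number comes from the issue; production would pass its own.
const STAGING_TWILIO_NUMBER = '+14069981855';

function isInboundMessage(payload, ourNumber = STAGING_TWILIO_NUMBER) {
  return Boolean(payload && payload.From) && payload.From !== ourNumber;
}

module.exports = { isInboundMessage };
```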

Ensure correct ISO to Unix time conversion when sending signups to Customer.io

See #114.

One last concern: we need to double-check that the ISO date to Unix timestamp transformation accounts for the event timezone.

Rogue sends the created_at field as a GMT ISO date, for example, 2017-09-14T18:27:21+00:00. This is transformed to Unix time as 1505413641, which resolves to 09/14/2017 @ 6:27pm (UTC), so the conversion appears to be correct.

However, the c.io documentation is vague about which timezone is used when segments are built. The documentation is also down at the moment, so I'll address this issue in a separate pull request.
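A minimal sketch of the conversion using only `Date.parse`, which honors the UTC offset in the ISO string; the helper name is hypothetical. Because a Unix timestamp identifies an instant independent of any timezone, the open question is only how Customer.io applies timezones when building segments.

```javascript
'use strict';

// Convert an ISO 8601 timestamp (e.g. Rogue's created_at) to Unix time in
// whole seconds. The offset in the string is honored, so the result does not
// depend on the server's local timezone.
function isoToUnixSeconds(isoString) {
  const ms = Date.parse(isoString);
  if (Number.isNaN(ms)) {
    throw new RangeError(`Invalid ISO 8601 date: ${isoString}`);
  }
  return Math.floor(ms / 1000);
}
```

For example, the same instant written as `2017-09-14T14:27:21-04:00` also converts to `1505413641`.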

MoCo webhook expects 200, not 202

cc @aaronschachter

Twilio webhook to Conversations POST /receive-message

As discussed at today's ODev jam, we need two webhooks (one for Staging, another for Production) to set as the inbound request URL of our Twilio messaging services.

cc @rapala61

Chore: consider non-isomorphic fetch

isomorphic-fetch registers fetch as a global, and globals are always a bad idea.

Unexpected validation error in v1.5.0 release

at=warning application=blink env=production code=error_validation_failed request_id=370f9f55-5ded-486e-bcf7-b4f9d9b0d8a5 method=POST host=blink.dosomething.org path=/api/v1/events/user-create fwd=52.1.10.35 protocol=https MessageValidationBlinkError: child "addr_city" fails because ["addr_city" must be a string]

cc @DFurnes

May be related to #129 or #133.

Accept application/json requests in the /customerio-sms-broadcast webhook

Currently, the /customerio-sms-broadcast webhook expects application/x-www-form-urlencoded data from the Customer.io webhook. There is a blocked PR in Convo API that generates these settings in application/json format. Blink validations must be updated to process the new settings and unblock the PR that introduces them in Convo API.
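A framework-agnostic sketch of accepting both content types, assuming Blink has access to the raw body and the `Content-Type` header; `parseWebhookBody` is a hypothetical name. In an Express app the same effect usually comes from registering both `express.json()` and `express.urlencoded()` before validation.

```javascript
'use strict';

// Hypothetical helper: normalize a webhook body that may arrive either as
// application/x-www-form-urlencoded (current Customer.io webhook) or as
// application/json (settings generated by the Convo API PR).
function parseWebhookBody(contentType, rawBody) {
  // Ignore parameters such as "; charset=utf-8".
  const mediaType = String(contentType || '').split(';')[0].trim().toLowerCase();
  if (mediaType === 'application/json') {
    return JSON.parse(rawBody);
  }
  if (mediaType === 'application/x-www-form-urlencoded') {
    return Object.fromEntries(new URLSearchParams(rawBody));
  }
  throw new Error(`Unsupported content type: ${contentType}`);
}
```

Form-encoded values always arrive as strings, while JSON can carry numbers, booleans, and nested objects, so the validation schemas may need to accept both shapes.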

Validate inbound Twilio SMS messages

Discuss which fields are required in GamCon /receive-message.

cc @rapala61 @aaronschachter

Investigate code=H19 desc="Backend connection timeout"

Example request id c736c782-7cec-418f-8047-45b11d238ba1

at=error code=H19 desc="Backend connection timeout"

Feature: message log database

Note to self per conversation with Matt.

Implement catch-all queue
Implement catch-all queue -> Mongo worker
Possibly dump different message types to different Mongo collections
Wrap messages with metadata, such as date and request ID, to make them indexable and searchable
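The wrapping step might look like this minimal sketch. The field names (`request_id`, `received_at`) and the one-collection-per-type naming are hypothetical choices for illustration, not a decided design.

```javascript
'use strict';

// Hypothetical sketch: wrap a raw queue message with metadata before a
// catch-all worker stores it in Mongo. Field and collection names are
// illustrative assumptions.
function wrapForMessageLog(message, { requestId, type, receivedAt = new Date() }) {
  return {
    // One collection per message type, e.g. "messages_signup".
    collection: `messages_${type}`,
    document: {
      request_id: requestId, // matches a stored message to its log lines
      type,
      received_at: receivedAt.toISOString(),
      payload: message,
    },
  };
}
```

Indexing `request_id` and `received_at` in Mongo (for example with the driver's `createIndex`) would then make the log searchable by request and by time.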

Create High level Blink Topology Diagram

User Story

As an O'Dev, I want to diagram how Blink works so DS staffers can understand how the app speaks to other apps.

Create generic webhook to send freeform JSON to quasar queue

Steps

Add webhook to Blink
Make it post to quasar-customer-io-email-activity for now
Add gateway-php integration
Rename and repurpose queues as described in #99

cc @sheyd

Investigate gambit retry on Signup.postDraftReportbackSubmission error:API response is false

CC @aaronschachter @rapala61

This got retried 100 times, then rejected:

at=warning application=blink env=production code=error_gambit_proxy_response_not_200_retry worker=GambitChatbotMdataProxyWorker request_id=fe887469-9c3f-403b-a520-c66a388bd5bb response_status=500 response_status_text="Internal Server Error" {"error":{"code":500,"message":"Signup.postDraftReportbackSubmission error:API response is false."}}

Gambit is sending two messages to a member when they sign up through a Gambit keyword

Steps to reproduce

Member sends keyword message to GamCam
User gets text message based on gambitSignupMenu template
GamCam creates new signup on Rogue directly
Rogue sends the signup to Blink
Blink sends the signup to GamCon /send-message
GamCon thinks this signup is new and sends another message to the member, this time based on externalSignupMenu template

Desired behavior

Blink should not send Rogue signups whose source matches 'sms*' to GamCon.
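A minimal sketch of that filter, assuming the Rogue signup payload has a `source` field and reading `'sms*'` as "starts with `sms`". The function name is hypothetical.

```javascript
'use strict';

// Hypothetical filter: skip relaying a Rogue signup to Gambit Conversations
// when it originated from SMS, because Gambit has already messaged the member.
// 'sms*' is read as "source starts with 'sms'". A missing source is relayed
// (an assumption).
function shouldRelaySignupToGambit(signup) {
  const source = String((signup && signup.source) || '');
  return !source.startsWith('sms');
}
```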

Rename `mobilecommons_status` to `mobile_status`

@DFurnes renamed the mobilecommons_status field in Northstar's Blink user object transformer: DoSomething/northstar#575

Need to make the corresponding change in Blink.

Recommendation: tune Node.js params on Heroku

Example: https://github.com/DoSomething/gambit/pull/920

Rafa Pacas [< 1 minute ago]
The new flags brought MAX SWAP down to 1-2 MB from ~13 MB, so V8 memory management seems to be better tuned.

Adjust phone-related user profile fields

Per DoSomething/northstar#631 and DoSomething/northstar#570.

Workers crash when they can't fit all messages in the queue into memory

Blink wasn't designed to support queues with long backlogs.
Given the asynchronous nature of Node.js, workers try to consume all available messages from their queues. When a queue's backlog holds more messages than fit into the memory available to a worker, the worker crashes.

This is important for two reasons:

Longer backlogs are anticipated in Blink with the rate limiter in place
Backfill operations are expected, in most cases, to generate longer queue backlogs than normal relay operations. For example, the customer.io backfill in the screenshot above produced 70K messages in the queue, and customer-io-update-customer couldn't fit all of them into its memory.
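RabbitMQ's usual remedy is a consumer prefetch limit (in amqplib, `channel.prefetch(count)`), which keeps the backlog in the broker and caps how many unacknowledged messages a worker holds. The toy model below illustrates the same bound in plain JavaScript; `pull()` is a hypothetical stand-in for the broker, not Blink's worker API.

```javascript
'use strict';

// Toy model, not Blink's worker code: `pull()` stands in for the broker and
// resolves to the next message, or null when the queue is empty. The worker
// never holds more than `maxInFlight` messages; the rest stays in the queue.
async function consumeWithLimit(pull, handler, maxInFlight) {
  // Each lane asks for a new message only after finishing the previous one.
  async function lane() {
    for (let message = await pull(); message !== null; message = await pull()) {
      await handler(message);
    }
  }
  const lanes = Array.from({ length: maxInFlight }, () => lane());
  await Promise.all(lanes);
}
```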

Feature: Rogue webhooks

Create webhooks for signups and reportbacks, with basic data validation and no processing.

Revisit reporting quantity for Phoenix Next campaigns w/ disabled quantity field

Phoenix Next campaigns can have quantity field turned off.

Under the hood, Phoenix Next just sets the quantity on reportbacks for such campaigns to 1. This may be a problem, as we expect the quantity of a second reportback to include the quantity from the first one (per discussion with @sbsmith86).

Example:

Member collected 5 pairs of jeans
Reported back a photo and quantity 5
Collected 5 more
Reported back photo and quantity 10

When the quantity field is turned off, quantity is always reported as 1, even on the second reportback.

Example:

Member posted a shower songs screenshot
Signup post quantity automatically set to 1
Member posted another shower songs screenshot
Signup total quantity is still 1

Expected:

Signup post total quantity is 2

To discuss with: @justkika, @ashleybaldwin, @sbsmith86.

Add unit tests for unsubscribed field in customer.io user object

Ref #134

Blink Staging doesn't retry gambit mdatas

Aaron Schachter [17 hours ago]
yeah, i generally haven't seen retries on Blink Staging (https://blink-staging.dosomething.org/api/v1/webhooks/gambit-chatbot-mdata) when testing with the Thor mData

Aaron Schachter [17 hours ago]
i just texted a reportback emoji to test gambit pull #909, which triggered a 500 error, but not seeing retries -- and recall some time recently i wasn't seeing retries upon testing these bugs on thor

cc @aaronschachter

Blink should not require a `campaign_run_id` in validation for a Rogue post

When sending a new post from Rogue without a campaign_run_id, we were getting 422 errors from Blink:

at=warning application=blink env=staging code=error_validation_failed request_id=8f9bf989-bca3-411c-a072-5b43e48c1380 method=POST host=blink-staging.dosomething.org path=/api/v1/events/user-signup-post fwd=54.159.163.56 protocol=https MessageValidationBlinkError: child "campaign_run_id" fails because ["campaign_run_id" is required], message {"data":{"id":392751,"signup_id":8084848,"quantity":"2","why_participated":"why participated","campaign_id":"8888","campaign_run_id":"","northstar_id":"5589c9bb469c6475138b81f0","url":"https://rogue-thor.dosomething.org/images/392751","caption":"first caption","status":"pending","remote_addr":"24.90.89.121","source":"rogue-oauth","created_at":"2017-12-08T17:21:30+00:00","updated_at":"2017-12-08T17:21:30+00:00","deleted_at":null},"meta":{"request_id":"8f9bf989-bca3-411c-a072-5b43e48c1380","retryAttempt":0}}

Phoenix-Next does not have campaign_run_ids so we should not require this. Full Slack thread here.

We are "silently" dropping messages that don't recover within 100 retries (24hrs)

In the last 24 hours (now: 1/11/18 3:30pm EST), we have had 4 requests caught in a retry loop that went undetected, because our alert thresholds are calibrated for linear retry logic, not for exponential backoff.

They went undetected until the debug_retry_manager_limit_reached error was emitted, at which point a request is simply rejected and not re-queued.

Kibana log search that shows the requests

Blink isn't reporting to New Relic

After the Node.js update in #88, Blink isn't reporting to New Relic.
https://rpm.newrelic.com/accounts/108038/applications/41804031

Fix Internal Server Error on getting response text from a response that has no content

Error example:

response_status_text="Internal Server Error" {"message":"Cannot read property 'text' of undefined"}

Line responsible:

blink/src/workers/TwilioSmsInboundGambitRelayWorker.js

Line 82 in 5fd0001

const cleanedBody = (await response.text()).replace(/\n/g, '\\n');

This should be refactored to use try/catch.
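A minimal sketch of the suggested try/catch refactor; the helper name and the empty-string fallback are assumptions. When `response` is `undefined`, `response.text()` throws a `TypeError` inside the `try`, so the worker gets a safe value instead of failing with `Cannot read property 'text' of undefined`.

```javascript
'use strict';

// Hypothetical helper sketching the try/catch refactor of the line above.
// If `response` is undefined or its body cannot be read, return an empty
// string instead of throwing.
async function readCleanedBody(response) {
  try {
    const text = await response.text();
    // Escape newlines so the body stays on a single log line.
    return text.replace(/\n/g, '\\n');
  } catch (error) {
    // Fallback value is an assumption; the caller may prefer to log `error`.
    return '';
  }
}
```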

Relay MoCo message-level data to Gambit Conversations

Notes per discussion with @aaronschachter:

Relay Twilio -> MoCo -> Blink -> Gambit Convers
Talk to @rapala61 when the Gambit Convers staging endpoint is ready
Follow the same retry protocol as we have with Gambit Campaigns
Gambit Convers uses a different auth header

cc @justkika

Current message retry logic conflicts with the prefetch_count setting

RabbitMQ's prefetch_count setting limits the number of unacknowledged messages delivered to a worker at a given time. To release a consumed message and receive a new one, a worker must acknowledge the consumed message first.

This conflicts with the implementation of the "retry" logic. Messages scheduled for a retry are delayed using JavaScript's setTimeout(), and workers acknowledge them only after the timeout fires.

Thus, when a worker is waiting on at least as many retry messages as the prefetch_count setting allows, it receives no new messages.

Example: with a prefetch count of 3 and 5 messages scheduled for a retry, 32 new messages are not processed.

Repurpose quasar-customer-io-email-activity for general use by the data team

Per conversation with @sheyd:

Rename quasar-customer-io-email-activity to data-prime (or quasar-data-prime)
Use RabbitMQ routing in order to get message copies from other Blink queues
Create data-misc (or quasar-data-misc) for assorted data messages, testing, and retrying messages
Create webhook endpoint and gateway-php integration for data-misc queue

Users with no email created on cio

Example request id 41d44480-edc7-4fb5-98a4-19ebb075d03f

TypeError: this.shouldSkip is not a function

blink/src/workers/GambitMessageDataRelayWorker.js

Line 29 in d4dd05f

if (this.shouldSkip(message)) {

TypeError: this.shouldSkip is not a function

Investigate why Blink doesn't retry sending messages to Gambit on socket hang-up errors

at=warning application=blink code=message_processing_error queue=gambit-chatbot-mdata request_id=fa233496-db1f-4e3b-8b59-e2bfa1f56ed0 FetchError: request to http://ds-mdata-responder.herokuapp.com/v1/chatbot failed, reason: socket hang up

subscribed_at field isn't updated on c.io

No subscribed_at field is passed to c.io.

MoCo reports Connection failed. Errno::ETIMEDOUT CC: 2. E: ''

From time to time we're seeing the following on Mobile Commons mDatas:

Your Server's Response to Mobile Commons

Connection failed. Errno::ETIMEDOUT CC: 2. E: ''

Processing Time
29.999538898468 seconds

Allow null quantity values

Problem

When Rogue submits a post to Blink, we get validation errors if the quantity we send over is null.

We allow quantity to be null on posts, since we won't always be collecting a quantity when a user reports back on a campaign.

Fix

Update Blink's validation to accept a null quantity

Don't require override file when it's not present

Reproduce and fix the following error:

stack: 
   [ 'Error: ENOENT: no such file or directory, lstat \'/Users/rpacas/Desktop/repos/blink/config/env/override-development.js\'',
     '    at Object.fs.lstatSync (fs.js:960:11)',
     '    at Object.<anonymous> (/Users/rpacas/Desktop/repos/blink/config/index.js:16:18)',
     '    at Module._compile (module.js:571:32)',
     '    at Object.Module._extensions..js (module.js:580:10)',
     '    at Module.load (module.js:488:32)',
     '    at tryModuleLoad (module.js:447:12)',
     '    at Function.Module._load (module.js:439:3)',
     '    at Module.require (module.js:498:17)',
     '    at require (internal/module.js:20:19)',
     '    at Object.<anonymous> (/Users/rpacas/Desktop/repos/blink/app.js:19:14)' ]
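A sketch of loading the override file only when it exists, assuming `config/index.js` builds the path `config/env/override-<env>.js` as in the stack trace. `loadOverrides` is a hypothetical name; `fs.existsSync` replaces the unconditional `fs.lstatSync` call that throws `ENOENT`.

```javascript
'use strict';

const fs = require('fs');
const path = require('path');

// Load env/override-<env>.js from the config directory only if it exists;
// otherwise apply no overrides.
function loadOverrides(configDir, env) {
  const overridePath = path.join(configDir, 'env', `override-${env}.js`);
  if (!fs.existsSync(overridePath)) {
    return {};
  }
  return require(overridePath);
}
```

Inside `config/index.js`, the call would be `loadOverrides(__dirname, env)`.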

Post retry count to Gambit

Gambit needs the Blink retry count for each incoming request to fix incoming message analytics.

Refs https://github.com/DoSomething/gambit/issues/897

Add `nextRetryTimestamp` to the message's meta object

Description

Add nextRetryTimestamp as an ISO 8601 timestamp to the meta object. This makes it easy to see, from within Redis, when a message will next be retried.
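A minimal sketch, assuming the message already has a `meta` object (like the `meta.retryAttempt` seen in Blink logs) and that the existing retry logic computes the delay. The helper name is hypothetical.

```javascript
'use strict';

// Hypothetical helper: return a copy of `message` whose meta object includes
// nextRetryTimestamp, the scheduled retry time as an ISO 8601 string.
// `delayMs` is assumed to come from the existing retry logic.
function withNextRetryTimestamp(message, delayMs, now = Date.now()) {
  return {
    ...message,
    meta: {
      ...message.meta,
      nextRetryTimestamp: new Date(now + delayMs).toISOString(),
    },
  };
}
```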

Node security update

[ACTION REQUIRED] Node.js security update on Heroku
The Node.js team has announced that a high severity remote Denial of Service (DoS) Constant Hashtable Seeds vulnerability in Node.js versions 4.x through 8.x has been patched in the following versions:
4.8.4
6.11.1
7.10.1
8.1.4
https://nodejs.org/en/blog/vulnerability/july-2017-security-releases/
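A quick way to check a version against the patched releases listed above. The helper is a hypothetical sketch that compares `major.minor.patch` numerically and returns `null` for a major line the list does not cover.

```javascript
'use strict';

// First patched release per major line, from the July 2017 Node.js
// security announcement listed above.
const PATCHED_RELEASES = {
  4: [4, 8, 4],
  6: [6, 11, 1],
  7: [7, 10, 1],
  8: [8, 1, 4],
};

// Returns true or false for listed major lines, or null when the major line
// is not in the list (check the announcement directly in that case).
function isPatchedForJuly2017(version) {
  const [major, minor, patch] = version.replace(/^v/, '').split('.').map(Number);
  const fixed = PATCHED_RELEASES[major];
  if (!fixed) {
    return null;
  }
  if (minor !== fixed[1]) {
    return minor > fixed[1];
  }
  return patch >= fixed[2];
}
```

For example, `isPatchedForJuly2017(process.version)` run inside a dyno reports whether the app is on a patched release.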

Change default campaign signup post type from action to photo

Post-devroundtable discussion decision.

Inspect Gambit custom header to determine retry

Gambit will send a custom header in its error response to indicate whether Blink should retry the failed request.
Blocked by sending request header in https://github.com/DoSomething/gambit/issues/902
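A sketch of the retry decision, assuming Gambit's header is readable through the WHATWG `Headers.get()` API that node-fetch exposes. The header name `x-blink-retry-suppress` is a placeholder until the Gambit issue defines the real one, and retrying when the header is absent is an assumption that keeps the current behavior.

```javascript
'use strict';

// Placeholder header name; the actual one is to be defined in
// https://github.com/DoSomething/gambit/issues/902.
const SUPPRESS_RETRY_HEADER = 'x-blink-retry-suppress';

// For a failed Gambit response, retry unless Gambit explicitly asked us not
// to. headers.get() returns null when the header is absent.
function shouldRetryGambitResponse(response) {
  const value = response.headers.get(SUPPRESS_RETRY_HEADER);
  return value === null || value.toLowerCase() !== 'true';
}
```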

Placeholder: Extend Blink messages metadata from external services

Description TODO.
Note: call them attachments?

Make sure Blink retries on ETIMEDOUT errors to Gambit

May 10th 2017, 05:20:18.741 b6689b63-a5c6-4ac3-9ba7-bd1a8d5d7481 at=warning application=blink code=message_processing_error queue=gambit-chatbot-mdata request_id=b6689b63-a5c6-4ac3-9ba7-bd1a8d5d7481 FetchError: request to http://ds-mdata-responder.herokuapp.com/v1/chatbot failed, reason: connect ETIMEDOUT 23.23.77.117:80
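A sketch of classifying network failures as retriable, assuming node-fetch's `FetchError`, which copies the underlying system error code into `error.code`. Node reports "socket hang up" with code `ECONNRESET`, so the same check would also cover the socket hang-up case. The exact set of codes is an assumption.

```javascript
'use strict';

// System error codes treated as transient (an assumed, adjustable list).
const RETRIABLE_CODES = new Set(['ETIMEDOUT', 'ECONNRESET', 'ECONNREFUSED', 'EAI_AGAIN', 'EPIPE']);

// True when a fetch failure looks like a transient network problem that
// should be retried rather than rejected.
function isRetriableNetworkError(error) {
  return Boolean(error) && RETRIABLE_CODES.has(error.code);
}
```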

Relay Rogue campaign_signup_post type to cio events to indicate kind of member action

TL;DR

Sergii Tkachenko: So, to summarize: "classic" Phoenix report back item* will become campaign_signup_post with type: "action". And, for example, DACA call will be the same campaign_signup_post, but with type: "call".

Shae Smith: yea that's about right. How we end up actually defining/counting a "reportback" will probably change in the coming months, but that is the idea.

* - Added per Shae's clarification

Long Slack chat version:

[10:21 AM]
Sergii Tkachenko Hey Shae, good morning!

[10:21 AM]
Do you have time for a quick chat about setting reportback event type to cio signup post events?

[10:23 AM]
So the idea is to assume that now all signup posts are reportbacks

[10:24 AM]
so we'll add a type attribute to event arguments and set it to reportback

[10:24 AM]
however, the real event type will still be campaign_signup_post

[10:24 AM]
example

[10:25 AM]
Sergii Tkachenko uploaded this image: Pasted image at 2017-09-22, 10:25 AM

[10:25 AM]
Sergii Tkachenko
activity type (see col 3) will stay the same

[10:25 AM]
and I'll extend actual data attributes with type: "reportback",

[10:26 AM]
do you think this could be issue with current Rogue events implementation?

[10:26 AM]
Shae Smith I see, thank you for the explanation.

[10:26 AM]

So the idea is to assume that now all signup posts are reportbacks

[10:27 AM]
This is what is a little confusing- We tried in Rogue, to not assume that every post is a reportback

[10:27 AM]
In fact, as we look towards multi-action campaigns, we really don't want to assume everything is a reportback

[10:28 AM]
a Rogue Post parallels what a ReportbackItem was in Phoenix-Ashes

[10:28 AM]
and we say that a user has a "Reportback" if they have at least one of them

[10:28 AM]
So if a user has 5 posts under a signup, they still only have one reportback

[10:30 AM]
If your event classification works for C.io, then that is great, but I just want to be clear about how we are defining stuff so that you all can be on the same page before segmenting things.

[10:31 AM]
And some forward thinking- We might be including Post types in the Post model so that we can determine if a Post ties to different actions on a campaign

[10:31 AM]
but that is to come

[10:32 AM]
Sergii Tkachenko Ah, I see. Would it be less confusing to set the type to reportback_item instead?

[10:32 AM]

And some forward thinking- We might be including Post types in the Post model so that we can determine if a Post ties to different actions on a campaign

[10:32 AM]
yes, that would be perfect solution

[10:32 AM]
Shae Smith
I would rather us also get rid of the "Reportback" terminology

[10:32 AM]
maybe action_item?

[10:33 AM]
or just action

[10:33 AM]
Sergii Tkachenko yes, sounds great

[10:33 AM]
I like action

[10:34 AM]
More forward thinking: could you please think of few examples when signup post events are not actions?

[10:34 AM]
Insta post?

[10:35 AM]
To clarify: trying to think what different post types could be on Post model

[10:36 AM]
Shae Smith
sure!

[10:36 AM]
so for example. There is a DACA campaign, and each day we give users new actions to take

[10:37 AM]
a campaign lead would add actions, could be tweets, call your senator, IG posts, etc

[10:37 AM]
and we would ingest those actions (somehow lol) as Posts (we could rethink this terminology in Rogue)

[10:39 AM]
So a Post object could look something like this

[10:46 AM]
Sergii Tkachenko Great example! Thanks.

[10:47 AM]
Shae Smith
sorry my computer is being weird

[10:47 AM]
but the idea is we want to leave it open so that a campaign lead could give users multiple things to do

[10:47 AM]

{
   id: "1",
   signup_id: "5970",
   campaign_id: "1603",
   northstar_id: "laksjdlfjadslf", 
   url: "http://www.url.com", // could be a link to a playlist, youtube video, image
   status: "pending",
   source: "phoenix-web",
   remote_addr: "",
   action_bucket: 1, // Each day would be a new "bucket". One campaign could have multiple buckets and each bucket could have multiple actions we ask the user to do.
   action_type: "call", // The actual type of action: "tweet", "call", "IG", etc.
   action_completes: true, // If the action completes the bucket or not. 
}


[10:48 AM]
none of the terminology has been solidified, and I don't even know if all of this is going to happen

[10:48 AM]
but that is the idea

[10:48 AM]
But i think considering each post an "action" for now is safe

[10:49 AM]
Sergii Tkachenko So to summarize: "classic" Phoenix report back will become campaign_signup_post with type: "action". And, for example, DACA call will be the same campaign_signup_post, but with type: "call".

[10:49 AM]
Sounds about right?

[10:51 AM]
Shae Smith yea that's about right. How we end up actually defining/counting a "reportback" will probably change in the coming months, but that is the idea.

Log full request message when it's invalid

To analyze what went wrong.

No birthday field on Customer.io for Niche users

Co-reg users from Niche don't have a birthday field in Customer.io.
Pivotal: https://www.pivotaltracker.com/story/show/145932733
