scremsong's Issues

View: Triage

This view allows a person to triage the stream of tweets, photos, et cetera by performing a few key actions:

  • Assign to $user (with queue size shown next to their name)
  • Take action: e.g. Favourite, Retweet
  • Mark as done/Ignore (Fade to grey)

The view is a collection of columns representing a stream of content (Twitter mentions, Twitter searches, Instagram searches). One column represents one API call.

Content should be updated in near-real time.

Web Sockets

Are web sockets still the blessed way of pushing data to sync up state in multiple remote clients? Where do service workers come in?

Right now we're hackily pulling data back via window.setInterval and it's all a bit awful. Before we go too much further down that path let's investigate the blessed way to push data to clients.
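For reference, a minimal sketch of what the push side could look like if we go with web sockets via Django Channels - the consumer, group name, and broadcast helper here are illustrative, not something that exists in the codebase yet:

# Hypothetical sketch, assuming Django Channels and a channel layer (e.g. Redis)
from channels.generic.websocket import AsyncJsonWebsocketConsumer

class TriageConsumer(AsyncJsonWebsocketConsumer):
    group_name = "triage"  # illustrative group shared by all connected clients

    async def connect(self):
        await self.channel_layer.group_add(self.group_name, self.channel_name)
        await self.accept()

    async def disconnect(self, close_code):
        await self.channel_layer.group_discard(self.group_name, self.channel_name)

    # Handler for messages of type "tweets.new" sent to the group
    async def tweets_new(self, event):
        await self.send_json({"msg_type": "new_tweets", "tweets": event["tweets"]})


# Anywhere in backend code (e.g. the streaming task), push to all connected clients:
from asgiref.sync import async_to_sync
from channels.layers import get_channel_layer

def broadcast_new_tweets(tweets):
    async_to_sync(get_channel_layer().group_send)(
        "triage", {"type": "tweets.new", "tweets": tweets}
    )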

We'll need to make it work for:

Triagers

  • A reviewer coming online / going offline
  • A reviewer changing the status of a work assignment
  • New tweets arriving
  • Updated tweets arriving (e.g. Deleted tweets)

@todo

  • Do we need to include react-virtualized's CSS? (ref)
  • Handle individual new tweets arriving
  • Handle individual backfilled tweets arriving
  • Handle intelligently discarding received tweet objects if we have enough to show a column (i.e. 20/column)
  • Handle receiving the tweets called for by loadMoreRows (these always need to be stored)
  • Handle intelligently discarding tweet objects that are "too far away" from the current view of all columns it appears in
  • Handle user-toggleable filters for "Show me completed tweets" and "Show me discarded tweets" and how that impacts column totals
  • How do we handle new tweets that arrive while the user is disconnected?
  • Handle maintaining position and scrolling to tweets. Ideally, saving it between page loads.
  • Refer to and clean up #18
  • Fix the issue with tweet media

Reviewers

  • Being assigned a new task
  • Having a task unassigned from you
  • Updated tweets arriving (e.g. Deleted tweets)
  • Going offline
  • Notifications (e.g. of being assigned a new task)

Application

  • How will we handle the backend server going down or being switched out? (WebSocket connection being dropped.)
  • Need to handle onChangeQueueUser() in UserReviewQueueViewContainer to fetch assignments for the new user
  • Stop any old person signing up

Refactoring

  • Break app.tsx up into its respective modules (re-ducks and redux-typed-modules and Strongly typed Redux modules made easy!)
  • Cleanup @ts-ignore junk in app.tsx
  • Use selectors
  • Types for all ducks
  • Types for all components
  • Assignments should be indexed by assignment_id for easier updating in future
  • We shouldn't be modifying tweets with reviewer_id and review_status
  • Don't use getState()
  • Use Serializers on the backend
  • Add Python linting

Queue view enhancements

  • Show the oldest unread recent tweet in each assignment
  • Add thread action: "Expand Thread" ("Collapse Thread" when open)
  • Collapsing a thread updates the "last read" timestamp
  • Only one thread may be open at a time
  • Tweets in the thread since the "last read" timestamp are shaded blue/have a chonky blue tab on the side. This also applies to the tweet shown in the collapsed view.
  • In the collapsed view there's a notice of "+ X more new tweets" below/next to the tweet
  • Move action buttons to the right of the tweet in collapsed view. Still below in expanded view.
  • [Later] Liking or RTing or Replying to a tweet from the collapsed view updates the timestamp to the same as / just after the one you replied to. If there are more replies the tweet you see updates and the number of new tweets also in there decreases by one.
  • Sort by unread timestamp

Misc

Simple tweet actions

Link from each tweet to its twitter.com reply/favourite/retweet page on:

  • The triage view
  • The queue view

We'll fall back to modals using web intents.

c.f. #40

Consume streaming tweets from Twitter

Tweepy looks good.

We'll need to handle:

  • Disconnecting if we get a 420 (too many connection attempts in a short period of time) and somehow alerting us.
  • Store the Twitter API authorization tokens in the database.
  • Automatically starting a stream if there's an election on (so that we can restart the app and have things Just Work).
  • Stopping an old stream if a new one has just connected (or not starting a new stream until the old one has disconnected).
  • A means to start/stop streaming.
  • Not writing duplicate tweets (if two apps are running at once).
  • Tweet compliance
  • Fill in gaps
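A rough sketch of the streaming piece and the 420 handling described above, assuming Tweepy 3.x's StreamListener API (credentials, handlers, and track terms are placeholders):

# Hypothetical sketch assuming Tweepy 3.x (the pre-4.0 StreamListener API)
import tweepy

# Credentials would come from the credentials table described below (placeholders here)
CONSUMER_KEY, CONSUMER_SECRET = "...", "..."
ACCESS_TOKEN, ACCESS_TOKEN_SECRET = "...", "..."

class ScremsongStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        save_tweet(status._json)  # placeholder for our real "new tweet arrived" handler

    def on_error(self, status_code):
        if status_code == 420:
            # Too many connection attempts in a short period: disconnect, alert us,
            # and let the supervising task back off before reconnecting
            alert_admins("Stream disconnected with a 420")  # placeholder
            return False  # returning False tells Tweepy to close the stream
        return True  # otherwise keep the stream alive

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
stream = tweepy.Stream(auth=auth, listener=ScremsongStreamListener())
stream.filter(track=["democracy sausage", "#democracysausage"], stall_warnings=True)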

Required database tables:

  • A table to hold credentials
  • A table to hold the uuid of the current 'master' application (used for CD and start/stop)
  • A table to hold any errors (clearable by admins) or logs (e.g. LIMIT notices)
  • A table to hold tweets

Documentation:
Streaming With Tweepy
Consuming streaming data
Streaming message types
Filter realtime Tweets
Standard streaming API request parameters

Some potentially useful results for the future:
Twitter Stream re-connect best practices (For python-twitter, not Tweepy - but the principles are the same.)
Tweepy. Make stream run forever
Twitter Streaming API limits?

Optimisations

  • Use a shouldComponentUpdate OR a tweet selector on UserReviewQueueViewContainer and TweetColumnContainer so that they only update when the assignments/column change (which hold the ids of the tweets), not when any tweet object changes
  • Ensure heavy parts of the UI (e.g. triage view) only update when absolutely necessary
  • Reduce what gets passed down to TweetColumn as much as possible (e.g. Does it actually need the details of assignments or just that there's a tweet_assignment?)
  • Take a quick pass over TweetColumn and look for performance improvements
  • Does switching to queue and back to columns maintain scroll position in each column?
  • How does triage actually know when new tweets are being added above your position in the list?
  • Can we intelligently discard tweets for users not doing triage (i.e. They only need to have tweets stored locally that are assigned to them or that are enough to load TweetColumn)
  • Maybe we don't need to send users new tweets and store them unless they're on triage view and the tweets are within X of their current view?
  • Maybe we can only send tweet data if it's part of an assignment. For triage just send a new tweet total.
  • Does loadMoreRows work for future tweets (scrolling up)? How does it know that tweets are being added at the top and not elsewhere?

Add a user option to not show completed/discarded tweets on the triage view

To me this feels like my preferred mode of triage, but others want to see everything.

As a user I should be able to toggle between seeing all tweets and only seeing tweets that haven't been:

  • Assigned
  • Processed (DONE)
  • Discarded

Some considerations:

  • In light of #28 and #26 we'll need to think how we buffer updates to tweets that would cause them to disappear.
  • How we'll handle restoring tweet position if the tweet at the top of the column is one of the ones being removed (We could walk the tweet_ids array back up to find the closest one maybe?)

This should be part of a new settings screen and allow for fine-grained control for that user, i.e. checkboxes for seeing Dismissed, Assigned, Done, Closed.

Edit: Decision made not to implement this - would cause more problems (e.g. needing to undo, making mistakes) than it's worth.

Show placeholder tweets if we don't have the tweet data client-side yet

Right now we rely entirely on loadMoreRows from react-virtualized to handle loading tweets. React-Virtualized has the option to show "placeholders" - for example, as the user is scrolling or as loadMoreRows is loading data. As we move towards issues like #26 we'll need to potentially sync a large number of tweets in.

Let's investigate this further and see how we can use it to improve the UX by showing at least something for the tweet placeholders rather than blank space. We could avoid the need to have all tweet objects client-side if we instead showed placeholder tweets in the columns as the user scrolls (up or down), fired events up to the store, debounced those events (or waited until the user stops scrolling), then fetched the missing tweets as a batch, put them in the store, and replaced the placeholder tweets with the actual tweets (recalculating the cell heights).

This would handily remove issues around having to send potentially huge payloads when the user is scrolled a long way down the column and we're trying to send them all tweets between now and their current position.

Relates to #26.

Custom tweet actions

In addition to the standard set of tweet actions and our Scremsong actions (assign to user, pre-canned reply, mark as read), we may like to have further custom actions in the future to:

  • Highlight photo or Add photo to the Democracy Sausage site
  • Email share the tweet (why?)

Add a page to see our current Twitter API consumption state

This is required if we can't get an increased rate limit per #21. Refer to #6.

Just a visual at this stage. Call the API directly.

Actually, given the rate limit endpoint itself is rate limited we might want to cache it locally too. Then we can make use of it in stuff like threading where we want to start backing off or changing behaviour based on our API usage. Think about what our fallback position is for "we've been rate limited in this 15 minute period and the queue of tweets to process is building up" (e.g. In that "mode" we can assign sub-tweets to people as individual tweets that have no thread relationship).

Actually, we should be refreshing and logging our rate limits pretty frequently so we can do post-election analysis of how close we came. This would be paired with logging from the functions that use the API to give us a clear picture of how we're using the API and help inform future enhancements/changes in application behaviour.
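A minimal sketch of caching and logging rate_limit_status, assuming Tweepy and Django's cache framework (the cache key and logging helper are illustrative):

# Hypothetical sketch: cache rate_limit_status locally and log it for post-election analysis
from django.core.cache import cache

RATE_LIMIT_CACHE_KEY = "twitter_rate_limit_status"  # illustrative key

def get_rate_limit_status(api, max_age=60):
    """Return rate_limit_status, hitting Twitter at most once per max_age seconds."""
    status = cache.get(RATE_LIMIT_CACHE_KEY)
    if status is None:
        status = api.rate_limit_status()  # this endpoint is itself rate limited
        cache.set(RATE_LIMIT_CACHE_KEY, status, timeout=max_age)
        log_rate_limit_snapshot(status)  # placeholder: persist a row for later analysis
    return status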

Future enhancements:

  • Logging this data at frequent intervals to see how close we ride to the edge.
  • Using this data to queue up and execute replies/favourites/retweets rather than executing them directly. This would let us, for example, prioritise certain actions (e.g. replies) over others if we're close to the rate limit.

Material-UI v1 Migration

  • Get @material-ui core installed and working
  • Transfer existing components to @material-ui core
  • classes: any
  • Convert our theme over to the new theming system
  • Remove styled-components and material-ui-responsive-drawer
  • Remove old deps

Setup an API endpoint to dispatch fake tweets for easier testing

Manually dispatching tweets is tiresome.

Setup an API endpoint that takes the following options:

  • Number of tweets to send
  • Number of seconds to send them in

It will then:

  • Pick a random tweet from the data store
  • Find the highest tweet_id in the data store and add n
  • Send a web socket event for a new fake tweet
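A rough sketch of what that endpoint could look like (the model, field, and helper names are assumptions; in practice the sending loop would probably run as a background task rather than inside the request):

# Hypothetical sketch of the fake tweet dispatcher (names illustrative)
import copy, random, time
from django.http import JsonResponse

def dispatch_fake_tweets(request):
    count = int(request.GET.get("count", 10))      # number of tweets to send
    seconds = int(request.GET.get("seconds", 30))  # number of seconds to send them in

    highest_id = max(int(t.tweet_id) for t in Tweets.objects.all())  # assumes a Tweets model
    template_tweets = list(Tweets.objects.order_by("?")[:count])     # random tweets from the data store

    for n, template in enumerate(template_tweets, start=1):
        fake = copy.deepcopy(template.data)   # assumes the raw status JSON lives in .data
        fake["id_str"] = str(highest_id + n)  # highest tweet_id in the data store + n
        broadcast_new_tweets([fake])          # reuse the web socket push sketched earlier
        time.sleep(seconds / count)

    return JsonResponse({"sent": len(template_tweets)})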

PHP API layer for Twitter et cetera

  • API layer in front of Twitter to handle caching to stop us running into their rate limits
  • scremsong_social_media_cache table to cache the results of appropriate API calls
    • Id
    • Platform (e.g. Twitter, Instagram, ...)
    • Request Type (e.g. GET, POST)
    • Endpoint (e.g. /searches/tweets)
    • Parameters (URL encoded parameter string)
    • Last Requested (Timestamp)
    • Result (BLOB)
    • TTL (Timestamp)
    • In Progress (Boolean)
    • Pruning Strategy (Age, # of Items)
    • Pruning Threshold (Per Above)
  • Constants (define in soft config - updatable without a deploy?)
    • TTL Base
    • TTL Increment
    • TTL Max
  • TTL timestamp will increase using a back-off approach if we receive no new data.
  • When a request comes in we'll check the cache table to see if we can use the cached result. If not, we'll update the table and fire off a request. If a request is already in progress we'll sleep and wait a few times and then fail back to the client.
  • Rate limits will be passed back to the client to handle appropriately (e.g. show a message, use a modal with a web intent)
  • We assume one column = one API call
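A sketch of the cache-or-fetch flow described above, written in Python rather than PHP for consistency with the rest of the backend (model and field names loosely follow the table above; ttl_seconds and the fetch helper are assumptions):

# Hypothetical sketch of the cache-or-fetch flow (names illustrative)
import time
from datetime import timedelta
from django.utils import timezone

TTL_BASE, TTL_INCREMENT, TTL_MAX = 60, 60, 900  # soft-config constants, in seconds

class CacheBusyError(Exception):
    pass

def cached_api_call(platform, endpoint, params):
    entry = SocialMediaCache.objects.filter(
        platform=platform, endpoint=endpoint, parameters=params
    ).first()

    if entry and entry.ttl > timezone.now():
        return entry.result  # still fresh - use the cached result

    if entry and entry.in_progress:
        # Another request is already refreshing this entry - sleep and wait a few times, then fail
        for _ in range(3):
            time.sleep(1)
            entry.refresh_from_db()
            if not entry.in_progress:
                return entry.result
        raise CacheBusyError(endpoint)

    # The real version would mark the entry as in_progress before firing off the request
    result = fetch_from_platform(platform, endpoint, params)  # placeholder for the real API call
    # Back the TTL off if we received no new data
    ttl_seconds = TTL_BASE if entry is None or result != entry.result \
        else min(entry.ttl_seconds + TTL_INCREMENT, TTL_MAX)
    SocialMediaCache.objects.update_or_create(
        platform=platform, endpoint=endpoint, parameters=params,
        defaults={
            "result": result, "last_requested": timezone.now(), "in_progress": False,
            "ttl": timezone.now() + timedelta(seconds=ttl_seconds), "ttl_seconds": ttl_seconds,
        },
    )
    return result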

Fork and fix react-tweet's issues

  • marginTop error
  • Media display issues with tweet height (test in isolation first to make sure it's not our use of columns)
  • Appears to be modifying props (entities like hashtags [adding hashtag with the same value as text]) on rendering each card
  • Support passing in action listeners for reply, retweet, favourite
  • Support passing in states for the reply, retweet, and favourite icons
  • Support passing in overriding styles for the root div like opacity
  • Make tweet objects a PureComponent / implement shouldComponentUpdate for specific tweet props that are known to change. I think right now they're being re-rendered each time the social.tweets store is changed.
  • See if passing the media onHeightUpdate thing fixes the height measurement issues in columns with media (e.g. column 6)
  • Add https upgrade option

Setup deploy pipeline and DigitalOcean infrastructure

Use the Docker container approach we took with Ealgis. Only rebuild and redeploy when there's a new version minted.

  • Handle the warnings from Redis and RabbitMQ
  • Decide on our approach to deploying new versions: In place on the existing server or a complete rebuild.
  • Decide where Scremsong will live - alongside DemSausage or on a separate server?
  • Setup the task queue containers - alongside the database server or separate
  • Setup Database backups
  • Setup log exfiltration for all of the above (if taking the nuke and rebuild approach)
  • What's Django best practice for populating initial database state?

Tweet placeholders

Right now we rely entirely on loadMoreRows from react-virtualized to handle loading tweets. React-Virtualized has the option to show "placeholders" - for example, as the user is scrolling or as loadMoreRows is loading data.

Let's investigate this further and see how we can use it to improve the UX of seeing at least something for the tweet placeholders rather than blank spaces.

Keyboard shortcuts for power users

A neat feature for the triage view for really busy elections would be keyboard shortcuts to:

  • Move up and down tweets in a column
  • Move left and right between columns
  • Assign a tweet (a) to a specific user (1, 2, ... n) or unassign it (u)
  • Discard a tweet (d)
  • Pull new tweets into a column (r)

This should help a lot with really high volume elections.

Implementation will be interesting since we'll need to pass messages down to columns ("the currently selected tweet is id blah - style it as selected") and have actions fire off in response to keyboard events. It feels like wrapping the triage view in a new KeyboardShortcuts component, or having it sit in TriageView (like TweetColumnAssigner), might be the easiest way to go.

Comment: Not all triagers would use this, and those that would may not have enough time to develop and reinforce the muscle memory that makes it useful.

Handle resyncing state if the user is disconnected and reconnects

We already send back assignments, users, et cetera onconnect, so I think we just need to handle sending back new tweets that have arrived since the user was disconnected?

We could do this as part of an onreconnect action called in onconnect if app.disconnected was false - we'd send the highest tweetId we knew about and the backend could just send back everything that's happened since.

  • Is there anything else we send that's not already sent in onconnect?
  • How to handle really out-of-date clients. Force a refresh?

We might need to break the onconnect data sending out into a separate resync process. That could look something like:

  • first connect -> client sends back "not yet connected" -> server sends back all of the initial state they need
  • disconnected -> connects -> client sends back "I was connected and here's the last tweet I know about and some other stuff" -> server sends back only what the client needs to fill in the gaps.
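That handshake could look roughly like this as a receive handler on a Channels-style consumer (message types and helper functions are illustrative):

# Hypothetical sketch: resync handling as a receive handler on the consumer sketched earlier
    async def receive_json(self, content):
        if content.get("msg_type") == "not_yet_connected":
            # First connect: send back all of the initial state the client needs
            await self.send_json({"msg_type": "init", "state": build_initial_state()})
        elif content.get("msg_type") == "reconnected":
            # The client was connected before and tells us the last tweet it knows about,
            # so send back only what it needs to fill in the gap
            since_id = content["last_tweet_id"]
            await self.send_json({
                "msg_type": "resync",
                "tweets": tweets_since(since_id),            # placeholder helpers
                "assignments": assignments_since(since_id),
            })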

Improve how we handle assigning tweets to columns based on their content

See also #24

  • Our approach to tweet matching seems to assume that every part of every tweet phrase must match. Is that correct?
  • Also, it matches parts of words e.g. "democracy sausage" column matches #democracysausage
  • Longer term we should rethink how we do tweet matching to use proper full text search rather than our hacked together approach. Maybe we do it on receiving a tweet and then remove the need altogether for doing it in the database and in Python?

Index page occasionally points at stale assets

Wiping CloudFront removed the old JS files, but then people get a blank page until they refresh again. It just sends index.html back for requests for those assets.

Do we have no-cache on index.html so it doesn't serve up stale files?

Hacky workaround: Some code embedded in index.html that refreshes the page if assets aren't loaded.

API usage and logging

Thanks to the strict rate limiting on Twitter we may need to have an API usage view for admins to examine the prettified output of rate-limit-status. We may also need to log requests so we have the underlying data required to analyse and tweak our caching strategies.

Canned responses

When replying to a post it should be possible to select from a set of pre-canned responses that will pre-fill the reply box. Users will still be able (and encouraged) to edit them, and they won't post automatically.

Users should be able to define their own additional pre-canned responses.
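A minimal sketch of how shared and per-user canned responses could be stored, assuming a Django model (field names are illustrative):

# Hypothetical Django model for canned responses (field names illustrative)
from django.conf import settings
from django.db import models

class CannedResponse(models.Model):
    text = models.TextField()
    # Null = a shared, pre-configured response; set = a user's own addition
    owner = models.ForeignKey(
        settings.AUTH_USER_MODEL, null=True, blank=True, on_delete=models.CASCADE
    )
    created_on = models.DateTimeField(auto_now_add=True)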

Improving tweet threads

Ideas for improving how we collect, use, and display tweet threads.

  • Tweet threads only include tweets that the original user is included in. How does Twitter itself handle tweet threads where the original user is taken out of the thread?

View: Regular User

The view for regular users exists to process tasks that have been assigned to them - e.g. tweets to reply to.

Tasks will be displayed as native content (tweets, grams) with relevant actions: reply, quote + retweet. Once an action is taken the user should be prompted to "mark as done". Tasks will be able to be marked as done separately (to allow for "no action required" scenarios).

Replying will happen in-line if we're under the rate limit for this 15-minute block. If not, and the backend returns a rate limit error, we'll fall back to modals using web intents.

Threads will need to be updated to include replies. Here is one approach. To conserve rate limits we may have to require the user to take an action to see updates - e.g. Clicking refresh, only seeing one assigned tweet at a time.

Notifications of new content should be displayed to the user in some fashion e.g.

  • As a badge within the application
  • Within the application's title
  • As a desktop notification

Any user can switch to the view of any other user and 'act' as them to action their tweets. (e.g. Makes clearing queues easier if they go out to vote.)

Users should be able to "go offline", indicating they're not currently able to process tweets. This flag will show next to their name in the assignment list.

If possible (within rate limits) replies should also be updated in near-real time and shown. If not possible, we'll have to re-think the UI and UX.

Think about our approach to columns for the future

The idea for the future is to allow users to:

  • Add their own columns with custom search terms (e.g. for specific elections). (This necessitates a restart of the tweet streaming task.)
  • Remove/hide columns they don't want to see

Right now the backend is handling all of the:

  • Column logic ("Here are the columns for this user" - though right now that's just all columns), and
  • Tweet to Column logic ("This tweet has the keywords for Columns 1, 5, and 7")

We had an idea about moving this logic all to the frontend, so the backend just sends all of the information about columns and any tweets and the frontend:

  • Decides what columns a tweet goes in
  • Decides whether to store or discard a tweet

Handling this client side might look like adding some selectors in between receiving the action from the web socket and dispatching the action. Selectors seem like a better fit because reducers should simply modify the store, not make decisions about what goes in the store.

Considerations:

  • get_social_columns_cached() in twitter_streaming.py will need to be handled. Not sure how that sort of cross-process clearing would even work given Django is separate to Celery.

Support tweet threads on the assignments screen

Tweet threads are hard - there's no API concept for them. We're getting replies to our replies streamed in by virtue of including @DemSausage as one of our search terms - BUT Twitter's API has removed all of the streaming endpoints for tweets from a given account. As such, we can't get our replies to people and this stops us following the tweet chain down from the assigned tweet through to all replies.

Threads will need to be updated to include replies. Here is one approach. To conserve rate limits we may have to require the user to take an action to see updates - e.g. Clicking refresh, only seeing one assigned tweet at a time.

Approach

  • On receiving a tweet, walk back up the chain using in_reply_to_id (from local, from remote) to find the ultimate parent id. If the parent is not part of an assignment, save all the tweets we collected and issue the standard new tweet event. If the parent is part of an assignment, pass all the tweets we collected to the refresh_assignment logic (which saves all tweets - including the new one, updates the assignment, and returns an updated assignment)
  • To refresh an assignment we'll issue requests to find all replies to all users in the thread for the lowest possible tweetId (depending on when they came into the conversation). This will uncover "hidden" threads that we didn't receive via streaming. We'll get those results and use them to build/refresh the relationships, save all of those tweets, update the assignment, and return the updated assignment.
  • Background refreshing of assignments is a nice to have and will help us capture sub-threads that we aren't included on/getting via streaming that are useful (e.g. Someone replying to a tweet to say they also found sausage). This could also happen as a result of a user interaction like "Check for updates on this assignment/my assignments".
  • A refreshed assignment should throw a notification for users. It may also want to update a timestamp so we can sort on the queue by either creation date or update date.
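A rough sketch of the parent-walking step described above (the local/remote lookups are placeholders; see resolve_tweet_parents() in the Backend Implementation list below):

# Hypothetical sketch of resolve_tweet_parents(): walk in_reply_to up to the ultimate parent
def resolve_tweet_parents(tweet_id):
    """Return the chain of raw status dicts from tweet_id up to its ultimate parent."""
    chain = []
    current_id = tweet_id
    while current_id is not None:
        # Placeholder lookups: try the local tweets table first, fall back to the API
        status = get_status_locally(current_id) or get_status_from_api(current_id)
        chain.append(status)
        current_id = status.get("in_reply_to_status_id_str")
    return chain  # chain[-1] is the ultimate parent (it has no in_reply_to)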

Backend Implementation

  • When tweets are backfilled we resolve all of these relationships en masse after the tweets are collected, and then we save the whole lot together in one transaction. We then send the relevant events - decide if this happens en masse or individually.
  • resolve_tweet_parents(tweetId) Finds all of the parent tweet objects (from local, from remote) for a given tweet. Returns tweets[].
  • resolve_tweet_children(tweetId) Given a parent tweet (a tweet that has no in_reply_to) find all children of all tweets (except to @DemSausage). Returns tweets[].
  • build_relationship(tweets[]) Given a set of tweets that represent a complete relationship build a new relationship object (per below).
  • create_assignment(tweets[]) Given a set of tweets that represent a complete relationship, call build_relationship() and save a new assignment row. Always saves new tweet objects. Sets a created_on and last_updated_on date.
  • update_assignment(assignmentId) Given an assignment, refresh all of its tweets via resolve_tweet_children() and, if needed, update the assignment in the database. If an assignment was marked as DONE, change it back to PENDING. Always saves new tweet objects. Updates the last_updated_on date.
  • On receiving a new tweet. If it doesn't have in_reply_to, current logic applies. If it does have an in_reply_to pass the tweet object to the new celery queue and do nothing.
  • Celery queue for saving threaded tweets given a tweet object: If the tweet is part of an assignment already, do nothing but save the tweet (handles tweets arriving or being processed out of order). If it's not part of an assignment, call resolve_tweet_parents(). If the parent is NOT part of an assignment, save the tweets we found and issue a NEW_TWEET event for the original tweet. If the parent IS part of an assignment, call resolve_tweet_children() to refresh the thread, pass it to update_assignment(), and then issue NEW_TWEET, UPDATED_ASSIGNMENT, and if necessary COMPLETED_ASSIGNMENT_WAS_UPDATED events that send all of the new tweet objects along. Show the assigned user a notification that one of their assignments has been updated, highlight that on the queue UI in some fashion, and show the tweet as already being part of an assignment in the triage UI.
  • On assigning a tweet. Call resolve_tweet_parents() if necessary - pass that or the tweet itself to resolve_tweet_children(), pass that to create_assignment(), and then issue NEW_ASSIGNMENT events that send all of the new tweet objects along to update the queue and triage UIs.
  • Functions that use API calls will log how many they used (directly or via Tweepy?) for post-election analysis. Part of #22. Logging should give us a chain of logs to see what specific parts of tweet handling cost in API calls (new tweets, backfilling, assigning).
  • Use a dirty flag to distinguish tweets we saved, but couldn't resolve stuff for and thus didn't send to the clients.
  • If we distinguish the source of a tweet (for use with sinceId/maxId logic) should we start caching tweets from users to limit the number of API calls used in get_tweets_from_user_since_tweet_from_api()?
  • Use limits around the API getters to prevent consuming a lot of API calls for non-@DemSausage users if an old tweet is replied to
  • Think about how tweets flow in: If most tweets are part of an assignment that's API calls up and down. Even if they're not, replies are always resolved up. How's that going to play out?
  • We can reduce the need for replies from us to be requested by an API call if we cache replies sent from within Scremsong, but can't rely on all replies going out from within Scremsong. What's the app rate limit on posting statuses?
  • Allow only sending notifications for a particular set of userIds
  • flake8

UI Implementation

  • [Queue] Display tweet threads based on the relationships data sorted chronologically.
  • [Queue] Show tweet action buttons for all tweets in the thread.
  • [Queue] Visually show when an assignment that you closed has been set back to pending by an update.
  • [Queue] Allow the user to sort by the creation date or update date of their assignments.
  • [Queue] Allow the user to only show recently updated or "unread" assignments.
  • [Queue] Visually show the user when an assigned thread has been updated (e.g. highlighting the thread and individual tweets as "unread" until the user takes an action).
  • [Triage] All tweets in a thread that's assigned to someone will show as assigned.
  • [Notifications] When an assignment is updated.
  • [Notifications] When an assignment that you marked as done has been updated.

Thoughts

  • What effect will bouncing and replacing the Django server have on the Celery queue? I think the queue remains unaffected (it's on the database droplet), but what happens to any tasks in the queue that are currently being run?
  • Maybe we need to do https://github.com/keithamoss/scremsong/issues/44 sooner since we're using Celery queues more now?
  • With locally caching replies we're not accounting for later tweets arriving by other means and bumping the sinceId higher i.e. leaving gaps between the last lot we cached and the new tweet. This isn't an issue as long as we're only caching things from/to @DemSausage (I think?), but would be if we're doing it for other accounts. Have a think about this.
  • Perhaps we need to track the source of tweets if we're relying on sinceId/maxId? e.g. From stream, From backfill, From resolving parents/children.
  • Don't send new tweet notifications until after assignment events
  • For backfill: Send new tweet events as a single event after all of the assignment resolution/notifications.
  • For backfill: Make en masse assigning work backwards from the highest id to ensure maximum use of the local db.
  • Locked accounts replying to tweets break our thread fetching implementation
  • Walk through all of the scenarios of tweets arriving and think about the API usage. Are we going to burn a lot on replies?
  • Rebuild Celery container
  • Document the logic of how tweets flow in and handling logic in the Enhancements issue

Notes

Edit: Ooh, a more elegant approach to the data fetching would be to have tweet streaming take care of filling in missing in-reply-to tweets before it writes the received tweet to the database. Then we just need to handle the UI side (which can be separate Tweet components with some of our own CSS applied).

Do a backend POC of the structure for one thread, wire it up to the frontend loosely, then build the frontend around that and make sure it works end-to-end. Then we can build out the rest of the backend logic (Celery queues, et cetera).

  • Create a never-ending Celery task to ping statuses/user_timeline for all tweets after since_id.
  • Store tweets in the usual tweets table and think of a way to easily grab the highest for comparison with since_id. Don't show tweets from us in the triage columns.
  • Update the assignments API to return an array of tweet ids for each assignment, not a single id. This array should be a chronologically sorted list of tweets.
  • The assignments API should also return all assignments chronologically sorted by the most recent tweet for each assignment (so new stuff gets bumped to the top)
  • The changes to the assignments API and GUI should accommodate showing notifications to the user like "Your assigned tweets have some new replies".
  • The assignments GUI can then just display separate <Tweet /> elements for each without worrying about doing fancy threading.

Further thinking

  • We'll need to handle tweets being assigned that are not the first tweet in a thread. e.g. Someone tweets about sausage (but has none of our search terms), someone responds (with our search terms), we assign that second tweet, but don't have that first one. Gotta fill in those gaps.
  • A tweet can only be assigned if its parent or child tweets are unassigned
  • When a tweet is assigned all of its parent and child tweets are part of the same assignment
  • When a tweet is received from the stream we resolve whether it's part of an assignment
  • How is the one-to-many relationship of assignments-to-tweets going to work? Right now it's built around a one-to-one relationship and deleting an assignment reflects that.
  • Do we need to handle tweets coming in out of order or being processed out of order? Maybe we need to walk the thread chain in both directions to be sure?
  • We're going to have to be careful about when stuff gets saved to the database. When a client connects/resyncs we don't want the tweet to be in there without the thread info, leaving them with only partial information. But would that only be for a little window until we send the thread info + assignment info? In any event, it may be good to wrap the tweet, thread, and assignment update logic up in a transaction.
  • Do we need to always maintain tweet relationships or do we ONLY build it when a tweet is assigned to someone?
  • We'll have to deal with backfilling tweets with parents happening in the right order so that we've resolved parents of younger tweets before their children are added so that the whole thread chain exists.
  • Data Structure 1: Django table for tweet threads with one row per tweet per thread with parent_id, thread_id, and tweet_id
  • Data Structure 2: Django table for tweet threads with one row per thread with parent_id, thread_id and thread_data (JSON field with tweet_ids: string[] and relationships: <some json>)
  • Data Structures: It's a question of (1) Easier in Django and indexable vs (2) Fewer rows, existing JSON data structure to use on the frontend, no Django convenience methods, and MAYBE not easily indexable (but we can deal with that by maintaining tweet_ids as a flat array of all tweets in the relationship that we CAN index).

Relationships example:

[
    "1",
    "2",
    {
        "tweet_id": 3,
        "children": [
            "10",
            {
                "tweet_id": 11,
                "children": ["20", "21"]
            },
            "12"
        ]
    },
    "4"
]
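If we go with Data Structure 2, flattening that relationships structure into the indexable tweet_ids array could look something like this (sketch only):

# Hypothetical sketch: flatten the relationships structure above into a flat list of tweet ids
def flatten_relationships(nodes):
    tweet_ids = []
    for node in nodes:
        if isinstance(node, (str, int)):
            tweet_ids.append(str(node))
        else:
            tweet_ids.append(str(node["tweet_id"]))
            tweet_ids.extend(flatten_relationships(node.get("children", [])))
    return tweet_ids

# For the example above: ["1", "2", "3", "10", "11", "20", "21", "12", "4"]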

Users disconnecting and going offline

Users can opt to stop receiving assignments (going inactive), but we're going to need to handle users going offline and disconnecting. This can happen by:

  • Their own choice (e.g. closes browser tab)
  • At the application level (e.g. we deploy a new build and a new server comes up)
  • Further upstream (e.g. their internet goes down)

We don't want to spam folks with "User is offline/online" notices, so have a think about how to approach this. Off the top of my head:

  • When the user connects to the web socket server update a timestamp on their profile
  • When the user disconnects/closes the web socket connection, queue a Celery task with a delayed start that will set the user as offline and send a notice if it wakes up and finds that the user hasn't reconnected since the disconnection that triggered the task (see the sketch below)
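A minimal sketch of that delayed offline check, assuming Celery (model/field names and the grace period are illustrative):

# Hypothetical sketch of the delayed "mark offline" check, assuming Celery
from celery import shared_task
from django.utils import timezone

@shared_task
def mark_user_offline_if_still_gone(user_id, disconnected_at_ts):
    profile = Profile.objects.get(user_id=user_id)  # assumes a profile with a last_connected timestamp
    if profile.last_connected.timestamp() <= disconnected_at_ts:
        # The user hasn't reconnected since the disconnect that queued this task
        profile.is_offline = True
        profile.save()
        broadcast_user_offline(user_id)  # placeholder: send the "user is offline" notice

# On web socket disconnect, give them a grace period before marking them offline:
# mark_user_offline_if_still_gone.apply_async(args=[user_id, timezone.now().timestamp()], countdown=30)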

Discussion from user testing:

  • Being disconnected (in a networking sense) = always show as offline
  • On reconnecting / reloading, if offline show prompt. (Only on the queue view or everywhere?)
  • Give them 30s grace to allow them to reconnect (e.g. if they close the tab by mistake)
  • When reconnecting we'll need to ensure all appropriate state is resynced. I think we're good for everything except the new column state for triage? Think about this to double check :)

Contact Twitter about increased rate limits

We got rate limited once during the Victorian election, so we'll definitely have issues with bigger elections.

  • What premium APIs would be useful? (e.g. Streaming favourites)
  • What specific rate limits would we like increases on?
  • $ of premium tier?

Columns and decks of streams

Columns will display on the Triage view in the TweetDeck style. They'll be built from a set of pre-configured searches.

Individual users will be able to add their own columns based on custom searches and save them as a custom deck. Users will be able to load decks.

https://developer.twitter.com/en/docs/tweets/filter-realtime/api-reference/post-statuses-filter.html
https://developer.twitter.com/en/docs/tweets/filter-realtime/guides/basic-stream-parameters#track
https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets

Tweepy not getting full text
Tweet updates

Thoughts and feedback from H:

  • Hide/Collapse: Cross icon. Remove from triage (with global toggle for show?)
  • Assign: Show in triage w/ the person's name/initials and a prominent icon. When dealt with by the reviewer, give it the green "done" background and remove it from triage.
  • Like and RT: Same green background as for mark as complete.
  • One day, navigation and all actions via keyboard shortcuts for added speed.
  • Allow for multiple simultaneous triagers

Save the position of the user's tweet columns between switching pages and page reloads

When the user reloads the page (for whatever reason - technical issues, memory issues) we want them to come back to the same position they were at in each column.

Internally within each column we already track the tweet that's at the top of the column as the user scrolls, and then maintain that position when new tweets are added to the top.

This would involve extending that and:

  • Dispatching actions that send HTTP/WS updates back to the user's profile
  • Using those props to determine the tweets to send back onconnect so the user has the right slice of the tweet store for each column
  • Feeding those props back to the columns on componentDidMount to set them up on load

A few considerations:

  • How should we handle really old tweets? (e.g. Hundreds or thousands back in history). Talk to H about desired behaviour.
  • How should we handle the tweet disappearing (e.g. Changing to the "Don't show completed tweets" setting)
  • Should saving position also save the new tweet marker? (i.e. the top tweet they can see and the top tweet they have loaded)

Ideas/Thoughts

  • Will cdU's scrollToTweet end up in a race condition with, or get in the way of, cDM's logic that updates settings? That might be the cause of some of the weird behaviour.
  • Can we stop the initial onload call to update settings from cDM?
  • Sanity check the inputs to onPositionUpdate to prevent invalid input
  • Make the column placeholder states nicer
  • Fix the off-by-one issue that may be due to mismeasuring heights
  • Test with a user with no settings
  • Test with columns with no tweets

Relates to #43

Deploy to PROD

  • Use EALGIS-style deploy approach
  • Deploy RabbitMQ on db-stack

Refactor tweet thread resolution code

Ref. #15

  • Refactor final code as a class(es) and document what functions raise ScremsongExceptions so that the calling code knows to handle them (e.g. Marking tweets as dirty)
  • Refactor all of the general getter functions
  • Better distinguish between statuses (i.e. tweet data) and tweets (i.e. Django tweet objects)
  • Avoid pointless database calls because we've stripped a Tweet object back to a Status lower down, and then turned it back into a Tweet object before returning it.
  • Setup a test suite that does local (mock) and actual remote testing of all pathways. (If this works for the next couple of elections.)
  • Tweet threads only include tweets that the original user is included in. How does Twitter itself handle tweet threads where the original user is taken out of the thread?

Discard tweets far away from the current scroll position

Right now we store all tweets we receive during a session and never discard any. This has implications for memory usage during a really busy election.

This can include:

  • Discarding tweets as the user scrolls away from them
  • Discarding/ignoring new tweets that arrive but that are far away from our current scroll position
  • Maybe we can only send tweet data if it's part of an assignment. For triage just send a new tweet total/something that tells the column there are new tweets; then, when the user requests new tweets, we go and grab them based on sinceId?

Important: I think discarding tweets above our current scroll position in a column breaks our assumptions around react-virtualized because we never get a loadMoreRows call for them and so never get a chance to "get them back" if a user scrolls up and brings them into view.

One approach to this is to:

  • Have each column report via onRowsRendered their positions (top + bottom + overscan)
  • Run a task higher up at a set interval to look across all columns and work out which common set of tweets are too far away from the current scroll position of all columns such that they can be removed (Potentially using requestAnimationFrame)
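A rough sketch of that "too far away from all columns" calculation (written in Python for clarity; the real thing would be TypeScript in the store):

# Rough sketch. Each column reports its rendered window via onRowsRendered; a tweet can
# only be discarded if it's far from the viewport in *every* column it appears in.
def tweets_to_discard(columns, buffer_rows=50):
    """columns: list of dicts with 'tweet_ids', 'start_index' and 'stop_index'."""
    keep = set()
    for col in columns:
        lo = max(0, col["start_index"] - buffer_rows)
        hi = min(len(col["tweet_ids"]), col["stop_index"] + buffer_rows)
        keep.update(col["tweet_ids"][lo:hi])

    all_ids = {tweet_id for col in columns for tweet_id in col["tweet_ids"]}
    return all_ids - keep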

We'd need to consider:

  • How discarding tweets, and thus changing column sizes, will impact react-virtualized behaviour
  • Some other stuff once we think about this more deeply :)
  • How we handle discarding tweets for users only processing queues - they may never interact with the triage view!
  • Maybe we don't need to send users new tweets and store them unless they're on triage view and the tweets are within X of their current view?
  • Do we throw away new tweets that are in the buffered queue for a column? I think that breaks our assumptions around react-virtualized.

Advanced tweet actions

c.f. #30

Tasks will be displayed as native content (tweets, grams) with relevant actions: reply, quote + retweet. Once an action is taken the user should be prompted to "mark as done".

Replying will happen in-line if we're under the rate limit for this 15-minute block. If not, and the backend returns a rate limit error, we'll fall back to modals using web intents.

If we can, let's cache our replies in the database to avoid extra API calls to show.status. (c.f. #15)

Switch to Celery

Let's use Celery, not our hacked together queueing solution.

  • Deploy django-celery with RabbitMQ
  • Have a single Celery task start up with the stack, stop the previous task, and begin streaming
  • Deal with the edge case of two workers streaming at once and log a warning (duplicate pkeys)
  • Reconnect on getting a 420
  • How can the frontend know if we're running? Inspect the task queue for running tasks?
  • What causes the Connection reset by peer errors?
  • Why is guest still showing up in the logs?
  • Filling in missing tweets doesn't seem to entirely work
  • Move DB logging stuff to logger?

Twitter API Documentation

Standard search API
Using the standard search endpoint
Standard streaming API request parameters
Streaming tweets: Reconnecting best practice

Refactor TweetColumnAssigner

  • So that it internally manages the state of the dialog. The three components using it should only need to pass event hooks in.
  • Streamline the rendering code - we're not DRY
  • Can we show user pictures from python-social-auth?
