
tracking-exposed / facebook


facebook.tracking.exposed - collaborative tool for algorithm investigation

Home Page: https://facebook.tracking.exposed

License: GNU Affero General Public License v3.0

JavaScript 99.66% Shell 0.22% Dockerfile 0.12%

facebook's Introduction

Tracking Exposed

Synopsis


Tracking Exposed enables academic research and analysis of the impact of algorithms. It serves:

  • Users who want data about their own filter bubble.
  • Researchers collecting data with control groups on Facebook.
  • Journalists interested in echo chambers and algorithm personalization.

Packages

Core Packages

Package Description
@tracking-exposed/data Common data layer.
@tracking-exposed/processor-cli Control a data processor.
@tracking-exposed/services-cli Control a web service.
@tracking-exposed/utils Shared utility functions.

Web Services

Package Description
@tracking-exposed/service-rss Subscribe to custom RSS feeds based on entities.

Stream Processors

Package Description
@tracking-exposed/process-entities Process impressions and extract entities.
@tracking-exposed/process-rss Generate and cache RSS feeds.

FAQ

Want to report a bug or request a feature?

Please read through our CONTRIBUTING.md and file an issue at tracking-exposed/issues!

Want to contribute to tracking-exposed?

Check out our CONTRIBUTING.md to get started with setting up the repo.

How is the repo structured?

This repo is managed as a monorepo that is composed of many npm packages.

facebook's People

Contributors

bebsy, berli0z, davinerd, dependabot[bot], digitigrafo, endorama, fievelk, joxer, konarkmodi, micheleb, mitch90, nolash, raimondiand, titan-c, vecna, vrde


facebook's Issues

facebook HTML processing pipeline

Processing pipeline of HTMLs

From the raw HTML of Facebook you can extract meaningful metadata and append your own results to the database, so that other researchers can benefit from them, in a collaborative effort to create a distributed analysis pipeline.

The goal is to have a distributed network of parsers: independent developers running their own analysis tools on top of validated metadata. It is a distributed parsing effort that tries to emulate the analysis Facebook itself does. Not exactly the same, because that would be impossible, but a working pipeline that might:

  • show the user (with restricted access) more information about what they receive
  • perform statistics on topics, penetration of fake news, and the shape of their spread
  • observe online trends from an open-source, independent third party: an Alexa for Facebook
  • provide an API for algorithm analysis to researchers, working groups, policy makers, and journalists

To begin, we have to extract the smallest chunks of metadata and make progress through a binary tree of parsers.

We can save the submitted metadata if the information is meaningful, privacy-preserving, and minimized as well as possible against decontextualization attacks at the API level.

Every processed chunk empowers the data analysis and the capability of this network; the dataset, and the analyses, can then follow.

This is what is in the database after some iterations; every iteration extends the metadata in MongoDB:

A simple kind of parser:

// cheerio parses the HTML snippet; debug prints namespaced log lines
var cheerio = require('cheerio');
var debug = require('debug')('parser:postType');

function getPostType(snippet) {

    var $ = cheerio.load(snippet.html);
    var retVal;

    if ($('.uiStreamSponsoredLink').length > 0)
        retVal = "promoted";
    else if ($('.uiStreamAdditionalLogging').length > 0)
        retVal = "promoted";
    else
        retVal = "feed";

    // TODO: don't use an exclusion condition; find a selector
    // for 'feed' too, and associate postType 'fail' so we can investigate it later
    debug("・%s ∩ %s", snippet.id, retVal);
    return { 'postType': true,
             'type': retVal };
}

var moment = require('moment');

var postType = {
    'name': 'postType',
    'requirements': {},
    'implementation': getPostType,
    'since': "2016-11-13",
    'until': moment().toISOString(),
};
// hand the descriptor over to the shared parser runner
return parse.please(postType);

The HTMLs are collected via the web-extension and saved at the end of this backend handler: https://github.com/tracking-exposed/facebook/blob/master/lib/events.js#L52

More complicated parsers exist; they are located in https://github.com/tracking-exposed/facebook/tree/master/parsers

@nolash do you have suggestions? You've been the first to contribute 👍 I'm committing to branch feedBasicInfo, and @fievelk is doing a version in Python: https://github.com/fievelk/fbt_pyparsers

This is the first script run in the sequence: postType, pasted above. It just extends the table 'html' with metadata on the server; it is a binary decision tree.

$ DEBUG=* node parsers/postType.js 
  parser:⊹core Connecting to https://facebook.tracking.exposed/api/v1/snippet/status
{
  "since": "2016-11-13",
  "until": "2016-12-29T19:11:36.938Z",
  "parserName": "postType",
  "requirements": {}
} +0ms
  parser:⊹core 46638 HTMLs, 300 per request = 155 requests +1s
  parser:⊹core Connecting to https://facebook.tracking.exposed/api/v1/snippet/content
{
  "since": "2016-11-13",
  "until": "2016-12-29T19:11:36.938Z",
  "parserName": "postType",
  "requirements": {}
} +5ms

This is the output of the execution: for every HTML snippet, it looks for two patterns. It would be better if the condition ceased to be exclusive; if we can understand how to spot a non-promoted post too, the information becomes more robust and everything improves (see the sketch after the log below).

  parser:postType ・fdb795f8c2394d23dd2280ad4eedf9f7c897b98e ∩ feed +6ms
  parser:postType ・e41f623d1cf4e3737aaf8396ee0f52383622c145 ∩ feed +4ms
  parser:postType ・f55e0ba360454fd295070b8ac4231cfd75a4dc21 ∩ promoted +11ms
  parser:postType ・d76f8d8e8f21162f21a291cccbe5101699bb585e ∩ feed +274ms
  parser:postType ・4ccd0d6090490d9afd0c9c0a4cdb24b47eaa68c6 ∩ feed +729ms
  parser:postType ・916ebb01da701f417391ab30928298a6c24428eb ∩ feed +130ms
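
Following the TODO in the parser above, a sketch of such a non-exclusive check (reusing the requires from the snippet above; '.userContentWrapper' is a hypothetical selector for organic posts and would need to be verified):

// classify with two independent checks instead of one exclusion condition
function getPostTypeRobust(snippet) {
    var $ = cheerio.load(snippet.html);
    var promoted = $('.uiStreamSponsoredLink, .uiStreamAdditionalLogging').length > 0;
    var organic = $('.userContentWrapper').length > 0;

    if (promoted && !organic)
        return { 'postType': true, 'type': 'promoted' };
    if (organic && !promoted)
        return { 'postType': true, 'type': 'feed' };
    // ambiguous or unmatched: flag as 'fail' so it can be investigated later
    return { 'postType': false, 'type': 'fail' };
}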

Collect a list of user-proof RSS readers

We need a list of RSS clients for different operating systems. They should be as straightforward as possible (for example, I tested newsbeuter, but since it has to be compiled it is not OK; for the same reason, self-hosted RSS clients such as https://tt-rss.org/ and https://freshrss.org are not listed here).

We need something that is download+install+use proof.

fbtrex July'17 Introduction


What is the problem?

  • Facebook has tremendous global reach and influences thinking in many places
  • The ways in which FB selects information to be presented to individual users are not well understood
  • Algorithms are like social policies; they should be publicly discussed, not imposed.
  • Users have been used as lab rats in psychological and social experiments
  • There is strong evidence of the impact the algorithm can have, and has had, on society
  • There is no transparency (nor a data policy) from FB
  • Facebook claims the complexity of the algorithm is so huge that no engineer can understand or explain its internals, making the algorithm the perfect technocratic scapegoat

What are the goals?

  • Overall
    • Observe, evaluate, and better understand Facebook's algorithm, in particular how FB leverages and profits from promoted content.
    • Develop awareness of how Facebook operates the business of promoted content
    • Enable researchers from non-technical fields to analyze social media
  • For users of FB
    • Enable a better understanding of a critical issue and its implications
    • Raise awareness around the concept of “algorithm diversity” or “the right to pick your algorithm.”
    • Provide users with a persistent record of their timeline.
  • For research
    • Serve as a neutral actor, reliably providing research-quality data for analysis
    • Engage diverse research audiences to utilize the data, prioritizing and encouraging interdisciplinary approaches.
  • For FB
    • Over the long term, get FB to give users greater transparency and greater agency over how they experience the algorithm.
    • Encourage FB to publish more open data describing the social phenomena happening on social media

How we intend to work toward the goals

  • Overall, the approach is based on a broadly distributed collection of individual user experiences with FB feeds from their respective observation points, utilizing a network of volunteers and bots
  • Using a web-extension, we will collect data on what each user sees on their public timeline.
  • Extract the metadata from the data submitted, and make this dataset available to researchers.
  • Produce visualizations and other renderings of how different users experience the FB algorithm.
  • Use research findings to inform both advocacy and user education.

What needs to be done next?

Overall, we are open to advice, partnership, and collaboration with all interested parties.

Our roadmap includes the following milestones:

  • Improve the software
    • Improve the code of the parsers, the small components that extract metadata from the posts seen by the user.
  • Establish community processes
    • Write a visually and formally clear data policy, explaining the life-cycle of the data and where third parties and users interact with us, in an exemplary way.
    • Write an ethical agreement that:
      • permits the third party to access the database of collected observations, while protecting the supporters against social media intelligence;
      • forbids the third party to sell, monetize, publish, reuse, analyze, or copy the data outside the scope of the agreed analysis.
      • As a safeguard, the algorithms run on the database are declared and formalized; we execute the scripts, providing an API for the owner.
  • Engage and support researchers

Implement /api/v1/zombiereport

A cryptographically signed POST containing numeric fields:

{
    "timestamp": "<ISO-format timestamp of when the POST is made>",
    "content1": <number>,
    "content2": <number>
}

The content fields have to be defined in tracking-exposed/web-extension#23

From the headers, the publicKey is associated with an authenticated user, and the numbers are updated. The reported numbers are sent incrementally: a number sent will never be less than the previous one.
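
A hypothetical client-side sketch of such a signed POST, assuming tweetnacl and bs58; the header names are assumptions, not the actual protocol:

var nacl = require('tweetnacl');
var bs58 = require('bs58');

// build the body and a detached signature over it
function buildZombieReport(counters, keypair) {
    var body = JSON.stringify({
        timestamp: new Date().toISOString(),
        content1: counters.content1,
        content2: counters.content2
    });
    var signature = nacl.sign.detached(Buffer.from(body, 'utf8'), keypair.secretKey);
    return {
        body: body,
        headers: {
            'Content-Type': 'application/json',
            'X-Fbtrex-PublicKey': bs58.encode(Buffer.from(keypair.publicKey)),   // hypothetical header
            'X-Fbtrex-Signature': bs58.encode(Buffer.from(signature))            // hypothetical header
        }
    };
}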

CSV download is broken


12124 ۞  ~/Downloads cat feed-100014305273231.csv 
"savingTime","id","type","timelineId","publicationUTime","postId","permaLink"
"2017-04-14T08:41:26.817Z","c7619be84ae3ddb88715b67b393864ad09c4168e","feed","299e771127cbc6e4f5093488c40c1bb3c8f7c241","2017-04-12T12:06:43.000Z","",""

Document the semantic analysis, the semantic database and the RSS database

We use https://dandelion.eu as a third party and sponsor of fbTREX: they receive pieces of text and return a semantic analysis, extracting the Wikipedia pages related to the content.

This permits us to do keyword research and indexing of what is on Wikipedia; allegedly (and enforced by community standards), only content with encyclopedic relevance is there.

Workflow

  1. An online user connects to /feeds/something.xml
  2. If "something" is a valid keyword, it is inserted as a subscribed feed in the collection feeds (see the sketch below)
  3. A scheduled task, rss-composer, looks at all the existing feeds, checks the most recently imported labels, and if they match, appends the new items to the feed
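
A minimal sketch of step 2, matching the feeds object documented below (function name and driver wiring are assumptions):

var crypto = require('crypto');

// derive a stable feed id from the keyword and upsert the subscription
function subscribeFeed(mongodb, keyword) {
    var id = crypto.createHash('sha1').update(keyword).digest('hex');
    return mongodb.collection('feeds').updateOne(
        { id: id },
        { $setOnInsert: { id: id, insertAt: new Date(), labels: [ keyword ], created: false } },
        { upsert: true }
    ).then(function() { return id; });
}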

mongodb collections, format and naming

labels

mongodb indexes

{  semanticId: 1  }, 
{  when: 1 }

object:

{
    "semanticId" : "61d87231dd11b414ca333c6528732ed75303150d",
    "lang" : "en",
    "when" : ISODate("2019-01-22T09:32:25.514Z"),
    "l" : [ 
        "Deep learning", 
        "Artificial intelligence", 
        "Curve fitting"
    ],
    "matchmap" : [ 0 ],
    "textsize" : 77
}

semantics

mongodb indexes

{  semanticId: 1  }, 
{  label: 1 }

object:

{
    "semanticId" : "b37245be6425ff066866786940775eb6848ab3eb",
    "spot" : "prometió",
    "confidence" : 0.83,
    "title" : "Prometio",
    "wp" : "http://es.wikipedia.org/wiki/Prometio",
    "label" : "Prometio"
}

feeds

mongodb indexes:

{ "id" : 1, "unique" : true }

object:

{
    "id" : "e362945dba9eeacc93fdeda926f3167a2f0be59a",
    "insertAt" : ISODate("2019-01-22T10:28:44.701Z"),
    "labels" : [ 
        "Dudeism"
    ],
    "created" : false
}
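
For reference, a sketch of how the indexes above could be created from the mongo shell (collection names as documented):

db.labels.createIndex({ semanticId: 1 });
db.labels.createIndex({ when: 1 });
db.semantics.createIndex({ semanticId: 1 });
db.semantics.createIndex({ label: 1 });
db.feeds.createIndex({ id: 1 }, { unique: true });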

flatmap-stream dependency is malicious

Currently we depend on flatmap-stream, which contains a malicious payload that looks for cryptocurrency wallets and tries to grab them (Link).

The dependency on flatmap-stream is introduced by nodemon, which is at v1.8.3, but it seems it has been fixed in v1.8.7.
So bumping the version should probably fix it.
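
A sketch of the fix in package.json (assuming nodemon sits in devDependencies; the version number is the one named in this issue):

{
  "devDependencies": {
    "nodemon": "^1.8.7"
  }
}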

supporting more keys for the same user

Restructure the onboarding API in the backend:
support more keys per user, communicate them to the users via the web, and evaluate with @vrde how to signal and communicate these conditions:

  • when a user deletes localStorage
  • when Firefox/Chrome syncing is used
  • whether we can communicate a condition like "we know your user but you are in a new browser, therefore..."

restore/redesign the metadata extraction and error management

An important feature should be:

  1. see the original post from Facebook
  2. see how we process it and which metadata has been extracted from it
  3. access this evidence when a new error in the parser arises

At the moment we are facing a parser refactor and a rebuild of the database; this issue is to keep track of the progress.

Can't run the project locally on first install

Following the readme, after doing npm run watch I get:

> [email protected] watch /home/me/dev/facebook
> nodemon --config config/nodeamon.json app

[nodemon] 1.17.4
[nodemon] reading config ./config/nodeamon.json
[nodemon] to restart at any time, enter `rs`
[nodemon] or send SIGHUP to 3847 to restart
[nodemon] ignoring: sections/webscripts/*.js
[nodemon] watching: lib/*.js app.js sections/*.pug sections/*/*.pug dist/**
[nodemon] watching extensions: js,css,pug
[nodemon] starting `node app app.js`
[nodemon] forking
[nodemon] child pid: 3858
[nodemon] watching 367 files
ઉ nconf loaded, using config/settings.json
/home/me/dev/facebook/app.js:31
    throw new Error("Rename config/settings.example to config/settins.json, and read the content");
    ^

Error: Rename config/settings.example to config/settins.json, and read the content
    at Object.<anonymous> (/home/me/dev/facebook/app.js:31:11)
    at Module._compile (internal/modules/cjs/loader.js:678:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:689:10)
    at Module.load (internal/modules/cjs/loader.js:589:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:528:12)
    at Function.Module._load (internal/modules/cjs/loader.js:520:3)
    at Function.Module.runMain (internal/modules/cjs/loader.js:719:10)
    at startup (internal/bootstrap/node.js:228:19)
    at bootstrapNodeJSCore (internal/bootstrap/node.js:576:3)
[nodemon] app crashed - waiting for file changes before starting...

Suggestions support

The goal is to permit viewers to rapidly interact with the project, in order to get volunteers and hints.

Through the presence of [a text], pages will open a modal asking the visitor to fill in some questions, implemented in dist/js/suggest.js

implement signature verification of submitted posts

in lib/events.js

        .then(function(supporter) {
            if(!supporter || !_.isInteger(supporter.userId)) {
                debug("UserId %d not found: *recording anyway*", headers.supporterId);
                // throw new Error('user lookup - userId not found');
                return {
                    'userId': headers.supporterId,
                    'protocolViolation': true
                }
            } else {
                debug("UserId %d found", headers.supporterId);
                return supporter;
            }

            // console.log(_.size(req.rawBody));
            // debug("%d", _.size(supporter.publicKey));
            // if (signer.verify('NaCL is amazing!', signature, publicKey))
            // raise error if fail -- work in progress
        })

This piece of code is a patchwork full of holes, made to get the backend working with the web-extension.
The base58 and NaCL sequence gave me some problems, and I haven't yet found the need to make it a priority, but it would be very wise to have it.
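
A minimal sketch of what the missing verification could look like, assuming tweetnacl and bs58 (the exact encodings used by the web-extension would need to be checked):

var nacl = require('tweetnacl');
var bs58 = require('bs58');

// message: raw request body; signature and publicKey: base58-encoded strings
function verifySignature(message, signature, publicKey) {
    return nacl.sign.detached.verify(
        Buffer.from(message, 'utf8'),
        bs58.decode(signature),
        bs58.decode(publicKey)
    );
}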

fix and review parserv ELK format

Conditions where postId fails to be extracted should be managed differently.

org.elasticsearch.index.mapper.MapperParsingException: failed to parse [errors]
[...]
Caused by: java.lang.IllegalArgumentException: For input string: "postId"

visual feedback for parsers

  • implement CSS that mimics the original Facebook styling
  • have a visualization reporting the extracted metadata and the original post (with the style mentioned above)
  • a button that permits reporting a wrong parsing to the system
  • a visualization that surfaces posts with unexpected conditions

support localized splashpage

  • find a more reliable geoip
  • test a clean Jade include pattern to support different splash screens
  • test out the suggestion workflow
  • have a good example
  • have a list of implemented "declinations", each with a link to see it

implement and use logging with ELK

This is a multipurpose issue, in collaboration with @joxer

  • test sending messages to the ELK system log.tracking.exposed
  • test the visualization of the ELK system
  • integrate the logging into the different components of the pipeline.

Edit: at the end of this issue, we should have these logs working:

  • adopters handshake
  • events received by web-extension
  • page navigation info
  • parsers errors
  • parsers statistics on metadata extraction
  • mongodb auditlog
  • anomalies (they require investigation before they expire)

make the whole project status accessible via web

At the moment, if you open https://facebook.tracking.exposed/project online you'll see only two bullet points (glossary and .pdf download). But these are the files that should be listed: https://github.com/tracking-exposed/facebook/tree/master/sections/project

The files "problem" "solution" and "details" should be fixed. there are some unprintable chars which should become a ' or "

And the links, present in the .pdf as footnotes, should become clickable links. This is because the document was written as a shared doc, and the conversion to HTML is a bit tedious.

Implement a stronger authentication for fbtrex users

At the moment, the URL used by the user to access their own data:

  • can be guessed from the outside
  • never changes

Although it contains no private information, it is debatable whether it represents sensitive information. It is important to keep this data accessible only to the legitimate user, who may, if they want, share the link with someone they trust. The user should be able to revoke the access token.
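
A hypothetical sketch of a revocable per-user token (names are assumptions):

var crypto = require('crypto');

// a fresh random token per supporter; revoking it means generating a new
// one and overwriting it in the supporter record
function issueAccessToken() {
    return crypto.randomBytes(20).toString('hex');
}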

reminder, refactor

When the rewriting of the code is planned, remember:

  • socket.io, realtime updates
  • split of the data storage (elasticsearch, graph-js, mongo, neo4j?)
  • component splitting, a pipeline approach, functional definitions as much as possible
  • dockerization of every component
  • interpreter (Python or Node) capability tuning and limits
  • kernel security capability limits
  • logging of every data-management operation
  • description of the data structures copied across the different backends

metadata extension in db.timelines

@vrde I was thinking that the metadata extension on the posts might be asynchronous.
The best way to keep them is in a list of objects, one object per metadatum, identified through the object's label:

{
    order: xx,
    displayTime: yy,
    type: "promoted" | "feed",
    meta: [ ]
}

inside of meta:

{ label: 'via',
  userId: 212121,
  displayName: 'Claudio A.' },
{ label: 'title',
  ... },
{ label: 'friendactivity',
  friend: '3232321',
  what: 'comment' | 'like' | 'reaction' },
{ label: 'likes',
  count: 10 },
{ label: 'comments',
  count: 15 }

Basically, what is parsed goes in 'meta'. The timeline object contains just the basic placement info (which position, when); even time and postId are in meta, because promoted posts don't have them.

redesign realitycheck webapp [transition task]

In the near future we will have a professional UX designer, web designer, and d3.js developers working on the webapp currently known as realitycheck. In this issue I keep track of the changes that would permit a smooth transition, making it possible to:

  • display the new metadata collected
  • perform some basic filtering operation
  • download your data

UX research and improvement on realitycheck page

Now the @Mitch90 visualisation is in place: the realitycheck page at https://facebook.tracking.exposed/realitycheck/100013436260185 answers with the new viz. We can imagine this will be the page where users interact the most, especially users that don't know what the filter bubble really is.

How can a UX interaction test be performed? How can we improve the communication about the project itself?

I think using .jade snippets we can manage new links and new text, appearing as a feed, and share access to that feed along with related information.

As for the look, I was thinking:

  • to make the number of timelines depend on the screen size, with the number of posts per timeline being a "mandatory" number: if a timeline has fewer posts than required, it is ignored and an older one is taken instead.
  • lines connecting the rectangles

restore stats | impact

The alpha /impact is hard to restore. c3 has been used in a sub-optimal way, so it is better to define which stats can be done and rewrite these APIs/viz.

Users, adoption, and page access can be separated from the others (timelines, html, impression), and then metadata for html.

These might be three different graphs, restructured properly. I'll document the progress here.

Missing license

I see there's GLP-2.0 (please note the typo) in package.json but there's no COPYING or LICENSE file.

improve robustness of ID in echoes.js

  lib:echoes ES logging ID 154777370644111870 in [parserv] +28ms
  lib:echoes ES logging ID 154777370644160860 in [parserv] +0ms
  lib:echoes ES logging ID 154777370644142820 in [parserv] +0ms
  lib:echoes ES logging ID 154777370644190140 in [parserv] +0ms
  lib:echoes ES logging ID 154777370644118140 in [parserv] +0ms

This is the log; the IDs should be more random and smaller. Sadly we can't use hashes.
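
One possible direction, a sketch only (not a decided design): shorter random identifiers from Node's crypto module, which are not derived from the content the way a hash would be:

var crypto = require('crypto');

// 6 random bytes -> 12 hex chars: smaller and more random than the current IDs
function makeLogId() {
    return crypto.randomBytes(6).toString('hex');
}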

Validate endpoint fails when user has a vanity url

If the user has a vanity URL for their profile, the permalink does not contain ?id= and the request fails.

This behaviour can be reproduced with the following command:

curl -X POST -H "Content-Type: application/json"  -d '{
  "html": "",
  "userId": "818251551",
  "permalink": "https://www.facebook.com/agranzot/posts/10154799187231552",
  "publicKey": "HDtoJqDMDsP7hJJqevWGsdhirWaydLzv6XEuU3ofo58i"
}' "http://localhost:8000/api/v1/validate"

string before "CR" is not displayed when reading RSS

We need to fix these lines:

const fbtrexRSSplaceholder = "Welcome, you should wait 10 minutes circa to get this newsfeed populated, now the subscription is taken in account. " + CR + "fbTREX would stop to populate this feed if no request is seen in 5 days. updates would be automatic. You can find more specifics about the RSS settings in [here todo doc]";

const fbtrexRSSdescription = "This newsfeed is generated by the distributed observation of Facebook posts, collected with https://facebook.tracking.exposed browser extension; The posts are processed with a technical proceduce called semantic analysis, it extract the core meanings of the post linkable to existing wikipedia pages";

const fbtrexRSSproblem = "We can't provide a newsfeed on the information you requested. This is, normally, due because you look for a keyword which has not been seen recently. We permit to generate RSS only about voices which are part of wikipedia because this ensure we do not enable any kind of stalking. (editing wikipedia would not work). You can really use only label which are meaningful encyclopedic voices.";

crash

$ npm run watch

[email protected] watch /home/user/src/facebook/facebook
nodemon --config config/nodeamon.json app

[nodemon] 1.18.3
[nodemon] reading config ./config/nodeamon.json
[nodemon] to restart at any time, enter `rs`
[nodemon] or send SIGHUP to 10358 to restart
[nodemon] ignoring: sections/webscripts/*.js
[nodemon] watching: lib/*.js app.js sections/*.pug sections/*/*.pug dist/**
[nodemon] watching extensions: js,css,pug
[nodemon] starting `node app app.js`
[nodemon] forking
[nodemon] child pid: 10370
[nodemon] watching 272 files
/home/user/src/facebook/facebook/node_modules/pug-parser/index.js:59
throw err;
^

Error: /home/user/src/facebook/facebook/lib/../sections/talks/landing.pug:50:1
48| a(href="https://elezioni.tracking.exposed") https://elezioni.tracking.exposed
49| | some

50| b English
--------^
51| | blogpost commenting on the analysis are:
52| a(href="https://medium.com/@trackingexposed/facebook-tracking-exposed-background-80e0f72e615f") Methodology/Background
53| |,

unexpected token "indent"
at makeError (/home/user/src/facebook/facebook/node_modules/pug-error/index.js:32:13)
at Parser.error (/home/user/src/facebook/facebook/node_modules/pug-parser/index.js:53:15)
at Parser.parseExpr (/home/user/src/facebook/facebook/node_modules/pug-parser/index.js:264:14)
at Parser.block (/home/user/src/facebook/facebook/node_modules/pug-parser/index.js:996:25)
at Parser.tag (/home/user/src/facebook/facebook/node_modules/pug-parser/index.js:1157:24)
at Parser.parseTag (/home/user/src/facebook/facebook/node_modules/pug-parser/index.js:1049:17)
at Parser.parseExpr (/home/user/src/facebook/facebook/node_modules/pug-parser/index.js:208:21)
at Parser.block (/home/user/src/facebook/facebook/node_modules/pug-parser/index.js:996:25)
at Parser.tag (/home/user/src/facebook/facebook/node_modules/pug-parser/index.js:1157:24)
at Parser.parseTag (/home/user/src/facebook/facebook/node_modules/pug-parser/index.js:1049:17)
at Parser.parseExpr (/home/user/src/facebook/facebook/node_modules/pug-parser/index.js:208:21)
at Parser.block (/home/user/src/facebook/facebook/node_modules/pug-parser/index.js:996:25)
at Parser.tag (/home/user/src/facebook/facebook/node_modules/pug-parser/index.js:1157:24)
at Parser.parseTag (/home/user/src/facebook/facebook/node_modules/pug-parser/index.js:1049:17)
at Parser.parseExpr (/home/user/src/facebook/facebook/node_modules/pug-parser/index.js:208:21)
at Parser.block (/home/user/src/facebook/facebook/node_modules/pug-parser/index.js:996:25)
[nodemon] app crashed - waiting for file changes before starting...

Export data in RSS format

As suggested by some people on the internet, RSS is a format for sharing updates, and one that can be resurrected. The first use case is to let a person fetch their own data in RSS format from their personal page. This issue will be addressed during the mozilla/global-sprint#308 mozsprint.

code refactor for version 1.0

The branch where development is happening is: https://github.com/tracking-exposed/facebook/tree/yS
Below are the two priority tasks, as discussed with @rugantio, @fedebarba, and @joxer:

P0 - Show data on realitycheck

  1. Have robust parsers (me and @fedebarba)
  • Maintain an acceptable QOS (this should be measured with a better interface than the current one)
    • 80% of posts in a month are collected well
    • SLA of 72 hours on a breaking parser
  2. Redo the part of the site that shows the data (graphic part, stripped down)
  • Change the reality check system and focus on interactive tables
  • search / sort / filter
  • at the moment the problem lies in the heaviness of the query and the answer, so the API must be redone
  3. Complete the separation of the 3 services, integrate system logs.
  • split the static web pages, the collector, and the API

P1 - Having constant logging and showing transparency to users

  1. Having a monitoring and logging infrastructure (ELK, currently in testing on log.tracking.exposed)
  2. Something custom that reads from elasticsearch with statistics

Next steps (still to be documented/addressed)

P2 - Having batch processes that produce the aggregated data for users

P3 - allow those who "create a research group" to see aggregate data of the users who have opted into their group.

P4 - allow users to share portions of their timeline (so as to make comparisons between them)

Application start problem without a Database

The application can start even if mongo is not running: it starts without displaying any error.
We can adopt two strategies:

  • Don't let the application start when there's an error
  • Log every error

I would say the best option is the first one, but I leave the decision open for the team.
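
A minimal sketch of the first option, assuming the 3.x mongodb driver (function name and wiring are assumptions):

var MongoClient = require('mongodb').MongoClient;

// refuse to start when the database is unreachable, instead of running silently
function connectOrDie(uri) {
    return MongoClient.connect(uri)
        .then(function(client) { return client.db(); })
        .catch(function(error) {
            console.error("Cannot reach MongoDB, refusing to start:", error.message);
            process.exit(1);
        });
}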
