ooni / probe
OONI Probe network measurement tool for detecting internet censorship
Home Page: https://ooni.org/install
License: BSD 3-Clause "New" or "Revised" License
We would like to have a DFD of how the data flow works in the mlab data pipeline.
That is: once a report is stored locally in our ooni backend collector, how does it end up published on cloud storage and big table?
@stephen-soltesz can you take care of this?
Lovely OONI people, README.md did not recommend which trac to use, so here I am.
I have been playing with porting some personal testing scripts to the framework and have come to appreciate a limitation in doRequest. It seems that unless a responding server returns a very specific set of headers, twisted records response.body as an empty string. After hours of nginx annoyance, I found by running a simple node.js server that at least 'Content-Type', e.g. 'text/html; charset=iso-8859-1', had to be set (if not 'Trailer' or others). This affects http_requests.py, as a substantial number of sites return empty bodies over both connections.
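To make the report above easier to reproduce, here is a minimal sketch of a local test server that sends the same body with and without a Content-Type header. It is written against the Python 3 standard library for convenience (ooni-probe itself targeted Python 2 and Twisted at the time); the `/typed` path, the `Handler` class and `start_server` are hypothetical names, not part of ooni-probe.

```python
# Hypothetical reproduction server; not part of ooni-probe.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<html><body>hello</body></html>"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        # Only the /typed path announces a Content-Type; every other
        # path returns the identical body without that header.
        if self.path == "/typed":
            self.send_header("Content-Type", "text/html; charset=iso-8859-1")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging


def start_server(port=0):
    """Serve on a background thread; port 0 asks the OS for a free port."""
    server = HTTPServer(("127.0.0.1", port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Pointing doRequest at both paths of such a server should show whether the empty response.body depends on the header alone.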
Jake reports that ooni is ignoring test .deck logfile paths.
M-Lab's name system is ready for Ooni usage.
Synopsis
The Role Definitions currently have many FIXME comments for Reliances.
Close Criteria
This ticket should be closed when every role's reliances are specified. The Organizational Reliances are a set of statements of the form: "Role X relies on Role Y for ..."
Related Issues
Synopsis
There are Unincorporated Impacts below the new Impacts table.
Close Criteria
d@d:~/ooni-probe$ ./bin/ooniprobe nettests/blocking/dnstamper.py -f hosts.txt
WARNING: Failed to execute tcpdump. Check it is installed and in the PATH
Log opened.
[D] No test deck detected
[D] processing options
[D] Checking if backend is present
[D] Checking if file is present
Starting Tor...
[D] Setting control port as 26177
[D] Setting SOCKS port as 45954
[D] 10%: Finishing handshake with directory server
[D] 15%: Establishing an encrypted directory connection
[D] 20%: Asking for networkstatus consensus
[D] 25%: Loading networkstatus consensus
[D] 45%: Asking for relay descriptors
[D] 50%: Loading relay descriptors
[D] 53%: Loading relay descriptors
[D] 57%: Loading relay descriptors
[D] 61%: Loading relay descriptors
[D] 64%: Loading relay descriptors
[D] 68%: Loading relay descriptors
[D] 72%: Loading relay descriptors
[D] 76%: Loading relay descriptors
[D] 80%: Connecting to the Tor network
[D] 85%: Finishing handshake with first hop
[D] 90%: Establishing a Tor circuit
[D] 100%: Done
[D] Building a TorState
Successfully bootstrapped Tor
[D] We now have the following circuits:
[D] * <Circuit 1 BUILT [194.109.206.212] for GENERAL>
[D] * <Circuit 2 BUILT [88.198.100.230] for GENERAL>
[D] * <Circuit 3 BUILT [195.242.152.250] for GENERAL>
[D] * <Circuit 4 BUILT [195.191.16.63] for GENERAL>
[D] * <Circuit 5 BUILT [91.206.27.30] for GENERAL>
[D] * <Circuit 6 BUILT [94.126.178.1] for GENERAL>
[D] * <Circuit 7 BUILT [188.40.32.154] for GENERAL>
[D] * <Circuit 8 LAUNCHED [] for GENERAL>
[D] * <Circuit 9 BUILT [87.106.249.118] for GENERAL>
[D] * <Circuit 10 BUILT [31.172.30.4] for GENERAL>
[D] * <Circuit 11 BUILT [176.65.109.60] for GENERAL>
[D] * <Circuit 12 BUILT [173.246.82.97] for GENERAL>
[D] * <Circuit 13 BUILT [178.86.31.41] for GENERAL>
[D] * <Circuit 14 BUILT [37.130.227.133] for GENERAL>
[D] * <Circuit 15 BUILT [173.254.216.69] for GENERAL>
[D] * <Circuit 16 BUILT [80.237.226.75] for GENERAL>
[D] * <Circuit 17 BUILT [31.172.30.2] for GENERAL>
[D] * <Circuit 18 BUILT [31.172.30.1] for GENERAL>
[D] * <Circuit 19 BUILT [85.25.108.113] for GENERAL>
[D] * <Circuit 20 BUILT [204.45.185.164] for GENERAL>
[D] * <Circuit 21 BUILT [62.141.42.149] for GENERAL>
[D] * <Circuit 22 BUILT [37.130.227.134] for GENERAL>
[D] * <Circuit 23 BUILT [85.214.73.63] for GENERAL>
[D] * <Circuit 24 LAUNCHED [] for GENERAL>
[D] * <Circuit 25 BUILT [166.70.154.130] for GENERAL>
[D] * <Circuit 26 BUILT [193.10.227.195] for GENERAL>
[D] * <Circuit 27 BUILT [85.25.110.235] for GENERAL>
[D] * <Circuit 28 BUILT [68.169.35.102 5.135.176.63 46.167.245.50] for GENERAL>
[D] * <Circuit 29 EXTENDED [68.169.35.102 83.212.98.169] for GENERAL>
[D] * <Circuit 30 EXTENDED [68.169.35.102] for GENERAL>
[D] Obtained our IP address from a Tor Relay None
[D] Running [(<class 'nettests.blocking.dnstamper.DNSTamperTest'>, 'test_a_lookup')]
[D] Options {'inputs': <ooni.nettest.inputProcessorIterator object at 0xa8280cc>, 'version': '0.4', 'name': 'DNS tamper'}
[D] cmd_line_options {'pcapfile': None, 'help': 0, 'subargs': ('-f', 'hosts.txt'), 'resume': 0, 'parallelism': '10', 'test': 'nettests/blocking/dnstamper.py', 'logfile': None, 'collector': None, 'reportfile': None}
[D] testsEnded: Finished running all tests
Unhandled error in Deferred:
Unhandled Error
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 489, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 576, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/d/ooni-probe/ooni/oonicli.py", line 107, in runTestList
    d1 = runner.runTestCases(test_cases, options, cmd_line_options)
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1214, in unwindGenerator
    return _inlineCallbacks(None, gen, Deferred())
--- <exception caught here> ---
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 1071, in _inlineCallbacks
    result = g.send(result)
  File "/home/d/ooni-probe/ooni/runner.py", line 416, in runTestCases
    oonib_reporter = OONIBReporter(cmd_line_options)
  File "/home/d/ooni-probe/ooni/reporter.py", line 271, in __init__
    from ooni.utils.txagentwithsocks import Agent
  File "/home/d/ooni-probe/ooni/utils/txagentwithsocks.py", line 15, in <module>
    from twisted.internet.endpoints import TCP4ClientEndpoint, SSL4ClientEndpoint, _WrappingProtocol, _WrappingFactory
exceptions.ImportError: cannot import name _WrappingProtocol
Main loop terminated.
Twisted version:
d@d:~/ooni-probe$ ./bin/ooniprobe nettests/blocking/dnstamper.py --version
WARNING: Failed to execute tcpdump. Check it is installed and in the PATH
Log opened.
[D] No test deck detected
Twisted version: 12.3.0
The _WrappingProtocol class is no longer present in the latest stable version of Twisted (12.3.0):
https://twistedmatrix.com/documents/current/api/twisted.internet.endpoints.html
https://twistedmatrix.com/documents/12.1.0/api/twisted.internet.endpoints.html
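One way to turn this kind of failure into a clear startup message, rather than an unhandled error after Tor has bootstrapped, is to check for the needed names up front. The following is a minimal sketch under that assumption; `missing_names` and `check_twisted_endpoints` are hypothetical helpers, and the list of names is taken from the import line in the traceback.

```python
import importlib


def missing_names(module_name, names):
    """Return the names that module_name does not expose.

    If the module itself cannot be imported, every name counts as missing.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return list(names)
    return [name for name in names if not hasattr(module, name)]


# Names imported by ooni/utils/txagentwithsocks.py according to the traceback.
REQUIRED = [
    "TCP4ClientEndpoint",
    "SSL4ClientEndpoint",
    "_WrappingProtocol",
    "_WrappingFactory",
]


def check_twisted_endpoints():
    """Exit early with a readable message if the installed Twisted is incompatible."""
    missing = missing_names("twisted.internet.endpoints", REQUIRED)
    if missing:
        raise SystemExit(
            "Incompatible Twisted: twisted.internet.endpoints lacks "
            + ", ".join(missing)
        )
```

Relying on underscore-prefixed names such as _WrappingProtocol is fragile by nature, since private names can change between releases; the check only makes the breakage visible sooner.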
Synopsis
The Threat Taxonomy already includes injection attacks, but it is specific to data within reports. There is also a potential threat of injection attacks in non-report data, such as in HTTP server logs of probe target webservers.
Close Criteria
It may be useful for a collector to specify which inputs it supports. This allows the creation of topic-specific collectors (e.g. a collector for news websites, a collector for blogs, a collector for Italy, etc.).
When an input is not supported by the specified collector, we should output a message and prompt the user to choose an alternative collector.
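On the probe side, the check described above could look roughly like the sketch below. It assumes the probe already holds the collector's list of supported inputs; how that list is fetched is not specified here, and `unsupported_inputs` and `check_collector` are hypothetical names.

```python
def unsupported_inputs(inputs, supported):
    """Return the inputs the collector does not list as supported, in order."""
    supported = set(supported)
    return [item for item in inputs if item not in supported]


def check_collector(inputs, supported, collector_name):
    """Return None if every input is supported, else a message for the user."""
    rejected = unsupported_inputs(inputs, supported)
    if not rejected:
        return None
    return (
        "Collector %s does not support %d input(s): %s. "
        "Please specify an alternative collector."
        % (collector_name, len(rejected), ", ".join(rejected))
    )
```

A caller would print the returned message and prompt for another collector whenever the result is not None.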
https://github.com/TheTorProject/ooni-probe/issues/109
https://github.com/TheTorProject/ooni-probe/issues/115
https://github.com/TheTorProject/ooni-probe/issues/116
ooni-probe needs to get the addresses of test helpers and collectors. The M-Lab NS will handle listing all the test helpers running and the .onion address of the collector running on that machine.
An example of how this service works can be found here: https://mlab-ns.appspot.com/neubot?format=json.
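As a rough illustration of what the probe might do with such a response, the sketch below parses a JSON document into a collector address and a map of test helpers. The payload layout (`collector`, `test_helpers`), the helper names, and the addresses are all assumptions made up for this example; the actual mlab-ns response format for ooni is not given here.

```python
import json

# Hypothetical payload; field names and values are illustrative only.
SAMPLE = """
{
  "fqdn": "ooni.example.net",
  "collector": "httpo://exampleonionaddress.onion",
  "test_helpers": {
    "tcp-echo": "203.0.113.5:80",
    "http-return-json-headers": "203.0.113.6:80"
  }
}
"""


def parse_ns_response(text):
    """Return (collector_address, {helper_name: (host, port)})."""
    data = json.loads(text)
    helpers = {}
    for name, address in data.get("test_helpers", {}).items():
        # Split on the last colon so the port is always the final field.
        host, _, port = address.rpartition(":")
        helpers[name] = (host, int(port))
    return data.get("collector"), helpers
```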
We are making the assumption that no test requires more than one test helper that binds to the same port (i.e. the HTTP Return JSON Header and TCP Echo test helpers cannot both be used at the same time in a nettest).
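The assumption can be checked mechanically for any given nettest. The sketch below is one way to do it; `port_conflicts` is a hypothetical helper, and the helper names and port numbers in the usage note are illustrative rather than taken from the actual deployment.

```python
from collections import defaultdict


def port_conflicts(required_helpers, helper_ports):
    """Return {port: [helper, ...]} for ports needed by more than one helper."""
    by_port = defaultdict(list)
    for name in required_helpers:
        by_port[helper_ports[name]].append(name)
    return {port: names for port, names in by_port.items() if len(names) > 1}
```

For example, with `{"http-return-json-headers": 80, "tcp-echo": 80, "dns": 53}` as the port map, a nettest requiring both HTTP Return JSON Header and TCP Echo would be reported as conflicting on port 80, while one requiring TCP Echo and DNS would not.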
Specify the procedure for moving to a new version of a nettest and test helper, and the mechanism for a server to notify a client about obsolete or dangerous tests/helpers.
Ooni is in the M-Lab data pipeline.
An "out-of-spec message" is a protocol message that is not valid according to the relevant protocol specification (e.g. RFC 2616 for HTTP).
Despite Postel's principle ("Be generous in what you accept"), we know that in practice network software often fails to be robust against unexpected messages. If the request is not designed to be an attack and has no malicious payload, the bad effects are normally limited to denial of service to other clients. Let's call a network component (origin server, proxy, middlebox) "fragile" if receiving an out-of-spec message causes it to fail to give correct service to other clients.
The policy question for this issue is whether Ooni-probe 1.0 should ship with net-tests that perform out-of-spec requests. (Whether Ooni-b can relay out-of-spec requests is also important, but not in the scope of this issue.)
Of the candidate tests in issue #89, the only one that sends out-of-spec messages according to its description seems to be "HTTP Invalid Request Line".
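To make "out-of-spec" concrete for the request-line case, here is a simplified sketch of a check against the RFC 2616 Request-Line shape (Method SP Request-URI SP HTTP-Version CRLF). It validates the method as an RFC 2616 token and the version as digits, but it does not validate the Request-URI grammar, so it is an approximation and not a full conformance check. The function name is hypothetical.

```python
import re

# RFC 2616 "token": any CHAR except control characters and separators.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"

# Method SP Request-URI SP HTTP-Version CRLF, with the URI only required
# to be a run of non-whitespace characters (a simplification).
REQUEST_LINE = re.compile("(" + TOKEN + r") (\S+) HTTP/(\d+)\.(\d+)\r\n")


def is_well_formed_request_line(line):
    """Return True if line has the overall shape of an RFC 2616 Request-Line."""
    return REQUEST_LINE.fullmatch(line) is not None
```

A test in the "HTTP Invalid Request Line" family would, by design, send lines for which a check like this returns False.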
Here are the arguments I see against including such net-tests:
a) The operators of network components affected by out-of-spec messages may view them as attacks.
Note that many countries, including Western liberal democracies, have computer misuse laws that cover activities that are perceived to be exploiting bugs in network software. It may be difficult for someone suspected under such a law to defend themself, especially if the suspicion draws attention to their other activities. There have been cases where people were convicted under computer misuse laws for sending out-of-spec messages (or messages that were perceived to be out-of-spec), even when there is considerable doubt that they intended to perform an attack, e.g. http://www.theregister.co.uk/2005/10/11/tsunami_hacker_followup/. The risk in countries where the rule of law is less consistently applied is only likely to be worse.
It's also possible that other network users could be misidentified as having
originated the probe, and similarly be viewed as attackers.
b) Effects on fragile network components can deny service to other network users.
Suppose, for instance, that a censoring proxy fails because it receives an out-of-spec message. It may fail "open" (i.e. let through subsequent requests) or "closed" (i.e.
fail to correctly relay subsequent requests). If it fails closed, then the probe has
had the effect of making the censorship worse for other network users, at least in
the short term, which is obviously counterproductive.
This can happen whether or not the fragile component was part of a system
of network interference. The description of "HTTP Invalid Request Line" seems
to make the implicit assumption that only "interfering" network components are
likely to be fragile. This is wrong; transparent/caching HTTP proxies,
firewalls that are not intended to be interfering, and origin servers, can also
realistically be fragile.
c) Effects on fragile network components can result in misleading measurements.
In principle, any active network test can change the behaviour of the network.
In fact such changes are one of the things we want to measure! However, if the
intent is to measure network behaviour that could in principle have been
encountered by non-test clients in normal operation, that could result in
misleading overreporting of network interference.
Documenting these problems for specific tests only partially addresses point a), since a user can't really be expected to have enough information to determine their risk of being viewed as an attacker. It does not address points b) and c) at all.
If ooni-probe supports running such tests but the results are not stored by a given
collector (e.g. MLab's collectors), that would address point c) for that collector,
but not points a) and b).
(There are some other net-tests that send atypical messages that are clearly in-spec, but that seems to be much less of an issue; other client software will occasionally send messages that are atypical in the same way.)
It may be useful for a collector to specify which inputs it supports. This allows the creation of topic-specific collectors (e.g. a collector for news websites, a collector for blogs, a collector for Italy, etc.).
When an input is not supported by the specified collector, we should output a message and prompt the user to choose an alternative collector.
https://github.com/TheTorProject/ooni-probe/issues/114
https://github.com/TheTorProject/ooni-probe/issues/115
https://github.com/TheTorProject/ooni-probe/issues/116
The Makefile in ooni-probe/inputs/ points at the file https://ooni.torproject.org/inputs/input-pack.tar.gz, but this is 404'd.
Measurement Lab and Least Authority complete a privacy audit of the data probes submit and M-Lab publishes, and we limit our initial release to the safe subset.
As we agreed in #107, we should assess how much code coverage our unit tests reach (using a tool like coverage) and possibly integrate it with coveralls.io.
Here are some suggestions by @nathan-at-least
Some handy automated tools are:
API documentation generator from python doc strings - so that anyone can browse the names and intent of particular tests.
Coverage analysis - see coverage which can generate html reports of which lines of application code are exercised by unit tests. This is a quick way to notice untested portions of code.
Test Bots - Setting up a bot to run unit tests then generate an html report for various revisions and platforms can quickly show regressions.
Synopsis
We need different terminology for these distinct kinds of statements:
Close Criteria
This issue can be closed when both of these are satisfied:
Details
Legacy: The wiki already uses this terminology:
Suggested terminology:
Produce a specification/design document for ooni-backend.
Close this ticket with a yes / no.
The MLab initialize.sh script for Ooni selects which test helpers bind to a given port randomly. The requirement is for the same port to provide multiple distinct test helpers, so the current strategy is to partition the MLab slices (and thus IP addresses) for each port according to how many helpers require that port. The random selection accomplishes this in a stateless / configuration-free manner.
Meanwhile, the probe will use the mlab-ns web service to request test helpers and a collector prior to running a net-test. This service currently responds non-deterministically (with various constraints and prioritizations such as scoring based on load).
The question is: Are these two sources of non-determinism a problem?
For scientific repeatability, randomness adds noise. For diagnostic reasons, determinism can make it simpler to understand logs or report data. For security reasons, censors might be able to game non-determinism in a way to favor particular test results. It may be that none of these concerns are strong enough (also considering the dev cost of removing the non-determinism).
If the answer is "no", there's a dev cost implication for mlab-ns which should be coordinated with MLab.
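To illustrate the trade-off, the sketch below contrasts a random per-port choice with a deterministic one that stays stateless by hashing the slice name and port. This is not how initialize.sh is implemented (its contents are not shown here); `pick_random`, `pick_deterministic`, and the slice and helper names are hypothetical.

```python
import hashlib
import random


def pick_random(helpers, rng=random):
    """Stateless random choice: different runs may pick different helpers."""
    return rng.choice(sorted(helpers))


def pick_deterministic(helpers, slice_name, port):
    """Stateless deterministic choice derived from the slice name and port.

    The same (slice_name, port) pair always yields the same helper, while
    different slices still spread across the candidate helpers.
    """
    ordered = sorted(helpers)
    key = ("%s:%d" % (slice_name, port)).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return ordered[int.from_bytes(digest[:8], "big") % len(ordered)]
```

The deterministic variant keeps the configuration-free property while making logs and reports repeatable; it does not by itself guarantee an even partition across a small number of slices.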
test issue
We have parsers for the HTTP & DNS test reports.
Write test input processors for every test that can use a URI in a meaningful way.
An ooni-probe could learn about test decks and available input lists from a collector and provide a UI for choosing and running these tests.
A collector operator would specify the experiments they would like to collect in the form of a set of test decks with accompanying input lists, and a probe operator would be able to select one of the available experiments and then perform the measurements.
#115
It would be handy if the http_requests test also recorded the IP and nickname of the Tor exit relay that the http fetch occurred over, as we would get exit scanning for free.
Tests should not stall forever.
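A common way to meet this requirement is to wrap each test in a timeout and record a distinct outcome when it fires. ooni-probe is built on Twisted, whose API is not used here; the sketch below uses asyncio from the Python 3 standard library purely to illustrate the idea, and `run_with_timeout` is a hypothetical name.

```python
import asyncio


async def run_with_timeout(test_coro, seconds):
    """Await test_coro, returning ("ok", result) or ("timeout", None)."""
    try:
        result = await asyncio.wait_for(test_coro, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for cancels the wrapped coroutine before raising.
        return ("timeout", None)
    return ("ok", result)
```

Returning an explicit "timeout" status, instead of raising, lets a report distinguish a stalled test from one that failed for other reasons.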
Are we writing for Python >= 2.6?
What if I want to use something that is new in 2.7? Should I just take the code I need and put it into utils? I ask because I want the namedtuple, OrderedDict, and Counter classes from the collections module, but the latter two classes are new in 2.7.
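One conventional answer is a small compatibility shim: import from the standard library when available and fall back to a local stand-in otherwise. The sketch below shows the pattern for Counter only; the fallback is a deliberately minimal stand-in that returns a plain dict, not a full backport, and where such a shim would live (for example under utils) is an assumption.

```python
from collections import namedtuple  # available since Python 2.6

try:
    from collections import Counter  # added in Python 2.7
except ImportError:
    def Counter(iterable):
        """Minimal stand-in for collections.Counter: a plain dict of counts."""
        counts = {}
        for item in iterable:
            counts[item] = counts.get(item, 0) + 1
        return counts


# Example use that works with either implementation.
Hit = namedtuple("Hit", ["hostname", "count"])


def most_seen(hostnames):
    """Return a Hit for the hostname that appears most often."""
    counts = Counter(hostnames)
    name = max(counts, key=lambda key: counts[key])
    return Hit(name, counts[name])
```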
Deployment works, updates work, mlab-ns works...
And notify the user.
Synopsis
The threat category Deanonymizing Data Correlation lacks definitions and examples.
Close Criteria
Implement backend rejecting probe's request for a test helper with a notification of obsolescence or risk.
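A minimal sketch of the backend-side decision is shown below, assuming the backend keeps a per-helper policy with a minimum supported version and a "dangerous" flag. The policy layout, reason strings, and function names are all hypothetical; the actual protocol between probe and backend is not specified here.

```python
def parse_version(text):
    """Turn "0.10" into (0, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def check_helper_request(helper, probe_version, policy):
    """Decide whether to grant a test helper; explain any rejection."""
    entry = policy.get(helper)
    if entry is None:
        return {"accepted": False, "reason": "unknown-helper"}
    if entry.get("dangerous", False):
        return {"accepted": False, "reason": "dangerous",
                "message": "%s is currently considered risky to run" % helper}
    if parse_version(probe_version) < parse_version(entry["min_version"]):
        return {"accepted": False, "reason": "obsolete",
                "message": "%s requires version %s or newer"
                           % (helper, entry["min_version"])}
    return {"accepted": True}
```

Comparing parsed tuples instead of raw strings avoids treating "0.10" as older than "0.4".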
Test failures should be reported correctly by the probe.
Ooni has a complete and useful threat model.
Synopsis
Issues such as #133 represent a risk not currently captured by the Threat Taxonomy so we need a new category, probably under "Resource Abuse". The category "Resource Abuse" should be renamed to "Resource Risks" to generalize it to encompass unintentional problems.
Also, the Leveraged Attacks under Resource Risks (was: Resource Abuse) in the Threat Taxonomy has a FIXME comment about unintentional DOS.
Close Criteria
Related Issues
It may be useful for a collector to specify which inputs it supports. This allows the creation of topic-specific collectors (e.g. a collector for news websites, a collector for blogs, a collector for Italy, etc.).
When an input is not supported by the specified collector, the collector should reply with an error.
The collector should also expose an HTTP API where you can download all the supported inputs.
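On the collector side, the two behaviours described above (rejecting unsupported inputs and listing the supported ones) could be sketched as follows. The class name, the JSON field names, and the use of 400 as the rejection status are assumptions for illustration; wiring this into an actual HTTP handler is left out.

```python
import json


class InputPolicy:
    """Holds the set of inputs a collector is willing to accept."""

    def __init__(self, supported_inputs):
        self.supported = frozenset(supported_inputs)

    def list_inputs(self):
        """Body for a hypothetical 'list supported inputs' endpoint."""
        return json.dumps({"inputs": sorted(self.supported)})

    def validate(self, submitted):
        """Return (status_code, json_body) for a set of submitted inputs."""
        rejected = sorted(set(submitted) - self.supported)
        if rejected:
            body = {"error": "unsupported-input", "inputs": rejected}
            return 400, json.dumps(body)
        return 200, json.dumps({"status": "ok"})
```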
This is related to: TheTorProject/ooni-probe#109
Synopsis
Currently the Role Definitions include mlab-ns Operator.
Close Criteria
Close this ticket after this role is changed to a more generic Directory Service Operator, and all wiki references are updated. Issues specific to mlab-ns should be retained as examples, so that the Threat Model is still useful for MLab.
If a backend has specified a collection policy it should enforce the policy.
Document how the ooni reporting state machine is changed, if at all.
Document how the ooni CREATE report API is changed, if at all.
test
Review the list of proposed tests for Ooni's initial release, and decide which to include, and which to defer.
Determine what should be specified -- input sets, specific tests, test decks?
Determine a syntax for how policy is specified.
Determine where (on which system/component/file) policy will be specified.
Specify any additional state between ooni-probe and ooni-backend.
The API should expose the inputs that are supported by the collector backend and the list of test decks that are curated by the collector
See also:
https://github.com/TheTorProject/ooni-probe/issues/109
https://github.com/TheTorProject/ooni-probe/issues/113
https://github.com/TheTorProject/ooni-probe/issues/114
How should a contributor get started helping out with Ooni? What documentation should they read? What are good projects for them to tackle?
Complete the list of tests that will be deployed with the initial release.
When the clock on a tor client is so wrong that tor network consensus cannot be reached, exit with a user-comprehensible error rather than hanging forever.
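One possible approach is a pre-flight check that compares the local clock against some reference timestamp and exits with a readable message when the difference is large. The sketch below assumes the reference arrives as an RFC 1123 date string (for example an HTTP Date header from a server the probe already contacts); that source, the one-hour threshold, and the function names are all assumptions, not part of tor or ooni-probe.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

MAX_SKEW_SECONDS = 3600  # hypothetical threshold


def clock_skew_seconds(reference_date, now=None):
    """Absolute difference between the local clock and a reference date string."""
    reference = parsedate_to_datetime(reference_date)
    if now is None:
        now = datetime.now(timezone.utc)
    return abs((now - reference).total_seconds())


def ensure_clock_sane(reference_date, now=None):
    """Exit with a user-readable error if the local clock is too far off."""
    skew = clock_skew_seconds(reference_date, now)
    if skew > MAX_SKEW_SECONDS:
        raise SystemExit(
            "Your system clock appears to be off by about %d minutes; "
            "Tor cannot bootstrap until it is corrected." % (skew // 60)
        )
```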
Add DNS host resolution to tls_handshake.py and input processor for a URI list.
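A rough sketch of the two pieces follows: an input processor that turns a URI list into (host, port) pairs, and a resolver that maps a hostname to addresses. It uses the Python 3 standard library and is not the tls_handshake.py code; the function names, the default port of 443, and the treatment of `#` lines as comments are assumptions.

```python
import socket
from urllib.parse import urlsplit


def uri_to_endpoint(uri, default_port=443):
    """Extract (hostname, port) from a URI or a bare host[:port] string."""
    # Bare entries such as "example.net:8443" need a leading "//" so that
    # urlsplit treats them as a network location and not as a path.
    parts = urlsplit(uri if "://" in uri else "//" + uri)
    if not parts.hostname:
        raise ValueError("no hostname in %r" % uri)
    return parts.hostname, parts.port or default_port


def process_uri_list(lines):
    """Yield (hostname, port) for each non-empty, non-comment line."""
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            yield uri_to_endpoint(line)


def resolve(host, port):
    """Return the sorted set of addresses the system resolver gives for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})
```

Note that `resolve` uses the system resolver, so in a censored network its answers may themselves be tampered with; whether that is acceptable for this test is a design decision outside this sketch.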
A guide for downloading, setting up, and using Ooni, including an explanation of data privacy and consent.