noncesense-research-lab / archival_network
Investigating the frequency of alternative blocks, reorganizations, potential double-spend attacks, selfish mining, and more.
License: MIT License
Retain the IP addresses from which each copy of a block/transaction broadcast is received. This will be helpful for studying topology and latency. It will also allow cross-referencing between blocks, though this could be indexed in other ways when operating in IP-blind mode.
Prior to incorporation into publications/dashboards, exact IP addresses should be obscured to preserve privacy.
Since our mysterious entity has ~ 50 MH/s, look at whether:
A) times with side chains correspond to losing 50 MH/s for an hour on the main chain.
-- or --
B) if the massive hashpower pouring in does NOT correspond to a loss on the main chain, which would mean that this is a different entity.
Were the ASICs from spring 2017 run on the testnet or stagenet before making an appearance in the main chain hashrate?
The bigger question is whether we should keep an eye on testnet hash rate for any surreal spikes.
When the current MAP_VPS_setup.sh script is executed on a fresh Debian install, the user map is not added to sudoers (maybe this is intentional, but configuring monerod-archive requires sudo to write in /opt).
It would be ideal if MAP_VPS_setup.sh also pulled down and placed the monerod-archive binary in the appropriate location.
MAP_VPS_setup.sh could also wget the archival daemon configuration script to create the directory and configure monerod-archive as an auto-start service, as mentioned in #36
https://monerodocs.org/interacting/monerod-reference/#testing-monero-itself
Here they give you a shoutout, but it links to https://www.noncesense.org/ - it appears your DNS is not configured to redirect www to https://noncesense.org/
Look for signatures of selfish mining
This issue will track my work to install Grafana on a VPS for the MAP project.
Install Grafana on MAP-GRAFANA machine.
Install InfluxDB
Point all the collectd instances at InfluxDB.
Create the dashboard "Nodes" and its rows. Each row will be named after the hostname of its VPS.
As noted in the text of altchain_temporal_study.py, this is a planned enhancement.
Potentially two histograms:
A) histogram of gaps between orphaned single blocks (should be natural)
B) histogram of gaps between peculiar side chains
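A minimal sketch of how either histogram could be computed, assuming we already have a list of heights at which each kind of event was observed (the heights below are made up for illustration):

```python
from collections import Counter

def gap_histogram(event_heights):
    """Histogram of height gaps between consecutive alt-block events."""
    hs = sorted(event_heights)
    gaps = [b - a for a, b in zip(hs, hs[1:])]
    return Counter(gaps)

# hypothetical heights of orphaned single blocks, not real data:
single_orphans = [100, 150, 175, 225]
print(gap_histogram(single_orphans))  # Counter({50: 2, 25: 1})
```

If the single-orphan gaps are natural, histogram A should look roughly geometric; systematic deviations in histogram B would support the artificial-side-chain hypothesis.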
Hack together a quick wiki page about how to connect to Tokyo as a remote node.
Note, the goal here is to collect data about the propagation of transactions that originate at our nodes. R&D purposes.
If you want privacy, I would suggest NOT connecting to one of our nodes operating at --log-level > 9000
There are currently multiple warnings raised for using chained indexing to assign values. This is bad practice; see the pandas docs on indexing.
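For reference, a minimal illustration of the fix: replace chained indexing with a single .loc call, as the pandas docs recommend (the column names here are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"height": [100, 101, 102], "n_copies": [3, 1, 4]})

# Bad: chained indexing such as df["n_copies"][df["height"] == 101] = 2
# raises SettingWithCopyWarning and may silently fail to assign.
# Good: one .loc call that selects rows and column together.
df.loc[df["height"] == 101, "n_copies"] = 2
print(df.loc[df["height"] == 101, "n_copies"].iloc[0])  # 2
```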
The member
// hash cash
mutable crypto::hash hash;
on struct block is intentionally omitted from its SERIALIZE directive, and so it never appears in JSON and is not archived.
See cryptonote_basic.h struct block: https://github.com/monero-project/monero/blob/ebf2818ab5f42b10745cb99d07920f3197c3d914/src/cryptonote_basic/cryptonote_basic.h#L386
Should we try to add this field to serialization and bring it into the exported block JSON?
Why do many nodes report alt chains going 20 or 30 blocks deep?
Example histogram here: https://github.com/Mitchellpkt/Monero_AltBlock_Research/blob/master/Plotting/node_586b0c5_histogram.png
It has been posited that this is due to a bug in 0.12.0.0 causing these long side chains.
Can you shed light on what could be causing this? Seems like a phenomenal waste of PoW.
If so, what fraction? Probably small.
Create a script that takes the custom monerod-archive log file as input and extracts a list of all txn hashes that show up in main-chain blocks and a list of all txn hashes that show up in alternative blocks.
Goal 1: Explore setdiff(alt_txns, main_chain_txns) to see what exactly goes on in those alt blocks…
Goal 2: Run each of the alt chain transaction hashes through RPC/get_transactions and make sure that we can retrieve the details (key image, ring members) for all alt txns. (This is important to verify soon; if alternative transactions are not available through the RPC, we must jump on modifying the patch to include transaction write-out functionality.)
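The core of Goal 1 is just a set difference. A sketch with placeholder hashes (the log-parsing step that would produce the two lists is omitted):

```python
def alt_only_txns(main_chain_txns, alt_txns):
    """setdiff(alt_txns, main_chain_txns): hashes seen only in alt blocks."""
    return set(alt_txns) - set(main_chain_txns)

# placeholder hashes, not real txn ids:
main_txns = {"aa11", "bb22", "cc33"}
alt_txns = {"bb22", "dd44"}
print(alt_only_txns(main_txns, alt_txns))  # {'dd44'}
```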
It just dawned on me that it is statistically likely that we will encounter heights with 3 versions of a solved block.
Why?
Natural splits due to latency often lead to a single orphaned block. If that were the only source of forks, we would expect a split into two versions.
However, we know that there is a second phenomenon: a miner frequently producing 20-35 block side chains. While those artificial side chains are being produced, there will still be the usual benign orphaned blocks.
Once I have values for (frequency of single orphaned blocks) and (frequency of these longer side chains) it will be possible to calculate the statistical frequency of expected triplets.
However, back-of-the-envelope: I suspect that random latency splits occurring at the same time as the artificial side chains is a common event.
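One way to formalize the back-of-the-envelope estimate, with every rate a loudly hypothetical placeholder: a triplet requires a natural orphan to land at a height that an artificial side chain is already covering.

```python
def expected_triplets(p_orphan, sidechain_rate, sidechain_len, n_blocks):
    """Rough expected count of heights with 3 block versions.

    p_orphan       -- per-height probability of a natural latency orphan
    sidechain_rate -- per-height probability that a long side chain starts
    sidechain_len  -- mean length of those side chains (e.g. 20-35 blocks)
    n_blocks       -- number of heights observed
    """
    # fraction of heights sitting under some artificial side chain
    coverage = min(1.0, sidechain_rate * sidechain_len)
    return n_blocks * p_orphan * coverage

# purely illustrative inputs, not measured values:
print(expected_triplets(p_orphan=0.001, sidechain_rate=0.0005,
                        sidechain_len=25, n_blocks=500_000))  # 6.25
```

Once the two frequencies are measured, plugging them in gives the expected triplet count directly.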
Are we prepared for this?
@neptuneresearch, how does the custom daemon handle this?
The "timestamp" included in the block by the miner could be spoofed, inaccurate, or not updated.
Should record two fields:
Time received is stored in the filename to second resolution. Several copies of new blocks all seem to roll in together within a second, so we cannot identify latency. Need to record sub-second received time.
NOT the same as the timestamp in the block, which is chosen by the miner, and could be spoofed.
Restarts and syncing can cause a single alternative block to be recorded multiple times in the logs. Use the first timestamp for a given version as its observation date, and ignore subsequent reports of the same alternative block.
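A sketch of that dedup rule, keeping only the earliest timestamp per block version (hashes and times are invented):

```python
def first_observations(records):
    """records: iterable of (block_hash, unix_ts) log entries, possibly with
    duplicates from restarts/resyncs. Keep the earliest time per version."""
    first = {}
    for block_hash, ts in records:
        if block_hash not in first or ts < first[block_hash]:
            first[block_hash] = ts
    return first

log = [("abc", 105.0), ("abc", 99.0), ("def", 50.0), ("abc", 99.5)]
print(first_observations(log))  # {'abc': 99.0, 'def': 50.0}
```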
Suppose we have a block at height H, which would normally generate a coinbase reward of R(H) if the block is small and there is no penalty. If the block is oversized, then the coinbase is reduced. Define P(H) as the penalty imposed on each block. The total coinbase payout T(H) is thus:
T(H) = R(H) - P(H)
I'm interested in collecting these variables. I'm interested in a histogram of {P}, which I assume will be dominated by P(H) = 0 blocks. How often are penalties applied, and what does their distribution look like? (Bonus points for a 3D distribution showing how the distribution evolves over time, i.e.
x-axis: time bins
y-axis: P (bins?)
z-axis: counts of penalties applied within that time window.
Even a 2D histogram would be sweet, before jumping into the 3D.
Further, consider the miner's gain G(H) from electing to oversize the block and take the penalty: how well were they compensated? Now we include the total fees the miner collected, F(H):
G(H) = F(H) - P(H)
I'm curious about the frequency of oversized blocks and profitability in those instances.
This project is totally open game. I have zero time to pursue NRL endeavors, at least for the next month. I would love for somebody to tackle this. Could even be a simple Jupyter notebook. Ping @neptuneresearch for data dumps of {height, total fees, block reward, block size}
which I think is all that's necessary for the first steps described above.
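A first-steps sketch of the quantities above, assuming the data dump supplies both the unpenalized base reward R(H) and the coinbase actually paid T(H) (all numbers below are invented):

```python
def penalty_gain(base_reward, paid_reward, fees):
    """P(H) = R(H) - T(H); G(H) = F(H) - P(H). Units are whatever the
    data dump uses (presumably atomic units)."""
    p = base_reward - paid_reward
    return p, fees - p

# invented example values, not chain data:
p, g = penalty_gain(base_reward=10_000, paid_reward=9_400, fees=900)
print(p, g)  # 600 300  (penalized, but fees still left a net gain)
```

Feeding every block through this and bucketing P gives the 2D histogram; adding time bins gives the 3D version.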
During this interview on Bitcoin Uncensored, fluffy comments: "Monero Hash tracks nodes, but they don't track every single node. It's not like they are plowing through nodes like chain analysis would, trying to enumerate them."
Well, that's actually a very interesting idea.... Should be a quick iterative process... Request a peer list from each connected node. Connect to those nodes and request their peer lists (memoize by not repeatedly connecting to already-sampled nodes). Repeat recursively until we know about all of the open nodes.
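The iterative process described above is a breadth-first traversal with memoization. A sketch in which the actual P2P peer-list request is injected as a function, so a toy dictionary can stand in for real network responses:

```python
from collections import deque

def crawl(seed_nodes, get_peer_list):
    """Enumerate reachable open nodes. `get_peer_list(addr)` is a
    placeholder for a real peer-list request over the P2P protocol."""
    seen = set(seed_nodes)
    queue = deque(seed_nodes)
    while queue:
        node = queue.popleft()
        for peer in get_peer_list(node):
            if peer not in seen:       # memoize: never re-contact a node
                seen.add(peer)
                queue.append(peer)
    return seen

# toy topology standing in for real peer-list responses:
topology = {"a": ["b", "c"], "b": ["a"], "c": ["d"], "d": []}
print(crawl(["a"], lambda n: topology.get(n, [])))
```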
Occasionally checking the size of the Monero network will be valuable for ascertaining how the number and distribution of nodes impacts other characteristics. This will also help with determining how MAP nodes should be geographically distributed, to match the profile of Monero network activity.
As a speculative side note / secondary analysis... This kind of network mapping is probably already a routine procedure for one or more surveilling entities. Could we turn this idea on its head and analyze our connection history across multiple MAP nodes for evidence of such activity by non-MAP entities? If the scanning party does not take steps to intentionally obscure/obfuscate their search pattern, it would be trivial to see their loggers sweep across our archival nodes.
Consider three MAP nodes {A, B, C} configured so that node B is always connected to node A and node C. If a scan is executed without concealing the behavior, what would we expect to see in our combined logs across the MAP network?
13:59:00 Some unknown node X connects to MAP node A
13:59:01 Node X requests peer list from MAP node A
13:59:02 Node X disconnects from A
13:59:03 Node X connects to node B
13:59:04 Node X requests peer list from MAP node B
13:59:05 Node X disconnects from B
13:59:06 Node X connects to node C
13:59:07 Node X requests peer list from MAP node C
13:59:08 Node X disconnects from C
This propagating blip would be strongly suggestive of a network scan in progress
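Detecting that blip in combined logs could be as simple as flagging remote IPs that touch several distinct MAP nodes within a short window. A sketch over hypothetical log tuples:

```python
def detect_sweeps(events, min_nodes=3, window=60):
    """events: (unix_ts, remote_ip, map_node) tuples from combined MAP logs.
    Flags remote IPs that touched >= min_nodes distinct MAP nodes within
    `window` seconds -- the propagating-blip signature sketched above."""
    by_ip = {}
    for ts, ip, node in events:
        by_ip.setdefault(ip, []).append((ts, node))
    flagged = set()
    for ip, hits in by_ip.items():
        hits.sort()
        for i in range(len(hits)):
            nodes = {n for t, n in hits if hits[i][0] <= t <= hits[i][0] + window}
            if len(nodes) >= min_nodes:
                flagged.add(ip)
                break
    return flagged

# invented events echoing the timeline above:
events = [(0, "x", "A"), (3, "x", "B"), (6, "x", "C"), (9, "y", "A")]
print(detect_sweeps(events))  # {'x'}
```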
Let's say we have a table of block ID and node receipt timestamps (NRTs) for an archival node.
Height // NRTs (HH:MM:SS)
1643586 // 00:00:00, 00:00:04, 00:00:09
1643587 // 00:02:05, 00:02:06, 00:02:11, 00:03:00
1643588 // 00:04:06, 00:05:00, 00:07:05
1643589 // 00:07:04, 00:07:14, 00:07:18, 00:07:21
NRT(H,x) is the x-th time that we received a copy of block H.
How long did it take for somebody to solve block H?
W(H) := NRT(H,first) - NRT(H-1, first)
A histogram of this quantity over many H's tells us about mining activity.
Difference between the block's miner-reported timestamp, MRT(H), and its actual broadcast to the network.
Maybe call this D for 'delay':
D(H) := NRT(H,first) - MRT(H)
(no need to specify first or last for MRT, since it is the same in all copies)
A histogram of this quantity over many H's would theoretically provide information about latency etc. However, there is a lot of timestamp spoofing, which becomes the more interesting feature of this histogram
What is the time difference between first and last receipt of a certain block by a given node?
B(H) := NRT(H,last) - NRT(H,first)
What are the implications? What would a histogram of this show us? Essentially, the time envelope for bursts of network activity around block discovery times. This might be an interesting way to heuristically detect a running node by network traffic rates, even if the actual content is concealed by a VPN, etc.
How many times do we receive a copy of a given block?
C(H) := # of NRT entries for height H
What does this tell us?
How long does it take for a broadcast to propagate across the network to the last node? Use extended notation: NRT(N,H,x), indicating the timestamp when MAP node N received the x-th copy of the block at height H.
Suppose MAP node 'orange' is the first to hear a block, and MAP node 'ginger' is the last to hear about that block. Then we are interested in
G(H) := NRT(ginger, H, first) - NRT(orange, H, first)
More generally,
G(H) := NRT(last node, H, first copy) - NRT(first node, H, first copy)
This would be very interesting for both blocks and transactions - and can be used to estimate the expected number of orphaned blocks due to natural causes.
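For a single node, the per-height quantities W, D, B, and C defined above can be computed together from the NRT table. A sketch using toy numbers in the spirit of the table above (seconds since midnight; the miner timestamp is invented):

```python
def nrt_metrics(nrt, miner_ts):
    """nrt: {height: sorted node receipt timestamps, in seconds}.
    miner_ts: {height: miner-reported block timestamp}."""
    out = {}
    for h, times in nrt.items():
        first, last = times[0], times[-1]
        out[h] = {
            "W": first - nrt[h - 1][0] if h - 1 in nrt else None,  # solve time
            "D": first - miner_ts.get(h, first),  # delay vs miner stamp
            "B": last - first,                    # receipt burst envelope
            "C": len(times),                      # copies received
        }
    return out

nrt = {1643586: [0, 4, 9], 1643587: [125, 126, 131, 180]}
miner = {1643587: 120}
print(nrt_metrics(nrt, miner)[1643587])  # {'W': 125, 'D': 5, 'B': 55, 'C': 4}
```

Histogramming each field over many heights gives the distributions discussed above; G needs the multi-node NRT(N,H,x) table instead.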
Plot distribution of ring sizes since January 2017 (and changes over time)
Syncing the blockchain requires patience, and the progress indicator shown is a poor measure of actual progress.
Synced XXXXX/1625048
provides a fraction completed in terms of 'height', but this is very poorly correlated with how much time is left, because early blocks sync very quickly and later blocks are quite slow.
While different nodes sync at different absolute speeds, based on bandwidth and power, the relative speeds and slowdowns seem perceptually similar. That means that one could make a plot of [fraction of sync time] vs [fraction of sync height].
While this is mostly a novelty, it would be interesting to find the kinks in the plot and mark what they correspond to (e.g. changes in volume, changes in features, etc). This could be useful for studying how the scaling of various technologies has performed in practice, relative to theoretical O(n).
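Normalizing both axes is what makes the curve comparable across nodes with different absolute speeds. A sketch, with entirely hypothetical sync checkpoints:

```python
def sync_profile(heights, elapsed_seconds):
    """Turn a sync log into (fraction of final height, fraction of total
    sync time) pairs, ready for plotting."""
    h_max, t_max = heights[-1], elapsed_seconds[-1]
    return [(h / h_max, t / t_max) for h, t in zip(heights, elapsed_seconds)]

# hypothetical checkpoints: early blocks sync fast, later blocks slowly
profile = sync_profile([400_000, 800_000, 1_200_000, 1_625_048],
                       [600, 2_400, 9_600, 36_000])
print(profile[-1])  # (1.0, 1.0)
```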
Our project has great need and use for JetBrains CLion, PyCharm and WebStorm. We could probably make use of some of their other tools also.
@Mitchellpkt Can you apply for an open source license for our project?
Snippet from JetBrains open source page:
Open Source Licenses
Get free licenses for JetBrains tools if your non-commercial open source project meets these requirements:
Your project meets the Open Source definition
Your project is at least 3 months old
Your project is actively and regularly developed
You are the project lead or an active committer
Your project is NOT sponsored by a commercial company or organization and does NOT have paid employees
Your project does NOT provide commercial services (such as consulting or training) around the software, and does NOT distribute paid versions of the software
Qualifying open source projects may apply for licenses to the All Products pack, TeamCity, YouTrack, and Upsource
Here's the link to apply: JetBrains open source license request
Just for anecdotal fun.
Sort the b1s data frame by delta_time, and peek at a few of the silly slow blocks.
I think that the necessity for a fixed ringsize is relatively self-evident from a statistical perspective (I fully support fixed ring-size). However, it’d be fun to pull out some proof for good measure. (Only looking at transactions since January 2017, for relevance)
Fishing around for signatures of anomalous behaviors falls right in the ballpark of #noncesense-research-lab :- ) … Seems like a straightforward project to iterate over transactions with non-standard ringsizes and check whether any of their ring members were outputs generated by txns that had unusual ringsizes themselves.
I wouldn’t be surprised if we locate a few chains of transactions with a string of unusually-sized rings surrounded by 7-member decoys. (Of course, this can be compared against the background likelihood of selecting a decoy generated by a transaction with > 7 ring members.)
Wanted to check whether somebody else is already working on this? If not, suggestions for tackling? My default approach would be to use RPC (python wrapper?) to scan through [2017-present] transaction tree, memoize ring size info, then analyze that. Let me know if you have advice for starting points/libraries, better approaches, or prior art.
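Once ring sizes and ring-member origins are memoized (e.g. from RPC scans), the chain check itself is simple set logic. A sketch over toy data, with the standard ring size as an explicit assumption to adjust for the epoch under study:

```python
STANDARD_RING_SIZE = 5  # assumption: set to the standard size for the era studied

def nonstandard_chains(tx_rings, tx_inputs):
    """tx_rings: {txid: ring size}; tx_inputs: {txid: [txids whose outputs
    appear as ring members]}. Flags nonstandard-ringsize txns that also
    spend outputs of other nonstandard-ringsize txns."""
    odd = {t for t, r in tx_rings.items() if r != STANDARD_RING_SIZE}
    return {t for t in odd if any(src in odd for src in tx_inputs.get(t, []))}

# toy values; real ones would come from RPC get_transactions scans
rings = {"t1": 12, "t2": 12, "t3": 5}
inputs = {"t2": ["t1", "t3"], "t3": ["t1"]}
print(nonstandard_chains(rings, inputs))  # {'t2'}
```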
Suggestion from @neptuneresearch
Right now, the daemon is manually launched by ./monerod-archive --detach (+ other args).
This method has no auto-restart, so the nodes are not very resilient.
It would be better if we register the archival daemon as a systemd service (?)
Ideally this would go in the configuration script
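A minimal unit-file sketch of what that could look like; the binary path, flags, and user are placeholders to adapt (note that --detach would conflict with systemd supervision, so the daemon should run in the foreground):

```ini
[Unit]
Description=monerod-archive daemon
After=network-online.target

[Service]
# Placeholder path and flags -- match the actual install location/args.
# Run in the foreground (no --detach) so systemd can supervise it.
ExecStart=/opt/monerod-archive --non-interactive
Restart=on-failure
RestartSec=30
User=map

[Install]
WantedBy=multi-user.target
```

With Restart=on-failure, a crashed node comes back automatically, which addresses the resilience concern.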
If you have any tips for obtaining data on the contents of orphaned blocks, please comment or contact me.
This is crucial for ascertaining whether or not a double-spend attack has ever occurred or been attempted (by checking whether two blocks at the same height contain the same key image spending to a different recipient stealth address)
It is currently unknown whether or not anybody retained these records. Can you find a copy?
Uh oh, it seems like the bitmonero.log files are not persistent. I should have realized that before.
Given the archival nature of our project, we want to retain monerod log files, so we have them handy for future analyses.
This issue is being filed as the primary obstacle to analysis addressing issue #28
@serhack @neptuneresearch - I assume there's probably an easy way to mitigate this?
When each copy of a block is received (perhaps multiple copies from multiple nodes), record the node receipt timestamp to milliseconds.
This enables study of latency. What's the timing on the shortest route for a txn/block to arrive at MAP node? What's the timing on the longest route? What is the scale of the time difference? milliseconds? seconds?
Check this to see whether iterative enumeration is even necessary.
As discussed in the "custom ring composition spoils Monero fungibility" wiki, any non-standard algorithm for decoy selection can be used to group transactions that are potentially made by the same wallet or entity. This can be automated by applying unsupervised clustering algorithms on the empirical age distribution of decoys used in real transactions.
In an ideal world, where all users and wallets follow the typical decoy selection algorithm, all Monero transactions fall into the same indistinguishable cluster. However, a set of transactions generated with significant deviations from the norm (e.g. using a uniform selection algorithm) will shift into their own cluster.
The largest cluster(s) with the most members represent the fungible bulk of Monero, and the outlier clusters should be quite interesting to inspect.
Note, it might be useful to try log(age) coordinates as well, to catch signatures on shorter timescales.
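A stdlib-only sketch of the feature-extraction step: each transaction's decoy ages get reduced to a normalized, log-binned histogram, and distances between those vectors are what a clustering algorithm (e.g. k-means) would operate on. Bin edges and ages below are illustrative assumptions:

```python
import math

def age_histogram(decoy_ages, bins=(10, 100, 1000, 10000)):
    """Normalized histogram of log-binned decoy ages (in blocks) for one txn."""
    counts = [0] * (len(bins) + 1)
    for a in decoy_ages:
        counts[sum(a >= b for b in bins)] += 1
    return [c / len(decoy_ages) for c in counts]

def distance(h1, h2):
    """Euclidean distance between two age-histogram feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(h1, h2)))

# toy: txn A resembles a typical recent-skewed selection; txn B looks uniform
typical = age_histogram([3, 8, 40, 70, 90, 200, 600])
uniform = age_histogram([5, 500, 5_000, 50_000, 80_000, 20_000, 900])
print(distance(typical, typical))  # 0.0
```

Transactions whose vectors sit far from the bulk cluster are the outliers worth inspecting.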
Hey all 😁 I see that the discord link on the readme is broken... care to post a new one? 🙏🏼
Record all transaction broadcasts received by the node. This means keeping:
Right now, we retain the Txn hashes, but we need to know:
Check peerlist logs to see if there are multiple nodes that share the same IP address (for instance, multiple users running over the same VPN service).
Why does this matter? Using small round exaggerated numbers: Suppose there are 200 active nodes, and 100 of them are using VPN company X over a small set of IP addresses. If Monero activity through VPN company X is halted (whether intentionally or by accident) this would cause a disproportionately large blow to the network.
It's a quasi-centralization that could cause small points of failure to have a larger impact.
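The check itself is a simple group-by over peerlist entries. A sketch with invented node IDs and IPs:

```python
from collections import defaultdict

def shared_ip_groups(peerlist):
    """peerlist: iterable of (node_id, ip) pairs from peerlist logs.
    Returns {ip: set of node_ids} for IPs hosting more than one node,
    e.g. many users behind one VPN exit."""
    by_ip = defaultdict(set)
    for node_id, ip in peerlist:
        by_ip[ip].add(node_id)
    return {ip: nodes for ip, nodes in by_ip.items() if len(nodes) > 1}

# invented peerlist entries:
peers = [("n1", "1.2.3.4"), ("n2", "1.2.3.4"), ("n3", "5.6.7.8")]
print(shared_ip_groups(peers))  # {'1.2.3.4': {'n1', 'n2'}}
```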
Visualize the wonkiness. Definitely multimodal.
Debian 9 ships with glibc (package libc6-dev) version 2.24 preinstalled.
To upgrade, follow these steps. Warning: if you run commands and you don't know what they do, please don't try to upgrade glibc!
1. Open /etc/apt/sources.list with any editor.
2. Add the following line: deb http://ftp.us.debian.org/debian sid main
3. Run apt-get update
4. Let's upgrade libc! Run apt-get install libc6-dev and then wait...
5. Reopen /etc/apt/sources.list and comment out (with "#") the line you added in step 2.
6. Enjoy!
NEVER run apt-get upgrade or apt-get full-upgrade while the sid line is still in /etc/apt/sources.list. If you upgrade all the packages to "sid", the system can become unstable.
Right now launch_monerod.sh launches with --log-level 0. I think level 2 gives more output? Is this right?
It will need to be updated in launch_monerod.sh and the node instruction document.
Expect different clusters, separating different phenomena / players.
Right now the data is organized in messy grep'd-out log dumps. We need a clean format to use for this project. It seems like each entry recording receipt of a given block should contain:
Suggestions for format? Other ideas for data to include? Thanks!
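As a starting suggestion only (field names are placeholders, drawing on what other issues here say we need: miner timestamp vs. sub-second receipt time, source IP for topology work, an alt-chain flag), one JSON record per block receipt could look like:

```json
{
  "height": 1643587,
  "block_hash": "…",
  "prev_hash": "…",
  "miner_timestamp": 1528240000,
  "node_receipt_timestamp_ms": 1528240003217,
  "receiving_map_node": "tokyo",
  "source_ip": "…",
  "is_alt_block": false
}
```

One line per receipt (JSON lines) would keep it grep-friendly while remaining trivially parseable.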
Do not use "rate" unless referring to quantity per unit time.