
Comments (80)

jtobey avatar jtobey commented on June 12, 2024

I don't think so. I think the initial load could be faster (2-10x) but it will not be so easy.


jcrubino avatar jcrubino commented on June 12, 2024

What are the bottlenecks?
Is there any way to display before the blockchain is completely in the db?


jtobey avatar jtobey commented on June 12, 2024


jcrubino avatar jcrubino commented on June 12, 2024

How about releasing SQLite torrents with known hashes at one-month intervals? I realize this is asking someone to take responsibility for it, and that is OK, but are there any concerns with this method?


jtobey avatar jtobey commented on June 12, 2024

I would thank anyone who releases torrents, but I do not have the resources to do this myself.


yhaenggi avatar yhaenggi commented on June 12, 2024

I might be able to do this as long as there is demand.


dikidera avatar dikidera commented on June 12, 2024

@K1773R, please do, my friend.


fsckin avatar fsckin commented on June 12, 2024

I haven't looked at the source yet, but I noticed that the load appears to write and check the status on every iteration of the loop:

SELECT blkfile_number, blkfile_offset
FROM datadir
WHERE dirname = '~/.bitcoin';

Can that be changed to only write and check every 100k iterations? Perhaps a small improvement could be realized there.


jtobey avatar jtobey commented on June 12, 2024

Thanks, but I think it does this only once per commit. If you use --commit-bytes=100000 as in the README-* examples, this should be dwarfed by transaction-related writes.
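For reference, a typical batched-commit load invocation along the lines of the README examples looks something like this (the config file name is illustrative):

python -m Abe.abe --config abe.conf --commit-bytes 100000 --no-serve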



MahdiMontgomery avatar MahdiMontgomery commented on June 12, 2024

@K1773R I am also interested in this. I can offer a seedbox, but I can't provide the databases.


JoeMattie avatar JoeMattie commented on June 12, 2024

I would really appreciate this as well


jcrubino avatar jcrubino commented on June 12, 2024

I think we need the bootstrap torrent approach.
Abe needs a more scalable startup for any future practical uses, especially small-business uses.
I will take donations in bitcoin to run this on AWS and produce the first bootstrapped SQLite database (or the database of the largest donor's choice).
Then set the torrent off from there.


slavik0329 avatar slavik0329 commented on June 12, 2024

@jcrubino I agree, this should be done. I am currently at block 214837 and it has been 5 days now. Once I have this loaded, I will gladly distribute my dump using BitTorrent Sync.


jcrubino avatar jcrubino commented on June 12, 2024

@slavik0329 Is there a way to deterministically create a checksum so your results could be verified by others?


dermoth avatar dermoth commented on June 12, 2024

@slavik0329 I wouldn't mind giving out my MySQL dump too, however it'll require the same settings to run on the destination system.

BTW, one thing you could try while loading the blocks is running the database (Abe itself in the case of SQLite) with libeatmydata - it's an LD_PRELOAD library that fakes the OS's sync calls (fsync, O_SYNC, etc.). This avoids waiting for dirty data to be written out, which will definitely help if you don't have SSDs or an expensive battery-backed RAID cache.
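A minimal sketch of that trick (package and library names vary by distribution; this assumes the libeatmydata package and its eatmydata wrapper are installed):

# SQLite: wrap the Abe loader itself
eatmydata python -m Abe.abe --config abe.conf --no-serve

# MySQL: start the server with the preload instead
LD_PRELOAD=libeatmydata.so mysqld_safe &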

On the downside, your database may be unable to recover from a system crash, so you shouldn't use it on a production system unless you really know what you are doing.

Thomas


jcrubino avatar jcrubino commented on June 12, 2024

@slavik0329 @dermoth @jtobey @zuijlen @bitcoin-abe
Would a multi-core blockchain parser be a better solution?
Work on two or three blocks in sliding-window fashion across the blockchain?
Use SQLite as an intermediary between the parser and the end DB destination... ?

I think db torrent bootstraps are only a temporary solution to the bigger problem.


dermoth avatar dermoth commented on June 12, 2024

@jcrubino There is a random block importer in the code - I haven't tested it myself, but if it works well then maybe it would be able to import blocks in parallel - that's assuming it either doesn't deadlock or it restarts gracefully after a deadlock. I'd also strongly suggest testing it with a commit after every block - the larger your transactions are, the greater the risk of deadlock and the bigger the rollbacks will be.

If you have the budget you can just use a database server large enough to keep the data in RAM, or keep the data files on SSD for fast data access. It's been a while since my cheap workstation fell out of spec in this regard, but IIRC back when I was still running on SSD, load times were quite good.


jcrubino avatar jcrubino commented on June 12, 2024

@dermoth So sequential processing of the blockchain is not mandatory? The db updates if the block information is newer than the already committed data?


dermoth avatar dermoth commented on June 12, 2024

@jcrubino I believe so, although I have no idea if the current code & database schema allow this, but it should be fixable at the very least.

You could play with Abe/mixup.py if you like and see for yourself...


yhaenggi avatar yhaenggi commented on June 12, 2024

@dermoth There is an even faster way to do it: put the DB you're importing onto a tmpfs :) This way you can import the full BTC chain in a matter of hours!
Once you're done, move the DB back to the HD and enjoy an up-to-date Abe.
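A rough sketch of the tmpfs approach (size and paths are illustrative; the tmpfs must be big enough for the whole database, and its contents are lost on reboot or power failure):

mkdir -p /mnt/abedb
mount -t tmpfs -o size=60G tmpfs /mnt/abedb
# point the SQLite file (or the MySQL datadir) at /mnt/abedb and run the import,
# then stop the database and copy everything back to persistent storage:
cp -a /mnt/abedb/. /var/lib/abe/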


dermoth avatar dermoth commented on June 12, 2024

@K1773R That's assuming you have enough RAM. But if you do have that much RAM available and you disable the sync calls with libeatmydata, the OS will do its own buffering: all data will be held in RAM instead of read from disk, and the dirty cache (unwritten data) will be flushed to disk as disk speed allows (you may need to adjust the vm.dirty_background_ratio and vm.dirty_ratio sysctls depending on your IO speed). InnoDB does that in its own memory pool too, but it will flush data to log files (and to the doublewrite buffer and data files when you're out of space in the logs - you may disable the doublewrite during import too...).
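For example (values are illustrative only and depend on your RAM and IO speed; the doublewrite setting belongs in my.cnf and should only be used for the import):

# allow more dirty page cache to accumulate before background/forced writeback
sysctl -w vm.dirty_background_ratio=20
sysctl -w vm.dirty_ratio=60

# my.cnf, import only: skip the InnoDB doublewrite buffer
innodb_doublewrite = 0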

Note that at that level your main constraint will be MySQL and Python (Abe) speed, and I'm pretty confident the only way to scale it up (without managing a parallel import) will be getting faster cpu cores (rather than more cores).


acejam avatar acejam commented on June 12, 2024

I just finished creating my Abe MySQL DB after about 2-3 days. Here's a torrent that I'm currently seeding from one of my servers. It's about 28 GB in size and was created using mysqldump.

http://goo.gl/h2guUp
Abe - Bitcoin - MySQL - 10/31/2013


slavik0329 avatar slavik0329 commented on June 12, 2024

Thank you for this, acejam.


JoeMattie avatar JoeMattie commented on June 12, 2024

Excellent. Will seed forever


MrBlockkette avatar MrBlockkette commented on June 12, 2024

Hey guys,
I am currently creating my MySQL database with Abe, and after 3 days I have only about 10% processed. I tried to download the torrent file from above, but nobody is seeding anymore. Could anybody please seed again or upload the file somewhere?

Thanks


acejam avatar acejam commented on June 12, 2024

@laejub06 I have increased the seed ratio on my server - try now.


MrBlockkette avatar MrBlockkette commented on June 12, 2024

Now it's working. Great, thank you ;)


gclsoft avatar gclsoft commented on June 12, 2024

How do I restore a 27GB .sql dump? I used mysql -uroot -p abe < abe.sql; it ran for days and printed nothing to the console. I also tried MySQL Administrator, and after a night it said "The restoration failed".


gclsoft avatar gclsoft commented on June 12, 2024

Can someone kindly tell me how big the Abe database is? I used a VPS with a 30GB disk, but after two days I found it was too small for Abe. Now I have to choose a new VPS. Is a 60GB disk OK, or is an 80GB disk better?


dermoth avatar dermoth commented on June 12, 2024

With MySQL, my Abe database is currently 26G, and that's excluding the global InnoDB tablespace, redo logs, etc. (which depends on your specific InnoDB tuning). Note that by default the DB would be much larger, to get to this low usage I used binary storage mode and compressed tables (both are documented in Abe). Binary storage must be enabled before initial import (technically conversion is doable but I won't go there) while compression can be done afterwards, although it's obviously faster to start with compressed tables.

To speed up import/querying I highly recommend having enough ram to hold the entire database in RAM, which is very well possible considering the sizes above. Most of the RAM should be allocated to the InnoDB Buffer pool. It's obviously not a requirement; I have only 2G allocated to my InnoDB Buffer pool (shared with another db) and I get very decent response times for the address balances I used it for. My global InnoDB files are on SSD but I eventually had to move the Abe Database to Raid1 spindles for space reasons (which makes it pretty much Hybrid, although it's backwards, the SSD won't help at all for read access in that scenario).
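For illustration, the pieces mentioned above map roughly to the following (the table name and sizes are only examples; binary storage and table compression are documented in Abe and the MySQL manual respectively):

# abe.conf: binary column storage, set before the initial import
binary-type=binary

# compress an existing InnoDB table afterwards (repeat per table; older MySQL
# releases also need innodb_file_per_table=1 and the Barracuda file format)
ALTER TABLE txout ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

# my.cnf: give the InnoDB buffer pool as much of the available RAM as you can spare
innodb_buffer_pool_size = 8G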


zuijlen avatar zuijlen commented on June 12, 2024

Within Abe there is also the no-statistics branch, which does not save/show all the transactions, only the current BTC balance of all addresses. This saves a lot of space (this is why I use Abe on my server). Please read about the abe.conf default-trim-depth setting on the no-statistics branch:

default-trim-depth and datadir's "trim_depth" enable a major space-saving optimization for sites that do not require full history or firstbits support.

...

A typical value would be 40. This reduces the database size by about 50-70% (XXX untested) for chains with many "change" addresses like Bitcoin.
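A minimal abe.conf sketch of that setting (assuming the no-statistics branch; 40 is the typical value quoted above, and the exact syntax should be checked against that branch's own README/abe.conf):

default-trim-depth 40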


vodkadrunkinski avatar vodkadrunkinski commented on June 12, 2024

Look into using TokuDB (either as the Tokutek package or MariaDB 10.x) if you want to save space. You can convert InnoDB tables directly to it.
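A minimal sketch of such a conversion (the table name is only an example; repeat for each Abe table, and expect the big ones to take a while):

ALTER TABLE txin ENGINE=TokuDB;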

robertjmoore avatar robertjmoore commented on June 12, 2024

There are no seeds on that torrent... can anyone be so kind as to seed for me? I'll keep this seeded once I have it.


acejam avatar acejam commented on June 12, 2024

@robertjmoore I'm still actively seeding my torrent - not sure why you can't connect. Looks like I've transferred about 2.5 TB over the last 2 months


robertjmoore avatar robertjmoore commented on June 12, 2024

Hmm ok thanks, I'll try a few things on my end.

Bob


jcrubino avatar jcrubino commented on June 12, 2024

Can we release a Docker image with Abe plus a torrent ETL into MySQL for this service?


lobas avatar lobas commented on June 12, 2024

Can someone upload to Mega.co.nz or something


lobas avatar lobas commented on June 12, 2024

Left this running for so long now, and it's super duper slow!

Bitcoin/blocks/blk00063.dat
block_tx 240087 18799454

Still just under HALF way through!

I'm using a 14GB i3 Mac - not exactly slow when it's running just this task!


easmith avatar easmith commented on June 12, 2024

I downloaded the abe.sql. Now I am importing it into the database. It has been over 100 hours!
How much longer will this process take? How quickly does it import a new block? Hopefully in less than 10 minutes?


lobas avatar lobas commented on June 12, 2024

Where have you downloaded abe.sql from?


easmith avatar easmith commented on June 12, 2024

@lobas, from @acejam's server: http://acejam.com/files/bitcoin/abe/abe.sql.torrent


acejam avatar acejam commented on June 12, 2024

I have uploaded a new copy of this torrent. The latest database goes up to block #287852 from 02/26/2014.

http://acejam.com/files/bitcoin/abe/abe_block_287852_02_26_2014.sql.torrent


acejam avatar acejam commented on June 12, 2024

Also, it seems like many users are experiencing slow import times. Unfortunately this is what happens when using a database of this size. I am using an SSD which helps dramatically, but not everyone has this resource. If I created a Docker/Vagrant image that people could download with the MySQL database already populated and running, would that be preferred? The only issue with this approach is the 37 GB .sql file turns into about 80 GB on disk once it's populated into database tables.


zzif avatar zzif commented on June 12, 2024

Another database dump (updated 2014-02-27):
ftp://ftp.netinch.com/temp/bitcoin-abe/abe.sql.gz


edwardteach42 avatar edwardteach42 commented on June 12, 2024

Did you happen to use INSERT IGNORE on this export?


oinquer avatar oinquer commented on June 12, 2024

I really didn't look at anything in the code, but looking at my disk usage and CPU...
it's consuming 6-10% of a Core i3 when normally Python goes all the way to 25%,
and on an SSD I'm seeing 30MB/s of writes and 1600 IOPS...
the system is IDLING while processing blocks... anyone know why?


goastler avatar goastler commented on June 12, 2024

How big is the database itself? I got a hold of the mysql dump from zzif above, but at 89GB my drive has run out of space. Is the database much larger than 89GB? Someone above said it was only 39GB and acejam said an image would be around 80GB.


acejam avatar acejam commented on June 12, 2024

The .sql file contained within the torrent was 37GB. However, this is simply an exported version of the DB (a raw dump). Once you import this SQL file, the resulting DB will be over 80GB in size.


jtobey avatar jtobey commented on June 12, 2024

Bear in mind that the blockchain is growing quickly. Abe takes 4-5x the raw block size, a little less if the dump uses keep-scriptsig=false and maybe binary-type=binary.
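For reference, a sketch of how those two options would look in abe.conf (option names as given above; as with binary storage, they are meant to be set before the initial load):

keep-scriptsig=false
binary-type=binary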


goastler avatar goastler commented on June 12, 2024

Okay, so as of today what would you estimate the database size to be? I've got to decide which drive to put it on.


zuijlen avatar zuijlen commented on June 12, 2024

There is also the possibility to only store the current state of addresses
without the history.
This option saves a lot of space.
Look for the other branch.

goastler avatar goastler commented on June 12, 2024

That would be more helpful for what I'm doing. Which other branch is it? (I'm new to github, i.e. this is my third ever comment)


edwardteach42 avatar edwardteach42 commented on June 12, 2024

I have a MySQL database of the no-statistics branch that has the blockchain parsed up to block 292k with a trim depth of 40. If anyone can provide me a place to upload it: it is about 12GB uncompressed, and I could probably gzip it to half that.


goastler avatar goastler commented on June 12, 2024

Why not torrent it?


acejam avatar acejam commented on June 12, 2024

I don't want to be "that guy", but I've stopped using this project. It's just too slow and clunky. I highly suggest everyone check out https://github.com/bitpay/insight. I have a running copy of it on my server here: http://acejam.com:3000/

The storage requirement is much less (30GB), and parsing the blockchain is significantly faster.


edwardteach42 avatar edwardteach42 commented on June 12, 2024

Connection at my house is too slow for torrents due to Time Warner
restrictions. I can however upload over HTTP/FTP at about 5Mbps.


goastler avatar goastler commented on June 12, 2024

Gunzip and upload to cloud storage?


edwardteach42 avatar edwardteach42 commented on June 12, 2024

Yeah I just had to upgrade my dropbox... Will post a link as soon as I have
it.


goastler avatar goastler commented on June 12, 2024

Okay Cheers :)


jtobey avatar jtobey commented on June 12, 2024

@acejam Thanks for the tip, Insight looks great!


acejam avatar acejam commented on June 12, 2024

No problem - it really is. The demo running on my server is literally just
checked out and running - I've made no changes to it.


goastler avatar goastler commented on June 12, 2024

@jeffmthomas How's the upload coming along? :)


edwardteach42 avatar edwardteach42 commented on June 12, 2024

It is still uploading. I have been downloading a 3.2TB file from one of my
clients so it is slowing my upload speed.


vodkadrunkinski avatar vodkadrunkinski commented on June 12, 2024

My suggestion is to switch to TokuDB for the database, then import the data as CSV using their bulk loader. I've used TokuDB for another Abe install and it works fine. Also, you get to save space - a win-win.
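A rough sketch of that path (the table name is only an example; this assumes a tab-separated export with mysqldump --tab on the source and LOAD DATA INFILE into empty TokuDB tables on the target, which is what engages TokuDB's bulk loader):

# source server: writes one .sql schema file plus one .txt data file per table
mysqldump --tab=/tmp/abe-dump -uroot -p abe

# target server: create the schema with ENGINE=TokuDB, then bulk-load each table
mysql -uroot -p abe -e "LOAD DATA INFILE '/tmp/abe-dump/txin.txt' INTO TABLE txin"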

-Alex

jtobey avatar jtobey commented on June 12, 2024

On branch small, I have pushed a version of Abe that does not load transactions into the database but indexes them and reads them from blockfiles on demand. Testing suggests that the database will be about 70% smaller and load 5x faster. The implementation is incomplete: it doesn't support RPC loading and some APIs, but I expect those features to be relatively easy.


jcrubino avatar jcrubino commented on June 12, 2024

@jtobey Thanks for putting that together.


jcrubino avatar jcrubino commented on June 12, 2024

@acejam Is there a public API for your Insight instance?


acejam avatar acejam commented on June 12, 2024

@jcrubino The instance I have up and running currently is already public. However I'd like to spin up a new instance which has higher availability as a service for the rest of the community to use. (perhaps as an alternative to blockchain.info)

You can find some of the API documentation here: https://github.com/bitpay/insight-api/


jl2035 avatar jl2035 commented on June 12, 2024

So instead of finding a solution, everybody just started using Insight?
Well, Insight doesn't work these days.. :(

I'm facing this issue on two different servers: bitpay/insight#585


dermoth avatar dermoth commented on June 12, 2024

Hey there.... Since last week I've been playing with dumping and loading Abe database as well as trying out TokuDB (for the heads up I've been running a binary InnoDB Abe with compressed tables since I added support for Abe's binary storage on MySQL).

A database dump is quite fast (hours, mostly waiting on xz compression) even with InnoDB, and with xz -e9 compression I got the full SQL dump up to block 339967 shrunk down to 20G (I used xz-5.2.0 with multi-threaded encoding to achieve decent speed). However, when it came to loading the DB it was going to take days and was IO-bound. (Let me know if you're interested in a dump file; I would probably re-create it from Toku anyway...) The dump command was:

mysqldump --databases --opt [ --no-autocommit ] -uroot -p --routines --single-transaction abe | xz -9e -M0 -T<num_threads> -c >abe_db_<block#>.sql.xz

NB: I put brackets around --no-autocommit; I'm not sure whether it helps at all (it definitely won't help TokuDB with tokudb_commit_sync=OFF) and it's a real pain to recover from if your load gets interrupted in the middle of a huge table. Also make sure you have enough RAM for xz when using threads!

Then I tried loading it to the official Maria/TokuDB build from TokuTek. Even though I wasn't using the quirks of LOAD DATA INFILE, the load was awesomely fast - finished in less than 5 hours with the only tuning being tokudb_commit_sync=OFF. The load command used sed to rename the database, engines and remove compression stuff (TokuDB has global compression settings):

xz -dc -M0 -T4 abe_db_339967.sql.xz |sed -r 's/^(CREATE.+|USE.+)`abe`/\1`abe_test`/;s/ENGINE=InnoDB/ENGINE=TokuDB/;s/ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=[0-9]+//' | /opt/mariadb/bin/mysql --defaults-file=/etc/maria.cnf -uroot -p

The InnoDB database with compressed tables was taking up 53GiB (not counting InnoDB logfiles, doublewrite buffer, redo space...). TokuDB (looking at _abe_test_* files) uses 41GiB. This is pretty good considering Bitcoin's own datadir is 35GiB!

Obviously quick restore from backups is good, but I was also concerned about the speed of normal loads. Keeping in mind that we're merely loading public data and can resume from anywhere, I kept commit_sync off for this test as well.

Loading into a new DB (blkfile loader) first threw these warnings, which don't appear to be an issue:

Abe/SqlAbstraction.py:468: Warning: Converting column 'a' from VARCHAR to TEXT
  sql.cursor().execute(stmt)
Abe/SqlAbstraction.py:468: Warning: Converting column 'b' from VARCHAR to TEXT
  sql.cursor().execute(stmt)
Abe/SqlAbstraction.py:468: Warning: Converting column 'txout_scriptPubKey' from VARBINARY to BLOB
  sql.cursor().execute(stmt)
Abe/SqlAbstraction.py:468: Warning: Converting column 'txin_scriptSig' from VARBINARY to BLOB
  sql.cursor().execute(stmt)

The initial blocks without any transactions were loading at an incredible rate, so I waited a bit to see the actual blocks with transactions. When it started slowing down I noticed some issues with the Maria or TokuDB optimizer (I kept the InnoDB database on Debian's stock MySQL 5.5.40 so I can't compare with InnoDB on Maria), and worked around them using a patch to Abe (which I'm not sure can even make it upstream, as I have no idea if it's even standard SQL or whether it has any negative effect on non-MySQL databases). Once optimized, I got this rate (as of this writing, since I'm not yet done with the load):

Block time  Height
2012-12-27 01:26:58 213812
Last:  35.00/min    Avg60:  28.02/min   

This shows the block rate for the last minute and average over the last hour - that script was too simple to compute tx count/avg, but looking at the table that is 521544 transactions loaded in that hour (8692.4 tx / minute average, 310.0737 tx / block average for the sampled data). I think it's a good improvement over InnoDB but I'll have to wait until it comes to the more recent blocks with many more tx/block to confirm TokuDB vs InnoDB speed...

Also interesting is that TokuDB is far from being IO-bound - the writes are sparse (about one quick flush every minute) and reads are almost non-existent. With this kind of load TokuDB appears to be mostly CPU-bound, but since I've seen it use all 4 cores on my workstation during the SQL dump load, I think there's a lot of room for speedups if we could implement parallelism in the block loader.

The TokuDB patches are available on my fork, TokuDB branch (keep in mind it'll "upgrade" the DB and will require some tinkering if you want to switch back to mainline Abe! - if you skip the upgrade, block loading will still work but displaying block TX will be extremely slow!):

https://github.com/dermoth/bitcoin-abe/tree/TokuDB

Note that for my tests, both InnoDB and TokuDB were set with a 2GiB Buffer Pool/Cache and I'm not using DirectIO. I'm using two very cheap 2TB SATA disks in md RAID1, Linux 3.x.

Regards,

Thomas

EDIT: The first 50k blocks load at over 9000 blocks/minute, and that is in parallel with the other load (same DB server/disks, different DB) and with scriptSig enabled...


dermoth avatar dermoth commented on June 12, 2024

More on TokuDB testing... Please scratch all I've said about forcing indexes, it appears all I needed was to run ANALYZE TABLE to rebuild cardinality statistics!

There was clearly something broken, but I wasn't sure if it was some modified TokuDB optimizer or MySQL/Maria itself. Now I think I understand the root of the issue better... When the indexes are created, it looks like TokuDB fails to update cardinality on some indexes, which prevents them from being used properly in later queries. See for example this session from the very beginning of a database load (just a few thousand rows...):

MariaDB [abe_toku]> SHOW INDEX FROM txout;
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name       | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| txout |          0 | PRIMARY        |            1 | txout_id    | A         |        1712 |     NULL | NULL   |      | BTREE      |         |               |
| txout |          0 | tx_id          |            1 | tx_id       | A         |        NULL |     NULL | NULL   |      | BTREE      |         |               |
| txout |          0 | tx_id          |            2 | txout_pos   | A         |        1712 |     NULL | NULL   |      | BTREE      |         |               |
| txout |          1 | x_txout_pubkey |            1 | pubkey_id   | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.00 sec)

MariaDB [abe_toku]> SHOW INDEX FROM txout;
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name       | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| txout |          0 | PRIMARY        |            1 | txout_id    | A         |        2195 |     NULL | NULL   |      | BTREE      |         |               |
| txout |          0 | tx_id          |            1 | tx_id       | A         |        NULL |     NULL | NULL   |      | BTREE      |         |               |
| txout |          0 | tx_id          |            2 | txout_pos   | A         |        2195 |     NULL | NULL   |      | BTREE      |         |               |
| txout |          1 | x_txout_pubkey |            1 | pubkey_id   | A         |        NULL |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.00 sec)

MariaDB [abe_toku]> ANALYZE TABLE `txout`;
+----------------+---------+----------+----------+
| Table          | Op      | Msg_type | Msg_text |
+----------------+---------+----------+----------+
| abe_toku.txout | analyze | status   | OK       |
+----------------+---------+----------+----------+
1 row in set (0.01 sec)

MariaDB [abe_toku]> SHOW INDEX FROM txout;
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name       | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| txout |          0 | PRIMARY        |            1 | txout_id    | A         |        2856 |     NULL | NULL   |      | BTREE      |         |               |
| txout |          0 | tx_id          |            1 | tx_id       | A         |        2856 |     NULL | NULL   |      | BTREE      |         |               |
| txout |          0 | tx_id          |            2 | txout_pos   | A         |        2856 |     NULL | NULL   |      | BTREE      |         |               |
| txout |          1 | x_txout_pubkey |            1 | pubkey_id   | A         |        2856 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.00 sec)

MariaDB [abe_toku]> SHOW INDEX FROM txout;
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name       | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| txout |          0 | PRIMARY        |            1 | txout_id    | A         |        3056 |     NULL | NULL   |      | BTREE      |         |               |
| txout |          0 | tx_id          |            1 | tx_id       | A         |        3056 |     NULL | NULL   |      | BTREE      |         |               |
| txout |          0 | tx_id          |            2 | txout_pos   | A         |        3056 |     NULL | NULL   |      | BTREE      |         |               |
| txout |          1 | x_txout_pubkey |            1 | pubkey_id   | A         |        3056 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.00 sec)

MariaDB [abe_toku]> 

You can see the data being updated in real time, but only after the ANALYZE do all keys get updated. This one-liner analyzes all tables:

mysqldump -uabe -ppassword --databases abe --no-data | grep '^CREATE TABLE' | sed -r 's/^CREATE/ANALYZE/;s/ \($/;/' | mysql -uabe -ppassword abe

FWIW, doing the analyze of all tables on the fully imported database (from SQL backup, which was showing the same issue as the one being imported from scratch) took less than a minute!


bkoffler avatar bkoffler commented on June 12, 2024

I have a simplistic question but I need to ask it nonetheless. Putting the speed of importing the blockchain into my database (MySQL) to the side, and assuming I used this program to complete that task, what are the steps involved in keeping my database updated while my local blockchain updates? Again, I'm sure this is not too complicated a task, but I am about to start on a new project and keeping the database up to date is important.


dermoth avatar dermoth commented on June 12, 2024

There are basically two ways. Either let the Abe "webserver" check for updates and perform them on each request. This is the simplest form, and I personally don't like it as it hangs the server whenever a block has to be loaded. Also I'm not too sure about support for this method on FastCGI or other pooled server methods (I never used them, as the only thing I ever used Abe with was a single-threaded script making API requests one after the other).

The second method is running a block loader on schedule. This lets you load the new blocks as they come in without affecting the webserver requests. There is a script in Abe's repo, tools/abe_loader, which, in addition to running on a schedule, also monitors Bitcoin's logfiles to detect new blocks and launch the loader immediately.

As for the speed, I didn't monitor the loader speed on InnoDB yet (I can have a look later) but right now on Maria/TokuDB it's loading 7 blocks/min for the 60min average (less than 10 seconds per block). This is loading December 2014's blocks, and I have very cheap rotational disks (RAID1) and only 2G TokuDB cache.

There are two options to disable web serving and block loading respectively: --no-serve and --no-load. I normally run one of each (with --no-serve it quits after loading, so you have to schedule it or use the loader tool mentioned above). With both options it merely imports a blank database (if none exists) or performs upgrades if --upgrade is specified.
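A sketch of that split setup (paths and schedule are illustrative):

# long-running web server that never loads blocks
python -m Abe.abe --config abe.conf --no-load &

# cron entry to catch up every few minutes (or use tools/abe_loader instead)
*/5 * * * *  python -m Abe.abe --config abe.conf --no-serve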


dermoth avatar dermoth commented on June 12, 2024

For the heads-up, there is now a 3rd way to load, using rpc-load-mempool. This is like the normal loader, but it will keep running as long as bitcoind and the database are up. It requires using RPC to load the blocks, and after loading all blocks it will start importing mempool transactions until the next block shows up.

This method has the added benefit of faster block loading, since in most cases only a few missed transactions will need to be inserted before the new block can be committed to the database (overall it's still the same work, but spread more evenly between blocks).
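A hypothetical invocation (assuming the option can be passed on the command line like the other loader options, and that abe.conf already carries the bitcoind RPC settings):

python -m Abe.abe --config abe.conf --no-serve --rpc-load-mempool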


Rsalazarm avatar Rsalazarm commented on June 12, 2024

Hello guys! Quick question: from your experience loading the data into the database, are there any parameters worth tuning to improve the initial data load? I am using TokuDB and the initial load is really too slow... more than 1 week running the initial load process and it is only now loading the blk00245.dat file! And that is with a 4-core processor, 12GB of memory and SSD disks (ProLiant ML110 G7 server).

Thanks in advance!


dermoth avatar dermoth commented on June 12, 2024

TokuDB was very fast initially, but as the database grew it became pretty slow... A few months ago I had to drop it as I was no longer able to keep it up to date (I was running it in parallel with InnoDB and Bitcoin Core on the same disks). There are still some nice features in TokuDB... Whichever you use, make sure you allocate as much memory as possible to these engines.

What is it being slow on? Is Abe/MySQL blocked on IO or running at 100% CPU? One trick you could use to help with IO is running MySQL with libeatmydata for the time of the import (note though that any panic/powerfail during import with this preload lib will likely break the database beyond repair).

And keep in mind that in about 600 blocks segwit will activate, and I haven't yet finished segwit support for Abe... It will require adding columns to tx and txin (txin is the largest Abe table; the SQL dump of it I did last week was 104G with 600M+ rows).


Rsalazarm avatar Rsalazarm commented on June 12, 2024

Hmmm... I think the machine is running smoothly... See below for top, iostat, iotop and my my.cnf file.

Do you recommend using InnoDB in place of TokuDB? Do you know if it is possible to migrate from TokuDB to InnoDB without starting from scratch?

top:

top - 17:51:07 up 3:30, 1 user, load average: 1,45, 1,49, 1,32
Tasks: 217 total, 2 running, 214 sleeping, 0 stopped, 1 zombie
%Cpu(s): 37,4 us, 3,6 sy, 0,0 ni, 57,3 id, 1,7 wa, 0,0 hi, 0,1 si, 0,0 st
KiB Mem : 12256188 total, 177212 free, 10927228 used, 1151748 buff/cache
KiB Swap: 97653760 total, 97653696 free, 64 used. 950820 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
905 mysql 20 0 11,212g 9,068g 5740 S 125,3 77,6 148:10.09 mysqld
3039 rsalazar 20 0 500540 96712 6840 R 29,7 0,8 70:51.56 python
1827 rsalazar 9 -11 355788 5824 4284 S 2,0 0,0 0:03.07 pulseaudio
14678 rsalazar 20 0 2057928 274916 115324 S 2,0 2,2 0:16.66 Web Content
1693 rsalazar 20 0 1707276 314796 37460 S 1,7 2,6 14:36.43 compiz
872 root 20 0 558776 129004 24584 S 1,3 1,1 3:19.14 Xorg
51 root 20 0 0 0 0 S 0,3 0,0 0:32.08 kswapd0
746 root 0 -20 0 0 0 S 0,3 0,0 0:18.32 kworker/3:+
2919 rsalazar 20 0 667232 23720 13192 S 0,3 0,2 0:45.60 gnome-term+
14531 rsalazar 20 0 2337072 260792 117876 S 0,3 2,1 0:13.17 firefox
14675 root 20 0 0 0 0 S 0,3 0,0 0:00.95 kworker/2:0
14872 root 20 0 0 0 0 S 0,3 0,0 0:00.42 kworker/1:2
14922 rsalazar 20 0 43020 3728 3160 R 0,3 0,0 0:00.08 top
14945 root 20 0 0 0 0 S 0,3 0,0 0:00.13 kworker/3:1
14958 root 20 0 0 0 0 S 0,3 0,0 0:00.05 kworker/0:0
1 root 20 0 185236 4640 2760 S 0,0 0,0 0:03.22 systemd
2 root 20 0 0 0 0 S 0,0 0,0 0:00.00 kthreadd

iostat -y 5:

rsalazar@Proliant:~$ iostat -y 5
Linux 4.10.0-32-generic (Proliant) 19/08/17 x86_64 (4 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
27,05 0,00 3,39 1,26 0,00 68,30

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 294,00 11836,00 0,00 59180 0
sdb 0,00 0,00 0,00 0 0

avg-cpu: %user %nice %system %iowait %steal %idle
24,19 0,00 3,18 1,57 0,00 71,06

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 388,80 14272,00 63,20 71360 316
sdb 0,00 0,00 0,00 0 0

avg-cpu: %user %nice %system %iowait %steal %idle
27,42 0,00 3,64 1,62 0,00 67,32

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 337,60 12642,40 2927,20 63212 14636
sdb 0,00 0,00 0,00 0 0

IOTOP:
Total DISK READ : 12.51 M/s | Total DISK WRITE : 1626.53 B/s
Actual DISK READ: 12.47 M/s | Actual DISK WRITE: 78.63 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
3050 be/4 mysql 12.24 M/s 0.00 B/s 0.00 % 6.28 % mysqld --mysqld.pid
14985 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.06 % [kworker/u128:1]
221 be/3 root 0.00 B/s 0.00 B/s 0.00 % 0.03 % [jbd2/sda1-8]
996 be/4 mysql 192.99 K/s 0.00 B/s 0.00 % 0.02 % mysqld --mysqld.pid
2919 be/4 rsalazar 3.18 K/s 1626.53 B/s 0.00 % 0.00 % gnome-ter
nal-server
3039 be/4 rsalazar 76.24 K/s 0.00 B/s 0.00 % 0.00 % python -m
--no-serve

my.cnf file:
!includedir /etc/mysql/conf.d/
!includedir /etc/mysql/percona-server.conf.d/
[mysqld]
sql_mode=NO_ENGINE_SUBSTITUTION
query_cache_size=0
query_cache_size=0
query_cache_limit=10M
thread_pool_size=36
innodb_log_file_size=16M
tmp_table_size= 768M
max_heap_table_size= 768M
#open_files_limit = 8192
#max_connections = 1000
slow-query-log = 1
slow-query-log-file = /var/log/mysql/mysql-slow.log
long_query_time = 1
tokudb_cache_size = 8G
#tokudb_commit_sync = 0
#tokudb_fsync_log_period = 1000

Thanks for your help and support!!


aekramer avatar aekramer commented on June 12, 2024

@jtobey

One could probably gain speed by multithreading at the block level (treating the chain as a queue of blocks), or perhaps by separating the tasks of blockfile parsing, transaction importing, and block importing (including reorganizations and statistics). I could perhaps write this in a day, but unless someone hires me to do it, it must wait its turn on the to-do list.

Did you ever get around to crossing this off your to-do? B)

I am currently looking to do something similar for Postgres, as each connection uses a separate core and it loads a bunch first and then commits. I was hoping to be able to run multiple loaders that each have their own db connection.


lsb avatar lsb commented on June 12, 2024

I've been getting reasonable performance on a large-ish machine using postgres, started up via:

docker run -d --name btcpg --net host -v /mnt/jumbo/bitcoin-abe/pgdata:/var/lib/postgresql/data postgres:11-alpine -c shared_buffers=40GB -c synchronous_commit=off -c wal_writer_delay=10000

(where /mnt/jumbo is a disk with over 1TB of free space, and 40GB is under the available memory)


cercos avatar cercos commented on June 12, 2024

Did anyone ever figure out a way to speed up the sync times? I know this is old, but at this rate it may take about 5 years for it to sync.

I have migrated the MySQL database to an SSD and moved the bitcoin folder to the same drive, and it is strapped to a rocket now with sync times.

