Question about scaling (tileserver, closed)

rwrx commented on July 1, 2024
Question about scaling

from tileserver.

Comments (11)

zerebubuth commented on July 1, 2024

Hi! Thanks for getting in touch.

It certainly sounds like the second machine should be considerably faster, but there are also a lot of potential bottlenecks that could be slowing it down.

Although the process of generating a single tile with tileserver / tilequeue is single-threaded, generating large numbers of tiles should see a good parallel speed-up. This means that you might need to request more tiles concurrently to see a bigger speed-up.
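As a rough illustration of requesting tiles concurrently, here is a minimal Python sketch. The helper names (`fetch_tiles_concurrently`, `tiles_at_zoom`), the `fetch_tile` callable and the default concurrency of 12 are hypothetical and not part of tileserver or tilequeue; a real client would make an HTTP request inside `fetch_tile`.

```python
from concurrent.futures import ThreadPoolExecutor


def tiles_at_zoom(z):
    """Yield every (z, x, y) coordinate at zoom z (2**z tiles per axis)."""
    n = 2 ** z
    for x in range(n):
        for y in range(n):
            yield (z, x, y)


def fetch_tiles_concurrently(coords, fetch_tile, concurrency=12):
    """Fetch all tiles in coords with up to `concurrency` requests in flight.

    fetch_tile is a hypothetical callable taking (z, x, y) and returning the
    tile body; it is injected so the sketch runs without a server.
    """
    coords = list(coords)  # allow generators; we iterate twice below
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # map() preserves input order, so bodies line up with coords.
        bodies = list(pool.map(lambda c: fetch_tile(*c), coords))
    return dict(zip(coords, bodies))
```

Threads are enough on the client side because each request spends most of its time waiting on the server; the parallel speed-up comes from the server rendering several tiles at once.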

The first thing I'd check is whether the tileserver processes are able to use all the CPUs. Under full load, is it maxing out all the CPUs? On my computer, running top often shows me that processes are using only a single CPU. When running tileserver on production servers, I'd recommend doing it through Gunicorn with at least as many workers as CPU cores. For example, the tileserver Dockerfile configures 5 workers. In our (now defunct) Chef recipe, we set the number of workers to twice the number of CPUs plus one.
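The two worker-count rules mentioned above can be written down as a small helper. This is only a sketch: `worker_count` and its `rule` names are made up for illustration and are not part of Gunicorn or tileserver.

```python
def worker_count(cores, rule="per_core"):
    """Return a Gunicorn worker count for a machine with `cores` CPUs.

    "per_core"  -> one worker per CPU core.
    "2n_plus_1" -> twice the number of cores plus one (the Chef recipe rule).
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if rule == "per_core":
        return cores
    if rule == "2n_plus_1":
        return 2 * cores + 1
    raise ValueError("unknown rule: %r" % (rule,))
```

The result would be passed to Gunicorn's `-w` option. On a real machine, `multiprocessing.cpu_count()` can supply the `cores` argument.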

The second thing I'd check is whether the database is the bottleneck. With an NVMe-based system with 32GB RAM, it seems unlikely, but if you're using the default PostgreSQL configuration then it might be performing poorly. A bottleneck that I've run into before is not having max_connections set high enough - I think the default is only 100, which is probably plenty for most installations, but I'd set it to 3,000. There are many other settings which have an important effect on performance, particularly on large machines.

If the database is on a different machine from tileserver, then it's also worth checking whether the network is the bottleneck. We've run into it several times when our queries were returning much more data than we realised. If it's on the same machine then I think the UNIX socket bandwidth is effectively infinite.

Because we had difficulties scaling out the database to support a few thousand rendering machines, we've since moved to high-zoom tiles being rendered from "RAWR tiles", which are "shards" of the database serialized to files, so we can scale out high-zoom rendering without querying the database.

Hope that helps!

rwrx commented on July 1, 2024

Thank you very much for all of your suggestions. I am still trying to find out why it is even slower. I suspect that I have configured PostgreSQL incorrectly, so it does not fully utilize all the cores. Do you have any hints on how to set up PostgreSQL to utilize many CPU cores, not just 4? Also, the new CPU is a Threadripper 1920X, which has two CPU dies glued together, so maybe PostgreSQL cannot utilize it properly.

zerebubuth commented on July 1, 2024

If you are running the most up to date version of PostgreSQL (10), then you can use the max_worker_processes and max_parallel_workers settings to control the concurrency level. In previous versions, I think it just used one thread per connection. However, since tileserver supports multiple concurrent requests, it's still possible to use all the cores by issuing more concurrent requests. What are you using to request the tiles which go into the mbtiles?

My experience has been that PostgreSQL bottlenecks have usually been on the disk or RAM, not the CPU. The most important settings for this are shared_buffers, which I think should be around 30-50% of the available RAM on the machine, and random_page_cost, which I think should be set to between 1.5 and 2.0 for a machine with a solid-state disk. My experience has been that both of these default to values which are OK for most servers, but inappropriate for high-end modern servers.
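To make those two settings concrete, here is a minimal sketch that turns the rules of thumb above into postgresql.conf lines. The function names, the choice of 40% of RAM, and the value 1.5 are assumptions picked from within the ranges given above, not tuned recommendations.

```python
def suggest_pg_settings(total_ram_gb, ram_fraction=0.4, ssd=True,
                        random_page_cost=1.5):
    """Return a dict of postgresql.conf settings (hypothetical helper).

    ram_fraction must stay within the 30-50% range suggested above;
    random_page_cost is only set for solid-state disks.
    """
    if not 0.3 <= ram_fraction <= 0.5:
        raise ValueError("ram_fraction should be between 0.3 and 0.5")
    shared_buffers_mb = int(total_ram_gb * 1024 * ram_fraction)
    settings = {"shared_buffers": "%dMB" % shared_buffers_mb}
    if ssd:
        settings["random_page_cost"] = "%.1f" % random_page_cost
    return settings


def format_conf(settings):
    """Render the settings as 'name = value' lines, sorted by name."""
    return "\n".join("%s = %s" % (k, v) for k, v in sorted(settings.items()))
```

For a 32GB machine, `format_conf(suggest_pg_settings(32))` yields a `random_page_cost` line and a `shared_buffers` line that could be pasted into postgresql.conf as a starting point.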

rwrx commented on July 1, 2024

Hi @zerebubuth, I am sorry for the very late answer. I was trying everything I could think of. For requesting tiles I am using tilezen tilepacks, modified to download tiles for countries defined in GeoJSON. I tried to tune the PostgreSQL config file using the tool at https://pgtune.leopard.in.ua, where I entered the system specifications.

I have tried several other CPUs. The best results were with the Intel i7-8700K, which has 6 cores and a very high boost frequency, which means very good single-core performance. It had better results when I disabled Hyper-Threading. I also tried another Intel CPU, the i7-7820X, which has 8 cores but a much lower boost frequency, so its single-core performance is lower than that of the i7-8700K. Surprisingly, the i7-7820X was slower than the i7-8700K, and the Threadripper 1920X, which has 12 cores, was the slowest. To me it seems that performance degrades when there are more CPU cores and that single-core performance matters more. I also enabled only 4 cores on the Threadripper 1920X, and it was only slightly slower than with all 12 cores.

My benchmark was Slovakia tiles at zooms 0 - 14. I also got better results when I set the concurrency in the tilepack builder to exactly the number of cores. By default it is the number of cores times 8. However, for Alaska tiles at zooms 0 - 14, for example, it was faster when the number of cores was multiplied by 8.

I am not really sure. I hope that I was doing something wrong and this Threadripper CPU could be way faster than these other CPUs. Do you have any thoughts? Or did you do some performance testing too?

zerebubuth commented on July 1, 2024

Hi! No worries.

The behaviour you describe is not what I would expect at all! I would expect the TR 1920X to be almost twice as fast as the i7-8700K. It certainly sounds like the rendering process is being limited to a single core.

Would you be able to share a screenshot of top on that machine during a rendering run, please? I suspect we might see tileserver maxing out at 100% on a single core, with the other cores mostly idle.
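If a screenshot is awkward, the same per-core picture can be obtained on Linux by sampling /proc/stat twice. The sketch below is one way to do that, assuming the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are made up for illustration.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: (busy, total)} in jiffies for each per-core line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines look like "cpu0 ..."; skip the aggregate "cpu" line
        # and unrelated lines such as "intr" or "ctxt".
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        total = sum(values[:8])          # user .. steal
        idle = values[3] + values[4]     # idle + iowait
        cores[fields[0]] = (total - idle, total)
    return cores


def utilization(before, after):
    """Return {cpu_name: percent busy} between two parse_proc_stat samples."""
    result = {}
    for name, (busy1, total1) in after.items():
        busy0, total0 = before.get(name, (0, 0))
        elapsed = total1 - total0
        result[name] = 100.0 * (busy1 - busy0) / elapsed if elapsed else 0.0
    return result


def sample_utilization(path="/proc/stat", interval=1.0):
    """Read the stat file twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return utilization(before, after)
```

Calling `sample_utilization()` during a rendering run and seeing one core near 100% with the rest near 0% would point to the single-core limit suspected above.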

rwrx commented on July 1, 2024

Hi, OK, I am attaching screenshots. They were taken 3 minutes after starting tileserver and tilepacks. Both are running on the same PC.

top: [screenshot]

glances: [screenshot]

zerebubuth commented on July 1, 2024

Thanks for that! It looks like the tileserver process is only using about 3 cores out of the available 12 - is that right?

Would you be able to try again, but this time run gunicorn -w 12 "tileserver:wsgi_server('config.yaml')" rather than python tileserver/__init__.py config.yaml? If you run tilepacks with a concurrency level of 12 against that server, then hopefully we'll see that all the cores are being used in top or glances.

rwrx commented on July 1, 2024

Great! Thank you very much! It just worked. Now all 12 cores are maxed out at 100%. This should be the default way to run tileserver described in the vectordatasource tutorial. I had read about gunicorn somewhere, but I didn't know how to run it.

nvkelso commented on July 1, 2024

rwrx commented on July 1, 2024

OK, I have created a PR that adds gunicorn to the dependencies, and I have also edited the wiki.

rwrx commented on July 1, 2024

@zerebubuth thank you a lot for your help and assistance :).
