
Comments (3)

nothingface0 commented on July 28, 2024

Solution

A FilePathField was used on the HistogramDataFile model. Given a directory, this field iterates over all the files in it to populate its possible values. I believe this is part of its validation, which DRF runs every time it serializes the data.

As this path was pointing to EOS, Django took somewhere between 20 s and 3 minutes (depending on EOS load, I guess) to build the list of all available files in the specified directory, and it did so for every API query. This explains why, for a single results page (50 results, serialized with the ListModelMixin), the first serialization took so long (it fetched the list of files), while the subsequent serializations were so fast (they only validated that the existing file path was in that list).

Making this field a CharField automatically fixed the problem.

💀

Moral of the story

Don't use FilePathField for EOS directories with lots of files if you're going to serialize them with DRF.
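
The effect described above can be imitated with a small standard-library sketch. It is not Django's or DRF's actual code: ScanningPathField and PlainCharField are hypothetical stand-ins, and the temporary directory plays the role of the EOS path. The only point is that a field which builds its choices from a directory listing pays for that listing on construction, while a plain string field never touches the filesystem.

```python
import os
import tempfile


class ScanningPathField:
    """Hypothetical stand-in for a field that lists a directory to build its choices."""

    scans = 0  # counts directory listings, for illustration only

    def __init__(self, path):
        ScanningPathField.scans += 1
        # The expensive step on a slow filesystem: one full directory listing.
        with os.scandir(path) as entries:
            self.choices = {entry.path for entry in entries if entry.is_file()}

    def to_representation(self, value):
        # Cheap once the choices exist: a set membership test.
        if value not in self.choices:
            raise ValueError(f"{value!r} is not a valid choice")
        return value


class PlainCharField:
    """Hypothetical stand-in for a plain string field: no filesystem access."""

    def to_representation(self, value):
        return str(value)


def serialize_page(field, values):
    """Serialize one page of values with a single field instance."""
    return [field.to_representation(value) for value in values]


with tempfile.TemporaryDirectory() as directory:
    paths = []
    for i in range(50):  # one page of 50 results
        file_path = os.path.join(directory, f"file_{i}.csv")
        open(file_path, "w").close()
        paths.append(file_path)

    scanning_result = serialize_page(ScanningPathField(directory), paths)
    plain_result = serialize_page(PlainCharField(), paths)
```

Both fields produce the same output for the page; the difference is only the directory listing, which is negligible on a local disk and can dominate the request on a remote filesystem.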

from mlplayground.

nothingface0 commented on July 28, 2024

@XavierAtCERN mentioned django-debug-toolbar.

This provides some insight into the CPU time each page render takes.
Profiling results for a local run and for the deployment are shown below.

Locally

[three profiling screenshots]

On deployment

[three profiling screenshots]
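
For reference, a minimal sketch of the settings that django-debug-toolbar typically needs is given here. It is a fragment to merge into a project's real settings, not this project's actual configuration; the app list is shortened for illustration, and the toolbar only shows up when DEBUG is enabled.

```python
# settings.py fragment (a sketch; merge these entries into the real settings)
INSTALLED_APPS = [
    "django.contrib.staticfiles",  # the toolbar's panels rely on static files
    "debug_toolbar",
]

MIDDLEWARE = [
    # Place the toolbar early, but after any middleware that encodes
    # the response content (for example GZip).
    "debug_toolbar.middleware.DebugToolbarMiddleware",
]

# The toolbar is only displayed for requests coming from these addresses.
INTERNAL_IPS = ["127.0.0.1"]

# urls.py also needs the toolbar's routes, for example:
#   from django.urls import include, path
#   urlpatterns += [path("__debug__/", include("debug_toolbar.urls"))]
```

On a containerized deployment the client address seen by Django is often not 127.0.0.1, so INTERNAL_IPS may need adjusting there.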


nothingface0 commented on July 28, 2024

Things to check

  • IPv4-only configuration -- How??

  • @xicoa1: Create own image, then use it on OpenShift:
    # Make sure the base image is available locally
    podman pull ubi8/python-38   # or: docker pull ubi8/python-38
    # Run S2I to build the local image from the source tree
    s2i build <PATH_TO_SOURCE> ubi8/python-38 imagelocal

  • Other API endpoints. Does the same happen with 2D Lumisections?
    No, which is strange. Could this be related specifically to HistogramDataFile? If yes:
      • Is it due to FilePathField, which points to /eos? -- Unlikely, no /eos access is done on API querying, only database access
      • Is it due to percentage_processed in serializers.py, which is run on each query? -- Probably not, this issue would also appear when running on the local computer.
    This should narrow it down to specifically the histogram_data_files API endpoint.

  • Use a debugger (locally) to see what's going on after the DB query is complete

  • Investigate db lookup, serialization, dispatch, rendering
    By checking this procedure, the results for the HistogramDataFileViewSet response (for one page of 50 entries) are:
    Operation                  Execution time
    Database lookup            0.0000s
    Serialization              0.0000s
    Django request/response    0.0025s
    API view                   1.2326s
    Response rendering         0.0005s

  • Pagination: Pagination is done within the get_context method of the renderers.py file of DRF. We are currently using a custom pagination method to add extra info.

After testing, the custom pagination.py file does not seem to affect performance at all.

Disabling pagination completely also has no impact. In fact, rendering 6000+ results takes surprisingly little time (about the same execution time as with 50 results):

    Operation                  Execution time
    Database lookup            0.0000s
    Serialization              0.0000s
    Django request/response    0.0272s
    API view                   1.3690s
    Response rendering         0.0376s

  • Authentication: someone reports that authentication seems to be the bottleneck for the dispatch method.

Disabling API authentication completely has some effect on performance, but not a substantial one.


  • Allowing only some HTTP methods on the ViewSet

Adding mixins.ListModelMixin and mixins.RetrieveModelMixin to the HistogramDataFileViewSet does not affect performance.


  • Dig deeper into DRF
      • Downloaded the DRF source from GitHub and saved it inside the project
      • Added -e ./django-rest-framework to requirements.txt
      • pip install -r requirements.txt
      • Counted time in views.py -> dispatch
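
Counting time around a call such as dispatch can be sketched with a small context manager. This is a standard-library illustration only: timed, timings and fake_dispatch are hypothetical names, and fake_dispatch merely stands in for the real DRF method that was instrumented.

```python
import time
from contextlib import contextmanager

timings = {}  # label -> accumulated seconds


@contextmanager
def timed(label):
    """Accumulate the wall-clock time spent inside the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + time.perf_counter() - start


def fake_dispatch():
    """Hypothetical stand-in for the view's dispatch: lookup, then serialization."""
    with timed("database lookup"):
        rows = list(range(50))  # pretend page of 50 entries
    with timed("serialization"):
        payload = [{"id": row} for row in rows]
    return payload


with timed("dispatch"):
    response = fake_dispatch()

for label, seconds in timings.items():
    print(f"{label:<20} {seconds:.4f}s")
```

Nesting the blocks this way shows at a glance whether the time reported for the outer call is accounted for by the inner stages or is spent somewhere that has not been instrumented yet.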

