Comments (3)
Solution
A `FilePathField` was used for the `HistogramDataFile` model. Given a directory, this field iterates over all the files in it to populate its possible values. I believe that this is part of its validation, which DRF runs every time it serializes the data.
As this path was pointing to EOS, Django took somewhere between 20 s and 3 minutes (depending on EOS load, I guess) to populate the list of all available files in the specified directory, for every API query. This explains why, for a single results page (50 results, serialized with the `ListModelMixin`), the first serialization took so long (it fetched the list of files), while the subsequent serializations were so fast (they just validated that the existing filepath was in the list of files).
Making this field a `CharField` automatically fixed the problem.
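
To make the mechanism concrete, here is a minimal, stdlib-only sketch of the difference between the two kinds of field. `ScanningPathField` and `PlainPathField` are hypothetical stand-ins written for this illustration, not Django or DRF classes, and the temporary directory stands in for the EOS path. The sketch assumes, as described above, that the expensive step is the directory listing done when the field is set up.

```python
import os
import tempfile


class ScanningPathField:
    """Hypothetical stand-in for a field that enumerates a directory up front."""

    def __init__(self, path):
        # On a slow network filesystem, this listing is the expensive step.
        self.choices = {entry.path for entry in os.scandir(path) if entry.is_file()}

    def to_representation(self, value):
        # Cheap once the listing exists: a membership check per row.
        if value not in self.choices:
            raise ValueError(f"{value!r} is not a file in the configured directory")
        return value


class PlainPathField:
    """Hypothetical stand-in for a plain string field: no filesystem access."""

    def to_representation(self, value):
        return str(value)


def serialize_page(field, paths):
    # One field instance is reused for every row on the page.
    return [field.to_representation(path) for path in paths]


def demo():
    with tempfile.TemporaryDirectory() as directory:
        paths = []
        for i in range(5):
            path = os.path.join(directory, f"file_{i}.root")
            open(path, "w").close()
            paths.append(path)
        scanned = serialize_page(ScanningPathField(directory), paths)
        plain = serialize_page(PlainPathField(), paths)
    return scanned, plain, paths
```

Both stand-ins produce the same output for valid paths; only the first one touches the filesystem, which is why swapping the field type removes the delay without changing the API response.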
💀
Moral of the story
Don't use `FilePathField` for EOS directories with lots of files if you're going to serialize them with DRF.
from mlplayground.
@XavierAtCERN mentioned `django-debug-toolbar`.
This provides some insight into the CPU time each page render takes.
Profiling for the local run and for the deployment is shown below.
Locally
On deployment
Things to check
- IPv4-only configuration -- How??
- @xicoa1: Create own image, then use it on OpenShift:
      # Guarantee you have the base image
      podman/docker pull ubi8/python-38
      # Run S2I
      s2i build <PATH_TO_SOURCE> ubi8/python-38 imagelocal
- Other API endpoints. Does the same happen with 2D Lumisections?
  No, which is strange. Could this be related specifically to `HistogramDataFile`? If yes:
  - Is it due to the `FilePathField` which points to `/eos`? -- Unlikely, no `/eos` access is done on API querying, only database access.
  - Is it due to `percentage_processed` in `serializers.py`, which is run on each query? -- Probably not, this issue would also appear when running on the local computer.

  This should narrow it down to specifically the `histogram_data_files` API endpoint.
- Use a debugger (locally) to see what's going on after the DB query is complete
- Investigate db lookup, serialization, dispatch, rendering
Following this procedure, the timings for the `HistogramDataFileViewSet` response (for one page of 50 entries) are:
Operation | Execution time |
---|---|
Database lookup | 0.0000s |
Serialization | 0.0000s |
Django request/response | 0.0025s |
API view | 1.2326s |
Response rendering | 0.0005s |
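
The thread does not show how these timings were collected, so the following is only a minimal sketch of one way to gather a per-phase breakdown like the table above. `PhaseTimer` and `handle_request` are hypothetical names, and the work inside each phase is placeholder code standing in for the real database query, serializer and renderer.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time per named phase."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def phase(self, label):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.durations[label] = self.durations.get(label, 0.0) + elapsed

    def as_table(self):
        lines = ["Operation | Execution time |", "---|---|"]
        for label, seconds in self.durations.items():
            lines.append(f"{label} | {seconds:.4f}s |")
        return "\n".join(lines)


def handle_request(timer):
    # Placeholder work only; real code would hit the database, run the
    # serializer and render the response inside these blocks.
    with timer.phase("Database lookup"):
        rows = list(range(50))
    with timer.phase("Serialization"):
        payload = [{"id": row} for row in rows]
    with timer.phase("Response rendering"):
        body = str(payload)
    return body


timer = PhaseTimer()
body = handle_request(timer)
print(timer.as_table())
```

Time that falls outside every named phase is not attributed to anything, so a large gap between the sum of the phases and the total request time is itself a hint about where to look next.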
- Pagination: Pagination is done within the `get_context` method of DRF's `renderers.py` file. We are currently using a custom pagination method to add extra info.
After testing, the custom `pagination.py` file does not seem to affect performance at all.
Disabling pagination completely also has no impact. In fact, rendering 6000+ results takes surprisingly little time (the same execution time as with 50 results):
Operation | Execution time |
---|---|
Database lookup | 0.0000s |
Serialization | 0.0000s |
Django request/response | 0.0272s |
API view | 1.3690s |
Response rendering | 0.0376s |
- Authentication: someone reports that authentication seems to be the bottleneck for the `dispatch` method.
Disabling API authentication completely has some, but not substantial, effect on performance.
- Allowing only some HTTP methods on the ViewSet
Adding the `mixins.ListModelMixin` and `mixins.RetrieveModelMixin` to the `HistogramDataFileViewSet` does not affect performance.
- Dig deeper into DRF
  - Downloaded the DRF source from GitHub and saved it inside the project
  - Added `-e ./django-rest-framework` to `requirements.txt` and ran `pip install -r requirements.txt`
  - Counted time in `views.py` -> `dispatch`
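
As an alternative to adding timers by hand inside the library source, the standard-library profiler can report which calls dominate a request. This is a swapped-in technique, not what was done in this thread. `slow_view` and `list_files` are hypothetical stand-ins for a view whose cost is hidden in a directory listing, and the temporary directory stands in for the real data directory.

```python
import cProfile
import io
import os
import pstats
import tempfile


def list_files(directory):
    # Stand-in for the hidden expensive step: enumerating a directory.
    return sorted(entry.name for entry in os.scandir(directory) if entry.is_file())


def slow_view(directory):
    # Hypothetical view body; the listing is buried one call deeper.
    return {"count": len(list_files(directory))}


def profile_call(func, *args):
    """Run func under cProfile and return its result plus a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


with tempfile.TemporaryDirectory() as directory:
    for i in range(3):
        open(os.path.join(directory, f"file_{i}.root"), "w").close()
    result, report = profile_call(slow_view, directory)

print(report)
```

Sorting by cumulative time puts the outermost slow call first and the function actually doing the work just below it, which is the kind of trail that leads from a view down to a filesystem call.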