Specify the purpose of the login (search or detail retreiever)
I recommend 1-3 logins for search and the remaining for more expensive attribute retrieval
search_retriever.py
Optional
details_retriever.py
MAX_UPDATES: - Number of job postings to look up before sleeping. Increase with more accounts/proxies (default = 25)
SLEEP_TIME: - Seconds to sleep between every iteration (default = 60)
Running
This program consists of 2 main scripts, running in parallel.
search_retriever.py - discovers new job postings and insert the most recent IDs and minimal attributes into the database
details_retriever.py - populates tables with complete job attributes
It's important to note that while search_retriever.py typically runs smoothly, even through your personal IP and a singular account, details_retriever.py can be a bit finicky. Each search generates approximately 25-50 results, all of which must be individually queried to obtain their attributes. To enhance its performance, I recommend the following strategies:
Utilize multiple proxies and accounts when running details_retriever.py.
Experiment with different time delays to find the optimal settings.
Run details_retriever.py during periods of lower online activity, such as late-night hours and weekends, to catch up with the progress of search_retriever.py. This will ensure that both processes remain synchronized and up to date.