datamade / bga-payroll
How much do your public officials make?
we have some questions about what to do with them in the import. the links could lead to a search page with employees within the selected range.
we will soon be managing multiple years of data. sometimes, we will want to be able to filter that data by year. the first year data appears is accessible through `vintage__standardized_files__reporting_year` on each object. a record is created for every incoming `Salary`, every year, such that we can intuit when related `Job` and `Person` objects come and go, based on objects related to the `Salary`. that means filtering should be done at the `Salary` level, and that means our orm and sql queries need to be refactored to do this filtering by default, in a way that is also configurable via user input.
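a minimal sketch of what year-aware filtering at the `Salary` level could look like, assuming django models along the lines described above; the module path and helper function are illustrative, not the project's actual code:

```python
# hypothetical helper: filter salaries by the year their data first
# appeared, defaulting to all years; the lookup path mirrors the
# vintage__standardized_files__reporting_year lookup named above
from payroll.models import Salary  # assumed module path

def salaries_for_year(year=None):
    queryset = Salary.objects.all()
    if year is not None:
        # filtering happens at the Salary level, per the note above
        queryset = queryset.filter(
            vintage__standardized_files__reporting_year=year
        )
    return queryset
```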
let's talk a bit about this irl.
i don't have any further thoughts on this, at this time.
the `flush` / `match_or_create` routines provide us an opportunity to intercept records that error out for some reason when we try to insert them into the database. let's make a queue for those things so the user can review what went wrong and act accordingly.

related to #55.
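for illustration, a rough sketch of parking failed inserts on a queue for later review; the redis queue name, record shape, and `insert_record` helper are assumptions, not the project's actual code:

```python
import json
import redis

r = redis.Redis()

def insert_record(record):
    # stand-in for the real database insert performed during import
    raise NotImplementedError

def match_or_create(record):
    try:
        insert_record(record)
    except Exception as e:
        # keep the record and the error together, so a reviewer can
        # see what went wrong and act accordingly
        r.rpush('import-errors', json.dumps({'record': record, 'error': str(e)}))
```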
total expenditure and other high-level stats would make employer pages richer
The existing charts rely too heavily on tooltips. Add some labels that don't require interacting with the chart.
once the app talks to us, wire up a delayed task to download & store files from google drive. more on that rig here.
we've defined a base task class with dynamic shared context for all of our delayed work in e52602f. this context is contingent on the standardized file we're operating on, e.g., each task expects a standardized file id.

however, the need for dynamic context introduces a challenge, because "the `__init__` constructor (of the `Task` class) will only be called once per process." this means we cannot use the `__init__` method to establish context, given a standardized file id. instead, we define a `setup` method that accepts this id and sets class attributes accordingly.

we run this method each time a task is issued via celery's `task_prerun` signal. this signal provides access to the pending task (`sender`), as well as its args and kwargs, with which we can run `setup` prior to executing any task code.

this has the effect of giving us access to those contextual attributes in the task, without having to call `setup` at the top of each one.
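to make that concrete, here's a minimal sketch of the `setup`-via-`task_prerun` approach described above; `SetupTask`, the broker url, and the example task are assumptions, not the project's actual code:

```python
import celery
from celery.signals import task_prerun

app = celery.Celery('tasks', broker='redis://localhost:6379/0')

class SetupTask(celery.Task):
    def setup(self, s_file_id):
        # establish shared context from the standardized file id,
        # since __init__ only runs once per process
        self.s_file_id = s_file_id

@task_prerun.connect
def run_setup(sender=None, args=None, kwargs=None, **extras):
    # `sender` is the pending task; hand it the id from its args or
    # kwargs so context is in place before any task code runs
    if isinstance(sender, SetupTask):
        s_file_id = (kwargs or {}).get('s_file_id') or (args[0] if args else None)
        sender.setup(s_file_id)

@app.task(base=SetupTask, bind=True)
def copy_to_database(self, s_file_id):
    # self.s_file_id is already set, without calling setup here
    return self.s_file_id
```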
however, it feels a little hacky.
when a task method is bound to a base task class, the code in the bound task is injected into that class as the `run` method. because the task code is injected as the `run` method of the base class itself, it's not possible to define a common `run` method on the base class and extend it via `super()` in each task: running `super(BaseClass, self).run()` calls the `run` method of celery's `Task` class (the base class's parent), which raises a `NotImplementedError`.
source:

```python
def run(self, *args, **kwargs):
    """The body of the task executed by workers."""
    raise NotImplementedError('Tasks must define the run method.')
```
exception:

```
tp = <class 'celery.backends.base.NotImplementedError'>
value = NotImplementedError('Tasks must define the run method.',), tb = None

    def reraise(tp, value, tb=None):
        """Reraise exception."""
        if value.__traceback__ is not tb:
            raise value.with_traceback(tb)
>       raise value
E   celery.backends.base.NotImplementedError: Tasks must define the run method.
```
is there something else we should hook into, to run common code prior to task execution? or are the conventions here just unconventional?
Right now, we link a person to a position through a Salary model. I think we might want to have a "Job" model that links those two. There are three reasons.

`start_date` is currently attached to the salary, but `start_date` is not a property of salary conceptually. `start_date` is a property of how long someone has held a position, i.e., a "job".
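For illustration, one possible shape for that, sketched with django; the field and model names are illustrative, not a final schema:

```python
from django.db import models

class Person(models.Model):
    name = models.CharField(max_length=255)

class Position(models.Model):
    title = models.CharField(max_length=255)

class Job(models.Model):
    # the job, not the salary, is what a start date describes
    person = models.ForeignKey(Person, on_delete=models.CASCADE)
    position = models.ForeignKey(Position, on_delete=models.CASCADE)
    start_date = models.DateField(null=True)

class Salary(models.Model):
    # pay then hangs off the job, one record per reporting year
    job = models.ForeignKey(Job, on_delete=models.CASCADE)
    amount = models.DecimalField(max_digits=12, decimal_places=2)
```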
our payroll models are now related to the standardized file they came from via their `vintage`. let's update the source urls in the front end to lead to those files!
after the first import (#49), there will be a canonical universe of data we need to squash new data into. wire this up, collapsing records only when we can be absolutely sure they belong together.
source link for each data element
It should accept parameters for employer / person / year...
Related to, but separate from, #4.
It looks like there are multiple records for one employee when they get a raise. For example, there are 3 unique records for Justin R Thiede in the 2017 data, each with a different "salary": $8.75, $9.25 and $9.50.
2 | 613 | WEST CHICAGO PARK DISTRICT | THIEDE | JUSTIN R | | AQUATICS | 8.75 | 3/19/13 | 2017
2 | 613 | WEST CHICAGO PARK DISTRICT | THIEDE | JUSTIN R | | AQUATICS | 9.25 | 3/19/13 | 2017
2 | 613 | WEST CHICAGO PARK DISTRICT | THIEDE | JUSTIN R | | AQUATICS | 9.5 | 3/19/13 | 2017
How should this be represented?
They'll look a little something like:
City of Chicago > Chicago Public Library > Brian A Bannon
when multiple users are reviewing, or when one user is reviewing but navigates away from the page, the queue can be exhausted without all records having been reviewed. checked-out records expire after five minutes. tell the user about the situation, and ask them to come back in a few minutes. at that point, either review will be complete, or the checked-out records will be available again.

the app isn't yet smart enough to advance to the next step without observing that the queue is empty. it doesn't make that observation until someone tries to access review. in the transition code, check whether there is anything to review in the queue; if there isn't, trigger the next step.
our choice for distributed review queues, saferedisqueue, has bugs.

perhaps most pressingly, the re-queueing mechanism "may work okay in a single-consumer setup", but doesn't work otherwise. this is a big strike against distributed work!

let's:

- there's a decent blueprint over in the stopgap `import_data` command. start there, but do better.
- celery & redis!
Query for summary statistics separately, and only get the people we need, rather than fetching all 100k (or whatever) of them at once, which takes too long.
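A sketch of what that could look like with django aggregates; the model relationships and page size here are assumptions:

```python
from django.db.models import Avg, Count, Sum

def employer_summary(employer):
    # let the database aggregate instead of materializing every person
    stats = employer.salary_set.aggregate(
        headcount=Count('id'),
        total_pay=Sum('amount'),
        average_pay=Avg('amount'),
    )
    # then fetch only the people we actually need to display
    top_paid = employer.salary_set.order_by('-amount')[:25]
    return stats, top_paid
```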
There are 240 of them.
i.e., links to bga stories about the chicago public library, on the chicago public library page
Do it good.
historically, bga has gathered prospective pay for a standard period (the calendar year). someday, they may gather actual pay. on that occasion, because employees start or leave jobs at all times of year, not just jan 1 or dec 31, it would be nice to keep track of bounding dates for that pay. (see #56 / #57.)
chicago parks / chicago are a good place to start
More tk.
celery has an api for inspecting what's being done / enqueued: http://docs.celeryproject.org/en/latest/reference/celery.app.control.html

figure out how to use that as an indicator of whether work is underway for a file (e.g., it must be a quick check; this approach is very heavy.)
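a rough sketch of such a check using the documented `inspect()` api; per the note above this is heavy, so it shouldn't run on every request, and the task argument convention is an assumption:

```python
from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')

def work_underway(s_file_id):
    inspector = app.control.inspect()
    # active() maps each worker to the tasks it's currently executing
    for tasks in (inspector.active() or {}).values():
        for task in tasks:
            # assumes tasks take the standardized file id as their
            # first positional argument
            if s_file_id in (task.get('args') or []):
                return True
    return False
```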
this is mostly a to-do for me, in the morning
we should keep track of review decisions. let's do this with a review model. here's an example from large-lots.

Once this is done, protect admin views in `data_import`.
Users are asked to review unknown responding agencies and employers during the data import process. They can choose to link them to an existing entity, or add them as a new entity. If they choose to link to an existing entity, we should retain both names for the entity as aliases. This will allow us to link the employer using either representation in future years of data.
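For illustration, one possible shape for retaining aliases, sketched as a django model; `EmployerAlias` is an assumption, not the project's actual schema:

```python
from django.db import models

class Employer(models.Model):
    canonical_name = models.CharField(max_length=255)

class EmployerAlias(models.Model):
    employer = models.ForeignKey(Employer, on_delete=models.CASCADE,
                                 related_name='aliases')
    name = models.CharField(max_length=255, unique=True)

# a future year's import could then match an incoming name against
# aliases as well as canonical names:
# Employer.objects.filter(aliases__name=incoming_name)
```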