auditor's Introduction

Degreepath, the automated degree auditor

Gobbldygook lives!

Degreepath is a tool to perform automated, comprehensive, and fast degree audits.

Automated: (eventually) A web interface, for reviewing the status of a cohort of students and for running reports

Comprehensive: Just checking degree audits? Nah. How about running graduation, general education, major, and concentration/minor audits, all at once?

Fast: (eventually) And of course, this only makes sense to run if it's faster than doing the checks by hand. Luckily, computers are fast, and they can do lots of things at once!


Requires Python 3.6+.

$ python3 -m venv ./venv
$ source ./venv/bin/activate  # or ./venv/bin/activate.fish
$ pip install pip-tools
$ pip-sync requirements.txt
$ python3 -m dp --help

To run tests:

$ pytest  # or, with coverage, pytest --cov=dp

Other commands:

CLI

$ python3 -m dp --help

The main CLI entry point; see --help.

Basic usage is as follows:

# somehow get a student JSON file; documentation coming at some point
# area.yaml can be any .yaml file in the stolaf-areas repository
$ python3 -m dp --student <student.json> --area <area.yaml>

Batch Job Server

The dp.server module handles sorting through the queued jobs, both batched and one-off what-ifs.

Basic usage is as follows:

# start listening for events from postgres
$ python3 -m dp.server &
# fetch the next batch of students to audit from SIS and queue them in postgres
$ python3 -m dp.server.batch
# or, run a one-off what-if audit
$ python3 -m dp.server.whatif --student <file> --code <code> --catalog <catalog>

The dp.server.audit module encapsulates driving the auditor and storing the final result into postgres.

Misc. Scripts

  • You can pair up python3 -m dp.bin.index <QUERY> | python3 -m dp.bin.batch to run batches of audits quickly on a folder of student files.
  • python3 -m dp.bin.discover <area-file> will give you a list of the bucket references and static course references contained within.
  • python3 -m dp.bin.expand <student-file> will print (student_file, area_file) pairs to stdout, one for each area in the student.
  • python3 -m dp.bin.print <student-file> <output-json> will print the same output that -m dp generates.
  • python3 -m dp.bin.validate <area-file> will validate that an area specification is syntactically valid.

Fancier CLI

$ python3 -m dp.testbed --help

Allows eas[ier] benchmarking of changes against the stable branch.

Usage:

$ python3 -m dp.testbed fetch
$ python3 -m dp.testbed baseline
$ python3 -m dp.testbed branch <name>
$ python3 -m dp.testbed compare <name>

You may notice that there are three requirements*.txt files. I split them apart so that I could install the dependencies easily.

filename                 why
requirements.txt         Common runtime dependencies
requirements-dev.txt     Development dependencies - pytest, mypy, etc
requirements-server.txt  Database and error logging dependencies

Copyright (C) 2019  Hawken MacKay Rives

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published
by the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.

auditor's People

Contributors

dependabot-preview[bot], dependabot[bot], hawkrives, rye

auditor's Issues

Implement exceptions

I'm envisioning two types of exception:

  1. Force exceptions match a requirement or a rule, and force it to evaluate to either True or False. This will completely bypass any claims or anything that would have been done by the rule, which can change how the overall area is audited.

  2. Additive exceptions include a clbid. They match a QueryRule, CourseRule, or CountRule, and insert the associated course into the rule.

  • For a QueryRule, the course is inserted into the claimed_items collection, so it appears after the WhereClause has finished, but before the assertions are run. There is currently no way to insert a course into a QueryRule assertion with a second WhereClause.

  • For a CourseRule, it bypasses the original course and replaces it with the associated one instead. This happens before claims are run, and also skips the claim, so if you insert, you may use both the replaced course and the inserted course elsewhere. (Subject to change; it may make more sense to Claim the inserted course here, to prevent re-use.)

  • For a CountRule, it just inserts a new CountRule at the end of the items list. That's all.
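
A minimal sketch of how the two exception types could be modeled. The class names `ForceException` and `AdditiveException`, the tuple-based `path`, and both helper functions are hypothetical and are not taken from the codebase.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple, Union

Path = Tuple[str, ...]


@dataclass(frozen=True)
class ForceException:
    """Forces the rule or requirement at `path` to a fixed outcome."""
    path: Path
    result: bool


@dataclass(frozen=True)
class AdditiveException:
    """Inserts the course identified by `clbid` into the rule at `path`."""
    path: Path
    clbid: str


RuleException = Union[ForceException, AdditiveException]


def forced_result(path: Path, exceptions: Sequence[RuleException]) -> Optional[bool]:
    """Return the forced outcome for `path`, or None if nothing forces it."""
    for exc in exceptions:
        if isinstance(exc, ForceException) and exc.path == path:
            return exc.result
    return None


def inserted_clbids(path: Path, exceptions: Sequence[RuleException]) -> List[str]:
    """Return the clbids that additive exceptions insert at `path`."""
    return [
        exc.clbid
        for exc in exceptions
        if isinstance(exc, AdditiveException) and exc.path == path
    ]
```

A rule would call `forced_result` first and skip its own evaluation (claims included) when the answer is not `None`. Otherwise it would merge `inserted_clbids` into its inputs.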

Idea: check for overlap down to the leaf rules

Currently, we only check overlap between the top-level requirements (because my naive implementation would only check against sibling rules).

In order to defeat this, we need to (at the top level) somehow recurse into each rule, and provide the possible items into the nested levels as we check them.

Q: Does that actually get us anything? I think it does, because you should be able to reduce the number of iterations in nested Count rules…

Add some way to distinguish between types of "in-progress" rules

(screenshot: telegram-cloud-photo-size-1-5010317081975040039-x)

Feedback: the "in-progress courses count as 'Completed + IP'" wording is slightly confusing.

(screenshot: telegram-cloud-photo-size-1-5010317081975040040-x)

It is also visually indistinguishable from this, despite the meaning being quite different.

Note to self: I should add a new state to the enum, something like CompletedWhenInProgressAreDone.

I should also store the number of in-progress items on Clause, so that we can have three fields: resolved_completed:, resolved_in_progress:, and resolved_with: which is the combination of the other two.
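
A rough sketch of the new state and the three fields. Only `CompletedWhenInProgressAreDone` and the `resolved_*` field names come from the note above. The other enum members, the `ResolvedClause` class, and the `status` method are assumptions made for illustration.

```python
import enum
from dataclasses import dataclass
from typing import Tuple


class ResultStatus(enum.Enum):
    # Pending, InProgress, and Completed are placeholder names for existing states.
    Pending = "pending"
    InProgress = "in-progress"
    CompletedWhenInProgressAreDone = "completed-when-in-progress-are-done"
    Completed = "completed"


@dataclass(frozen=True)
class ResolvedClause:
    resolved_completed: Tuple[str, ...] = ()
    resolved_in_progress: Tuple[str, ...] = ()

    @property
    def resolved_with(self) -> Tuple[str, ...]:
        """The combination of the completed and in-progress items."""
        return self.resolved_completed + self.resolved_in_progress

    def status(self, needed: int) -> ResultStatus:
        """Classify the clause against a required item count."""
        if len(self.resolved_completed) >= needed:
            return ResultStatus.Completed
        if len(self.resolved_with) >= needed:
            return ResultStatus.CompletedWhenInProgressAreDone
        if self.resolved_in_progress:
            return ResultStatus.InProgress
        return ResultStatus.Pending
```

Deriving `resolved_with` from the other two fields keeps the three values from drifting apart. Storing it explicitly, as the note suggests, would work as well.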

Add a tool that can take two student files and output a plain-text diff of what changed between them

This tool should include things like a course's grade changing from B to A, etc.

There should be a similar tool that takes two area specification files and outputs the differences between them.

Finally, there should be a tool that takes two audit reports and outputs the differences between them, taking into account the differences between the input data files and specification files.
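
One possible starting point, using only the standard library: pretty-print both documents with sorted keys and run a unified diff over the lines. The `diff_json` name and the shape of the student data in the usage example are assumptions, since the student file format is not documented here.

```python
import difflib
import json
from typing import Any, List


def diff_json(old: Any, new: Any, old_name: str = "old", new_name: str = "new") -> List[str]:
    """Return unified-diff lines describing how `new` differs from `old`."""
    # Sorting keys keeps unrelated key-order changes out of the diff.
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return list(
        difflib.unified_diff(old_lines, new_lines, old_name, new_name, lineterm="")
    )


if __name__ == "__main__":
    before = {"courses": [{"course": "STAT 212", "grade": "B"}]}
    after = {"courses": [{"course": "STAT 212", "grade": "A"}]}
    print("\n".join(diff_json(before, after)))
```

The same function would work for area specifications once they are loaded into plain dicts and lists. A diff that understands courses (matching them by identifier rather than by position) would need more structure than this.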

Add requirement-is-satisfied-by-id expression function

If a requirement has an "id" field, store it in a new structure on RequirementContext, requirement-state. Then allow the "requirement-is-satisfied-by-id" expression to check ctx.requirement-state, raising an error if the ID doesn't exist yet.

Requirements would need to populate this field when they are audited, I think.

Need to ensure uniqueness on the ID field.

Future optimization: check the full document for any expressions that need this value, and skip setting it if there are none.
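
A small sketch of the lookup and the uniqueness check. This `RequirementContext` is a stand-in with only the one field. The method names are hypothetical, and `requirement_state` is spelled with an underscore because hyphens are not valid in Python identifiers.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RequirementContext:
    # Maps a requirement's "id" to whether that requirement was satisfied.
    requirement_state: Dict[str, bool] = field(default_factory=dict)

    def record_requirement(self, req_id: str, satisfied: bool) -> None:
        """Store the outcome of an audited requirement, enforcing unique IDs."""
        if req_id in self.requirement_state:
            raise ValueError(f"duplicate requirement id: {req_id!r}")
        self.requirement_state[req_id] = satisfied

    def requirement_is_satisfied_by_id(self, req_id: str) -> bool:
        """Look up a previously audited requirement; fail loudly if it is missing."""
        try:
            return self.requirement_state[req_id]
        except KeyError:
            raise KeyError(f"requirement {req_id!r} has not been audited yet") from None
```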

Add special case for recursive check for only direct course rules

I.e., the following


For this requirement, you must do all of the following:

take THEAT 130
take THEAT 140
take THEAT 180
take THEAT 232
take either THEAT 240 or THEAT 250
take THEAT 270
take THEAT 271


… should omit the word Take on every line.

We should do this by checking recursively if all child rules are either a Course or a container (either, both, of) which only contains courses or containers … etc.
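
The recursive check could look roughly like this. The `Course`, `Container`, and `Query` classes are simplified stand-ins for the real rule types, not the actual classes.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Course:
    course: str


@dataclass(frozen=True)
class Container:
    kind: str  # "either", "both", or "of"
    children: Tuple["Node", ...]


@dataclass(frozen=True)
class Query:
    description: str


Node = Union[Course, Container, Query]


def only_courses(node: Node) -> bool:
    """True if `node` is a course, or a container holding only courses/containers."""
    if isinstance(node, Course):
        return True
    if isinstance(node, Container):
        return all(only_courses(child) for child in node.children)
    return False
```

The renderer would call `only_courses` once on the top-level container and, if it returns `True`, drop the leading "take" from every line.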

Add new types of "given" input data

given: music performances
where: {performance: 'Instrumental Student Recital | Vocal Lab Choir Solo'}
what: performances
do: count >= 2

The new types are music performances and music performance attendances.

(see hawkrives/gobbldygook-area-data#126 (comment)).

In the future, this type may expand to include Dance and Theater performances, at which point it will need to be renamed; until then, since I don't know if those will go under this type of data or as a new data type, I'm just going to call these explicitly.

limit clauses may cause differing output

auditing against 2017-18 530
2 attempts in 2ms (avg ~215us per attempt)

["$"]
'Exercise Science' audit was successful. (rank 2 of 2; gpa: 3.00)

["$", "%Electives"]
(2|2|t) 💚 Requirement(Electives)
    ["$", "%Electives", ".query"]
    (1|1|t) 💚 Given courses matching ("attributes" == "esth_elective" or "ap" == "AP Statistics")
    (1|1|t)  Subject to these limits:
    (1|1|t)  - at most 1 where ("course" ∈ "['STAT 110', 'STAT 212', 'STAT 214']" or "ap" == "AP Statistics")
    (1|1|t)  Matching courses:
    (1|1|t)     PSYCH 241 "Developmental Psych"
    (1|1|t)     STAT  "AP Statistics"
    (1|1|t)  There must be:
        ["$", "%Electives", ".query", ".assertions", "[0]", ".assert"]
        (1|1|t)  - 💚 "count(courses)" ('2') ≥ "2"
        (1|1|t)       resolved courses:
        (1|1|t)         - Course("PSYCH 241", name="Developmental Psych")
        (1|1|t)         - Course("STAT ", name="AP Statistics")
auditing against 2017-18 530
2 attempts in 4ms (avg ~273us per attempt)

["$"]
'Exercise Science' audit was successful. (rank 2 of 2; gpa: 3.15)

["$", "%Electives"]
(2|2|t) 💚 Requirement(Electives)
    ["$", "%Electives", ".query"]
    (1|1|t) 💚 Given courses matching ("attributes" == "esth_elective" or "ap" == "AP Statistics")
    (1|1|t)  Subject to these limits:
    (1|1|t)  - at most 1 where ("course" ∈ "['STAT 110', 'STAT 212', 'STAT 214']" or "ap" == "AP Statistics")
    (1|1|t)  Matching courses:
    (1|1|t)     PSYCH 241 "Developmental Psych"
    (1|1|t)     STAT 212 "Statistics for Science"
    (1|1|t)  There must be:
        ["$", "%Electives", ".query", ".assertions", "[0]", ".assert"]
        (1|1|t)  - 💚 "count(courses)" ('2') ≥ "2"
        (1|1|t)       resolved courses:
        (1|1|t)         - Course("PSYCH 241", name="Developmental Psych")
        (1|1|t)         - Course("STAT 212", name="Statistics for Science")

Implement paths at parse time instead of audit time

I'd like to have paths like [$, @ReqName, @Child, .of, 1, .course].

I need more consistent paths for exception application.

Paths start at the root, $.

Each requirement is @name.

Within a requirement, you have the … path to the node, I guess.

Anyway, I want this to be discovered at parse time, instead of audit time. Each path should be stored on the Rule or Requirement node itself.

Then exceptions can just check for the path at each stage.
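
As a sketch of the idea, the paths can be computed in one walk over the parsed area before any auditing happens. This version works on plain dicts and lists (with requirements as a name-to-body map) rather than on Rule or Requirement nodes, and it treats `result` as transparent so that the output matches the example path above. Both choices are assumptions.

```python
from typing import Any, Iterator, Tuple

Path = Tuple[str, ...]


def iter_paths(node: Any, path: Path = ("$",)) -> Iterator[Path]:
    """Yield the path of every node below `node`, depth-first."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "requirements":
                # Each named requirement becomes an "@name" segment.
                for name, child in value.items():
                    yield path + ("@" + name,)
                    yield from iter_paths(child, path + ("@" + name,))
            elif key == "result":
                # The rule body hangs directly off its requirement's path.
                yield from iter_paths(value, path)
            else:
                yield path + ("." + key,)
                yield from iter_paths(value, path + ("." + key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield path + (str(index),)
            yield from iter_paths(child, path + (str(index),))
```

In the real parser, each node would receive its path as a constructor argument instead of having it yielded, but the traversal order would be the same.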

Remove custom serializer from crate::rule::CourseRule

I was flip-flopping back and forth between having this serialize to a "canonical" format and having it act as a "formatter" and output a nice format… and crate::rule::CourseRule is the only thing left as "nice format".

I think I should go with the "generate a canonical version" option.

This should boil down to removing the custom impl Serialize for CourseRule block and adding Serialize to the #[serde()] block above CourseRule.

Remove python-dotenv dependency

For what we need - just loading a single .env file with key=value pairs - I don’t think that we need the weight of this dependency.

It’s not huge, but it’s (a) a dependency and (b) it adds to our startup time, which I’m trying to keep down somewhat.

I don’t think it’s a lot of time, maybe 50-100ms, but when ~5000+ of my 7000 nightly audits take less than a second, that’s nearly a tenth of the time, just to load this dependency.

Also, I’d like to better understand an algorithm for the purpose of .env discovery, because I dislike the current hard-coding of paths that I have to do.
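
A dependency-free replacement could be as small as the sketch below. The function names are hypothetical, the parser handles only plain `key=value` lines with optional surrounding quotes, and the discovery rule (walk up from a starting directory until a `.env` file is found) is one common convention, not necessarily what python-dotenv does.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional


def parse_dotenv(lines: Iterable[str]) -> Dict[str, str]:
    """Parse key=value lines, skipping blanks, comments, and malformed lines."""
    values: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def find_dotenv(start: Path, filename: str = ".env") -> Optional[Path]:
    """Return the nearest `filename` in `start` or any of its parent directories."""
    for directory in [start, *start.parents]:
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None
```

Typical use would be `find_dotenv(Path.cwd())`, followed by `parse_dotenv(path.read_text().splitlines())` and `os.environ.update(...)` if a file was found.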

Begin implementing audits

  • RequirementRule
  • CourseRule
  • BothRule
  • EitherRule
  • CountOfRule
  • GivenRule

I've listed those roughly in order of anticipated ease of implementation.

Remove python-markdown2 dependency

I enjoy writing markdown, but I don’t enjoy it enough to spend 200ms loading a module to parse it.

(And this module is faster to load than the original dependency I used, python-markdown)

I timed this with python -X importtime -m dp --help.

I plan to replace this module with HTML. This is only used to parse the "messages" for a requirement, and there aren't that many of them.

Think about removing methods from dataclasses

I've been realizing that the intent behind most "data classes" is to have no/very few methods on them, as they're intended to act like "records".

It would be interesting to experiment with making the DP classes into "records"?

Alternately, another idea might be to move from subclassing Rule > Solution > Result, to having a (say) CourseRule record, which is attached to a CourseSolution record, which is attached to a CourseResult record?

So instead of having to copy all of the properties, and subclassing as we do now, we could just essentially stick a pointer into it.

Something like

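from dataclasses import dataclass
from decimal import Decimal

@dataclass
class CourseRule:
    course: str

@dataclass
class CourseSolution:
    # points at the rule instead of copying its properties
    rule: CourseRule

@dataclass
class CourseResult:
    # points at the solution instead of subclassing it
    solution: CourseSolution

    def rank(self) -> Decimal:
        ...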

Hmm. Something like that.

Support `count: 1 | 2` in count/of rules

Needed for ENVST.

Since we've resolved that count/of rules are no longer greedy, we can support "either 1 or 2" as a value.

Need to decide if this should be a WrappedValue or something simpler – probably something simpler, as it doesn't need & abilities or things applied at the TaggedValue level either.
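
The "something simpler" could be a sorted tuple of allowed counts, as in this sketch. `parse_count` is a hypothetical helper that covers only integer alternatives. Keyword values such as `all` and `any` would still need their own handling.

```python
from typing import Tuple, Union


def parse_count(value: Union[int, str]) -> Tuple[int, ...]:
    """Parse `1` or `"1 | 2"` into a sorted tuple of allowed counts."""
    if isinstance(value, int):
        return (value,)
    # int() raises ValueError on anything that is not an integer alternative.
    alternatives = {int(part.strip()) for part in value.split("|")}
    return tuple(sorted(alternatives))
```

A count/of rule could then generate solutions for each allowed size in turn instead of a single size.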

Rebuild codified areas with explicit names on requirements instead of using maps

iow, go from this

name: Biology
type: major
degree: Bachelor of Arts
catalog: 2018-19

result:
  count: all
  of:
    - requirement: Foundation
    - requirement: Core
    - requirement: Level III
    - requirement: Electives
    - requirement: Chemistry

limiters:
  - where: {level: '100'}
    at_most: 2
  - where: {institution: '! St. Olaf College'}
    at_most: 0

requirements:
  Foundation:
    result:
      course: BIO 150

  Core:
    result:
      count: all
      of:
        - requirement: Genetics
        - requirement: Cell Biology
        - requirement: Comparative Organismal Biology
        - requirement: Ecology

    requirements:
      Genetics:
        result:
          count: 1
          of:
            - BIO 233

      Cell Biology:
        result:
          count: 1
          of:
            - BIO 227
            - CH/BI 227

      Comparative Organismal Biology:
        result:
          count: 1
          of:
            - BIO 242
            - BIO 247
            - BIO 248
            - BIO 251
            - BIO 252
            - BIO 266
            - BIO 275
            - BIO 226

      Ecology:
        result:
          count: 1
          of:
            - BIO 261
            - BIO 226

  Level III:
    result:
      count: 1
      of:
        - BIO 315
        - BIO 341
        - BIO 348
        - BIO 363
        - BIO 364
        - BIO 371
        - BIO 372
        - BIO 382
        - BIO 383
        - BIO 385
        - BIO 386
        - BIO 391

  Electives:
    result:
      given: courses
      where: {attribute: bio_elective}
      what: courses
      do: count >= 2

  Chemistry:
    result:
      count: any
      of:
        - {count: all, of: [CHEM 121, CHEM 123, CHEM 126]}
        - {count: all, of: [CHEM 125, CHEM 126]}
        - {count: all, of: [CH/BI 125, CH/BI 126]}

to this

name: Biology
type: major
degree: Bachelor of Arts
catalog: 2018-19

result:
  count: all
  of:
    - requirement: Foundation
    - requirement: Core
    - requirement: Level III
    - requirement: Electives
    - requirement: Chemistry

limiters:
  - where: {level: '100'}
    at_most: 2
  - where: {institution: '! St. Olaf College'}
    at_most: 0

requirements:
  - name: Foundation
    result:
      course: BIO 150

  - name: Core
    result:
      count: all
      of:
        - requirement: Genetics
        - requirement: Cell Biology
        - requirement: Comparative Organismal Biology
        - requirement: Ecology

    requirements:
      - name: Genetics
        result:
          count: 1
          of:
            - BIO 233

      - name: Cell Biology
        result:
          count: 1
          of:
            - BIO 227
            - CH/BI 227

      - name: Comparative Organismal Biology
        result:
          count: 1
          of:
            - BIO 242
            - BIO 247
            - BIO 248
            - BIO 251
            - BIO 252
            - BIO 266
            - BIO 275
            - BIO 226

      - name: Ecology
        result:
          count: 1
          of:
            - BIO 261
            - BIO 226

  - name: Level III
    result:
      count: 1
      of:
        - BIO 315
        - BIO 341
        - BIO 348
        - BIO 363
        - BIO 364
        - BIO 371
        - BIO 372
        - BIO 382
        - BIO 383
        - BIO 385
        - BIO 386
        - BIO 391

  - name: Electives
    result:
      given: courses
      where: {attribute: bio_elective}
      what: courses
      do: count >= 2

  - name: Chemistry
    result:
      count: any
      of:
        - {count: all, of: [CHEM 121, CHEM 123, CHEM 126]}
        - {count: all, of: [CHEM 125, CHEM 126]}
        - {count: all, of: [CH/BI 125, CH/BI 126]}
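
The conversion itself is mechanical, so it could be scripted. A sketch, assuming each area file has already been loaded into plain dicts by a YAML parser (`requirements_to_list` is a hypothetical name):

```python
from typing import Any, Dict


def requirements_to_list(area: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite every `requirements` map into a list of entries with explicit names."""
    converted = {key: value for key, value in area.items() if key != "requirements"}
    if "requirements" in area:
        converted["requirements"] = [
            # The name leads each entry, followed by the recursively converted body.
            {"name": name, **requirements_to_list(body)}
            for name, body in area["requirements"].items()
        ]
    return converted
```

Dict insertion order is preserved, so the requirements come out in the same order they appear in the original file.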

Figure out how to keep ConditionalRules in the tree data through to the end

Store a ConditionalRuleSolution and Result, with an evaluation_result: bool key to denote which branch was selected. Then render these in the output.

Need to figure out how to store conditional clause chunks.

The goal here is twofold:

  1. Make it easier to write reports, by having a consistent structure for all results in a catalog year

  2. Actually show students what is required, and what might be required. For instance, currently, in the Education major, you have to take a course in your Area(s) of Expertise, which are determined by your other major(s): Chem -> Teaching Science; MATH -> Teaching Math. Currently, no one can see those unless they meet the prereq of having declared the other major(s). Similarly, American Studies allows you to count three AMCON courses if you take the final one in the sequence, but you can't count any of them if you don't complete the sequence. So, if you don't have the last one, there's no indication on the page that you can ever count the AMCON courses.

I guess I will probably still need a "hidden" flag on these, to hide the items like "alternative courses" that departments sometimes want to automatically count, but not show to students…

One more idea for fractional ranks

Define a constant somewhere to represent the smallest size of a course, i.e., 0.25 credits.

Then represent an assertion as something like

  • from areas: 1 of 3 areas = ⅓
  • from courses, count credits: N of M credits
    • I.e., 0.25 of 1.25 credits would be 1/5, because the smallest Course unit is 0.25
  • from courses, count courses: 1 of 2 courses = ½

etc


A count-rule is N of 2, where the first 1 of that 2 is the sum of the ranks of the child items, and the second is the sum of the ranks of the post-audit checks. (… yes, I think that makes sense.)

So if you have four children and two checks, and the sum of the children is ½, and the sum of the checks is ½, you get, uh, ½ lol because it's 1 out of 2.


A query rule is out of N, where N is the number of assertions within it. So if there's two rules, at ¼ and ½ respectively, the final value is … um, ¾ over 2? wait…

Hmm.

How do you do 1.5? 1 and ½, right?

So… ¾ of 2, but that's not it, I'm looking for 0.75 over 2, which is, uh, 3/8 (0.375).

Hmm. Anyway, stuff to think about.


Need to think through if this helps with the issue of

A: ECON 161
B: (both MGMT 120 and ECON 281)

where A's rank is currently 1.0, but B's rank can be 2.0, because it has two items.

If represented as fractions, both branches would be 1/1, I think, which would allow them to correspond properly.
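
The per-assertion fractions can be tried out with the standard library's `Fraction`. This sketch covers only the leaf cases listed above and the A/B comparison. The function names are hypothetical, and how to combine ranks inside count and query rules is left open, as it is above.

```python
from decimal import Decimal
from fractions import Fraction

# Smallest size of a course, in credits.
SMALLEST_COURSE_CREDITS = Decimal("0.25")


def credit_rank(earned: Decimal, required: Decimal) -> Fraction:
    """Rank a credit assertion in units of the smallest course size."""
    unit = Fraction(SMALLEST_COURSE_CREDITS)
    return (Fraction(earned) / unit) / (Fraction(required) / unit)


def count_rank(matched: int, required: int) -> Fraction:
    """Rank a counting assertion, e.g. 1 of 3 areas or 1 of 2 courses."""
    return Fraction(matched, required)


if __name__ == "__main__":
    print(credit_rank(Decimal("0.25"), Decimal("1.25")))  # 1/5
    print(count_rank(1, 3))                               # 1/3
    # A: ECON 161 (1 of 1) vs. B: both MGMT 120 and ECON 281 (2 of 2)
    print(count_rank(1, 1) == count_rank(2, 2))           # True
```

With both branches normalized to 1, a complete A and a complete B rank equally, which is the property the A/B example is after.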

Figure out how to trim the search space for a rule

If a Count rule is, say, the following:

all:
  - course: MUSIC 114
  - course: MUSIC 141
  - course: MUSIC 161
  - course: MUSIC 162
  - either:
      - course: MUSIC 212
      - course: MUSIC 214

and the student is missing MUSIC 161, then the rule can exclude any combinations that would require MUSIC 161.

It should still try to discover the best solution that doesn't have the missing courses, so as to give an accurate picture of the student's place in the degree, but that should help to trim the search space drastically.

As to how to do so… still working on that.
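
One possible approach, sketched on plain dicts shaped like the example above: drop every branch the student cannot match before enumerating combinations. `prune` is a hypothetical helper that handles only `course`, `all`, and `either` nodes.

```python
from typing import Any, Dict, Optional, Set

Rule = Dict[str, Any]


def prune(rule: Rule, taken: Set[str]) -> Optional[Rule]:
    """Return `rule` without unmatched branches, or None if nothing can match."""
    if "course" in rule:
        return rule if rule["course"] in taken else None
    for key in ("all", "either"):
        if key in rule:
            children = (prune(child, taken) for child in rule[key])
            kept = [child for child in children if child is not None]
            return {key: kept} if kept else None
    raise ValueError(f"unsupported rule: {rule!r}")
```

The pruned tree describes only what the student can still match, so the original rule has to be kept alongside it to report what is missing (MUSIC 161 in the example).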
