ccao-data / ptaxsim Goto Github PK
View Code? Open in Web Editor NEWR package for estimating Cook County property tax bills
Home Page: https://ccao-data.github.io/ptaxsim/
License: GNU Affero General Public License v3.0
R package for estimating Cook County property tax bills
Home Page: https://ccao-data.github.io/ptaxsim/
License: GNU Affero General Public License v3.0
PINs within the special RPM TIF district have slightly incorrect total tax amounts, especially compared to non-TIF PINS. The RPM TIF has the following disbursement:
The first two are correct, but calculating the correct proportional distribution for part 3 is challenging. Currently using the following formula: agency_tax_rate / sum(agency_tax_rate (excluding CPS))
The following script yields the example data below:
left_join(
tax_bill(2019, "14081020190000", simplify = FALSE),
ptaxsim::sample_tax_bills_detail %>%
filter(pin == "14081020190000") %>%
select(pin, tax, agency),
by = c("pin", "agency")
) %>%
select(
agency_name, tax_amt_calced = tax_amt_final, tax_amt_real = tax
)
# A tibble: 10 x 3
agency_name tax_amt_calced tax_amt_real
<chr> <dbl> <dbl>
1 COUNTY OF COOK 606. 624.
2 FOREST PRESERVE DISTRICT OF COOK COUNTY 78.8 78.4
3 CITY OF CHICAGO 2141. 2131.
4 CITY OF CHICAGO LIBRARY FUND 162. 161.
5 CITY OF CHICAGO SCHOOL BLDG & IMP FUND 226. 225.
6 CHICAGO COMMUNITY COLLEGE DISTRICT 508 199. 198.
7 BOARD OF EDUCATION 4561. 4560.
8 CHICAGO PARK DISTRICT 435. 433.
9 CHICAGO PARK DIST. AQUARIUM & MUSEUM BONDS 0 0
10 METRO WATER RECLAMATION DIST OF GR CHGO 520. 517.
Need to add data frame of assessment levels for different classes and years. Structure would be something like:
year class level
2016 205 .1
2016 599 .25
This would enable one to back out the market rate from the assessed values saved within PTAXSIM.
SSAs are a tool that add a fixed tax rate district to an area to provide services. Unlike other taxing districts, the tax rate of an SSA does not vary. In the data, the tax rate is kept fixed by varying the levy of the SSA as the tax base grows or shrinks.
For historical bills, PTAXSIM works fine as is. However, when calculating counterfactuals, SSAs can become a problem since any change in the base must be reflected in the levy in order to keep the SSA rate constant (and the counterfactual accurate). As such, we should add a warning or a helper function specifically to deal with SSAs.
The Town of Cicero (agency ID = 020060000
) is miscoded as a township when it should be a municipality. We should fix this type and any others like it + adjust the regex to classify subtypes.
GitHub and AWS can authenticate using OIDC. We should switch to using that method for S3 auth. More details here: https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/about-security-hardening-with-openid-connect#adding-permissions-settings
Example here: ccao-data/data-architecture#50
Original issue:
Please add the ability to easily calculate each property's tax price (See this paper for an explanation of tax price). This could be achieved by adding a new tax_price() function to the package, or adding tax price or agency tax base to the tax_bill() function output.
Not sure that a separate argument/function just for tax share is warranted, but I think it would be great to add a vignette using it as a technique to decompose a tax bill (or area).
See the PR here for an example: ccao-data/data-architecture#50.
I was using ptaxsim, and I tried to recover a tax bill for the year 2022:
raceway <- c("02252020080000", "02243030070000", "02251000050000", "02262010100000")
test <- tax_bill(year_vec = 2022, pin_vec = raceway)
Error in check_pin_dt_str(pin_dt) :
pin_dt must be in the same format as the PIN data returned by lookup_pin(). Ensure there is 1 row per PIN per year and all column names and types are the same
I think that this year just isn't available in ptaxsim. Might want to include an error message to that effect.
Is there a data source that has the assessment rate used for incentive properties in each tax year?
Incentive properties have their assessment rate change over time so backing out the market value of the property is not easily or accurately done. The Assessed Value and EAV are included in PTAXSIM, but without the assessment rate in a given year for these property classes, can the market value be calculated?
A CSV file of PIN level data for properties (PINS or parcels, either would be great) with 600 and higher property classes with the tax year and assessment rate would be appreciated!
There seems to be some potential wonkiness with the 2022 exemptions data. Most PINs correctly aggregate to their final 2022 tax bill, however some of the rarer exemptions seem to be missing or not recorded in the Clerk's data. Here's a quick test PIN - 10252080490000
- which receives a veteran's exemption on their bill but not in the Clerk's data.
Now that we've set up dbt tests in the data-architecture repo, we should set up something similar for PTAXSIM.
All scripts in data-raw/
actually populate an S3 bucket that feed the tax
database in Athena. It should be relatively simple to add dbt tests on top of the tax data which need to pass prior to PTAXSIM export. Some ideas for tests:
Need to create a vignette to show how levy changes affect tax bills. Specifically, we should show how inflation relates to levy increases, how PTELL works, what the effect of inflation will be on levies, etc.
The current PTAXSIM vignettes are for the most part specific to a place and year. It would be useful to parameterize a few of them such that they could work with any municiaplity/tax code and year combination.
I want to test the impact of different hypothetical values of a Homeowners' Exemption in Cook County on 1) progressivity of the tax structure and 2) the cost of modeling errors to low-income residents. I want to answer the question "What would happen in Cook County if Illinois increased the EAV amount of the Homeowners' Exemption by 100%? 200%? 300%? etc."
A naïve approach would be to use parcels' estimated market values from Cook County's open data and apply a fixed tax rate while varying the value of the Homeowners Exemption. This approach would not account for the impact of the Homeowners exemption on the overall tax base, and therefore on the tax rate applied in each iteration.
Reading the PTAXSIM documentation, it does not appear clear to me that it is easy to use PTAXSIM to simulate the impact of policy changes on the entire tax base.
I have to complete this by Thursday of next week. Is there a way to easily user PTAXSIM to simulate this scenario?
https://www.elkgrove.org/Home/Components/News/News/6046/31
030350504 TIF VIL OF ELK GROVE VILLAGE-MIDWAY COURT 16157
This tif seems to have been created/discontinued within the same tax year (2022). It appears that tax code 16157 does not have any pins / is not used in tax year 2022. Possible data quality issue.
The database includes a few parcels in tax year 2021 that have classes that are not in the assessor class code documentation.
Specifically:
Can you please clarify whether these classes are accurate in the database and, if so, provide more information about the classes (i.e. name and assessment ratio)?
My query was "select pin, class, tax_code_num, av_clerk from pin where year = 2021"
Thanks!
Certain municipalities and taxing bodies can span both Cook and a bordering county. In such cases, the tax rate calculation in PTAXSIM may be wrong, since the denominator in the cty_cook_eav
).
Need to look into this more and possibly see how the Clerk does it. Best way to start is to pull some bills from a border area.
See code here for reference: https://github.com/AleaWM/CookCounty-PropertyTaxes/blob/6a3e4f737e9453d74cdaaa71dd02a936da6bda63/Residential_Burden_Shift.Rmd#L301
The goal of this issue is to create a vignette which entirely replicates the IDOR calculations of tentative and final equalization factors for tax year 2022. It remains to be seen if this is entirely possible. The vignette then would show various impacts in differences between tentative and final equalization factors.
2022 016 Cook FINAL(25E)CCAO PTAX Forms 002 Forms 210-215-236 B-R Action Not Equal-1.pdf
2022 016 Cook Tentative CCAO PTAX Forms 002 Forms 210-215-236 DH.pdf
Collect additional property tax related data from documents, using LLMs for parsing.
There is a significant amount of useful taxing district data currently locked in non-machine-readable formats, including: TIF ordinance, TIF redevelopment plans, municipal/district budgets, SSA info, etc. If this data can be extracted and parsed, it would be a huge boon to PTAXSIM and would likely be the first ever collection of such data.
The problem is that this data is messy. There is no standard format for something like TIF ordinance, so each document will have a completely different format and language, depending on the municipality. Further, nearly all data of this type comes as PDF scans of legislative text - usually without any OCR applied - spanning hundreds or thousands of pages. As such, parsing this data into useful SQL tables is a massive challenge.
Fortunately, new tech may be able to help with this task. Current LLMs have proven especially capable of extracting relevant information from a large document or corpus. We may be able to use such LLMs to convert PDF scans of taxing district data into useful SQL tables.
⚠️ This is an experimental/moonshot task. We don't know for sure that LLMs will work here or that there's enough structured information to be useful. However, if it does work, then it will produce the first digitized collection of such data and a nice proof-of-concept that we can potentially use elsewhere in the office.
The first thing we need to do is take inventory, first of data, then of LLMs. I would make spreadsheets tracking each of the relevant datapoints.
We need to take stock of what data actually exists that is:
I recommend we start with the following datasets:
The landscape around LLMs is changing pretty much daily right now. For this project to work, we need to take a snapshot of existing LLMs and determine their capabilities/whether they fit our needs. You'll need to do some exploration in this space. We're specifically looking for LLMs that:
Before proceeding to coding, the following tasks should be complete:
Once the above tasks are complete, it's time to get coding. Since this will likely be a lot of data in various states of processing, I recommend making a data flow diagram + using the specific inventory (from above) to help track things. The coding can be divided into two stages: processing and package updates.
Broadly, you'll need to come up with a data collection schema that divides things into raw, processed, and completed buckets. We can create a new S3 bucket/dir you can use to store each stage. This will be the stage actually using LLMs. We can scope it out further as we get closer to this stage.
data-raw/
, though we may not want the raw data itself thereOnce parsing is complete, the collected data needs to be added to the actual PTAXSIM database. This will be much simpler than the processing stage:
data-raw/create_db.sql
to add new table definitions for your finished datadata-raw/
that pulls the processed data from S3 and loads it into the SQLite DB (via data-raw/create_db.R
)vignettes/
describing what your data is and how to use itWe need to update the raw data sets used to generate the PTAXSIM database as tax year 2022 data becomes available from the Clerk, BoR, IDOR, and Treasurer. Once all TY2022 data is available, we will update both the R package and database. The following data sets need to be updated for 2022:
Additionally, we should add 2022 unit/integration tests and update PTAXSIM vignettes:
PA 102-0519 alters PTELL to allow taxing districts to recapture funds lost through refunds, appeals, etc. We may need to alter code to capture the changes caused by this act. See full statute here:
https://www.ilga.gov/legislation/publicacts/102/PDF/102-0519.pdf
For reference: https://www.cookcountyclerkil.gov/sites/default/files/2022-12/Levy%20Adjustment%202022%20for%20PA%20102-0519.xlsx
The mermaid diagram in the readme is very helpful for explaining the db structure and previewing what columns are in which tables. It used to work beautifully, but now it breaks when trying to expand it in the readme. Let's fix it such that an expanded version is made available via the readme (e.g., creating a pdf image of the full expanded diagram and linking to it).
My understanding is that the below pins are possibly getting a retroactive raise to new_av
for tax year 2021. Luckily this is all in a TIF so no impact to tax rates. Similar issues would occur for retroactive reductions made by PTAB.
updates <- tibble( pin = c('17084150010000', '17084070160000', '17084070170000'), new_av = c(41422313, 16158090, 4780), old_av = c(109231, 62244, 4792) )
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.