
bosterbuhr / fec-data-wranglin

Dashboard for displaying political donations.

License: GNU General Public License v3.0

Languages: Python 13.33%, CSS 29.15%, JavaScript 7.66%, SCSS 27.20%, HTML 22.59%, Dockerfile 0.07%
Topics: hacktoberfest

fec-data-wranglin's Introduction

This project is rapidly evolving and the README will be updated soon.

FEC-Data-Wranglin

get_that_data.py pulls data from the FEC API. clean_that_data.py finds near-duplicate values in the same column using TF-IDF and merges them, e.g. "Not Employeed" is replaced by "Not Employed" and "Apple INC" is replaced with "Apple".

If you want to make more than a handful of requests you need an API key; visit fec.gov to get one. Set your API key as an environment variable, e.g. FEC_API_KEY=DEMO_KEY.
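How get_that_data.py actually reads the key may differ, but a minimal sketch of picking it up from the environment looks like this:

import os

# Read the FEC API key from the environment, falling back to the public demo key.
api_key = os.environ.get("FEC_API_KEY", "DEMO_KEY")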

When you run python3 get_that_data.py in the terminal:

The resulting data will be saved as a CSV in FEC-Data-Wranglin/data/raw_data and will be structured as shown below.

Each row contains the information for one donation; the first five columns describe the contributor, and party is the party of the candidate they donated to.

   contributor_occupation  contributor_employer    contributor_city  contributor_state  contributor_zip  party
0  RETIRED                 RETIRED                 BENTON            AR                 72019            OTH
1  SYSTEMS MANAGER         AVANIR PHARMACEUTICALS  ALISO VIEJO       CA                 92656            OTH
2  RETIRED                 RETIRED                 OSHKOSH           WI                 549048984        OTH
3  INSURANCE REP           BRS FINANCIAL GROUP     FRESNO            CA                 93701            IND
4  EDITOR                  GLOBAL FINANCE MEDIA    GREENLAWN         NY                 11740            OTH
5  FALSE                   FORMATIV HEALTH         HOOSICK FALLS     NY                 12090            OTH

After gathering your data you can try to clean it up a little.

The process I have used for cleaning the data can be found here

The more data you have, the better the cleaning will work!

Run python3 clean_that_data.py after python3 get_that_data.py.

You can change how many times a file is 'cleaned' by adding this line to, or removing it from, the if statement in clean_that_data.py:

return_df_as_csv(build_classes(csv, lowest_similarity, ngram_size), saved_file_name)

csv -- a .csv file in data/raw_data/ -- the file you want to clean. It must be a .csv and must be in data/raw_data/.

lowest_similarity -- float between 0 and 1 -- the similarity threshold between two values in a column; values with similarity greater than lowest_similarity become the same value.

ngram_size -- int (ideally between 2 and 4) -- the size of the character chunks used to assess similarity, e.g. an ngram_size of 3 applied to 'similarity' gives: ' si' 'sim' 'imi' 'mil' 'ila' 'lar' 'ari' 'rit' 'ity' 'ty '
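Putting those together, a call might look like the following (the file name, threshold, and output name are only illustrative, and this assumes the csv argument takes the path to the file):

return_df_as_csv(build_classes("data/raw_data/donations.csv", 0.85, 3), "donations_cleaned")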

fec-data-wranglin's People

Contributors

bosterbuhr, dependabot[bot], napsterinblue, sharkness

Forkers

sharkness

fec-data-wranglin's Issues

Testing

Added a commit that introduces a bit of unit testing. This is really good practice, both in terms of Python development and getting used to Git/GitHub.

Configure Git

I don't use Macs, but if memory serves correctly, they should come with git, batteries-included. What you're going to want to do is configure it so that you can use your Terminal to interface with GitHub.

You do that by setting up your name/email and (optionally, but highly recommended) setting up your SSH credentials so you don't have to put in your password each time you make a change.
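The name/email part is just two commands in the terminal (substitute your own details):

git config --global user.name "Your Name"
git config --global user.email "you@example.com"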

Get the Code Locally

Now what I'd like for you to do is pull the code down locally:

  1. Just like the first step in the other issue, make a new branch off of big_refactor, called big_refactor_test or something like that.
  2. Click the Download button and copy the URL (I don't think there's a huge difference between HTTPS and SSH)

(screenshot)

  3. Open up the terminal, navigate to a directory where you want to drop this project (let's keep it separate from your existing code for now), type git clone, and then paste that URL. This should pull all of your code down locally.

  4. Go into the directory with cd FEC-Data-Wranglin

  5. Establish a new working branch with

git checkout -b <your_branch_name> origin/<your_branch_name>

Essentially what this is doing is:

  • checkout: I want to work on a branch
  • -b: a new branch, at that
  • <your_branch_name>: that we'll call <your_branch_name>
  • origin/<your_branch_name>: and that's based on the code on GitHub (origin) called <your_branch_name>

Onto Python

As far as the project goes, I took this chunk of the code

https://github.com/BOsterbuhr/FEC-Data-Wranglin/blob/big_refactor/src/data/data_fetcher.py#L85-L91

and made it reference two new functions

https://github.com/BOsterbuhr/FEC-Data-Wranglin/blob/big_refactor/src/data/data_fetcher.py#L106-L140

What I'd like you to do is rewrite _handle_recipient_committee_type() such that it:

  1. Accepts lower-case input arguments
  2. Cleans up the length of the if/elif statements (hint: consider googling "python in list")

Doing so will allow your code to pass the unit tests I wrote (hence the introduction of conftest.py and tests/ at the root).
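To make the hint concrete, here is a minimal sketch of the membership-test pattern. The committee-type codes and return values below are made up for illustration and are not the project's actual logic:

# Hypothetical sketch of the "python in list" pattern, not the real function.
def _handle_recipient_committee_type(committee_type):
    committee_type = committee_type.upper()  # accept lower-case input
    if committee_type in ["H", "S", "P"]:    # hypothetical "direct campaign" codes
        return "direct"
    elif committee_type in ["N", "Q", "O"]:  # hypothetical "PAC" codes
        return "pac"
    return "other"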

To run them:

  1. First, install pytest via pip install pytest
  2. Then navigate to the root of your project /FEC-Data-Wranglin and type pytest. Doing so at the beginning will yield a bunch of fails

(screenshot)

along with printouts of what the result was vs what the expected value was.

  3. Change the code, run the tests, change the code, run the tests, rinse, repeat, until you see a less-ominous pytest message

(screenshot)

  4. Once it's all good you'll run something that looks like

(screenshot)

The final line should give you something "success"-sounding, like

(screenshot)

Query speed

The data_fetcher step takes an extraordinary amount of time.
Find a way to process data as it comes in, or improve the speed dramatically.
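One possible direction is to process each page of results as it is downloaded instead of waiting for the whole pull to finish. The sketch below assumes a paginated JSON endpoint that returns a results list and a pagination block with a page count, which may not match the project's fetcher exactly:

import requests

def fetch_pages(url, params):
    """Yield each page of results as soon as it is downloaded."""
    page = 1
    while True:
        response = requests.get(url, params={**params, "page": page})
        response.raise_for_status()
        payload = response.json()
        yield payload["results"]
        if page >= payload["pagination"]["pages"]:
            break
        page += 1

# Each page can then be cleaned or saved while the next one downloads:
# for rows in fetch_pages(starting_url, {"api_key": api_key}):
#     clean_and_append(rows)  # hypothetical per-page handler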

Summarize Data

Create a dashboard to display the following (a rough pandas sketch follows the list):

    - committee_name.value_counts()
    - Total contribution_receipt_amount for:
        1. Each committee_name
        2. Each party
        3. Compare PAC versus direct campaign donations
        4. Top 5 contributor_employers and contributor_occupations
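A rough pandas sketch of most of those aggregations, assuming a DataFrame df that actually contains the columns named above (the file path is hypothetical):

import pandas as pd

df = pd.read_csv("data/raw_data/donations.csv")  # hypothetical file name

committee_counts = df["committee_name"].value_counts()
total_by_committee = df.groupby("committee_name")["contribution_receipt_amount"].sum()
total_by_party = df.groupby("party")["contribution_receipt_amount"].sum()
top_employers = df["contributor_employer"].value_counts().head(5)
top_occupations = df["contributor_occupation"].value_counts().head(5)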

Which file?

Structure wise, where should this go?

#### Need to write this into the other file
# Create instance of data_fetcher class
# fec = data_fetcher(starting_url, complete_list)
# # Initialize get_the_data to work
# work = get_the_data(fec)
# # Starts looping over all transactions
# work.gimmie_data()
# # Once complete_list has all transactions we create a pandas DataFrame
# df = pd.DataFrame(
#     complete_list,
#     columns=[
#         "contributor_occupation",
#         "contributor_employer",
#         "contributor_city",
#         "contributor_state",
#         "contributor_zip",
#         "party",
#     ],
# )
# Saves DataFrame as a serialized object, will import this to the ML program
# Add path and uncomment to save pickled data
# df.to_pickle("./pickled_data.pkl")
# Prints DataFrame, be careful, this can get bigggggg
# print(df)
# Thanks for reading my Ted Talk

Also,

# df.to_pickle("./pickled_data.pkl")

Where would this be saved? /data is my guess.

What is the difference between /data and /src/data? What is your idea for what goes in /src/analysis?

Ok I think that's it for now, more later I assume.

Stylize input and output

Transfer map.html components to either landing.html or generic.html

Design and implement the summary page
