A little about me
My Projects so far (from last to first)
Full Stack Healthcare Data Analysis Project
The main purpose of this project was to build a full-stack app using Python, SQL, a Flask API, and a front end in HTML, CSS, and JavaScript (Plotly, Leaflet, and D3 libraries). We chose several datasets from the healthcare revenue space. We pulled and cleaned data with multiple measures of value of care, including the readmission ratio for each facility by care category (that table is in our PostgreSQL database, but we ended up not using it in our visualizations), a clinical outcomes dataset per facility and zip code, and a dataset of hospital-acquired infections. We also explored more detailed Census data, including poverty rate per zip code, total population, and its racial distribution. We wanted to explore the relationships among these variables in more depth, but we were limited to three visualizations and a very short timeframe. There is definitely room for further exploration of these topics.
Tools and Technology Used for Analysis
- Visual Studio Code
- Python
- Jupyter Notebook
- Pandas
- QuickDBD
- SQL
- PostgreSQL
- Census API
- JavaScript
- D3.js
- Plotly.js
- Leaflet.js
- MapBox
- CSS
- HTML
- Flask App
Process Flow
- CSV and API into Pandas
- Cleanse data
- Push data to SQL database
- Create Flask API
- Pull data into browser using JavaScript
- Generate Dashboard using D3, MapBox, Plotly, and Leaflet
- HTML and CSS formatting / organization
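The first steps of this flow (load, cleanse, push to SQL, read back as JSON-ready records) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names, table name, and sample rows are hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the example runs without a server. In the real app, a Flask route would return records like these as JSON.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical facility-level rows standing in for a downloaded CSV.
RAW_CSV = io.StringIO(
    "Facility Name,Zip Code,Readmission Ratio\n"
    "General Hospital,60601,1.05\n"
    "General Hospital,60601,1.05\n"
    "Lakeside Clinic,60614,\n"
    "Riverside Medical,60622,0.97\n"
)


def load_and_clean(source) -> pd.DataFrame:
    """Read the CSV, normalize column names, drop duplicates and empty ratios."""
    # Keep zip codes as text so leading zeros are never lost.
    df = pd.read_csv(source, dtype={"Zip Code": str})
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    df = df.drop_duplicates().dropna(subset=["readmission_ratio"])
    return df.reset_index(drop=True)


def push_to_db(df: pd.DataFrame, conn, table: str = "facility_readmissions") -> None:
    """Write the cleaned dataframe to a SQL table (SQLite here, as an assumption)."""
    df.to_sql(table, conn, if_exists="replace", index=False)


def fetch_records(conn, table: str = "facility_readmissions") -> list:
    """Read the table back as a list of dicts, the shape an API would serialize."""
    # The table name is a fixed internal constant here, never user input.
    return pd.read_sql_query(f"SELECT * FROM {table}", conn).to_dict(orient="records")


conn = sqlite3.connect(":memory:")
clean = load_and_clean(RAW_CSV)
push_to_db(clean, conn)
records = fetch_records(conn)
```

On the browser side, a call such as `d3.json()` against the API endpoint would receive a list like `records` and hand it to Plotly or Leaflet.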
Dataset Values Utilized and Ensuing Relationships (QuickDBD)
Dashboard Charts & Visualizations
Tableau Interactive Dashboards
Please see my Tableau Public here to interact with these and other visualizations.
Medicare/Medicaid Excess Readmissions by State
Poverty Rate and Hospital Mortality by State
ETL Project
I worked with 6 CSVs from Data.World containing information on movies that have won Oscars since 2000. Please see my Jupyter notebook here. I wanted to look into the following:
- How did the critics' rating compare with the IMDB rating for the movies that won Oscars?
- How many of the Oscar-winning movies were animated movies (cartoons)?
- Which directors created the most Oscar-winning movies?
I had to do the following transformations to the data to prepare it for the database:
- Remove and/or replace NaN values.
- Drop duplicates as some movies were showing up in the dataframes multiple times.
- Used .iterrows to iterate through all the rows in a large dataframe, split the Genre column, count how many genres were attached to each movie, and check whether the Genre column contained the word "Animated" to create a boolean column for cartoons.
- Changed values in columns from "American" to "USA" and from "Yes/No" strings to True/False boolean values.
- Other changes: renamed columns, lowercased them, replaced spaces in column names with underscores, and reset the dataframe index.
As the end result, I created 5 tables in a PostgreSQL database, which were later joined on movie_title. Please see my database image as well as the final join code and output for the 5 tables in PostgreSQL.
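The transformations listed above can be sketched in pandas roughly as follows. This is a hedged illustration, not the notebook's code: the sample rows and column names are placeholders, and it uses vectorized string methods in place of the `.iterrows` loop described above, which gives the same result on this kind of column and is usually faster on large dataframes.

```python
import pandas as pd

# Hypothetical rows; titles, columns, and values are placeholders,
# not the project's data.
movies = pd.DataFrame(
    {
        "Movie Title": ["Film A", "Film A", "Film B", "Film C"],
        "Genre": ["Animated, Family", "Animated, Family", "Drama", None],
        "Country": ["American", "American", "British", "American"],
        "Won Oscar": ["Yes", "Yes", "No", "Yes"],
    }
)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleanup steps: dedupe, genre features, value and name fixes."""
    out = df.drop_duplicates().copy()

    # Count comma-separated genres per movie; a missing genre counts as 0.
    out["genre_count"] = out["Genre"].str.split(",").str.len().fillna(0).astype(int)

    # Boolean flag for cartoons; a missing genre is treated as not animated.
    out["is_animated"] = out["Genre"].str.contains("Animated", case=False, na=False)

    # Replace remaining NaN values with an explicit placeholder.
    out["Genre"] = out["Genre"].fillna("Unknown")

    # Normalize categorical values.
    out["Country"] = out["Country"].replace({"American": "USA"})
    out["Won Oscar"] = out["Won Oscar"].map({"Yes": True, "No": False})

    # Lowercase column names and replace spaces with underscores.
    out.columns = [c.lower().replace(" ", "_") for c in out.columns]
    return out.reset_index(drop=True)


cleaned = transform(movies)
```

With column names normalized this way, each table shares a `movie_title` key that a SQL join (or `DataFrame.merge(on="movie_title")`) can use.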
First Python Visualization Project
So far I've done one project and am working on my next one! My first project involved analyzing large amounts of data from 12 CSVs (from WHO, Our World in Data, and Kaggle) to get correlation statistics between different socio-economic factors and alcohol consumption in general and wine consumption in particular. I have a background in Holistic Health, so I was mostly interested to see whether wine consumption correlates with longer life expectancy.
There is definitely more to look into, but what I have found so far is that life expectancy and wine drinking have only a weak positive correlation (both worldwide and in Europe, the continent with the highest wine consumption!). Both wine drinking and life expectancy, however, are highly correlated with a country's GDP. So the most fitting conclusion from my research so far is that the higher the GDP of your country, the more likely you are to both live longer and drink wine.
Tools I worked with in the project: Python, Python API, Pandas, Matplotlib, NumPy, SciPy Stats, Seaborn, and the Google Maps API. I became really good at merging CSVs, binning, boolean masks, regression, statistical analysis, and much more! Here are some visuals from my project! And feel free to go here to see our Jupyter notebooks!
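The techniques mentioned above (correlation, regression, boolean masks, binning) can be illustrated with a short sketch. The numbers below are toy values invented for the example, not the project's data or results, and the column names and bin edges are assumptions. It uses `DataFrame.corr` and `numpy.polyfit` rather than SciPy Stats to keep dependencies small.

```python
import numpy as np
import pandas as pd

# Toy values for illustration only; NOT the project's data or results.
countries = pd.DataFrame(
    {
        "gdp_per_capita_k": [5, 10, 20, 30, 40, 50],
        "life_expectancy": [62, 68, 74, 78, 80, 82],
        "wine_litres_per_capita": [0.5, 3.0, 1.0, 6.0, 2.5, 7.0],
    }
)

# Pearson correlation matrix across all numeric columns.
corr = countries.corr(method="pearson")
r_gdp_life = corr.loc["gdp_per_capita_k", "life_expectancy"]
r_wine_life = corr.loc["wine_litres_per_capita", "life_expectancy"]

# Simple linear regression: life expectancy as a function of GDP.
slope, intercept = np.polyfit(
    countries["gdp_per_capita_k"], countries["life_expectancy"], deg=1
)

# Boolean mask: keep only the higher-GDP countries.
high_gdp = countries[countries["gdp_per_capita_k"] >= 30]

# Binning: group GDP into labeled bands (bin edges are arbitrary here).
countries["gdp_band"] = pd.cut(
    countries["gdp_per_capita_k"],
    bins=[0, 15, 35, 60],
    labels=["low", "mid", "high"],
)

print(f"r(GDP, life expectancy) = {r_gdp_life:.2f}")
print(f"r(wine, life expectancy) = {r_wine_life:.2f}")
```

Comparing the two coefficients side by side is the same kind of check that led to the GDP observation above, though the real analysis would also need to account for confounding between the variables.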