sussmanbu / final-project-team1

This project forked from sussmanbu/ma415_final_project

JavaScript 82.97% R 13.49% CSS 2.07% SCSS 1.48%

final-project-team1's Introduction

ma4615-final-project-quarto

https://drive.google.com/drive/folders/1VjAMAWWPs-8v2p5-oEltpz3KynAvJIKn?usp=drive_link

final-project-team1's Issues

Data Page

Describe where/how to find data.
You must include a link to the original data source(s). From what you can tell, why was the data collected/curated? Who collected the data?

Evaluation: R

Describe the different data files used and what each variable means.
If you have many variables then only describe the most relevant ones and summarize the rest. Bulleted lists or tables are recommended.

Evaluation: R

Describe any cleaning you had to do for your data.
You must include a link to your load_and_clean_data.R file.
Also, describe any additional R packages you used outside of those covered in class.
Describe how you combined multiple data files and any cleaning that was necessary for that.
Some repetition of what you do in your load_and_clean_data.R file is fine and encouraged if it helps explain what you did.

Evaluation: R

Organization, clarity, cleanliness of the page
Make sure to remove excessive warnings, use clean easy-to-read code (without side scrolling), organize with sections, use bullets and other tools, etc.

Evaluation: R

Feedback from presentation by Alexandra Rodriguez

1. Describe the main idea of the project

Shooting data for Boston PD shooting cases, using tidycensus.

Examine shooting cases in Boston and link them with demographic data.

The team is trying to find the correlation between shooting cases and race.

The main idea of the project is to analyze occurrences of shootings in Boston and use census data to determine correlation between employment status, single parent households, household ownership, sex, race, and geographic information and the number of occurrences.

2. What was the best part of the team's work?

They explored the correlation with each variable to decide which is the best predictor, and gave very clear graphs for visualization.

Rich dataset that brings in tidycensus data to enrich the analysis comparing different regions; the demographic data for each region in Boston is a really good combination.

The team is doing great with the tidycensus package, showing the correlation of shooting cases with race in each area of Boston.

I think that the logistic modeling looked well done and they were able to determine the best predictor for the frequency of shootings, which was location.

3. How would you suggest improving the team's work?

Maybe merge more data to support the original dataset, which might make it easier to write the analysis and big picture pages.

Maybe some other forms of statistical modeling could be done to improve the project; they could consider a region-level regression analysis.

The team did some preliminary data analysis; I think they should include a more high-level model.

Similar projects had maps of the occurrences; I think a map would be a good thing to include in order to visualize their results.

4. Do you have any other comments or ideas?

N/A

Overall, the project is good.

Feedback from presentation by Eric Yang

1. Describe the main idea of the project

In general, income per capita and shootings have a negative correlation.

Income per capita and shooting frequencies have a negative correlation, with a focus on race.

They are focusing on the relationship between income and crime rate/shooting frequency.

The analysis of crime rates per race, particularly focusing on shooting frequencies, reveals significant disparities, especially in certain districts of Boston where higher shooting frequencies are observed, with data segmented by race and gender. This underscores the complex interplay of socio-economic factors such as poverty and education, which potentially impact the frequency of shootings.

What are the factors behind shooting incidents within the Boston area?

2. What was the best part of the team's work?

The visual graphs were very informative, such as finding that Brighton and Roxbury had the highest frequency of shootings. Looking at unemployment rate was also an informative metric to use.

The graphics are very nice, and I like how the color of the bars indicates a certain level on the scale. Lots of good analyses.

They explored the data from many different aspects, such as gender, income, race, poverty, education, etc.

The team's collaborative effort yielded valuable insights, notably identifying areas such as Brighton and Roxbury with the highest shooting frequencies. Through thorough analysis, the group concluded that individuals involved in these shootings are predominantly Black, often residing in lower-income households.

They did a good job of visualizing the data and statistics, using different models to indicate different areas in Boston, especially the Roxbury and Brighton areas.

3. How would you suggest improving the team's work?

It would be interesting if they used a model to predict the frequency of shootings in other districts to see how the graph would look.

I am unsure if they have a prediction model, but incorporating one would be the biggest improvement.

I think overall it's going pretty well. Eric mainly focused on explaining how they found and cleaned the data. He didn't go into much detail, but it sounds fun!

Including other variables such as neighborhood demographics, police presence, and access to social services may also contribute to variations in crime rates and shooting incidents.

They could look into more variables that could affect the shooting incident percentage, which would make the data more convincing.

4. Do you have any other comments or ideas?

In addition to the analysis of shooting frequencies and demographic patterns, it could be beneficial for the team to explore the underlying factors contributing to these disparities. For example, conducting qualitative research or community engagement sessions to understand residents' perspectives on crime, policing, and neighborhood dynamics could provide valuable context. Additionally, investigating the impact of structural inequalities, such as housing segregation, access to healthcare, and employment opportunities, can further enrich the analysis.

Feedback from presentation by Hirotaka Fujii

1. Describe the main idea of the project

The shooting data in Boston by region, as well as race, income, and education.

Boston shooting data 2015-2024
In their exploratory analysis, they looked at relationships between shooting incidents vs different variables such as education, employment, race, and median income. They found correlations for race and median income with shooting incidents by district.

This project explores shooting data in Boston. They explore where shootings occur and what factors may lead to them, such as high school completion, race, employment, etc.

Boston shooting data from the police. It includes when each shooting happened, the victims, and whether it was fatal, plus race, incident numbers, etc.

The distribution of the Boston shooting data 2021-2024.

Relationship between shooting incidents and variables such as race, age, and district. 1. Average income and shooting incidents (negative correlation). 2. High school completion (no clear trend).

2. What was the best part of the team's work?

They try to explain the shootings based on various aspects. This provides a general view of whether each aspect does or does not have an impact on the subject.

They did a lot of exploratory data analysis, with some well-put-together graphs. I liked the use of color in their graphs.

I like how their project is very clear and concise. It is very easy to tell what this group's thesis is, and every visualization and explanation they had clearly helped explain their thesis.

Nice graphs and interesting findings, well explained. The finding about fatal shootings was especially interesting.

The combination of geographical and educational data is clearly reasoned. The data is divided into smaller categories for more specific situations, such as fatal and non-fatal.

The team captures good variables when drawing conclusions, and they also use many regression models.

3. How would you suggest improving the team's work?

Some of the data is incomplete, and the heat map is incomplete. They could explain more about why they chose this model to interpret the data.

He mentioned working on a heatmap that would be really helpful for visualizing data. They haven't gotten to any regression analysis yet, so no specific conclusions about any variables and how that relates to number of shooting incidents.

The project is a bit repetitive. The first 5-6 graphs all look exactly the same. I feel that they could have used some sort of time series data, or some other kind of data to make their visualizations more exciting.

Working on developing the project for submission. No current suggestions, just continue working.

More regression analysis is needed in future steps. Do the line of best fit and the correlation coefficient need to be explained?

They could do more explanatory data analysis which is useful when presenting, and conducting more research.

4. Do you have any other comments or ideas?

I think a geographic data visualization would fit really well with this project.

Feedback from presentation by Nathan Rosenblum

1. Describe the main idea of the project

The project is about analyzing shooting data (victims and perpetrators) in the Boston area

Analyzing shooting data in Boston area by neighborhoods

They're investigating shootings in the Boston area separated by district.

Analyze shooting incidents across Boston using the location of the district, and the education level of the

Seeing which demographic factors the shooting data in the Boston area are related to.

Frequency of shootings with income level, parent background, and educational level as predictor variables

2. What was the best part of the team's work?

I like how they analyze the relationship of median income vs total cases over all years. Their analysis on avg median income vs total cases is interesting.

The graphs they have right now tell good information. For example, they have a bar graph that shows the average income by the different neighborhoods and have the bars shaded to fit the number of shootings in the area.

I liked the variety of figures and how they were able to combine multiple variables onto one figure.

The bar graph of cases in different areas visualizes the data well, and I think their topic is interesting and helpful in the real world.

Choosing good variables to run regressions on and making sure they had good correlation coefficients.

I think they had a great background on their data and how they merged datasets together. They also have a good understanding of how to go forward with their project and to add more to make it more compelling

3. How would you suggest improving the team's work?

Make sure the trends are clearer. The strength of the trends should also be explained more, and their reliability must be analyzed.

I would suggest changing the color of the bars so that the darker colors indicate more shootings. Also explore a greater variety of graphs, as it's just bar graphs right now. More information is needed for the modeling.

Their project looks good so far, I don't have any suggestions. I think they just need to put everything together and polish the site.

They can improve their regression model by changing which variables are used, to make the fitted model better.

Clearer charts in terms of coloring and axis labels, as well as more exploratory data analysis, would be good.

More work on the analysis on the website itself and more of a comprehensive understanding of what the data is saying.

4. Do you have any other comments or ideas?

Excellent project overall!

Change the color scale of the figures so that darker colors are greater values.

I think they could change some of the graphs to be more visually appealing
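
The color-scale suggestion above (darker shades for greater values) can be sketched with ggplot2's `scale_fill_gradient()`. This is only an illustration: the `neighborhoods` data frame, its column names, and its values are made up and are not the team's data.

```r
library(ggplot2)

# Hypothetical neighborhood summary; names and values are made up
neighborhoods <- data.frame(
  neighborhood  = c("A", "B", "C", "D"),
  median_income = c(45, 60, 85, 110),  # thousands of dollars
  total_cases   = c(120, 80, 30, 10)
)

p <- ggplot(
  neighborhoods,
  aes(x = reorder(neighborhood, median_income),
      y = median_income,
      fill = total_cases)
) +
  geom_col() +
  # Low counts get a light shade and high counts a dark one
  scale_fill_gradient(low = "#deebf7", high = "#08306b") +
  labs(
    x = "Neighborhood",
    y = "Median income (thousands of dollars)",
    fill = "Total shootings"
  )

print(p)
```

Setting `low` to the lighter color and `high` to the darker one is what makes darker bars correspond to more shootings; swapping the two reverses the reading.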

Feedback from presentation by John Markowicz

1. Describe the main idea of the project

Gun violence across different races, interacting with residence in different districts, employment status, education level, and household weight.

They investigated gun violence in the Boston area. They were interested in socioeconomic factors, as well as other variables that would influence the number of incidents in a district. These included education, number of people below the poverty line, median income, and more.

Social and economic factors in gun violence in the Boston area, looking at different factors such as income and education.

Social aspects of incidents in Boston, using income, district, and household information as features. Roxbury has the most incidents and Brighton has the fewest.

Gun issues in MA. The analysis includes the number of incidents in different districts and the poverty level. Roxbury has the highest number of incidents, while Brighton has the lowest.

This data includes both fatal and non-fatal shootings, and they analyze if a victim was struck by a bullet within the City of Boston.

2. What was the best part of the team's work?

Bar chart presenting income by race.
Scatter plot with the outliers cleaned out.
Linear regression model with a p-value and confidence interval for each coefficient.

The graphs were well-labeled and color-coded clearly. They also had linear regressions. It was great to see a variety of graphs being used appropriate to the data being presented.

The topic was really interesting to get into and the data discovery was in depth. I would say their visualization looks good too.

The correlation analysis on their dataset was pretty interesting. Versus cases of incidents:
Median income has a negative correlation.
District has a positive correlation.
Poverty has a positive correlation.

They use linear regression models, where district is one of the variables. They evaluate the p-value and the coefficient given by the model.

They have an analysis of the noninstitutional population versus the distribution by district and cases; they showed that some districts have very low population.

3. How would you suggest improving the team's work?

Including a map might be more interesting. Also, the Roxbury district is unclear, because it is separated into west and east, but this group did not account for that.

We did not get to discuss data equity but given the nature of this information it would be important to highlight any factors that can skew this, over-reporting or under-reporting etc.

Making the blog post presentable would be great! I would love to see more exploration of different locations.

Because the team works with a dataset of 10 years, it can be difficult to analyze. They could compare previous years with recent years through a plot.

Maybe include more variables in the analysis. Right now the group only has district and income level as their main variables. If they include some more variables, they can reach a better conclusion for their topic.

Everything is good, but a more detailed analysis would be better, because I think their analysis does not go far enough.

4. Do you have any other comments or ideas?

It is pretty informative and interesting. Good job on this project.

Blog Post 1: Data Proposal Feedback

Data set 1

I would try to find some kind of data set that provides the spatial regions of the different districts since this would allow you to incorporate other location level information.

Data set 2

There would be some challenges because I think this data is aggregated and it might be hard to get cross tabulations like the number of white, male, smokers, aged 26-30. Maybe look around for other tobacco related data.

Data set 3

Many possibilities but this also has the challenges of aggregated data. Read about how the data is suppressed as well.
http://edsight.ct.gov/relatedreports/BDCRE%20Data%20Suppression%20Rules.pdf

Final Project Feedback and Grade

Total
121/125 (96.8%)

Data Page

Describe where/how to find data.
You must include a link to the original data source(s). From what you can tell, why was the data collected/curated? Who collected the data?

Evaluation: E

Describe the different data files used and what each variable means.
If you have many variables then only describe the most relevant ones and summarize the rest. Bulleted lists or tables are recommended.

Evaluation: E

Describe any cleaning you had to do for your data.
You must include a link to your load_and_clean_data.R file.
Also, describe any additional R packages you used outside of those covered in class.
Describe how you combined multiple data files and any cleaning that was necessary for that.
Some repetition of what you do in your load_and_clean_data.R file is fine and encouraged if it helps explain what you did.

Evaluation: M

There seem to be issues with how you did the district assignments in the code you are showing: GEOID == c(2136,2137).
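
One likely reading of this comment: in R, `==` against a length-2 vector recycles the right-hand side element by element, so `GEOID == c(2136, 2137)` does not test membership in the set. The sketch below uses a made-up `geoid` vector (not the team's data) to show the difference from `%in%`.

```r
# Hypothetical tract identifiers, for illustration only
geoid <- c(2137, 2136, 2140, 2137)

# `==` recycles c(2136, 2137) to c(2136, 2137, 2136, 2137), so each
# element is compared against only one of the two values
recycled <- geoid == c(2136, 2137)

# `%in%` tests each element against the whole set
membership <- geoid %in% c(2136, 2137)

print(recycled)    # FALSE FALSE FALSE  TRUE
print(membership)  #  TRUE  TRUE FALSE  TRUE
```

With `==`, the first two rows are silently dropped even though both identifiers belong to the district, which is why `%in%` is the usual choice for this kind of assignment.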

Organization, clarity, cleanliness of the page
Make sure to remove excessive warnings, use clean easy-to-read code (without side scrolling), organize with sections, use bullets and other tools, etc.

Evaluation: M

Code is OK but not the easiest to follow. Use the tidyverse!
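
As one way to act on the tidyverse suggestion, a per-district summary can be written as a single dplyr pipeline. This is a sketch under assumptions: the `shootings` table and its columns `district` and `fatal` are hypothetical, not the team's actual schema.

```r
library(dplyr)

# Hypothetical incident-level data; column names are illustrative
shootings <- tibble(
  district = c("Roxbury", "Roxbury", "Brighton",
               "Dorchester", "Dorchester", "Dorchester"),
  fatal    = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

# One row per district, with total and fatal incident counts
district_summary <- shootings %>%
  group_by(district) %>%
  summarise(
    total_cases = n(),
    fatal_cases = sum(fatal),
    .groups = "drop"
  ) %>%
  arrange(desc(total_cases))

print(district_summary)
```

Each step reads top to bottom (group, summarise, sort), which tends to be easier to follow than nested base R calls or repeated subsetting.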

Analysis Page(s)

Introduce what motivates your Data Analysis (DA)
Which variables and relationships are you most interested in?
What questions are you interested in answering?
Provide context for the rest of the page. This will include figures/tables that illustrate aspects of the data of your question.

Evaluation: E

Modeling and Inference
The page will include some kind of formal statistical model. This could be a linear regression, logistic regression, or another modeling framework.
Explain the ideas and techniques you used to choose the predictors for your model. (Think about including interaction terms and other transformations of your variables.)
Describe the results of your modelling and make sure to give a sense of the uncertainty in your estimates and conclusions.

Evaluation: R

Polynomial doesn't really make sense for categorical.

No interpretation of coefficients or explanation of results.
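
A minimal sketch of what the two comments above point toward: a categorical predictor entered as a factor (indicator variables) rather than a polynomial, with estimates reported alongside confidence intervals. The data are simulated and the variable names are assumptions, not the team's model.

```r
# Simulated district-level data; all values are made up for illustration
set.seed(1)
districts <- data.frame(
  district      = factor(rep(c("A", "B", "C"), each = 10)),
  median_income = round(runif(30, 30, 120))  # thousands of dollars
)
districts$total_cases <- 60 - 0.3 * districts$median_income +
  ifelse(districts$district == "B", 10, 0) + rnorm(30, sd = 5)

# A factor enters the model as indicator variables, one per
# non-reference level; poly() is meant for numeric predictors
fit <- lm(total_cases ~ median_income + district, data = districts)

# Estimates with 95% confidence intervals in one table
coef_table <- cbind(estimate = coef(fit), confint(fit, level = 0.95))
print(round(coef_table, 2))
```

In this setup the `median_income` row is the estimated change in cases per additional thousand dollars of income with district held fixed, and `districtB` is the estimated difference from the reference district `A`; the interval widths give a sense of the uncertainty in each.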

Explain the flaws and limitations of your analysis
Are there some assumptions that you needed to make that might not hold? Is there other data that would help to answer your questions?

Evaluation: E

Clarity of Figures
Are your figures/tables/results easy to read, informative, without problems like overplotting, hard-to-read labels, etc?
Each figure should provide a key insight. Too many figures or other data summaries can detract from this. (While not a hard limit, around 5 total figures is probably a good target.)
Default lm output and plots are typically not acceptable.

Evaluation: M

Why not just show correlations with total cases? Do you care about the other correlations?

Figures are interesting and suggest future questions but those questions remain unanswered.

Tables are just one step beyond standard lm output.
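
The first comment above, showing only the correlations with total cases, could look roughly like the following. The data are simulated and the column names (`median_income`, `poverty_rate`, `unemployment`, `total_cases`) are assumptions for illustration.

```r
# Simulated district-level variables; names and values are illustrative
set.seed(2)
n <- 12
district_data <- data.frame(
  median_income = runif(n, 30, 120),
  poverty_rate  = runif(n, 5, 35),
  unemployment  = runif(n, 2, 12)
)
district_data$total_cases <- 100 - 0.5 * district_data$median_income +
  1.5 * district_data$poverty_rate + rnorm(n, sd = 10)

# Correlate every predictor with the outcome only, rather than showing
# the full predictor-by-predictor matrix
predictors <- setdiff(names(district_data), "total_cases")
cor_with_cases <- sapply(
  district_data[predictors],
  function(x) cor(x, district_data$total_cases)
)

print(sort(cor_with_cases))
```

A single named vector like this is easy to present as one small bar chart or a three-row table, which keeps the focus on the outcome of interest.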

Clarity of Explanations
How well do you explain each figure/result?
Do you provide interpretations that suggest further analysis or explanations for observed phenomena?

Evaluation: R

Not a lot of explanation of figures/results.

Organization and cleanliness.
Make sure to remove excessive warnings, use clean easy-to-read code, organize with sections or multiple pages, use bullets, etc.
This page should be self-contained.

Evaluation: E

Big Picture Page

Clarity of Explanation
You should have a clear thesis/goal for this page. What are you trying to show? Provide details that support your thesis but don't go into too much mathematics or statistics. The audience for this page is the general public (to the extent possible).

Evaluation: E

Quality of Figures
Each figure should be very polished and also not too complicated. There should be a clear interpretation of each figure and each figure should have a clear purpose.

Evaluation: M

Don't include 2024 in plot.

Colors to indicate total cases are hard to parse.

Don't include correlation coefficient.

Creativity
Do your best to make things interesting. Think of a story. Think of how each part of your analysis supports the previous part or provides a different perspective.

Evaluation: M

Not much of a flow/story.

Video

Video Recording
Make a video recording (probably using Zoom) providing a quick explanation of your data and demonstrate some of the conclusions from your EDA.
This video should be no longer than 4 minutes.
Include a link to your video (and password if needed) in your README.md file on your Github repository. You are not required to provide a link on the website.
This can be presented by any subset of the team members.

Evaluation: E

Rest of the Site

General organization and cleanliness of website
The main title of your page is informative.
Each post has an author/description/informative title.
All lab required posts are present.
Each page (including the home page) has a nice featured image associated with it.
Your about page is up to date and clean.
You have removed the generic posts from the initial site template.

Evaluation: E
