This project was carried out as a group project for the CS210 Introduction to Data Science course.
This project explores the relationship between education level, age group, and crime rates in Turkey. Our analysis includes data preprocessing, exploratory data analysis (EDA), hypothesis testing, and machine learning.
The dataset was obtained from the Turkish Statistical Institute (TÜİK) website and contains crime statistics broken down by attributes such as the age group and education level of offenders.
The first step was cleaning the dataset, which was provided in Excel format, so that it could be used in later stages. This included handling missing values, removing outliers, and standardizing data formats and labels.
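A minimal sketch of what such a cleaning step could look like with pandas. The column names (`Age Group`, `Education Level`, `Count`), the `-` placeholder for missing values, and the sample rows are assumptions made for illustration, not the actual TÜİK layout. The project's outlier rule is not described here, so the sketch leaves outlier removal out.

```python
import pandas as pd


def clean_crime_table(raw: pd.DataFrame) -> pd.DataFrame:
    """Standardize labels, coerce counts to numbers, and drop unusable rows."""
    df = raw.copy()
    # Standardize column labels: trim whitespace, lower-case, snake_case.
    df.columns = [str(c).strip().lower().replace(" ", "_") for c in df.columns]
    # Standardize category labels in the text columns.
    for col in ("age_group", "education_level"):
        df[col] = df[col].astype("string").str.strip().str.lower()
    # Coerce counts to numeric; placeholders such as "-" become NaN.
    df["count"] = pd.to_numeric(df["count"], errors="coerce")
    # Handle missing values by dropping rows without a key or a count.
    df = df.dropna(subset=["age_group", "education_level", "count"])
    df["count"] = df["count"].astype(int)
    return df.reset_index(drop=True)


# Made-up rows standing in for a sheet loaded with pd.read_excel(...).
raw = pd.DataFrame({
    " Age Group": ["18-24 ", "25-34", None, "35-44"],
    "Education Level": ["High School", " primary school", "University", "University"],
    "Count": [120, "-", 45, "80"],
})
cleaned = clean_crime_table(raw)
print(cleaned)
```

Dropping incomplete rows is only one way to handle missing values; imputation may be more appropriate depending on how much data is affected.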
Key findings about offender demographics were illustrated with bar charts, heatmaps, and line graphs. We analyzed patterns across age groups, gender, and education levels, and computed crime rates by combining the crime data with general population data from TÜİK.
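The crime rate calculation can be sketched as a join between crime counts and population counts. The numbers below are made up, the column names are hypothetical, and the per-100,000 scaling is an assumption (a common convention, not necessarily the one used in this project).

```python
import pandas as pd

# Made-up figures for illustration only; not TÜİK data.
crimes = pd.DataFrame({
    "age_group": ["18-24", "25-34", "35-44"],
    "offenders": [1200, 1500, 900],
})
population = pd.DataFrame({
    "age_group": ["18-24", "25-34", "35-44"],
    "population": [8_000_000, 12_000_000, 9_000_000],
})


def crime_rate_per_100k(crimes: pd.DataFrame, population: pd.DataFrame) -> pd.DataFrame:
    """Join counts with population and compute offenders per 100,000 people."""
    # validate= raises an error if an age group appears more than once on either side.
    merged = crimes.merge(population, on="age_group", how="inner", validate="one_to_one")
    merged["rate_per_100k"] = merged["offenders"] / merged["population"] * 100_000
    return merged


rates = crime_rate_per_100k(crimes, population)
print(rates)
```

Normalizing by population makes age groups of different sizes comparable, which raw counts alone do not.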
We tested the null hypothesis (H0) that there is no association between age and the likelihood of committing a crime, using a Chi-square test of independence.
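A sketch of how such a test could be run with SciPy's `chi2_contingency`. The contingency table is made up for illustration, and the choice of rows (age groups), columns (offenders vs. non-offenders), and the 0.05 significance level are assumptions rather than the project's actual setup.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: age groups; columns: [offenders, non-offenders]. Made-up counts.
observed = np.array([
    [1200, 998_800],
    [1500, 998_500],
    [900, 999_100],
])

# Returns the test statistic, p-value, degrees of freedom,
# and the counts expected if age and offending were independent.
chi2, p_value, dof, expected = chi2_contingency(observed)

alpha = 0.05  # assumed significance level
reject_h0 = p_value < alpha
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.3g}, reject H0: {reject_h0}")
```

With very large samples even small differences between groups become statistically significant, so the size of the differences is worth reporting alongside the p-value.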
We used linear regression to predict crime rates for subsequent years. The predictions were visualized as a heatmap showing the predicted crime numbers across age groups.
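One way to sketch the forecasting step is to fit a separate straight line per age group. The example uses NumPy's `polyfit` as a stand-in, since the library and features the project actually used are not stated here; the yearly counts are made up and the `forecast` helper is hypothetical.

```python
import numpy as np

# Made-up yearly counts per age group, for illustration only.
years = np.array([2018, 2019, 2020, 2021, 2022])
counts_by_age = {
    "18-24": np.array([100, 110, 120, 130, 140]),
    "25-34": np.array([200, 195, 190, 185, 180]),
}


def forecast(years: np.ndarray, counts: np.ndarray, future_years) -> np.ndarray:
    """Fit count = slope * (year - first_year) + intercept and extrapolate."""
    base = years[0]  # center the years to keep the fit well conditioned
    slope, intercept = np.polyfit(years - base, counts, deg=1)
    return slope * (np.asarray(future_years) - base) + intercept


future_years = [2023, 2024]
predictions = {
    age: forecast(years, counts, future_years)
    for age, counts in counts_by_age.items()
}
for age, values in predictions.items():
    print(age, np.round(values, 1))
```

Stacking the predictions into a table with age groups as rows and years as columns gives the shape a heatmap function expects, for example seaborn's `heatmap`. A linear fit on a handful of yearly points only extrapolates the recent trend, so the forecasts should be read with that limit in mind.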