Hypothesis testing is performed on a payroll dataset using several statistical tests.
Dataset link: https://drive.google.com/file/d/1ZyB6LInrRhh27Knp6CnPcolE5muIDWo2/view?usp=sharing
- Importing Relevant Libraries: Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, SciPy, Warnings, Researchpy
- Exploratory Data Analysis:
- Checking variables
df.info()
- Checking Null Values
df.isnull().sum()
- Correlation Matrix
- Visualization
- Preparing the dataset:
- Dropping Correlated Columns
- Removing Dollar Sign & Converting to float
- Renaming the fields
- Scaling the fields
- Grouping the dataset
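The preparation steps above can be sketched roughly as follows. This is a minimal illustration, assuming the pay fields are stored as strings such as "$50,000.00"; the sample rows, the column name "Base Pay", and the hand-written min-max scaling are assumptions, not the project's actual code.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset's columns and values may differ.
raw = pd.DataFrame({
    "Year": [2013, 2013, 2014, 2014],
    "Base Pay": ["$50,000.00", "$62,500.50", "$48,000.00", "$71,250.25"],
})

# Remove the dollar sign and thousands separators, then convert to float.
raw["Base Pay"] = (
    raw["Base Pay"].str.replace(r"[$,]", "", regex=True).astype(float)
)

# Rename the field to a code-friendly identifier.
clean = raw.rename(columns={"Base Pay": "Base_Pay"})

# Min-max scale to [0, 1]; written by hand here, sklearn's MinMaxScaler
# would be an alternative.
pay = clean["Base_Pay"]
clean["Base_Pay_scaled"] = (pay - pay.min()) / (pay.max() - pay.min())

# Group by year and take the mean base pay per year.
mean_by_year = clean.groupby("Year")["Base_Pay"].mean()
```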
- Normality Test:
- Visually: Q-Q Plot
- Shapiro-Wilk Test
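As an illustration of the two normality checks, the sketch below runs the Shapiro-Wilk test and computes the Q-Q plot coordinates with SciPy. The data are synthetic (generated here, not taken from the payroll dataset), and the plot itself is omitted; `stats.probplot` returns the points that a Q-Q plot would draw.

```python
import numpy as np
from scipy import stats

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
normal_like = rng.normal(loc=50_000, scale=8_000, size=500)
skewed = rng.exponential(scale=50_000, size=500)

# Shapiro-Wilk: a small p-value is evidence against normality.
w_norm, p_norm = stats.shapiro(normal_like)
w_skew, p_skew = stats.shapiro(skewed)

# Q-Q plot coordinates: theoretical quantiles (osm) against ordered
# sample values (osr), plus the least-squares fit through them.
(osm, osr), (slope, intercept, r) = stats.probplot(normal_like, dist="norm")
```

A correlation `r` close to 1 means the points lie near a straight line, which is what a Q-Q plot of roughly normal data looks like.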
- Sampling: The data is divided into 4 samples by year: 2013, 2014, 2015, and 2016
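One way the yearly split could look is sketched below. The `Year` column name, the toy rows, and the mapping of `sample_1` ... `sample_4` to 2013 ... 2016 are assumptions made for illustration.

```python
import pandas as pd

# Toy data; the column names are assumed, not taken from the dataset.
df = pd.DataFrame({
    "Year": [2013, 2013, 2014, 2015, 2015, 2015, 2016],
    "Base_Pay": [50e3, 52e3, 61e3, 47e3, 58e3, 63e3, 70e3],
})

years = (2013, 2014, 2015, 2016)

# One sub-frame per year.
samples = {year: df[df["Year"] == year] for year in years}

# Assumed naming: sample_1 is 2013, ..., sample_4 is 2016.
sample_1, sample_2, sample_3, sample_4 = (samples[y] for y in years)
```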
- Levene’s test: Checking equality of variances
Stats, Pvalue = stats.levene(sample_1['Annual_sal'], sample_2['Annual_sal'])
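The sketch below shows how the Levene result might be read, using synthetic salary-like samples with deliberately different spreads. The 0.05 threshold is an assumed significance level.

```python
import numpy as np
from scipy import stats

# Synthetic samples: same mean, very different spread.
rng = np.random.default_rng(1)
narrow = rng.normal(loc=60_000, scale=5_000, size=500)
wide = rng.normal(loc=60_000, scale=20_000, size=500)

# Null hypothesis: the samples have equal variances.
stat, pvalue = stats.levene(narrow, wide)

alpha = 0.05  # assumed significance level
equal_variances = pvalue >= alpha
```

When `equal_variances` is False, a follow-up t-test would usually be run in its unequal-variance (Welch) form, for example `stats.ttest_ind(a, b, equal_var=False)`.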
- Student’s test:
std_err = np.std(sample_3['Base_Pay'])/np.sqrt(48063)
z_stat = (np.mean(sample_4['Base_Pay'])-np.mean(sample_3['Base_Pay']))/std_err
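The same calculation can be wrapped in a small helper that reads the sample size from the data instead of hard-coding it. This is only a sketch: the function name `z_statistic` is hypothetical, and the two-sided p-value from the standard normal distribution is an added assumption that the original snippet does not compute.

```python
import numpy as np
from scipy import stats

def z_statistic(reference, other):
    """z statistic for mean(other) against mean(reference).

    The standard error is taken from the reference sample only,
    mirroring the calculation shown in the text.
    """
    reference = np.asarray(reference, dtype=float)
    other = np.asarray(other, dtype=float)
    std_err = np.std(reference) / np.sqrt(len(reference))
    z = (np.mean(other) - np.mean(reference)) / std_err
    # Assumed: two-sided p-value under the standard normal distribution.
    p = 2 * stats.norm.sf(abs(z))
    return z, p

# Toy usage with small hand-made samples.
z_stat, p_value = z_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```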
- Student’s Independent test:
t_stat, df, cv, p = independent_ttest(data1, data2, alpha)
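`independent_ttest` is not a SciPy function and is not defined in this text, so the version below is a hypothetical implementation with the same return shape. It assumes a pooled-variance (equal variances) Student's t-test with a two-sided critical value and p-value; the project's own helper may differ.

```python
import numpy as np
from scipy import stats

def independent_ttest(data1, data2, alpha=0.05):
    """Pooled-variance two-sample t-test (hypothetical helper).

    Returns the t statistic, degrees of freedom, two-sided critical
    value at level alpha, and two-sided p-value.
    """
    data1 = np.asarray(data1, dtype=float)
    data2 = np.asarray(data2, dtype=float)
    n1, n2 = len(data1), len(data2)
    df = n1 + n2 - 2
    # Pooled sample variance (ddof=1 gives the unbiased estimate).
    pooled_var = (
        (n1 - 1) * np.var(data1, ddof=1) + (n2 - 1) * np.var(data2, ddof=1)
    ) / df
    std_err = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = (np.mean(data1) - np.mean(data2)) / std_err
    cv = stats.t.ppf(1 - alpha / 2, df)
    p = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, df, cv, p

# Synthetic samples for illustration.
rng = np.random.default_rng(2)
data1 = rng.normal(loc=50_000, scale=5_000, size=120)
data2 = rng.normal(loc=52_000, scale=5_000, size=150)
t_stat, df, cv, p = independent_ttest(data1, data2, alpha=0.05)
```

The null hypothesis of equal means is rejected when `abs(t_stat) > cv`, or equivalently when `p < alpha`. Under these assumptions the result should agree with `stats.ttest_ind(data1, data2)`.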
- ANOVA test:
_,p=st.f_oneway(groups[comb[0]],groups[comb[1]])
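The line above appears to come from a loop over pairs of groups. A sketch of such a loop is given below, assuming `groups` maps each year to an array of pay values and `comb` iterates over `itertools.combinations`; the data are synthetic and the overall four-group test is added for comparison.

```python
from itertools import combinations

import numpy as np
from scipy import stats as st

# Synthetic groups keyed by year (assumed structure of `groups`).
rng = np.random.default_rng(3)
groups = {
    2013: rng.normal(loc=50_000, scale=5_000, size=200),
    2014: rng.normal(loc=52_000, scale=5_000, size=200),
    2015: rng.normal(loc=55_000, scale=5_000, size=200),
    2016: rng.normal(loc=60_000, scale=5_000, size=200),
}

# Overall one-way ANOVA across all four years.
_, p_all = st.f_oneway(*groups.values())

# Pairwise comparison for every pair of years.
pairwise_p = {}
for comb in combinations(groups, 2):
    _, p = st.f_oneway(groups[comb[0]], groups[comb[1]])
    pairwise_p[comb] = p
```

With only two groups, one-way ANOVA is equivalent to the pooled-variance t-test (the F statistic is the square of the t statistic). Running many pairwise tests also inflates the chance of a false positive, so a multiple-comparison correction may be worth considering.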