Detecting fake job postings with deep learning
I used deep learning to solve a binary classification problem: is a given job description real or fake?
The dataset used can be found here.
After the data is read from the CSV file, it has to be vectorized, so I applied the tf-idf algorithm to the text fields. I then implemented a fully connected neural network in PyTorch to process those vectors:
import torch.nn as nn
import torch.nn.functional as F

# NETWORK_INPUT_SIZE and NETWORK_OUTPUT_SIZE are constants defined elsewhere in the project.


class Network(nn.Module):
    def __init__(self, input_size=NETWORK_INPUT_SIZE, output_size=NETWORK_OUTPUT_SIZE):
        super().__init__()
        # Eight fully connected layers that halve the width at each step
        self.fc1 = nn.Linear(input_size, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 32)
        self.fc5 = nn.Linear(32, 16)
        self.fc6 = nn.Linear(16, 8)
        self.fc7 = nn.Linear(8, 4)
        self.fc8 = nn.Linear(4, output_size)

    def forward(self, x):
        # ReLU after every hidden layer
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = F.relu(self.fc5(x))
        x = F.relu(self.fc6(x))
        x = F.relu(self.fc7(x))
        # No activation on the last layer: the loss function expects raw logits
        x = self.fc8(x)
        return x
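To make the tf-idf step mentioned above more concrete, here is a minimal hand-rolled sketch of the idea in plain Python. It is not the project's vectorization code, which may well rely on a library implementation. The function name `tfidf_vectorize`, the whitespace tokenizer, the smoothed idf variant, and the sample posts are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectorize(documents):
    """Return (vocabulary, vectors) with one tf-idf vector per document.

    Hypothetical helper: lowercases and splits on whitespace, then weights
    each term frequency by a smoothed inverse document frequency.
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    index = {term: i for i, term in enumerate(vocabulary)}
    n_docs = len(documents)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    # Smoothed idf (one common variant): rare terms get a larger weight
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocabulary}

    vectors = []
    for tokens in tokenized:
        vector = [0.0] * len(vocabulary)
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)  # relative term frequency
            vector[index[term]] = tf * idf[term]
        vectors.append(vector)
    return vocabulary, vectors


# Made-up sample posts, for illustration only
sample_posts = [
    "earn money fast work from home",
    "senior software engineer work on backend services",
]
vocabulary, vectors = tfidf_vectorize(sample_posts)
print(len(vocabulary), "terms,", len(vectors), "vectors")
```

Every vector has one entry per vocabulary term, so the vocabulary size would play the role of the network's input size in a setup like this.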
The dataset is imbalanced: fake postings are a small minority. Because of that, I used torch.nn.BCEWithLogitsLoss as the loss function; it accepts a pos_weight argument that can give the positive class more weight. For cross-validation, I used the skorch library.
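As an illustration of how a positive-class weight affects binary cross-entropy on logits, the sketch below writes the per-sample formula out in plain Python. It follows the formula documented for `BCEWithLogitsLoss` with `pos_weight`, but it is not the project's training code, and the README does not say whether a `pos_weight` was actually set. The function names, the negatives/positives ratio as a weight, and the choice of "fake" as the positive class are assumptions.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_bce_with_logits(logit, target, pos_weight=1.0):
    """Per-sample loss: -[w * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]."""
    log_sigmoid = -softplus(-logit)            # log(sigmoid(logit))
    log_one_minus_sigmoid = -softplus(logit)   # log(1 - sigmoid(logit))
    return -(pos_weight * target * log_sigmoid
             + (1.0 - target) * log_one_minus_sigmoid)


# Class counts taken from the confusion matrix reported in this README
n_real = 16864 + 150
n_fake = 384 + 482

# Common heuristic (an assumption here): weight = negatives / positives,
# treating "fake" as the positive class. The ratio is about 19.6.
pos_weight = n_real / n_fake

# A fake posting that the model scores as "probably real" (negative logit)
print(weighted_bce_with_logits(-3.0, 1.0))              # unweighted
print(weighted_bce_with_logits(-3.0, 1.0, pos_weight))  # weighted
```

With the weight applied, a missed fake posting costs roughly twenty times more than before, while the loss for real postings is unchanged.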
Running the code prints a confusion matrix and related statistics:
Predict    real       fake
Actual
real       16864      150
fake       384        482

Overall Statistics:

95% CI         (0.96764,0.97263)
Kappa          0.62834
NIR            0.95157
Overall ACC    0.97013

Class Statistics:

Classes                                                     real      fake
ACC(Accuracy)                                               0.97013   0.97013
ERR(Error rate)                                             0.02987   0.02987
F0.5(F0.5 score)                                            0.9804    0.71008
F1(F1 score - harmonic mean of precision and sensitivity)   0.98441   0.64352
F2(F2 score)                                                0.98846   0.58838
FN(False negative/miss/type 2 error)                        150       384
FNR(Miss rate or false negative rate)                       0.00882   0.44342
FP(False positive/type 1 error/false alarm)                 384       150
FPR(Fall-out or false positive rate)                        0.44342   0.00882
PPV(Precision or positive predictive value)                 0.97774   0.76266
TN(True negative/correct rejection)                         482       16864
TNR(Specificity or true negative rate)                      0.55658   0.99118
TP(True positive/hit)                                       16864     482
TPR(Sensitivity, recall, hit rate, or true positive rate)   0.99118   0.55658
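The per-class statistics follow directly from the four confusion-matrix counts. As a quick check, this sketch recomputes several of the "fake" column values from the counts reported above. The helper name `class_metrics` is hypothetical, and this is not the code that produced the report.

```python
def class_metrics(tp, fp, fn, tn):
    """Derive common per-class statistics from confusion-matrix counts."""

    def f_beta(beta):
        # General F-beta score; beta > 1 favors recall, beta < 1 favors precision
        b2 = beta * beta
        return (1 + b2) * tp / ((1 + b2) * tp + b2 * fn + fp)

    return {
        "ACC": (tp + tn) / (tp + fp + fn + tn),
        "PPV": tp / (tp + fp),   # precision
        "TPR": tp / (tp + fn),   # recall / sensitivity
        "TNR": tn / (tn + fp),   # specificity
        "F0.5": f_beta(0.5),
        "F1": f_beta(1.0),
        "F2": f_beta(2.0),
    }


# Counts for the "fake" class, read off the confusion matrix above
fake = class_metrics(tp=482, fp=150, fn=384, tn=16864)
for name, value in fake.items():
    print(f"{name}: {value:.5f}")
```

The overall accuracy of about 0.97 is close to the no-information rate (NIR) of 0.95157, so for this dataset the recall on the fake class (TPR of about 0.557) is the more telling number.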
First of all, install the dependencies:
pip3 install -r requirements.txt
Then, run the project using Python version 3:
python3 main.py