PROJECT DESCRIPTION
This project models credit risk using logistic regression and decision trees in R. Credit risk modeling centers on the event of loan default. When a bank grants a loan to a borrower, which could be an individual or a company, it usually transfers the entire loan amount to the borrower up front. The borrower then repays this amount in installments, with interest, over time, typically monthly, quarterly, or yearly. There is always some risk, however, that the borrower will be unable to repay the loan in full, which results in a loss for the bank.
The expected loss a bank will incur is composed of three elements. The first is the probability of default (PD), the probability that the borrower will fail to repay the loan in full. This project focuses on building a model for PD.
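For context, the three elements of expected loss are conventionally combined as a product. The description above names only PD. The other two symbols below, exposure at default (EAD) and loss given default (LGD), are the standard terms in credit risk modeling and are supplied here as an assumption about what the remaining two elements are:

```latex
\[
\text{EL} = \text{PD} \times \text{EAD} \times \text{LGD}
\]
```

Here EL is the expected loss, EAD is the amount outstanding at the time of default, and LGD is the fraction of that amount the bank does not recover.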
Banks keep information on the default behavior of past customers, which can be used to predict default for new customers. Broadly, this information falls into two types. The first is application information, such as income and marital status. The second is behavioral information, such as current account balance and payment arrears in the account history.
The dataset used here contains information on past loans. Each row represents one customer, with that customer's information and a loan status indicator that equals 1 if the customer defaulted and 0 otherwise.
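As a rough illustration of how a PD model can be fitted to data of this shape, the sketch below runs a logistic regression on simulated loans. It is a minimal sketch, not the project's actual code. The data are randomly generated, and the column names (loan_status, annual_income, loan_amount) are hypothetical stand-ins for whatever the real dataset contains.

```r
# Illustrative sketch only: the data are simulated and all column names
# (loan_status, annual_income, loan_amount) are hypothetical.
set.seed(42)
n <- 500

# One row per customer, with two made-up application variables.
loans <- data.frame(
  annual_income = round(runif(n, 20000, 120000)),
  loan_amount   = round(runif(n, 1000, 30000))
)

# Simulate the 0/1 loan status indicator so that default becomes more
# likely as the loan grows relative to income.
ratio <- loans$loan_amount / loans$annual_income
loans$loan_status <- rbinom(n, size = 1, prob = plogis(-3 + 4 * ratio))

# Logistic regression: the log-odds of default are modeled as a linear
# function of the predictors.
pd_model <- glm(loan_status ~ annual_income + loan_amount,
                family = binomial(link = "logit"),
                data = loans)

# Estimated probability of default for a hypothetical new applicant.
new_customer <- data.frame(annual_income = 45000, loan_amount = 12000)
pd_hat <- predict(pd_model, newdata = new_customer, type = "response")
print(pd_hat)
```

With type = "response", predict() returns a probability between 0 and 1 rather than the log-odds, which is the form needed for a PD estimate.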