Here is a sample project I completed with PySpark. It documents some basic but useful commands under their respective headers, so any Spark beginner can take a look and give it a try. Hope this helps you kick off!
A Portuguese banking institution ran a marketing campaign to convince potential customers to invest in a bank term deposit. The data describes the bank's direct marketing campaigns, which were based on phone calls. The same customer was often contacted more than once by phone to find out whether they would subscribe to the term deposit.
The PySpark analysis covers the following tasks:
- Load the data and create a Spark DataFrame
- Give the marketing success rate (number of people subscribed / total number of entries) and the marketing failure rate
- Find the maximum, mean, and minimum age of the targeted customers
- Check the quality of customers using their average and median balance
- Check whether age mattered for subscription to the deposit
- Check whether marital status mattered for subscription to the deposit
- Check whether age and marital status together mattered for subscription to the deposit
- Do feature engineering on the age column and find which age range responds best to the campaign