The dataset was obtained from Kaggle and is used to predict house prices in Melbourne, Australia.
The dataset description and the features used are as follows:
This data was scraped from publicly available results posted every week on Domain.com.au. I've cleaned it as best I can; now it's up to you to make data analysis magic. The dataset includes Address, Type of Real Estate, Suburb, Method of Selling, Rooms, Price, Real Estate Agent, Date of Sale and Distance from C.B.D.
… Now with extra data including property size, land size and council area, you may need to change your code!
Some Key Details:
- Suburb: Suburb
- Address: Address
- Rooms: Number of rooms
- Price: Price in Australian dollars
- Method: S - property sold; SP - property sold prior; PI - property passed in; PN - sold prior not disclosed; SN - sold not disclosed; NB - no bid; VB - vendor bid; W - withdrawn prior to auction; SA - sold after auction; SS - sold after auction price not disclosed; N/A - price or highest bid not available.
- Type: br - bedroom(s); h - house, cottage, villa, semi, terrace; u - unit, duplex; t - townhouse; dev site - development site; o res - other residential.
- SellerG: Real estate agent
- Date: Date sold
- Distance: Distance from CBD in kilometres
- Regionname: General region (West, North West, North, North East … etc.)
- Propertycount: Number of properties that exist in the suburb
- Bedroom2: Scraped number of bedrooms (from a different source)
- Bathroom: Number of bathrooms
- Car: Number of car spots
- Landsize: Land size in metres
- BuildingArea: Building size in metres
- YearBuilt: Year the house was built
- CouncilArea: Governing council for the area
- Lattitude: Self-explanatory
- Longtitude: Self-explanatory
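
The Method and Type columns store short codes, so a lookup table can make them readable during EDA. The following is a minimal sketch, assuming pandas is available; the dictionary names and the three sample rows are hypothetical and only illustrate the mapping, they are not taken from the dataset.

```python
import pandas as pd

# Code-to-description lookups, copied from the feature list above.
METHOD_CODES = {
    "S": "property sold",
    "SP": "property sold prior",
    "PI": "property passed in",
    "PN": "sold prior not disclosed",
    "SN": "sold not disclosed",
    "NB": "no bid",
    "VB": "vendor bid",
    "W": "withdrawn prior to auction",
    "SA": "sold after auction",
    "SS": "sold after auction price not disclosed",
}
TYPE_CODES = {
    "h": "house, cottage, villa, semi, terrace",
    "u": "unit, duplex",
    "t": "townhouse",
}

# Illustrative rows only (not real records from the dataset).
sample = pd.DataFrame({"Method": ["S", "PI", "VB"], "Type": ["h", "u", "t"]})

# Series.map replaces each code with its description; unknown codes become NaN.
sample["MethodLabel"] = sample["Method"].map(METHOD_CODES)
sample["TypeLabel"] = sample["Type"].map(TYPE_CODES)
print(sample)
```

Codes missing from the dictionaries map to NaN, which makes unexpected values easy to spot.
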
**Steps for execution:**
- Extracted the data.
- Performed EDA on the different features to check how they are correlated.
- Imputed the missing values, and dropped the features that could not be imputed because most of their values were missing.
- Encoded the categorical features, while keeping in mind the curse of dimensionality.
- Selected the relevant features for modelling using SelectKBest (chi-square method).
- Performed feature scaling using MinMaxScaler.
- Performed a train_test_split to split the data into 75% training and 25% testing data.
- Created a machine learning model using the RandomForestRegressor algorithm with n_estimators=700.
- Trained the model and made predictions.
- Plotted a scatter plot of y_test against the predictions; they follow an increasing linear relationship.
- Plotted the residuals (y_test - predictions) using distplot.
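
The steps above can be sketched roughly as follows. This is not the project's actual code: it runs on a small synthetic table instead of the Kaggle CSV, the choice of k=5 and the random seeds are assumptions, and the price is binned into quartiles only so that chi2 (which expects non-negative features and a categorical target) has something to score against. How the original project passed the target to SelectKBest is not stated.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the Kaggle data (assumption: the real CSV is not read here).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "Rooms": rng.integers(1, 6, n),
    "Distance": rng.uniform(1, 40, n).round(1),
    "Bathroom": rng.integers(1, 4, n),
    "Car": rng.integers(0, 4, n).astype(float),
    "Landsize": rng.uniform(100, 1000, n).round(),
    "Type": rng.choice(["h", "u", "t"], n),
})
noise = rng.normal(0, 50_000, n)
df["Price"] = (300_000 + 150_000 * df["Rooms"] - 8_000 * df["Distance"]
               + 200 * df["Landsize"] + noise).round(-3)

# Impute: blank out a few Car values, then fill them with the median.
df.loc[rng.choice(n, 20, replace=False), "Car"] = np.nan
df["Car"] = df["Car"].fillna(df["Car"].median())

# Encode the categorical column as one-hot indicator columns.
X = pd.get_dummies(df.drop(columns="Price"), columns=["Type"], dtype=float)
y = df["Price"]

# Select features with chi-square; the quartile binning is an assumption (see lead-in).
price_bins = pd.qcut(y, q=4, labels=False)
selector = SelectKBest(score_func=chi2, k=5)  # k=5 is an assumed value
X_selected = selector.fit_transform(X, price_bins)
selected_names = X.columns[selector.get_support()].tolist()

# Scale the selected features to the [0, 1] range.
X_scaled = MinMaxScaler().fit_transform(X_selected)

# Split into 75% training and 25% testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.25, random_state=42
)

# Train the random forest and predict on the held-out data.
model = RandomForestRegressor(n_estimators=700, random_state=42)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Residuals and a numeric check of the linear relationship seen in the scatter plot.
residuals = y_test - predictions
corr = np.corrcoef(y_test, predictions)[0, 1]
print("Selected features:", selected_names)
print("Correlation between y_test and predictions:", round(corr, 3))
```

The scatter plot and residual plot from the last two steps can be drawn from `y_test`, `predictions` and `residuals`. Note that seaborn has deprecated `distplot` in favour of `histplot` and `displot`, so newer environments may need one of those instead.
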