Home Price Prediction


Summary

  • Created a tool that calculates and predicts home prices based on various features.
  • Used the features from the public dataset released on Kaggle.
  • Engineered features and analysed how the target variable depends on the independent variables.
  • Optimized Linear, Lasso and Decision Tree regressors using GridSearchCV to find the best model.
  • Achieved a score of 90.62% with the best-performing model.

The major factors include the number of bedrooms, the number of bathrooms, location, etc., and the target is the price. By correlating the different features, we can arrive at a decision based on the prediction. Since the output here is continuous, this is a regression problem.

The target variable, price, depends on numerous features, but we can understand its distribution by visualizing it.

Price distribution
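To put numbers on the shape that the price plot shows, a minimal sketch in plain Python (standard library only) can summarise the price column. The `describe_prices` helper and the sample prices are hypothetical, not taken from the Kaggle data; a mean well above the median and a positive skewness point to a long right tail of expensive homes.

```python
import statistics


def describe_prices(prices):
    """Summarise the shape of a price distribution (hypothetical helper)."""
    mean = statistics.fmean(prices)
    median = statistics.median(prices)
    stdev = statistics.pstdev(prices)
    # Population skewness: mean cubed deviation divided by stdev cubed.
    # A positive value indicates a long right tail (a few very expensive homes).
    skew = sum((p - mean) ** 3 for p in prices) / (len(prices) * stdev ** 3)
    return {"mean": mean, "median": median, "stdev": stdev, "skew": skew}


# Illustrative prices only, not values from the Kaggle dataset.
prices = [210_000, 225_000, 240_000, 250_000, 265_000, 280_000, 300_000, 950_000]
summary = describe_prices(prices)
print(summary["mean"], summary["median"])  # prints 340000.0 257500.0
print(summary["skew"] > 0)  # prints True
```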

By computing the correlation between variables, we can analyse how strongly each feature is related to the price and in which direction. A regression model attempts to draw conclusions from observed values: given one or more inputs, it tries to predict the value of one or more continuous outcomes.

Correlation
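The write-up does not say which correlation measure was used; assuming the common Pearson coefficient, it ranges from -1 (perfect negative linear relationship) to +1 (perfect positive one) and can be computed as below. The bedroom and price values are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Covariance and variances share the same 1/n factor, so it cancels out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: number of bedrooms vs. price in thousands.
bedrooms = [1, 2, 3, 4, 5]
price = [150, 200, 260, 300, 360]
print(f"{pearson(bedrooms, price):.3f}")  # prints 0.999
```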

Relationship between various features

Determining the relationship between the variables helps us understand which features are most important, that is, which ones can noticeably change the result even through small changes in their values.

  • Identifying the important features
  • Measuring the level of correlation
  • Determining whether each feature has a positive or negative impact
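These three steps can be combined into one small ranking routine: compute each feature's correlation with price, sort by absolute value to find the most important features, and read the sign for the direction of impact. This is a sketch that assumes Pearson correlation; the feature names `bedrooms`, `age_years` and `lot_score` and their values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target):
    """Return (name, r, direction) tuples, strongest absolute correlation first."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    result = []
    for name, r in ranked:
        # The sign of r tells whether the feature pushes price up or down.
        direction = "positive" if r > 0 else "negative" if r < 0 else "none"
        result.append((name, r, direction))
    return result


# Hypothetical feature columns; the real ones come from the Kaggle dataset.
price = [150, 200, 260, 300, 360]
features = {
    "bedrooms": [1, 2, 3, 4, 5],
    "age_years": [40, 35, 20, 15, 5],
    "lot_score": [3, 1, 4, 1, 5],
}
for name, r, direction in rank_features(features, price):
    print(f"{name:<10} r = {r:+.2f} ({direction})")
```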

Feature engineering

Feature engineering here involves detecting outliers and removing data points that would distort the shape of a Gaussian distribution.
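The write-up does not state which outlier rule was applied. One common choice, assumed here, is Tukey's interquartile-range (IQR) fences: points below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are dropped. The helper name and sample prices are illustrative only.

```python
import statistics


def remove_outliers_iqr(values, k=1.5):
    """Keep only points inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three cut points Q1, median, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Preserve the original order of the surviving points.
    return [v for v in values if low <= v <= high]


# Hypothetical prices with one extreme listing at the end.
prices = [210_000, 225_000, 240_000, 250_000, 265_000, 280_000, 300_000, 950_000]
print(remove_outliers_iqr(prices))  # the 950_000 listing is dropped
```

Quartile-based fences are used in this sketch instead of a z-score cut-off because the quartiles are not themselves pulled toward the extreme values they are meant to detect.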

Model building

The output of this project is a continuous value, which makes it a regression problem. Regression problems can be tackled with algorithms such as:

  • Linear Regression
  • Lasso Regression
  • Decision Tree regressor
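As a reference point for the first algorithm, ordinary least squares with a single feature has a closed-form solution: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the two means. The sketch below is a plain-Python illustration with hypothetical living-area data. In scikit-learn, the three models listed above correspond to `LinearRegression` and `Lasso` (in `sklearn.linear_model`) and `DecisionTreeRegressor` (in `sklearn.tree`), which is presumably what the project used.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for one feature: y ≈ intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data: living area (hundreds of sq ft) vs. price (thousands).
area = [10, 15, 20, 25, 30]
price = [150, 200, 260, 300, 360]
intercept, slope = fit_simple_linear(area, price)
print(intercept, slope)  # prints 46.0 10.4
```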

Model optimization

Model optimization can be performed with GridSearchCV. This method takes a grid of candidate hyperparameter values, evaluates every combination using cross-validation, and selects the one that gives the best score.
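To make the mechanics concrete, the sketch below reimplements the idea in plain Python rather than calling scikit-learn. For every candidate value of the Lasso penalty `alpha`, it fits a one-feature Lasso on k-1 folds, scores R² on the held-out fold, averages the fold scores, and keeps the best `alpha`. The Lasso objective uses the same 1/(2n) scaling that scikit-learn's `Lasso` documents; the fold scheme (interleaved, unshuffled), the alpha grid, and the toy data are assumptions for illustration. With scikit-learn itself, the equivalent would be along the lines of `GridSearchCV(Lasso(), {"alpha": [0.1, 1.0, 10.0]}, cv=3)` followed by `.fit(X, y)`.

```python
def soft_threshold(z, a):
    """Shrink z toward zero by a; anything inside [-a, a] becomes exactly 0."""
    if z > a:
        return z - a
    if z < -a:
        return z + a
    return 0.0


def fit_lasso_1d(xs, ys, alpha):
    """One-feature Lasso: minimise (1/2n) * sum((y - b - w*x)**2) + alpha * |w|.

    With the intercept b left unpenalised, the optimum is
    w = soft_threshold(cov(x, y), alpha) / var(x) and b = mean(y) - w * mean(x).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    w = soft_threshold(cov_xy, alpha) / var_x
    return mean_y - w * mean_x, w


def r2_score(ys, preds):
    """Coefficient of determination on one fold."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def grid_search_cv(xs, ys, alphas, k=3):
    """Score every alpha with k-fold CV; return the best alpha and all mean scores."""
    n = len(xs)
    # Interleaved folds without shuffling: fold i holds indices i, i+k, i+2k, ...
    folds = [list(range(i, n, k)) for i in range(k)]
    results = {}
    for alpha in alphas:
        fold_scores = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            b, w = fit_lasso_1d([xs[i] for i in train_idx], [ys[i] for i in train_idx], alpha)
            preds = [b + w * xs[i] for i in test_idx]
            fold_scores.append(r2_score([ys[i] for i in test_idx], preds))
        results[alpha] = sum(fold_scores) / len(fold_scores)
    best_alpha = max(results, key=results.get)
    return best_alpha, results


# Hypothetical toy data: one feature against price (thousands), roughly linear.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [52, 61, 69, 81, 88, 102, 109, 118, 131]
alphas = [0.0, 0.1, 1.0, 10.0, 100.0]
best_alpha, results = grid_search_cv(xs, ys, alphas, k=3)
print(best_alpha, round(results[best_alpha], 3))
```

A very large `alpha` shrinks the slope to exactly zero, so the model predicts a constant and its cross-validated R² drops to zero or below; the grid search therefore settles on a small penalty for this nearly linear data.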

Result

Using GridSearchCV, we determined that the Linear Regression model works best, with a score of 90.62%.
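Accuracy is strictly a classification metric. For regressors, scikit-learn's default `score` method, which GridSearchCV also uses when no `scoring` is given, returns the coefficient of determination R². Assuming the 90.62% figure is such an R² value, this sketch shows how the score is computed; the held-out prices and predictions are hypothetical.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained error
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)  # total variation around the mean
    return 1 - ss_res / ss_tot


# Hypothetical held-out prices (thousands) and model predictions.
y_true = [200, 250, 300, 350, 400]
y_pred = [210, 240, 310, 340, 400]
print(f"{r2_score(y_true, y_pred):.2%}")  # prints 98.40%
```

An R² of 1 means perfect predictions, 0 means no better than always predicting the mean price, and negative values mean worse than that.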

Link to GitHub repository