Employee Resignation Prediction

Summary

  • Created a tool that predicts whether an employee is likely to resign, to support employee management and improve the working environment.
  • Used the features of a public dataset available on Kaggle.
  • Engineered features to capture how the target variable depends on the independent variables.
  • Tuned Logistic Regression, Random Forest, Decision Tree and Naive Bayes classifiers with GridSearchCV to find the best model.
  • Achieved 95% accuracy in predicting the outcome.
  • Developed a public-facing interface for live user interaction using the Streamlit library.
  • Deployed the web app on Heroku, a cloud platform as a service.

Understanding employees is a major task in any successful organization. Various factors can combine to create a snowball effect in an employee’s mind about whether they see a future for themselves in their current working environment.

Some of the major factors include salary, satisfaction level, promotion and average working hours. By correlating these features we can arrive at a prediction. Since the output is whether the employee leaves or not, this is a classification problem.

Impact of salary on employee retention

Impact of average working hours on employee retention

Dynamic relationship between average monthly hours, salary, left and promotion in the last 5 years

Conclusion

From the charts above we can conclude the following:

  • Employees with high salaries are more likely to stay with the company longer, whereas employees in the mid to lower salary tiers are more likely to quit.

  • Average monthly hours play a major role in employees’ mindset. The visualization shows that employees with similar working hours can differ widely in how likely they are to quit, simply because they fall in different salary brackets.

  • Finally, employees also expect a reward for diligent work. Based on the data, a promotion in the last 5 years plays a role in determining an individual’s future with the organization.

Preprocessing

  • It consisted of 3 major parts:
  1. Handling class imbalance
  2. Handling non-numerical features
     • Achieved with the help of dummy variables created with the pandas get_dummies function
  3. Scaling the features
     • sklearn’s preprocessing module contains StandardScaler, which scales the data points using the mean and standard deviation of the data.
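The three steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the tiny DataFrame is made up, the column names are assumptions, and upsampling the minority class with sklearn.utils.resample is only one possible way to handle class imbalance (the README does not say which method was used).

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample

# Made-up sample rows; the column names are assumptions, not the project's schema.
df = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, 0.11, 0.72, 0.92, 0.45, 0.61, 0.58],
    "average_monthly_hours": [157, 262, 272, 223, 159, 135, 240, 198],
    "salary": ["low", "medium", "medium", "low", "high", "low", "high", "medium"],
    "left": [1, 0, 1, 0, 0, 0, 0, 0],
})

# 1. Class imbalance: upsample the minority class (an assumed choice of method).
majority = df[df["left"] == 0]
minority = df[df["left"] == 1]
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=0)
balanced = pd.concat([majority, minority_up], ignore_index=True)

# 2. Non-numerical features: one-hot encode the categorical column.
X = pd.get_dummies(balanced.drop(columns="left"), columns=["salary"], dtype=int)
y = balanced["left"]

# 3. Scaling: standardize numeric columns to zero mean and unit variance.
num_cols = ["satisfaction_level", "average_monthly_hours"]
X[num_cols] = StandardScaler().fit_transform(X[num_cols])

print(X.head())
print(y.value_counts())
```

In a real workflow the resampling and the scaler fit would normally be applied to the training split only, so that no information from the test set leaks into preprocessing.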

Model building

  • A classification model attempts to draw some conclusion from observed values.
  • Given continuous features, categorical features or both, the model assigns each observation to a target class.
  • Popular algorithms that can be used for binary classification include:
  1. Logistic Regression
  2. k-Nearest Neighbors
  3. Decision Trees
  4. Support Vector Machine
  5. Naive Bayes

Result

  • sklearn’s ensemble module provides the Random Forest technique, which helps determine the important features.
  • With the help of Streamlit’s button function, these features can be showcased to the user.
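As a rough sketch of how Random Forest can rank features, the snippet below fits RandomForestClassifier on synthetic data and reads its feature_importances_ attribute. The data comes from make_classification and the feature names are hypothetical stand-ins, so the resulting ranking says nothing about the real dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names attached to synthetic data, for illustration only.
feature_names = [
    "satisfaction_level",
    "average_monthly_hours",
    "number_project",
    "time_spend_company",
]
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=2, n_redundant=0, random_state=0
)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances are non-negative and sum to 1 across features.
importances = pd.Series(forest.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances)
```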

Most Important Features

Applying GridSearchCV to the classification algorithms above, we conclude:

SVC (the support vector classifier) is the best model, with 95% accuracy.
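A comparison of this kind could look like the sketch below, which tunes two of the listed classifiers with GridSearchCV and keeps the one with the best cross-validated accuracy. The synthetic data, the parameter grids and the choice of only two candidates are all assumptions made for brevity; the scores it prints are unrelated to the 95% figure reported here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed employee data.
X, y = make_classification(n_samples=400, n_features=6, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Candidate models with small, illustrative parameter grids (assumed values).
candidates = {
    "logistic_regression": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1, 10]}),
    "svc": (SVC(), {"clf__C": [0.1, 1, 10], "clf__kernel": ["linear", "rbf"]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Scaling lives inside the pipeline so each CV fold is scaled independently.
    pipe = Pipeline([("scale", StandardScaler()), ("clf", estimator)])
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = {
        "cv_accuracy": search.best_score_,
        "best_params": search.best_params_,
        "test_accuracy": search.score(X_test, y_test),
    }

best_name = max(results, key=lambda n: results[n]["cv_accuracy"])
print(best_name, results[best_name])
```

Selecting on cross-validated accuracy and reporting the held-out test accuracy separately keeps the final number from being biased by the model selection itself.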

User Interface

  • Creating a user interface lets new users visualize a practical application of a machine learning model.
  • This web app is built on an open-source app framework called Streamlit.
  • More about this framework Here
  • Here is a demo of the web app

WebApp Demo

  • When the user clicks a button, the important features are displayed.
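The button behaviour could be wired up along the lines of the sketch below, using Streamlit's st.button and st.bar_chart. The function names top_features and render, the button label and the importance scores are all hypothetical placeholders, not taken from the project.

```python
import pandas as pd


def top_features(importances: dict, k: int = 3) -> pd.Series:
    """Return the k largest importance scores, sorted in descending order."""
    return pd.Series(importances).sort_values(ascending=False).head(k)


def render() -> None:
    """Draw the page; intended to be called from a script run by Streamlit."""
    # Imported here so that top_features stays usable without Streamlit installed.
    import streamlit as st

    # Placeholder scores for illustration only; a real app would read them
    # from the trained model instead.
    importances = {
        "satisfaction_level": 0.4,
        "average_monthly_hours": 0.3,
        "salary": 0.2,
        "promotion_last_5years": 0.1,
    }

    st.title("Employee Resignation Prediction")
    # st.button returns True only on the rerun triggered by the click.
    if st.button("Show important features"):
        st.bar_chart(top_features(importances))
```

To try it, save the block as a script (for example app.py, a name chosen here for illustration), add a final render() call, and start it with streamlit run app.py.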

Link to GitHub repository