Revenue Generation Prediction

February 5, 2019 2-minute read

Summary

Goal: predict whether a customer's visit to a website page generates revenue.
Based on the features available for a website visit, we need to predict whether that visit results in revenue.
By analyzing a customer's behavior pattern, e.g. the type of product searched for and the duration of stay, we can identify the features on which the target variable depends most strongly.
Used visual analysis to find which features matter most for the target output, in this case revenue generation.
Deployed a classification algorithm, K-nearest neighbors (KNN), to predict the outcome.
Arrived at an accuracy of 85.915%.

On a website, keeping an eye on customer visits to each page helps explain how traffic turns into revenue.

Like any other data, visit data contains various features that influence the outcome.

Based on the above bar charts, we can determine:

The features with the highest number of customer visits, which helps us understand the customer trend.
Product Related queries are the most sought after.

In the above pie chart, we can see the dominance of a single category over the rest.

Determining the correlation

Correlation plays an important role in identifying the features that matter most. Changes in these specific features affect the target greatly.

The following observations are made:

Revenue is positively correlated with page values.
Revenue is negatively correlated with exit rates and bounce rates.
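The sign of these relationships can be checked with a Pearson correlation coefficient. The snippet below is a minimal sketch, not the code used in this post: the `pearson` helper and the variable names are hypothetical, and the numbers are made-up toy values rather than rows from the actual dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative values, NOT taken from the post's dataset.
page_values = [0.0, 0.0, 12.5, 30.0, 0.0, 45.0]
exit_rates = [0.20, 0.15, 0.05, 0.02, 0.18, 0.01]
revenue = [0, 0, 1, 1, 0, 1]  # 1 = the visit generated revenue

r_page = pearson(page_values, revenue)
r_exit = pearson(exit_rates, revenue)
print(f"page values vs revenue: {r_page:+.2f}")  # positive sign expected
print(f"exit rates vs revenue:  {r_exit:+.2f}")  # negative sign expected
```

A coefficient near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear dependence.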

Modeling and Prediction

This is a classification problem, so a KNN (K-nearest neighbors) classifier can be used.
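To make the idea concrete, here is a minimal from-scratch sketch of how a KNN classifier labels a new visit: it finds the k closest training points and takes a majority vote. This is an illustration only, not the model code behind this post; the `knn_predict` function, the two features, and the toy sessions are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Rank training points by Euclidean distance to the query.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Made-up toy sessions: (page value, exit rate) -> revenue flag.
train_X = [(0.0, 0.20), (1.0, 0.18), (2.0, 0.15),
           (30.0, 0.02), (35.0, 0.03), (40.0, 0.01)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (33.0, 0.02), k=3))  # neighbors are all revenue sessions
print(knn_predict(train_X, train_y, (0.5, 0.19), k=3))   # neighbors are all non-revenue sessions
```

Because the vote depends on raw distances, features on larger scales (here the page value) dominate unless the features are scaled first.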

KNN can be used for both classification and regression predictive problems. However, it is more widely used in classification problems in the industry. To evaluate any technique we generally look at 3 important aspects:

Ease of interpreting the output
Calculation time
Predictive Power

KNN needs a distance metric to decide which points are nearest. We use Euclidean distance since it's the most popular choice. Other metrics that can be used are Chebyshev, cosine, etc. The value of K is then typically chosen by comparing model accuracy across candidate values.
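For reference, the three metrics mentioned above can be written in a few lines each. This is a plain-Python sketch with hypothetical function names, meant to show the formulas rather than any particular library's implementation.

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chebyshev(a, b):
    """Largest absolute difference along any single coordinate."""
    return max(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """One minus the cosine of the angle between the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

p, q = (0.0, 0.0), (3.0, 4.0)
print(euclidean(p, q))                          # 5.0
print(chebyshev(p, q))                          # 4.0
print(cosine_distance((1.0, 0.0), (0.0, 1.0)))  # 1.0 (orthogonal vectors)
```

Euclidean and Chebyshev distances depend on the magnitude of the differences, while cosine distance depends only on direction, so the choice of metric can change which neighbors count as nearest.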

Conclusion

The optimal value of K is 4.
The model achieves an accuracy of about 86% (85.915%).

Link to GitHub repository
