How to Avoid Overfitting Using Regularization in Machine Learning Algorithms?

“Among competing hypotheses, the one with the fewest assumptions should be selected. Other, more complicated solutions may ultimately prove correct, but, in the absence of certainty, the fewer assumptions that are made, the better.”

Business Situation:

In the world of analytics, where we try to fit a curve to every pattern, overfitting is one of the biggest concerns. In general, models are equipped to avoid overfitting, but manual intervention is usually required to make…

Feature Engineering

How to transform variables and create new ones?

A common piece of advice machine learning experts have for beginners is: focus on feature engineering. Whether you are a beginner building your first model or someone who has won Kaggle competitions, following this advice works wonders. I have personally seen the predictive power of several models improve significantly with the application of feature engineering.

What is Feature Engineering?

Feature engineering is the science (and art) of extracting more…

Machine Learning Data Exploration & Preparation

Introduction

Hypothesis generation requires structured thinking, whereas data exploration requires the patience to slice and dice data in multiple ways. In this article, I will focus on the steps required to clean and understand data in a comprehensive way.

To improve your structured thinking, I suggest you check out the flawless post written by Kunal, “Tools to Improve structure Thinking”.

7 Steps of Data Exploration and Preparation (Part 1)

Remember the quality of your inputs…

Machine Learning Data Preprocessing – Common Data Preparation Mistakes and How to Avoid Them (Part 1)

A few days back, one of my friends was building a model to predict the propensity of conversion of leads procured through an online sales partner. While he was presenting his findings to stakeholders, one of the insights he mentioned led to a very involved discussion in the room. The insight was as follows:

The higher the number of times a lead is shared by the partner, the higher its chances of conversion.

The following arguments were presented during the debate that ensued:

Group 1 (Pro-insight) main hypothesis:

Data Science

Skills for a Data Scientist:

The data scientist's way of thinking:

In the early stages, when we had a problem, a physicist or chemist would understand the problem and come up with a theory. Engineers then took these equations and changed a few parameters to…
