Apache Spark Tutorial (Part 2 – RDD)

Resilient Distributed Datasets (RDDs): An RDD is an abstraction, the fundamental unit of data and computation in Spark. As the name indicates, among others, they have...

Apache Spark Tutorial (Part 1 – Introduction & Architecture)

Introduction: Apache Spark is an open-source distributed data processing engine for clusters that provides a unified programming model across different types of data...

Decision Tree Algorithms Simplified 2

One of the advantages of using a decision tree is that it efficiently identifies the most significant variable and splits the population on it. In...

Decision Tree Simplified!

What is a decision tree? A decision tree is a type of supervised learning algorithm (one with a pre-defined target variable) that is mostly used in classification...

Comparing a Random Forest to a CART model part 2

Random forest is one of the most commonly used algorithms in Kaggle competitions. Along with good predictive power, random forest models are pretty simple...
