Big data is becoming part of every company’s business. Although there are various technologies for collecting data, the real value lies in analyzing the data and applying the results to business planning. Understanding the algorithms is important for deciding which ones suit a given data problem. This presentation covers two of the most common yet powerful data mining algorithms: the Apriori algorithm and the Bayesian classifier. The objective is to demystify the mathematics, provide an intuitive understanding of the algorithms, and present Java code (on GitHub) that demonstrates how simple they are to implement.

A classic example of what the Apriori algorithm can uncover: “On Friday afternoons, married men under the age of 35 tend to buy both diapers and beer.” Observations such as this can be used for product placement, advertising, etc. The Apriori algorithm derives item relationships and association rules from a transactional log. In this example, the transactional log is the record of purchases at a store. Another example is the online bookseller that tells you “people who bought this book also bought the following items” or “the following items were commonly purchased together”.

The second algorithm is the Bayesian classifier for data classification. Data classification learns a model from labeled training data and uses it to predict the class of new data. The Bayesian classifier is one of the most important classifiers because it is often highly accurate, very simple to implement, and efficient. Although wrapped in probability theory, applying Bayes’ theorem to classification is fairly simple: the classifier can be implemented using simple loops and counters. Two classifiers will be explained, including a document classifier of the kind typically used as the basis for email classification (spam filtering). Spend a little time coming up to speed on two of the most important algorithms you will encounter on the path to becoming a data scientist!
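
To make the Apriori idea concrete, here is a minimal Java sketch of the level-wise search over a transactional log. It is an illustration under stated assumptions, not the code from the talk's GitHub repository: the class name `AprioriSketch`, the helper names, the five-basket sample log, and the 0.5 minimum support threshold are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class AprioriSketch {

    // Hypothetical transaction log: each set is one customer's basket.
    static List<Set<String>> sampleLog() {
        List<Set<String>> log = new ArrayList<>();
        log.add(new HashSet<>(Arrays.asList("diapers", "beer", "chips")));
        log.add(new HashSet<>(Arrays.asList("diapers", "beer")));
        log.add(new HashSet<>(Arrays.asList("diapers", "milk")));
        log.add(new HashSet<>(Arrays.asList("beer", "chips")));
        log.add(new HashSet<>(Arrays.asList("diapers", "beer", "milk")));
        return log;
    }

    // Number of baskets that contain every item in the itemset.
    static int count(List<Set<String>> log, Set<String> itemset) {
        int n = 0;
        for (Set<String> basket : log) {
            if (basket.containsAll(itemset)) n++;
        }
        return n;
    }

    // Level-wise search: keep the frequent k-itemsets, then join them
    // into (k+1)-item candidates until no candidates remain.
    static List<Set<String>> frequentItemsets(List<Set<String>> log, double minSupport) {
        List<Set<String>> result = new ArrayList<>();
        Set<Set<String>> candidates = new LinkedHashSet<>();
        for (Set<String> basket : log) {
            for (String item : basket) {
                Set<String> single = new TreeSet<>();
                single.add(item);
                candidates.add(single);
            }
        }
        while (!candidates.isEmpty()) {
            List<Set<String>> frequent = new ArrayList<>();
            for (Set<String> c : candidates) {
                if ((double) count(log, c) / log.size() >= minSupport) frequent.add(c);
            }
            result.addAll(frequent);
            candidates = join(frequent);
        }
        return result;
    }

    // Join step plus the Apriori prune: a candidate survives only if
    // every subset obtained by dropping one item is itself frequent.
    static Set<Set<String>> join(List<Set<String>> frequent) {
        Set<Set<String>> known = new HashSet<>(frequent);
        Set<Set<String>> next = new LinkedHashSet<>();
        for (int i = 0; i < frequent.size(); i++) {
            for (int j = i + 1; j < frequent.size(); j++) {
                Set<String> union = new TreeSet<>(frequent.get(i));
                union.addAll(frequent.get(j));
                if (union.size() == frequent.get(i).size() + 1
                        && allSubsetsFrequent(union, known)) {
                    next.add(union);
                }
            }
        }
        return next;
    }

    static boolean allSubsetsFrequent(Set<String> candidate, Set<Set<String>> known) {
        for (String item : candidate) {
            Set<String> subset = new TreeSet<>(candidate);
            subset.remove(item);
            if (!known.contains(subset)) return false;
        }
        return true;
    }

    // Confidence of the rule lhs -> rhs: count(lhs and rhs) / count(lhs).
    static double confidence(List<Set<String>> log, Set<String> lhs, Set<String> rhs) {
        Set<String> both = new TreeSet<>(lhs);
        both.addAll(rhs);
        return (double) count(log, both) / count(log, lhs);
    }

    public static void main(String[] args) {
        List<Set<String>> log = sampleLog();
        System.out.println(frequentItemsets(log, 0.5));
        Set<String> diapers = new TreeSet<>(Arrays.asList("diapers"));
        Set<String> beer = new TreeSet<>(Arrays.asList("beer"));
        System.out.println(confidence(log, diapers, beer));
    }
}
```

With this sample log and a minimum support of 0.5, the frequent itemsets are {beer}, {diapers} and {beer, diapers}, and the rule diapers → beer has confidence 3/4 = 0.75, since three of the four baskets containing diapers also contain beer. A production implementation would typically avoid rescanning the whole log for every candidate.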
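
The claim that a Bayesian document classifier needs little more than loops and counters can be illustrated with a short Java sketch. It assumes a multinomial naive Bayes model with add-one (Laplace) smoothing and log-probabilities, which may differ from the variant presented in the talk; the class name `NaiveBayesSketch`, its methods, and the four training messages are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class NaiveBayesSketch {
    // Counters gathered during training.
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Integer> wordsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Lowercase the text and split on non-word characters, skipping empties.
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) words.add(w);
        }
        return words;
    }

    // Training only increments counters for the given label.
    void train(String label, String text) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String word : tokenize(text)) {
            counts.merge(word, 1, Integer::sum);
            wordsPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(word);
        }
    }

    // Pick the label maximizing log P(label) + sum of log P(word | label).
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docsPerLabel.keySet()) {
            double score = Math.log((double) docsPerLabel.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = wordsPerLabel.getOrDefault(label, 0);
            for (String word : tokenize(text)) {
                int c = counts.getOrDefault(word, 0);
                // Add-one smoothing keeps unseen words from zeroing the score.
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    // Hypothetical toy training set, for illustration only.
    static NaiveBayesSketch trainedOnSample() {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("spam", "win money now");
        nb.train("spam", "free money offer");
        nb.train("ham", "meeting at noon");
        nb.train("ham", "project status meeting");
        return nb;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = trainedOnSample();
        System.out.println(nb.classify("free money"));
        System.out.println(nb.classify("status meeting"));
    }
}
```

On this toy data, "free money" is classified as spam and "status meeting" as ham. Summing logarithms rather than multiplying raw probabilities avoids numeric underflow on longer documents.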