More than a thousand keywords with detailed explanations, plus hundreds of machine learning and data science books categorized by the programming language used to illustrate the concepts.

Here’s a selection of keywords from the mega-list:

10 keywords starting with A, which are only a small subset of all the keywords starting with A.

A/B Testing – In marketing, A/B testing is a simple randomized experiment with two variants, A and B, which serve as the control and the treatment. It is a form of statistical hypothesis testing.

Other names include randomized controlled experiments, online controlled experiments, and split testing. In online settings, such as web design (especially user experience design), the goal is to identify changes to web pages that increase or maximize an outcome of interest (e.g., click-through rate for a banner advertisement).
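For a binary outcome such as click-through, one common way to analyze an A/B test is a two-proportion z-test. The sketch below is one such analysis, not the only valid one; the function name and the click counts are hypothetical, and it assumes samples large enough for the normal approximation to hold.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    conv_a, conv_b: conversions (e.g. clicks) observed for variants A and B
    n_a, n_b: number of users exposed to each variant
    Returns (z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally well
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical numbers: 200 clicks out of 10,000 on control A, 260 out of 10,000 on treatment B
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in click-through rate is unlikely to be due to chance alone; the significance threshold should be fixed before the experiment starts.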

Adaptive Boosting (AdaBoost) – AdaBoost, short for “Adaptive Boosting”, is a machine learning meta-algorithm formulated by Yoav Freund and Robert Schapire, who won the Gödel Prize in 2003 for their work. It can be used in conjunction with many other types of learning algorithms to improve their performance.

The output of the other learning algorithms (‘weak learners’) is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers.

In some problems, however, it can be less susceptible to overfitting than other learning algorithms. The individual learners can be weak, but as long as each one performs slightly better than random guessing (i.e., its error rate is below 0.5 for binary classification), the final model can be proven to converge to a strong learner.
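The “slightly better than random guessing” condition can be made precise. In the standard analysis of discrete AdaBoost (notation assumed here: $\epsilon_t$ is the weighted error of weak learner $t$, $\gamma_t = \tfrac{1}{2} - \epsilon_t$ its edge over random guessing, $T$ the number of rounds, and $H$ the final classifier on $m$ training examples), the training error is bounded by:

```latex
\frac{1}{m}\sum_{i=1}^{m} \mathbf{1}\left[H(x_i) \neq y_i\right]
  \;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t (1 - \epsilon_t)}
  \;=\; \prod_{t=1}^{T} \sqrt{1 - 4\gamma_t^2}
  \;\le\; \exp\!\Big(-2 \sum_{t=1}^{T} \gamma_t^2\Big)
```

So if every edge satisfies $\gamma_t \ge \gamma > 0$, the training error shrinks at least as fast as $e^{-2\gamma^2 T}$. This bounds the error on the training set only; generalization to new data needs a separate argument.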

While every learning algorithm tends to suit some problem types better than others, and typically has many parameters and configurations to adjust before it performs optimally on a dataset, AdaBoost (with decision trees as the weak learners) is often referred to as the best out-of-the-box classifier. When AdaBoost is used with decision trees, the information gathered at each stage about the relative ‘hardness’ of each training sample is fed into the tree-growing algorithm, so that later trees tend to focus on harder-to-classify examples.
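The reweighting and weighted vote described above can be sketched as discrete AdaBoost for labels in {-1, +1}, using decision stumps (one-split trees) as the weak learners. This is a minimal illustration, not a production implementation; the function names and the toy data are hypothetical.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best decision stump."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                # The stump predicts `polarity` when row[f] <= thr, otherwise -polarity
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[f] <= thr else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best

def stump_predict(stump, row):
    _, f, thr, polarity = stump
    return polarity if row[f] <= thr else -polarity

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost for labels in {-1, +1}; returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = stump[0]                     # weighted error (weights sum to 1)
        if err >= 0.5:                     # not better than random guessing: stop
            break
        err = max(err, 1e-10)              # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        if stump[0] == 0:                  # perfect fit on the training set: nothing left to fix
            break
        # Up-weight misclassified samples (y * h = -1), down-weight correct ones (y * h = +1)
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]       # renormalize to a probability distribution
    return ensemble

def predict(ensemble, row):
    # Final classifier: sign of the alpha-weighted sum of weak-learner votes
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D data: +1 inside [2, 4], -1 outside, so no single stump can separate it
X = [[0], [1], [2], [3], [4], [5], [6]]
y = [-1, -1, 1, 1, 1, -1, -1]
model = adaboost(X, y, rounds=3)
print([predict(model, row) for row in X])
```

On this toy data, three rounds are enough for the weighted vote to classify all seven training points correctly, even though each individual stump misclassifies at least two of them.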

Algorithmic Complexity (AC) – The information content or complexity of an object can be measured by the length of its shortest description. For instance, the string “01010101010101010101010101010101” has the short description “16 repetitions of 01”, while “11001000011000011101111011101100” presumably has no simpler description than writing down the string itself.
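The shortest description cannot be computed in general, but the length of a losslessly compressed encoding gives a computable upper bound (up to the fixed size of the decompressor). A rough illustration with Python’s standard-library zlib module; the choice of zlib, the string lengths and the seed are arbitrary assumptions for the demo.

```python
import random
import zlib

def compressed_length(s: str) -> int:
    """Bytes zlib needs to encode s: a computable upper-bound proxy for description length."""
    return len(zlib.compress(s.encode("ascii"), 9))

# Long strings are used because zlib's fixed overhead swamps 32-character examples
regular = "01" * 5000                      # "5000 repetitions of 01"
rng = random.Random(42)                    # fixed seed for reproducibility
irregular = "".join(rng.choice("01") for _ in range(10_000))

print(len(regular), compressed_length(regular))
print(len(irregular), compressed_length(irregular))
```

The regular string shrinks to a tiny fraction of its length, while the pseudo-random one stays close to one bit per character. Strictly speaking, the pseudo-random string does have a short description (this program and its seed), which zlib cannot discover; that gap is exactly why compression gives only an upper bound.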

More formally, the algorithmic complexity (AC), or Kolmogorov complexity, of a string x is defined as the length of the shortest program that computes or outputs x, where the program is run on some fixed reference universal computer.
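In symbols, writing $U$ for the fixed reference universal computer and $\ell(p)$ for the length in bits of a program $p$ (notation chosen here for illustration):

```latex
K_U(x) = \min \{\, \ell(p) \;:\; U(p) = x \,\}
```

The choice of reference computer matters only up to an additive constant: by the invariance theorem, for any two universal computers $U$ and $V$ there is a constant $c_{U,V}$, independent of $x$, such that $|K_U(x) - K_V(x)| \le c_{U,V}$.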

Agglomerative Hierarchical Clustering (AHC) – Hierarchical clustering algorithms are either top-down or bottom-up. Bottom-up algorithms treat each document as a singleton cluster at the outset and then successively merge (or agglomerate) pairs of clusters until all clusters have been merged into a single cluster that contains all documents. Bottom-up hierarchical clustering is therefore called hierarchical agglomerative clustering, or HAC. Top-down clustering requires a method for splitting a cluster; it proceeds by splitting clusters recursively until individual documents are reached.
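A minimal sketch of the bottom-up procedure, applied to points in the plane rather than documents. The function name, the toy points and the default choice of single linkage (cluster distance = distance between the closest pair of members) are assumptions for illustration; the naive pair search makes it roughly O(n³), so it only suits small inputs.

```python
import math

def agglomerative(points, linkage="single"):
    """Bottom-up hierarchical clustering of points (tuples of coordinates).

    Every point starts as a singleton cluster; the two closest clusters are merged
    until a single cluster remains. Returns the merge history as a list of
    (cluster_a, cluster_b, distance), with clusters as frozensets of point indices.
    """
    def cluster_distance(a, b):
        pair_dists = [math.dist(points[i], points[j]) for i in a for j in b]
        # Single linkage: closest pair of members; otherwise complete linkage: farthest pair
        return min(pair_dists) if linkage == "single" else max(pair_dists)

    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Naive search over all pairs of current clusters
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = cluster_distance(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        a, b = clusters[p], clusters[q]
        merges.append((a, b, d))
        # Replace the two merged clusters by their union
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [a | b]
    return merges

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
for a, b, d in agglomerative(points):
    print(sorted(a), "+", sorted(b), f"merged at distance {d:.2f}")
```

Here points 0 and 1 merge first, then points 2 and 3 (both at distance 1), and the isolated point 4 joins last. The recorded merge distances are what a dendrogram plots.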

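Analysis of Covariance (ANCOVA) – Covariance measures how much two variables change together: it is positive when they tend to move in the same direction and negative when they move in opposite directions. Its magnitude depends on the variables’ units, so the strength of the relationship is usually read from the correlation coefficient (the covariance divided by the product of the standard deviations). Analysis of covariance (ANCOVA) is a general linear model that blends ANOVA and regression: it compares group means of a dependent variable while statistically controlling for one or more continuous covariates.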
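A minimal sketch of the core ANCOVA computation for one grouping factor and one continuous covariate: estimate a common (pooled within-group) regression slope, then adjust each group’s mean outcome to the overall covariate mean. The function name and the toy data (a hypothetical pre-test score as the covariate) are assumptions; a full ANCOVA would also report an F-test for the adjusted group difference, which is omitted here.

```python
from statistics import mean

def ancova_adjusted_means(groups):
    """Covariate-adjusted group means for a one-way ANCOVA with one covariate.

    groups: dict mapping group name -> list of (x, y) pairs, where x is the
    covariate and y the outcome. Returns (pooled within-group slope, adjusted means).
    Assumes the slope of y on x is the same in every group (homogeneity of slopes).
    """
    sxy = sxx = 0.0
    for pairs in groups.values():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        # Pool within-group cross-products and sums of squares across groups
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx                      # common regression slope of y on x

    grand_mx = mean(x for pairs in groups.values() for x, _ in pairs)
    adjusted = {}
    for name, pairs in groups.items():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        # Move each group's mean outcome to where it would be at the overall covariate mean
        adjusted[name] = my - slope * (mx - grand_mx)
    return slope, adjusted

# Hypothetical data: x = pre-test score, y = final score, for two teaching methods
data = {
    "A": [(1, 2), (2, 4), (3, 6)],
    "B": [(3, 7), (4, 9), (5, 11)],
}
slope, adjusted = ancova_adjusted_means(data)
print(slope, adjusted)
```

In this toy example the raw group means are 4 and 9, but group B also started with higher pre-test scores. After adjusting both groups to the common covariate mean of 3, the means become 6 and 7, so most of the raw difference is attributed to the covariate rather than to the grouping.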