Mega collection of data science books and terminology

More than a thousand keywords with detailed explanations, and hundreds of machine learning and data science books categorized by the programming language used to illustrate the concepts.

Here's a selection of keywords from the mega-list.

The entries below all start with A; they are only a small subset of the keywords under that letter.
A/B Testing - In marketing, A/B testing is a simple randomized experiment with two variants, A and B, which are the control and treatment in the controlled experiment. It is a form of statistical hypothesis testing.

Other names include randomized controlled experiments, online controlled experiments, and split testing. In online settings, such as web design (especially user experience design), the goal is to identify changes to web pages that increase or maximize an outcome of interest (e.g., click-through rate for a banner advertisement).
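As a concrete illustration, here is a minimal sketch of evaluating such a test as a two-proportion z-test with statsmodels; the visitor and conversion counts are invented for the example.

```python
# Hypothetical A/B test: compare click-through rates of variant A (control)
# and variant B (treatment) with a two-proportion z-test.
from statsmodels.stats.proportion import proportions_ztest

conversions = [120, 150]   # clicks observed for A and B (made-up numbers)
visitors = [2400, 2380]    # visitors exposed to A and B (made-up numbers)

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"z = {z_stat:.3f}, p-value = {p_value:.4f}")
# A small p-value (say, below 0.05) suggests the difference in click-through
# rate between the two variants is unlikely to be due to chance alone.
```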

Adaptive Boosting (AdaBoost) - AdaBoost, short for “Adaptive Boosting”, is a machine learning meta-algorithm formulated by Yoav Freund and Robert Schapire, who won the prestigious Gödel Prize in 2003 for their work. It can be used in conjunction with many other types of learning algorithms to improve their performance.

The output of the other learning algorithms (‘weak learners’) is combined into a weighted sum that represents the final output of the boosted classifier. AdaBoost is adaptive in the sense that subsequent weak learners are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost is sensitive to noisy data and outliers.

In some problems, however, it can be less susceptible to the overfitting problem than other learning algorithms. The individual learners can be weak, but as long as the performance of each one is slightly better than random guessing (i.e., their error rate is smaller than 0.5 for binary classification), the final model can be proven to converge to a strong learner.

While every learning algorithm will tend to suit some problem types better than others, and will typically have many different parameters and configurations to be adjusted before achieving optimal performance on a dataset, AdaBoost (with decision trees as the weak learners) is often referred to as the best out-of-the-box classifier. When used with decision tree learning, information gathered at each stage of the AdaBoost algorithm about the relative ‘hardness’ of each training sample is fed into the tree-growing algorithm, so that later trees tend to focus on harder-to-classify examples.
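As a sketch of this setup, the snippet below boosts depth-1 decision trees (stumps) on a synthetic dataset with scikit-learn; it assumes scikit-learn 1.2 or later, where the weak learner is passed as `estimator` (older releases call it `base_estimator`).

```python
# AdaBoost with decision stumps as weak learners on a synthetic binary problem.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each stump only needs to beat random guessing; AdaBoost re-weights the
# training samples so that later stumps focus on previously misclassified ones.
clf = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision stump
    n_estimators=200,
    learning_rate=0.5,
    random_state=0,
)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```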

Algorithmic Complexity (AC) - The information content or complexity of an object can be measured by the length of its shortest description. For instance, the string “01010101010101010101010101010101” has the short description “16 repetitions of 01”, while “11001000011000011101111011101100” presumably has no simpler description other than writing down the string itself.

More formally, the Algorithmic “Kolmogorov” Complexity (AC) of a string x is defined as the length of the shortest program that computes or outputs x, where the program is run on some fixed reference universal computer.
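Stated symbolically (with ℓ(p) denoting the length of a program p and U the fixed reference universal machine, notation added here for clarity):

```latex
K_U(x) = \min \{\, \ell(p) : U(p) = x \,\}
```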
Agglomerative Hierarchical Clustering (AHC) - Hierarchical clustering algorithms are either top-down or bottom-up. Bottom-up algorithms treat each document as a singleton cluster at the outset and then successively merge (or agglomerate) pairs of clusters until all clusters have been merged into a single cluster that contains all documents. Bottom-up hierarchical clustering is therefore called hierarchical agglomerative clustering, or HAC. Top-down clustering requires a method for splitting a cluster. It proceeds by splitting clusters recursively until individual documents are reached.
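A minimal sketch of the bottom-up variant with scikit-learn, using made-up two-dimensional feature vectors in place of document representations:

```python
# Agglomerative (bottom-up) hierarchical clustering on toy data.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([
    [0.0, 0.1], [0.1, 0.0], [0.2, 0.1],   # one tight group of points
    [5.0, 5.1], [5.1, 4.9], [4.9, 5.0],   # another tight group of points
])

# Every point starts as its own singleton cluster; pairs of clusters are then
# merged successively (here using Ward linkage) until two clusters remain.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1 1] (the label numbering may be permuted)
```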

Analysis of Covariance (ANCOVA) - Covariance is a measure of how much two variables change together; its normalized counterpart, correlation, measures the strength of that relationship. Analysis of covariance (ANCOVA) is a general linear model that blends ANOVA and regression: it compares the means of a dependent variable across the levels of a categorical factor while statistically controlling for one or more continuous covariates.
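A minimal sketch of an ANCOVA fitted as a general linear model with statsmodels; the data and the variable names (group, covariate, outcome) are synthetic and chosen purely for illustration.

```python
# ANCOVA: a categorical factor (the ANOVA part) plus a continuous covariate
# (the regression part) in one general linear model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n = 60
group = np.repeat(["control", "treatment"], n // 2)
covariate = rng.normal(50, 10, size=n)
outcome = 2.0 * (group == "treatment") + 0.3 * covariate + rng.normal(0, 1, size=n)
df = pd.DataFrame({"group": group, "covariate": covariate, "outcome": outcome})

# The group comparison is adjusted for the continuous covariate.
model = smf.ols("outcome ~ C(group) + covariate", data=df).fit()
print(anova_lm(model, typ=2))
```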


