Apache Mahout is a machine learning library built for scalability. Its core algorithms for clustering, classification, and batch-based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm.

Mahout contains several families of algorithms, each defined below. Each family may have multiple implementations. Most, but not all, of the implementations are distributed.

## Classification

Classification is the problem of identifying to which of a set of categories (sub-populations) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.
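To make the definition concrete, here is a minimal sketch of a nearest-centroid classifier in plain Python. It is not Mahout's API. The function names `train_centroids` and `classify` and the toy data are assumptions made for illustration only.

```python
from collections import defaultdict
from math import dist

def train_centroids(samples):
    """samples: list of (feature_tuple, label). Returns {label: centroid}."""
    groups = defaultdict(list)
    for features, label in samples:
        groups[label].append(features)
    # The centroid of a category is the per-feature mean of its training rows.
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in groups.items()
    }

def classify(centroids, features):
    """Assign the label whose centroid is closest to the new observation."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical training set: observations whose category is already known.
training = [((1.0, 1.0), "small"), ((1.5, 2.0), "small"),
            ((8.0, 9.0), "large"), ((9.0, 8.5), "large")]
centroids = train_centroids(training)
print(classify(centroids, (2.0, 1.5)))
```

The training set supplies the known category memberships, and the new observation `(2.0, 1.5)` is assigned to whichever category it sits closest to.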

## Clustering

Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).
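As an illustration, the following is a minimal single-machine sketch of k-means (Lloyd's algorithm), one common clustering method. It is not Mahout's distributed implementation. The function name `kmeans`, the caller-supplied initial centers, and the sample points are assumptions.

```python
from math import dist

def kmeans(points, centers, iterations=10):
    """Lloyd's algorithm with caller-supplied initial centers."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        # (a center with no members stays where it is).
        centers = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else center
            for members, center in zip(clusters, centers)
        ]
    return centers, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centers, clusters = kmeans(points, centers=[points[0], points[3]])
for center, members in zip(centers, clusters):
    print(center, members)
```

Here "similar" means close in Euclidean distance. Other distance measures would produce different groupings.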

## Pattern mining

Pattern mining is a data mining method that involves finding existing patterns in data. In this context, "patterns" often means association rules.
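A small sketch can show what an association rule looks like in practice. The code below counts item pairs in a handful of transactions and keeps the rules that pass a support and a confidence threshold. The function name `pair_rules`, the thresholds, and the shopping-basket data are hypothetical. This is a brute-force illustration, not the algorithm Mahout uses.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) for rules lhs -> rhs over item pairs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(pair_rules(transactions))
```

With this data the only rule that survives both thresholds is "butter implies bread": every basket that contains butter also contains bread.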

## Regression analysis

Regression analysis is a statistical technique for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables.
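The simplest case, one independent variable and one dependent variable, can be sketched with ordinary least squares. The function name `fit_line` and the sample data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0]  # independent variable
ys = [3.0, 5.0, 7.0, 9.0]  # dependent variable, here exactly y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

The fitted slope estimates how much the dependent variable changes per unit change in the independent variable.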

## Dimension reduction

Dimension reduction is the process of reducing the number of random variables under consideration and can be divided into feature selection and feature extraction.
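Of the two branches, feature selection is the easier one to sketch. The example below keeps the `k` columns with the highest variance and drops the rest. Variance is only one possible selection criterion, and the function name `select_by_variance` and the data are hypothetical. Feature extraction (for example, projecting onto new axes) is not shown.

```python
from statistics import pvariance

def select_by_variance(rows, k):
    """Keep the k features (columns) with the largest population variance."""
    columns = list(zip(*rows))
    ranked = sorted(range(len(columns)),
                    key=lambda i: pvariance(columns[i]), reverse=True)
    keep = sorted(ranked[:k])  # preserve the original column order
    return keep, [tuple(row[i] for i in keep) for row in rows]

rows = [
    (1.0, 5.0, 0.1),
    (2.0, 5.0, 0.9),
    (3.0, 5.1, 0.2),
    (4.0, 5.0, 0.8),
]
kept, reduced = select_by_variance(rows, k=2)
print(kept, reduced)
```

The middle column is nearly constant, so it carries little information and is the one dropped.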

## Evolutionary algorithm

An evolutionary algorithm uses mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the fitness function determines the environment within which the solutions “live”.
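The mechanisms named above can be sketched as a small genetic algorithm over bit strings. This is a toy illustration, not Mahout code. The function name `evolve`, the parameter defaults, and the "count the ones" fitness function are all assumptions.

```python
import random

def evolve(fitness, genome_length, population_size=20, generations=40,
           mutation_rate=0.05, seed=0):
    """Maximise `fitness` over bit lists with selection, crossover, and mutation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Selection: the fitter half survives unchanged and forms the parent pool.
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            # Recombination: one-point crossover of the two parents.
            cut = rng.randrange(1, genome_length)
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Fitness is the number of ones in the genome, so the ideal individual is all ones.
best = evolve(fitness=sum, genome_length=16)
print(best, sum(best))
```

Because the fitter half of each generation is carried over unchanged, the best fitness in the population never decreases from one generation to the next.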

## Recommenders / Collaborative filtering

Collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc.
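In a recommender, the "multiple agents" are typically users, and the collaboration is that one user's ratings help predict another's. The sketch below is a minimal user-based recommender in plain Python. It does not use Mahout's recommender API. The names `cosine` and `recommend` and the ratings data are hypothetical.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts {item: rating}."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(ratings, user, top_n=3):
    """Rank items the user has not rated by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]

ratings = {
    "alice": {"A": 5, "B": 4},
    "bob":   {"A": 5, "B": 5, "C": 4},
    "carol": {"A": 2, "B": 1, "D": 2},
}
print(recommend(ratings, "alice"))
```

Item `C` ranks above `D` for alice because the only user who rated `C` rated it highly, while `D` received a low rating.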

## Vector Similarity

Vector Similarity allows one to compare one or more vectors with another set of vectors.
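As a sketch of the idea, the code below compares every vector in one set with every vector in another using cosine similarity, one of several possible measures. The function names `cosine_similarity` and `similarity_matrix` are assumptions, and the code runs on a single machine rather than as a distributed job.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_matrix(left, right):
    """Entry [i][j] is the similarity of left[i] to right[j]."""
    return [[cosine_similarity(u, v) for v in right] for u in left]

left = [(1.0, 0.0), (1.0, 1.0)]
right = [(2.0, 0.0), (0.0, 3.0)]
for row in similarity_matrix(left, right):
    print(row)
```

Cosine similarity ignores vector length, so `(1.0, 0.0)` and `(2.0, 0.0)` count as identical in direction.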

## Collocation

A collocation is a sequence of words or terms that co-occur more often than would be expected by chance.
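"More often than would be expected by chance" can be made measurable by comparing how often a word pair actually occurs with how often it would occur if the two words were independent. The sketch below scores adjacent word pairs with pointwise mutual information (PMI), chosen here for brevity. It is not necessarily the score Mahout uses, and the function name `bigram_pmi`, the `min_count` cutoff, and the sample text are assumptions.

```python
from collections import Counter
from math import log2

def bigram_pmi(tokens, min_count=2):
    """Score each adjacent word pair by PMI; positive means above-chance co-occurrence."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:  # rare pairs give unreliable estimates
            continue
        p_pair = count / n_bigrams
        # Probability of the pair if the two words occurred independently.
        p_expected = (unigrams[w1] / n_unigrams) * (unigrams[w2] / n_unigrams)
        scores[(w1, w2)] = log2(p_pair / p_expected)
    return scores

tokens = "new york is big and new york is busy and the city is new".split()
for pair, score in bigram_pmi(tokens).items():
    print(pair, round(score, 2))
```

A positive score means the pair appears together more often than the individual word frequencies alone would predict.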
