Rocchio Classification

In machine learning, a nearest centroid classifier or nearest prototype classifier is a classification model that assigns to observations the label of the class of training samples whose mean (centroid) is closest to the observation. When applied to text classification using word vectors containing tf*idf weights to represent documents, the nearest centroid classifier is known as the Rocchio classifier because of its similarity to the Rocchio algorithm for relevance feedback.[1]

An extended version of the nearest centroid classifier has found applications in the medical domain, specifically classification of tumors.[2]

Algorithm

Training

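Given labeled training samples $\{(\vec{x}_1, y_1), \dots, (\vec{x}_n, y_n)\}$ with class labels $y_i \in \mathbf{Y}$, compute the per-class centroids $\vec{\mu}_\ell = \frac{1}{|C_\ell|} \sum_{i \in C_\ell} \vec{x}_i$, where $C_\ell$ is the set of indices of samples belonging to class $\ell \in \mathbf{Y}$.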
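
The training step can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function name `fit_centroids` and the toy data are assumptions made for the example. Each centroid is the coordinate-wise mean of the samples that share a label.

```python
from collections import defaultdict


def fit_centroids(samples, labels):
    """Return a dict mapping each class label to the mean of its samples."""
    # Group the samples by their class label.
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)

    centroids = {}
    for label, members in groups.items():
        n = len(members)
        # zip(*members) iterates over coordinates; average each one.
        centroids[label] = [sum(column) / n for column in zip(*members)]
    return centroids


# Hypothetical toy data: two classes in two dimensions.
X = [[0.0, 0.0], [2.0, 0.0], [4.0, 4.0], [6.0, 4.0]]
y = ["a", "a", "b", "b"]
centroids = fit_centroids(X, y)
```

Training needs a single pass over the data and stores only one vector per class, which is why the model is cheap to fit and to keep in memory.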

Prediction

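The class assigned to an observation $\vec{x}$ is the one whose centroid is nearest: $\hat{y} = \arg\min_{\ell \in \mathbf{Y}} \|\vec{\mu}_\ell - \vec{x}\|$, where $\vec{\mu}_\ell$ is the centroid of class $\ell$.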
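
A minimal sketch of the prediction rule, assuming Euclidean distance and a dictionary of precomputed centroids. The function name `predict` and the centroid values are hypothetical, chosen only for illustration.

```python
import math


def predict(centroids, x):
    """Return the label whose centroid is closest to x (Euclidean distance)."""
    # min() over the dict iterates its keys; on a tie it keeps the first
    # label in insertion order.
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


# Hypothetical centroids for two classes in two dimensions.
centroids = {"a": [1.0, 0.0], "b": [5.0, 4.0]}
label_near_a = predict(centroids, [0.5, 1.0])
label_near_b = predict(centroids, [4.0, 3.0])
```

Other distance measures could be substituted for `math.dist`; for tf*idf document vectors, cosine similarity is a common alternative to Euclidean distance.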

References

  1. Manning, Christopher; Raghavan, Prabhakar; Schütze, Hinrich (2008). "Vector space classification". Introduction to Information Retrieval. Cambridge University Press.
  2. Tibshirani, Robert; Hastie, Trevor; Narasimhan, Balasubramanian; Chu, Gilbert (2002). "Diagnosis of multiple cancer types by shrunken centroids of gene expression". Proceedings of the National Academy of Sciences. 99 (10): 6567–6572. Bibcode:2002PNAS...99.6567T. doi:10.1073/pnas.082099299. PMC 124443. PMID 12011421.
