Sparse binary polynomial hashing (SBPH) is a generalization of Bayesian spam filtering that can match mutating phrases as well as single words.

SBPH automatically generates a large number of features from an incoming text, then uses statistics to assign each feature a weight according to how well it predicts whether the text is spam or non-spam.
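The feature-generation step can be illustrated with a minimal sketch. It assumes the scheme usually described for SBPH: a sliding window of five tokens, from which every order-preserving sub-phrase that ends in the newest token is emitted, with dropped words replaced by a placeholder. The function name `sbph_features`, the `<skip>` marker, and the whitespace tokenization are illustrative choices, not part of any particular implementation, and the final hashing of each sub-phrase is omitted.

```python
from itertools import product


def sbph_features(tokens, window=5, skip="<skip>"):
    """Return sparse sub-phrases for each window position (illustrative sketch).

    For every token, look back at up to `window - 1` earlier tokens and emit
    each combination of keeping or skipping them, always keeping the newest
    token. A full window therefore yields 2 ** (window - 1) features.
    """
    features = []
    for end, newest in enumerate(tokens):
        start = max(0, end - window + 1)
        older = tokens[start:end]
        # Each mask decides, per older token, whether it is kept or skipped.
        for mask in product((False, True), repeat=len(older)):
            phrase = [tok if keep else skip for tok, keep in zip(older, mask)]
            # Leading placeholders carry no information, so drop them.
            while phrase and phrase[0] == skip:
                phrase.pop(0)
            phrase.append(newest)
            features.append(" ".join(phrase))
    return features


if __name__ == "__main__":
    for feature in sbph_features("click here to buy".split()):
        print(feature)
```

A classifier built on this would then count how often each feature appears in spam and in non-spam training text and derive a per-feature weight from those counts. Because sub-phrases such as `click <skip> <skip> buy` survive the insertion or substitution of intermediate words, they can still match phrases that have been altered.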

  • A paper on the subject as it relates to spam (some article text comes from this document, which is under the GFDL)
  • Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification. No Starch Press. 2005. p. 108. ISBN 978-1-59327-052-0.

