
In statistics and machine learning, discretization refers to the process of converting or partitioning continuous attributes, features or variables into discretized or nominal attributes, features, variables or intervals. This can be useful when creating probability mass functions (formally, in density estimation). It is a form of discretization in general and also of binning, as in making a histogram. Whenever continuous data is discretized, there is always some amount of discretization error. The goal is to reduce that error to a level considered negligible for the modeling purposes at hand.

Typically, data is discretized into K partitions of equal width (equal intervals) or into K partitions that each hold an equal share, 1/K, of the total data (equal frequencies).[1]
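The two schemes can be illustrated with a minimal sketch in plain Python. The function names (equal_width_edges, equal_frequency_edges, assign_bins) and the sample data are hypothetical, chosen only for illustration; the sketch assumes that a value lying exactly on a cut point is placed in the bin to its right.

```python
from bisect import bisect_right


def equal_width_edges(values, k):
    """Return the k-1 interior cut points that split [min, max] into k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_edges(values, k):
    """Return k-1 interior cut points so that each bin holds roughly len(values)/k points."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]


def assign_bins(values, edges):
    """Map each value to a bin index; a value equal to a cut point goes to the right-hand bin."""
    return [bisect_right(edges, v) for v in values]


# Hypothetical sample with one large outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
width_bins = assign_bins(data, equal_width_edges(data, 3))
freq_bins = assign_bins(data, equal_frequency_edges(data, 3))
print(width_bins)  # [0, 0, 0, 0, 0, 0, 0, 0, 2]
print(freq_bins)   # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

With an outlier in the sample, equal-width binning puts almost every point in the first bin and leaves the middle bin empty, while equal-frequency binning keeps the bins balanced at the cost of unequal interval lengths.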

Mechanisms for discretizing continuous data include Fayyad & Irani's MDL method,[2] which uses mutual information to recursively define the best bins, as well as CAIM, CACC, Ameva, and many others.[3]
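The recursive idea behind the MDL method can be sketched as follows. This is a simplified illustration, not a reference implementation. It assumes the commonly stated form of the acceptance criterion: a cut is kept only if its information gain exceeds (log2(N - 1) + Δ) / N, where Δ = log2(3^k - 2) - [k·Ent(S) - k1·Ent(S1) - k2·Ent(S2)]. All function names are hypothetical, and every boundary between distinct values is tried as a candidate cut.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_cut(pairs):
    """Given (value, label) pairs sorted by value, return (gain, index) of the best split, or None."""
    labels = [y for _, y in pairs]
    n = len(pairs)
    base = entropy(labels)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point exists between two equal values
        left, right = labels[:i], labels[i:]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        if best is None or gain > best[0]:
            best = (gain, i)
    return best


def mdl_accepts(pairs, gain, i):
    """Assumed MDL stopping rule: keep the cut only if the gain pays for its encoding cost."""
    labels = [y for _, y in pairs]
    left, right = labels[:i], labels[i:]
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = log2(3 ** k - 2) - (k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right))
    return gain > (log2(n - 1) + delta) / n


def mdl_cut_points(pairs):
    """Recursively collect the accepted cut points, in ascending order."""
    pairs = sorted(pairs, key=lambda p: p[0])
    found = best_cut(pairs)
    if found is None:
        return []
    gain, i = found
    if not mdl_accepts(pairs, gain, i):
        return []
    cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between the neighbouring values
    return mdl_cut_points(pairs[:i]) + [cut] + mdl_cut_points(pairs[i:])


# Hypothetical data: class "a" for values 1..10, class "b" for values 11..20.
separable = [(v, "a") for v in range(1, 11)] + [(v, "b") for v in range(11, 21)]
print(mdl_cut_points(separable))  # [10.5]
```

On the cleanly separable sample a single cut is accepted and recursion stops, because further splits of a pure interval yield no gain. On a tiny alternating sample such as [(1, "a"), (2, "b"), (3, "a"), (4, "b")], no cut passes the criterion and the attribute is left as one interval.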

Many machine learning algorithms are known to produce better models when continuous attributes are discretized.[4]

Software


This is a partial list of software that implements the MDL algorithm.

See also


References

  1. Clarke, E. J.; Barton, B. A. (2000). "Entropy and MDL discretization of continuous variables for Bayesian belief networks" (PDF). International Journal of Intelligent Systems. 15: 61–92. doi:10.1002/(SICI)1098-111X(200001)15:1<61::AID-INT4>3.0.CO;2-O. Retrieved 2008-07-10.
  2. Fayyad, Usama M.; Irani, Keki B. (1993). "Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning" (PDF). 29 July 2023. hdl:2014/35171. Proc. 13th Int. Joint Conf. on Artificial Intelligence (Q334 .I571 1993), pp. 1022–1027.
  3. Dougherty, J.; Kohavi, R.; Sahami, M. (1995). "Supervised and Unsupervised Discretization of Continuous Features". In A. Prieditis & S. J. Russell, eds. Work. Morgan Kaufmann, pp. 194–202.
  4. Kotsiantis, S.; Kanellopoulos, D. (2006). "Discretization Techniques: A recent survey". GESTS International Transactions on Computer Science and Engineering. 32 (1): 47–58. CiteSeerX 10.1.1.109.3084.

