In statistics, quantile normalization is a technique for making two distributions identical in statistical properties. To quantile-normalize a test distribution to a reference distribution of the same length, sort the test distribution and sort the reference distribution. The highest entry in the test distribution then takes the value of the highest entry in the reference distribution, the next highest entry takes the value of the next highest entry in the reference distribution, and so on, until the test distribution is a permutation of the reference distribution.
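The rank-matching procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `quantile_normalize_to_reference` and the sample values are assumptions made for the example, and tied test values are simply resolved by position.

```python
def quantile_normalize_to_reference(test, reference):
    """Give each test entry the reference value that has the same rank."""
    if len(test) != len(reference):
        raise ValueError("test and reference must have the same length")
    sorted_reference = sorted(reference)
    # Indices of the test entries, ordered from smallest to largest value.
    order = sorted(range(len(test)), key=lambda i: test[i])
    normalized = [0.0] * len(test)
    for rank, index in enumerate(order):
        normalized[index] = sorted_reference[rank]
    return normalized


# Hypothetical sample data, chosen only for illustration.
test = [5.0, 2.0, 3.0, 4.0]
reference = [10.0, 40.0, 20.0, 30.0]
result = quantile_normalize_to_reference(test, reference)
print(result)  # [40.0, 10.0, 20.0, 30.0]
```

The output contains exactly the reference values, rearranged so that their ordering follows the ordering of the test values.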

To quantile-normalize two or more distributions to each other, without a reference distribution, sort each distribution as before, then set each entry to the average (usually the arithmetic mean) of the entries that share its rank across the distributions. So the highest value in every distribution becomes the mean of the highest values, the second highest value becomes the mean of the second highest values, and so on.
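As a rough sketch of this reference-free variant, the following Python function replaces every entry with the mean of the entries that hold the same rank in each distribution. The name `quantile_normalize` and the two sample columns are assumptions for illustration, and this simple version does not average tied values.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length columns to each other (ties not averaged)."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same length")
    sorted_columns = [sorted(col) for col in columns]
    # Mean of the k-th smallest values across all columns.
    rank_means = [
        sum(col[k] for col in sorted_columns) / len(columns) for k in range(n)
    ]
    normalized = []
    for col in columns:
        # Indices of this column's entries, from smallest to largest value.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, index in enumerate(order):
            out[index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical sample data, chosen only for illustration.
columns = [[1.0, 4.0, 2.0], [30.0, 10.0, 20.0]]
normalized = quantile_normalize(columns)
print(normalized)  # [[5.5, 17.0, 11.0], [17.0, 5.5, 11.0]]
```

After normalization both columns hold the same set of values (5.5, 11.0 and 17.0), each placed according to the original ordering within its column.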

Generally, a reference distribution will be one of the standard statistical distributions such as the Gaussian distribution or the Poisson distribution. The reference distribution can be generated randomly or by taking regular samples from the cumulative distribution function of the distribution. However, any reference distribution can be used.
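One way to read "regular samples" is to evaluate the inverse cumulative distribution function at evenly spaced probabilities. The sketch below does this for a Gaussian reference using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `gaussian_reference` and the midpoint spacing (k + 0.5) / n are assumptions; other spacing conventions are equally plausible.

```python
from statistics import NormalDist


def gaussian_reference(n, mu=0.0, sigma=1.0):
    """Return n Gaussian quantiles taken at evenly spaced probabilities."""
    dist = NormalDist(mu, sigma)
    # Probabilities lie strictly inside (0, 1), so inv_cdf is always defined.
    # The midpoint convention (k + 0.5) / n is an assumed choice.
    return [dist.inv_cdf((k + 0.5) / n) for k in range(n)]


reference = gaussian_reference(5)
print([round(value, 3) for value in reference])
```

The resulting list is already sorted from lowest to highest, so it can be used directly as the sorted reference distribution.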

Quantile normalization is frequently used in microarray data analysis. It was introduced as quantile standardization[1] and then renamed as quantile normalization.[2]

Example

A quick illustration of quantile normalization on a very small dataset, organized into columns (1-3) and rows (A-D):

For each column, rank the entries from lowest to highest (i to iv):

Set aside these rank values to use later. Go back to the first set of data. Rearrange each column's values so that each column is in order from lowest to highest. The result is:

Now find the mean for each row, and rank them lowest to highest (i to iv):

Now take the ranking order from earlier and substitute in the means according to their corresponding ranks:

These are the new normalized values.

However, when values within a column are tied in rank, as in column 2, they should instead be assigned the mean of the values corresponding to the ranks they would occupy if they were different. In column 2, the tied entries occupy ranks iii and iv, so both are assigned the average of the rank iii and rank iv means: (4.67 + 5.67)/2 = 5.17. And so we arrive at the following set of normalized values:
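The tie rule can be sketched in Python as follows. The input values are an assumption: the original data table is not reproduced in this text, so the columns below were chosen so that the rank iii and rank iv means come out as the 4.67 and 5.67 quoted above. The function name `quantile_normalize_with_ties` is likewise hypothetical.

```python
def quantile_normalize_with_ties(columns):
    """Quantile-normalize columns, giving tied entries the mean of their rank means."""
    n = len(columns[0])
    sorted_columns = [sorted(col) for col in columns]
    # Mean of the k-th smallest values across all columns.
    rank_means = [
        sum(col[k] for col in sorted_columns) / len(columns) for k in range(n)
    ]
    normalized = []
    for col, sorted_col in zip(columns, sorted_columns):
        # Map each distinct value to the average of the rank means it spans.
        value_to_mean = {}
        for value in set(col):
            ranks = [k for k, v in enumerate(sorted_col) if v == value]
            value_to_mean[value] = sum(rank_means[k] for k in ranks) / len(ranks)
        normalized.append([value_to_mean[v] for v in col])
    return normalized


# Assumed illustrative data: one list per column, entries in row order A-D.
columns = [
    [5.0, 2.0, 3.0, 4.0],  # column 1
    [4.0, 1.0, 4.0, 2.0],  # column 2 (rows A and C are tied)
    [3.0, 4.0, 6.0, 8.0],  # column 3
]
normalized = quantile_normalize_with_ties(columns)
for col in normalized:
    print([round(v, 2) for v in col])
# [5.67, 2.0, 3.0, 4.67]
# [5.17, 2.0, 5.17, 3.0]
# [2.0, 3.0, 4.67, 5.67]
```

With this assumed input, the two tied entries in column 2 both receive 5.17, while untied entries receive the mean for their own rank.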

The new values have the same distribution and can now be easily compared. Here are the summary statistics for each of the three columns:

References

  1. Amaratunga, D.; Cabrera, J. (2001). "Analysis of Data from Viral DNA Microchips". Journal of the American Statistical Association. 96 (456): 1161. doi:10.1198/016214501753381814. S2CID 18154109.
  2. Bolstad, B. M.; Irizarry, R. A.; Astrand, M.; Speed, T. P. (2003). "A comparison of normalization methods for high density oligonucleotide array data based on variance and bias". Bioinformatics. 19 (2): 185–193. doi:10.1093/bioinformatics/19.2.185. PMID 12538238.