GSP algorithm (Generalized Sequential Pattern algorithm) is an algorithm used for sequence mining. The algorithms for solving sequence mining problems are mostly based on the apriori (level-wise) algorithm. One way to use the level-wise paradigm is to first discover all the frequent items in a level-wise fashion. It simply means counting the occurrences of all singleton elements in the database. Then, the transactions are filtered by removing the non-frequent items. At the end of this step, each transaction consists of only the frequent elements it originally contained. This modified database becomes an input to the GSP algorithm. This process requires one pass over the whole database.

GSP algorithm makes multiple database passes. In the first pass, all single items (1-sequences) are counted. From the frequent items, a set of candidate 2-sequences are formed, and another pass is made to identify their frequency. The frequent 2-sequences are used to generate the candidate 3-sequences, and this process is repeated until no more frequent sequences are found. There are two main steps in the algorithm.

  • Candidate Generation. Given the set of frequent (k-1)-frequent sequences Fk-1, the candidates for the next pass are generated by joining F(k-1) with itself. A pruning phase eliminates any sequence, at least one of whose subsequences is not frequent.
  • Support Counting. Normally, a hash tree–based search is employed for efficient support counting. Finally non-maximal frequent sequences are removed.

Algorithm

edit
   F1 = the set of frequent 1-sequence
   k=2,
   do while Fk-1 != Null;
       Generate candidate sets Ck (set of candidate k-sequences);
       For all input sequences s in the database D
       do
           Increment count of all a in Ck if s supports a
       End do
       Fk = {a ∈ Ck such that its frequency exceeds the threshold}
       k = k+1;
   End do
   Result = Set of all frequent sequences is the union of all Fk's

The above algorithm looks like the Apriori algorithm. One main difference is however the generation of candidate sets. Let us assume that:

A → B and A → C

are two frequent 2-sequences. The items involved in these sequences are (A, B) and (A,C) respectively. The candidate generation in a usual Apriori style would give (A, B, C) as a 3-itemset, but in the present context we get the following 3-sequences as a result of joining the above 2- sequences

A → B → C, A → C → B and A → BC

The candidate–generation phase takes this into account. The GSP algorithm discovers frequent sequences, allowing for time constraints such as maximum gap and minimum gap among the sequence elements. Moreover, it supports the notion of a sliding window, i.e., of a time interval within which items are observed as belonging to the same event, even if they originate from different events.

See also

edit

References

edit
  • R. Srikant and R. Agrawal. 1996. Mining Sequential Patterns: Generalizations and Performance Improvements. In Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology (EDBT '96), Peter M. G. Apers, Mokrane Bouzeghoub, and Georges Gardarin (Eds.). Springer-Verlag, London, UK, UK, 3–17.
  • Pujari, Arun K. (2001). Data Mining Techniques. Universities Press. pp. 256–260. ISBN 81-7371-380-4.
  • Zaki, M.J. Machine Learning (2001) 42: 31.
edit
  • SPMF includes an open-source implementation of the GSP algorithm as well as PrefixSpan, SPADE, SPAM, ClaSP, CloSpan and BIDE.

📚 Artikel Terkait di Wikipedia

Sequential pattern mining

PrefixSpan algorithm and place the products on shelves based on the order of mined purchasing patterns. Commonly used algorithms include: GSP algorithm Sequential

Ant colony optimization algorithms

computer science and operations research, the ant colony optimization algorithm (ACO) is a probabilistic technique for solving computational problems

Internet water army

Some scholars adopted the Dirichlet process mixture model (DPMM)-based GSP algorithm to detect Internet water armies from Tianya forum. They used DPMM to

Belief propagation

survey propagation algorithms are two different improvements to belief propagation. The name generalized survey propagation (GSP) is waiting to be assigned

Generalized second-price auction

The generalized second-price auction (GSP) is a non-truthful auction mechanism for multiple items. Each bidder places a bid. The highest bidder gets the

Series–parallel graph

series–parallel graphs (GSP-graphs) are an extension of the SP-graphs with the same algorithmic efficiency for the mentioned problems. The class of GSP-graphs include

Zach Sage Fox

(2017-10-26). "Altitude Boards 'The Mystery Of DB Cooper'; UK's Goldfinch Ent, GSP Studios Merge – AFM". Deadline. Retrieved 2026-05-08. Gonzalez, Umberto (2018-10-01)

Sponsored search auction

to results from a search engine that are not output by the main search algorithm, but rather clearly separate advertisements paid for by third parties