Definition 2. (Expected Support-based Frequent Itemset) Given an uncertain transaction database UDB which includes N transactions and a minimum expected support ratio min_esup, an itemset X is an expected support-based frequent itemset if and only if esup(X) ≥ N × min_esup.

Example 1. (Expected Support-based Frequent Itemset) Given the uncertain database in Table 1 and the minimum expected support min_esup = 0.5, there are only two expected support-based frequent itemsets: A(2.1) and C(2.6), where the number in each bracket is the expected support of the corresponding itemset.
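Under the commonly used independence assumption, the expected support of an itemset is the sum, over all transactions, of the product of the existential probabilities of its items in that transaction. The following is a minimal sketch of this computation; the toy database values are illustrative, not those of Table 1.

```python
# A minimal sketch of expected-support computation, assuming each uncertain
# transaction is a dict mapping an item to its existential probability and
# that item probabilities within a transaction are independent.

def expected_support(itemset, transactions):
    """Return esup(itemset): the sum over transactions of the product of the
    existential probabilities of the itemset's items in that transaction."""
    esup = 0.0
    for trans in transactions:
        prob = 1.0
        for item in itemset:
            if item not in trans:      # absent item => the itemset cannot appear
                prob = 0.0
                break
            prob *= trans[item]
        esup += prob
    return esup

# Hypothetical toy database (values are illustrative only).
udb = [{'A': 0.8, 'C': 0.9}, {'A': 0.7, 'C': 0.9}, {'A': 0.6, 'C': 0.8}]
min_esup = 0.5
print(expected_support({'A'}, udb) >= len(udb) * min_esup)
```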

Definition 3. (Frequent Probability) Given an uncertain transaction database UDB which includes N transactions, a minimum support ratio min_sup, and an itemset X, X's frequent probability, denoted as Pr(X), is defined as follows:
Pr(X) = Pr{sup(X) ≥ N × min_sup}

Definition 4. (Probabilistic Frequent Itemset) Given an uncertain transaction database UDB which includes N transactions, a minimum support ratio min_sup, and a probabilistic frequent threshold pft, an itemset X is a probabilistic frequent itemset if X's frequent probability is larger than the probabilistic frequent threshold, namely,
Pr(X) = Pr{sup(X) ≥ N × min_sup} > pft

Example 2. (Probabilistic Frequent Itemset) Given the uncertain database in Table 2, min_sup = 0.5, and pft = 0.7, the probability distribution of the support of A is shown in Table 2. The frequent probability of A is: Pr(A) = Pr{sup(A) ≥ 4 × 0.5} = Pr{sup(A) ≥ 2} = Pr{sup(A) = 2} + Pr{sup(A) = 3} = 0.4 + 0.32 = 0.72 > 0.7 = pft. Thus, {A} is a probabilistic frequent itemset.
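For intuition, the sketch below shows how the exact frequent probability can be computed, assuming that the events "transaction t contains the itemset" are independent across transactions, so that the support follows a Poisson binomial distribution. The probability values and the simple O(N^2) dynamic program are illustrative, not the paper's implementation.

```python
# A minimal sketch of exact frequent-probability computation over the
# Poisson binomial support distribution.

def frequent_probability(per_trans_probs, min_sup_count):
    """Return Pr{sup(X) >= min_sup_count} by dynamic programming over the
    support distribution (O(N^2) here; O(N log N) is achievable with
    divide-and-conquer and FFT-based convolution)."""
    dist = [1.0]                      # dist[k] = Pr{support = k} so far
    for p in per_trans_probs:
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)     # itemset absent in this transaction
            new[k + 1] += q * p       # itemset present in this transaction
        dist = new
    return sum(dist[min_sup_count:])

# Probabilities of {A} appearing in each of 4 transactions (illustrative values).
probs_A = [0.8, 0.7, 0.5, 0.4]
print(frequent_probability(probs_A, 2) > 0.7)   # compare against pft = 0.7
```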

3. ALGORITHMS OF FREQUENT ITEMSET MINING
We categorize the eight representative algorithms into three groups. The first group consists of the expected support-based frequent algorithms. These algorithms aim to find all expected support-based frequent itemsets and, for each itemset, consider only its expected support to measure its frequency. The complexity of computing the expected support of an itemset is O(N), where N is the number of transactions. The second group consists of the exact probabilistic frequent algorithms. These algorithms discover all probabilistic frequent itemsets and report the exact frequent probability of each itemset. Because computing the exact frequent probability is harder than computing the simple expectation, these algorithms spend at least O(N log N) computation per itemset. Moreover, in order to avoid redundant processing, Chernoff bound-based pruning is a way to reduce the running time of this group of algorithms. The third group consists of the approximate probabilistic frequent algorithms. Due to the sound properties of the Poisson binomial distribution, this group of algorithms can obtain a high-quality approximation of the frequent probability by acquiring only the first moment (expectation) and the second moment (variance). Therefore, the third group of algorithms has an O(N) computation cost and, when uncertain databases are large enough, returns probability information that is close to complete. To sum up, the third group of algorithms builds a bridge between the two different definitions of frequent itemsets over uncertain databases.
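As an illustration of this moment-based idea, the sketch below approximates the frequent probability with a Normal distribution matched to the first two moments of the support. This is one common choice for approximating a Poisson binomial distribution, not necessarily the exact approximation used by the surveyed algorithms, and the values are illustrative.

```python
# A minimal sketch of moment-based approximation: the Poisson binomial
# support distribution is replaced by a Normal distribution with the same
# mean and variance (a Poisson approximation based on the mean alone is
# another common option).

import math

def approx_frequent_probability(per_trans_probs, min_sup_count):
    """Approximate Pr{sup(X) >= min_sup_count} using only the first two
    moments of the support, each computable in O(N)."""
    mean = sum(per_trans_probs)                       # first moment
    var = sum(p * (1 - p) for p in per_trans_probs)   # variance
    if var == 0:
        return 1.0 if mean >= min_sup_count else 0.0
    # Continuity-corrected Normal tail: Pr{support >= k} ~ 1 - Phi(z).
    z = (min_sup_count - 0.5 - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2))

probs_A = [0.8, 0.7, 0.5, 0.4]       # illustrative values
print(approx_frequent_probability(probs_A, 2))
```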

3.1 Expected Support-based Frequent Algorithms
In this subsection, we summarize the three most representative expected support-based frequent itemset mining algorithms: UApriori [17, 18], UFP-growth [22], and UH-Mine [4]. The first algorithm is based on the generate-and-test framework with a breadth-first search strategy. The other two algorithms are based on the divide-and-conquer framework, which uses a depth-first search strategy. Although the Apriori algorithm is slower than the other two algorithms in deterministic databases, UApriori, the uncertain version of Apriori, actually performs rather well among the three algorithms and is usually the fastest one on dense uncertain datasets, based on our experimental results in Section 4. We further explain the three algorithms in the following subsections and in Section 4.

3.1.1 UApriori
The first expected support-based frequent itemset mining algorithm was proposed by Chui et al. in 2007 [18]. This algorithm extends the well-known Apriori algorithm [17, 18] to the uncertain environment and uses the generate-and-test framework to find all expected support-based frequent itemsets. We briefly introduce the UApriori algorithm as follows. The algorithm first finds all expected support-based frequent items. Then, it repeatedly joins the expected support-based frequent i-itemsets to produce (i+1)-itemset candidates and tests these candidates to obtain the expected support-based frequent (i+1)-itemsets. Finally, it terminates when no more expected support-based frequent (i+1)-itemsets are generated.
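A minimal sketch of this generate-and-test loop is given below. It restates the expected-support helper for self-containedness and uses a simple join step; real implementations add the pruning techniques discussed next.

```python
# A minimal sketch of the UApriori generate-and-test loop, assuming each
# transaction is a dict {item: existential probability} and independent items.

from itertools import combinations
from functools import reduce

def expected_support(itemset, transactions):
    # Sum over transactions of the product of the items' existential probabilities.
    return sum(reduce(lambda acc, i: acc * t.get(i, 0.0), itemset, 1.0)
               for t in transactions)

def uapriori(transactions, min_esup):
    n = len(transactions)
    threshold = n * min_esup
    items = {i for t in transactions for i in t}
    # Level 1: expected support-based frequent items.
    current = {frozenset([i]) for i in items
               if expected_support({i}, transactions) >= threshold}
    frequent = set(current)
    while current:
        # Join step: build (i+1)-itemset candidates from frequent i-itemsets.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == len(a) + 1}
        # Apriori pruning: every i-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, len(c) - 1))}
        # Test step: keep candidates whose expected support reaches the threshold.
        current = {c for c in candidates
                   if expected_support(c, transactions) >= threshold}
        frequent |= current
    return frequent
```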
Fortunately, the well-known downward closure property [8] still holds in uncertain databases. Thus, the traditional Apriori pruning can be used when we check whether an itemset is an expected support-based frequent itemset: if an itemset is not expected support-based frequent, then none of its supersets can be expected support-based frequent, so they can be pruned. In addition, several decremental pruning methods [17, 18] were proposed to further improve efficiency. These methods mainly aim to find an upper bound on the expected support of an itemset as early as possible; once the upper bound falls below the minimum expected support, the itemset (and, by the downward closure property, all of its supersets) can be pruned. However, the decremental pruning methods depend on the structure of the dataset, so the most important pruning method in UApriori is still the traditional Apriori pruning.
