Thus, we say that C contains complete information regarding its corresponding frequent itemsets. On the other hand, M registers only the support of the maximal itemsets. It usually does not contain the complete support information regarding its corresponding frequent itemsets. We illustrate these concepts with Example 6.2.

Example 6.2 Closed and maximal frequent itemsets. Suppose that a transaction database has only two transactions: {(a1, a2, ..., a100); (a1, a2, ..., a50)}. Let the minimum support count threshold be min_sup = 1. We find two closed frequent itemsets and their support counts, that is, C = {{a1, a2, ..., a100} : 1; {a1, a2, ..., a50} : 2}. There is only one maximal frequent itemset: M = {{a1, a2, ..., a100} : 1}. Notice that we cannot include {a1, a2, ..., a50} as a maximal frequent itemset because it has a frequent superset, {a1, a2, ..., a100}. Compare this to the preceding discussion, where we determined that there are 2^100 − 1 frequent itemsets, which are too many to be enumerated!
The set of closed frequent itemsets contains complete information regarding the frequent itemsets. For example, from C, we can derive, say, (1) {a2, a45 : 2} since {a2, a45} is a sub-itemset of the itemset {a1, a2, ..., a50 : 2}; and (2) {a8, a55 : 1} since {a8, a55} is not a sub-itemset of the previous itemset but of the itemset {a1, a2, ..., a100 : 1}. However, from the maximal frequent itemset, we can only assert that both itemsets ({a2, a45} and {a8, a55}) are frequent, but we cannot assert their actual support counts.
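The derivation in Example 6.2 can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original text: the function names support_from_closed and is_frequent_from_maximal are hypothetical, and items a1, ..., a100 are represented as the integers 1 to 100. It relies on the fact that the support count of a frequent itemset equals the largest support count among the closed frequent itemsets that contain it, whereas the maximal itemsets alone can only confirm that an itemset is frequent.

```python
# Closed (C) and maximal (M) frequent itemsets from Example 6.2,
# each mapped to its support count. Item a_i is represented by the integer i.
closed = {
    frozenset(range(1, 101)): 1,  # {a1, ..., a100} : 1
    frozenset(range(1, 51)): 2,   # {a1, ..., a50} : 2
}
maximal = {frozenset(range(1, 101)): 1}  # {a1, ..., a100} : 1


def support_from_closed(itemset, closed_sets):
    """Return the support count of itemset, or None if it is not frequent.

    The support of a frequent itemset is the largest support among the
    closed frequent itemsets that are supersets of it.
    """
    counts = [count for c, count in closed_sets.items() if itemset <= c]
    return max(counts) if counts else None


def is_frequent_from_maximal(itemset, maximal_sets):
    """Return True if itemset is a subset of some maximal frequent itemset.

    Only frequency can be decided this way; the exact support of a proper
    subset cannot be recovered from the maximal itemsets.
    """
    return any(itemset <= m for m in maximal_sets)


print(support_from_closed(frozenset({2, 45}), closed))        # 2
print(support_from_closed(frozenset({8, 55}), closed))        # 1
print(is_frequent_from_maximal(frozenset({2, 45}), maximal))  # True
print(is_frequent_from_maximal(frozenset({8, 55}), maximal))  # True
```

The first two calls reproduce derivations (1) and (2) from C, while the last two show that M confirms only that both itemsets are frequent.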

6.2 Frequent Itemset Mining Methods
In this section, you will learn methods for mining the simplest form of frequent patterns such as those discussed for market basket analysis in Section 6.1.1. We begin by presenting Apriori, the basic algorithm for finding frequent itemsets (Section 6.2.1). In Section 6.2.2, we look at how to generate strong association rules from frequent itemsets. Section 6.2.3 describes several variations to the Apriori algorithm for improved efficiency and scalability. Section 6.2.4 presents pattern-growth methods for mining frequent itemsets that confine the subsequent search space to only the data sets containing the current frequent itemsets. Section 6.2.5 presents methods for mining frequent itemsets that take advantage of the vertical data format.

6.2.1 Apriori Algorithm: Finding Frequent Itemsets by Confined Candidate Generation
Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets for Boolean association rules [AS94b]. The name of the algorithm is based on the fact that the algorithm uses prior knowledge of frequent itemset properties, as we shall see later. Apriori employs an iterative approach known as a level-wise search, where k-itemsets are used to explore (k + 1)-itemsets. First, the set of frequent 1-itemsets is found by scanning the database to accumulate the count for each item, and

