
4. Related Work:

4.1 Sequential Pattern Mining:
The sequential pattern mining problem was first introduced by Agrawal and Srikant in [2]: given a set of sequences, where each sequence consists of a list of elements and each element consists of a set of items, and given a user-specified min_support threshold, the goal of sequential pattern mining is to find all frequent subsequences, i.e., the subsequences whose occurrence frequency in the set of sequences is no less than min_support.
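
To make the definition concrete, the following Python sketch (an illustration, not code from [2]) tests whether a pattern is contained in a sequence of itemsets and counts its support. It assumes that elements are represented as frozensets, that min_support is an absolute number of sequences, and that the small database `example_db` is hypothetical.

```python
from typing import FrozenSet, List, Sequence

Element = FrozenSet[str]        # an element is a set of items
Seq = Sequence[Element]         # a sequence is an ordered list of elements


def is_subsequence(pattern: Seq, sequence: Seq) -> bool:
    """True if the elements of `pattern` are subsets of elements of `sequence`
    at strictly increasing positions. Matching each pattern element to the
    earliest possible element (greedy) is sufficient."""
    i = 0
    for element in sequence:
        if i < len(pattern) and pattern[i] <= element:
            i += 1
    return i == len(pattern)


def support(pattern: Seq, database: List[Seq]) -> int:
    """Number of sequences in `database` that contain `pattern`."""
    return sum(is_subsequence(pattern, s) for s in database)


def is_frequent(pattern: Seq, database: List[Seq], min_support: int) -> bool:
    return support(pattern, database) >= min_support


# Hypothetical toy database; frozenset("abc") stands for the element (abc).
f = frozenset
example_db = [
    [f("a"), f("abc"), f("ac"), f("d"), f("cf")],
    [f("ad"), f("c"), f("bc"), f("ae")],
    [f("ef"), f("ab"), f("df"), f("c"), f("b")],
    [f("e"), f("g"), f("af"), f("c"), f("b"), f("c")],
]

print(support([f("ab"), f("c")], example_db))   # 2
```

Here the pattern <(ab) c> occurs in the first and third sequences only, so it is frequent for min_support = 2 but not for min_support = 3.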

Sequential pattern mining is closely related to association rule mining. For a given transaction database T over a set of items A, an association rule is an expression of the form $X \Rightarrow Y$, where X and Y are subsets of A. The rule $X \Rightarrow Y$ holds with confidence τ if τ% of the transactions in T that support X also support Y, and it has support σ in T if σ% of the transactions in T support $X \cup Y$. Association rule mining can be divided into two steps. First, the frequent patterns, i.e., those whose support is at least the minimum support threshold, are mined. Second, association rules are generated from these patterns with respect to the minimum confidence threshold.
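
The two measures and the two-step procedure can be illustrated with a small Python sketch. Support and confidence are expressed as fractions rather than percentages, and step 1 enumerates itemsets naively instead of using Apriori or any other published algorithm; the function names and the toy transactions `T` are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x: Itemset, y: Itemset, transactions: List[Itemset]) -> float:
    """Confidence of X => Y, i.e. support(X ∪ Y) / support(X)."""
    return support(x | y, transactions) / support(x, transactions)


def mine_rules(transactions: List[Itemset], min_sup: float,
               min_conf: float) -> List[Tuple[Itemset, Itemset, float]]:
    items = sorted(set().union(*transactions))
    # Step 1: frequent itemsets (naive enumeration, fine only for toy data).
    frequent = [frozenset(c)
                for k in range(1, len(items) + 1)
                for c in combinations(items, k)
                if support(frozenset(c), transactions) >= min_sup]
    # Step 2: split each frequent itemset Z into X => Z \ X and keep the
    # rules that reach the minimum confidence.
    rules = []
    for z in frequent:
        for k in range(1, len(z)):
            for c in combinations(sorted(z), k):
                x = frozenset(c)
                conf = confidence(x, z - x, transactions)
                if conf >= min_conf:
                    rules.append((x, z - x, conf))
    return rules


T = [frozenset(t) for t in ({"bread", "milk"}, {"bread", "butter"},
                            {"bread", "milk", "butter"}, {"milk"})]
print(confidence(frozenset({"bread"}), frozenset({"milk"}), T))  # about 0.667
```

With min_sup = 0.5 and min_conf = 0.6, `mine_rules(T, 0.5, 0.6)` keeps four rules, including butter ⇒ bread with confidence 1.0.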

[3] proposed a method for discovering access patterns from web logs based on a new type of association pattern. The method handles the order between page accesses and allows gaps in sequences. It uses a candidate generation algorithm that requires multiple scans of the database, and its pruning strategy assumes that the site structure is known. [2] presented an algorithm for finding generalized sequential patterns that allows a user-specified window size and a user-defined taxonomy over the items in the database. This algorithm also requires multiple scans of the database to generate candidates.

In this paper, we systematically explore a pattern-growth approach for efficient mining of sequential patterns in large sequence databases. The approach adopts a divide-and-conquer, pattern-growth principle as follows: sequence databases are recursively projected into a set of smaller projected databases based on the current sequential pattern(s), and sequential patterns are grown in each projected database by exploring only locally frequent fragments. Based on this philosophy, we first examine a straightforward pattern-growth method, FreeSpan (Frequent pattern-projected Sequential pattern mining), which reduces the effort of candidate subsequence generation. We then examine another, more efficient method, PrefixSpan (Prefix-projected Sequential pattern mining), which offers ordered growth and reduced projected databases. To further improve performance, a pseudo-projection technique is developed in PrefixSpan. A comprehensive performance study shows that PrefixSpan, in most cases, outperforms both the Apriori-based algorithm GSP and FreeSpan. PrefixSpan integrated with pseudo-projection is the fastest among all the tested algorithms, and PrefixSpan consumes much less memory than GSP. This pattern-growth methodology can be further extended to mining multilevel and multidimensional sequential patterns, and to mining other structured patterns.
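
Pseudo-projection, as we read it, replaces each physically copied suffix with a pointer into the original database. The following is a minimal Python sketch under the assumption of sequences of single events (as in web logs) rather than itemset sequences; the names `project`, `local_counts` and `PseudoDB` and the toy sequences are illustrative, not part of PrefixSpan's published code.

```python
from typing import Dict, List, Sequence, Tuple

# A pseudo-projected database: (sequence id, start offset) pairs pointing into
# the original database instead of copies of the suffixes.
PseudoDB = List[Tuple[int, int]]


def project(db: Sequence[Sequence[str]], pdb: PseudoDB, item: str) -> PseudoDB:
    """For every pointed-to suffix containing `item`, keep a pointer to the
    position just after the first occurrence of `item`."""
    result: PseudoDB = []
    for sid, start in pdb:
        seq = db[sid]
        for pos in range(start, len(seq)):
            if seq[pos] == item:
                result.append((sid, pos + 1))
                break
    return result


def local_counts(db: Sequence[Sequence[str]], pdb: PseudoDB) -> Dict[str, int]:
    """Support of each item in the projected database (once per sequence)."""
    counts: Dict[str, int] = {}
    for sid, start in pdb:
        seq = db[sid]
        for item in {seq[pos] for pos in range(start, len(seq))}:
            counts[item] = counts.get(item, 0) + 1
    return counts


db = ["abcb", "acb", "bca"]                 # hypothetical page-access sequences
full: PseudoDB = [(sid, 0) for sid in range(len(db))]
pa = project(db, full, "a")                 # <a>-projected database
print(pa, sorted(local_counts(db, pa).items()))
# [(0, 1), (1, 1), (2, 3)] [('b', 2), ('c', 2)]
```

Each projection costs one pair of integers per surviving sequence, which is why this representation saves memory when the database fits in main memory.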

We examine whether the order of item projection can be fixed when generating a projected database. Intuitively, if one follows the order of the prefix of a sequence and projects only its suffix, one can examine all possible subsequences and their associated projected databases in an orderly manner. We also examine the WAP-tree structure for mining frequent sequential patterns in web log files.
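
Fixing the projection order to follow the prefix yields a compact recursive miner. The sketch below is a simplified PrefixSpan-style procedure for sequences of single events (for example, page accesses); it copies suffixes instead of using pseudo-projection and does not handle itemset elements, so it illustrates prefix-ordered growth rather than the full algorithm. The function name and the toy access sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def prefixspan(db: List[Sequence[str]], min_support: int) -> Dict[Tuple[str, ...], int]:
    """Frequent sequential patterns of single-event sequences, grown prefix-first."""
    patterns: Dict[Tuple[str, ...], int] = {}

    def grow(prefix: Tuple[str, ...], projected: List[Sequence[str]]) -> None:
        # Locally frequent events: count each event once per projected suffix.
        counts = Counter(e for suffix in projected for e in set(suffix))
        for event, sup in sorted(counts.items()):
            if sup < min_support:
                continue
            new_prefix = prefix + (event,)
            patterns[new_prefix] = sup
            # Project on `event`: keep only the part after its first occurrence.
            grow(new_prefix, [s[s.index(event) + 1:] for s in projected if event in s])

    grow((), db)
    return patterns


logs = ["abdac", "eaebcac", "babfaec", "afbacfc"]   # toy access sequences
result = prefixspan(logs, 3)
print(sorted("".join(p) for p in result))
```

Because every pattern is extended only at its end, each frequent pattern is generated exactly once, and each recursive call works on a database no larger than its parent's.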

5.1 WAP-tree:

WAP-tree stands for Web access pattern tree. This data structure is devised to register access sequences and their corresponding counts compactly, so that tedious support counting can be avoided. It also maintains linkages for efficiently traversing the prefixes that share the same suffix pattern. A WAP-tree registers all, and only, the information needed by the rest of the mining process. Once the tree is built, all remaining mining is based on the WAP-tree, and the original access sequence database is no longer needed. The WAP-tree is usually much smaller than the original access sequence database, and it can be constructed efficiently with just two scans of that database.

An efficient recursive algorithm is proposed to enumerate access patterns from the WAP-tree. No candidate generation is required in the mining procedure, and only patterns with sufficient support are considered. The philosophy of this mining algorithm is conditional search. Instead of searching for patterns level-wise as Apriori does, conditional search narrows the search space by looking for patterns that share the same suffix and counting frequent events in the set of prefixes conditioned on that suffix. Conditional search is a partition-based, divide-and-conquer method rather than a bottom-up generation of combinations, so it avoids generating large candidate sets.
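
Conditional search can be illustrated without the tree. In the Python sketch below, for each frequent event e in the current conditional base, the new base for the pattern e·suffix keeps, in every sequence containing e, the prefix before the last occurrence of e; events are then counted once per sequence in those prefixes. The sketch works on plain lists of sequences, an assumption made for readability (the WAP-mine algorithm performs this search on the WAP-tree), and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def last_index(seq: Sequence[str], event: str) -> int:
    return max(i for i, e in enumerate(seq) if e == event)


def conditional_search(db: List[Sequence[str]], min_support: int) -> Dict[Tuple[str, ...], int]:
    """Grow patterns backwards from their suffix, as in conditional search."""
    patterns: Dict[Tuple[str, ...], int] = {}

    def search(suffix: Tuple[str, ...], base: List[Sequence[str]]) -> None:
        # Frequent events in the prefixes conditioned on `suffix` (once per sequence).
        counts = Counter(e for seq in base for e in set(seq))
        for event, sup in sorted(counts.items()):
            if sup < min_support:
                continue
            pattern = (event,) + suffix
            patterns[pattern] = sup
            # Conditional base of `pattern`: the prefix before the last
            # occurrence of `event` in every sequence that contains it.
            search(pattern, [seq[:last_index(seq, event)] for seq in base if event in seq])

    search((), db)
    return patterns


logs = ["abdac", "eaebcac", "babfaec", "afbacfc"]   # toy access sequences
result = conditional_search(logs, 3)
```

Taking the last occurrence keeps the longest possible prefix, so no pattern ending with that event is lost; it is the mirror image of projecting a sequence on the first occurrence of a prefix event.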

The WAP-tree stores web log data in a prefix-tree format similar to the frequent pattern tree (FP-tree) used for non-sequential data. The main steps of the technique are summarized as follows.

• The algorithm first scans the web log once to find all frequent individual events.

• Secondly, it scans the web log again to construct a WAP-tree over the set of frequent individual events of each transaction.

• Thirdly, it finds the conditional suffix patterns.

• In the fourth step, it constructs the intermediate conditional WAP-tree using the patterns found in the previous step.

• Finally, it repeats Steps 3 and 4 until the constructed conditional WAP-tree has only one branch or is empty.
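
The five steps can be combined into a compact Python sketch of WAP-tree mining. Assumptions: event-node queues are plain Python lists instead of linked queues; counts in the conditional sequence base are adjusted so that each sequence is counted only at its last occurrence of the event (our reading of how double counting is avoided); and the single-branch shortcut of the last step is omitted, so recursion simply continues until the conditional tree is empty. All names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Weighted = List[Tuple[Sequence[str], int]]   # (sequence, count) pairs
Patterns = Dict[Tuple[str, ...], int]


class Node:
    """WAP-tree node holding label:count; the virtual root has label None."""
    def __init__(self, label, parent):
        self.label, self.parent, self.count = label, parent, 0
        self.children: Dict[str, "Node"] = {}


class WAPTree:
    def __init__(self):
        self.root = Node(None, None)
        # Event-node queues, simplified to one Python list per event.
        self.queues: Dict[str, List[Node]] = defaultdict(list)

    def insert(self, seq: Sequence[str], count: int) -> None:
        node = self.root
        for e in seq:
            if e not in node.children:
                node.children[e] = Node(e, node)
                self.queues[e].append(node.children[e])
            node = node.children[e]
            node.count += count


def build(db: Weighted, min_support: int) -> WAPTree:
    """Steps 1-2: one scan for event supports, one scan to insert sequences."""
    support: Dict[str, int] = defaultdict(int)
    for seq, w in db:
        for e in set(seq):                  # counted once per sequence
            support[e] += w
    tree = WAPTree()
    for seq, w in db:
        kept = [e for e in seq if support[e] >= min_support]
        if kept:
            tree.insert(kept, w)
    return tree


def conditional_base(tree: WAPTree, event: str) -> Weighted:
    """Step 3: prefix paths of all `event` nodes. A node's count is reduced by
    the counts of its nearest `event` descendants, so each sequence is
    counted once, at its last occurrence of `event`."""
    nodes = tree.queues[event]
    unsubsumed = {n: n.count for n in nodes}
    for n in nodes:
        a = n.parent
        while a.label is not None and a.label != event:
            a = a.parent
        if a.label == event:
            unsubsumed[a] -= n.count
    base: Weighted = []
    for n, c in unsubsumed.items():
        if c > 0:
            prefix, a = [], n.parent
            while a.label is not None:      # climb up to the virtual root
                prefix.append(a.label)
                a = a.parent
            base.append((prefix[::-1], c))
    return base


def wap_mine(tree: WAPTree, min_support: int, suffix: Tuple[str, ...] = (),
             out: Optional[Patterns] = None) -> Patterns:
    """Steps 3-5: recurse on conditional WAP-trees until they are empty."""
    out = {} if out is None else out
    for event in list(tree.queues):         # every queued event is frequent here
        base = conditional_base(tree, event)
        pattern = (event,) + suffix
        out[pattern] = sum(c for _, c in base)
        wap_mine(build(base, min_support), min_support, pattern, out)
    return out


def mine(db: List[Sequence[str]], min_support: int) -> Patterns:
    return wap_mine(build([(s, 1) for s in db], min_support), min_support)
```

For instance, `mine(["abdac", "eaebcac", "babfaec", "afbacfc"], 3)` on these toy access sequences returns thirteen patterns, such as ('a', 'b', 'a', 'c'), each contained in all four sequences.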

Based on the above observations, a Web access pattern tree structure, or WAP-tree for short, can be defined as follows.

1. Each node in a WAP-tree registers two pieces of information, a label and a count, denoted label:count. The root of the tree is a special virtual node with an empty label and count 0. Every other node is labeled by an event in the event set E and is associated with a count that registers the number of occurrences, in the Web access sequence database, of the corresponding prefix ending with that event.

2. The WAP-tree is constructed as follows: for each access sequence in the database, filter out any non-frequent events, and then insert the resulting frequent subsequence into the WAP-tree. Insertion starts from the root of the WAP-tree. For the first event, denoted e, increment the count of the child node labeled e by 1 if such a child exists; otherwise, create a child labeled e and set its count to 1. Then recursively insert the rest of the frequent subsequence into the subtree rooted at that child.

3. Auxiliary node-linkage structures are constructed to assist node traversal in a WAP-tree as follows. All nodes in the tree with the same label are linked by shared-label linkages into a queue, called an event-node queue. The event-node queue for label $e_i$ is also called the $e_i$-queue. There is one header table H for a WAP-tree, and the head of each event-node queue is registered in H.
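
The three parts of the definition map directly onto a small data structure. The Python sketch below builds a WAP-tree as described in items 1 and 2 and links nodes with the same label through a `next` pointer, with the head of each queue stored in the header table H (item 3). The names `WAPNode`, `WAPTree`, `build_wap_tree`, and the tail dictionary used for constant-time appends are our own assumptions, not part of the original definition.

```python
from collections import Counter
from typing import Dict, Iterator, List, Optional, Sequence


class WAPNode:
    def __init__(self, label: Optional[str], parent: Optional["WAPNode"]):
        self.label = label                      # None marks the virtual root
        self.count = 0                          # the root keeps count 0
        self.parent = parent
        self.children: Dict[str, "WAPNode"] = {}
        self.next: Optional["WAPNode"] = None   # shared-label linkage

    def __repr__(self) -> str:
        return f"{self.label}:{self.count}"


class WAPTree:
    def __init__(self) -> None:
        self.root = WAPNode(None, None)
        self.header: Dict[str, WAPNode] = {}    # H: event -> head of its queue
        self._tail: Dict[str, WAPNode] = {}     # last node of each queue

    def insert(self, seq: Sequence[str]) -> None:
        node = self.root
        for e in seq:
            child = node.children.get(e)
            if child is None:                   # no child labeled e yet
                child = WAPNode(e, node)        # starts at 0, set to 1 below
                node.children[e] = child
                if e in self._tail:
                    self._tail[e].next = child  # append to the e-queue
                else:
                    self.header[e] = child      # first e node: register in H
                self._tail[e] = child
            child.count += 1
            node = child

    def queue(self, e: str) -> Iterator[WAPNode]:
        """Walk the e-queue starting from its head in H."""
        node = self.header.get(e)
        while node is not None:
            yield node
            node = node.next


def build_wap_tree(db: List[Sequence[str]], min_support: int) -> WAPTree:
    support = Counter(e for seq in db for e in set(seq))   # once per sequence
    tree = WAPTree()
    for seq in db:
        frequent = [e for e in seq if support[e] >= min_support]
        if frequent:
            tree.insert(frequent)
    return tree


tree = build_wap_tree(["abdac", "eaebcac", "babfaec", "afbacfc"], min_support=3)
print(list(tree.queue("c")))   # [c:2, c:1, c:1, c:1, c:1]
```

Keeping a tail pointer per event is only a convenience; walking from the head registered in H is enough to visit every node of an event-node queue.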

