ABSTRACT
In recent years, owing to the wide application of uncertain data, mining frequent itemsets over uncertain databases has attracted much attention. In uncertain databases, the support of an itemset is a random variable rather than a fixed occurrence count. Thus, unlike the corresponding problem in deterministic databases, where the frequent itemset has a unique definition, the frequent itemset in uncertain environments has had two different definitions so far. The first, referred to as the expected support-based frequent itemset, uses the expectation of the support of an itemset to decide whether the itemset is frequent. The second, referred to as the probabilistic frequent itemset, uses the probability that the support of an itemset reaches the minimum support to decide whether it is frequent. As a result, existing work on mining frequent itemsets over uncertain databases is divided into two separate groups, and no study has comprehensively compared the two definitions. In addition, since no uniform experimental platform exists, current solutions for the same definition even report inconsistent results. In this paper, we first aim to clarify the relationship between the two definitions. Through extensive experiments, we verify that the two definitions are tightly connected and can be unified when the size of the data is large enough. Second, we provide baseline implementations of eight existing representative algorithms and test their performance fairly under uniform measures. Finally, based on fair tests over many different benchmark data sets, we clarify several existing inconsistent conclusions and discuss some new findings.

1. INTRODUCTION
Recently, with many new applications, such as sensor network monitoring [23, 24, 26], moving object search [13, 14, 15] and protein-protein interaction (PPI) network analysis [29], uncertain data mining has become a hot topic in the data mining community [3, 4, 5, 6, 20, 21]. Since frequent itemset mining is a fundamental problem in data mining, mining frequent itemsets over uncertain databases has also attracted much attention [4, 9, 10, 11, 17, 18, 22, 28, 30, 31, 33]. For example, with the popularization of wireless sensor networks, such systems collect huge amounts of data. However, due to the inherent uncertainty of sensors, the collected data are often inaccurate. For such probability-annotated uncertain data, how can we discover frequent patterns (itemsets) so that users can understand the hidden rules in the data? The inherent probabilistic nature of the data is ignored if we simply apply traditional frequent itemset mining methods for deterministic data to uncertain data. Thus, it is necessary to design specialized algorithms for mining frequent itemsets over uncertain databases.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Articles from this volume were invited to present their results at The 38th International Conference on Very Large Data Bases, August 27th - 31st 2012, Istanbul, Turkey.
Proceedings of the VLDB Endowment, Vol. 5, No. 11
Copyright 2012 VLDB Endowment 2150-8097/12/07... $ 10.00.

Before finding frequent itemsets over uncertain databases, the most essential issue is the definition of a frequent itemset. In deterministic data, an itemset is frequent if and only if its support (frequency) is not smaller than a specified minimum support, min_sup [7, 8, 19, 32]. In contrast, the definition of a frequent itemset over uncertain data has two different semantic interpretations: the expected support-based frequent itemset [4, 18] and the probabilistic frequent itemset [9]. Both treat the support of an itemset as a discrete random variable, but they differ in how they use that random variable to define frequent itemsets. In the definition of the expected support-based frequent itemset, the measure is the expectation of the support of an itemset, called the expected support of the itemset. Under this definition [4, 17, 18, 22], an itemset is frequent if and only if its expected support is no less than a specified minimum expected support threshold, min_esup. In the definition of the probabilistic frequent itemset [9, 28, 31], the measure is the probability that an itemset appears at least min_sup times, called the frequent probability of the itemset, and an itemset is frequent if and only if its frequent probability is larger than a given probabilistic threshold.
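The two definitions can be written compactly as follows. This is only a sketch under stated assumptions: the symbols N (number of transactions), p_i (probability that itemset X appears in transaction i), I_i (the corresponding indicator variable) and τ (the probabilistic threshold) are notation introduced here for illustration, and the transactions are assumed to be mutually independent.

```latex
\begin{align*}
\mathrm{sup}(X) &= \sum_{i=1}^{N} I_i, \qquad I_i \sim \mathrm{Bernoulli}(p_i) \\
\mathrm{esup}(X) &= E[\mathrm{sup}(X)] = \sum_{i=1}^{N} p_i \\
X \text{ is expected support-based frequent} &\iff \mathrm{esup}(X) \ge \mathit{min\_esup} \\
X \text{ is probabilistic frequent} &\iff \Pr[\mathrm{sup}(X) \ge \mathit{min\_sup}] > \tau
\end{align*}
```

The first criterion depends only on the mean of sup(X), whereas the second depends on the upper tail of its distribution.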
The definition of the expected support-based frequent itemset uses the expectation to measure the uncertainty, which is a simple extension of the definition of the frequent itemset in deterministic data. The definition of the probabilistic frequent itemset takes into account the complete probability distribution of the support of an itemset. Although the expectation is an important statistic, it cannot capture the complete probability distribution. Most prior work holds that the two definitions should be studied separately [9, 28, 31].
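To make the difference between the two measures concrete, the following minimal Python sketch computes both for a single itemset. It is an illustration, not one of the algorithms evaluated in this paper. It assumes that each transaction contains the itemset independently with a known probability, and the function names (`expected_support`, `support_distribution`, `frequent_probability`) and the example probabilities are hypothetical. The support distribution is built with a simple dynamic program over the transactions.

```python
from typing import List


def expected_support(probs: List[float]) -> float:
    """Expected support: the sum of the per-transaction probabilities."""
    return sum(probs)


def support_distribution(probs: List[float]) -> List[float]:
    """Return dist where dist[k] = Pr[support == k].

    Assumes transactions are independent (an assumption of this sketch).
    Processes one transaction at a time, O(N^2) overall.
    """
    dist = [1.0]  # with zero transactions, the support is 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # itemset absent from this transaction
            nxt[k + 1] += mass * p       # itemset present in this transaction
        dist = nxt
    return dist


def frequent_probability(probs: List[float], min_sup: int) -> float:
    """Frequent probability: Pr[support >= min_sup]."""
    return sum(support_distribution(probs)[min_sup:])


if __name__ == "__main__":
    # Two hypothetical itemsets over four transactions, same expected support.
    a = [0.5, 0.5, 0.5, 0.5]
    b = [1.0, 1.0, 0.0, 0.0]
    print(expected_support(a), expected_support(b))                # 2.0 2.0
    print(frequent_probability(a, 3), frequent_probability(b, 3))  # 0.3125 0.0
```

Both hypothetical itemsets have an expected support of 2.0, yet their probabilities of appearing at least three times differ (0.3125 versus 0.0), which is the sense in which the expectation alone does not determine the frequent probability.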

