than that under the quantized binary model. This is particularly true if most
of the existential probabilities are very small. Consequently, mining algorithms
run much more slowly on such large datasets. In this paper we propose an efficient
technique for mining existential uncertain datasets, which exploits the statistical
properties of low-valued items. Through experiments, we show that the
proposed technique is very efficient in terms of both CPU cost and I/O cost.
The rest of this paper is organized as follows. Section 2 describes the Possible
Worlds interpretation of existential uncertain data and defines the expected support
measure. Section 3 discusses a simple modification of the Apriori algorithm
to mine uncertain data and explains why such a modification does not lead to
an efficient algorithm. Section 4 presents a data trimming technique to improve
mining efficiency. Section 5 presents some experimental results and discusses
some observations. We conclude the study in Section 6.
2 Problem Definition
In our data model, an uncertain dataset $D$ consists of $d$ transactions $t_1, \ldots, t_d$.
A transaction $t_i$ contains a number of items. Each item $x$ in $t_i$ is associated
with a non-zero probability $P_{t_i}(x)$, which indicates the likelihood that item $x$
is present in transaction $t_i$. There are thus two possibilities for the world: in
one, item $x$ is present in transaction $t_i$; in the other, item $x$ is not
in $t_i$. Let us call these two possibilities the two possible worlds, $W_1$ and $W_2$,
respectively. We do not know which world is the real world, but we do know, from
the dataset, the probability of each world being the true world. In particular, if
we let $P(W_i)$ be the probability that world $W_i$ is the true world, then we
have $P(W_1) = P_{t_i}(x)$ and $P(W_2) = 1 - P_{t_i}(x)$. We can extend this idea to cover
cases in which transaction $t_i$ contains other items. For example, let item $y$ be
another item in $t_i$ with probability $P_{t_i}(y)$. If the observations of items $x$ and $y$
are made independently¹, then there are four possible worlds. The probability of
the world in which $t_i$ contains both items $x$ and $y$, for example, is $P_{t_i}(x) \cdot P_{t_i}(y)$.
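
The single-transaction case can be made concrete with a short Python sketch. It assumes, as the text does, that items are observed independently; the function name `possible_worlds` and the probabilities used are hypothetical, chosen only for illustration.

```python
from itertools import product
from math import prod


def possible_worlds(transaction):
    """Enumerate the possible worlds of one uncertain transaction.

    `transaction` maps each item to its existential probability P_t(x).
    Assuming independent observations, a world's probability is the product
    of P_t(x) over present items and 1 - P_t(x) over absent items.
    """
    items = sorted(transaction)
    worlds = []
    for presence in product((True, False), repeat=len(items)):
        # Items that exist in this world.
        present = frozenset(x for x, here in zip(items, presence) if here)
        # Probability of this world under the independence assumption.
        p = prod(transaction[x] if here else 1 - transaction[x]
                 for x, here in zip(items, presence))
        worlds.append((present, p))
    return worlds


# Hypothetical transaction t_i with two items; probabilities are illustrative.
t_i = {"x": 0.9, "y": 0.4}
for present, p in possible_worlds(t_i):
    print(sorted(present), round(p, 2))
```

With these values the four worlds ({x, y}, {x}, {y} and the empty world) have probabilities 0.36, 0.54, 0.04 and 0.06, which sum to 1.
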
We can further extend the idea to cover datasets that contain more than one
transaction. Figure 1 illustrates the 16 possible worlds derived from the patient
records shown in Table 1. In traditional frequent itemset mining, the support
count of an itemset $X$ is defined as the number of transactions that contain
$X$. For an uncertain dataset, such a support value is undefined, since we do not
know with certainty whether a transaction in the real world contains $X$. We can,
however, determine the support of $X$ with respect to any given possible world.
For example, among the worlds shown in Figure 1, the supports of itemset $AB$ in
worlds $W_1$ and $W_6$ are 2 and 1, respectively. If we can determine the probability
of each possible world and the support of $X$ in each world, we can
determine the expected support of $X$.
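
Written out, the expected support described above can be sketched as follows. The symbols $S_e(X)$ for the expected support and $S_{i,X}$ for the support of $X$ in world $W_i$ are our notation for illustration. The second equality holds because the support in any world is a sum of one indicator per transaction (linearity of expectation); the third relies on the independence assumption, taking $P_{t_j}(x) = 0$ for an item absent from $t_j$.

```latex
S_e(X) = \sum_{i} P(W_i)\, S_{i,X}
       = \sum_{j=1}^{d} \Pr\left[ X \subseteq t_j \right]
       = \sum_{j=1}^{d} \prod_{x \in X} P_{t_j}(x)
```
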
Definition 1. An itemset $X$ is frequent if and only if its expected support is not
less than $\rho_s \cdot d$, where $\rho_s$ is a user-specified support threshold.
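
The following Python sketch, under the same independence assumption, computes the expected support in two ways: by brute-force enumeration of every possible world, weighting each world's support by its probability, and by the per-transaction product that follows from linearity of expectation. It then applies the threshold test of Definition 1. The function names and the two-transaction dataset are hypothetical (they are not the values of Table 1), and enumeration is exponential in the number of uncertain item occurrences, so it serves only as a check on small inputs.

```python
from itertools import product
from math import prod


def expected_support_by_worlds(dataset, itemset):
    """Expected support of `itemset`, summed over all possible worlds.

    `dataset` is a list of transactions, each a dict mapping item -> P_t(x).
    Each (transaction, item) pair is assumed to be an independent event,
    so n such pairs yield 2**n possible worlds.
    """
    pairs = [(j, x, p) for j, t in enumerate(dataset) for x, p in t.items()]
    total = 0.0
    for flips in product((True, False), repeat=len(pairs)):
        # Probability of this world.
        world_p = prod(p if here else 1 - p
                       for (_, _, p), here in zip(pairs, flips))
        # Items each transaction actually holds in this world.
        contents = [set() for _ in dataset]
        for (j, x, _), here in zip(pairs, flips):
            if here:
                contents[j].add(x)
        # Support of the itemset in this world, weighted by its probability.
        total += world_p * sum(1 for c in contents if itemset <= c)
    return total


def expected_support(dataset, itemset):
    """Same quantity in one pass: sum over transactions of the product of
    P_t(x) for x in the itemset (an item missing from t contributes 0)."""
    return sum(prod(t.get(x, 0.0) for x in itemset) for t in dataset)


def is_frequent(dataset, itemset, rho_s):
    """Definition 1: frequent iff expected support >= rho_s * d."""
    return expected_support(dataset, itemset) >= rho_s * len(dataset)


data = [{"A": 0.5, "B": 0.8}, {"A": 0.4, "B": 0.3}]  # hypothetical values
AB = frozenset({"A", "B"})
print(expected_support_by_worlds(data, AB))  # sums over 2**4 = 16 worlds
print(expected_support(data, AB))
print(is_frequent(data, AB, rho_s=0.25))     # threshold rho_s * d = 0.5
```

Both computations give 0.5 · 0.8 + 0.4 · 0.3 = 0.52 up to floating-point rounding, so $AB$ is frequent at $\rho_s = 0.25$ but not at $\rho_s = 0.3$.
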

