other trees might be skewed in one part of the tree. The hybrid approach adapts
well to all types of classification trees. If the decision tree is skinny, the hybrid
approach simply stays with the Synchronous Tree Construction Approach. On
the other hand, it shifts to the Partitioned Tree Construction Approach as
soon as the tree becomes bushy. If the depth of the tree varies widely, the
hybrid approach performs dynamic load balancing with processor groups to
reduce processor idling.
Handling Continuous Attributes. The approaches described above concentrated
primarily on how the tree is constructed in parallel with respect to the issues
of load balancing and reducing communication overhead. The discussion was
simplified by assuming the absence of continuous-valued attributes. The presence
of continuous attributes can be handled in two ways. One is to perform
intelligent discretization, either once at the beginning or at each node as the
tree is being induced, and treat them as categorical attributes. Another, more
popular approach is to use decisions of the form A < x and A ≥ x, directly
on the values x of continuous attribute A. The decision value of x needs to be
determined at each node. For an efficient search for x, most algorithms require
the attribute values to be sorted, so that a single linear scan over all the
values suffices to evaluate the best decision. Among the various algorithms,
the approach taken by the SPRINT algorithm [SAM96], which sorts each continuous
attribute only once at the beginning, has proven efficient for large datasets.
The sorted order is maintained throughout the induction process, thus avoiding
the possibly excessive cost of re-sorting at each node. A separate list is kept for
each attribute, in which a record identifier is associated with each sorted
value. The key step in handling continuous attributes is the proper assignment
of records to the child nodes after a splitting decision is made. Implementing
this assignment poses the design challenge. SPRINT builds a mapping between a
record identifier and the node to which the record goes based on the splitting
decision. The mapping is implemented as a hash table and is probed to split the
attribute lists in a consistent manner.
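The two steps described above, the linear scan over a pre-sorted attribute list and the hash-table-driven split of all lists, can be sketched serially as follows. This is only an illustrative sketch and not the SPRINT implementation of [SAM96]: the function names, the (value, record identifier) list layout, and the use of the Gini index as the splitting criterion are assumptions made for the example.

```python
from collections import Counter


def gini(counts, total):
    """Gini impurity of a class-count mapping covering `total` records."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split(attr_list, labels):
    """Scan one sorted attribute list of (value, rid) pairs exactly once.

    Returns (x, score) for the test A < x with the lowest weighted Gini
    impurity (assumed criterion), or (None, inf) if all values are equal.
    """
    n = len(attr_list)
    left = Counter()                                       # classes with A < x
    right = Counter(labels[rid] for _, rid in attr_list)   # classes with A >= x
    best_x, best_score = None, float("inf")
    for i in range(1, n):
        prev_value, prev_rid = attr_list[i - 1]
        # Move the previous record from the right side to the left side.
        left[labels[prev_rid]] += 1
        right[labels[prev_rid]] -= 1
        value = attr_list[i][0]
        if value == prev_value:
            continue  # no threshold can separate equal values
        score = (i * gini(left, i) + (n - i) * gini(right, n - i)) / n
        if score < best_score:
            best_x, best_score = value, score
    return best_x, best_score


def split_lists(attr_lists, split_attr, x):
    """Split every attribute list consistently with the test split_attr < x."""
    # Hash table: record identifier -> child index (0: A < x, 1: A >= x).
    child_of = {rid: (0 if value < x else 1)
                for value, rid in attr_lists[split_attr]}
    children = [{}, {}]
    for name, lst in attr_lists.items():
        children[0][name] = []
        children[1][name] = []
        # Probing in list order keeps each child list sorted, so no re-sort.
        for value, rid in lst:
            children[child_of[rid]][name].append((value, rid))
    return children


# Hypothetical toy data: four records, two continuous attributes.
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
attr_lists = {
    "age": [(20, 0), (25, 1), (40, 2), (50, 3)],
    "income": [(10, 2), (30, 0), (60, 3), (70, 1)],
}
x, score = best_split(attr_lists["age"], labels)
left_lists, right_lists = split_lists(attr_lists, "age", x)
```

On this toy data the scan selects the test age < 40, and the "income" list is divided into two lists that are each still sorted by value, because the hash table is probed in the original list order.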
The parallel formulation of the SPRINT algorithm falls under the category of
synchronous tree construction designs. The multiple sorted lists of continuous
attributes are split in parallel by building the entire hash table on all the
processors. However, with this simple way of achieving a consistent split, the
algorithm incurs a communication overhead of O(N) per processor. Since the
serial runtime of the induction process is O(N), SPRINT becomes unscalable with
respect to runtime. It is also unscalable in memory requirements, because the
total memory requirement per processor is O(N): the size of the hash table is of
the same order as the size of the training dataset for the upper levels of the
decision tree, and it resides on every processor. Another parallel algorithm,
ScalParC [JKK98], solves this scalability problem. It employs a distributed hash
table to achieve a consistent split. The communication structure used to
construct and access this hash table is motivated by parallel sparse
matrix-vector multiplication algorithms. It is shown in [JKK98] that with a
proper implementation of parallel hashing, the overall communication overhead
does not exceed
