DTREG
number of predictor categories exceeds a threshold that you can specify on the Model
Design property page (see page 33). This technique uses cluster analysis to group the
categories of the target variable into two groups. DTREG is then able to try only (k-1)
splits, where k is the number of predictor categories.
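
To give a sense of what limiting the search to (k-1) splits saves, the short sketch below compares it with the number of ways k categories can be divided into two non-empty groups, which is 2^(k-1) - 1. It assumes the alternative is an exhaustive search over every such grouping; the function names are illustrative and are not part of DTREG.

```python
def exhaustive_split_count(k: int) -> int:
    """Number of ways to divide k categories into two non-empty groups."""
    return 2 ** (k - 1) - 1


def reduced_split_count(k: int) -> int:
    """Number of splits tried under the (k-1) scheme described above."""
    return k - 1


# Compare the two counts for a few category counts.
for k in (4, 10, 20):
    print(k, exhaustive_split_count(k), reduced_split_count(k))
```

With 20 categories the exhaustive count is 524287, compared with 19 for the reduced scheme.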
Once DTREG has evaluated each possible split for each possible predictor variable, a
node is split using the best split found. The runner-up splits are remembered and
displayed as “Competitor Splits” in the report.
Evaluating Splits
The ideal split would divide a group into two child groups in such a way that all of the
rows in the left child have the same value on the target variable and all of the rows in the
right child have the same target value, but one different from the left child. If such a split
can be found, then you can classify all of the rows exactly by using just that
split, and no further splits are necessary or useful. Such a perfect split is possible only if
the rows in the node being split have only two possible values on the target variable.
Unfortunately, perfect splits do not occur often, so it is necessary to evaluate and
compare the quality of imperfect splits. Various criteria have been proposed for
evaluating splits, but they all have the same basic goal, which is to favor homogeneity
within each child node and heterogeneity between the child nodes. The heterogeneity –
or dispersion – of target categories within a node is called the “node impurity”. The goal
of splitting is to produce child nodes with minimum impurity.
The impurity of every node is calculated by examining the distribution of categories of
the target variable for the rows in the group. A “pure” node, where all rows have the
same value of the target variable, has an impurity value of 0 (zero). When a potential
split is evaluated, the probability-weighted average of the impurities of the two child
nodes is subtracted from the impurity of the parent node. This reduction in impurity is
called the improvement of the split. The split with the greatest improvement is the one
used. Improvement values for splits are shown in the node information that is part of the
report generated by DTREG.
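
The improvement calculation can be sketched in a few lines of Python. This is a minimal illustration, not DTREG's code: it assumes the Gini impurity in its standard textbook form (1 minus the sum of squared category proportions), and the names `gini` and `improvement` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared category proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def improvement(parent, left, right, impurity=gini):
    """Parent impurity minus the probability-weighted child impurities."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# Toy node with 5 "yes" rows and 5 "no" rows (made-up data).
parent = ["yes"] * 5 + ["no"] * 5
perfect = improvement(parent, ["yes"] * 5, ["no"] * 5)
imperfect = improvement(parent, ["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4)
print(perfect, imperfect)
```

In this toy case the perfect split recovers the whole parent impurity of 0.5, while the imperfect split yields an improvement of about 0.18, so the perfect split would be preferred.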
DTREG provides two methods for evaluating the quality of splits when building
classification trees: (1) Gini and (2) Entropy. Only one method is provided when
building regression trees: minimum variance within nodes. The minimum
variance/least squares criterion is essentially the same criterion used by traditional numeric
regression analysis (i.e., line and function fitting).
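
For regression trees, the same improvement idea applies with variance in place of impurity. The sketch below is one plausible reading of the minimum variance criterion, not DTREG's implementation: it assumes a single continuous predictor and candidate thresholds placed at midpoints between adjacent distinct values, and the names `variance`, `variance_reduction`, and `best_threshold` are hypothetical.

```python
def variance(values):
    """Mean squared deviation from the node mean (0.0 for an empty node)."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def variance_reduction(parent, left, right):
    """Parent variance minus the size-weighted variance of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * variance(left) + (len(right) / n) * variance(right)
    return variance(parent) - weighted


def best_threshold(x, y):
    """Try midpoints between adjacent distinct x values; return (threshold, reduction)."""
    pairs = sorted(zip(x, y))
    ys = [target for _, target in pairs]
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        gain = variance_reduction(ys, ys[:i], ys[i:])
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Made-up data with an obvious jump in the target between x = 3 and x = 10.
x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
print(best_threshold(x, y))
```

On this data the chosen threshold is 6.5, the midpoint of the gap where the target values jump.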
Experience has shown that the choice of splitting criterion is not very important, and Gini and
Entropy yield trees that are very similar. Gini is considered slightly better than Entropy,
so it is the default criterion used for classification trees. See Breiman, Friedman, Olshen,
and Stone, Classification And Regression Trees (1984), for a technical description of the
Gini and Entropy criteria.
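
One way to see why the two criteria tend to behave alike is to evaluate both on the same two-category distributions. The sketch below uses the standard textbook definitions, which this section does not spell out; the base-2 logarithm for entropy is an assumption, and the function names are illustrative.

```python
import math


def gini(proportions):
    """Gini impurity: 1 minus the sum of squared category proportions."""
    return 1.0 - sum(p * p for p in proportions)


def entropy(proportions):
    """Shannon entropy in bits; zero-probability categories contribute nothing."""
    return sum(-p * math.log2(p) for p in proportions if p > 0)


# Move from an evenly mixed node toward a pure node.
for p in (0.5, 0.7, 0.9, 1.0):
    dist = (p, 1.0 - p)
    print(p, round(gini(dist), 3), round(entropy(dist), 3))
```

Both measures are largest for the evenly mixed node, shrink as one category dominates, and reach zero for a pure node, so they usually rank candidate splits in a similar order.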

