relation between these labels, such as C_1 ≺ C_2 ≺ ··· ≺ C_J, where ≺ denotes the given order between the different ranks. If a pattern x_n belongs to category C_j, it is automatically classified into the lower-order categories (C_1, C_2, ..., C_{j−1}) as well. Therefore, the target vector of x_n, using the ordered partitions encoding, is ŷ(x_n) = (0, 0, ..., 0, 1, ..., 1), where ŷ_i(x_n) = 0 for all 1 ≤ i < j and ŷ_i(x_n) = 1 otherwise, as shown in Table I. The formulation of the target vector is similar to the perceptron approach [29]. It is also related to the classical cumulative probit model for OR [10], in the sense that the vector of output nodes is considered as a certain estimation of the cumulative probability distribution of the classes (a monotonicity constraint on the weights connecting the hidden layer to the output layer is imposed). The estimation of the cumulative and the posterior probabilities is further discussed in Section III-C.

B. Neural Network Model

As previously stated, the proposed model is very similar to a standard classification neural network model, but it includes the monotonicity constraints. For this reason, the model is composed of J potential functions, f_j(x) : R^K → R with j = 1, ..., J, and a hidden layer (with corresponding basis functions). The final output of each class can be described as follows:

    f_j(x_n) = \sum_{s=1}^{S} \beta_s^j B_s(x_n)    (3)

where S is the number of neurons, B_s(x_n) is a nonlinear mapping from the input layer to the hidden layer (the basis functions), and β^j = (β_1^j, β_2^j, ..., β_S^j) are the connection weights between the hidden layer and the j-th output node. In this paper, the selected basis function is the sigmoidal function. Therefore

    B_s(x_n) = \sigma\left( \sum_{i=1}^{K} w_{si} x_n^{(i)} \right),  \quad \sigma(x) = \frac{1}{1 + e^{-x}}    (4)

where w_s = [w_{s1}, w_{s2}, ..., w_{sK}] are the connection weights between the input layer and the s-th basis function. Fig. 2 represents the structure of the proposed model.

Fig. 2. Structure of the probabilistic neural network model proposed.

From now on, the set of model parameters is defined as z = {w, β}, where w ∈ R^S × R^K and β ∈ R^S × R^J. Note that the parameters {β_s^j}, j = 1, ..., J, s = 1, ..., S, connecting the hidden layer to the output layer are not ordered in a nominal classification model. In this proposal, they are considered to be ordered, and the following constraint ensures their monotonicity:

    \beta_s^j \leq \beta_s^{j+1},  \quad \forall j = 1, ..., J - 1,  \quad \forall s = 1, ..., S.    (5)

Because the inequality (5) is defined for each pair of parameters of each basis function, the parameters of different basis functions have their own structure. The monotonicity condition can be reformulated as follows, for all the basis functions s = 1, ..., S:

    \beta_s^1 = \lambda_s^1
    \beta_s^2 = \lambda_s^1 + \lambda_s^2
    ...
    \beta_s^J = \lambda_s^1 + \lambda_s^2 + \cdots + \lambda_s^J    (6)

subject to λ_s^j ≥ 0, ∀j = 1, ..., J. In this way, the parameters are restricted to assume positive values, β_s = (β_s^1, β_s^2, ..., β_s^J) ∈ (R^+)^J. Equation (6) can be expressed in matrix form, for the s-th basis function, as follows:

    \begin{pmatrix} \beta_s^1 \\ \vdots \\ \beta_s^J \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ \vdots & \ddots & & 0 \\ 1 & \cdots & \cdots & 1 \end{pmatrix} \cdot \begin{pmatrix} \lambda_s^1 \\ \vdots \\ \lambda_s^J \end{pmatrix}    (7)

or, in the reduced form,

    \beta_s = C \cdot \lambda_s,  \quad \forall s = 1, ..., S    (8)

where C ∈ R^J × R^J is a lower triangular matrix of ones and λ_s ∈ R^J is a column vector. Then, the matrix form considering the vectors β_s of all the basis functions is provided as

    \begin{pmatrix} \beta_1^1 & \cdots & \beta_S^1 \\ \vdots & & \vdots \\ \beta_1^J & \cdots & \beta_S^J \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ \vdots & \ddots & & 0 \\ 1 & \cdots & \cdots & 1 \end{pmatrix} \cdot \begin{pmatrix} \lambda_1^1 & \cdots & \lambda_S^1 \\ \vdots & & \vdots \\ \lambda_1^J & \cdots & \lambda_S^J \end{pmatrix}    (9)
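The ordered partitions encoding described above can be illustrated with a minimal sketch (the function name and the J = 4 example are ours, not from the paper):

```python
import numpy as np

def ordered_partition_target(j, J):
    """Ordered partitions target for class C_j (1-indexed):
    y_i = 0 for i < j and y_i = 1 for i >= j."""
    y = np.zeros(J, dtype=int)
    y[j - 1:] = 1
    return y

# For J = 4, a pattern of class C_3 gets the target (0, 0, 1, 1)
print(ordered_partition_target(3, 4))  # -> [0 0 1 1]
```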
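Equations (3) and (4) amount to a single hidden layer of sigmoidal units followed by J linear output nodes. A hedged sketch of the forward pass, assuming w is stored as an S × K matrix and β as an S × J matrix (names and shapes are our convention, not the paper's):

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + exp(-x)), eq. (4)
    return 1.0 / (1.0 + np.exp(-x))

def model_outputs(x, w, beta):
    """Potential functions f_j(x), j = 1, ..., J.

    x    : (K,)   input pattern
    w    : (S, K) input-to-hidden weights, row s = w_s
    beta : (S, J) hidden-to-output weights, column j = beta^j
    """
    B = sigmoid(w @ x)   # (S,) basis function values B_s(x), eq. (4)
    return beta.T @ B    # (J,) outputs f_j(x) = sum_s beta_s^j B_s(x), eq. (3)
```

Note that when each row of beta is nondecreasing, as constraint (5) requires, the outputs satisfy f_1(x) ≤ ··· ≤ f_J(x) automatically, because every B_s(x) is positive.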

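The reparameterization of (6)–(9) can be checked numerically. The sketch below (function name and array shapes are illustrative) builds the weights as β_s = C · λ_s per basis function and verifies that the monotonicity constraint (5) then holds by construction:

```python
import numpy as np

def betas_from_lambdas(lam):
    """beta_s^j = lambda_s^1 + ... + lambda_s^j, eqs. (6)-(9).

    lam : (S, J) nonnegative parameters lambda_s^j.
    Returns an (S, J) array whose row s equals C @ lambda_s, with
    C the J x J lower triangular matrix of ones of eq. (7).
    """
    J = lam.shape[1]
    C = np.tril(np.ones((J, J)))  # lower triangular matrix of ones
    return lam @ C.T              # stacks C @ lambda_s row by row

rng = np.random.default_rng(0)
lam = np.abs(rng.normal(size=(3, 4)))      # S = 3, J = 4, lambda >= 0
beta = betas_from_lambdas(lam)
assert np.all(np.diff(beta, axis=1) >= 0)  # constraint (5) holds
```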