where $w_i$ ranges over $V \cup \text{*stop*}$ and $\{w_{i-1}, w_{i-2}, attr_i\}$ is the history, where $w_i$ denotes the $i$th word in the phrase and $attr_i$ denotes the attributes that remain to be generated at position $i$ in the phrase. The $f_j$, where $f_j(a, b) \in \{0, 1\}$, are called features and capture any information in the history that might be useful for estimating $p(w_i \mid w_{i-1}, w_{i-2}, attr_i)$. The features used in NLG2 are described in the next section, and the feature weights $\alpha_j$, obtained from the Improved Iterative Scaling algorithm (Berger et al., 1996), are set to maximize the likelihood of the training data. The probability of the sequence $W = w_1 \ldots w_n$, given the attribute set $A$ (and also given that its length is $n$), is:
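In code, such a probability might be computed along the following lines. This is only a minimal sketch under two assumptions: that the conditional model has the usual product form for maximum entropy models (each firing binary feature contributes its weight as a factor, normalized over the vocabulary plus the stop symbol), and that the sequence probability is the chain-rule product of the per-word conditionals. The function names `cond_prob` and `sequence_prob`, the `*start*` padding symbol, the toy vocabulary, and the single feature with weight 3.0 are hypothetical illustrations, not taken from NLG2.

```python
from math import prod

STOP = "*stop*"
START = "*start*"  # hypothetical padding symbol for positions before the first word


def cond_prob(word, history, vocab, features, alphas):
    """p(word | history), assuming a product-form maximum entropy model.

    history is the tuple (w_{i-1}, w_{i-2}, attr_i); each feature maps
    (word, history) to 0 or 1 and has a positive weight in alphas.
    """
    def score(w):
        # A feature that fires (value 1) contributes its weight; otherwise 1.
        return prod(a ** f(w, history) for f, a in zip(features, alphas))

    # Normalize over every candidate next word, including the stop symbol.
    z = sum(score(w) for w in list(vocab) + [STOP])
    return score(word) / z


def sequence_prob(words, attrs_per_position, vocab, features, alphas):
    """Chain-rule product of the conditionals (assumed factorization).

    attrs_per_position[i] is the attribute set still to be generated at
    position i; how it is updated is left to the caller.
    """
    padded = [START, START] + list(words)
    p = 1.0
    for i, w in enumerate(words):
        history = (padded[i + 1], padded[i], attrs_per_position[i])
        p *= cond_prob(w, history, vocab, features, alphas)
    return p


# Toy setup (all values hypothetical).
vocab = ["flights", "to", "$city"]


def to_after_flights(w, history):
    # Fires when the candidate word is "to" and the previous word is "flights".
    return 1 if w == "to" and history[0] == "flights" else 0


features = [to_after_flights]
alphas = [3.0]
attrs = [frozenset({"$city"}), frozenset({"$city"})]

p_to = cond_prob("to", ("flights", START, attrs[1]), vocab, features, alphas)
p_seq = sequence_prob(["flights", "to"], attrs, vocab, features, alphas)
```

With a single feature of weight 3.0, the word "to" after "flights" receives three times the unnormalized score of every other candidate, which shows how a weight above 1 promotes the outcomes its feature selects.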
