We report results on the highlight generation task in Figure 3 with ROUGE-1 and ROUGE-L (error bars indicate the 95% confidence interval). In both measures, the ILP sentence baseline has the best recall, while the ILP phrase model has the best precision (the differences are statistically significant). F-score is higher for the phrase-based system, but not significantly. This can be attributed to the fact that the longer output of the sentence-based model makes the recall task easier. Average highlight lengths, and the compression rates they represent, are shown in Table 3. Our phrase model achieves the highest compression rates, whereas the sentence-based model tends to select long sentences even in comparison to the lead baseline. The sentence ILP model outperforms the lead baseline with respect to recall but not precision or F-score. The phrase ILP achieves a significantly better F-score than the lead baseline with both ROUGE-1 and ROUGE-L.
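The effect of output length on recall and precision can be illustrated with a minimal sketch of unigram ROUGE-1. This is a simplified version that assumes whitespace tokenization and no stemming or stopword handling, so it is not the official ROUGE toolkit. The function name `rouge1` and the example sentences are hypothetical and are not taken from the evaluation data.

```python
from collections import Counter
from typing import Tuple


def rouge1(candidate: str, reference: str) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical reference highlight and two candidate outputs.
reference = "storm closes airport for two days"
short_highlight = "storm closes airport"
long_highlight = ("a severe storm closes the main airport "
                  "for at least two days this week")

for name, text in [("short", short_highlight), ("long", long_highlight)]:
    p, r, f = rouge1(text, reference)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case the longer candidate covers every reference word and so reaches full recall, but its extra words lower precision. The shorter candidate shows the opposite pattern, which mirrors the trade-off between the sentence-based and phrase-based outputs described above.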
