5. Discussion
In short, we have looked at three speaker-clustering approaches. The first was a standard approach based on bottom-up agglomerative clustering with the BIC as the merging criterion. The second system also used bottom-up clustering, but with different speaker-cluster representations and a different merging criterion. In this approach the speaker clusters were modeled by GMMs, and during the merging process the universal background model was transformed into speaker-cluster GMMs using the MAP adaptation technique. The merging criterion in this case was a cross log-likelihood ratio (CLR). A new approach was developed for the third, fusion speaker-clustering system, in which the speaker segments are modeled by both acoustic and prosodic representations. The idea was to model the speaker’s prosodic characteristics in addition to the basic acoustic information. We constructed 10 basic prosodic features derived from the energy of the audio signals, the estimated pitch contours, and the recognized voiced-unvoiced regions in the speech, which represented the basic speech units. Because prosodic information was added to the basic acoustic features, the baseline clustering procedure had to be modified to work with the fusion of both representations. We performed two evaluation experiments in which the overall diarization error rate was used as the assessment measure for the three tested clustering approaches. The experiments were performed on the SiBN and the COST278 BN databases. All the tested systems performed better on SiBN, by about 5% for each of the clustering approaches. This is because the SiBN data included more homogeneous audio segments than the COST278 data.
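The baseline merging step can be illustrated with a minimal Python sketch of bottom-up agglomerative clustering driven by a ΔBIC criterion. It is not the authors' implementation: it assumes each cluster is modeled by a single Gaussian with a diagonal covariance (a simplification; BIC-based systems often use full covariances), a penalty weight `lam` of 1.0, and synthetic two-dimensional frames in place of real acoustic features. The function names and the `synthetic_segment` helper are hypothetical, and the sketch does not cover the CLR criterion on MAP-adapted GMMs or the prosodic fusion used in the other two systems.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def log_det_diag(frames: List[Frame], eps: float = 1e-6) -> float:
    """log|Sigma| of a diagonal ML covariance estimate (sum of log variances)."""
    n, dim = len(frames), len(frames[0])
    total = 0.0
    for d in range(dim):
        mean = sum(f[d] for f in frames) / n
        var = sum((f[d] - mean) ** 2 for f in frames) / n
        total += math.log(var + eps)  # eps keeps log() finite for constant features
    return total


def delta_bic(a: List[Frame], b: List[Frame], lam: float = 1.0) -> float:
    """Delta-BIC between clusters a and b, each modeled by one diagonal Gaussian.

    Positive: two separate models fit better (likely two speakers).
    Negative: a single model suffices (candidate for merging).
    """
    n1, n2 = len(a), len(b)
    n, dim = n1 + n2, len(a[0])
    fit_gain = 0.5 * (n * log_det_diag(list(a) + list(b))
                      - n1 * log_det_diag(a)
                      - n2 * log_det_diag(b))
    # Extra parameters of the two-model hypothesis: dim means + dim variances.
    penalty = 0.5 * lam * (2 * dim) * math.log(n)
    return fit_gain - penalty


def agglomerative_bic(clusters: List[List[Frame]], lam: float = 1.0) -> List[List[Frame]]:
    """Bottom-up clustering: repeatedly merge the pair with the lowest Delta-BIC,
    stopping when no pair has Delta-BIC < 0."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best_score, best_i, best_j = math.inf, -1, -1
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if score < best_score:
                    best_score, best_i, best_j = score, i, j
        if best_score >= 0:  # stopping criterion: no merge is justified
            break
        clusters[best_i] = clusters[best_i] + clusters[best_j]
        del clusters[best_j]  # best_j > best_i, so best_i stays valid
    return clusters


def synthetic_segment(center: float, start: int, n: int = 50) -> List[List[float]]:
    """Hypothetical 2-D 'feature frames' scattered deterministically around center."""
    return [[center + 0.1 * ((7 * k) % 11 - 5), center + 0.1 * ((3 * k) % 7 - 3)]
            for k in range(start, start + n)]


if __name__ == "__main__":
    segs = [synthetic_segment(0.0, 0), synthetic_segment(0.0, 50),
            synthetic_segment(10.0, 0), synthetic_segment(10.0, 50)]
    result = agglomerative_bic(segs)
    print(len(result), [len(c) for c in result])
```

With the default penalty, the two segments centred at 0.0 and the two centred at 10.0 should each collapse into one cluster, after which the cross-cluster ΔBIC stays positive and merging stops. Increasing `lam` enlarges the penalty, lowers every ΔBIC, and therefore makes merging more permissive (fewer final clusters).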
Furthermore, the results showed that the clustering approaches in which the segments are modeled by speaker-oriented representations (speaker GMMs, prosodic features) were more stable and more reliable than the baseline system, in which the segments are represented by acoustic information alone. The best overall results were achieved with the fusion system, where the clustering combined the acoustic and prosodic features. From this it can be concluded that the proposed fusion approach can improve speaker-diarization performance, especially when processing BN data, where a speaker’s speech characteristics do not change significantly across one BN show, but the speaker’s clustering data can be biased by different acoustic environments or background conditions.