Table 2. Word recognition rate for the continuous audio-visual database (in %)
Word recognition rates of the MS-ADBN and MM-ADBN models on the digit audio-visual
database and the continuous audio-visual database are given in Table 1 and Table 2,
respectively. For comparison, the word recognition rates obtained with the HMM,
SS-MSHMM, SA-MSHMM, WP-DBN and WPS-DBN models are also given.
The following can be seen from Table 1 and Table 2:
a. For audio-only speech recognition on the digit audio-visual database, under clean or
relatively clean conditions (SNRs of 20 dB and 30 dB), the recognition rates of the
WP-DBN model are lower than those of the triphone HMM. At SNRs below 20 dB, however,
the recognition rates show that the WP-DBN model is more robust to noise. For speech
recognition with visual features on the same database, the WP-DBN model also performs
slightly better than the triphone HMM. A possible reason is that the DBN model better
describes the dynamic temporal evolution of the speech process. The WPS-DBN model, in
contrast, performs worse than the triphone HMM, possibly because the WPS-DBN model
uses a single Gaussian whereas the triphone HMM uses a Gaussian mixture model. For
audio-only or video-only speech recognition on the continuous audio-visual database,
the WPS-DBN model outperforms the triphone HMM at the various SNRs.
b. Because they integrate visual and audio features, the multi-stream models perform
better than the corresponding single-stream models. For the digit audio-visual
database, in noisy environments with signal-to-noise ratios ranging from 0 dB to
30 dB, the SA-MSHMM, MS-ADBN and MM-ADBN models obtain average improvements in speech
recognition rate of 6.03%, 8.67% and 7.34% over the HMM, WP-DBN and WPS-DBN models,
respectively. For the continuous audio-visual database, in clean speech, the
corresponding improvements are 5.61%, 7.81% and 0.42%, respectively.
c. For the digit audio-visual database, the MS-ADBN model performs better than the
SS-MSHMM and the SA-MSHMM, and the gap widens as the noise increases. The SA-MSHMM
forces the audio stream and the visual stream to be synchronized at phone boundaries,
whereas the MS-ADBN model relaxes the synchronization of the two streams to the word
level. The recognition results therefore suggest that the MS-ADBN model describes the
audio-visual asynchrony in speech more reasonably. Likewise, for the continuous
audio-visual database, the MM-ADBN model performs better than the SA-MSHMM: in clean
speech, its speech recognition rate is 9.97% higher.
d. Under all noise conditions for the digit audio-visual database, the MM-ADBN model
obtains lower but still acceptable recognition rates than the MS-ADBN model, whereas
for the continuous audio-visual database the MM-ADBN model outperforms the MS-ADBN
model at the various SNRs. In clean speech, the speech recognition rate of the MM-ADBN
model is 35.91% higher than that of the MS-ADBN model. These results are consistent
with the speech recognition results of the single-stream WP-DBN and WPS-DBN models in
(Lv et al. 2007). The MM-ADBN and WPS-DBN models are both phone models and are
therefore appropriate for large-vocabulary speech recognition. The MS-ADBN and WP-DBN
models are both word models, which cannot be properly trained on a large-vocabulary
database; they are appropriate for small-vocabulary speech recognition, where they can
be properly trained.
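
The average improvements quoted in item b can be read as a mean of per-SNR differences
between a multi-stream model and its single-stream counterpart. The sketch below shows
that arithmetic under two assumptions that the text does not confirm: that the
improvements are absolute differences in percentage points, and that they are averaged
with equal weight over the SNR conditions. The function name and the rates are
hypothetical placeholders, not values from Table 1 or Table 2.

```python
from statistics import mean


def average_improvement(baseline, multistream):
    """Mean absolute gain (percentage points) over the SNRs both models share."""
    shared = sorted(set(baseline) & set(multistream))
    if not shared:
        raise ValueError("the two models share no SNR condition")
    return mean(multistream[snr] - baseline[snr] for snr in shared)


# Hypothetical placeholder rates in %, keyed by SNR in dB.
# These are NOT the values reported in Table 1 or Table 2.
single_stream = {0: 40.0, 10: 60.0, 20: 80.0, 30: 90.0}
multi_stream = {0: 50.0, 10: 68.0, 20: 86.0, 30: 94.0}

gain = average_improvement(single_stream, multi_stream)
print(f"average improvement: {gain:.2f} percentage points")
```

If the reported figures are instead relative improvements, each difference would be
divided by the baseline rate before averaging.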

