Byte order encoding schemes
UTF-16 and UCS-2 produce a sequence of 16-bit code units. Because most communication and storage protocols are defined in terms of bytes, each code unit occupies two 8-bit bytes, and the order of those bytes may depend on the endianness (byte order) of the computer architecture.
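
As a minimal illustration of the two possible byte orders, the sketch below serializes a single 16-bit code unit with Python's standard `struct` module. The helper name `code_unit_bytes` is hypothetical and exists only for this example.

```python
import struct

def code_unit_bytes(unit: int, big_endian: bool) -> bytes:
    """Serialize one 16-bit code unit as two bytes in the requested order."""
    fmt = ">H" if big_endian else "<H"  # ">" = big-endian, "<" = little-endian
    return struct.pack(fmt, unit)

# U+0041 ("A") is the single code unit 0x0041.
be = code_unit_bytes(0x0041, big_endian=True)   # bytes 00 41
le = code_unit_bytes(0x0041, big_endian=False)  # bytes 41 00
```

The same two bytes appear in both results; only their order differs, which is why a decoder needs to know or infer the byte order.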

To assist in recognizing the byte order of code units, UTF-16 allows a Byte Order Mark (BOM), a code point with the value U+FEFF, to precede the first actual coded value.[7] (U+FEFF is the invisible zero-width non-breaking space/ZWNBSP character.)[8] If the endianness of the decoder matches that of the encoder, the decoder reads the value 0xFEFF; an opposite-endian decoder instead reads 0xFFFE, the noncharacter U+FFFE, which is reserved for this purpose. This incorrect result is a hint to byte-swap the remaining values. If the BOM is missing, RFC 2781 says that big-endian encoding should be assumed. (In practice, because Windows uses little-endian order by default, many applications assume little-endian encoding instead.) If there is no BOM, one method of recognizing a UTF-16 encoding is to search for the space character (U+0020, encoded as the bytes 00 20 in big-endian order and 20 00 in little-endian order), which is very common in texts in most languages.
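
The detection logic described above can be sketched as follows. This is an illustrative sketch rather than a reference implementation: the function names `detect_utf16_order`, `guess_order_by_spaces`, and `decode_utf16` are hypothetical, the default of big-endian follows the RFC 2781 recommendation, and the space-counting heuristic is a deliberately simple assumption about how such a check might be written.

```python
import codecs
from typing import Optional, Tuple

def detect_utf16_order(data: bytes, default: str = "big") -> Tuple[str, bytes]:
    """Return (byte order, payload without BOM) for a UTF-16 byte stream."""
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "big", data[2:]
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "little", data[2:]
    return default, data                       # no BOM: fall back to the default

def guess_order_by_spaces(data: bytes) -> Optional[str]:
    """Heuristic: count U+0020 code units under each byte-order interpretation."""
    big = little = 0
    for i in range(0, len(data) - 1, 2):
        pair = data[i:i + 2]
        if pair == b"\x00\x20":
            big += 1
        elif pair == b"\x20\x00":
            little += 1
    if big == little:
        return None                            # inconclusive
    return "big" if big > little else "little"

def decode_utf16(data: bytes, default: str = "big") -> str:
    """Decode UTF-16 bytes, honouring a leading BOM if one is present."""
    order, payload = detect_utf16_order(data, default)
    codec = "utf-16-be" if order == "big" else "utf-16-le"
    return payload.decode(codec)
```

For example, `decode_utf16(b"\xff\xfeA\x00")` strips the little-endian BOM and yields `"A"`, while `guess_order_by_spaces` can supply the `default` argument when no BOM is present and the heuristic is conclusive.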

The standard also allows the byte order to be stated explicitly by specifying UTF-16BE or UTF-16LE as the encoding type. When the byte order is specified explicitly in this way, a BOM should not be prepended to the text, and a U+FEFF at the beginning should be handled as a ZWNBSP character. Many applications ignore the BOM code at the start of any Unicode encoding. Web browsers often use a BOM as a hint in determining the character encoding.[9]
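
Python's built-in codecs happen to follow this distinction, so they make a convenient illustration. The snippet below is only a demonstration of one implementation's behaviour, not a statement about what every library does.

```python
import codecs

# The bytes FE FF 00 41: a big-endian BOM followed by "A".
data = codecs.BOM_UTF16_BE + "A".encode("utf-16-be")

# Generic "utf-16" label: the leading BOM selects the byte order and is consumed.
generic = data.decode("utf-16")

# Explicit "utf-16-be" label: the leading U+FEFF is kept as an ordinary character.
explicit = data.decode("utf-16-be")

# When encoding, the explicit label writes no BOM; the generic label prepends one
# in the platform's native byte order.
explicit_bytes = "A".encode("utf-16-be")
generic_bytes = "A".encode("utf-16")
```

Here `generic` is `"A"`, whereas `explicit` is `"\ufeffA"`, which shows why a label with an explicit byte order should not be combined with a BOM.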

For Internet protocols, IANA has approved "UTF-16", "UTF-16BE", and "UTF-16LE" as the names for these encodings. (The names are case insensitive.) The aliases UTF_16 or UTF16 may be meaningful in some programming languages or software applications, but they are not standard names in Internet protocols.

Similar designations, UCS-2, UCS-2BE and UCS-2LE, are used to imitate the UTF-16 labels and behaviour. However, "UCS-2 should now be considered obsolete. It no longer refers to an encoding form in either 10646 or the Unicode Standard."
