Usage
UTF-16 is used for text in the operating-system API of Microsoft Windows 2000/XP/2003/Vista/7/8/CE.[11] Older Windows NT systems (before Windows 2000) support only UCS-2.[12] In Windows XP, none of the fonts delivered with Windows for European languages includes a code point above U+FFFF.[13][14] Files and network data tend to be a mix of UTF-16, UTF-8, and legacy byte encodings.

IBM iSeries systems designate code page CCSID 13488 for UCS-2 character encoding, CCSID 1200 for UTF-16 encoding, and CCSID 1208 for UTF-8 encoding.[15]

UTF-16 is used by the Qualcomm BREW operating systems; the .NET environments; and the Qt cross-platform graphical widget toolkit.

Symbian OS, used in Nokia S60 handsets and Sony Ericsson UIQ handsets, uses UCS-2. iPhone handsets use UTF-16 for Short Message Service instead of the UCS-2 described in the 3GPP TS 23.038 (GSM) and IS-637 (CDMA) standards.[16]

The Joliet file system, used in CD-ROM media, encodes file names using UCS-2BE (up to sixty-four Unicode characters per file name).
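As a rough illustration of the two constraints named above (UCS-2BE encoding and a sixty-four-character limit), the following Python sketch encodes a file name. The function name `encode_joliet_name` is hypothetical, and the sketch checks only those two constraints; any other Joliet naming rules are outside its scope.

```python
def encode_joliet_name(name: str, max_chars: int = 64) -> bytes:
    """Encode a file name as UCS-2BE (hypothetical helper, not a real library API).

    Only the two constraints stated in the text are checked:
    at most 64 characters, and BMP-only code points (UCS-2 has no surrogates).
    """
    if len(name) > max_chars:
        raise ValueError("name longer than %d characters" % max_chars)
    if any(ord(ch) > 0xFFFF for ch in name):
        raise ValueError("UCS-2 cannot represent code points above U+FFFF")
    # For BMP-only text, big-endian UTF-16 and UCS-2BE produce the same bytes.
    return name.encode("utf-16-be")


encoded = encode_joliet_name("README.TXT")
```

Each character of the name occupies exactly two bytes in the result, most significant byte first.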

Python has officially used only UCS-2 internally since version 2.0, but its UTF-8 decoder to "Unicode" produces correct UTF-16. Since Python 2.2, "wide" builds are supported that use UTF-32 instead;[17] these are used primarily on Linux. Python 3.3 no longer uses UTF-16 at all; instead, each string is stored as ASCII/Latin-1, UCS-2, or UTF-32, depending on which code points it contains, with a UTF-8 version also included so that repeated conversions to UTF-8 are fast.[18]
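The variable-width storage described for Python 3.3 can be observed indirectly. The sketch below assumes CPython 3.3 or later, where `sys.getsizeof` reports the size of the string object; the helper name `bytes_per_code_point` is made up for this illustration, and other Python implementations may behave differently.

```python
import sys


def bytes_per_code_point(ch: str) -> int:
    """Estimate storage per code point by growing a string by one repeat of ch.

    Assumes CPython 3.3+; the fixed object header cancels out in the difference.
    """
    return sys.getsizeof(ch * 1001) - sys.getsizeof(ch * 1000)


ascii_width = bytes_per_code_point("a")            # ASCII/Latin-1 range
bmp_width = bytes_per_code_point("\u20ac")         # BMP, outside Latin-1
astral_width = bytes_per_code_point("\U0001D11E")  # above U+FFFF
```

On such an interpreter the three widths should increase in that order, reflecting the narrowest representation that can hold every code point in the string.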

Java originally used UCS-2, and added UTF-16 supplementary character support in J2SE 5.0.

In many languages, quoted strings need a new syntax for non-BMP characters, because the "\uXXXX" syntax explicitly limits itself to 4 hex digits. The most common approach (used by C#, D, and several other languages) is an upper-case "U" followed by 8 hex digits, such as "\U0001D11E".[19] In Java 7 regular expressions, ICU, and Perl, the syntax "\x{1D11E}" must be used. In many other cases (such as Java outside of regular expressions)[20] the only way to get a non-BMP character is to enter the surrogate halves individually, for example "\uD834\uDD1E" for U+1D11E.
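The surrogate halves in the last example follow from the standard UTF-16 arithmetic: subtract 0x10000 from the code point, then split the remaining 20 bits into two 10-bit halves. A minimal Python sketch, with helper names (`to_surrogate_pair`, `surrogate_escape`) chosen only for this illustration:

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low (trail) surrogate
    return high, low


def surrogate_escape(code_point: int) -> str:
    """Format the pair as two four-digit backslash-u escapes."""
    high, low = to_surrogate_pair(code_point)
    return "\\u{:04X}\\u{:04X}".format(high, low)


pair = to_surrogate_pair(0x1D11E)      # (0xD834, 0xDD1E)
escape = surrogate_escape(0x1D11E)     # the text \uD834\uDD1E
```

Python itself accepts the 8-digit form directly, so "\U0001D11E" denotes the same character without any manual splitting.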

These implementations all return the number of 16-bit code units, rather than the number of Unicode code points, when the equivalent of strlen() is applied to their strings, and indexing into a string returns the indexed code unit, not the indexed code point.[21][22][23] This leads some people to claim that UTF-16 is not supported. However, the term "character" is defined and used in multiple ways within Unicode terminology,[24] so an unambiguous count is not possible, and there is no reason for strlen() to attempt to return one. Most of the confusion is due to obsolete ASCII-era documentation that uses the term "character" where a fixed-size "byte" or "octet" was intended.
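The difference between the two counts can be shown in Python 3.3 or later, where `len()` on a string counts code points, unlike the UTF-16-based implementations described above. The helper `utf16_code_units` is an illustrative name, not a standard function; it computes what a UTF-16 length function would report.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to encode text as UTF-16."""
    # utf-16-le emits no byte-order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2


clef = "a\U0001D11E"                     # 'a' plus one non-BMP character
clef_points = len(clef)                  # 2 code points
clef_units = utf16_code_units(clef)      # 3 code units (1 + a surrogate pair)

accented = "e\u0301"                     # 'e' plus COMBINING ACUTE ACCENT
accented_points = len(accented)          # 2 code points, displayed as one glyph
accented_units = utf16_code_units(accented)  # 2 code units
```

The second string shows why no single count of "characters" exists: it is one user-perceived character, two code points, and two code units.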
