About

A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns a part of speech, such as noun, verb, or adjective, to each word (and other token). Computational applications generally use more fine-grained POS tags, like 'noun-plural'. This software is a Java implementation of the log-linear part-of-speech taggers described in these papers (if citing just one paper, cite the 2003 one):

Kristina Toutanova and Christopher D. Manning. 2000. Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000), pp. 63-70.

Kristina Toutanova, Dan Klein, Christopher Manning, and Yoram Singer. 2003. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In Proceedings of HLT-NAACL 2003, pp. 252-259.

The tagger was originally written by Kristina Toutanova. Since that time, Dan Klein, Christopher Manning, William Morgan, Anna Rafferty, Michel Galley, and John Bauer have improved its speed, performance, usability, and support for other languages.

The system requires Java 1.8 or later. Depending on whether you're running 32- or 64-bit Java and on the complexity of the tagger model, you'll need somewhere between 60 and 200 MB of memory to run a trained tagger (e.g., you may need to give java an option like java -mx200m). Training a tagger needs considerably more memory. The amount again depends on the complexity of the model, but at least 1 GB is usually needed, and often more.
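As a rough illustration of the memory figures above, the following sketch checks whether the running JVM's heap limit reaches a given number of megabytes. The class name HeapCheck and its methods are hypothetical and are not part of the tagger distribution; only the standard Runtime API is used. The 200 MB threshold is simply the upper end of the range quoted above.

```java
public class HeapCheck {
    /** Returns the JVM's maximum heap size in whole megabytes. */
    static long maxHeapMegabytes() {
        // maxMemory() may report slightly less than the -mx value,
        // and returns Long.MAX_VALUE when there is no inherent limit.
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** True if the available heap is at least the required amount (both in MB). */
    static boolean hasAtLeast(long requiredMb, long availableMb) {
        return availableMb >= requiredMb;
    }

    public static void main(String[] args) {
        long requiredMb = 200; // assumption: upper end of the 60-200 MB range
        long availableMb = maxHeapMegabytes();
        if (hasAtLeast(requiredMb, availableMb)) {
            System.out.println("Heap limit looks sufficient: " + availableMb + " MB");
        } else {
            System.out.println("Heap limit is " + availableMb
                    + " MB; consider an option like java -mx200m");
        }
    }
}
```

Because the JVM can report a limit somewhat below the value given on the command line, treat the result as approximate rather than as a hard pass or fail.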

Several downloads are available. The basic download contains two trained tagger models for English. The full download contains three trained English tagger models, an Arabic tagger model, a Chinese tagger model, a French tagger model, and a German tagger model. Both versions include the same source and other required files. The tagger can be retrained on any language, given POS-annotated training text for the language.

Part-of-speech name abbreviations: The English taggers use the Penn Treebank tag set. Here are some links to documentation of the Penn Treebank English POS tag set: 1993 Computational Linguistics article in PDF, AMALGAM page, Aoife Cahill's list. See the included README-Models.txt in the models directory for more information about the tagsets for the other languages.
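To give a feel for the abbreviations, here is a small sketch that maps a handful of Penn Treebank tags to their descriptions. The class PennTagGlossary is hypothetical and is not part of the tagger. It covers only five tags for illustration; the documentation mentioned above has the full tag set.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PennTagGlossary {
    // A deliberately tiny subset of the Penn Treebank English tag set.
    static final Map<String, String> TAGS = new LinkedHashMap<>();
    static {
        TAGS.put("NN", "noun, singular or mass");
        TAGS.put("NNS", "noun, plural");
        TAGS.put("VB", "verb, base form");
        TAGS.put("VBD", "verb, past tense");
        TAGS.put("JJ", "adjective");
    }

    /** Returns the description of a tag, or "unknown tag" if it is not in this subset. */
    static String describe(String tag) {
        return TAGS.getOrDefault(tag, "unknown tag");
    }

    public static void main(String[] args) {
        // Print every tag in the subset with its description.
        for (Map.Entry<String, String> e : TAGS.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Here NNS is the fine-grained 'noun-plural' tag of the kind described earlier, as opposed to a coarse category such as "noun".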

The tagger is licensed under the GNU General Public License (v2 or later). Source is included. The package includes components for command-line invocation, running as a server, and a Java API. The tagger code is dual licensed (in a similar manner to MySQL, etc.). Open source licensing is under the full GPL, which allows many free uses. For distributors of proprietary software, commercial licensing is available. If you don't need a commercial license, but would like to support maintenance of these tools, we welcome gift funding.
