The aim of the research in this thesis was to extract disease and disorder names from clinical texts. We used Conditional Random Fields (CRF) as the main method to label diseases and disorders in clinical sentences. We also used other tools, such as MetaMap and Stanford CoreNLP, to extract crucial features. MetaMap was used to identify disease/disorder names that are already in the UMLS Metathesaurus. Other important features, such as lemmatized forms of words and POS tags, were extracted with Stanford CoreNLP. Further features, including the semantic types of words, were extracted directly from the UMLS Metathesaurus. We participated in Task 7 of the SemEval 2014 competition and used its data to train and evaluate our system. The training data contained 199 clinical texts, the development data 99, and the test data 133. These texts included discharge summaries and echocardiogram, radiology, and ECG reports. We obtained competitive results on the disease/disorder name extraction task. An ablation study showed that while all features contributed, MetaMap matches, POS tags, and the previous and next words were the most effective features.
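To make the feature set concrete, the sketch below builds one feature dictionary per token from the feature types the abstract names: the word, its lemma, its POS tag, whether MetaMap matched it, its UMLS semantic type, and the previous and next words. It is a minimal illustration under stated assumptions, not the thesis's implementation. The function names, feature keys, and the `<BOS>`/`<EOS>` sentinels are hypothetical. The MetaMap flags and semantic-type strings in the example are hand-written placeholders rather than real tool output. The list-of-dicts-per-sentence shape is the input format that Python CRFsuite wrappers such as sklearn-crfsuite accept, but the abstract does not say which CRF toolkit was used.

```python
from typing import Dict, List, Union

Feature = Union[str, bool]


def token_features(
    tokens: List[str],
    lemmas: List[str],
    pos_tags: List[str],
    metamap_match: List[bool],
    semantic_types: List[str],
    i: int,
) -> Dict[str, Feature]:
    """Build a feature dict for token i (feature names are hypothetical)."""
    feats: Dict[str, Feature] = {
        "word": tokens[i].lower(),          # surface form, lowercased
        "lemma": lemmas[i],                 # e.g. from a lemmatizer such as CoreNLP
        "pos": pos_tags[i],                 # part-of-speech tag
        "metamap_match": metamap_match[i],  # assumed: True if MetaMap matched this token
        "semtype": semantic_types[i],       # assumed: UMLS semantic type, "" if none
    }
    # Context features: previous and next words, with sentinels at sentence edges.
    feats["prev_word"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_word"] = tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>"
    return feats


def sentence_features(
    tokens: List[str],
    lemmas: List[str],
    pos_tags: List[str],
    metamap_match: List[bool],
    semantic_types: List[str],
) -> List[Dict[str, Feature]]:
    """Return one feature dict per token, i.e. one CRF input sequence."""
    return [
        token_features(tokens, lemmas, pos_tags, metamap_match, semantic_types, i)
        for i in range(len(tokens))
    ]


# Illustrative sentence; MetaMap flags and semantic types are placeholders.
tokens = ["Patient", "has", "pneumonia", "."]
lemmas = ["patient", "have", "pneumonia", "."]
pos_tags = ["NN", "VBZ", "NN", "."]
metamap_match = [False, False, True, False]
semantic_types = ["", "", "dsyn", ""]

feats = sentence_features(tokens, lemmas, pos_tags, metamap_match, semantic_types)
```

Keeping every feature as a named key in a per-token dictionary makes an ablation study straightforward: dropping one feature type, such as `prev_word`/`next_word`, only means omitting those keys before retraining.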
