Research Progress in Chinese Electronic Medical Record Named Entity Recognition (CNER)

Axin • Published: 2021-01-29


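Chinese clinical named entity recognition (Chinese-CNER) for electronic medical records (EMRs) aims to identify and extract clinically relevant entity mentions from the plain text of a given EMR and to assign each mention to a predefined category. I recently put some material I had collected on CNER research progress on GitHub. It mainly consists of a list of Chinese-CNER papers and the state-of-the-art results currently reported on the main datasets. I hope it is useful to readers interested in CNER.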
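To make the task definition concrete, here is a minimal sketch of how entity mentions are commonly read off a character-level tag sequence. It assumes a BIO tagging scheme, and the sentence and the label names `BODY` and `SYMPTOM` are hypothetical; none of this is taken from a specific system in the list below.

```python
from typing import List, Tuple


def decode_bio(chars: List[str], tags: List[str]) -> List[Tuple[str, str, int, int]]:
    """Collect (mention, type, start, end) spans from character-level BIO tags."""
    entities = []
    start, etype = None, None
    # A trailing "O" sentinel closes an entity that runs to the end of the text.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            entities.append(("".join(chars[start:i]), etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


# Hypothetical example sentence: "患者右下腹疼痛" (the patient has right lower abdominal pain)
chars = list("患者右下腹疼痛")
tags = ["O", "O", "B-BODY", "I-BODY", "I-BODY", "B-SYMPTOM", "I-SYMPTOM"]
print(decode_bio(chars, tags))
```

End offsets are exclusive, so `chars[start:end]` reproduces the mention.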

GitHub repository: https://github.com/lingluodlut/Chinese-BioNLP

Papers on Chinese EMR Entity Recognition

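Many methods have been proposed for entity recognition in Chinese electronic medical records. This work concentrates on domain-specific features: building on general-domain NER methods, it studies Chinese character features, EMR knowledge features, and similar signals to improve model performance.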
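As one illustration of an EMR knowledge feature, the sketch below tags each character with a dictionary-match label using forward maximum matching. This is a generic technique chosen for illustration, not the method of any paper listed here; the lexicon entries, the labels, and the `max_len` parameter are all hypothetical.

```python
from typing import Dict, List


def dictionary_features(text: str, lexicon: Dict[str, str], max_len: int = 6) -> List[str]:
    """Label each character with a B-/I- dictionary tag via forward maximum matching."""
    feats = ["O"] * len(text)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for n in range(min(max_len, len(text) - i), 0, -1):
            label = lexicon.get(text[i:i + n])
            if label is not None:
                feats[i] = "B-" + label
                for j in range(i + 1, i + n):
                    feats[j] = "I-" + label
                i += n
                break
        else:
            i += 1  # no dictionary term starts at this character
    return feats


# Hypothetical toy lexicon mapping terms to entity types.
lexicon = {"腹": "BODY", "右下腹": "BODY", "疼痛": "SYMPTOM"}
print(dictionary_features("患者右下腹疼痛", lexicon))
```

Features like these are typically fed to the tagger alongside character embeddings, so the model can learn when to trust the dictionary rather than follow it blindly.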

Survey papers

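A survey of named entity recognition and entity relation extraction in electronic medical records (in Chinese). Yang J, Yu Q, Guan Y, et al. Acta Automatica Sinica, 2014, 40(8):1537-1561. [paper]
Research progress in named entity recognition for Chinese electronic medical records (in Chinese). Yang F, Zhang Y, Qin L, et al. China Digital Medicine, 2020, 15(02):9-12. [paper]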
Overview of CCKS 2018 Task 1: Named Entity Recognition in Chinese Electronic Medical Records. Zhang J, Li J, Jiao Z, et al. In China Conference on Knowledge Graph and Semantic Computing, Springer, 2019:158-164. [paper]
Overview of the CCKS 2019 Knowledge Graph Evaluation Track: Entity, Relation, Event and QA. Han X, Wang Z, Zhang J, et al. arXiv preprint, 2020, arXiv:2003.03875. [paper]

Method papers

HITSZ_CNER: a hybrid system for entity recognition from Chinese clinical text. Hu J, Shi X, Liu Z, et al. Proceedings of the Evaluation Tasks at the China Conference on Knowledge Graph and Semantic Computing (CCKS 2017), Chengdu, China, 2017:1-6. [paper]
Clinical named entity recognition from Chinese electronic health records via machine learning methods. Zhang Y, Wang X, Hou Z, et al. JMIR medical informatics. 2018;6(4):e50. [paper]
A BiLSTM-CRF Method to Chinese Electronic Medical Record Named Entity Recognition. Ji B, Liu R, Li S, et al. In Proceedings of the 2018 International Conference on Algorithms, Computing and Artificial Intelligence, 2018:1-6.[paper]
A multitask bi-directional RNN model for named entity recognition on Chinese electronic medical records. Chowdhury S, Dong X, Qian L, et al. BMC bioinformatics. 2018, 19(17):75-84.[paper]
A Conditional Random Fields Approach to Clinical Name Entity Recognition. Yang X, Huang W. Proceedings of the Evaluation Tasks at the China Conference on Knowledge Graph and Semantic Computing (CCKS 2018). Tianjin, China, 2018:1-6.[paper]
DUTIR at the CCKS-2018 Task1: A Neural Network Ensemble Approach for Chinese Clinical Named Entity Recognition. Luo L, Li N, Li S, et al. Proceedings of the Evaluation Tasks at the China Conference on Knowledge Graph and Semantic Computing (CCKS 2018). Tianjin, China, 2018:1-6. [paper]
Incorporating dictionaries into deep neural networks for the chinese clinical named entity recognition. Wang Q, Zhou Y, Ruan T, et al. Journal of biomedical informatics, 2019, 92: 103133. [paper]
A hybrid approach for named entity recognition in Chinese electronic medical record. Ji B, Liu R, Li S, et al. BMC medical informatics and decision making. 2019 Apr;19(2):149-58. [paper]
Chinese Clinical Named Entity Recognition Using Residual Dilated Convolutional Neural Network with Conditional Random Field. Qiu J, Zhou Y, Wang Q, et al. IEEE Transactions on NanoBioscience. 2019, 18(3):306-315. [paper]
An attention-based deep learning model for clinical named entity recognition of Chinese electronic medical records. Li L, Zhao J, Hou L, et al. BMC medical informatics and decision making. 2019, 19(5):1-1. [paper]
Chinese clinical named entity recognition with word-level information incorporating dictionaries. Lu N, Zheng J, Wu W, et al. In 2019 International Joint Conference on Neural Networks (IJCNN), 2019,1-8. [paper]
Fine-tuning BERT for joint entity and relation extraction in Chinese medical text. Xue K, Zhou Y, Ma Z, et al. In 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2019, 892-897. [paper]
Chinese clinical named entity recognition with radical-level feature and self-attention mechanism. Yin M, Mou C, Xiong K, et al. Journal of biomedical informatics. 2019, 98:103289. [paper]
Adversarial training based lattice LSTM for Chinese clinical named entity recognition. Zhao S, Cai Z, Chen H, et al. Journal of biomedical informatics. 2019, 99:103290. [paper]
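Chinese electronic medical record named entity recognition based on sentence-level Lattice long short-term memory neural network (in Chinese). Pan C, Wang Q, Tang B, et al. Academic Journal of Second Military Medical University. 2019, 40(05):497-507. [paper]
Medical named entity recognition based on BERT and model ensembling (in Chinese). Qiao R, Yang X, Huang W. Proceedings of the Evaluation Tasks at the China Conference on Knowledge Graph and Semantic Computing (CCKS 2019) [paper]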
Noisy Label Learning for Chinese Medical Named Entity Recognition Based on Uncertainty Strategy. Li Z, Gan Z, Zhang B, et al. Proceedings of the Evaluation Tasks at the China Conference on Knowledge Graph and Semantic Computing (CCKS 2020) [paper]
Medical named entity recognition based on BERT with glyph and pronunciation features (in Chinese). Yan Y, Zhao X, Wu X. Proceedings of the Evaluation Tasks at the China Conference on Knowledge Graph and Semantic Computing (CCKS 2020) [paper]
Cross domains adversarial learning for Chinese named entity recognition for online medical consultation. Wen G, Chen H, Li H, et al. Journal of Biomedical Informatics. 2020 Dec 1;112:103608. [paper]
Chinese medical named entity recognition based on multi-granularity semantic dictionary and multimodal tree. Wang C, Wang H, Zhuang H, et al. Journal of Biomedical Informatics. 2020, 111:103583. [paper]
Chinese Clinical Named Entity Recognition in Electronic Medical Records: Development of a Lattice Long Short-Term Memory Model With Contextualized Character Representations. Li Y, Wang X, Hui L, et al. JMIR Medical Informatics. 2020;8(9):e19848. [paper]
Chinese clinical named entity recognition with variant neural structures based on BERT methods. Li X, Zhang H, Zhou XH. Journal of biomedical informatics. 2020, 107:103422. [paper]
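Clinical electronic medical record named entity recognition incorporating language models and attention mechanisms (in Chinese). Tang G, Gao D, Ruan T, et al. Computer Science, 2020, 47(03):211-216. [paper]
Chinese electronic medical record named entity recognition based on stroke ELMo and multi-task learning (in Chinese). Luo L, Yang Z, Song Y, et al. Chinese Journal of Computers, 2020, 43(10):1943-1957. [paper]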

Performance of Existing Methods for Chinese EMR Entity Recognition

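This section covers the datasets for Chinese EMR entity recognition and the performance of systems on them. Publicly available annotated Chinese EMR data remains very scarce. To advance CNER systems on Chinese clinical text, the China Conference on Knowledge Graph and Semantic Computing (CCKS) has organized named entity recognition evaluation tasks for Chinese EMRs in recent years, so the results below focus on the CCKS CNER datasets.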

CCKS 2017
CCKS 2018
CCKS 2019
CCKS 2020

CCKS 2017

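CCKS17 dataset: the original dataset is split into a training set and a test set. The training set contains 300 medical records, manually annotated with five entity types (symptoms and signs, examinations and tests, diseases and diagnoses, treatments, and body parts). The test set contains 100 medical records.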

Corpus statistics

	Symptoms and signs	Examinations and tests	Diseases and diagnoses	Treatments	Body parts	Total
Training set	7,831	9,546	722	1,048	10,719	29,866
Test set	2,311	3,143	553	465	3,021	9,493
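The counts are far from balanced across entity types. The short sketch below recomputes the training-set total from the table and derives each type's share of mentions; the dictionary keys are shorthand names introduced here for illustration.

```python
from typing import Dict, Tuple


def class_shares(counts: Dict[str, int]) -> Tuple[int, Dict[str, float]]:
    """Return the total mention count and each type's share in percent."""
    total = sum(counts.values())
    shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
    return total, shares


# CCKS 2017 training-set counts, copied from the table above.
ccks17_train = {
    "symptoms_signs": 7831,
    "examinations_tests": 9546,
    "diseases_diagnoses": 722,
    "treatments": 1048,
    "body_parts": 10719,
}

total, shares = class_shares(ccks17_train)
print(total)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Diseases and diagnoses account for under 3% of the training mentions, which may be worth keeping in mind when reading per-type scores.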

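Performance comparison of existing methods (F-score, %)

Method	Symptoms and signs	Examinations and tests	Diseases and diagnoses	Treatments	Body parts	Overall	Paper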
HIT-CNER (Hu et al., 2017) Top1	96.00	94.43	78.97	81.47	87.48	91.14	HITSZ_CNER: a hybrid system for entity recognition from Chinese clinical text
BiLSTM-CRF-DIC (Wang et al., 2019)	-	-	-	-	-	91.24	Incorporating dictionaries into deep neural networks for the chinese clinical named entity recognition
RD-CNN-CRF (Qiu et al., 2019)	-	-	-	-	-	91.32	Chinese Clinical Named Entity Recognition Using Residual Dilated Convolutional Neural Network with Conditional Random Field
Tang et al. (2019)	-	-	-	-	-	91.34	Clinical electronic medical record named entity recognition incorporating language models and attention mechanisms
PDET Feature in Model-II (Lu et al., 2019)	-	-	-	-	-	92.68	Chinese Clinical Named Entity Recognition with Word-Level Information Incorporating Dictionaries
BiLSTM-CRF-SP+ELMo (Luo et al., 2020)	95.37	94.94	81.13	83.32	88.74	91.75	Chinese electronic medical record named entity recognition based on stroke ELMo and multi-task learning
FT-BERT + BiLSTM + CRF+Fea (Li et al., 2020)	96.57	94.09	81.26	82.62	88.37	91.60	Chinese clinical named entity recognition with variant neural structures based on BERT methods

Note: Top1, Top2, and Top3 mark systems that ranked in the top three of the evaluation at the time.
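For readers unfamiliar with how an entity-level F-score is derived, here is a minimal sketch. It assumes strict matching, where a predicted entity counts as correct only if its boundaries and type both match a gold entity exactly; the source does not state which matching criterion each reported number uses, and the spans in the example are hypothetical.

```python
from typing import Iterable, Tuple

Entity = Tuple[int, int, str]  # (start, end, type), end exclusive


def precision_recall_f1(gold: Iterable[Entity], pred: Iterable[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 under strict (exact-span, exact-type) matching."""
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if true_positives else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted spans for a single record.
gold = [(2, 5, "BODY"), (5, 7, "SYMPTOM"), (9, 11, "TREATMENT")]
pred = [(2, 5, "BODY"), (5, 7, "DISEASE"), (9, 11, "TREATMENT"), (12, 14, "BODY")]

p, r, f = precision_recall_f1(gold, pred)
print(f"P={p:.2%} R={r:.2%} F1={f:.2%}")
```

On a multi-record corpus, each tuple would also need a record identifier so that identical offsets in different records are not conflated.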

CCKS 2018

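CCKS18 dataset: the original dataset consists of a training set and a test set. The training set contains 600 medical records, manually annotated with five entity types (anatomical site, symptom description, independent symptom, drug, and surgery). The test set contains 400 raw medical records.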

Corpus statistics

	Anatomical site	Symptom description	Independent symptom	Drug	Surgery	Total
Training set	9,472	2,484	3,712	1,221	1,329	18,218
Test set	6,339	918	1,327	813	735	10,132

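Performance comparison of existing methods (F-score, %)

Method	Anatomical site	Symptom description	Independent symptom	Drug	Surgery	Overall	Paper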
Alihealth Lab (Yang and Huang) (2018) Top1	87.97	90.59	92.45	94.49	85.43	89.13	A Conditional Random Fields Approach to Clinical Name Entity Recognition
DUTIR (Luo et al., 2018) Top3	87.59	90.77	91.72	91.53	86.41	88.63	DUTIR at the CCKS-2018 Task1: A Neural Network Ensemble Approach for Chinese Clinical Named Entity Recognition
BiLSTM-CRF (Ji et al., 2018)	86.65	89.13	90.69	91.15	85.61	87.68	A BiLSTM-CRF Method to Chinese Electronic Medical Record Named Entity Recognition
Lattice-LSTM (Pan et al., 2019)	-	-	-	-	-	89.75	Chinese electronic medical record named entity recognition based on sentence-level Lattice long short-term memory neural network
Attention-BiLSTM-CRF + all (Ji et al, 2019)	-	-	-	-	-	90.82	A hybrid approach for named entity recognition in Chinese electronic medical record
MSD_DT_NER (Wang et al., 2020)	88.01	92.57	90.71	94.58	85.62	89.88	Chinese medical named entity recognition based on multi-granularity semantic dictionary and multimodal tree
BiLSTM-CRF-SP+ELMo (Luo et al., 2020)	89.69	91.83	92.01	91.30	86.22	90.05	Chinese electronic medical record named entity recognition based on stroke ELMo and multi-task learning
FT-BERT + BiLSTM + CRF+Fea (Li et al., 2020)	89.12	90.66	92.94	87.99	87.59	89.56	Chinese clinical named entity recognition with variant neural structures based on BERT methods

Note: Top1, Top2, and Top3 mark systems that ranked in the top three of the evaluation at the time.

CCKS 2019

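CCKS19 dataset: the original dataset consists of a training set and a test set. The training set contains 1,000 medical records, manually annotated with six entity types (diseases and diagnoses, examination, laboratory test, surgery, drug, and anatomical site). The test set contains 379 raw medical records.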

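Corpus statistics (number of unique entities)

	Diseases and diagnoses	Examination	Laboratory test	Surgery	Drug	Anatomical site	Total
Training set	2,116	222	318	765	456	1,486	5,363
Test set	682	91	193	140	263	447	1,816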

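Performance comparison of existing methods (F-score, %)

Method	Diseases and diagnoses	Examination	Laboratory test	Surgery	Drug	Anatomical site	Overall	Paper
Alihealth (Qiao et al., 2019) Top1	84.29	86.29	76.94	83.33	96.02	86.18	85.62	Medical named entity recognition based on BERT and model ensembling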
MSIIP (Liu et al., 2019) Top2	-	-	-	-	-	-	85.59	Team MSIIP at CCKS 2019 Task 1
DUTIR (Li et al., 2019) Top3	82.81	88.01	75.65	86.79	94.49	85.99	85.16	DUTIR at the CCKS-2019 Task 1: Improving Chinese clinical named entity recognition using stroke ELMo and transfer learning

Note: Top1, Top2, and Top3 mark systems that ranked in the top three of the evaluation at the time.

CCKS 2020

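CCKS20 dataset: the original dataset consists of a training set and a test set. The training set contains 1,050 medical records, manually annotated with six entity types (diseases and diagnoses, examination, laboratory test, surgery, drug, and anatomical site). The test set has not been released.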

Corpus statistics

	Diseases and diagnoses	Examination	Laboratory test	Surgery	Drug	Anatomical site	Total
Training set	4,345	1,002	1,297	923	1,935	8,811	18,313

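Performance comparison of existing methods (F-score, %)

Method	Diseases and diagnoses	Examination	Laboratory test	Surgery	Drug	Anatomical site	Overall	Paper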
CASIA_Unisound (Li et al.,2020) Top1	90.93	89.96	85.94	94.85	93.56	91.62	91.56	Noisy Label Learning for Chinese Medical Named Entity Recognition Based on Uncertainty Strategy
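TMAIL (Yan et al., 2020) Top2	90.53	88.47	83.50	96.21	93.75	92.00	91.54	Medical named entity recognition based on BERT with glyph and pronunciation features
ChiEHRBert (Yang et al., 2020) Top3	91.10	88.62	85.71	95.52	92.93	91.16	91.24	Medical named entity recognition based on ChiEHRBert and multi-model ensembling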

Note: Top1, Top2, and Top3 mark systems that ranked in the top three of the evaluation at the time.
