Talk title: Image Understanding of Figures in Biomedical Literature
Time: June 6, 2019, 9:00 a.m.
Venue: 太阳成集团tyc4633 A521
Speaker: Professor Dong Xu (许东)
About the speaker:
Dong Xu is Shumaker Endowed Professor in the Department of Electrical Engineering and Computer Science and Director of the Information Technology Program, with appointments in the Christopher S. Bond Life Sciences Center and the Informatics Institute at the University of Missouri-Columbia. He obtained his PhD from the University of Illinois, Urbana-Champaign in 1995 and did two years of postdoctoral work at the US National Cancer Institute. He was a Staff Scientist at Oak Ridge National Laboratory until 2003, when he joined the University of Missouri, where he served as Department Chair of Computer Science during 2007-2016. His research is in computational biology and bioinformatics, including applications of machine learning in bioinformatics, protein structure prediction, post-translational modification prediction, high-throughput biological data analyses, in silico studies of plants, microbes, and cancers, biological information systems, and mobile app development for healthcare. He has published more than 300 papers. He was elected a Fellow of the American Association for the Advancement of Science (AAAS) in 2015.
Abstract:
Figures in the scientific literature contain rich information. For example, many new molecular mechanisms in genomics, pharmacogenomics, immunology, and other fields are depicted in pathway figures and need to be curated for various applications, especially in precision medicine. Current manual curation approaches cannot keep up with the pace of biomedical literature growth. Compared with textual descriptions, pathway figures in biomedical literature often represent the mechanisms more directly. However, no systematic method exists for curating pathway figures from publications. Here, we propose a pathway curation pipeline that integrates a deep learning model with an optical character recognition method and an image processing strategy to capture the locations, names, and interactions of pathway entities in a figure. Our pipeline was evaluated on figures from PubMed publications. The results demonstrate that our model can effectively retrieve molecular entities and their interactions from pathway figures at a large scale. The proposed pipeline provides an alternative to text-mining approaches in biological literature mining. In future work, we will combine our method with text-mining tools to enrich the extracted information and reconstruct pathway mechanisms more fully.
Hosted by:
太阳成集团tyc4633(中国)有限公司
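College of Software, Jilin University
Institute of Computer Science and Technology, Jilin University
Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education
National Demonstration Center for Experimental Computer Education, Jilin University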