Removing HTML tags with a regular expression
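Goal
Put another way, the goal in the title is to use a regular expression to extract the text content from inside HTML tags.
I have the source of an HTML document, a postdoc job posting, and want to pull the recruitment information out of it with a regular expression.
First, I located the div tag that holds the posting. Its content is shown below (the actual text begins with "Postdoctoral Scholar in Translational Bioinformatics, Computational Biology for Precision Cancer Medicine"):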
<div class="rich_media_content " id="js_content" style="visibility: hidden;">
<p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">Postdoctoral Scholar in Translational Bioinformatics, Computational Biology for Precision Cancer Medicine</span></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><br /></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">The </span><span style="color: rgb(171, 25, 66);max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><strong style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">Ruijiang Li </span></strong></span><span style="font-size: 15px;max-width: 100%;box-sizing: border-box 
!important;overflow-wrap: break-word;">lab at Stanford University School of Medicine is looking for a highly motivated postdoctoral scholar. The major focus of the lab is to develop, validate, and clinically translate diagnostic, prognostic, predictive biomarkers for precision cancer medicine. We integrate datasets of large patient populations and develop novel statistical and machine learning methods.</span></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><br /></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">Our lab is generously funded by 3 active NIH R01 grants. Our work has been published in top clinical journals such as JAMA Oncology, Annals of Surgery, Clinical Cancer Research, Radiology. Major awards to my postdoc trainees include the prestigious NIH K99/R00 Pathway to Independence Award, which provides with $1,000,000 over 5 years to establish an independent research program. The awardee has secured a tenure-track faculty position at MD Anderson Cancer Center. 
Please visit </span><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><strong style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><span style="color: rgb(61, 170, 214);max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">http://med.stanford.edu/lilab</span></strong></span></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><br style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;" /></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">Candidates from a diverse background are encouraged to apply. The applicant may hold a PhD either in math, physical sciences or engineering with a strong motivation to solve biomedical problems, or in biomedical sciences with a strong interest to apply computational approaches. The ideal candidates will have strong analytic and computational skills, as well as prior research experience in cancer genomics, epigenomics, transcriptomics, or multi-omic data integration. 
Basic knowledge in molecular biology or tumor immunology is helpful.</span></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><br /></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">Stanford University is located at the heart of Silicon Valley, epicenter of the technology revolution in biomedicine. 
This is an excellent opportunity not only for those motivated to pursue an academic career, but also for those interested in entrepreneurship with the goal of commercialization and translation of new technology into clinical practice.</span></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><br /></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">Interested applicants should send a research statement, CV, and names of three references to:</span></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">Ruijiang Li, PhD</span></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", 
"Microsoft YaHei", Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><br /></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">Email:</span><span style="color: rgb(217, 33, 66);max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><strong style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;">rli2@stanford.edu</span></strong></span></p><p style="line-height: 1.75em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;background-color: rgb(255, 255, 255);overflow-wrap: break-word;"><span style="color: rgb(217, 33, 66);max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><strong style="max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><span style="font-size: 15px;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><br /></span></strong></span></p><p style="text-indent: 0em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, "Helvetica Neue", "PingFang 
SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><span style="color: rgb(61, 170, 214);font-size: 15px;-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><strong style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;">请在应聘材料上注明此职位信息来源于BioArt。</strong></span></p><p style="letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><br style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;" /></p><p style="letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;margin-right: 8px;margin-left: 8px;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><strong style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><em style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><span style="font-size: 15px;-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;">温馨提示<em style="letter-spacing: 0.54px;font-family: -apple-system-font, system-ui, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><span style="-ms-word-wrap: break-word !important;max-width: 
100%;box-sizing: border-box !important;">:</span></em></span></em></strong><em style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><span style="font-size: 15px;-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;">BioArt原则上每年可为每个课题组免费发布一次博后招聘广告,博后广告请直接将word文档发送到</span></em><strong style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><span style="color: rgb(217, 33, 66);-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><em style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><span style="font-size: 15px;-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;">sinobioart@bioart.com.cn</span></em></span></strong><em style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><span style="font-size: 15px;-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;">或加微信ID:</span></em><strong style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><span style="color: rgb(217, 33, 66);-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><em style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;"><span style="font-size: 15px;-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: border-box !important;">bioartbusiness </span></em></span></strong>。</p><p style="text-indent: 0em;letter-spacing: 0.54px;font-family: -apple-system-font, BlinkMacSystemFont, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei UI", "Microsoft YaHei", Arial, sans-serif;white-space: normal;min-height: 1em;max-width: 100%;box-sizing: border-box !important;overflow-wrap: break-word;"><br style="-ms-word-wrap: break-word !important;max-width: 100%;box-sizing: 
border-box !important;" /></p><section style="text-align: center;margin-right: 8px;margin-left: 8px;white-space: normal;"><img class="rich_pages" data-ratio="0.5244444444444445" data-type="png" data-w="900" data-s="300,640" data-src="https://mmbiz.qpic.cn/mmbiz_png/zO6xlS3tgcHuPkfM4BsYWV2yO5SfZ74tFljA68n6B6gLcsWWBkG1euFL5UvFSf2mcxhMfMHv4libLrzwJiatpADA/640?wx_fmt=png" /></section>
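Method
Use Sublime Text and a regular expression.
Tags follow a much more regular pattern than the text does, so the approach is to write a regular expression that matches every tag and then invert the selection, which leaves all of the job posting text selected.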
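As a rough illustration of that idea outside the editor, the sketch below applies the same pattern, <[^>]+>, with Python's re module. The fragment is a hypothetical, shortened piece in the same shape as the sample above, not the full source. Splitting on the pattern plays the role of the inverted selection, since it keeps only what lies between matches.

```python
import re

# Same pattern as in the editor: "<", one or more characters that are not ">", then ">".
TAG = re.compile(r"<[^>]+>")

# Hypothetical, shortened fragment shaped like the sample above.
snippet = (
    '<p style="min-height: 1em;"><span style="font-size: 15px;">Email:</span>'
    '<strong>rli2@stanford.edu</strong></p>'
)

# Everything the pattern selects: the tags themselves.
tags = TAG.findall(snippet)

# The "inverted selection": the pieces left between the matches.
texts = [piece for piece in TAG.split(snippet) if piece.strip()]

print(tags)
print(texts)
```

Note that the pattern assumes no attribute value contains a literal ">" character. That holds for the sample here, but it is not guaranteed for arbitrary HTML.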
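Steps: paste the content above into Sublime Text, press Ctrl+F, turn on the regular expression option, and enter <[^>]+> as the search pattern. Click Find All, then choose Selection > Invert Selection. All of the job posting text is now selected, and can be copied and pasted into a new document. To make the result easier to read, replace the HTML space placeholder (&nbsp;) in the result with a plain space.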
The filtered content is as follows:
Postdoctoral Scholar in Translational Bioinformatics, Computational Biology for Precision Cancer Medicine
The
Ruijiang Li
lab at Stanford University School of Medicine is looking for a highly motivated postdoctoral scholar. The major focus of the lab is to develop, validate, and clinically translate diagnostic, prognostic, predictive biomarkers for precision cancer medicine. We integrate datasets of large patient populations and develop novel statistical and machine learning methods.
Our lab is generously funded by 3 active NIH R01 grants. Our work has been published in top clinical journals such as JAMA Oncology, Annals of Surgery, Clinical Cancer Research, Radiology. Major awards to my postdoc trainees include the prestigious NIH K99/R00 Pathway to Independence Award, which provides with $1,000,000 over 5 years to establish an independent research program. The awardee has secured a tenure-track faculty position at MD Anderson Cancer Center. Please visit
http://med.stanford.edu/lilab
Candidates from a diverse background are encouraged to apply. The applicant may hold a PhD either in math, physical sciences or engineering with a strong motivation to solve biomedical problems, or in biomedical sciences with a strong interest to apply computational approaches. The ideal candidates will have strong analytic and computational skills, as well as prior research experience in cancer genomics, epigenomics, transcriptomics, or multi-omic data integration. Basic knowledge in molecular biology or tumor immunology is helpful.
Stanford University is located at the heart of Silicon Valley, epicenter of the technology revolution in biomedicine. This is an excellent opportunity not only for those motivated to pursue an academic career, but also for those interested in entrepreneurship with the goal of commercialization and translation of new technology into clinical practice.
Interested applicants should send a research statement, CV, and names of three references to:
Ruijiang Li, PhD
Email:
rli2@stanford.edu
请在应聘材料上注明此职位信息来源于BioArt。
温馨提示
:
BioArt原则上每年可为每个课题组免费发布一次博后招聘广告,博后广告请直接将word文档发送到
sinobioart@bioart.com.cn
或加微信ID:
bioartbusiness
。
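For repeated use, the same steps could be scripted. The following is a minimal Python sketch rather than the procedure used above: the function name strip_tags and the sample string are assumptions made for illustration. It replaces each tag with a line break, decodes entities such as &nbsp; with html.unescape, and drops the empty lines left behind by consecutive tags.

```python
import html
import re

TAG = re.compile(r"<[^>]+>")


def strip_tags(source: str) -> str:
    """Return the text of an HTML fragment, one line per run of text between tags."""
    # Replace every tag with a newline so text from adjacent elements stays separated.
    text = TAG.sub("\n", source)
    # Decode entities (&nbsp;, &amp;, ...), then turn non-breaking spaces into plain spaces.
    text = html.unescape(text).replace("\xa0", " ")
    # Trim each line and drop the empty ones left by consecutive tags.
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


# Hypothetical fragment shaped like the sample above.
sample = (
    "<p><span>Ruijiang Li,&nbsp;PhD</span></p><p><br /></p>"
    "<p><span>Email:</span><strong>rli2@stanford.edu</strong></p>"
)
print(strip_tags(sample))
```

As in the editor output, text that was split across several inline tags ends up on separate lines. For HTML that is less regular than this sample, a real parser such as the standard library's html.parser is likely to be more robust than a regular expression.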