How to use a regular expression to match a specific meta tag in an HTML string using JavaScript All In One
meta tag
error ❌
const html = `
<!DOCTYPE html>
<html lang="en">
<head>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="My Favorite Girlfriend"/>
<meta name="twitter:site" content="@Hulu"/>
</head>
</html>
`;
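// regex groups
const result = [];
// ❌ Every attempt below fails, for the same reasons:
// 1. match and matchAll accept only a regex; the callback is ignored and never runs.
// 2. The pattern looks for twitter:url, which this HTML does not contain.
// 3. Without the m flag, ^ and $ anchor to the whole string, not to each line.
// html.match(/(^<meta name="twitter:url" content="[.]+"\/>$)/ig, (group) => {
// html.match(/(^<meta name="twitter:url" content="[\w+\s*]+"\/>$)/ig, (group) => {
// html.matchAll(/(^<meta name="twitter:url" content="([\w+\s?]+)"\/>$)/ig, (group) => {
html.matchAll(/(^<meta name="twitter:url" content="[\w+\s?]+"\/>$)/ig, (group) => {
result.push(group);
});
console.log(`result`, result);
// result []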
https://regexper.com/#%2F(^<meta name%3D"twitter%3Aurl" content%3D"[\w%2B\s%3F]%2B"\%2F>%24)%2Fig
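Here is a minimal sketch of why the attempt above comes back empty, assuming an engine that supports String.prototype.matchAll (ES2020+). The names `sampleHtml`, `callbackCalls`, `titles`, `anchored`, and `anchoredMultiline` are illustrative and not part of the original code.

```javascript
// Illustrative input (hypothetical name `sampleHtml`), trimmed from the HTML above.
const sampleHtml = `
<head>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="My Favorite Girlfriend"/>
</head>
`;

// 1. matchAll takes only a regex; an extra "callback" argument is silently ignored.
let callbackCalls = 0;
const iterator = sampleHtml.matchAll(
  /<meta name="twitter:title" content="([^"]+)"\/>/g,
  () => {
    callbackCalls += 1;
  },
);
console.log(callbackCalls); // 0, the callback never runs

// 2. matchAll returns an iterator, so the matches have to be consumed explicitly.
const titles = Array.from(iterator, (m) => m[1]);
console.log(titles); // ['My Favorite Girlfriend']

// 3. Without the m flag, ^ and $ anchor to the whole string, not to each line.
const anchored = /^<meta name="twitter:title" content="[^"]+"\/>$/;
const anchoredMultiline = /^<meta name="twitter:title" content="[^"]+"\/>$/m;
console.log(anchored.test(sampleHtml)); // false
console.log(anchoredMultiline.test(sampleHtml)); // true

// 4. The failing pattern searches for twitter:url, which the HTML does not contain.
const hasTwitterUrl = /twitter:url/.test(sampleHtml);
console.log(hasTwitterUrl); // false
```

Either dropping the anchors or adding the m flag fixes point 3; the solution below drops the anchors.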
solution ✅
const html = `
<!DOCTYPE html>
<html lang="en">
<head>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="My Favorite Girlfriend"/> ✅
<meta name="twitter:site" content="@Hulu"/>
<meta name="twitter:description" content="A chef's life gets complicated when he falls for a beautiful young woman who has multiple personalities."/>
<meta property="og:title" content="My Favorite Girlfriend"/> ✅
<meta property="og:site_name" content="Hulu"/>
<meta property="og:type" content="movie"/>
</head>
</html>
`;
// ✅
// let result = html.match(/<meta name="twitter:title" content="([^"]+)"\/>/)
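// ✅ Also works for this title, but inside a character class + and ? are literal characters,
// so [\w+\s?]+ cannot match content such as "@Hulu" or "A chef's life..."; prefer [^"]+.
// let result = html.match(/<meta name="twitter:title" content="([\w+\s?]+)"\/>/)
// [0] is the full match; [1] is the first capture group (the content value).
// let result = html.match(/<meta name="twitter:title" content="([\w+\s?]+)"\/>/)[0]
let result = html.match(/<meta name="twitter:title" content="([\w+\s?]+)"\/>/)[1]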
console.log(`result =`, result);
// result = My Favorite Girlfriend
// ❌ matchAll does not accept a callback; it returns an iterator
// const result = [];
// html.matchAll(/<meta name="twitter:title" content="([\w+\s?]+)"\/>/g, (group, i) => {
// console.log(`group, i`, group, i)
// result.push(group);
// })
// console.log(`result =`, result);
https://regexper.com/#%2F<meta name%3D"twitter%3Atitle" content%3D"([\w%2B\s%3F]%2B)"\%2F>%2F
https://regex101.com/r/QnWceA/1
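To read several meta tags at once, matchAll can be combined with named capture groups. This is a sketch under stated assumptions, not a general HTML parser: it assumes double-quoted attributes in exactly the order used in the sample (name or property first, then content). The names `pageHtml`, `metaRegex`, and `collectMetaTags` are hypothetical.

```javascript
// Illustrative input (hypothetical name `pageHtml`), trimmed from the HTML above.
const pageHtml = `
<head>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:title" content="My Favorite Girlfriend"/>
<meta property="og:title" content="My Favorite Girlfriend"/>
<meta property="og:type" content="movie"/>
</head>
`;

// Named groups: `key` is the name/property value, `content` is the content value.
// Assumption: double-quoted attributes, in this exact order.
const metaRegex =
  /<meta (?:name|property)="(?<key>[^"]+)" content="(?<content>[^"]*)"\s*\/?>/g;

// Hypothetical helper: returns a plain object mapping each key to its content.
function collectMetaTags(html) {
  const tags = {};
  for (const match of html.matchAll(metaRegex)) {
    tags[match.groups.key] = match.groups.content;
  }
  return tags;
}

const tags = collectMetaTags(pageHtml);
console.log(tags['twitter:title']); // My Favorite Girlfriend
console.log(tags['og:type']); // movie
```

Because [^"]* accepts any character except a double quote, this also handles content such as "@Hulu" that the \w-based class rejects.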
demos
match
match(regexp)
const paragraph = 'The quick brown fox jumps over the lazy dog. It barked.';
const regex = /[A-Z]/g;
const found = paragraph.match(regex);
console.log(found);
// Array ["T", "I"]
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match
matchAll
matchAll(regexp)
// Incorrect matchAll usage: matchAll does not take a callback ⚠️
const regexp = /foo[a-z]*/g;
const str = "table football, foosball";
const matches = str.matchAll(regexp);
for (const match of matches) {
console.log(
`Found ${match[0]} start=${match.index} end=${
match.index + match[0].length
}.`,
);
}
// The matches iterator is exhausted after the for...of loop; call matchAll again to create a new iterator.
Array.from(str.matchAll(regexp), (m) => m[0]);
// (2) ['football', 'foosball']
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/matchAll
Array.from
Array.from(arrayLike)
Array.from(arrayLike, mapFn)
Array.from(arrayLike, mapFn, thisArg)
console.log(Array.from('foo'));
// Array ["f", "o", "o"]
console.log(Array.from([1, 2, 3], (x) => x + x));
// Array [2, 4, 6]
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/from
regex tools
- Regexper
https://regexper.com/#%2F^<meta name%3D"twitter%3Aurl" content%3D"([\w%2B\s%3F]%2B)"\%2F>%24%2Fig
- regex101: build, test, and debug regex
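Metacharacters are characters that have a special meaning:
Metacharacter | Description |
---|---|
. | Matches any single character except line terminators such as \n and \r. |
\w | Matches a word character: a letter, digit, or underscore. |
\W | Matches a non-word character. |
\d | Matches a digit. |
\D | Matches a non-digit character. |
\s | Matches a whitespace character. |
\S | Matches a non-whitespace character. |
\b | Matches a word boundary. |
\B | Matches a non-word boundary. |
\0 | Matches the NUL character. |
\n | Matches a line feed (newline). |
\f | Matches a form feed. |
\r | Matches a carriage return. |
\t | Matches a tab. |
\v | Matches a vertical tab. |
\xxx | Matches the character specified by the octal number xxx. |
\xdd | Matches the character specified by the hexadecimal number dd. |
\uxxxx | Matches the Unicode character specified by the hexadecimal number xxxx. |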
https://www.runoob.com/jsref/jsref-obj-regexp.html
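A few of the metacharacters from the table, sketched against a made-up string. The sample text and variable names are illustrative only.

```javascript
// Made-up sample; the \t inside the literal is a real tab character.
const sample = 'Order 42: 3 apples,\t7 pears';

// \d matches a digit; \d+ matches a run of digits.
const digits = sample.match(/\d+/g);
console.log(digits); // ['42', '3', '7']

// \w matches letters, digits, and underscore; \w+ matches whole "words".
const words = sample.match(/\w+/g);
console.log(words); // ['Order', '42', '3', 'apples', '7', 'pears']

// \s matches whitespace, including the tab in the sample.
const parts = sample.split(/\s+/);
console.log(parts); // ['Order', '42:', '3', 'apples,', '7', 'pears']

// \b is a zero-width word boundary, so only the standalone "3" matches.
const standalone = sample.match(/\b3\b/g);
console.log(standalone); // ['3']

// \x41 and \u0041 both denote the character "A".
const hexMatches = /\x41/.test('A') && /\u0041/.test('A');
console.log(hexMatches); // true
```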
. (regex dot)
https://www.runoob.com/jsref/jsref-regexp-dot.html
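The dot matters when matching HTML that spans several lines, because by default it stops at line terminators. A small sketch, assuming an engine that supports the s (dotAll) flag from ES2018; the string and variable names are illustrative.

```javascript
const twoLines = 'first\nsecond';

// By default, . does not match line terminators such as \n.
const plainDot = /first.second/.test(twoLines);
console.log(plainDot); // false

// With the s (dotAll) flag, . matches line terminators as well.
const dotAll = /first.second/s.test(twoLines);
console.log(dotAll); // true

// [\s\S] is a common alternative that matches any character, including \n.
const anyChar = /first[\s\S]second/.test(twoLines);
console.log(anyChar); // true
```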
refs
https://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not
https://www.cnblogs.com/xgqfrms/p/17780326.html
©xgqfrms 2012-2021
Original article, copyright ©️ xgqfrms. Reproduction without authorization is prohibited 🈲️.
First published on cnblogs (博客园). Author: xgqfrms. Original link: https://www.cnblogs.com/xgqfrms/p/17780516.html