How to use a regular expression to match a specific meta tag in an HTML string using JavaScript All In One

meta tag

error ❌

const html = `
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta name="twitter:card" content="summary"/>
    <meta name="twitter:title" content="My Favorite Girlfriend"/>
    <meta name="twitter:site" content="@Hulu"/>
  </head>
</html>
`;

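// regex groups
const result = [];
// Earlier attempts, all of which leave result empty:
// html.match(/(^<meta name="twitter:url" content="[.]+"\/>$)/ig, (group) => {
// html.match(/(^<meta name="twitter:url" content="[\w+\s*]+"\/>$)/ig, (group) => {
// html.matchAll(/(^<meta name="twitter:url" content="([\w+\s?]+)"\/>$)/ig, (group) => {
// ❌ matchAll accepts only a regexp: the callback is ignored and never runs.
// ❌ Without the m flag, ^ and $ anchor to the whole string, not to each line.
// ❌ The HTML above contains no "twitter:url" tag, so there is nothing to match.
html.matchAll(/(^<meta name="twitter:url" content="[\w+\s?]+"\/>$)/ig, (group) => {
  result.push(group);
});

console.log(`result`, result);
// result []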

https://regexper.com/#%2F(^<meta name%3D"twitter%3Aurl" content%3D"[\w%2B\s%3F]%2B"\%2F>%24)%2Fig
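
The failing attempt above has several independent problems, and it may help to see them in isolation. The sketch below is not part of the original post; the names `sampleHtml`, `callbackCalls`, `anchoredHits`, and `lineHits` are made up for illustration. It shows that `matchAll` silently ignores a second argument, that `^` and `$` anchor to the whole string unless the `m` flag is set, and that per-line anchoring also has to allow for indentation.

```javascript
const sampleHtml = `
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta name="twitter:card" content="summary"/>
    <meta name="twitter:title" content="My Favorite Girlfriend"/>
    <meta name="twitter:site" content="@Hulu"/>
  </head>
</html>
`;

// 1. matchAll takes only a regexp; an extra argument is ignored, so the callback never runs.
let callbackCalls = 0;
const iterator = sampleHtml.matchAll(
  /<meta name="twitter:title" content="([^"]+)"\/>/g,
  () => { callbackCalls += 1; },
);
const iteratorHits = [...iterator];
console.log(callbackCalls, iteratorHits.length); // 0 1

// 2. Without the m flag, ^ and $ match only at the start and end of the whole string.
const anchoredHits = [
  ...sampleHtml.matchAll(/^<meta name="twitter:title" content="([^"]+)"\/>$/g),
];
console.log(anchoredHits.length); // 0

// 3. With the m flag, plus \s* for the leading indentation, the anchors work per line.
const lineHits = [
  ...sampleHtml.matchAll(/^\s*<meta name="twitter:title" content="([^"]+)"\/>$/gm),
];
console.log(lineHits.length, lineHits[0][1]); // 1 My Favorite Girlfriend
```

Dropping the anchors entirely, as the solution does, is usually the simpler choice when the tag can appear anywhere in the string.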

solution ✅

const html = `
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta name="twitter:card" content="summary"/>
    <meta name="twitter:title" content="My Favorite Girlfriend"/> ✅
    <meta name="twitter:site" content="@Hulu"/>
    <meta name="twitter:description" content="A chef&#x27;s life gets complicated when he falls for a beautiful young woman who has multiple personalities."/>
    <meta property="og:title" content="My Favorite Girlfriend"/> ✅
    <meta property="og:site_name" content="Hulu"/>
    <meta property="og:type" content="movie"/>
  </head>
</html>
`;

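// ✅ recommended: capture everything up to the closing quote
// let result = html.match(/<meta name="twitter:title" content="([^"]+)"\/>/)
// ✅ also works for this title, but inside [...] the + and ? are literal characters,
// not quantifiers, so this class fails on values containing other punctuation
// let result = html.match(/<meta name="twitter:title" content="([\w+\s?]+)"\/>/)
// [0] is the whole matched tag
// let result = html.match(/<meta name="twitter:title" content="([\w+\s?]+)"\/>/)[0]
// [1] is the first capture group, i.e. the content value
let result = html.match(/<meta name="twitter:title" content="([\w+\s?]+)"\/>/)[1]

console.log(`result =`, result);
// result = My Favorite Girlfriend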
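
One caveat with the solution: `match` returns `null` when nothing matches, so indexing `[1]` directly throws a TypeError if the tag is missing. A minimal sketch of a guarded lookup follows. The helper name `getMetaContent` and the variable `page` are hypothetical, and the pattern assumes the same attribute order (`name` before `content`) as the sample HTML.

```javascript
// Hypothetical helper (not from the original post): returns the content value, or null.
function getMetaContent(html, name) {
  // Escape regex metacharacters so the name is matched literally.
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // [^"]* captures everything up to the closing quote; the trailing slash is optional.
  const pattern = new RegExp(`<meta name="${escaped}" content="([^"]*)"\\s*/?>`);
  const match = html.match(pattern);
  return match ? match[1] : null;
}

const page = `
  <meta name="twitter:title" content="My Favorite Girlfriend"/>
  <meta name="twitter:site" content="@Hulu"/>
`;

console.log(getMetaContent(page, "twitter:title")); // My Favorite Girlfriend
console.log(getMetaContent(page, "twitter:url"));   // null
```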



// ❌ matchAll does not accept a callback; it returns an iterator, so result stays empty
// const result = [];
// html.matchAll(/<meta name="twitter:title" content="([\w+\s?]+)"\/>/g, (group, i) => {
//   console.log(`group, i`, group, i)
//   result.push(group);
// })

// console.log(`result =`, result);


https://regexper.com/#%2F<meta name%3D"twitter%3Atitle" content%3D"([\w%2B\s%3F]%2B)"\%2F>%2F

https://regex101.com/r/QnWceA/1

demos

match

match(regexp)
const paragraph = 'The quick brown fox jumps over the lazy dog. It barked.';
const regex = /[A-Z]/g;
const found = paragraph.match(regex);

console.log(found);
// Array ["T", "I"]

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match
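
The `g` flag changes what `match` returns, which is why the solution can read `[1]` only from a non-global regex. A small sketch (the string `text` and the variable names are illustrative only):

```javascript
const text = 'name="a" name="b"';

// Without g: a single match object with capture groups and an index.
const first = text.match(/name="(\w)"/);
console.log(first[0], first[1], first.index); // name="a" a 0

// With g: an array of every full match; capture groups are not included.
const all = text.match(/name="(\w)"/g);
console.log(all.length, all[0], all[1]); // 2 name="a" name="b"

// No match: the result is null, not an empty array.
const none = text.match(/id="(\w)"/);
console.log(none); // null
```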

matchAll

matchAll(regexp)
// Passing a callback to matchAll is a mistake: it has no callback parameter ⚠️

const regexp = /foo[a-z]*/g;
const str = "table football, foosball";
const matches = str.matchAll(regexp);

for (const match of matches) {
  console.log(
    `Found ${match[0]} start=${match.index} end=${
      match.index + match[0].length
    }.`,
  );
}
// The matches iterator is exhausted after the for...of loop. Call matchAll again to create a new iterator.
Array.from(str.matchAll(regexp), (m) => m[0]);
// (2) ['football', 'foosball']

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/matchAll
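
Applied to the meta tag problem, `matchAll` with a global regex and a `for...of` loop can collect every tag in one pass. This is a sketch under assumptions: the names `head`, `metaPattern`, and `metaTags` are illustrative, and the pattern expects `name` or `property` to come before `content`, as in the sample HTML.

```javascript
const head = `
  <meta name="twitter:card" content="summary"/>
  <meta name="twitter:title" content="My Favorite Girlfriend"/>
  <meta property="og:type" content="movie"/>
`;

// Group 1: the name/property value; group 2: the content value.
const metaPattern = /<meta (?:name|property)="([^"]+)" content="([^"]*)"\/>/g;

const metaTags = {};
for (const match of head.matchAll(metaPattern)) {
  metaTags[match[1]] = match[2];
}

console.log(metaTags["twitter:title"]);    // My Favorite Girlfriend
console.log(Object.keys(metaTags).length); // 3
```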

Array.from

Array.from(arrayLike)
Array.from(arrayLike, mapFn)
Array.from(arrayLike, mapFn, thisArg)
console.log(Array.from('foo'));
//  Array ["f", "o", "o"]

console.log(Array.from([1, 2, 3], (x) => x + x));
// Array [2, 4, 6]

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/from

regex tools

  1. Regexper

https://regexper.com/#%2F^<meta name%3D"twitter%3Aurl" content%3D"([\w%2B\s%3F]%2B)"\%2F>%24%2Fig

  2. regex101

build, test, and debug regex

https://regex101.com/

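Metacharacters are characters that have a special meaning:

Metacharacter  Description
.       Matches any single character except line terminators such as \n and \r.
\w      Matches a word character: a digit, a letter, or an underscore.
\W      Matches a non-word character.
\d      Matches a digit.
\D      Matches a non-digit character.
\s      Matches a whitespace character.
\S      Matches a non-whitespace character.
\b      Matches a word boundary.
\B      Matches a non-word boundary.
\0      Matches a NUL character.
\n      Matches a line feed (newline).
\f      Matches a form feed.
\r      Matches a carriage return.
\t      Matches a tab.
\v      Matches a vertical tab.
\xxx    Matches the character specified by the octal number xxx.
\xdd    Matches the character specified by the hexadecimal number dd.
\uxxxx  Matches the Unicode character specified by the hexadecimal number xxxx.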

https://www.runoob.com/jsref/jsref-obj-regexp.html
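
A few of these metacharacters in action, as a quick sketch with illustrative variable names. The last two lines show why a class such as `[\w+\s?]` is narrower than it looks: inside square brackets `+` and `?` are ordinary characters, so any other punctuation in the value breaks the match.

```javascript
const isWord = /^\w+$/.test("foo_123");    // true: letters, digits, underscore
const hasHyphen = /^\w+$/.test("foo-123"); // false: "-" is not a word character
const digitRuns = "a1b22c".match(/\d+/g);  // two runs: "1" and "22"
const words = "a \t b".split(/\s+/);       // two items: "a" and "b"

// \b matches only where a word starts or ends, so "concat" is skipped.
const wholeWordCount = "cat concat".match(/\bcat\b/g).length; // 1

// \xdd and \uxxxx both address the character "A" (code point 0x41).
const hexAndUnicode = /\x41/.test("A") && /\u0041/.test("A"); // true

// Inside [...] the + and ? are literals, not quantifiers.
const literalPlus = /^[\w+\s?]+$/.test("a+b ?");  // true
const apostrophe = /^[\w+\s?]+$/.test("chef's");  // false: "'" is not in the class

console.log(isWord, hasHyphen, wholeWordCount, hexAndUnicode, literalPlus, apostrophe);
```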

. / regex dot

https://www.runoob.com/jsref/jsref-regexp-dot.html
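
Two details about the dot are easy to miss, and the second one explains why a class like `[.]+` cannot match a title. The sketch below uses illustrative variable names: by default the dot does not match line terminators (the `s` flag changes that), and inside square brackets the dot is a literal period rather than a wildcard.

```javascript
const twoLines = "a\nb";

// By default, . does not match a newline.
const dotDefault = /a.b/.test(twoLines); // false
// With the s (dotAll) flag, . matches line terminators too.
const dotAll = /a.b/s.test(twoLines);    // true

// Inside [...] the dot is a literal period.
const onlyDots = /^[.]+$/.test("...");                    // true
const notATitle = /^[.]+$/.test("My Favorite Girlfriend"); // false

console.log(dotDefault, dotAll, onlyDots, notATitle); // false true true false
```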

refs

https://stackoverflow.com/questions/77338957/how-to-use-regular-expression-to-match-a-special-meta-tag-in-html-string-using-j#

https://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not

https://www.cnblogs.com/xgqfrms/p/17780326.html



©xgqfrms 2012-2021

Original article, copyright ©️xgqfrms. Reproduction is prohibited 🈲️; infringement will be pursued ⚠️!


posted @ 2023-10-22 15:26  xgqfrms