Write a regular expression that extracts the src URLs of all img tags from an HTML source string

function getImgSrcs(htmlString) {
  // Match an <img> tag, then a whitespace-delimited src attribute whose value
  // is wrapped in single or double quotes; group 2 holds the value.
  const regex = /<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/gi;
  let match;
  const srcs = [];

  // With the g flag, each exec() call resumes from regex.lastIndex.
  while ((match = regex.exec(htmlString)) !== null) {
    srcs.push(match[2]);
  }

  return srcs;
}


// Example usage:
const html = `
<div>
  <img src="image1.jpg" alt="Image 1">
  <img src='image2.png' alt="Image 2">
  <img src = "image3.gif" alt="Image 3">
  <img src= "image4.webp" alt="Image 4">
  <img data-src="data-image.svg">
  <p>Some text here</p>
  <img src="image5.jpg" alt="Image 5">
</div>`;

const imageSrcs = getImgSrcs(html);
console.log(imageSrcs); // Output: ['image1.jpg', 'image2.png', 'image3.gif', 'image4.webp', 'image5.jpg']

Explanation and Improvements:

  • /<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/gi: This is the core regular expression. Let's break it down:

    • <img\b: Matches the literal string <img followed by a word boundary, so a tag such as <imgfoo> is not matched.
    • [^>]*?: Lazily matches zero or more characters that are not a closing angle bracket (>). This skips any attributes that come before src while staying inside the same tag.
    • \ssrc: Matches a whitespace character followed by src. Requiring the whitespace means that attributes ending in src, such as data-src, are not mistaken for src.
    • \s*=\s*: Matches the equals sign with optional whitespace on either side.
    • (["']): The first capturing group. It matches the opening quote, either double or single.
    • (.*?): The second capturing group. It lazily matches the characters up to the closing quote. This is the part that extracts the actual src value.
    • \1: A backreference to the first group, so the closing quote must be the same character as the opening quote.
    • /gi: The g flag makes the regex global, so it finds all matches in the string, not just the first one. The i flag makes it case-insensitive, so <IMG SRC="..."> also matches.
  • match[2]: The match object returned by regex.exec() contains the entire matched string at index 0, the quote character at index 1, and the src value at index 2.

  • Handles variations in the src attribute: The regex handles single or double quotes and spacing around the equals sign (=), such as src = "image3.gif" or src= "image4.webp".

  • Example: The example demonstrates the function's usage and output. It includes different quoting and spacing variations and an img tag with a data-src attribute (which is ignored).
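The regex.exec() loop can also be written with String.prototype.matchAll, which avoids managing the loop state by hand. The following is a minimal sketch rather than the original author's code: the function name getImgSrcsNamed and the group names quote and src are hypothetical, and the pattern assumes src values are wrapped in single or double quotes.

```javascript
// Hypothetical variant: same idea, using matchAll and named capture groups.
function getImgSrcsNamed(htmlString) {
  // quote: the opening quote character; src: everything up to the matching quote.
  const regex = /<img\b[^>]*?\ssrc\s*=\s*(?<quote>["'])(?<src>.*?)\k<quote>/gi;
  // matchAll requires the g flag and yields one match object per <img> tag found.
  return Array.from(htmlString.matchAll(regex), (m) => m.groups.src);
}

const sample = `<img src="a.jpg"><img class="x" src = 'b.png'><img data-src="c.svg">`;
const namedResult = getImgSrcsNamed(sample);
console.log(namedResult); // [ 'a.jpg', 'b.png' ]
```

Named groups make the intent explicit (m.groups.src instead of a numeric index), at the cost of requiring an engine that supports them.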

This version is robust enough for extracting src attributes from typical, well-formed HTML strings, and it is short enough to understand and maintain. It still has limits: it ignores unquoted attribute values, and it does not know about comments or about a > inside another attribute's value. For complex or untrusted HTML, a dedicated HTML parser is generally recommended; a regex like this is usually sufficient for simpler cases in front-end development.
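To illustrate why a parser is the safer choice for complex markup, the sketch below runs a quote-aware pattern against a few tricky inputs. The helper name extractImgSrcs and the sample strings are hypothetical and exist only for this illustration.

```javascript
// Hypothetical helper for illustration; accepts single- or double-quoted src values.
function extractImgSrcs(html) {
  const pattern = /<img\b[^>]*?\ssrc\s*=\s*(["'])(.*?)\1/gi;
  return Array.from(html.matchAll(pattern), (m) => m[2]);
}

const tricky = [
  '<!-- <img src="commented-out.png"> -->', // inside a comment, but still matched
  '<img src=unquoted.png>',                 // unquoted value, so it is missed
  '<img alt="a > b" src="after-gt.png">',   // ">" inside alt ends the tag scan early, so it is missed
].join('\n');

const trickyResult = extractImgSrcs(tricky);
console.log(trickyResult); // [ 'commented-out.png' ]
```

A browser would load only the second and third images, while the regex reports only the commented-out one. This is the kind of mismatch an HTML parser avoids.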

Posted by 王铁柱6