Gumbo is an HTML parser written in C. Bindings are also available for other languages, such as Java.
Download:
git clone https://github.com/google/gumbo-parser.git
Install gcc and the other build tools first:
sudo apt-get install libtool
$ cd gumbo-parser/
$ ./autogen.sh
$ ./configure
$ make
$ sudo make install
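- The example code lives under examples/. Running make builds the examples automatically and places the binaries in the gumbo-parser/ directory.
Note that all of these commands are run from the gumbo-parser/ directory.
You can modify an example and rebuild it yourself: in gumbo-parser/, run make followed by the program name (without the .cc suffix). For example, to rebuild examples/find_links.cc, run make find_links. The resulting executable is placed in the repository root.
- To build your own program against the library, pkg-config prints the required flags: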
- $ pkg-config --cflags --libs gumbo
- $ gcc my_program.c `pkg-config --cflags --libs gumbo`
gtest can be integrated as well. The official make check did not work for me, so I built gtest by hand.
Clone gtest with git and enter its directory.
sudo cmake CMakeLists.txt
make   # builds two static libraries: libgtest.a and libgtest_main.a
sudo cp ./lib/libgtest*.a /usr/lib
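Test code (saved as sample.cpp):

#include <gtest/gtest.h>

int add(int a, int b) {
  return a + b;
}

TEST(testCase, test0) {
  EXPECT_EQ(add(2, 3), 5);
}

int main(int argc, char** argv) {
  testing::InitGoogleTest(&argc, argv);
  return RUN_ALL_TESTS();
}

Author: bowen_4ae0. Link: https://www.jianshu.com/p/96158afbb91d. Source: Jianshu. Copyright belongs to the author; contact the author for permission before commercial reuse, and credit the source for non-commercial reuse.
In a terminal opened in the directory containing that file, enter the compile command: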
$ g++ -o sample sample.cpp -lgtest -lpthread
$ ./sample
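Reference: https://www.jianshu.com/p/96158afbb91d
Note that the order of the libraries on the command line matters: pthread must come last.
Reference: https://github.com/google/gumbo-parser
The example code, modified to traverse the tree and print every text node: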
// Copyright 2013 Google Inc. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
// Author: jdtang@google.com (Jonathan Tang)
//
// Finds the URLs of all links in the page.

#include <stdlib.h>

#include <fstream>
#include <iostream>
#include <string>

#include "gumbo.h"

// Original example: prints the href of every <a> element.
static void search_for_links(GumboNode* node) {
  if (node->type != GUMBO_NODE_ELEMENT) {
    return;
  }
  GumboAttribute* href;
  if (node->v.element.tag == GUMBO_TAG_A &&
      (href = gumbo_get_attribute(&node->v.element.attributes, "href"))) {
    std::cout << href->value << std::endl;
  }

  GumboVector* children = &node->v.element.children;
  for (unsigned int i = 0; i < children->length; ++i) {
    search_for_links(static_cast<GumboNode*>(children->data[i]));
  }
}

// Modified version: prints every text node, recursing through container nodes.
static void search_for_text(GumboNode* node) {
  if (node->type == GUMBO_NODE_TEXT) {
    std::cout << node->v.text.text << std::endl;
  }
  if (node->type == GUMBO_NODE_ELEMENT ||
      node->type == GUMBO_NODE_DOCUMENT ||
      node->type == GUMBO_NODE_TEMPLATE) {
    if (node->type == GUMBO_NODE_TEMPLATE) {
      std::cout << "=== GUMBO_NODE_TEMPLATE ===" << std::endl;
    }
    if (node->type == GUMBO_NODE_DOCUMENT) {
      std::cout << "=== GUMBO_NODE_DOCUMENT ===" << std::endl;
    }
    // Document nodes keep their children in v.document; elements and
    // templates keep them in v.element.
    GumboVector* children = node->type == GUMBO_NODE_DOCUMENT
                                ? &node->v.document.children
                                : &node->v.element.children;
    for (unsigned int i = 0; i < children->length; ++i) {
      search_for_text(static_cast<GumboNode*>(children->data[i]));
    }
  }
}

int main(int argc, char** argv) {
  if (argc != 2) {
    std::cout << "Usage: find_links <html filename>.\n";
    exit(EXIT_FAILURE);
  }
  const char* filename = argv[1];

  std::ifstream in(filename, std::ios::in | std::ios::binary);
  if (!in) {
    std::cout << "File " << filename << " not found!\n";
    exit(EXIT_FAILURE);
  }

  // Read the whole file into memory.
  std::string contents;
  in.seekg(0, std::ios::end);
  contents.resize(in.tellg());
  in.seekg(0, std::ios::beg);
  in.read(&contents[0], contents.size());
  in.close();

  GumboOutput* output = gumbo_parse(contents.c_str());
  // search_for_links(output->root);
  search_for_text(output->root);
  gumbo_destroy_output(&kGumboDefaultOptions, output);
}
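The file-loading step in main above could also be pulled out into a small helper. The following is only a sketch: read_file is a hypothetical name of my own, not part of gumbo or of the original example, and it reports a missing file by throwing instead of calling exit.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of gumbo or the original example):
// reads an entire file into a string and throws if it cannot be opened.
std::string read_file(const std::string& path) {
  std::ifstream in(path, std::ios::in | std::ios::binary);
  if (!in) {
    throw std::runtime_error("cannot open " + path);
  }
  // Copy every byte from the stream buffer, so no seekg/tellg size
  // query is needed before reading.
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}
```

With such a helper, main would reduce to something like std::string contents = read_file(argv[1]); followed by the same gumbo_parse(contents.c_str()) call as before.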