"Semantics" rather than "syntax": lowers the technical demands on lexical analysis and improves crawler efficiency
COMPUTER SCIENCE AN OVERVIEW 11th Edition
Finally we should note that XML allows the development of new markup languages that differ from HTML in that they emphasize semantics rather than appearance. For example, with HTML the ingredients in a recipe can be marked so that they appear as a list in which each ingredient is positioned on a separate line. But if we used semantic-oriented tags, ingredients in a recipe could be marked as ingredients (perhaps using the tags <ingredient> and </ingredient>) rather than merely items in a list. The difference is subtle but important. The semantic approach would allow search engines (Web sites that assist users in locating Web material pertaining to a subject of interest) to identify recipes that contain or do not contain certain ingredients, which would be a substantial improvement over the current state of the art in which only recipes that do or do not contain certain words can be isolated. More precisely, if semantic tags are used, a search engine can identify recipes for lasagna that do not contain spinach, whereas a similar search based merely on word content would skip over a recipe that started with the statement "This lasagna does not contain spinach." In turn, by using an Internet-wide standard for marking documents according to semantics rather than appearance, a World Wide Semantic Web, rather than the World Wide Syntactic Web we have today, would be created.