Generating XPath locators in ClojureScript, part 1

1 Introduction

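There are several ways to generate a unique XPath locator from a DOM element. The simplest is the absolute path starting at /html. Generating a robust XPath is harder, especially on dynamic pages where element positions change, and automatically generated locators tend to be fragile there. This post implements the algorithm described in the paper "Robula+: An algorithm for generating robust XPath locators for web testing" in ClojureScript, using the robula-plus implementation as a reference, and adds some optimizations.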

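The idea behind the algorithm is fairly simple. Start from //*, which matches any element, then refine it step by step: replace * with the tag name, add an id predicate, add a text predicate, add a single attribute predicate, add a combination of several attributes, add a position index, or prepend a level for the parent. Each refined candidate is tested to see whether it matches exactly the target element. Parent levels are added only when the shorter candidates fail, so the search finds the shortest XPath locator.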
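To make the refinement step concrete, here is a minimal sketch of one expansion round. It is not the code from this post: `node` is a hypothetical plain map standing in for a DOM element, `candidates` is a simplified helper covering only three of the transformations, and no uniqueness test is performed.

```clojure
(ns robula-sketch.core
  (:require [clojure.string :as string]))

;; An xpath is a vector of levels; the head (first item) is the outermost level.
(defn xpath->str [xpath]
  (str "//" (string/join "/" xpath)))

(defn add-predicate [xpath pred]
  (assoc (vec xpath) 0 (str (first xpath) pred)))

;; Hypothetical stand-in for a DOM element (assumption: a plain map, not a real node).
(def node {:tag "input" :attrs {"id" "user" "name" "login"}})

(defn candidates
  "One simplified refinement round for an xpath whose head describes `node`."
  [xpath node]
  (let [head (first xpath)]
    (concat
     ;; replace the leading * with the tag name
     (when (string/starts-with? head "*")
       [(assoc (vec xpath) 0 (str (:tag node) (subs head 1)))])
     ;; add one attribute predicate, only if the head has no predicate yet
     (when-not (string/includes? head "[")
       (for [[k v] (:attrs node)]
         (add-predicate xpath (str "[@" k "='" v "']"))))
     ;; prepend a parent level
     [(vec (cons "*" xpath))])))

(map xpath->str (candidates ["*"] node))
;; => ("//input" "//*[@id='user']" "//*[@name='login']" "//*/*")
```

In the real algorithm each of these candidates is evaluated against the document, and the ones that are not unique go back into the queue for another round.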

2 Implementation

The content script handles mouse-move events: it marks the element under the pointer and computes that element's XPath locator:

(def last-ele "The element the pointer was last over" (atom nil))

(defn remove-mark
  [ele]
  (when ele
    (dommy/remove-style! ele :box-shadow)
    (reset! last-ele nil)))

(defn mark-element
  [ele]
  (when ele
    (dommy/set-style! ele :box-shadow "0px 0px 2px 2px blue")
    (reset! last-ele ele)))

(defn handle-mouse-move
  [e]
  (let [x (.-clientX e)
        y (.-clientY e)]
    (if-let [ele (js/document.elementFromPoint  x y)]
      (let [prev-ele @last-ele]
        (when-not (= ele prev-ele)
          (remove-mark prev-ele)
          (mark-element ele)
          (log "x:" x " y:" y "elements xpath:" (get-robust-xpath js/document ele))))
      (remove-mark @last-ele))))
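The listing above does not show how the handler gets attached. One possible way to wire it up is sketched below. `listen!` and `stop-inspecting!` are hypothetical names, not part of the original project; only the standard addEventListener and removeEventListener DOM calls are used.

```clojure
(defn listen!
  "Attach `handler` for `event-name` on `target`. Returns a zero-argument
  function that detaches the handler again."
  [target event-name handler]
  (.addEventListener target event-name handler)
  (fn []
    (.removeEventListener target event-name handler)))

;; Usage in the content script. handle-mouse-move, remove-mark and last-ele
;; are the definitions from the listing above.
(comment
  (def stop-inspecting! (listen! js/document "mousemove" handle-mouse-move))
  ;; later, to turn inspection off and clear the highlight:
  (stop-inspecting!)
  (remove-mark @last-ele))
```

Returning the detach function keeps the handler reference in one place, so turning the inspector off does not depend on a global.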

The Robula+ algorithm code:

(ns xpath-finder.robula
  (:require [oops.core :refer [oget oset! ocall oapply ocall! oapply!
                               oget+ oset!+ ocall+ oapply+ ocall!+ oapply!+]]
            [clojure.string :as string]
            [goog.string :as gstring]
            [goog.string.format]
            [clojure.set :refer [difference]]
            [dommy.core :as dommy :refer-macros [sel sel1]]))

;;;;;;;;;;;;; XPath helper functions
(def xpath-length count)
(def xpath-head first)

(defn xpath-head-has-predicates?
  [xpath]
  (-> (xpath-head xpath)
      (string/includes? "[")))

(defn xpath-head-has-position?
  [xpath]
  (let [head (xpath-head xpath)]
    (or (string/includes? head "position()")
        ;; match the last() function call, not any text that merely contains "last"
        (string/includes? head "last()")
        (re-find #"\[[0-9]+\]" head))))

(defn xpath-head-has-text?
  [xpath]
  (-> (xpath-head xpath)
      (string/includes? "text()")))

(defn xpath-head-with-all?
  [xpath]
  (-> (xpath-head xpath)
      (string/starts-with? "*")))

(defn xpath-add-predicate-to-head
  [xpath predicate]
  (let [new-head (-> (first xpath)
                     (str predicate))]
    (assoc (into [] xpath) 0 new-head)))

(defn xpath-replace-head-all
  [xpath head-tag]
  (let [new-head (str head-tag
                      (-> (first xpath)
                          (subs 1)))]
    (assoc (into [] xpath) 0 new-head)))

(defn xpath-add-head-all
  [xpath]
  (cons "*" xpath))

(def xpath-empty [])
(def xpath-all ["*"])

(defn xpath->str
  [xpath]
  (->> (string/join "/" xpath)
       (str "//")))

(comment
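  ;; An xpath is represented as a sequence in which each item is one level of the path.
  ;; Keeping the whole xpath in a single string would break the path
  ;; whenever a predicate value contains a / character.
  (xpath-add-predicate-to-head ["div" "span" "a"] "[text() = 'aa']")

  (xpath-head-has-text? *1)

  (xpath-add-head-all xpath-empty)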

  )

;;;;;;;;;;;;; dom helper
(defn get-previous-element-siblings
  "Return the element itself followed by all of its preceding element siblings"
  [element]
  (->> (iterate #(oget % "previousElementSibling") element)
       (take-while identity)))

(defn get-all-ancestor
  "Return the element itself followed by all of its ancestors"
  [element]
  (->> (iterate #(oget % "parentElement") element)
       (take-while identity)))

(defn get-ancestor-count
  [element]
  (-> (get-all-ancestor element)
      count))

(defn get-ancestor-at
  [element index]
  (-> (get-all-ancestor element)
      (nth index)))

(defn get-attributes
  "Return all attributes of the element as a map of name to value"
  [element]
  (let [get-attr-kv (fn [attr]
                      [(oget attr "name") (oget attr "value")])
        attrs (oget element "attributes")]
    (->> (range (oget attrs "length"))
         (map #(-> (oget+ attrs (str %))
                   get-attr-kv))
         (into {}))))

(defn tag-name
  [element]
  (-> (oget element "tagName")
      string/lower-case))

;;;;;;;;;;;;; robula xpath locator
(def ^:dynamic attribute-priorization-list #{"name" "class" "title" "alt" "value"})
(def ^:dynamic attribute-black-list #{"href"
                                      "src"
                                      "onclick"
                                      "onload"
                                      "tabindex"
                                      "width"
                                      "height"
                                      "style"
                                      "size"
                                      "maxlength"})


(defn get-xpath-head-ancestor
  "Return the ancestor of element that the head of the xpath refers to"
  [xpath element]
  (->> (xpath-length xpath)
       dec
       (get-ancestor-at element)))

(defn transf-convert-star
  "Replace the leading * of the xpath head with the tag name"
  [xpath element]
  (let [ancestor (get-xpath-head-ancestor xpath element)]
    (when (xpath-head-with-all? xpath)
      (vector (xpath-replace-head-all
               xpath
               (tag-name ancestor))))))

(def ^:dynamic *max-text-length* 30)

(defn xpath-trans
  [k v]
  (if (string/includes? v "'")
    [(gstring/format "translate(%s,\"'\",\" \")" k)
     (string/replace v "'" " ")]
    [k v]))

(defn xpath-contains-trans
  [k v]
  (->> (xpath-trans k v)
       (apply gstring/format "contains(%s,'%s')")))

(defn make-xpath-text-exp
  "Build a text() predicate"
  [s]
  (let [text-fn-name (if (> (count s) *max-text-length*)
                       ;; Note: XPath substring() positions start at 1, not 0
                       (gstring/format "substring(text(),1,%d)" *max-text-length*)
                       "text()")
        target-s (subs s 0 *max-text-length*)]
    (->> (xpath-contains-trans text-fn-name target-s)
         (gstring/format "[%s]"))))

(defn make-xpath-attr-predicate
  [attr-key value]
  (let [k (str "@" attr-key)]
    (if (> (count value) *max-text-length*)
      (->> (subs value 0 *max-text-length*)
           (xpath-contains-trans k))
      (->> (xpath-trans k value)
           (apply gstring/format "%s='%s'")))))

(defn make-xpath-attr-exp
  [attr-key value]
  (->> (make-xpath-attr-predicate attr-key value)
       (gstring/format "[%s]")))

(defn transf-add-id
  "Add an id predicate"
  [xpath element]
  (let [ancestor (get-xpath-head-ancestor xpath element)
        ancestor-id (oget ancestor "id")]
    (when (and (seq ancestor-id)
               (not (xpath-head-has-predicates? xpath)))
      (vector (->> (make-xpath-attr-exp "id" ancestor-id)
                   (xpath-add-predicate-to-head xpath))))))

(defn transf-add-text
  "Add a text predicate"
  [xpath element]
  (let [ancestor (get-xpath-head-ancestor xpath element)
        ancestor-text (oget ancestor "textContent")]
    (when (and (seq ancestor-text)
               (not (xpath-head-has-position? xpath))
               (not (xpath-head-has-text? xpath)))
      (vector (->> (make-xpath-text-exp ancestor-text)
                   (xpath-add-predicate-to-head xpath))))))

(defn transf-add-attribute
  "Add a predicate for a single attribute other than id"
  [xpath element]
  (let [ancestor (get-xpath-head-ancestor xpath element)
        ancestor-attrs (get-attributes ancestor)]
    (when (not (xpath-head-has-predicates? xpath))
      (let [ancestor-priority-attrs (select-keys ancestor-attrs attribute-priorization-list)
            ancestor-other-attrs (->> (difference (set (keys ancestor-attrs))
                                                  attribute-priorization-list
                                                  attribute-black-list)
                                      (select-keys ancestor-attrs))]
        (->> (concat ancestor-priority-attrs ancestor-other-attrs)
             (map (fn [[k v]]
                    (->> (make-xpath-attr-exp k v)
                         (xpath-add-predicate-to-head xpath)))))))))

(defn powerset [coll]
  (reduce (fn [a x]
            (into a (map #(conj % x)) a))
          #{#{}} coll))

(defn priorization-attr-compare
  "Comparator that sorts prioritized attributes first"
  [priorization-set attr1 attr2]
  (cond
    (priorization-set (first attr1)) -1
    (priorization-set (first attr2)) 1
    :else 0))

(defn transf-add-attribute-set
  "Add a predicate combining several attributes (taken from the power set)"
  [xpath element]
  (let [ancestor-attrs (-> (get-xpath-head-ancestor xpath element)
                           get-attributes)]
    (when (not (xpath-head-has-predicates? xpath))
      (let [ancestor-useful-attrs (->> (difference (set (keys ancestor-attrs))
                                                   attribute-black-list)
                                       (select-keys ancestor-attrs))
            attr-power-set (->> (powerset ancestor-useful-attrs)
                                (filter #(> (count %) 1)))
            priorization-set-val-cmp (->> (conj attribute-priorization-list "id")
                                          (partial priorization-attr-compare))
            sorted-attr-set (->> (map #(sort priorization-set-val-cmp %)
                                      attr-power-set)
                                 (sort (fn [attr-set1 attr-set2]
                                         (let [l1 (count attr-set1)
                                               l2 (count attr-set2)]
                                           (if (= l1 l2)
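                                             ;; Same size: order by the first pair of attributes that differ.
                                             ;; (map vector ...) keeps the sorted order; zipmap builds an unordered map.
                                             (->> (map vector attr-set1 attr-set2)
                                                  (drop-while #(= (first %) (second %)))
                                                  first
                                                  (apply priorization-set-val-cmp))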
                                             (compare l1 l2))))))
            gen-attr-set-xpath (fn [attrs]
                                 (->> (map (fn [[k v]]
                                             (make-xpath-attr-predicate k v))
                                           attrs)
                                      (string/join " and ")
                                      (gstring/format "[%s]")
                                      (xpath-add-predicate-to-head xpath)))]
        (map gen-attr-set-xpath sorted-attr-set)))))


(defn transf-add-position
  "Add a position index"
  [xpath element]
  (let [ancestor (get-xpath-head-ancestor xpath element)
        prev-siblings (get-previous-element-siblings ancestor)]
    (when (not (xpath-head-has-position? xpath))
      (let [idx (if (xpath-head-with-all? xpath)
                  (count prev-siblings)
                  (->> (map tag-name prev-siblings)
                       (filter #(= (tag-name ancestor) %))
                       count))]
        (when (pos? idx)
          (vector (->> (str "[" idx "]")
                       (xpath-add-predicate-to-head xpath))))))))

(defn transf-add-level
  "Prepend a parent level"
  [xpath element]
  (when (< (xpath-length xpath)
           (dec (get-ancestor-count element)))
    ;;(.log js/console "add level:" (str xpath) " ancestor count:" (get-ancestor-count element))
    (vector (xpath-add-head-all xpath))))

(defn locate-count
  [document xpath]
  (-> (.evaluate document
                 xpath
                 document
                 nil
                 (oget js/XPathResult "ORDERED_NODE_SNAPSHOT_TYPE")
                 nil)
      (oget "snapshotLength")))

(defn unique-locate?
  "Does the xpath match exactly one node, and is that node the element?"
  [xpath element document]
  ;;(.log js/console"check unique xpath:" xpath)
  (let [node-snap (.evaluate document
                             xpath
                             document
                             nil
                             (oget js/XPathResult "ORDERED_NODE_SNAPSHOT_TYPE")
                             nil)]
    (and (= 1 (oget node-snap "snapshotLength"))
         (identical? element (.snapshotItem node-snap 0)))))

(defn get-element-by-xpath
  [document xpath]
  (-> (.evaluate document
                 xpath
                 document
                 nil
                 (oget js/XPathResult "FIRST_ORDERED_NODE_TYPE")
                 nil)
      (oget "singleNodeValue")))

(def $x (partial get-element-by-xpath js/document))

(defn find-xpath
  [xpath-list doc element]
  ;;(js/console.log "xpath-stream:" (str (map xpath->str xpath-list)))
  (when (seq xpath-list)
    (let [xpath (first xpath-list)
          new-xpath-list (concat (transf-convert-star xpath element)
                                 (transf-add-id xpath element)
                                 (transf-add-text xpath element)
                                 (transf-add-attribute xpath element)
                                 (transf-add-attribute-set xpath element)
                                 (transf-add-position xpath element)
                                 (transf-add-level xpath element))
          paths (-> new-xpath-list
                    distinct
                    vec)]
      (if-let [result (some #(when (-> (xpath->str %)
                                       (unique-locate? element doc))
                               %)
                            paths)]
        result
        (recur (-> xpath-list
                   rest
                   (concat paths)
                   vec)
               doc
               element)))))

(defn get-robust-xpath
  [document element]
  (let [r (time
           (find-xpath [xpath-all] document element))]
    (js/console.log "xpath depth:" (xpath-length r))
    (xpath->str r)))
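The xpath-trans helper above exists because XPath 1.0 string literals have no escape sequence for quotes. The sketch below isolates that trick without goog.string. `quote-safe` and `attr-equals` are illustrative names, not functions from the project, and length truncation is left out.

```clojure
(ns quote-sketch.core
  (:require [clojure.string :as string]))

(defn quote-safe
  "If `value` contains a single quote, compare against a copy of `expr` in which
  quotes are translated to spaces, and apply the same replacement to `value`."
  [expr value]
  (if (string/includes? value "'")
    [(str "translate(" expr ",\"'\",\" \")") (string/replace value "'" " ")]
    [expr value]))

(defn attr-equals
  "Build an equality predicate for attribute `attr`."
  [attr value]
  (let [[e v] (quote-safe (str "@" attr) value)]
    (str "[" e "='" v "']")))

(attr-equals "id" "main")
;; => "[@id='main']"

(attr-equals "title" "it's ok")
;; => "[translate(@title,\"'\",\" \")='it s ok']"
```

The translated form is slightly looser than an exact match, since a title of "it s ok" would match as well. That is acceptable here because every candidate is still checked for uniqueness against the document.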

Full project repository

3 Running and testing

When the XPath is deep, the search becomes slow, because evaluate has to be called again and again to test candidate XPaths.

https://img2018.cnblogs.com/blog/1545892/202001/1545892-20200131161730583-2013314027.jpg

Figure 1. Chrome profiler results: yellow is GC time, brown is time spent in evaluate

https://img2018.cnblogs.com/blog/1545892/202001/1545892-20200131161732681-596315751.jpg

Figure 2. Time spent in find_xpath (zoomed in)

4 Summary

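Because Robula+ finds the shortest XPath locator by repeatedly testing candidate XPaths, its performance depends on how many levels are needed before the match becomes unique. Beyond five levels it takes several seconds to find a locator, which is not ideal.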

Reducing the number of XPath tests, or using CSS selectors for the lookup, should make it faster. That is left for the next part.
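One cheap way to cut down on tests is to avoid evaluating the same XPath string twice. In find-xpath, "div[@id='x']" can be produced both from "*[@id='x']" (by replacing the star) and from "div" (by adding the id). The sketch below shows the idea with a plain memoized predicate. `slow-check` is a hypothetical stand-in for unique-locate?, and this is not necessarily the optimization used in the next part.

```clojure
;; Counts how often the expensive check really runs.
(def calls (atom 0))

(defn slow-check
  "Hypothetical stand-in for evaluating an xpath against the document."
  [xpath-str]
  (swap! calls inc)
  (= xpath-str "//a"))

;; Assumption: the cache lives only for one search. The DOM can change between
;; searches, so a long-lived cache would return stale answers.
(def cached-check (memoize slow-check))

(mapv cached-check ["//a" "//b" "//a"])
;; => [true false true]

@calls
;; => 2
```

Memoizing only helps with exact duplicates. It does not reduce the number of distinct candidates, which is what grows quickly with depth.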

Author: ntestoc

Created: 2020-01-31 Fri 16:17

posted @ 2020-01-31 15:41  cloca