org.apache.commons.text.similarity

The package-info.java in org.apache.commons.text.similarity

/**
 * <p>Provides algorithms for string similarity.</p>
 *
 * <p>The algorithms that implement the EditDistance interface follow the same
 * simple principle: the more similar (closer) strings are, lower is the distance.
 * For example, the words house and hose are closer than house and trousers.</p>
 *
 * <p>The following algorithms are available at the moment:</p>
 *
 * <ul>
 * <li>{@link org.apache.commons.text.similarity.CosineDistance Cosine Distance}</li>
 * <li>{@link org.apache.commons.text.similarity.CosineSimilarity Cosine Similarity}</li>
 * <li>{@link org.apache.commons.text.similarity.FuzzyScore Fuzzy Score}</li>
 * <li>{@link org.apache.commons.text.similarity.HammingDistance Hamming Distance}</li>
 * <li>{@link org.apache.commons.text.similarity.JaroWinklerDistance Jaro-Winkler Distance}</li>
 * <li>{@link org.apache.commons.text.similarity.JaroWinklerSimilarity Jaro-Winkler Similarity}</li>
 * <li>{@link org.apache.commons.text.similarity.LevenshteinDistance Levenshtein Distance}</li>
 * <li>{@link org.apache.commons.text.similarity.LongestCommonSubsequenceDistance
 * Longest Common Subsequence Distance}</li>
 * </ul>
 *
 * <p>The {@link org.apache.commons.text.similarity.CosineDistance Cosine Distance}
 * utilises a {@link org.apache.commons.text.similarity.RegexTokenizer regular expression tokenizer (\w+)}.
 * And the {@link org.apache.commons.text.similarity.LevenshteinDistance Levenshtein Distance}'s
 * behavior can be changed to take into consideration a maximum throughput.</p>
 *
 * @since 1.0
 */
package org.apache.commons.text.similarity;
  • CosineDistance: cosine distance
  • CosineSimilarity: cosine similarity
  • FuzzyScore: fuzzy score
  • HammingDistance: Hamming distance
  • JaroWinklerDistance: Jaro-Winkler distance
  • JaroWinklerSimilarity: Jaro-Winkler similarity
  • LevenshteinDistance: Levenshtein distance
  • LongestCommonSubsequenceDistance: longest common subsequence distance
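
The Javadoc above notes that Cosine Distance splits its input with a `\w+` regular expression tokenizer. As a rough illustration of that idea, here is a plain-Java sketch that tokenizes with `\w+`, counts terms, and returns one minus the cosine similarity. The class and method names (`CosineSketch`, `termCounts`, `cosineDistance`) are made up for this post and are not the library's code; the library's own handling of edge cases may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class, not part of Apache Commons Text.
class CosineSketch {
    // Same pattern the Javadoc mentions: runs of word characters.
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Count how often each \w+ token occurs in the text.
    static Map<String, Integer> termCounts(CharSequence text) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            counts.merge(m.group(), 1, Integer::sum);
        }
        return counts;
    }

    // Euclidean length of a term-count vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int c : v.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity: dot product divided by the product of the norms.
    // Assumption of this sketch: an empty vector yields similarity 0.
    static double cosineSimilarity(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // Distance is defined here as 1 - similarity.
    static double cosineDistance(CharSequence left, CharSequence right) {
        return 1.0 - cosineSimilarity(termCounts(left), termCounts(right));
    }

    public static void main(String[] args) {
        System.out.println(cosineDistance("the house", "the hose"));
        System.out.println(cosineDistance("apple", "banana"));
    }
}
```

Because the tokens are whole words, "the house" and "the hose" share only "the", so their distance comes out at roughly 0.5, while two texts with no words in common get a distance of 1.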

Summary: the smaller the distance, the more similar the strings.
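
To make the house / hose / trousers example from the Javadoc concrete, here is a minimal, self-contained sketch of the classic Levenshtein edit distance in plain Java. The class name `EditDistanceSketch` and its `levenshtein` method are hypothetical helpers written for this post, not the library's implementation.

```java
// Hypothetical illustration class, not part of Apache Commons Text.
class EditDistanceSketch {

    // Classic dynamic-programming Levenshtein distance using two rows.
    // Each insertion, deletion, or substitution costs 1.
    static int levenshtein(CharSequence a, CharSequence b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            // Reuse the arrays: the current row becomes the previous row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("house", "hose"));     // prints 1
        System.out.println(levenshtein("house", "trousers")); // prints 4
    }
}
```

"house" is one edit away from "hose" but four edits away from "trousers", which matches the claim that the first pair is closer. With the library itself, the equivalent call would be `LevenshteinDistance.getDefaultInstance().apply("house", "hose")`; passing a threshold to the `LevenshteinDistance` constructor caps the search, which appears to be what the Javadoc means by "maximum throughput".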

posted @ 2023-03-16 21:58  干翻苍穹