org.apache.commons.text.similarity
package-info.java in org.apache.commons.text.similarity
/**
* <p>Provides algorithms for string similarity.</p>
*
* <p>The algorithms that implement the EditDistance interface follow the same
* simple principle: the more similar (closer) two strings are, the lower the distance.
* For example, the words house and hose are closer than house and trousers.</p>
*
* <p>The following algorithms are available at the moment:</p>
*
* <ul>
* <li>{@link org.apache.commons.text.similarity.CosineDistance Cosine Distance}</li>
* <li>{@link org.apache.commons.text.similarity.CosineSimilarity Cosine Similarity}</li>
* <li>{@link org.apache.commons.text.similarity.FuzzyScore Fuzzy Score}</li>
* <li>{@link org.apache.commons.text.similarity.HammingDistance Hamming Distance}</li>
* <li>{@link org.apache.commons.text.similarity.JaroWinklerDistance Jaro-Winkler Distance}</li>
* <li>{@link org.apache.commons.text.similarity.JaroWinklerSimilarity Jaro-Winkler Similarity}</li>
* <li>{@link org.apache.commons.text.similarity.LevenshteinDistance Levenshtein Distance}</li>
* <li>{@link org.apache.commons.text.similarity.LongestCommonSubsequenceDistance
* Longest Common Subsequence Distance}</li>
* </ul>
*
* <p>The {@link org.apache.commons.text.similarity.CosineDistance Cosine Distance}
* utilises a {@link org.apache.commons.text.similarity.RegexTokenizer regular expression tokenizer (\w+)}.
* The behavior of {@link org.apache.commons.text.similarity.LevenshteinDistance Levenshtein Distance}
* can be changed to take a maximum threshold into consideration.</p>
*
* @since 1.0
*/
package org.apache.commons.text.similarity;
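The edit-distance principle and the threshold option described in the Javadoc above can be illustrated with a small plain-Java sketch. This is not the library's implementation: the class `EditDistanceSketch` and its methods are hypothetical names used only for illustration. The convention of returning -1 once the distance exceeds the threshold is assumed here to mirror how the thresholded `LevenshteinDistance` is documented to behave.

```java
class EditDistanceSketch {

    // Classic dynamic-programming Levenshtein distance, keeping only two rows.
    static int levenshtein(CharSequence left, CharSequence right) {
        int n = left.length();
        int m = right.length();
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // empty prefix of left -> j insertions
        }
        for (int i = 1; i <= n; i++) {
            curr[0] = i; // empty prefix of right -> i deletions
            for (int j = 1; j <= m; j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }

    // Thresholded variant (assumed convention): -1 when the distance exceeds the threshold.
    static int levenshtein(CharSequence left, CharSequence right, int threshold) {
        int distance = levenshtein(left, right);
        return distance <= threshold ? distance : -1;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("house", "hose"));        // prints 1
        System.out.println(levenshtein("house", "trousers"));    // prints 4
        System.out.println(levenshtein("house", "trousers", 2)); // prints -1
    }
}
```

The smaller result for house/hose than for house/trousers matches the example in the package description. A real thresholded implementation can stop early instead of computing the full distance first; the sketch skips that optimization for brevity.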
- CosineDistance: cosine distance
- CosineSimilarity: cosine similarity
- FuzzyScore: fuzzy score
- HammingDistance: Hamming distance
- JaroWinklerDistance: Jaro-Winkler distance
- JaroWinklerSimilarity: Jaro-Winkler similarity
- LevenshteinDistance: Levenshtein distance
- LongestCommonSubsequenceDistance: longest common subsequence distance
Summary: the shorter the distance, the higher the similarity.
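The inverse relationship between distance and similarity can be sketched with the cosine measure, using word tokens matched by `\w+` as mentioned in the package description. This is a minimal plain-Java illustration, not the library's code: `CosineSketch` and its methods are hypothetical names, tokens are compared case-sensitively as an assumption of this sketch, and the distance is taken to be 1 minus the similarity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CosineSketch {

    private static final Pattern WORD = Pattern.compile("\\w+");

    // Count how often each \w+ token occurs in the text.
    static Map<String, Integer> termCounts(CharSequence text) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            counts.merge(matcher.group(), 1, Integer::sum);
        }
        return counts;
    }

    // Euclidean length of a term-count vector.
    static double norm(Map<String, Integer> vector) {
        double sum = 0;
        for (int count : vector.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between two term-count vectors; 0.0 if either is empty.
    static double cosineSimilarity(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    // Assumed relationship: distance = 1 - similarity.
    static double cosineDistance(CharSequence left, CharSequence right) {
        return 1.0 - cosineSimilarity(termCounts(left), termCounts(right));
    }

    public static void main(String[] args) {
        System.out.println(cosineDistance("the quick fox", "the quick fox"));
        System.out.println(cosineDistance("the quick fox", "the lazy dog"));
        System.out.println(cosineDistance("apple pie", "orange juice"));
    }
}
```

Identical texts give a distance of approximately 0, texts with no shared tokens give 1, and partial overlap falls in between. For the similarity and score classes in the list, the reading is reversed: higher values mean more similar strings.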