[DataMining] WEEK1 - Text Retrieval and Search Engines

  • What does a computer have to do in order to understand a natural language sentence?
  • What is ambiguity?
  • Why is natural language processing (NLP) difficult for computers?
  • What is bag-of-words representation? Why do modern search engines use this simple representation of text?
  • What are the two modes of text information access? Which mode does a web search engine such as Google support?
  • When is browsing more useful than querying to help a user find relevant information?
  • Why is a text retrieval task defined as a ranking task?
  • What is a retrieval model?
  • What are the two assumptions made by the Probability Ranking Principle?
  • What is the Vector Space Retrieval Model? How does it work?
  • How do we define the dimensions of the Vector Space Model? What does “bag-of-words” representation mean?
  • What does the retrieval function intuitively capture when we instantiate a vector space model with a bag-of-words representation and a bit-vector representation for documents and queries?
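
As a rough illustration of the last question, here is a minimal Python sketch of a vector space model with bag-of-words dimensions and 0/1 (bit) vectors, scored by dot product. The tokenizer, the function names (`tokenize`, `bit_vector`, `dot_product`, `rank`) and the toy documents are all assumptions made up for this example, not part of the course material. Under these assumptions the score simply counts how many distinct query words appear in the document, so repeated words earn no extra credit.

```python
def tokenize(text):
    """Naive tokenizer (an assumption): lowercase, split on whitespace."""
    return text.lower().split()


def bit_vector(text, vocabulary):
    """One dimension per vocabulary word: 1 if the word occurs, else 0."""
    words = set(tokenize(text))
    return [1 if term in words else 0 for term in vocabulary]


def dot_product(query_vec, doc_vec):
    """Similarity between the query vector and a document vector."""
    return sum(q * d for q, d in zip(query_vec, doc_vec))


def rank(query, documents):
    """Return (score, document) pairs, highest score first."""
    # The vocabulary (the dimensions) is every distinct word we have seen.
    vocabulary = sorted(
        {word for text in [query, *documents] for word in tokenize(text)}
    )
    q = bit_vector(query, vocabulary)
    scored = [(dot_product(q, bit_vector(doc, vocabulary)), doc)
              for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy collection, for illustration only.
    docs = [
        "news about",
        "news about organic food campaign",
        "news of presidential campaign presidential candidate",
    ]
    for score, doc in rank("news about presidential campaign", docs):
        print(score, doc)
```

Note that the third toy document mentions "presidential" twice yet scores the same as if it appeared once, and every matched word counts equally whether it is a common word like "about" or a rarer one like "presidential". These are the limitations that term-frequency and IDF weighting are meant to address.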
posted @ 2016-10-03 12:57  oDoraemon