cynorr

Learn what I touched.


The basic structure of lda-c


corpus

  • docs[] :The array of documents (each entry is a doc, described next)
  • num_terms :The vocabulary size, i.e. the largest word id in the corpus plus one
  • num_docs :The number of documents in the corpus

doc

  • words[] :(type:int) The integer id of each distinct word in the document
  • counts[] :(type:int) counts[i] is the frequency of words[i] in the document
  • length :The number of distinct words in the document, i.e. the length of words[] and counts[]
  • total :The total number of word tokens in the document, i.e. the sum of counts[]
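
The corpus and doc layouts above can be sketched in C as follows. The field names follow the lists above; the helper doc_total is hypothetical (it is not taken from lda-c) and only illustrates how total relates to counts[] and length.

```c
/* One document in sparse bag-of-words form. */
typedef struct {
    int *words;   /* word ids, one entry per distinct word */
    int *counts;  /* counts[i] = occurrences of words[i] */
    int length;   /* number of distinct words */
    int total;    /* sum of counts[0..length-1] */
} document;

/* The whole corpus: an array of documents plus its dimensions. */
typedef struct {
    document *docs;
    int num_terms;  /* vocabulary size */
    int num_docs;   /* number of documents */
} corpus;

/* Hypothetical helper: recompute total from counts[]. */
int doc_total(const document *d)
{
    int sum = 0;
    for (int i = 0; i < d->length; i++)
        sum += d->counts[i];
    return sum;
}
```

For example, a document with words {0, 3, 7} and counts {2, 1, 4} has length 3 and total 7.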

lda-model

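  • alpha :(type:double) The parameter of the symmetric Dirichlet prior on the per-document topic proportions
  • log_prob_w[NTOPICS][num_terms] :log(ss->class_word[k][w]/ss->class_total[k]), the log probability of word w under topic k (distribution topic ~ words)
  • num_topics :(NTOPICS) The number of topics to be trained
  • num_terms :The vocabulary size, the same value as in corpus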

ss - sufficient statistics

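  • class_word[NTOPICS][num_terms] :The expected count of word w assigned to topic k; with random initialization each entry starts from 1.0/num_terms plus a random number
  • class_total[NTOPICS] :The sum of class_word[k][w] over all words w of topic k
  • alpha_suffstats :The statistic accumulated over documents that is used to re-estimate alpha
  • num_docs :The number of documents accumulated into the statistics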
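
How one document contributes to class_word and class_total can be sketched as below, assuming the per-word topic weights phi (described at the end of these notes) are already known. The function accumulate_ss and the flat row-major arrays are hypothetical simplifications made to keep the example self-contained; they are not the lda-c signatures.

```c
/* Sketch: add one document's contribution to the sufficient statistics.
   phi is row-major:        phi[n * num_topics + k]
   class_word is row-major: class_word[k * num_terms + w] */
void accumulate_ss(const int *words, const int *counts, int length,
                   const double *phi, int num_topics, int num_terms,
                   double *class_word, double *class_total)
{
    for (int n = 0; n < length; n++) {
        for (int k = 0; k < num_topics; k++) {
            /* expected number of tokens of words[n] assigned to topic k */
            double c = counts[n] * phi[n * num_topics + k];
            class_word[k * num_terms + words[n]] += c;
            class_total[k] += c;
        }
    }
}
```

Because both arrays receive the same increment, class_total[k] always equals the sum of row k of class_word.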

var_gamma[docs][NTOPICS]

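doc ~ topics: one row per document, holding the variational Dirichlet parameters of that document over the topics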
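
A row of var_gamma can be turned into topic proportions for that document by dividing it by its sum, which is the mean of a Dirichlet distribution. A minimal sketch, where the function name topic_proportions is hypothetical:

```c
/* Sketch: expected topic proportions of one document.
   gamma is one row var_gamma[d][0..num_topics-1]; theta receives the result. */
void topic_proportions(const double *gamma, int num_topics, double *theta)
{
    double sum = 0.0;
    for (int k = 0; k < num_topics; k++)
        sum += gamma[k];
    for (int k = 0; k < num_topics; k++)
        theta[k] = gamma[k] / sum;
}
```

For example, a gamma row of {1.0, 3.0} gives proportions {0.25, 0.75}.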

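phi[max_corpus_length][NTOPICS]

word ~ topics: phi[n][k] is the variational probability that the n-th distinct word of the current document belongs to topic k; the first dimension is sized for the longest document in the corpus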

posted on 2014-11-07 22:46  cynorr