A Regularized Competition Model for Question Difficulty Estimation in Community Question Answering Services-20160520

1、Information

publication: EMNLP 2014

author: Jing Liu (this paper extends the model from the author's earlier SIGIR paper)

2、What

Estimate how difficult questions are in CQA services. The paper proposes modeling this from two directions:

1) Comparisons derived from competitions: the Competition Model
$C_q = \{u_a \prec q,\ q \prec u_b,\ u_a \prec u_b,\ u_{o_1} \prec u_b,\ \cdots,\ u_{o_M} \prec u_b\}$

2) Question text similarities for QDE: questions of similar difficulty tend to have similar textual descriptions. (This addresses the cold-start problem.)
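The two modeling directions above can be illustrated with a minimal Python sketch. It assumes that u_a is the asker, u_b the best answerer, and u_o1 ... u_oM the remaining answerers of a question, and it uses plain bag-of-words cosine similarity as a stand-in for the text similarity. The function names are hypothetical and not taken from the paper.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# A competition (loser, winner) reads "loser ≺ winner".
Competition = Tuple[str, str]


def competitions_for_question(q: str, asker: str, best: str,
                              others: List[str]) -> List[Competition]:
    """Build the pairwise competitions generated by one question thread."""
    comps = [
        (asker, q),     # u_a ≺ q: the question beats its asker
        (q, best),      # q ≺ u_b: the best answerer beats the question
        (asker, best),  # u_a ≺ u_b: the best answerer beats the asker
    ]
    # u_o ≺ u_b: the best answerer beats every other answerer
    comps.extend((other, best) for other in others)
    return comps


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between raw term-frequency vectors of two texts."""
    tf_a, tf_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


example = competitions_for_question("q1", "alice", "bob", ["carol", "dave"])
sim = cosine_similarity("how to sort a list", "how to sort a list")
```

A thread with M other answerers yields M + 3 competitions, so `example` holds five pairs here. A new question with no answers yields no competitions at all, which is why the text similarity is needed for cold-start questions.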

3、Dataset

Stack Overflow:

A Q&A site for programming-related technical questions.

Data download URL:

http://www.ics.uci.edu/~duboisc/stackoverflow/

  • qid: Unique question id
  • i: User id of questioner
  • qs: Score of the question
  • qt: Time of the question (in epoch time)
  • tags: a comma-separated list of the tags associated with the question. Examples of tags are "html", "R", "mysql", "python", and so on; often between two and six tags are used on each question.
  • qvc: Number of views of this question (at the time of the datadump)
  • qac: Number of answers for this question (at the time of the datadump)
  • aid: Unique answer id
  • j: User id of answerer
  • as: Score of the answer
  • at: Time of the answer
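
As a rough illustration of how these fields might be consumed, the sketch below groups answer rows by question. It assumes a CSV file with a header row that uses the field names above, which may not match the actual files at the download URL. Treating the highest-scored answer as the best answer is also an assumption, since the field list has no accepted-answer flag.

```python
import csv
import io
from collections import defaultdict


def load_threads(csv_file):
    """Group answer rows by question id (qid)."""
    threads = defaultdict(lambda: {"asker": None, "answers": []})
    for row in csv.DictReader(csv_file):
        thread = threads[row["qid"]]
        thread["asker"] = row["i"]                            # questioner id
        thread["answers"].append((row["j"], int(row["as"])))  # (answerer id, answer score)
    return dict(threads)


def best_and_others(thread):
    """Assumption: the top-scored answer stands in for the best answer."""
    ranked = sorted(thread["answers"], key=lambda a: a[1], reverse=True)
    best = ranked[0][0]
    others = [user for user, _ in ranked[1:]]
    return best, others


# Two made-up rows for the same question, only to exercise the code.
sample = io.StringIO(
    "qid,i,qs,qt,tags,qvc,qac,aid,j,as,at\n"
    '1,10,5,1220000000,"python,list",100,2,7,20,3,1220000100\n'
    '1,10,5,1220000000,"python,list",100,2,8,30,9,1220000200\n'
)
threads = load_threads(sample)
best, others = best_and_others(threads["1"])
```

The field names are read as dictionary keys because `as` is a reserved word in Python and cannot be used as an attribute or variable name.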

4、How

input: question-user competitions, question-question competitions, and text similarity.

output: pairwise comparison results (which question in a pair is harder).

method: RCM (Regularized Competition Model)
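
To make the idea of a regularized competition model concrete, here is a minimal sketch, not the paper's exact formulation. It assumes a hinge loss over competitions (a winner should score at least `margin` above the loser), a smoothness penalty `lam / 2 * w * (theta_i - theta_j)^2` over similar question pairs, non-negative scores, and projected subgradient descent. The function name and every hyperparameter value are hypothetical.

```python
def fit_rcm_sketch(competitions, similarities, lam=0.1, margin=1.0,
                   lr=0.05, epochs=200):
    """Learn one score per question/user.

    competitions: list of (loser, winner) pairs.
    similarities: list of (question_i, question_j, weight) triples.
    """
    nodes = {n for pair in competitions for n in pair}
    nodes |= {q for qi, qj, _ in similarities for q in (qi, qj)}
    theta = {n: 0.0 for n in nodes}

    for _ in range(epochs):
        grad = {n: 0.0 for n in theta}
        # Subgradient of the hinge loss max(0, margin - (theta_w - theta_l)).
        for loser, winner in competitions:
            if theta[winner] - theta[loser] < margin:
                grad[winner] -= 1.0
                grad[loser] += 1.0
        # Gradient of the smoothness penalty over similar questions.
        for qi, qj, w in similarities:
            diff = theta[qi] - theta[qj]
            grad[qi] += lam * w * diff
            grad[qj] -= lam * w * diff
        # Gradient step followed by projection onto non-negative scores.
        for n in theta:
            theta[n] = max(0.0, theta[n] - lr * grad[n])
    return theta


theta = fit_rcm_sketch(
    competitions=[("alice", "q1"), ("q1", "bob"), ("alice", "bob")],
    similarities=[("q1", "q2", 1.0)],  # q2 takes part in no competition
)
```

In this toy run `q2` appears in no competition, so its score can only move because the penalty pulls it toward the similar question `q1`. That is the mechanism by which text similarity carries information to cold-start questions.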

5、Evaluation

accuracy: ACC = (# correctly judged question pairs) / (# all question pairs)
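
The ACC metric can be computed as in the following sketch. It assumes the gold standard is a list of (easier, harder) question pairs and that a pair counts as correctly judged when the estimated difficulty of the harder question is strictly larger; the handling of ties is an assumption.

```python
def pairwise_accuracy(difficulty, gold_pairs):
    """Fraction of gold (easier, harder) pairs ordered correctly by the scores."""
    if not gold_pairs:
        raise ValueError("gold_pairs must not be empty")
    correct = sum(
        1 for easier, harder in gold_pairs
        if difficulty[easier] < difficulty[harder]
    )
    return correct / len(gold_pairs)


scores = {"q1": 0.2, "q2": 0.9, "q3": 0.5}
acc = pairwise_accuracy(scores, [("q1", "q2"), ("q3", "q2"), ("q3", "q1")])
```

Here two of the three pairs are ordered correctly, so `acc` is 2/3.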

baselines: PageRank, TS (TrueSkill), CM (Competition Model)

6、additional analysis

1) Different ways of computing text similarity

2) Estimating difficulty scores for cold-start questions: KNN

3) Example text words for different difficulty levels
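
The KNN idea in item 2) can be sketched as follows. This is a minimal version that assumes the similarities between the new question and already-scored questions have been computed beforehand, and that the estimate is the unweighted mean over the K most similar questions; the paper may weight or select neighbors differently.

```python
def knn_difficulty(neighbors, k=3):
    """Estimate the difficulty of a cold-start question.

    neighbors: list of (similarity, difficulty) pairs, one per scored question.
    """
    top = sorted(neighbors, key=lambda n: n[0], reverse=True)[:k]
    if not top:
        raise ValueError("need at least one scored neighbor")
    return sum(difficulty for _, difficulty in top) / len(top)


estimate = knn_difficulty(
    [(0.9, 2.0), (0.1, 9.0), (0.8, 4.0), (0.7, 3.0)], k=3
)
```

The dissimilar question with difficulty 9.0 is ignored, and the estimate is the mean of 2.0, 4.0 and 3.0, that is 3.0.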

7、conclusion

posted @ 2016-05-20 13:13  白婷