Study on Adaptive Clustering of Paragraphs in Automatic Summarization System
Liu Haitao, Lao Songyang, Han Zhiguang
(Department of Information System and Management, National University of Defence Technology, Changsha 410073)
Abstract: This paper proposes an automatic summarization method based on automatic clustering of paragraphs. First, keyword vectors for the whole document and for each of its paragraphs are derived from word-frequency statistics and word-position features, and a paragraph-based vector space model is built. Second, the similarity between paragraphs is computed, and the K-medoids clustering algorithm partitions the document into semantic paragraphs; the number of clusters K is determined adaptively by a self-defined objective function. Finally, the sentences most relevant to the topic are selected from each semantic paragraph and arranged according to their positions in the original document to form the summary.
Keywords: automatic summarization; semantic paragraph partition; vector space model; clustering; K-medoids
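The clustering step described in the abstract can be sketched roughly as follows. This is a minimal Python illustration, not the authors' implementation. Several parts are assumptions made only for the example: whitespace tokenization, raw term-frequency vectors without position weighting, cosine distance, a deterministic initialization that takes the first K paragraphs as medoids, and the mean silhouette score as a stand-in for the paper's self-defined objective function, which the abstract does not specify. All function names are hypothetical.

```python
import math
from collections import Counter


def paragraph_vectors(paragraphs):
    """One term-frequency vector per paragraph (assumed: whitespace tokens)."""
    return [Counter(p.lower().split()) for p in paragraphs]


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def distance_matrix(vectors):
    """Pairwise cosine distance, with an exact zero on the diagonal."""
    n = len(vectors)
    return [[0.0 if i == j else 1.0 - cosine(vectors[i], vectors[j])
             for j in range(n)] for i in range(n)]


def assign(dist, medoids):
    """Label each paragraph with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop moving."""
    medoids = list(range(k))  # assumed init: first k paragraphs
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid for an empty cluster
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimizes total distance to its cluster members.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


def mean_silhouette(dist, labels):
    """Stand-in objective (assumption): mean silhouette over all paragraphs."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in other) / len(other)
                for c, other in clusters.items() if c != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


def choose_k(paragraphs):
    """Try K = 2 .. n-1 and keep the clustering with the best objective."""
    dist = distance_matrix(paragraph_vectors(paragraphs))
    n = len(paragraphs)
    if n < 3:
        return 1, [0] * n
    best_k, best_labels, best_score = None, None, -math.inf
    for k in range(2, n):
        _, labels = k_medoids(dist, k)
        score = mean_silhouette(dist, labels)
        if score > best_score:
            best_k, best_labels, best_score = k, labels, score
    return best_k, best_labels
```

Calling `choose_k(paragraphs)` on a list of paragraph strings returns the selected K and one cluster label per paragraph. Paragraphs sharing a label form one semantic paragraph; the sentence-selection step is not covered by this sketch.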