PDF Summarization in Python

Abstract:

Existing academic search engines present search results as snippets drawn from abstracts, so users must read the PDF document itself to get more detailed information. Sometimes users need richer snippets that also draw on the entire PDF document. An academic metasearch engine, Academicopter, was therefore developed to provide such rich snippets. Academicopter automatically summarizes the PDF content of scientific journal articles to give users more information. The summarization uses a modified graph-based method whose scoring formula is extended with metadata features for the title, keywords, and abstract. Experimental studies using data from Google Scholar and Microsoft Academic show that Academicopter successfully combines the search results from both sources into a single list, ranked by a combination of the rankings from the two sources. Academicopter also removes duplicates between the two data sources, displaying each duplicated result only once.
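
The abstract does not spell out the modified graph-based summarization or its metadata formula, so the following is only a minimal sketch of the general idea, not Academicopter's actual method. It assumes the PDF text has already been extracted to a plain string, ranks sentences with a TextRank-style iteration over a cosine-similarity graph, and then boosts sentences that overlap with the title, keywords, and abstract. The function names and the `metadata_weight`, `damping`, and `iterations` parameters are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    # Lowercase alphanumeric tokens; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def summarize(text, title="", keywords=(), abstract="", top_n=3,
              damping=0.85, iterations=50, metadata_weight=0.5):
    sentences = split_sentences(text)
    if not sentences:
        return []
    bags = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)

    # Sentence graph: edge weight = cosine similarity, no self-loops.
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out_weight = [sum(row) for row in sim]

    # TextRank-style power iteration over the weighted graph.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0
            )
            for i in range(n)
        ]

    # Assumed metadata feature: boost sentences similar to the combined
    # title, keywords, and abstract. The real formula is not given in the text.
    meta = Counter(tokenize(" ".join([title, " ".join(keywords), abstract])))
    final = [scores[i] * (1 + metadata_weight * cosine(bags[i], meta))
             for i in range(n)]

    # Keep the top_n sentences, returned in their original document order.
    top = sorted(range(n), key=lambda i: final[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(full_text, title=..., keywords=[...], abstract=..., top_n=3)` returns up to three sentences in document order, which could serve as a richer snippet than the abstract alone. Multiplying the graph score by a metadata term is one simple way to combine the two signals; an additive combination would be an equally plausible reading of the description.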