ISSN 2071-8594

Russian Academy of Sciences

Editor-in-Chief

Gennady Osipov

I.V. Sochenkov, R. Maluleka

Query Formulation for Source Retrieval Based on Named Entities and N-grams Extraction

Abstract.

This paper presents an approach to the source retrieval task that formulates queries using two distinct keyphrase extraction strategies: n-grams extracted from chunked text, and named entities. The proposed approach was evaluated on the TIRA platform and performed well against other participants of PAN at CLEF.
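The abstract names the two strategies without giving their details, so the following is only a minimal sketch of how such query formulation might look, not the authors' implementation. The chunk size, the stopword list, the frequency-based n-gram ranking, and all function names are assumptions made for illustration. In particular, `naive_entities` is a crude capitalization heuristic standing in for a real named entity recognizer.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list (illustrative only).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "to", "is", "for", "was", "by", "with"}


def chunk_text(text, chunk_size=50):
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]


def top_ngrams(words, n=3, k=2):
    """Return the k most frequent n-grams of lowercased non-stopword tokens."""
    cleaned = (re.sub(r"\W+", "", w).lower() for w in words)
    tokens = [t for t in cleaned if t and t not in STOPWORDS]
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return [gram for gram, _ in Counter(grams).most_common(k)]


def naive_entities(words):
    """Stand-in for NER: return runs of two or more capitalized words."""
    text = " ".join(words)
    return re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b", text)


def formulate_queries(text, chunk_size=50):
    """Collect n-gram and entity queries per chunk, dropping duplicates."""
    queries = []
    for chunk in chunk_text(text, chunk_size):
        for query in top_ngrams(chunk) + naive_entities(chunk):
            if query not in queries:
                queries.append(query)
    return queries
```

Each returned string would then be submitted as one query to the search engine. A real system would likely replace the capitalization heuristic with a trained entity recognizer and rank n-grams by something more informative than raw frequency.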

Keywords:

source retrieval, named entity extraction, plagiarism detection.

PP. 44-47.
