In recent years, the number of documents in Albanian published on the internet has increased considerably. They range from news articles to legal documents, scientific publications, and multimedia content (photo, video, audio). This considerable amount of information has opened up numerous possibilities for analysts, researchers, legal workers, and other interested parties. However, even though the available information has increased considerably, it is still not easy to quickly identify relevant documents written in Albanian, because appropriate tools for this are lacking. This dissertation reports various applications of information retrieval techniques to collections of documents written in Albanian. The experimental work makes use of articles from a scientific journal written in Albanian and news articles published online by various media. The implemented prototypes serve as a proof of concept for further implementations of similar systems in industry. Since information retrieval is strongly connected with natural language processing, this work also includes results related to training and testing datasets. Firstly, we have provided a high-performing, custom-trained machine learning model for identifying the Albanian language. Secondly, we have created an automatically generated corpus of annotated news articles that may be used for training machine learning models for named entity recognition. It contains about 25,000 annotated sentences for each focus entity class (person, organization, location). Lastly, we have provided a set of 388,325 news articles clustered by topic and annotated with named entities.

Citation recommendation describes the task of recommending citations for a given text. Due to the overload of published scientific works in recent years on the one hand, and the need to cite the most appropriate publications when writing scientific texts on the other hand, citation recommendation has emerged as an important research topic. In recent years, several approaches and evaluation data sets have been presented. However, to the best of our knowledge, no literature survey has been conducted explicitly on citation recommendation. In this article, we give a thorough introduction to automatic citation recommendation research. We then present an overview of the approaches and data sets for citation recommendation and identify differences and commonalities using various dimensions. Last but not least, we shed light on the evaluation methods and outline general challenges in the evaluation and how to meet them. We restrict ourselves to citation recommendation for scientific publications, as this document type has been studied the most in this area. However, many of the observations and discussions included in this survey are also applicable to other types of text, such as news articles and encyclopedic articles.