Last Updated:

21/08/2020 - 15:07

The article “TED Multilingual Discourse Bank (TED-MDB): a parallel corpus annotated in the PDTB style”, co-authored by Prof. Dr. Deniz Zeyrek Bozşahin, a faculty member of our university, has been published in Language Resources and Evaluation.

TED-Multilingual Discourse Bank, or TED-MDB, is a multilingual resource in which TED talks are annotated at the discourse level in 6 languages (English, Polish, German, Russian, European Portuguese, and Turkish) following the aims and principles of PDTB. We explain the corpus design criteria, which reflect three main considerations: the linguistic characteristics of the languages involved, the interactive nature of TED talks (which led us to annotate Hypophora), and the decision to avoid projection. We report our annotation consistency and post-annotation alignment experiments, and provide a cross-lingual comparison based on corpus statistics.


Zeyrek, D., Mendes, A., Grishina, Y., Kurfalı, M., Gibbon, S., & Ogrodniczuk, M. (2020). TED multilingual discourse bank (TED-MDB): A parallel corpus annotated in the PDTB style. Language Resources and Evaluation, 54(2), 587-613. doi:10.1007/s10579-019-09445-9

 

To access the article: https://link.springer.com/article/10.1007/s10579-019-09445-9


METU Author

Prof. Dr. Deniz Zeyrek Bozşahin

Web of Science/Publons Researcher ID: M-8082-2017
Scopus Author ID: 36055536300
ORCID: 0000-0001-9248-0141
dezeyrek@metu.edu.tr

Tags/Keywords:

Annotation, Corpus creation, Discourse, Discourse relations, Multilingual corpus


Other Authors:
Mendes, A., Grishina, Y., Kurfalı, M. (METU), Gibbon, S., & Ogrodniczuk, M.


Additional Information:
We thank our annotators (Robin Goodfellow Malamud, Robin Schäfer, Olha Zolotarenko, Nuno Martins, Aida Cardoso, Celina Heliasz, Joanna Bilińska, Daniel Ziembicki, İpek Süsoy). The research has been partially supported by Textlink, by the Scientific and Technological Research Council of Turkey (BIDEB-2219 Postdoctoral Research program), by the Polish National Science Centre (Contract Number 2014/15/B/HS2/03435), and by the FCT, Fundação para a Ciência e a Tecnologia (project ID: PEst-OE/LIN/UI0214/2013). The support of Bonnie Webber and Manfred Stede is gratefully acknowledged, though all errors are our own.