The Parsed Old and Middle Irish Corpus (POMIC)
Introduction
The Parsed Old and Middle Irish Corpus (POMIC) is a corpus of Irish texts spanning the years from c. 700 to c. 1100. The current beta version of the corpus consists of 14 texts that have been POS-tagged and syntactically parsed. The corpus is, however, a work in progress, and future additions are envisioned, including texts written at the end of the Middle Irish period (up to around 1200) as well as very early legal material, some of which may be datable to the 7th century.
Tag-set and Parsing based on Penn Corpora
The tag set and parsing annotation adopted for POMIC are intended to be broadly compatible with the Penn family of corpora (see, for instance, the corpora of historical English, Icelandic, Portuguese, German and Greek data, among others).
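To make the Penn-style labeled bracketing more concrete, here is a minimal Python sketch that reads bracketed trees and lists each word with its POS tag. It assumes the general Penn conventions (each tree wrapped in an unlabeled outer pair of brackets, terminals written as `(TAG word)`). The function names are hypothetical, and the sample tree is an invented illustration, not a sentence from POMIC.

```python
import re
from typing import List, Tuple

# Tokens are "(", ")", or any run of characters that is neither whitespace nor a bracket.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def parse_trees(text: str) -> list:
    """Parse Penn-style labeled bracketing into nested lists.

    A constituent "(NP-SBJ (N word))" becomes ["NP-SBJ", ["N", "word"]].
    The unlabeled outer wrapper around each tree becomes a list whose
    first element is itself a list rather than a label.
    """
    stack: list = [[]]
    for tok in TOKEN_RE.findall(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced brackets: unexpected ')'")
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced brackets: missing ')'")
    return stack[0]


def tagged_words(node: list) -> List[Tuple[str, str]]:
    """Collect (tag, word) pairs, in order, from a parsed tree or list of trees."""
    # A terminal is a two-item constituent whose items are both strings.
    if len(node) == 2 and all(isinstance(x, str) for x in node):
        return [(node[0], node[1])]
    pairs: List[Tuple[str, str]] = []
    for child in node:
        if isinstance(child, list):
            pairs.extend(tagged_words(child))
    return pairs


# Invented example in the general Penn style (not a POMIC sentence).
sample = "( (IP-MAT (NP-SBJ (N word1)) (VB word2)) (ID sample.1))"
print(tagged_words(parse_trees(sample)))
```

For the sample this prints `[('N', 'word1'), ('VB', 'word2'), ('ID', 'sample.1')]`. The ID node is returned like any other terminal, so filter it out if only words are wanted. The sketch does not attempt to cover every convention that real corpus files may use.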
CorpusSearch
The corpus files may be searched using the corpus query software CorpusSearch, developed by Beth Randall. The download list below includes a current .jar file for running CorpusSearch. To run a search, give the following command, followed by the name of a query file and the corpus file(s) to be searched:
java -classpath CS_2.003.04.jar csearch/CorpusSearch
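If you want to script searches, the Python sketch below assembles the same `java` invocation and runs it with `subprocess`. It assumes the usual CorpusSearch 2 calling convention, in which the query file comes first and the .psd files to search follow. The function names and the file names in the example comment are hypothetical.

```python
import subprocess
from typing import List, Sequence


def corpussearch_argv(jar: str, query_file: str, corpus_files: Sequence[str]) -> List[str]:
    """Build the java command line for one CorpusSearch run.

    Assumption: the query file comes first, followed by the .psd files to search.
    """
    if not corpus_files:
        raise ValueError("at least one corpus file is required")
    return ["java", "-classpath", jar, "csearch/CorpusSearch", query_file, *corpus_files]


def run_corpussearch(jar: str, query_file: str, corpus_files: Sequence[str]) -> int:
    """Run CorpusSearch (requires Java on the PATH) and return its exit status."""
    argv = corpussearch_argv(jar, query_file, corpus_files)
    # Passing a list (no shell) avoids quoting problems with unusual file names.
    return subprocess.run(argv, check=False).returncode


# Example (hypothetical file names):
# run_corpussearch("CS_2.003.04.jar", "query.q", ["text1.psd", "text2.psd"])
```

Keeping the argument building separate from the call makes it easy to check the command line before launching Java.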
Corpus Manual
The annotation scheme adopted for the corpus is described in the manual (available in the download list below). This manual is an adaptation of Beatrice Santorini's manual for the Penn Corpora of Historical English (Release 2, 2010). The POMIC manual is at present incomplete, but this will be remedied in future updates. I have followed the Penn manual as closely as possible, departing from it where necessary to document how POMIC differs from the Penn corpora.
Downloads
To use the corpus, download the following corpus text files (.psd), which are at present encoded in Mac OS Roman, together with the current (incomplete) corpus manual (.pdf).
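Because the .psd files are currently encoded in Mac OS Roman, tools that assume UTF-8 will misread non-ASCII characters such as accented vowels unless the encoding is given explicitly. A minimal Python sketch, assuming you want UTF-8 copies of the files, uses Python's built-in `mac_roman` codec. It also normalises classic Mac CR line endings to LF in case any file uses them. The function names and the directory in the example comment are hypothetical.

```python
from pathlib import Path


def decode_mac_roman(data: bytes) -> str:
    """Decode Mac OS Roman bytes and normalise CR / CRLF line endings to LF."""
    text = data.decode("mac_roman")
    return text.replace("\r\n", "\n").replace("\r", "\n")


def convert_to_utf8(src: Path, dst: Path) -> None:
    """Write a UTF-8 copy of a Mac OS Roman encoded file."""
    # write_bytes avoids any further newline translation on write.
    dst.write_bytes(decode_mac_roman(src.read_bytes()).encode("utf-8"))


# Example (hypothetical directory): make UTF-8 copies of every .psd file.
# for path in Path("pomic").glob("*.psd"):
#     convert_to_utf8(path, path.with_name(path.stem + ".utf8.psd"))
```

Writing to a separate file leaves the distributed originals untouched, so CorpusSearch can still be pointed at either version.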
Acknowledgements
I would like to thank Beatrix Färber (UCC, History Dept.) and Dr Hugh Fogarty (formerly of UCD, editor of TLH) for allowing me to use some of the texts from the CELT and TLH databases, respectively, to create POS-tagged and syntactically parsed versions. I would also like to thank Professor Liam Breatnach of the School of Celtic Studies for guidance on various matters relating to the corpus.
This work was carried out while I held an O’Donovan Scholarship at the School of Celtic Studies, Dublin Institute for Advanced Studies (2011–2014).
Citation
If you use this corpus for research purposes, please cite it as:
Lash, Elliott. 2014. The Parsed Old and Middle Irish Corpus (POMIC). Version 0.1. https://www.dias.ie/index.php?option=com_content&view=article&id=6586&Itemid=224&lang=en
Contact info
Suggestions for improving the corpus are welcome by email at the following address:
Elliott Lash: eljlash@gmail.com
School of Celtic Studies