2 Instructions for use

This tutorial starts with a brief introduction to corpora and corpus analysis, followed by an introduction to the characteristics of specialized corpora of parliamentary debates and an overview of research into language and gender. The second part of the tutorial is a hands-on session that demonstrates the potential of some of the best-known corpus analysis techniques, such as concordances, frequency lists, keywords and collocations, to explore the topics female MPs debate in the Slovenian Parliament over time and to compare and contrast their language use with that of their male counterparts.

All the resources and tools used in this tutorial are online and available under an open license. Corpus querying will be demonstrated in the NoSketchEngine concordancer, while additional manual analysis and visualization of the results will be performed in a spreadsheet editor (e.g., Google Sheets or MS Excel).

Screencasts, explanations of corpus querying procedures and links to the results are provided in blue boxes for anyone who wishes to reproduce the searches on their own.

The siParl 2.0 corpus can be queried online through the NoSketchEngine or KonText concordancers at CLARIN.SI, the Slovenian node of CLARIN ERIC, the European research infrastructure for language resources and technology. The corpus can also be downloaded from the CLARIN.SI repository and analysed further with other corpus or text mining tools. Tutorials showing how this can be done are available online, e.g., Corpus Analysis with Antconc and Basic Text Processing in R.

This tutorial is an updated version of the original tutorial, which was based on the previous version of the siParl corpus. Compared to siParl 1.0, the siParl 2.0 corpus contains richer and cleaner speaker and session metadata, which make it possible to distinguish between MPs and other speakers. In addition, speeches have been labelled with parliamentary terms, which simplifies comparative analysis across legislative periods. Additional linguistic annotation layers, such as Universal Dependencies features, syntactic parses and named entities, have also been added to the corpus, but since they are not used in this tutorial, we do not elaborate on them further.

If you wish to proceed immediately to the hands-on exercises, skip to section 6, since the following three sections (3, 4 and 5) are dedicated to a general theoretical overview and an introduction to the basic terminology. Even though you can follow the tasks in section 6 without studying these three introductory sections, we strongly encourage you to read them before finishing this tutorial. They provide the theoretical foundations needed to understand the demonstrated analytical procedures thoroughly, to use them independently, and to interpret the results adequately.