The Institute has comprehensive collections compiled over the course of more than 100 years.

Open data and licensing

The Institute’s aim is to open its data resources by digitising its linguistic corpora and making them available online to a broad audience.

Metadata

The following services contain metadata about the Institute’s corpora and material:

Licensing

The Institute’s current and future electronic language corpora that are public and free of legislative or contractual restrictions on their use will be opened up as public data resources under a Creative Commons licence and in a machine-readable format. In accordance with recommendation JHA 189, which is based on the Finnish Act on the Openness of Government Activities, the primary licence is Creative Commons Attribution 4.0. The Institute has previously used GNU and EUPL licences. Current licences also include CLARIN licences.

The following corpora and material are available under public licences:

  • Baltic Finnic Language Atlas | AVAA | Creative Commons Attribution 4.0
  • Dictionary of Karelian | Kotus | Creative Commons Attribution 4.0
  • Kettunen’s Dialect Atlas | AVAA | Creative Commons Attribution 4.0
  • Dialect Corpus of the Finnish Syntax Archive (in cooperation with the Finnish Syntax Archive at the University of Turku) | Korp | Creative Commons Attribution-NoDerivatives 4.0
  • Vocabulary of Modern Finnish | Kotus | GNU LGPL, EUPL v. 1.1, Creative Commons Attribution-NoDerivatives 3.0
  • Atlas of Place Names | AVAA | Creative Commons Attribution 4.0
  • Place Name Database | Kotus | Creative Commons Attribution 4.0
  • Samples of Finnish (SKN) – AV corpus | LAT | CLARIN PUB, Creative Commons Attribution 4.0
  • Distribution Maps of the Dictionary of Finnish Dialects | AVAA | Creative Commons Attribution 4.0
  • New Year’s Speeches of the President of the Republic | Korp | EUPL
  • Swedish Place Names in Finland | AVAA | Creative Commons Attribution 4.0
  • Corpus of Old Literary Finnish | Korp | CLARIN PUB, EUPL v. 1.1

The corpora and material mentioned above are available through the following services:

  • AVAA, an open data publishing platform for research data, provided by the Ministry of Education and Culture’s Open Science and Research Project and produced by CSC – IT Center for Science
  • Kotus, the Institute’s own service, previously known as Kaino
  • Korp, a platform for text corpora, provided by the Language Bank of Finland (FIN-CLARIN) and produced by CSC – IT Center for Science
  • LAT, a platform for audio-visual corpora, provided by the Language Bank of Finland (FIN-CLARIN) and produced by CSC – IT Center for Science

In addition to the open data resources, the Institute has scientific corpora that are subject to licence, usually in the interests of protecting personal information.