The Institute has comprehensive collections compiled over the course of more than 100 years.

Lexical corpora

The Institute's comprehensive lexical corpora contain data on Finnish dialects, Old Literary Finnish, Karelian, and Finnish slang.

The largest lexical collections are the corpora of the Word Archive of Finnish Dialects, collected since the beginning of the 20th century. The main corpus of Finnish dialects contains more than 8 million data items on approximately 400,000 words. The material covers all Finnish dialects spoken within Finland's current territory as well as on the Karelian Isthmus and in Ingria. It also includes the Finnish dialects spoken in West Bothnia (Sweden) and Finnmark (Norway), and the extinct dialect of the immigrants from Savo, once spoken in Värmland (west-central Sweden).

The Dictionary of Old Literary Finnish is based on a corpus of approximately 500,000 entry slips. The word corpus of Karelian consists of more than 550,000 dialect word entry slips. The oldest entries are from the late 19th century and the newest ones from the 1970s.


An entry slip for the word solmu ("knot") from the Word Archive of Finnish Dialects. Photo: The archives of the Institute.