Toponyms Data Bank



A research project begun in 1990 at the Research Institute for the Languages of Finland aims to create a computerized database for the study of Finnish place-names, incorporating field work materials from the Archive of Names as a research tool for scholars in a number of disciplines.

Modern Finnish toponymics is primarily based on systematic field work. Field work on toponyms has been carried out since the 1870s, and since the 1960s and 1970s it has been conducted more systematically with the help of steadily improving methods and maps (1). Saving new field collections as electronic data began in the mid-1980s, and so the electronic archive of place-names was founded. Today the Archive of Names at the Research Institute for the Languages of Finland contains about 2.5 million geographical names in Finnish for the region of Finland and nearby territories. The Finland-Swedish place-names are not included in the archive of Finnish names.

Since computerizing the archive material was considered important, it was decided that an electronic database covering part of the completed toponym collections should be compiled (4). At present, the Toponyms Data Bank is one of the projects at the Research Department of the Research Institute for the Languages of Finland, and two or three onomasts work on it each year. About 25% of the project personnel study problems relating to basic research.

One of the primary arguments for the database is its capacity to serve both modern onomastic theories and those of the future. The project could therefore not be based on a single strict theoretical framework. From the very first planning stages it was clear that the database should allow maximum flexibility in integrating the manual archive as well as in searching for relevant material for onomastic planning and guidance and for onomastic and other research work.

Despite the argument mentioned above, the development of the toponyms database is strongly influenced by the toponomastic theory developed in Finland by Kurt Zilliacus (2) and Eero Kiviniemi (3). The theory of analyzing toponyms syntactically and semantically is applied in the toponym database; particularly prominent are the concept of toponyms as a system, as part of the greater nomenclature to which each toponym belongs, and the concept of syntactically and semantically defined place-name generics and specifics.

The toponyms database has to allow the integration of different kinds of toponymic data, for instance thoroughly classified data for different studies. The database is intended to provide material that is useful for many purposes and compatible with many kinds of operating systems.

The primary aim is to allow maximum flexibility in interactive searching. We want to be able to find lexical elements of toponyms no matter where they occur in the database, as well as suffixes in toponyms. We also want to analyze the distribution of place-name structures and the nomenclature of a particular area, for instance a river and lake system or a village. We want to be able to find toponyms for particular types of place, e.g. islands and islets, swamps and marshland, rivers, or meadows. Finally, we want to know what kinds of places are important to name givers generally and regionally, how those places are named, and what kinds of toponymic systems are used: do they differ from each other and how, have the toponyms changed and how, what are the parallel toponyms like, and so on.

The Archive of Names, organised by parish, holds, as mentioned above, unique toponymic material of 2.5 million names based on oral sources. Such a large corpus can be used widely (world-wide when needed) only as electronic data, and a computerized database will enable scholars to search the archive more efficiently. As collating such a large archive electronically demands resources not currently available, it was decided to computerize 10% of the material first. Our goal is to collect 250,000 toponymic entries covering 40 regions. By the end of 1995, the systematized database contained roughly 100,000 entries covering 14 areas.
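
The sampling figures above can be checked with a few lines of arithmetic. The sketch below only restates the numbers quoted in the text; the variable names are illustrative and not part of the project.

```python
# Figures quoted in the text; variable names are illustrative only.
archive_names = 2_500_000      # names in the Archive of Names
sample_share_percent = 10      # share chosen for computerization first
target_regions = 40            # regions in the planned sample
entries_1995 = 100_000         # entries by the end of 1995 (rounded in the text)
regions_1995 = 14              # areas covered by the end of 1995

# 10% of the archive gives the stated goal of 250,000 entries.
target_entries = archive_names * sample_share_percent // 100

# Progress towards the goal at the end of 1995.
entries_done_percent = entries_1995 * 100 / target_entries
regions_done_percent = regions_1995 * 100 / target_regions

print(target_entries)          # 250000
print(entries_done_percent)    # 40.0
print(regions_done_percent)    # 35.0
```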

The database of toponyms will form a regional sample of the toponymic collection at the Archive of Names. The primary requirements for the regional sample were as follows. Each sample region had to have been collected in its entirety with the aid of maps. From the point of view of the state, it was important that different regions be equally represented, because the administrative type (town or parish) of the sample region and its geographical situation and features, e.g. the density of its river and lake system, are essential in Finnish nomenclatures. Dialectal representativeness was also one of the primary criteria.


The Toponyms Data Bank offers three alternative modes of use, according to the needs of users. The software applications for toponymic data have been developed at the Research Institute for the Languages of Finland.

First, the database for general access is DEC Rdb (a relational database), and searches are carried out with interactive SQL (Structured Query Language). SQL is a sophisticated search technique which allows data from different types of record within a database to be combined, and it is flexible in the range of search strategies it allows. Its terminology is not very transparent, so an extensive user's guide (6) with several example queries is available. The flexibility of SQL is an advantage, but more suitable user interfaces are still needed.
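
To give a feel for what combining data from different types of record looks like in SQL, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for DEC Rdb, and the two tables and all column names are hypothetical; they are not the schema of the Toponyms Data Bank. The sample values are taken from the Iso Rimpikari entry described in this article.

```python
import sqlite3

# In-memory SQLite database as a stand-in for DEC Rdb (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical two-table layout, chosen only to show a join.
conn.executescript("""
CREATE TABLE toponym (
    entry_no     INTEGER PRIMARY KEY,
    headword     TEXT,
    municipality INTEGER,
    village      TEXT,
    place_code   INTEGER
);
CREATE TABLE parallel_name (
    entry_no INTEGER REFERENCES toponym(entry_no),
    name     TEXT
);
""")

conn.execute("INSERT INTO toponym VALUES (1, 'Iso Rimpikari', 609, 'Yyteri', 75)")
conn.executemany(
    "INSERT INTO parallel_name VALUES (?, ?)",
    [(1, "Rimpikari"), (1, "Rimpikarit")],
)

# Combine both record types: islands (code 75) in Yyteri with their parallel names.
rows = conn.execute("""
    SELECT t.headword, p.name
    FROM toponym AS t
    JOIN parallel_name AS p ON p.entry_no = t.entry_no
    WHERE t.place_code = 75 AND t.village = 'Yyteri'
    ORDER BY p.name
""").fetchall()

for headword, name in rows:
    print(headword, "|", name)
```

The join is the part that reflects the text: one query draws on two kinds of record at once. SQL dialects differ in detail, so a query written for SQLite may need adjusting for another system.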

Second, the toponym data is available (for the time being to the staff of the Research Institute only) in the text database TRIP, where searches are carried out with CCL (Common Command Language).

Third, the same toponym data, with more information than the databases contain, is available as a corpus for users who need, for example, more information or copies of data for their research. The toponymic data belongs to the corpora of the Research Institute for the Languages of Finland. Text corpora are available e.g. for modern Finnish, Old Finnish, dialects and names. Several different programs are available for using the data: SEARCH and AGREP for approximate string matching, and SEGREP for SGML-coded data.
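
The programs named above are not reproduced here. As a rough illustration of what approximate string matching offers a user of name data, the sketch below uses Python's standard difflib module. This is a different technique (a similarity ratio) from the bounded-error matching of agrep-style tools, and the query spelling "Rimbikari" is a hypothetical variant made up for the example.

```python
import difflib

# A few headwords from this article; the list itself is illustrative.
headwords = ["Rimpikari", "Pikku Rimpikari", "Mustalampi", "Yyteri"]

def approximate_matches(query, names, cutoff=0.6):
    """Return names similar to the query, best match first.

    Uses difflib's similarity ratio, not the agrep algorithm.
    """
    return difflib.get_close_matches(query, names, n=5, cutoff=cutoff)

# Hypothetical variant spelling with b for p.
matches = approximate_matches("Rimbikari", headwords)
print(matches)
```

Lowering the cutoff returns more distant candidates; raising it keeps only near-identical spellings.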

Organization of entries and data compilation

Each sample area included in the database usually covers a municipality or two neighbouring municipalities which have been collected in their entirety. For the database (DEC Rdb) the entries are structured so that 33 columns or fields are reserved for each toponym. More information about the individual fields is given in the Appendix.

Each filed toponym is stored in the database with the following information:

a) toponym in its standard-Finnish form
b) boundaries of the word components
c) structural type and/or formation of the toponym
d) dialectal variant
e) inflection
f) administrative status (municipality, village)
g) location on the map (the map number and square)
h) type of place
i) provenance of the toponym
j) comments on location
k) comments on type of place
l) parallel toponyms
m) related toponyms
n) name of document.

An example entry, for an island called Iso Rimpikari in the village of Yyteri in the city of Pori:

a) Iso Rimpikari [Great Rimpikari]
b) Iso/ Rimpi+kari [Great/ Rimpi+kari]
c) td [= a component added to a toponym]
d) isorimpikari
e) s [inessive]
f) 609 Yyteri [town Pori and village Yyteri]
g) 114207 (6): 5E 5
h) 75 saari [island]
i) reedy [dialectal rimpi = 'reed']
j) near Pikku Rimpikari
k) raised
l) Rimpikari, Rimpikarit, 1805 Rimbi Karin Luodot (MHA A85 17/2)
m) Pikku Rimpikari [Little Rimpikari]
n) RH68 [the collector's initials and the year 1968 when collected].
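
The example entry can be pictured as a simple record. The sketch below is one possible in-memory representation in Python; the class and attribute names are hypothetical and do not correspond to the actual column names in DEC Rdb. The values are those of items a) to n) above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ToponymEntry:
    """One filed toponym; attribute names are illustrative only."""
    headword: str               # a) standard-Finnish form
    marked_headword: str        # b) headword with element boundaries
    structure: str              # c) structural type and/or formation
    dialect_variant: str        # d) dialectal variant
    inflection: str             # e) inflection code
    municipality_code: int      # f) official municipality number
    village: str                # f) village name
    map_location: str           # g) map number and square
    place_code: int             # h) number code for the type of place
    place_term: str             # h) one-word explanation
    provenance: str             # i) provenance of the toponym
    location_comment: str       # j) comments on location
    place_comment: str          # k) comments on type of place
    parallel_names: List[str] = field(default_factory=list)  # l)
    related_names: List[str] = field(default_factory=list)   # m)
    document: str = ""          # n) collector's initials and year

entry = ToponymEntry(
    headword="Iso Rimpikari",
    marked_headword="Iso/ Rimpi+kari",
    structure="td",
    dialect_variant="isorimpikari",
    inflection="s",
    municipality_code=609,
    village="Yyteri",
    map_location="114207 (6): 5E 5",
    place_code=75,
    place_term="saari",
    provenance="reedy",
    location_comment="near Pikku Rimpikari",
    place_comment="raised",
    parallel_names=["Rimpikari", "Rimpikarit",
                    "1805 Rimbi Karin Luodot (MHA A85 17/2)"],
    related_names=["Pikku Rimpikari"],
    document="RH68",
)

print(entry.headword, entry.place_code, entry.place_term)
```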

Standard-Finnish form

Each toponym included in the data bank is given in its normalized spelling. This form is important for the benefit of non-specialists and serves as the headword.

The boundaries

The place-name generics and specifics, and other elements, are marked with boundaries based on the concept of the syntactically and semantically defined place-name element. There are two levels of boundary: the (/)-boundary between place-name generics and specifics, and the (+)-boundary for other elements. Suffixes are not separated as elements in the data bank.
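
Boundary marks of this kind make element searches straightforward. The following is a minimal sketch, assuming the marked headword is available as a plain string such as "Iso/ Rimpi+kari"; the function names are hypothetical, and the sample names are examples from this article.

```python
import re

def split_elements(marked):
    """Split a boundary-marked headword at every (/) and (+) boundary."""
    parts = re.split(r"[/+]", marked)
    # Drop spaces and hyphens left next to a boundary, and skip empty pieces.
    return [p.strip(" -") for p in parts if p.strip(" -")]

def plain_form(marked):
    """Remove the boundary marks to recover the headword."""
    return re.sub(r"[/+]", "", marked)

def find_by_element(names, element):
    """Names in which the given element occurs as a whole element."""
    wanted = element.lower()
    return [n for n in names
            if wanted in (e.lower() for e in split_elements(n))]

def find_by_ending(names, ending):
    """Names whose unmarked form ends with the given string."""
    return [n for n in names
            if plain_form(n).lower().endswith(ending.lower())]

names = ["Musta/lampi", "Mustan+lammin/tie",
         "Iso/ Musta+lampi", "Iso/ Rimpi+kari"]

print(split_elements("Iso/ Rimpi+kari"))   # ['Iso', 'Rimpi', 'kari']
print(find_by_element(names, "lampi"))     # ['Musta/lampi', 'Iso/ Musta+lampi']
print(find_by_ending(names, "kari"))       # ['Iso/ Rimpi+kari']
```

Note that this simple comparison does not recognise inflected forms: the element lammin in Mustan+lammin/tie is not found by a search for lampi.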

The boundary types used in the database are coded as follows:

A. the single-element toponyms

a) no marks, e.g. Britannia [= Britain]
b) (+) Musta+lampi [= 'Black+ Lake', a village]

B. the toponyms with a place name generic and specific

c) (/) Musta/lampi [= 'Black/ Lake', a lake], Iso/-Britannia [= 'Great/ Britain']
d) (+/) Mustan+lammin/tie [= 'Black+ Lake/ Road', a road]
e) (/+) Iso/ Musta+lampi [= 'Great/ Musta+lampi', a lake]
f) (+/+) Mustan+lammin/ Sammakko+lammit [= 'Black+ Lake/ Froggy+ Ponds', froggy ponds near the Black Lake].

The frequencies (%) of the types are a) 17.7, b) 13.9, c) 55.5, d) 11.1, e) 1.8, f) 0.3 (more complicated combinations only 0.5).
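
The six boundary types can be derived mechanically from a marked headword. The sketch below is one way to do it, under stated assumptions: the function name is hypothetical, and treating any name with more than one (/) boundary, or more than one (+) boundary on either side, as a "more complicated combination" is my own reading, not a rule given in the source.

```python
def boundary_type(marked):
    """Return the type letter a-f for a boundary-marked headword.

    Anything outside the six listed patterns is reported as "other"
    (an assumption about the "more complicated combinations").
    """
    if marked.count("/") > 1:
        return "other"
    if "/" not in marked:
        # Single-element toponyms: no marks (a) or one (+) boundary (b).
        return {0: "a", 1: "b"}.get(marked.count("+"), "other")
    before, after = marked.split("/")
    plus_before, plus_after = before.count("+"), after.count("+")
    if plus_before > 1 or plus_after > 1:
        return "other"
    return {
        (0, 0): "c",   # (/)
        (1, 0): "d",   # (+/)
        (0, 1): "e",   # (/+)
        (1, 1): "f",   # (+/+)
    }[(plus_before, plus_after)]

examples = ["Britannia", "Musta+lampi", "Musta/lampi",
            "Mustan+lammin/tie", "Iso/ Musta+lampi",
            "Mustan+lammin/ Sammakko+lammit"]

for name in examples:
    print(boundary_type(name), name)
```

Applied to a whole sample area, counting the returned letters would give a frequency table of the same shape as the one above.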

The structural type and/or formation of the toponym

The field for the structural type and/or formation makes it possible to search for toponyms of different structural and/or formation types. You can, for instance, find toponyms that include another toponym as a place-name specific or generic, toponyms with an added specific, metonymy in namegiving, loan-names, variation and antonymy in namegiving, toponyms consisting of a word stem only, elliptic and epexegetic toponyms, and the kinds of metaphor used in namegiving.

As to the material in the database, about 56% of toponyms have no mark in the field for structural type and/or formation, and about 39% are formed from earlier place-names in the nomenclature of the toponyms database.

Dialectal variant and inflexion

The pronunciation of a toponym is given when needed. The inflexion types occurring in toponyms are coded.

The administrative status (municipality, village)

The municipality is coded with an official number code, and the village by its name. Location on the map (the map number and square) is represented in two ways. A reference to the map sheet according to the base map (1:20,000) of the National Land Survey of Finland is obligatory. Also a reference to the map included in the collection in the Archive of Names is given.

The comments on the filed toponym

The provenance of the toponym is coded as a comment, as are the comments on location, the type of place, the age of a place and/or a toponym, and other comments. The provenance is recorded as it is written in the archive entry, without any comment on whether it is true or not. Folkloristic data is marked as well.

The type of place

The type of place is coded with two character fields, one giving the number code and the other a one-word explanation (mostly a geographical term). Using the number code alone, users of the database can thus select, for instance, settlement names, river names, lake names or field names.

The type of place is classified into 8 main classes, each containing several groups, 43 subgroups altogether. The classification of types of place from the point of view of namegiving was developed in Finland and published by Eero Kiviniemi (3). The main classes are district (10), settlement (20), roads and artefacts (30), agricultural land, hayfields, pastures (40), topography (50), soil and vegetation (60), places by and surrounded by water (70), river and lake systems (80).
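
Because the number codes are two-digit and the main classes are numbered in tens, a main class can be read off the first digit of a code. The sketch below assumes exactly that relationship, which is inferred from the code list in this article rather than stated as a rule. The function name is hypothetical, and apart from Iso Rimpikari (75) the sample codes are assigned here for illustration only.

```python
# Main classes as listed in the text, keyed by their class number.
MAIN_CLASSES = {
    10: "district",
    20: "settlement",
    30: "roads and artefacts",
    40: "agricultural land, hayfields, pastures",
    50: "topography",
    60: "soil and vegetation",
    70: "places by and surrounded by water",
    80: "river and lake systems",
}

def main_class(place_code):
    """Map a two-digit type-of-place code to its main class (assumed rule)."""
    return MAIN_CLASSES[place_code // 10 * 10]

# (headword, code) pairs; only the first code comes from the source.
sample = [("Iso Rimpikari", 75), ("Mustalampi", 81), ("Mustanlammintie", 31)]

# Select every name belonging to main class 70.
by_water = [name for name, code in sample if code // 10 * 10 == 70]

print(main_class(75))   # places by and surrounded by water
print(by_water)         # ['Iso Rimpikari']
```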

THE NUMBER CODES OF THE TYPES OF PLACE USED IN THE TOPONYMS DATA BANK (The Research Institute for the Languages of Finland) (3, 5)

11 settled or administrative areas and places: county, municipality, village, a part of a village, a plot of land; cult site
12 land district: a piece of land (also cultivated or populated) as to its size, form and situation; also boundaries
21 permanent/impermanent/former residence
22 outbuilding (also boat hut, cellar, well); works; buildings
31 road, lane, path, cattleway
32 a stretch of a road, resting place; gate, bridge; crossing; duckboards; harbour
33 artefacts; excavation, ditch, dam, embankment, tar-burning pit, ruin
41 field, field area, a part of field, clearance
42 hayfield, meadow
43 pasture
44 (a former) burnt-over clearing for cultivation
51 elevated surface (mountain, hill), highland
52 cliff, bank, brink; slope
53 sink, depression; wet low-lying land
54 boulder field, rock; cave; ravine etc.
55 even (open) ground or uneven ground
56 narrow passage; bend
61 forestland, wood
62 bog; swamp, marshland; quagmire, bog pool
63 bush, thicket
64 a place according to the composition (and humidity) of its soil
65 a place defined by its vegetation
66 tree
71 peninsula
72 bay
73 shore; alluvial land, flood land
74 isthmus, neck of land
75 island, islet, rock
81 lake, pond; water (system)
82 open lake, open sea; channel, strait, water way
83 deep, shoal, shallow, fishing ground
84 pool, puddle, pond, mud pit
85 river, branch, river head
86 place in a river (not a current)
87 rapid, current 

The frequencies (%) of the Finnish toponyms grouped by the type of place are as follows: settlement (classes 10 - 30) 23, cultivation (class 40) 22, topography and soil (classes 50 - 60) 23, and places by water and river and lake systems (classes 70 - 80) 28.


THE ENTRIES OF THE TOPONYMS DATA BANK (The Research Institute for the Languages of Finland) (6)

The entries are structured so that 33 columns or fields (DEC Rdb) are reserved for each toponym.
Column information

0. entry number: identifies the entry

1. headword: standard-Finnish form

2. headword with the boundaries: standard-Finnish form with the boundaries (and a possible question mark, ?)

3. boundaries: the boundaries of the word component and name element(s)

4. structure: the structural type and/or formation of the toponym (and a possible question mark, ?)

5. dialect: the dialectal form(s) of the toponym; or an optional standard-Finnish form

6. inflection: the code(s) for the inflection

7. municipality: the official number code for municipality

8. former municipality: the official code number for the former municipality

9. village1: the name of the village where the place is situated

10. village2: another village name when needed

11. village help: a possible question mark

12. map1: the number of the map sheet (according to the base map, National Land Survey of Finland)

13. location1: the map square(s) and codes for location (or non-basic map location)

14. map2: a possible number of another map

15. location2: same information as for location 1

16. location help: a possible question mark, ?

17. code of the type of place: the code number for the type of place and a possible question mark

18. type of place: the word for the type of place and a possible description for it

19. code for the former type of place: the code number for the former type of place (when the nature of the place has changed)

20. former type of place: the word for the former type of place

21. comments on the toponym:

&1 the provenance of the toponym

&2 the location of the place

&3 more descriptions on the type of place

&4 the age of the place and/or the toponym

&5 other comments

22. parallel toponym1: the first parallel toponym

23. - description1: the description of the first parallel toponym

24. parallel toponym2: the second parallel toponym

25. - description2: the description of the second parallel toponym

26. parallel toponym help: a possible asterisk (*) for further information available in the toponym corpus or a possible question mark

27. name form in an old document: the oldest name form and the date of the document; a possible asterisk (*) for further information in the corpus

28. related toponym: the related toponym(s)

29. archive reference: refers to the headword in the manual archive of names

30. collector1: the initials of the collector and the year when collected

31. collector2: the initials of the second collector and the year when collected
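
Field 21 in the list above groups five kinds of comment under the markers &1 to &5. The sketch below shows how such a field might be separated into its parts. It assumes, purely for illustration, that the comments are stored as one string with each comment preceded by its marker; the source does not describe the storage format. The function name is hypothetical, and the sample text reuses the comments of the Iso Rimpikari entry.

```python
import re

# Meaning of the comment markers, as listed for field 21.
COMMENT_KINDS = {
    "1": "provenance",
    "2": "location",
    "3": "type of place",
    "4": "age",
    "5": "other",
}

def split_comments(raw):
    """Split a marker-tagged comment string into a dict keyed by kind."""
    # re.split with a capturing group returns:
    # [text before first marker, digit, text, digit, text, ...]
    pieces = re.split(r"&([1-5])\s*", raw)
    digits, texts = pieces[1::2], pieces[2::2]
    return {COMMENT_KINDS[d]: t.strip() for d, t in zip(digits, texts)}

# Assumed storage format, built from the example entry's comments.
raw_field = "&1 reedy &2 near Pikku Rimpikari &3 raised"
comments = split_comments(raw_field)

print(comments)
```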


