Toponyms Data Bank
RAIJA MIIKKULAINEN (1999)
A research project begun in 1990 at the Research Institute for the Languages of Finland aims to create a computerized database for the study of Finnish place-names, incorporating field work materials from the Archive of Names as a research tool for scholars in a number of disciplines.
Modern Finnish toponymics is primarily based on systematic field work. Field work on toponyms has been carried out since the 1870s, and since the 1960s and 1970s it has been conducted more systematically with the help of steadily improving methods and maps (1). Saving new field collections as electronic data began in the mid-1980s, and so the electronic archive of place-names was founded. Today the Archive of Names at the Research Institute for the Languages of Finland contains about 2.5 million geographical names in Finnish for the region of Finland and nearby territories. The Finland-Swedish place-names are not included in the archive of Finnish names.
Since computerizing the archive material was considered important, it was decided to compile an electronic database covering part of the completed toponym collections (4). At present, the Toponyms Data Bank is one of the projects at the Research Department of the Research Institute for the Languages of Finland, and two or three onomasts work on it each year. About 25% of the project personnel study problems relating to basic research.
One of the primary arguments for the database is its capacity to respond to modern onomastic theories and to those of the future. The project could therefore not be based on a single strict theoretical framework. From the very first planning stages it was clear that the database should allow maximum flexibility in integrating the manual archive as well as in searching for relevant material for onomastic planning and guidance and for onomastic and other research work.
Despite the argument above, the development of the toponym database is strongly influenced by the toponomastic theory developed in Finland by Kurt Zilliacus (2) and Eero Kiviniemi (3). The theory of analyzing toponyms syntactically and semantically is applied in the database. Particularly prominent are the concept of toponyms as a system, as part of the greater nomenclature to which each toponym belongs, and the concept of syntactically and semantically defined place-name generics and specifics.
The toponym database has to allow the integration of different kinds of toponymic data, for instance data thoroughly classified for different studies. The database is intended to form a body of material that is useful for many purposes and compatible with many kinds of operating systems.
The primary aim is to allow the maximum flexibility in interactive searching.
We want to be able to find lexical elements of toponyms no matter where
they occur in the database, as well as suffixes in toponyms; we also want
to analyze the distribution of the structure of the place-names, analyze
the nomenclature of a particular area, for instance a river and lake system
or a village; we want to be able to find toponyms of particular types of
place, e.g. those of islands and islets, swamps and marshland, rivers,
meadows etc.; we also want to be able to know what kind of places are important
for the name givers generally and regionally and how those places are named,
and what kind of toponymic systems are used: do they differ from each other
and how, have the toponyms changed and how, what are the parallel toponyms
The Archive of Names, organised by parish, has, as mentioned above, unique toponymic material of 2.5 million names based on oral sources. This large corpus can be used widely, and world-wide when needed, only as electronic data, and a computerized database will enable scholars to search the archive more efficiently. As collating such a large archive electronically demands resources not currently available, it was decided to computerize 10% of the material first. Our goal is to collect 250,000 toponymic entries covering 40 regions. By the end of 1995, the systematized database contained roughly 100,000 entries covering 14 areas.
The database of toponyms is intended to form a regional sample of the toponymic collection at the Archive of Names. The primary requirements for the regional sample were as follows. The sample region had to have been collected in its entirety with the aid of maps. From a national point of view, it was important that different regions be equally represented, because the administrative type (town or parish) of the sample region and its geographical situation and features, e.g. the density of its river and lake system, are essential in Finnish nomenclatures. Dialectal representativeness was also one of the primary criteria.
The Toponyms Data Bank offers three alternative modes of use, according to the need of users. The software applications for toponymic data have been developed at the Research Institute for the Languages of Finland.
First, the database for general access is DEC Rdb (a relational database), and searches are carried out with interactive SQL (Structured Query Language). SQL is a sophisticated search technique that allows data from different types of record within a database to be combined, and it is flexible in the range of search strategies it allows. Its terminology is not very transparent, so an extensive user's guide (6) with several examples for querying the database is available. More suitable user interfaces are still needed, although the flexibility of SQL is an advantage.
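To give an idea of what such an SQL search looks like, the following is a minimal sketch in Python using the standard sqlite3 module rather than DEC Rdb. The table name, the column names and the reduced set of fields are assumptions made for illustration only. The first row follows the example entry given in this paper; the other two rows are illustrative and are not real archive data.

```python
import sqlite3

# Hypothetical, heavily reduced schema; the real database reserves
# 33 fields per toponym and runs on DEC Rdb, not SQLite.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE toponym (
        entry_no     INTEGER PRIMARY KEY,
        headword     TEXT,
        municipality INTEGER,
        village      TEXT,
        place_code   INTEGER,
        place_type   TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO toponym VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Iso Rimpikari", 609, "Yyteri", 75, "saari"),
        # Illustrative rows only, not taken from the archive.
        (2, "Pikku Rimpikari", 609, "Yyteri", 75, "saari"),
        (3, "Mustalampi", 609, "Yyteri", 81, "lampi"),
    ],
)


def find_by_place_code(connection, code, ending=""):
    """Return headwords with the given type-of-place code whose
    headword ends in `ending` (an empty ending matches everything)."""
    cursor = connection.execute(
        "SELECT headword FROM toponym "
        "WHERE place_code = ? AND headword LIKE ? "
        "ORDER BY headword",
        (code, "%" + ending),
    )
    return [row[0] for row in cursor]


# All island names (code 75) ending in "kari".
print(find_by_place_code(conn, 75, "kari"))
```

The query combines a condition on the type of place with a condition on the word ending, which is the kind of combined search the flexibility of SQL makes possible.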
Second, the toponym data is available, for the time being to the staff of the Research Institute only, in the text database TRIP, where searches are carried out with CCL (Common Command Language).
Third, the same toponym data, with more information than the databases contain, is available as a corpus for users who need, for example, fuller information or copies of the data for their research. The toponymic data belongs to the corpora of the Research Institute for the Languages of Finland. Text corpora are available for e.g. modern Finnish, Old Finnish, dialects and names. Several programs are available for using the data: SEARCH and AGREP for approximate string matching, and SEGREP for SGML-coded data.
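Approximate string matching is useful with toponyms because dialectal variants and old document spellings often differ from the headword by a letter or two. The sketch below illustrates the general idea with a plain edit-distance comparison in Python. It is not the algorithm or the interface of AGREP or SEARCH: the function names are hypothetical, and it compares whole words, whereas a tool such as AGREP also matches within longer lines.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions turning `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


def fuzzy_find(pattern: str, names: list, max_errors: int = 1) -> list:
    """Return the names within `max_errors` edits of `pattern`,
    ignoring letter case."""
    wanted = pattern.lower()
    return [
        name for name in names
        if edit_distance(wanted, name.lower()) <= max_errors
    ]


# An older spelling with "b" still finds the modern form with "p".
print(fuzzy_find("rimbikari", ["isorimpikari", "rimpikari", "mustalampi"]))
```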
Organization of entries and data compilation
Each sample area included in the database usually covers a municipality or two neighbouring municipalities which have been collected in their entirety. For the database (DEC Rdb) the entries are structured so that 33 columns or fields are reserved for each toponym. More detailed information about the fields is available in the Appendix.
Each filed toponym is stored in the database with the following information:
- a) toponym in its standard-Finnish form
- b) boundaries of the word components
- c) structural type and/or formation of the toponym
- d) dialectal variant
- e) inflection
- f) administrative status (municipality, village)
- g) location on the map (the map number and square)
- h) type of place
- i) provenance of the toponym
- j) comments on location
- k) comments on type of place
- l) parallel toponyms
- m) related toponyms
- n) name of document.
As an example entry, consider an island called Iso Rimpikari in the village of Yyteri in the city of Pori:
- a) Iso Rimpikari [Great Rimpikari]
- b) Iso/ Rimpi+kari [Great/ Rimpi+kari]
- c) td [= a component added to a toponym]
- d) isorimpikari
- e) s [inessive]
- f) 609 Yyteri [town Pori and village Yyteri]
- g) 114207 (6): 5E 5
- h) 75 saari [island]
- i) reedy [dialectal rimpi = 'reed']
- j) near Pikku Rimpikari
- k) raised
- l) Rimpikari, Rimpikarit, 1805 Rimbi Karin Luodot (MHA A85 17/2)
- m) Pikku Rimpikari [Little Rimpikari]
- n) RH68 [the collector's initials and the year 1968 when collected].
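The fields a) to n) can be pictured as a simple record. The following Python sketch is only an illustration of that structure: the class name and attribute names are hypothetical and do not reproduce the actual column names of the database. The values are those of the Iso Rimpikari entry above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToponymEntry:
    """One filed toponym; attribute names are illustrative only."""
    headword: str            # a) standard-Finnish form
    segmented: str           # b) headword with component boundaries
    structure: str           # c) structural type and/or formation
    dialect_variant: str     # d) dialectal variant
    inflection: str          # e) inflection code
    municipality_code: int   # f) official municipality number
    village: str             # f) village name
    map_ref: str             # g) map number and square
    place_code: int          # h) number code for the type of place
    place_type: str          # h) one-word explanation
    provenance: str = ""         # i)
    location_comment: str = ""   # j)
    place_comment: str = ""      # k)
    parallel_toponyms: List[str] = field(default_factory=list)  # l)
    related_toponyms: List[str] = field(default_factory=list)   # m)
    document: str = ""           # n) collector's initials and year


entry = ToponymEntry(
    headword="Iso Rimpikari",
    segmented="Iso/ Rimpi+kari",
    structure="td",
    dialect_variant="isorimpikari",
    inflection="s",
    municipality_code=609,
    village="Yyteri",
    map_ref="114207 (6): 5E 5",
    place_code=75,
    place_type="saari",
    provenance="reedy",
    location_comment="near Pikku Rimpikari",
    place_comment="raised",
    parallel_toponyms=[
        "Rimpikari",
        "Rimpikarit",
        "1805 Rimbi Karin Luodot (MHA A85 17/2)",
    ],
    related_toponyms=["Pikku Rimpikari"],
    document="RH68",
)
print(entry.headword, entry.place_code, entry.place_type)
```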
Each toponym included in the data bank is available in its normalized spelling. This form serves as a headword and makes the data accessible to non-specialists.
The place-name generics and specifics, and other elements, are marked with boundaries based on the concept of the syntactically and semantically defined place-name element. There are two levels of boundary: the (/)-boundary between place-name generics and specifics, and the (+)-boundary between other elements. Suffixes are not separated as elements in the data bank.
The boundary types used in the database are coded as follows:
- A. the single-element toponyms
- a) no marks, e.g. Britannia [= Britain]
- b) (+) Musta+lampi [= ´Black+ Lake´, a village]
- B. the toponyms with a place name generic and specific
- c) (/) Musta/lampi [= ´Black/ Lake´, a lake], Iso/-Britannia [= ´Great/-Britain´]
- d) (+/) Mustan+lammin/tie [= ´Black+ Lake/ Road´, a road]
- e) (/+) Iso/ Musta+lampi [= ´Great/ Musta+lampi´, a lake]
- f) (+/+) Mustan+lammin/ Sammakko+lammit [= ´Black+ Lake/ Froggy+ Ponds´, froggy ponds near the Black Lake].
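The boundary coding above can be processed mechanically. As a minimal sketch, assuming that "/" and "+" occur in a headword only as boundary marks, the following Python functions derive the boundary type of a marked headword and split it into its parts. The function names are hypothetical and are not part of the data bank software.

```python
def boundary_pattern(marked: str) -> str:
    """Return the sequence of boundary marks in a headword,
    e.g. '+/' for 'Mustan+lammin/tie' and '' for 'Britannia'."""
    return "".join(char for char in marked if char in "/+")


def split_elements(marked: str) -> list:
    """Split a marked headword first at (/)-boundaries and then at
    (+)-boundaries. Surrounding spaces and hyphens are stripped."""
    return [
        [component.strip(" -") for component in part.split("+")]
        for part in marked.split("/")
    ]


for name in ["Britannia", "Musta/lampi", "Mustan+lammin/tie",
             "Iso/ Musta+lampi", "Mustan+lammin/ Sammakko+lammit"]:
    print(name, boundary_pattern(name), split_elements(name))
```

A search for all toponyms of type (+/), for instance, then amounts to comparing the result of `boundary_pattern` with the string "+/".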
The structural type and/or formation of the toponym
The field for the structural type and/or formation facilitates searches for toponyms of different structural and/or formation types. You can, for instance, find toponyms that include another toponym as a place-name specific or generic, toponyms with an added specific, metonymy in namegiving, loan-names, variation and antonymy in namegiving, toponyms consisting of a word stem only, elliptic and epexegetic toponyms, and the kinds of metaphor used in namegiving.
As to the material in the database, about 56% of toponyms have no mark
in the field for structural type and/or formation of the toponym and about
39% are formed from former place- names in the nomenclature of the toponyms
Dialectal variant and inflexion
The pronunciation of a toponym is given when needed. The inflexion types
occurring in toponyms are coded.
The administrative status (municipality, village)
The municipality is coded with an official number code, and the village
by its name. The location on the map (the map number and square) is represented
in two ways. A reference to the map sheet according to the base map (1:20,000)
of the National Land Survey of Finland is obligatory. Also a reference
to the map included in the collection in the Archive of Names is given.
The comments on the filed toponym
The provenance of the toponym is coded as a comment, like the comments
on location, the type of place, the age of a place and/or a toponym, and
other comments. The provenance of the toponym is recorded as it is written
in the archive entry, with no comment on whether it is true or not. Folkloristic
data is marked, too.
The type of place
The type of place is coded in two character fields, one giving the number code and the other a one-word explanation (mostly a geographical term). With a single number code for the type of place, users of the database can thus select, for instance, settlement names, river names, lake names or field names.
The type of place is classified into 8 main classes, each of which contains several groups, altogether 43 subgroups. The classification of the type of place from the point of view of namegiving was developed in Finland and published by Eero Kiviniemi (3). The main classes are district (10), settlement (20), roads and artefacts (30), agricultural land, hayfields, pastures (40), topography (50), soil and vegetation (60), places by and surrounded by water (70), and river and lake systems (80).
THE NUMBER CODES OF THE TYPES OF PLACE USED IN THE TOPONYMS DATA BANK (The Research Institute for the Languages of Finland) (3, 5)
- 10 DISTRICT
- 11 settled or administrative areas and places: county, municipality, village, a part of a village, a plot of land; cult site
- 12 land district: a piece of land (also cultivated or populated) as to its size, form and situation; also boundaries
- 20 SETTLEMENTS
- 21 permanent/impermanent/former residence
- 22 outbuilding (also boat hut, cellar, well); works; buildings
- 30 ROADS; ARTEFACTS
- 31 road, lane, path, cattleway
- 32 a stretch of a road, resting place; gate, bridge; crossing; duckboards; harbour
- 33 artefacts; excavation, ditch, dam, embankment, tar-burning pit, ruin
- 40 AGRICULTURAL LAND, HAYFIELDS, PASTURES
- 41 field, field area, a part of field, clearance
- 42 hayfield, meadow
- 43 pasture
- 44 (a former) burnt-over clearing for cultivation
- 50 TOPOGRAPHY
- 51 elevated surface (mountain, hill), highland
- 52 cliff, bank, brink; slope
- 53 sink, depression; wet low-lying land
- 54 boulder field, rock; cave; ravine etc.
- 55 even (open) ground or uneven ground
- 56 narrow passage; bend
- 60 SOIL, VEGETATION
- 61 forestland, wood
- 62 bog; swamp, marshland; quagmire, bog pool
- 63 bush, thicket
- 64 a place according to the composition (and humidity) of its soil
- 65 a place defined by its vegetation
- 66 tree
- 70 PLACES BY AND SURROUNDED BY WATER, COASTAL PLACES
- 71 peninsula
- 72 bay
- 73 shore; alluvial land, flood land
- 74 isthmus, neck of land
- 75 island, islet, rock
- 80 RIVER AND LAKE SYSTEMS
- 81 lake, pond; water (system)
- 82 open lake, open sea; channel, strait, water way
- 83 deep, shoal, shallow, fishing ground
- 84 pool, puddle, pond, mud pit
- 85 river, branch, river head
- 86 place in a river (not a current)
- 87 rapid, current
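Because the first digit of a two-digit code identifies the main class, a group code can be mapped to its main class arithmetically. The following Python sketch illustrates this under the assumption that codes are handled as integers. The function names and the sample data are hypothetical.

```python
MAIN_CLASSES = {
    10: "district",
    20: "settlements",
    30: "roads; artefacts",
    40: "agricultural land, hayfields, pastures",
    50: "topography",
    60: "soil, vegetation",
    70: "places by and surrounded by water, coastal places",
    80: "river and lake systems",
}


def main_class(code: int) -> int:
    """Return the main class (10, 20, ... 80) of a type-of-place code."""
    base = code - code % 10
    if base not in MAIN_CLASSES:
        raise ValueError(f"unknown type-of-place code: {code}")
    return base


def select_by_main_class(entries, wanted: int) -> list:
    """Pick the headwords whose place code falls in the wanted main class.
    `entries` is an iterable of (headword, place_code) pairs."""
    return [name for name, code in entries if main_class(code) == wanted]


# Illustrative pairs; only the first comes from the example entry above.
sample = [("Iso Rimpikari", 75), ("Mustalampi", 81), ("Mustanlammintie", 31)]
print(MAIN_CLASSES[main_class(75)])
print(select_by_main_class(sample, 70))
```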
The frequencies (%) of the Finnish toponyms grouped by the type of place
are as follows: settlement (classes 10 - 30) 23, cultivation (class 40)
22, topography and soil (classes 50 - 60) 23, and places by water and river
and lake systems (classes 70 - 80) 28.
The entries are structured so that 33 columns or fields (DEC RdB) are reserved
for each toponym.
THE ENTRIES OF THE TOPONYMS DATA BANK (The Research Institute for the Languages of Finland) (6)
0. entry number: identifies the entry
1. headword: standard-Finnish form
2. headword with the boundaries: standard-Finnish form with the boundaries (and a possible question mark, ?)
3. boundaries: the boundaries of the word component and name element(s)
4. structure: the structural type and/or formation of the toponym (and a possible question mark, ?)
5. dialect: the dialectal form(s) of the toponym; or an optional standard-Finnish form
6. inflection: the code(s) for the inflection
7. municipality: the official number code for municipality
8. former municipality: the official code number for the former municipality
9. village1: the name of the village where the place is situated
10. village2: another village name when needed
11. village help: a possible question mark
12. map1: the number of the map sheet (according to the base map, National Land Survey of Finland)
13. location1: the map square(s) and codes for location (or non-basic map location)
14. map2: a possible number of another map
15. location2: same information as for location 1
16. location help: a possible question mark, ?
17. code of the type of place: the code number for the type of place and a possible question mark
18. type of place: the word for the type of place and a possible description for it
19. code for the former type of place: the code number for the former type of place (when the nature of the place has changed)
20. former type of place: the word for the former type of place
21. comments on the toponym:
&1 the provenance of the toponym
&2 the location of the place
&3 more descriptions on the type of place
&4 the age of the place and/or the toponym
&5 other comments
22. parallel toponym1: the first parallel toponym
23. - description1: the description of the first parallel toponym
24. parallel toponym2: the second parallel toponym
25. - description2: the description of the second parallel toponym
26. parallel toponym help: a possible asterisk (*) for further information available in the toponym corpus or a possible question mark
27. name figure in an old document: the oldest name figure and the date of the document; a possible asterisk (*) for further information in the corpus
28. related toponym: the related toponym(s)
29. archive reference: refers to the headword in the manual archive of names
30. collector1: the initials of the collector and the year when collected
31. collector2: the initials of the second collector and the year when collected
1. Eeva Maria Närhi, The Onomastic Central Archive – the Foundation of Finnish Onomastics. In Finnish Onomastics – Namenkunde in Finnland, Studia Fennica – Review of Finnish Linguistics and Ethnology 34 (Helsinki, SKS, Finnish Literary Society) 1990, 9–25.
2. See e.g. Kurt Zilliacus, The Place-Names in 'The Skerries'. In Finnish Onomastics – Namenkunde in Finnland, Studia Fennica – Review of Finnish Linguistics and Ethnology 34 (Helsinki, SKS, Finnish Literary Society) 1990, 122–129.
3. Eero Kiviniemi, Perustietoa paikannimistä (Suomi 149. Helsinki, SKS, Finnish Literary Society) 1990, 55–85.
4. Raija Miikkulainen, Die Finnische Ortsnamenbank. In Proceedings of the XVIIth International Congress of Onomastic Sciences. Volume 2. (Helsinki, The University of Helsinki and The Finnish Research Centre for Domestic Languages) 1990, 171–178.
5. Raija Miikkulainen, Kotimaisten kielten tutkimuskeskuksen paikannimitietokannan kirjoittaminen (Helsinki, Kotimaisten kielten tutkimuskeskus) 1993, 23 p. and appendix 13 p. [Writer's guide.]
6. Raija Miikkulainen and Tarmo Rahikainen, Kotimaisten kielten tutkimuskeskuksen paikannimitietokannan käyttö (Helsinki, Kotimaisten kielten tutkimuskeskus) 1993, 22 p. and appendix 3 p. [User's guide.]
Based on the paper read at the XIXth International Congress of Onomastic Sciences Aberdeen, August 4-11, 1996.
Miikkulainen, Raija 1998. The Database of Finnish Toponyms. In W. F. H. Nicolaisen (ed.). Proceedings of the XIXth International Congress of Onomastic Sciences Aberdeen, August 4-11, 1996. Volume 2. Department of English, University of Aberdeen. P. 248 - 255.