Places in the Long Tail


Maué, P.




Local variations of place names are typically absent from official place name directories (gazetteers). Spatially aware information retrieval tools and navigation systems rely on these gazetteers to resolve place names into coordinates. Place names also appear frequently in user-generated content on the Web, for example as tags describing the content of a picture. Local variations are expected to lie in the long tail of the frequency distribution of terms used in such user-generated annotations. This thesis explores whether these annotations are a useful source for automatically acquiring place names that are not officially recognised. It presents a methodology that analyses point patterns to assess whether a term refers to a place. These point patterns represent the spatial and temporal coverage of a term. A cluster analysis further separates the point patterns that capture the extents of ambiguous place names. The resulting dictionary of place names can be a valuable source for any application in which users are expected to use place names in search queries.
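
The idea of testing a term's point pattern for spatial compactness can be illustrated with a minimal sketch. This is not the methodology of the thesis. It assumes that each use of a term comes with a (latitude, longitude) pair, groups nearby points with a naive single-linkage rule, and keeps groups whose standard distance is small. The function names and the thresholds `max_gap_km`, `max_spread_km` and `min_points` are hypothetical choices made for illustration only, and the temporal coverage mentioned above is ignored.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def cluster_points(points, max_gap_km):
    """Naive single-linkage grouping (quadratic time, illustration only).

    A point joins, and thereby merges, every existing cluster that holds
    at least one point within max_gap_km of it.
    """
    clusters = []
    for p in points:
        merged, rest = [p], []
        for c in clusters:
            if any(haversine_km(p, q) <= max_gap_km for q in c):
                merged.extend(c)
            else:
                rest.append(c)
        clusters = rest + [merged]
    return clusters


def standard_distance_km(points):
    """Root mean squared distance of the points from their mean centre.

    The centre is the arithmetic mean of the coordinates, which is only
    reasonable for small clusters away from the poles and the antimeridian.
    """
    lat_c = sum(p[0] for p in points) / len(points)
    lon_c = sum(p[1] for p in points) / len(points)
    centre = (lat_c, lon_c)
    return math.sqrt(
        sum(haversine_km(p, centre) ** 2 for p in points) / len(points)
    )


def candidate_extents(points, max_gap_km=5.0, max_spread_km=2.0, min_points=3):
    """Return the compact clusters of a term's point pattern.

    An empty result suggests the term is not used like a place name. More
    than one cluster suggests an ambiguous name shared by several places.
    All three thresholds are assumed values, not taken from the thesis.
    """
    return [
        c for c in cluster_points(points, max_gap_km)
        if len(c) >= min_points and standard_distance_km(c) <= max_spread_km
    ]
```

A term tagged at two well-separated groups of locations would yield two candidate extents under these assumptions, while a term whose points are scattered widely would yield none. Suitable thresholds depend on the data set and on the scale of the places of interest.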