The automatic and accurate interlinking of geospatial data poses an important scientific challenge, with direct applications in several business fields. The key requirement is high accuracy in identifying the same or similar entities across datasets. For example, in a cadastral database, it is crucial that land parcels gathered from several different databases are uniquely and unambiguously identified. As another example, a geomarketing company must be able to accurately cross-reference the locations and addresses of customers and businesses so that they can be properly targeted.

LinkGeoML aims to research, develop and extend machine learning methods that exploit the vast amount of available open geospatial data, in order to implement automated, highly accurate algorithms for interlinking geospatial entities. The proposed methods will introduce novel training features, derived from domain knowledge and from the analysis of open and proprietary geospatial datasets. They will also extend and specialize machine learning models for classification and similarity learning. The resulting technologies will be released as open-source software and integrated into existing commercial applications for cadastration, geocoding and geomarketing, with the aim of improving their functionality, increasing their commercial value, and broadening their application domains.

LinkGeoML is a partnership between enterprises and research organizations that aims to perform high-quality industrial research with a twofold purpose: to provide SMEs with useful geospatial data integration tools for solving real-world problems, and to advance the state of the art in machine learning methods for geospatial data integration. To this end, LinkGeoML identifies use cases based on real-world integration problems elicited by its two industrial partners, and investigates how machine learning-based interlinking methods can be applied to these use cases to facilitate their handling on real-world data. Check out our initially defined use cases, as well as our first results.