The customer is a leading courier company operating in Greece, Cyprus, Albania, and Bulgaria, handling more than 60 million shipments annually.
Our customer's efficiency and profitability depend on determining the geographic positions of postal addresses accurately and rapidly. These positions serve as input to an application that addresses the “Travelling Salesman Problem” and evaluates the shortest courier routes for mail delivery. Geolocating an address means mapping it to a reference dataset of normalized, geo-positioned Greek addresses.
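To illustrate why geo-positions are the required input for routing, here is a minimal sketch of a greedy nearest-neighbour heuristic over latitude/longitude pairs. It is not the customer's routing application, which is not described here; the function names and the coordinates are assumptions made for the example, and the heuristic does not guarantee the shortest route.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def nearest_neighbour_route(depot, stops):
    """Greedy heuristic: always visit the closest unvisited stop next."""
    route, remaining, current = [], list(stops), depot
    while remaining:
        nxt = min(remaining, key=lambda s: haversine_km(current, s))
        remaining.remove(nxt)
        route.append(nxt)
        current = nxt
    return route

# Hypothetical coordinates (lat, lon) for a depot and three delivery stops
depot = (37.9838, 23.7275)
stops = [(37.9715, 23.7267), (38.0000, 23.7300), (37.9750, 23.7350)]
route = nearest_neighbour_route(depot, stops)
```

Every stop must carry a resolved position before such a computation can start, which is why the quality of the address mapping directly affects route quality.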
However, the mapping task is intricate for several reasons: misspelled names, differences in language between input address names and official reference address names, addresses written in Greeklish (Greek transliterated into the Latin alphabet), multiword address names that are only partially given, incorrect postcodes, similar address names in close proximity, extraneous descriptive noise words, OCR errors, and more.
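One common way to reduce the language and Greeklish mismatch is to bring both the input and the reference names to a shared comparison key. The sketch below is an assumption about how such a step could look, not the product's actual normalization: it strips accents, uppercases, and applies a deliberately small transliteration table. Real Greeklish is far less regular (for example, "η" may appear as "h" or "i"), so a table like this only covers the simplest cases.

```python
import unicodedata

# Greek capitals U+0391..U+03A9 (U+03A2 is unassigned), paired with an
# assumed, simplified Latin rendering. The table is illustrative only.
_GREEK_CAPS = [chr(c) for c in range(0x0391, 0x03AA) if c != 0x03A2]
_LATIN = "A V G D E Z I TH I K L M N X O P R S T Y F CH PS O".split()
GREEK_TO_LATIN = dict(zip(_GREEK_CAPS, _LATIN))

def address_key(text):
    """Return an accent-free, uppercase, Latin-alphabet key for comparison."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks such as the Greek tonos and dialytika
    stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    upper = stripped.upper()
    # Treat the digraph omicron + upsilon as "OU" before single-letter mapping
    upper = upper.replace("\u039f\u03a5", "OU")
    latin = "".join(GREEK_TO_LATIN.get(c, c) for c in upper)
    return " ".join(latin.split())  # collapse repeated whitespace

greek_key = address_key("Ερμού 12")      # Greek input
greeklish_key = address_key("ermou 12")  # Greeklish input
```

Both calls yield the key `ERMOU 12`, so the two spellings can be compared directly. Misspellings and OCR errors still need fuzzy comparison on top of such a key.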
The solution leverages text analytics technologies, particularly Named Entity Recognition (NER) techniques, fuzzy matching, and other machine learning algorithms. The address resolution process operates as a pipeline of tasks, starting with parsing and recognizing the key components of an address, such as street, street number, postcode, municipality, and county.
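The parsing step can be pictured with a much simpler, rule-based stand-in. The sketch below uses regular expressions rather than NER, so it is not the technique the solution uses; it only shows the kind of structured output a parser hands to the later pipeline stages. The function name, the field names, and the assumed input layout ("street number, postcode municipality") are hypothetical, and county is left out.

```python
import re

# Street name followed by a house number such as "12" or "5A"
STREET_RE = re.compile(r"\s*(?P<street>\D+?)\s+(?P<number>\d+\w?)\s*$")
# Five-digit postcode, optionally written with a space as "105 63"
POSTCODE_RE = re.compile(r"\b(\d{3})\s?(\d{2})\b")

def parse_address(raw):
    """Split a raw address into components; missing parts come back as None."""
    head, _, tail = raw.partition(",")
    street_match = STREET_RE.match(head)
    if street_match:
        street = street_match.group("street").strip()
        number = street_match.group("number")
    else:
        street, number = head.strip() or None, None
    pc = POSTCODE_RE.search(tail)
    if pc:
        postcode = pc.group(1) + pc.group(2)
        rest = tail[:pc.start()] + " " + tail[pc.end():]
    else:
        postcode, rest = None, tail
    # Whatever remains after the postcode is treated as the municipality
    municipality = " ".join(rest.replace(",", " ").split()) or None
    return {"street": street, "number": number,
            "postcode": postcode, "municipality": municipality}

parsed = parse_address("Ermou 12, 105 63 Athina")
```

Fixed rules like these break quickly on noise words, missing commas, or reordered components, which is the kind of variation a trained NER model is meant to absorb.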
Following the parsing step, fuzzy matching and mapping tasks prioritize and rank search results, giving precedence to misspelled names and incorrect postcodes. Each mapped address candidate carries flags that describe the matching process in detail, including whether all input words were processed and matched, whether fuzzy matching was applied to the street name, and whether the postcode matched, among others. These flags prove invaluable in post-processing, where they help adjust the ranking of candidate addresses when the application returns more than one candidate.
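A minimal sketch of ranking candidates and attaching flags, assuming the street name has already been parsed and normalized. It uses `difflib.SequenceMatcher` from the Python standard library as the similarity measure; the reference rows, the flag names, the 0.7 threshold, and the 0.1 postcode bonus are all hypothetical choices for illustration, not values taken from the solution.

```python
from difflib import SequenceMatcher

# Hypothetical reference rows; a real reference dataset would also hold coordinates
REFERENCE = [
    {"street": "ERMOU", "postcode": "10563"},
    {"street": "ERMI", "postcode": "10563"},
    {"street": "AIOLOU", "postcode": "10551"},
]

def match(street, postcode, reference=REFERENCE, threshold=0.7):
    """Return candidates above the similarity threshold, best first, with flags."""
    candidates = []
    for row in reference:
        sim = SequenceMatcher(None, street.upper(), row["street"]).ratio()
        if sim < threshold:
            continue  # too dissimilar to be a plausible candidate
        flags = {
            "street_exact": sim == 1.0,
            "street_fuzzy": sim < 1.0,
            "postcode_match": postcode == row["postcode"],
        }
        # A postcode match nudges the score up but is not required
        score = sim + (0.1 if flags["postcode_match"] else 0.0)
        candidates.append({"address": row, "score": round(score, 3), "flags": flags})
    return sorted(candidates, key=lambda c: c["score"], reverse=True)

# Misspelled street with a wrong postcode still resolves to a candidate
results = match("Ermuo", "10564")
```

Because a misspelled street and a wrong postcode only lower the score instead of excluding the row, the candidate survives, and the flags record exactly which parts matched so that a later post-processing step can re-rank or reject it.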
Our application can be used in three modes. Firstly, it can be integrated as a library, allowing clients to build various types of applications, including desktop, web, or batch applications. Secondly, it can run as a batch application that processes addresses stored in files or database tables and stores the results in either files or databases. Lastly, it can be deployed as a web service under Apache Tomcat, providing a concise, real-time API for integration.
Input mail addresses are recognized and matched to reference mail addresses with a precision above 95%. The remaining share of up to 5% consists of addresses that are either unknown (not present in the reference database) or significantly ambiguous.