Record linkage is the process of identifying and linking records across several files/databases that refer to the same entity. It is also referred to as data cleaning, de-duplication (when applied to a single file/database), object identification, approximate matching or approximate joins, fuzzy matching, and entity resolution. A formal description of the record linkage problem can be found here.
The number of files/databases available in digital format, such as customer lists, patient lists and census data, keeps growing. Record linkage is useful in many applications that need to combine information from two or more files. It can also be used for data cleaning, to find duplicate records within a single file. Research on this topic dates back to the 1960s. Because the problem is hard both computationally and in terms of data quality, research is ongoing and several problems remain open. This page points to some of this research, providing information about books, papers and available software.
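To make approximate (fuzzy) matching and de-duplication concrete, here is a minimal sketch. It is not taken from any of the research listed on this page. It assumes each record is a hypothetical `(id, name)` pair, uses the similarity ratio from Python's standard `difflib.SequenceMatcher` as the comparison, and treats a pair as a match above a hypothetical threshold of 0.85. Practical record linkage usually compares several fields and reduces the number of candidate pairs (for example by blocking). This sketch does neither, so it compares every pair, which illustrates why the computational cost mentioned above matters.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    # Lower-case and collapse runs of whitespace so "John  SMITH" and "john smith" compare equal
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    # difflib similarity ratio in [0, 1]; 1.0 means identical after normalization
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link(file_a, file_b, threshold=0.85):
    """Link records across two files.

    Returns (id_a, id_b, score) for every pair whose name similarity reaches
    the (hypothetical) threshold. Every pair is compared, so the cost grows
    as len(file_a) * len(file_b).
    """
    matches = []
    for id_a, name_a in file_a:
        for id_b, name_b in file_b:
            score = similarity(name_a, name_b)
            if score >= threshold:
                matches.append((id_a, id_b, round(score, 3)))
    return matches


def deduplicate(records, threshold=0.85):
    """De-duplication: link a single file against itself, comparing each
    unordered pair of distinct records exactly once."""
    matches = []
    for i, (id_i, name_i) in enumerate(records):
        for id_j, name_j in records[i + 1:]:
            score = similarity(name_i, name_j)
            if score >= threshold:
                matches.append((id_i, id_j, round(score, 3)))
    return matches


if __name__ == "__main__":
    # Hypothetical example data, not real customer or patient records
    customers = [("a1", "John Smith"), ("a2", "Mary Jones")]
    patients = [("b1", "Jon Smith"), ("b2", "Peter Brown")]
    print(link(customers, patients))  # [('a1', 'b1', 0.947)]
    print(deduplicate([("r1", "Ann Lee"), ("r2", "Anne Lee"), ("r3", "Bob Ray")]))  # [('r1', 'r2', 0.933)]
```

The same comparison function serves both tasks: linking two files and, when a file is linked against itself, finding duplicates. The threshold trades missed matches against false matches and would need tuning on real data.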
  • Papers
