Data Matching
After reading “Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection,” I gained a clearer understanding of the challenges involved in working with real-world data. The book highlights that data is often incomplete, inconsistent, and noisy, and that data matching aims to identify records that refer to the same real-world entity under these imperfect conditions. Rather than focusing on a single algorithm, the book presents data matching as a structured process consisting of data preprocessing, indexing, similarity comparison, match classification, and evaluation.
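To make the five stages concrete for myself, here is a minimal sketch in Python. It is not code from the book: the toy records, the choice of city as a blocking key, the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.75 threshold are all assumptions picked only for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records; ids 1-3 are assumed to be the same person.
records = [
    {"id": 1, "name": "John Smith ", "city": "Boston"},
    {"id": 2, "name": "john smith", "city": "boston"},
    {"id": 3, "name": "Jon Smyth", "city": "Boston"},
    {"id": 4, "name": "Mary Jones", "city": "Denver"},
]
true_matches = {(1, 2), (1, 3), (2, 3)}  # assumed ground truth


def preprocess(rec):
    """Preprocessing: lowercase and collapse whitespace."""
    return {
        **rec,
        "name": " ".join(rec["name"].lower().split()),
        "city": rec["city"].strip().lower(),
    }


def block_key(rec):
    """Indexing (blocking): only records sharing a key are compared."""
    return rec["city"]


def similarity(a, b):
    """Comparison: string similarity of the names, in [0, 1]."""
    return SequenceMatcher(None, a["name"], b["name"]).ratio()


def match(records, threshold=0.75):
    """Classification: a simple threshold rule on the similarity score."""
    blocks = defaultdict(list)
    for rec in map(preprocess, records):
        blocks[block_key(rec)].append(rec)

    predicted = set()
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if similarity(a, b) >= threshold:
                predicted.add(tuple(sorted((a["id"], b["id"]))))
    return predicted


def evaluate(predicted, truth):
    """Evaluation: precision and recall over predicted record pairs."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


predicted = match(records)
precision, recall = evaluate(predicted, true_matches)
print(predicted, precision, recall)
```

Blocking on city keeps the number of pairwise comparisons small, but it would miss true matches whose city values differ, which is the kind of trade-off between efficiency and recall that the indexing step has to manage.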
Entity Resolution
“(Almost) All of Entity Resolution” provides a comprehensive literature review of the entity resolution problem, which aims to systematically identify and merge multiple records that refer to the same real-world entity across noisy and heterogeneous data sources. The paper traces the history of the field from early probabilistic linkage methods in the mid-20th century to modern probabilistic and supervised machine learning approaches. It discusses deterministic and similarity-based techniques, extensions of classical probabilistic frameworks, clustering-based resolution methods, and recent advances in uncertainty quantification. Through practical examples spanning census data, human rights records, citation networks, and medical data, it highlights both the theoretical and practical challenges of entity resolution.
