In a new article, "Optimal F‑score Matching for Bipartite Record Linkage," published in Statistics and Computing (July 2025), Eric A. Bai, Olivier Binette, and DUPRI Scholar Jerome P. Reiter propose a novel approach to record linkage that directly optimizes the F‑score — the harmonic mean of precision and recall.
Probabilistic record linkage is often used to match records from two files, particularly when the variables common to both files are identifiers, such as names and demographic variables, that are measured with occasional errors. Traditional record linkage methods often produce probabilities over potential matches, leaving users to choose a final linkage structure based on somewhat arbitrary decision rules. This new method instead selects the matching that maximizes the expected F‑score while ensuring a one-to-one correspondence between records. The authors demonstrate its effectiveness through simulations and real-world data, showing improved accuracy and interpretability over existing techniques. Code and data are publicly available on GitHub.
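For readers unfamiliar with the metric, the short Python sketch below shows how an F‑score can be computed for a candidate set of links when the true matches are known. It is only an illustration of the metric, not the authors' method: the paper optimizes the expected F‑score when the truth is unknown, which is not shown here. The function names `f_score` and `is_one_to_one` and the toy record indices are hypothetical.

```python
def f_score(predicted, truth):
    """Harmonic mean of precision and recall for a set of predicted links.

    Each link is a pair (record index in file A, record index in file B).
    """
    predicted, truth = set(predicted), set(truth)
    true_pos = len(predicted & truth)  # links that are both predicted and true
    if true_pos == 0:
        return 0.0  # convention used here: no correct links gives a score of 0
    precision = true_pos / len(predicted)  # share of predicted links that are correct
    recall = true_pos / len(truth)         # share of true links that were found
    return 2 * precision * recall / (precision + recall)


def is_one_to_one(links):
    """Check that no record from either file appears in more than one link."""
    left = [a for a, _ in links]
    right = [b for _, b in links]
    return len(set(left)) == len(left) and len(set(right)) == len(right)


# Toy data (made up for illustration): four true matches, three predicted links.
truth = {(0, 0), (1, 1), (2, 2), (3, 3)}
predicted = {(0, 0), (1, 1), (2, 3)}

score = f_score(predicted, truth)
print(f"one-to-one: {is_one_to_one(predicted)}, F-score: {score:.3f}")
```

In this toy case two of the three predicted links are correct (precision 2/3) and two of the four true matches are recovered (recall 1/2), so the F‑score is 4/7.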
Read more: https://doi.org/10.1007/s11222-025-10701-y