Within a software system I developed for Service Employees International Union, we encountered three distinct contexts in which scored and heuristic matching was needed. The implementation of each approach captures many of the trade-offs that similar systems face.
In one context, the requirement was to quickly identify records that are "probably the same entity (person)" as one seen in a prior dataset. An individual dataset might contain on the order of 100,000 records, but the historical data describes a few million entities. For this operation, it was essential to perform the record correlation quickly, at several thousand records evaluated per second, and it was desirable that any stored data omit PII (personally identifiable information).
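One way to satisfy both constraints, shown as a minimal sketch below, is to persist only one-way hashes of normalized identity fields, so that membership checks are fast and the stored artifact contains no PII. The field names and normalization here are hypothetical illustrations, not the production schema:

```python
import hashlib

def entity_fingerprint(first: str, last: str, dob: str) -> str:
    """One-way fingerprint over normalized identity fields.

    Only this digest is persisted, so the stored data carries no PII.
    """
    normalized = "|".join(s.strip().lower() for s in (first, last, dob))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Prior entities live in a set of digests; membership tests are O(1),
# so checking ~100,000 incoming records against millions of stored
# fingerprints easily sustains thousands of evaluations per second.
known = {entity_fingerprint("Dana", "Smith", "1980-01-02")}
print(entity_fingerprint("DANA ", "smith", "1980-01-02") in known)  # True
```

An exact-hash scheme only catches records that normalize identically; a fuzzier variant can store several alternate digests per entity (for example, one keyed on a phonetic encoding of the name) at the cost of more stored hashes.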
In another context, we wished to identify likely typos or spelling variations among company names. The number of underlying companies is merely in the thousands, not the hundreds of thousands or millions, and the speed requirement was less demanding. This permitted an algorithm built on string similarity metrics, with scoring based partly on the concordance of other fields, as sketched below.
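A minimal sketch of that blended scoring, using the standard library's SequenceMatcher as the string similarity metric; the field names and weights are hypothetical assumptions for illustration:

```python
from difflib import SequenceMatcher

def company_match_score(a: dict, b: dict) -> float:
    """Blend company-name similarity with concordance of other fields.

    Field names and weights are illustrative, not production values.
    """
    name_sim = SequenceMatcher(
        None, a["name"].lower(), b["name"].lower()
    ).ratio()
    # Agreement on auxiliary fields raises confidence that a near-miss
    # name is a typo rather than a genuinely different company.
    fields = ("city", "state", "zip")
    concordance = sum(
        a.get(f) is not None and a.get(f) == b.get(f) for f in fields
    ) / len(fields)
    return 0.7 * name_sim + 0.3 * concordance

rec1 = {"name": "Acme Widgets", "city": "Toledo", "state": "OH", "zip": "43604"}
rec2 = {"name": "Acme Widgts", "city": "Toledo", "state": "OH", "zip": "43604"}
print(round(company_match_score(rec1, rec2), 3))  # high: likely a typo
```

With only thousands of companies, even all-pairs comparison at this cost is feasible, which is exactly why the looser speed requirement matters in this context.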
In a third context, we had access to a large external data source with information about nearly all United States voters (and many non-voters), amounting to 380 million person records (some duplicating an identifier). The external source is largely canonical, but it is common for internal data to differ from it in nicknames, changed last names, changed or variant spellings of addresses, and other fields. Again, the goal was to produce scored matches identifying "the person most likely referenced by the internal data" as efficiently as possible.
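The usual pattern for a source this large is blocking followed by scoring: a cheap key narrows 380 million candidates to a handful, and only those are scored against the tolerant metrics. A minimal sketch, again using SequenceMatcher and a tiny hypothetical nickname table; all field names, keys, and weights are illustrative assumptions:

```python
from difflib import SequenceMatcher

# A tiny stand-in for a nickname table; real lookup lists are far larger.
NICKNAMES = {"bill": "william", "liz": "elizabeth", "bob": "robert"}

def canonical_first(name: str) -> str:
    n = name.strip().lower()
    return NICKNAMES.get(n, n)

def block_key(rec: dict) -> tuple:
    """Coarse blocking key; only external records sharing it are scored.

    Built from fields that rarely differ between sources (date of birth,
    ZIP), so nickname and last-name variants survive the blocking step.
    """
    return (rec["dob"], rec["zip"][:5])

def person_score(internal: dict, external: dict) -> float:
    first_sim = SequenceMatcher(
        None,
        canonical_first(internal["first"]),
        canonical_first(external["first"]),
    ).ratio()
    addr_sim = SequenceMatcher(
        None, internal["address"].lower(), external["address"].lower()
    ).ratio()
    last_same = internal["last"].lower() == external["last"].lower()
    # Weights are illustrative; a changed last name hurts but is not fatal.
    return 0.4 * first_sim + 0.3 * addr_sim + 0.3 * last_same

internal = {"first": "Bill", "last": "Jones", "address": "12 Oak St",
            "zip": "43604", "dob": "1975-06-01"}
external = {"first": "William", "last": "Jones", "address": "12 Oak Street",
            "zip": "43604-1234", "dob": "1975-06-01"}
assert block_key(internal) == block_key(external)
print(round(person_score(internal, external), 3))  # high despite nickname
```

The design trade-off is that anything used in the blocking key cannot be forgiven by the scorer, so keys should come from the most stable fields while the variant-prone ones (first names, last names, addresses) are left to the tolerant scoring stage.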