Universal Numerical Fingerprint (UNF)¶

Contents:

alternate text

Fig.1 UNF: used to uniquely identify and verify data.

Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within. A cryptographic hash of that normalized (or canonicalized) representation is then computed. The signature is thus independent of the storage format. E.g., the same data object stored in, say, SPSS and Stata, will have the same UNF.

Early versions of Dataverse were using the first released implementation of the UNF algorithm (v.3, implemented in R). Starting with Dataverse 2.0 and throughout the 3.* lifecycle, UNF v.5 (implemented in Java) was used. Dataverse 4.0 uses the latest release, UNF v.6. Two parallel implementation, in R and Java, will be available, for cross-validation.

Learn more: Micah Altman, Jeff Gill and Michael McDonald, 2003, Numerical Issues in Statistical Computing for the Social Scientist, New York: John Wiley.