rules, the variables have been collapsed at the surname level, and only surnames
with a frequency of 5 or more have been included. We define earnings as
total income net of real estate income, while real wealth has been estimated from
real estate income.[7]
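The imputation of real wealth summarized in footnote 7 can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function names, the use of NumPy least squares, the inclusion of a constant, and the choice to enter real estate income in levels are all assumptions, and the simple exponentiation of the fitted log value ignores any retransformation correction.

```python
import numpy as np

def design_matrix(age, female, real_estate_income):
    # Assumed specification: constant, age, gender dummy, real estate income in levels.
    return np.column_stack([np.ones(len(age)), age, female, real_estate_income])

def fit_log_wealth(age, female, real_estate_income, real_assets):
    """Step 1 (survey data): regress log real assets on the covariates, keep the coefficients."""
    X = design_matrix(age, female, real_estate_income)
    coef, *_ = np.linalg.lstsq(X, np.log(real_assets), rcond=None)
    return coef

def impute_wealth(coef, age, female, real_estate_income):
    """Step 2 (tax records): apply the stored coefficients to impute real wealth."""
    X = design_matrix(age, female, real_estate_income)
    return np.exp(X @ coef)
```

In this sketch the first function would be run on the survey sample and the second on the tax records, which share the same covariates.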
Examining the persistence across centuries in certain professions, as in
equation (3), requires additional datasets, because the tax records do not contain
information on professions. We proceeded as follows. First, we have
individual-level data, drawn from the Italian Internal Revenue Service, on the
universe of taxpayers in the province of Florence in 2005, for whom we observed
only surnames. Second, we merged this dataset with the public registers containing the
surnames of lawyers, bankers, medical doctors and pharmacists, and goldsmiths.
For example, suppose that there are $n$ taxpayers with a certain surname and that
we know that there are $n_1$ lawyers and $n_2$ bankers with the same surname. We
assumed, without loss of generality, that the first $n_1$ individuals are lawyers and
the next $n_2$ are bankers (obviously, with $\sum_j n_j < n$). The public archives for these
professions are the following: bankers are taken from the ORgani SOciali (ORSO)
archive, which is managed by the Bank of Italy and contains registry information
on the members of the governing bodies of banks (we restrict the analysis to
Tuscan banks, as Tuscany is the Italian region where Florence is located); lawyers,
doctors and pharmacists come from the archives of the local professional orders;
finally, the National Business Register database contains registry information on
the members of the governing bodies of goldsmith firms and shops (again, we
focused on surnames in the Florence area).
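The within-surname assignment of professions described above can be illustrated with a short sketch. The function name and the data layout (an ordered mapping from profession to the number of registered members with that surname) are hypothetical and serve only to illustrate the idea that, since taxpayers sharing a surname are indistinguishable, the first block is labeled with the first profession, the next block with the second, and the remainder is left unlabeled.

```python
def assign_professions(n_taxpayers, counts):
    """Label taxpayers sharing one surname with professions, block by block.

    counts: ordered mapping profession -> number of register entries with this
    surname (hypothetical layout). The text states the counts sum to less than
    the number of taxpayers; here we only reject sums that exceed it.
    """
    if sum(counts.values()) > n_taxpayers:
        raise ValueError("more register entries than taxpayers for this surname")
    labels = []
    for profession, n in counts.items():
        labels.extend([profession] * n)  # next n individuals get this profession
    labels.extend([None] * (n_taxpayers - len(labels)))  # everyone else: no listed profession
    return labels
```

For instance, with five taxpayers, two lawyers and one banker, the first two individuals are labeled lawyers, the third a banker, and the last two are left without a listed profession.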
3.2 The origin and distribution of surnames
Pseudo-links between ancestors and their descendants are generated using
(implicitly) geographical localization – since we consider people living in Florence
in both samples – and exploiting the informational content of surnames.
Italian surnames have some interesting peculiarities. They are inherited
from one generation to the next through the patriline, and most Italians began to
assume hereditary surnames in the 15th century. Some surnames derived from
the father’s name (patronymics) through the use of the Latin genitive (e.g.
Mattei means son of Matteo)[8] or were formed by the preposition “di”/“de” followed
by the name (e.g. Di Matteo or De Matteo, meaning the son of Matteo). The origin or
residence of the family gave rise to many surnames, based on the habitat – Della
Valle (i.e. “of the valley”) – specific places – Romano (i.e. “Roman”) – or nearby
landmarks – Piazza (i.e. “square”). Occupations (or utensils associated with the
occupation) were also a widespread source of surnames, such as Medici (“medical
doctors”), Martelli (“hammers”) or Forni (“ovens”). Finally, nicknames, typically
referring to physical attributes, also gave rise to some family names, e.g. Basso
(“short”) or Grasso (“fat”). The huge variety of surnames was amplified by the
extraordinary linguistic diversity of Italy. Many surname endings are
region-specific, such as “-n” in Veneto (e.g. Benetton), “-iello” in Campania (e.g. Borriello),
“-u” or “-s” in Sardinia (e.g. Soru and Marras) and “-ai” or “-ucci” in Tuscany (e.g.
Bollai and Balducci).
[7] Specifically, from the biennial Survey of Household Income and Wealth carried out by the Bank of
Italy (we used the waves from 2000 to 2012), we selected people living in the province of Florence,
we regressed the log of real assets on age, gender and income from buildings (actual and
imputed rents), and we stored the coefficients. Then, we imputed real wealth for the individuals
included in the tax records using age, gender, real estate income and the coefficients estimated and
stored above.
For our purposes, the context we analyzed has two striking features. First, Italy
has a very large number of surnames, likely one of the largest collections of
surnames of any ethnicity in the world. This is associated with high
fractionalization: for example, the 100 most frequent surnames account for only
7% of the overall population, against 22% in England. Second, and
unsurprisingly, the surnames present in our data are highly Florence-specific: on
average, the ratio between the surname share in Florence and the corresponding
share at the national level, a specialization index centered on 1, is
nearly 6. Therefore, the informational content of surnames is presumably much
higher than elsewhere, supporting our empirical strategy for identifying
the pseudo-links.
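The two statistics quoted above (the population share covered by the most frequent surnames, and the average ratio of local to national surname shares) can be computed along the following lines. This is a minimal sketch under assumed inputs: the function names and the data layout (a list of surnames, one per person, and dictionaries of shares keyed by surname) are hypothetical.

```python
from collections import Counter

def top_k_share(surnames, k=100):
    """Share of the population carrying one of the k most frequent surnames."""
    counts = Counter(surnames)
    return sum(c for _, c in counts.most_common(k)) / sum(counts.values())

def mean_specialization(local_shares, national_shares):
    """Average over local surnames of (local share / national share).

    A value of 1 means a surname is as common locally as nationally;
    values above 1 indicate local over-representation.
    """
    ratios = [local_shares[s] / national_shares[s] for s in local_shares]
    return sum(ratios) / len(ratios)
```

The sketch takes an unweighted average across surnames; whether the figure reported in the text is weighted by surname frequency is not stated, so that choice is an assumption.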
The creation of the pseudo-links between the two samples through surnames
has been pursued with some necessary degree of flexibility to account for slight
modifications in the surnames across the centuries. For example, current
taxpayers with surnames such as “Mattei”, “De Matteo” or “Di Matteo” are all
considered descendants of “Matteo”.
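A flexible match of this kind could be sketched as follows. The rules shown here (lower-casing, stripping a leading “di”/“de”, and a small hand-made table of variants) are purely hypothetical illustrations built around the example in the text; the actual matching rules used to build the pseudo-links are not spelled out in this section.

```python
import re

# Hypothetical table mapping surname variants to a common root.
VARIANTS = {"mattei": "matteo"}

def surname_root(surname):
    """Map a modern surname to an assumed ancestral root."""
    s = surname.strip().lower()
    s = re.sub(r"^(di|de)\s+", "", s)  # drop a leading preposition "di"/"de"
    return VARIANTS.get(s, s)          # fold known variants, else keep as is
```

Under these assumed rules, “Mattei”, “De Matteo” and “Di Matteo” all map to the same root and would therefore be linked to the same ancestor surname.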
3.3 Descriptive analysis
In the 1427 Census, there are about 10,000 families (1,900 surnames),
corresponding to nearly 40,000 individuals. The descriptive statistics reported in
Table 1 refer to household heads. The earnings and real wealth were equal, on
average, to 36 and 291 florins, respectively. Moreover, the two variables were
[8] The large number of Italian surnames ending in “-i” is also due to the medieval habit of identifying
families by the name of the ancestors in the plural (which takes an “-i” suffix in Italian).