et al.,
2017, 2022).
Chapter 1
SETTING THE STAGE
Italy, 30 March 2022, Italian National Coastguard officer monitors vessels fishing.
© FAO/Cristiano Minichiello
Before discussing challenges in data systems for FSN, it is important to lay out the key definitions and conceptual framework that will guide this analysis.
DEFINING KEY TERMS
In the title of this report and in the following sections we use the concepts of data, analysis tools and data governance, which may mean different things to different readers. A clear statement of how we define and use these terms is thus critical to avoid confusion about the intended meaning of the statements, implicit or explicit value judgements, and recommendations we present in the rest of the report.
DEFINING DATA
A variety of definitions of data can be found in the popular and scientific literature, many of which include facts, statistics or knowledge, among a variety of related terms. Several definitions emphasize the numeric aspects, while others recognize that data may also take other forms. For this report, we adopt a definition inspired by Kitchin (2021, p. 2), which states that data are:
any set of codified symbols representing units of information regarding specific aspects of the world that can be captured or generated, recorded, stored, and transmitted in analogue or digital form.
At first glance, this phrasing may seem overly complex, yet it differs substantively from many other existing definitions for at least four reasons.
First, the expression codified symbols allows a meaningful description of data without use of the terms fact or knowledge. Knowledge and facts are indeed inferences that can be gleaned following consolidation, analysis and interpretation of data, in relation to a specific question in context (Zins, 2007), but are not, in themselves, data. It is only once such inferences are codified, recorded, stored and transmitted that they become new data, thus closing the circle and justifying the image of a data cycle that evolves into an ascending spiral where, at the completion of each cycle, the amount of data and information available for use and re-use grows.
Second, the use of codified symbols is appropriately inclusive language, as it makes it clear that data do not necessarily need to be numeric. While in many cases data represent measured quantities or proportions, the increased digitalization of information means that we often deal with datasets consisting of essentially qualitative information, stored as text, images, sounds and other forms.
Third, referring to data as codified symbols has the further advantage of making the importance of codifying explicit: symbols used to record and store data must be chosen carefully and their meaning must be properly
DATA COLLECTION AND ANALYSIS TOOLS FOR FOOD SECURITY AND NUTRITION
communicated. One problem that is too often encountered in human and social science contexts is treating data that are essentially qualitative as if they represented measured quantities. The problem arises with indexes or scores that correspond to counts of binary (yes/no) events and are therefore codified as integers, even though they contain only ordinal information on the cases involved. In those cases, the numeric representation encourages incorrect treatment of such indexes or scores, with analysts computing averages or other statistics that are meaningful and appropriate only for interval measures. Such scores or indexes should instead be treated as ordinal measures.⁶
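The distinction between ordinal and interval treatment can be illustrated with a minimal Python sketch. The scores below are invented for illustration (each is a count of "yes" answers per household), and the function name `ordinal_summary` is hypothetical. The sketch reports only statistics that rely on rank order, namely the lower median and the share of cases in each category, and deliberately avoids the arithmetic mean.

```python
from collections import Counter
from statistics import median_low

# Hypothetical scores: number of "yes" answers (0 to 4) given by each household.
scores = [0, 0, 1, 1, 1, 2, 4, 4, 4]


def ordinal_summary(values):
    """Summarize ordinal scores using only their rank order."""
    ordered = sorted(values)
    counts = Counter(ordered)
    n = len(ordered)
    # Share of cases in each category: no assumption about distances between levels.
    shares = {level: counts[level] / n for level in sorted(counts)}
    # median_low always returns an observed value, so no averaging of levels occurs.
    return {"median": median_low(ordered), "shares": shares}


summary = ordinal_summary(scores)
print(summary["median"])
print(summary["shares"])
```

The lower median is chosen here because, for an even number of cases, the ordinary median would average the two middle values, which already presumes equal spacing between score levels.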
More generally, qualitative data must be coded following standardized coding procedures, which inevitably begin with the adoption of clear operational definitions of the concepts, constructs or attributes captured by the data. This is crucial to avoid ambiguity in interpreting data, but not always easy to achieve. Unlike quantitative variables reflecting unambiguously defined attributes of the physical world (e.g. length or mass) that can be directly observed and measured, most qualitative data in social science consist of variables and indicators intended to reflect concepts or constructs that are not always defined unambiguously or understood in the same way by everyone. Think, for example, of the concepts of gender or ethnicity, or constructs such as poverty or food insecurity. This poses several philosophical and practical challenges, as even the apparently simple process of recording data might entail active decision-making regarding which value to record, and that decision may even have moral implications (e.g. deciding on a person’s ethnicity simply by observing them walking down the street or looking at a photograph of them, or on the basis of their name, or by asking the respondent’s opinion in a survey; or identifying poverty with monetary levels of disposable income; or food insecurity with inadequate dietary energy intake).
These considerations point to the importance of always accompanying data with clear metadata that provide sufficient information on the assumptions made in producing them, and of ensuring that sufficient competence exists to interpret them correctly at all levels of the data cycle when the data are used to inform decisions. The continuing development of sophisticated analytic methods, both in statistics and data science, necessary for proper treatment of non-traditional data, creates a growing need for human resources skilled in the use of such methods, as we shall discuss in more detail in Chapter 3, and stresses the importance of investing in training and education, especially in the current era of big data and the newly emerging data science (see, for example, Oliver, 2021).
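One way to picture the role of metadata described above is the following minimal Python sketch. The class `VariableMetadata`, its fields, the variable name and the code labels are all hypothetical and do not reproduce any official classification. The sketch only shows how an operational definition, a collection method, a measurement scale and a codebook can travel with a coded variable so that stored symbols are interpreted as intended.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VariableMetadata:
    """Hypothetical metadata record accompanying one coded variable."""
    name: str
    operational_definition: str
    collection_method: str
    scale: str  # one of "nominal", "ordinal", "interval", "ratio"
    codes: dict = field(default_factory=dict)


def decode(value, meta):
    """Translate a stored code into its documented meaning."""
    if value not in meta.codes:
        raise ValueError(f"code {value!r} is not documented for {meta.name}")
    return meta.codes[value]


# Illustrative codebook; the labels are assumptions made for this example.
meta = VariableMetadata(
    name="food_insecurity_level",
    operational_definition="Category assigned from self-reported experiences",
    collection_method="Household survey, respondent-reported",
    scale="ordinal",
    codes={0: "none", 1: "moderate", 2: "severe"},
)

print(decode(1, meta))
```

Rejecting undocumented codes, rather than passing them through, is a design choice that makes gaps in the codebook visible at the point of use.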
Fourth, an important part of the definition of data is that data are generated, recorded, stored and transmitted so that, unless artificial barriers are put in place to prevent it, they can be accessed repeatedly and by different users at the same time at little or no additional cost to the owner of the data. This is because, when data are used, they continue to exist and to be available and useful. They are neither appropriated nor consumed. Hence, if we want to ensure their efficient use, there are strong arguments for promoting access that is as open as possible to any set of existing relevant data. As the issue of open access to data may be controversial, and in light of the ever-increasing amount of data being generated and held by private entities and the growth of markets for data, we devote a specific section to this topic in Chapter 5, where we note how the generation of data has outpaced the consolidation of relevant moral and ethical considerations and their reflection in appropriate national and international legal arrangements.
DEFINING ANALYSIS TOOLS
Another potentially ambiguous expression used throughout the report is analysis tool. In the context of this report, it is interpreted quite generally as:
A set of formal rules⁷ used to guide the processing of available data, aimed at obtaining analytic results for a specific purpose or research question.
6 For an enlightening discussion on the incorrect interpretations of counts, indexes and scores as measures in human and social sciences, see Wright, 1999.
Several aspects in this definition of analysis tool warrant discussion. First, by stressing that analysis is conducted on existing data, we implicitly distinguish data analysis from data generation in a conceptual data cycle. We recognize that the results of an analysis are often, and usefully, stored and remain available in the form of new data, so that they can be used for further and different analyses. We also explicitly recognize that, in some cases, existing data may be perceived as insufficient to address the problem at hand, and may therefore lead to a call for generating new data. Nevertheless, it is useful to distinguish the two steps from a conceptual point of view because, especially in the era of big data, roles and responsibilities for data collection, curation and dissemination are very often distinct from roles and responsibilities in the use of data for evidence-based action. The latter entails decisions regarding which data to use to inform actions aimed at addressing a specific problem and how to analyse such data. These decisions can be made by agents who have had no direct involvement in the collection of primary data.
This leads to another aspect highlighted in the definition above, namely that effective analysis tools are specific, in the sense that they must be properly designed to respond to well-defined questions. While general analytic methods and specific techniques for data treatment exist (for example, ordinary least squares methods to estimate the parameters of a linear regression in an econometric analysis, or pile-sort methods to collect and highlight associations in data collected for an anthropological study) and are necessary components of any analytic tool, these should never be confused with the analytic tool itself. Insisting on the need for specificity of the analytic tool should encourage analysts to carefully consider the problem at hand and select the kind of data needed to answer the question, choosing the most appropriate combination of analytic methods and techniques for data treatment, and (very importantly) present and discuss the various assumptions made in setting up the analytic model. Unfortunately, we have found a discouraging paucity of examples of good analysis tools specific to food security and nutrition, despite a relative abundance of data and of qualitative and quantitative analytic methods and techniques.
7 Rules encompass procedures and techniques belonging to different methods of inquiry, both quantitative and qualitative, as appropriate, depending on the nature of the data and the objective of the analysis.
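To make the difference between a general method and a specific analysis tool concrete, the following Python sketch implements ordinary least squares for a single regressor. The data are synthetic and the function name `ols_fit` is hypothetical. On its own this is only a method: it says nothing about which question is being asked, which data are appropriate, or whether the assumption of a linear relationship is defensible, all of which a specific analysis tool would need to state.

```python
def ols_fit(x, y):
    """Estimate intercept and slope of y = a + b*x by ordinary least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary for the slope to be identified")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic, exactly linear data used only to illustrate the computation.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
intercept, slope = ols_fit(x, y)
print(intercept, slope)
```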
The final aspect that the above definition emphasizes is that the rules that define the analysis tool must be formalized. That is, they should be explicitly and clearly described in a way that makes application of the analysis tool replicable, consistent and susceptible to scrutiny by reviewers.