Dune cdr the Single-Phase Protodune

Date: 24.12.2017

Chapter 5: Software and Computing
5.2 Data storage and management system
5.2.1 Data characteristics
The TPC data drive the requirements for the raw ProtoDUNE-SP data. The ProtoDUNE-SP Data Scenarios
spreadsheet [6] provides details on the numbers below and on a few alternative running conditions.
Table 5.1 summarizes the nominal estimates.
Table 5.1: Estimates of nominal raw data parameters driving the design for the raw data storage and
management.

Parameter                Estimate
In-spill trigger rate    25 Hz
Avg. trigger rate        10 Hz
Channels                 15,360
Readout time             5 ms
Compression              4×
Compressed event         60 MByte
Instantaneous rate       1.5 GByte/s
Average rate             600 MByte/s
Total triggers           52 M
Total volume             3 PB

The average trigger rate over the entire beam cycle assumes that one out-of-spill trigger from the
Cosmic Ray Trigger (CRT) system (Section 6.4) is acquired for every in-spill trigger due to the
beam. The assumed compression factor of 4 can be achieved even with levels of excess noise similar
to those experienced in the first year of MicroBooNE running [24]. If, as expected, no excess noise
is present, a compression factor of 6 to 8 is anticipated.
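
The rate and volume entries of Table 5.1 follow from the trigger rates and the compressed event
size. The short Python sketch below reproduces that arithmetic as a cross-check. It uses only
values taken from the table; the function names are illustrative, and the spill fraction it derives
is an inference from the two trigger rates and the one-to-one CRT assumption, not a number quoted
in this section.

```python
# Cross-check of the nominal estimates in Table 5.1 (all inputs copied from the table).
IN_SPILL_TRIGGER_HZ = 25.0   # in-spill trigger rate
AVG_TRIGGER_HZ = 10.0        # average trigger rate over the beam cycle
COMPRESSED_EVENT_MB = 60.0   # compressed event size
TOTAL_TRIGGERS = 52e6        # total number of triggers


def data_rates():
    """Return (instantaneous MByte/s, average MByte/s, total volume in PB)."""
    instantaneous_mb_s = IN_SPILL_TRIGGER_HZ * COMPRESSED_EVENT_MB
    average_mb_s = AVG_TRIGGER_HZ * COMPRESSED_EVENT_MB
    # Decimal units are assumed here: 1 PB = 1e9 MByte.
    total_pb = TOTAL_TRIGGERS * COMPRESSED_EVENT_MB / 1e9
    return instantaneous_mb_s, average_mb_s, total_pb


def implied_spill_fraction():
    """Fraction of the beam cycle spent in spill implied by the two trigger rates,
    assuming exactly one out-of-spill trigger per in-spill trigger."""
    in_spill_average_hz = AVG_TRIGGER_HZ / 2.0
    return in_spill_average_hz / IN_SPILL_TRIGGER_HZ


if __name__ == "__main__":
    inst, avg, total = data_rates()
    print(f"instantaneous rate: {inst / 1000:.1f} GByte/s")
    print(f"average rate:       {avg:.0f} MByte/s")
    print(f"total volume:       {total:.2f} PB")
    print(f"implied spill fraction: {implied_spill_fraction():.2f}")
```

The total evaluates to slightly more than 3 PB, consistent with the rounded value in the table.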
5.2.2 Raw data flow
A conceptual diagram of the raw data flow is presented in Figure 5.1. It reflects the central role of
the CERN storage service EOS in the raw data management scheme. The LHC experiments have gained
long-term experience with EOS, which has proven to be performant and reliable. EOS serves as the
staging area from which the data are committed to CASTOR (a hierarchical storage management system
developed at CERN) and from which data are transmitted to a number of endpoints, including
principal data centers such as Fermilab. This scheme mirrors the one used by the LHC experiments
for their much larger data samples. EOS is also used to provide input to data quality monitoring
(DQM) and will be available for personal ad-hoc analyses.
ProtoDUNE Single-Phase Technical Design Report

Figure 5.1: Conceptual diagram of the flow of raw data in ProtoDUNE-SP.
5.3 Prompt Processing for Data Quality Monitoring
As described in Section 2.9.10, the first point at which data quality monitoring occurs is directly
inside the DAQ Online Monitoring (OM). The DAQ computing cluster hardware is relatively high
performance and has access to the full, high-rate data stream. It is therefore well suited to
monitoring algorithms that require small amounts of CPU and a large fraction of the data.
At the other end of the spectrum, some monitoring algorithms have large CPU requirements but
produce meaningful feedback from relatively little data. Running these algorithms on commodity
cluster hardware is more cost effective. These jobs are managed by a special-purpose system called
the “protoDUNE prompt processing system” (p3s). Unlike traditional batch systems, it limits the
maximum latency of results at the cost of not processing 100% of the data. The p3s is portable
to many native batch systems and is fully user-configurable. It can run multiple independent sets
of interdependent jobs in parallel, to the extent allowed by the available CPU resources and
subject to the dependencies between jobs.
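
The two ideas in this paragraph, dependency-aware parallel execution and bounded latency, can be
illustrated with a minimal Python sketch. It is not the p3s implementation: the function names,
the job names in the usage note and the age cut are hypothetical, and only the Python standard
library (version 3.9 or later) is assumed.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+


def select_fresh(arrival_times, now, max_age_s):
    """Keep only inputs newer than max_age_s; stale inputs are skipped, not queued.

    This trades complete data throughput for bounded latency.
    """
    return [name for name, t in arrival_times.items() if now - t <= max_age_s]


def run_workflow(dependencies, actions, max_workers=4):
    """Run one set of interdependent jobs in parallel.

    dependencies: dict mapping a job name to the set of job names it depends on.
    actions: dict mapping every job name to a zero-argument callable.
    Returns the job names in the order in which they completed.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    finished = []
    running = {}  # future -> job name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every job whose dependencies are already satisfied.
            for job in sorter.get_ready():
                running[pool.submit(actions[job])] = job
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                job = running.pop(future)
                future.result()  # re-raise any exception from the job
                sorter.done(job)  # unblocks jobs that depend on this one
                finished.append(job)
    return finished
```

With a hypothetical workflow such as
`{"signal_processing": {"decode"}, "hit_finding": {"signal_processing"}, "noise_monitor": {"decode"}}`,
`decode` runs first, after which `signal_processing` and `noise_monitor` may run concurrently.
Independent workflows can be handled by calling `run_workflow` once per workflow.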
The prompt processing is expected to sample about 1% of the most recent data just after they are
saved by the DAQ and become available to the hardware on which the processing runs. An initial
estimate finds that at least 300 dedicated cores will be required to achieve this. This estimate
must be refined as a comprehensive list of monitoring algorithms is developed.
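
As a rough reading of these numbers, the sketch below converts the sampling fraction and the core
count into a CPU budget per sampled trigger. It assumes that the 1% sample is taken from the
average trigger rate of Table 5.1 and that all cores are kept fully busy; both are assumptions
made here for illustration, and the names are not taken from any existing code.

```python
AVG_TRIGGER_HZ = 10.0     # average trigger rate, Table 5.1
SAMPLED_FRACTION = 0.01   # "about 1%" of the data
DEDICATED_CORES = 300     # initial estimate (a lower bound)


def cpu_budget_per_sampled_trigger():
    """Core-seconds available per sampled trigger if all cores are fully used."""
    sampled_rate_hz = AVG_TRIGGER_HZ * SAMPLED_FRACTION
    return DEDICATED_CORES / sampled_rate_hz


if __name__ == "__main__":
    print(f"{cpu_budget_per_sampled_trigger():.0f} core-seconds per sampled trigger")
```

Under these assumptions the estimate corresponds to a budget of order a few thousand core-seconds
per sampled trigger.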
The prompt jobs are developed in the familiar form of offline software modules for LArSoft
(Section 5.5). The processing is expected to include algorithms from signal processing through to
full reconstruction. Because prompt processing jobs and DAQ OM modules share the underlying art
framework, it will be easy to migrate algorithms between the two contexts if computing-hardware
resources become constrained.
5.4 Production processing
The second major user of the raw data is the production processing. It will make several passes
over 100% of the raw data as algorithms improve. It consists principally of event reconstruction,
which feeds into user analysis, and it may involve a data-reduction step prior to reconstruction.
A data-reduction scheme has been developed [24]; its implementation in the production processing
chain is contingent on the availability of resources.
Starting from the signal regions of interest (ROIs), there are two basic approaches to
reconstruction, which are described in the following sections. The first starts by fitting multiple
Gaussian distributions to the waveforms; the second retains their binned structure.
5.5 The LArSoft framework for simulation and reconstruction
LArSoft [25] is a suite of tools for simulating and reconstructing data collected from LArTPC
detectors. It is built on the art [26] event-processing framework. The main features of the art
framework are its configurability through human-readable, editable control files written in the
Fermilab Hierarchical Configuration Language (FHiCL), and its scheduling of program module
execution. Modules are of five types: event sources, filters, data-product producers, analyzers,
and output. Common utilities that can be accessed by any program module at any time are called
services.
The art framework defines the input/output structure of ROOT-formatted files, using TTrees to
store the data, metadata, and provenance information. The provenance information consists of the
contents of the FHiCL documents used to steer the job that created the output file, together with
those of the input files and their parents.
The art framework’s division of the simulation and reconstruction jobs into modular pieces allows
multiple developers to contribute to an effort, and to test their ideas in isolation before
integrating them into a larger system. Because the data read in from an event are placed in
read-only memory, developers can program with confidence that other algorithms cannot alter
existing data; instead, algorithms must produce additional data products, which can later be
processed or written out.
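
The module types and the read-only treatment of event data can be illustrated with a toy Python
model. This sketch is purely conceptual: art itself is a C++ framework configured through FHiCL,
and none of the class or function names below belong to the art or LArSoft API.

```python
from types import MappingProxyType


class Event:
    """Toy event: data products can be added but never replaced."""

    def __init__(self, products):
        self._products = dict(products)

    def get(self, label):
        return self._products[label]

    def put(self, label, product):
        if label in self._products:
            raise KeyError(f"data product {label!r} already exists")
        self._products[label] = product

    @property
    def products(self):
        return MappingProxyType(self._products)  # read-only view of all products


def source(n_events):
    """Event source: yields events holding an immutable 'raw' product."""
    for i in range(n_events):
        yield Event({"raw": tuple(range(i, i + 4))})


def producer(event):
    """Producer: adds a new data product derived from an existing one."""
    event.put("sum", sum(event.get("raw")))


def filter_module(event):
    """Filter: decides whether downstream modules see this event."""
    return event.get("sum") > 6


def analyzer(event, histogram):
    """Analyzer: reads data products without adding to the event."""
    histogram.append(event.get("sum"))


def run(n_events):
    """Schedule the modules in a fixed order; the last step acts as the output module."""
    histogram, written = [], []
    for event in source(n_events):
        producer(event)
        if not filter_module(event):
            continue
        analyzer(event, histogram)
        written.append(dict(event.products))
    return histogram, written
```

In this model, calling `put` with an existing label raises an error, so a module that needs a
modified version of a product has to store it under a new label.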
The LArSoft suite provides the interface to the event generators and to Geant4 [27] for simulating
the passage of particles through the detector (Section 5.6), as well as event reconstruction
(Section 5.7).
The art framework and LArSoft source code are publicly available, and pre-built versions are