Cloud Customer Architecture for Big Data and Analytics Version 0
Data Quality Analysis:
This capability allows data quality rules (stored and maintained in the
Information Management & Governance component) to be applied to data during ingestion and
transformation, and for quality measures to be stored as metadata associated with the data sets
in the analytics environment.
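As an illustration only, the sketch below shows one way quality rules could be applied at ingestion time and the resulting measures stored as metadata alongside the data set. The names `QualityRule`, `DataSet` and `ingest_with_quality`, and the choice of pass/fail counts as the quality measure, are assumptions made for this example and are not defined by the architecture.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Record = dict[str, Any]


@dataclass
class QualityRule:
    """Hypothetical rule, as might be maintained by a governance component."""
    name: str
    check: Callable[[Record], bool]  # returns True when the record passes


@dataclass
class DataSet:
    """Hypothetical data set: records plus metadata describing them."""
    records: list[Record]
    metadata: dict[str, Any] = field(default_factory=dict)


def ingest_with_quality(records, rules) -> DataSet:
    """Apply every rule to every record and attach the measures as metadata.

    Records are kept even when they fail a rule; only the measures are recorded.
    """
    records = list(records)
    total = len(records)
    measures = {}
    for rule in rules:
        passed = sum(1 for r in records if rule.check(r))
        measures[rule.name] = {
            "passed": passed,
            "failed": total - passed,
            "pass_rate": passed / total if total else None,
        }
    return DataSet(records=records, metadata={"quality": measures})


# Example usage with two illustrative rules.
rules = [
    QualityRule("customer_id_present", lambda r: r.get("customer_id") is not None),
    QualityRule(
        "amount_non_negative",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    ),
]
dataset = ingest_with_quality(
    [
        {"customer_id": 1, "amount": 10.0},
        {"customer_id": None, "amount": 5.0},
        {"customer_id": 3, "amount": -2.0},
    ],
    rules,
)
print(dataset.metadata["quality"])
```

Keeping failing records and recording only the measures is a design choice of this sketch; a real pipeline might instead reject or quarantine such records.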
Data Repositories
The Data Repositories component is a set of secure data repositories that allows data to be stored for
subsequent consumption by analytics tools and users. These repositories form the heart of the analytics
environment. The repositories within this component may vary from a single Hadoop repository or
Enterprise Data Warehouse, to multiple repositories used for different purposes by different analytical
tools. Note that operational and transactional data stores (such as OLTP and ECM systems) are not included in
this component. Instead, they form part of the Data Sources component.
Types of data repositories include:
• Landing Zone & Data Archive is typically the initial location for data ingested
from source systems, where raw data is also persisted for archive purposes.
Data in the Landing Zone may be of varying types and formats. It may or may
not be modeled or structured, and its quality may or may not be understood.
• History is a repository that stores state-change histories, log data, etc. Such
repositories are typically optimized for write operations, and are used for data
that is accessed infrequently and does not require real-time response.
• Deep & Exploratory Analytics is the application of sophisticated data mining
and analysis techniques to yield insights from large, typically heterogeneous
data sets. Deep & Exploratory Analytics repositories are typically optimized for
low-cost storage of very large data volumes, without the need for real-time
response. These repositories are used to store shared, heterogeneous data for
use by data scientists.
• Sand Boxes are repositories used by individual data scientists or groups who need a temporary data
repository to experiment and do quick analyses. Sand boxes are provisioned, populated, deleted and
re-deployed more often than other data repositories.
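The sand box lifecycle described in the last item (provision, populate, delete, re-deploy) can be sketched with a small in-memory registry. This is a minimal illustration under stated assumptions: `SandBox`, `SandBoxRegistry` and their methods are hypothetical names for this example, and a real deployment would provision actual storage rather than Python lists.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SandBox:
    """Hypothetical temporary repository owned by a data scientist or group."""
    owner: str
    records: list[dict[str, Any]] = field(default_factory=list)


class SandBoxRegistry:
    """Hypothetical registry tracking the short-lived sand boxes."""

    def __init__(self) -> None:
        self._boxes: dict[str, SandBox] = {}

    def provision(self, name: str, owner: str) -> SandBox:
        if name in self._boxes:
            raise ValueError(f"sand box {name!r} already exists")
        box = SandBox(owner=owner)
        self._boxes[name] = box
        return box

    def populate(self, name: str, records: list[dict[str, Any]]) -> None:
        # Copy each record so experiments cannot alter the shared source data.
        self._boxes[name].records.extend(dict(r) for r in records)

    def delete(self, name: str) -> None:
        del self._boxes[name]

    def names(self) -> list[str]:
        return sorted(self._boxes)


# Example lifecycle: provision, populate, delete, then re-deploy empty.
registry = SandBoxRegistry()
shared = [{"id": 1, "value": 10}]
registry.provision("churn-test", owner="alice")
registry.populate("churn-test", shared)
registry.delete("churn-test")
registry.provision("churn-test", owner="alice")
```

Copying records on populate reflects the idea that a sand box is for experimentation and should not affect the shared repositories it draws from; that isolation is an assumption of this sketch.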