Executive Summary


Semantic Effects of Reshaping





If only the reshaping steps allowed by the menus are done, then the XUM will be semantically equivalent to the restriction of the RMIM, in the following sense:

  • Any valid XUM instance can be automatically translated to an instance of the restricted RMIM (which is also an instance of the full RMIM)

  • Any valid instance of the restricted RMIM can be automatically translated into an instance of the XUM.

A XUM instance is converted to an RMIM instance, or vice versa, by applying the reshaping changes to the instance.
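A conversion of this kind can be sketched in a few lines. This is a minimal illustration only, assuming dict-shaped instances and a hypothetical mapping table; the real tool operates on XML instances using the mapping tables it outputs:

```python
# Minimal sketch of instance translation driven by a reshaping mapping.
# The mapping table and the dict-shaped instances are hypothetical;
# the real tool works on XML instances and its own mapping tables.

# Each entry maps a flattened XUM property to the nested RMIM path
# from which it was collapsed.
MAPPING = {
    "personName": ("patientRole", "patient", "name"),
    "birthTime":  ("patientRole", "patient", "birthTime"),
}

def xum_to_rmim(xum: dict) -> dict:
    """Re-nest collapsed XUM properties to rebuild an RMIM-shaped instance."""
    rmim: dict = {}
    for prop, path in MAPPING.items():
        if prop in xum:
            node = rmim
            for step in path[:-1]:
                node = node.setdefault(step, {})
            node[path[-1]] = xum[prop]
    return rmim

def rmim_to_xum(rmim: dict) -> dict:
    """Flatten nested RMIM paths back into XUM properties."""
    xum = {}
    for prop, path in MAPPING.items():
        node = rmim
        for step in path:
            if not (isinstance(node, dict) and step in node):
                break
            node = node[step]
        else:
            xum[prop] = node
    return xum

msg = {"personName": "Adelie Brownlow", "birthTime": "19700101"}
assert rmim_to_xum(xum_to_rmim(msg)) == msg  # round trip loses nothing
```

In the [1..1] case the round trip is exact, which is why collapsing such associations loses no information.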


Collapsing an association with min & max cardinality [1..1] loses no information at all (provided property and association names are altered to avoid clashes).
If, during the reshaping, any associations with cardinality [0..1] have been collapsed, then:

  • In some cases, translation of a XUM instance to an RMIM instance may be indeterminate, in that the RMIM instance may or may not contain an instance of some class (a class with no property values or further associations, which therefore carries no property value information)

  • Valid XUM instances must obey some constraints relating the cardinalities of different properties in different parts of the XUM tree.

These issues are fairly straightforward, and I will do a bit of mathematical analysis of them in a later paper.





    1. Message Translation from the Reshaping Mappings


Using the tables of mappings (between RMIM and XUM) output by the reshaping tool, and other tools not described in this note, the following are now possible and have been done:


  1. Using a small runtime translation engine, an XML message instance based on the XUM can be translated to a message instance based on the current XML ITS (for instance, adding fixed values); or messages can be translated in the reverse direction. If the two are done in series, the round trip does not alter information.

  2. The same two translations can be done by generated XSLT.

  3. A small object-based query tool can be used to query a XUM message instance and an RMIM XML instance side-by-side, in terms of the same (XUM) object model, to check that they contain the same information or to highlight any discrepancies.

  4. The V2-V3 mapping tool (described elsewhere) can be used to map a V2 message onto the XUM rather than (as before) an RMIM. These mappings are generally simpler to make, and can be used for automatic translation between V2, the XUM and the RMIM, with (expected, not yet tested) round-trip consistency.

In general, any XUM can be used either as a basis for messages on the wire, or just as a simpler model to map onto, when producing full V3 messages to go on the wire. In the second case, XUM instances need never go public, and the XUM can be purely an implementation tool. There has been much discussion about which of these ways XUMs should be used. The tools are agnostic.
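As a rough illustration of the side-by-side comparison in item 3, both instances can be projected onto the shared object model and compared field by field. This is a hedged sketch with made-up field and path names, not the actual query tool:

```python
# Minimal sketch of the side-by-side check: project a XUM instance and
# an RMIM instance onto the same flat (XUM) object model and compare.
# All field and path names here are hypothetical.

def to_model(instance: dict, paths: dict) -> dict:
    """Project an instance onto the shared object model."""
    out = {}
    for field, path in paths.items():
        node = instance
        for step in path:
            node = node.get(step) if isinstance(node, dict) else None
        out[field] = node
    return out

XUM_PATHS  = {"personName": ("personName",)}
RMIM_PATHS = {"personName": ("patientRole", "patient", "name")}

xum  = {"personName": "George Brownlow"}
rmim = {"patientRole": {"patient": {"name": "George Brownlow"}}}

discrepancies = [
    f for f in XUM_PATHS
    if to_model(xum, XUM_PATHS)[f] != to_model(rmim, RMIM_PATHS)[f]
]
assert discrepancies == []  # both forms carry the same information
```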



    2. Node Counts


As a basic measure of the complexity of a message definition (note: not an instance), the tool can count the number of distinct XPaths allowed in all message instances. To stop this number being infinite, the count stops at any node nested inside itself, and counts repeated subtrees only once (as is implicit in the definition of distinct XPaths).
Two kinds of node counts are possible:

  a. Down to the leaves of the XML tree

  b. Down to the roots of V3 data type subtrees.

The tool can output these node counts for any subtree of either the XUM or the RMIM definition.


For RMIMs, the count to leaves (a) is usually about 70 times larger than the count (b) down to V3 data type trees – because there are a few big data types, like IVL_TS with about 500 leaves, which occur with sufficient frequency to boost the count. But even the node counts down to V3 data type trees are very large, e.g. 65,000 for the Lab ‘Observation Request’ message.
The possible significance of node counts (or lack of it) has been debated on the list. The big numbers are of course built up by multiplication and repetition (e.g. of CMETs) and it is argued that once you understand this, the numbers are of little concern.
I believe the very big node counts may be a concern for people who are not HL7 insiders and want to map their comparatively simple messages and APIs onto such large sets of possible nodes. Restricting and reshaping in combination can greatly reduce the node counts, and so may help these people understand messages and map to them more accurately.
It is clear from using the tool that reshaping on its own (without also restricting an RMIM) does not greatly reduce the node count. For instance, collapsing one association will reduce the node count by only 1 (or by a small number, if you remove InfrastructureRoot attributes at the same time).
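The counting rule described above can be sketched as a small recursion. This is an illustrative sketch over a made-up schema format, not the tool's implementation:

```python
# Minimal sketch of the node count: the number of distinct element
# paths in a message definition, stopping wherever a type is nested
# inside itself so the count stays finite. The toy schema format
# (type name -> {child element: child type}) is hypothetical.

def count_paths(schema: dict, type_name: str, seen=()) -> int:
    """Count distinct XPaths rooted at an element of the given type."""
    if type_name in seen:               # type nested inside itself: stop
        return 0
    total = 0
    for child_type in schema.get(type_name, {}).values():
        # each child element is one path, plus all the paths below it
        total += 1 + count_paths(schema, child_type, seen + (type_name,))
    return total

# Toy definition in which Act recurses into itself via a relationship.
SCHEMA = {
    "Act": {"code": "CD", "outboundRelationship": "Act"},
    "CD":  {"code": "leaf", "displayName": "leaf"},
}
# Paths from Act: code, code/code, code/displayName, outboundRelationship
assert count_paths(SCHEMA, "Act") == 4
```

Collapsing one association in such a tree removes exactly one path, which is why reshaping alone barely changes the count.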
    3. Examples


These examples are not intended to be correct HL7 (e.g. the codes are all wrong), but only to illustrate the use of the tool. They are not intended to be ‘good’ reshaping, whatever that is.
A very simple example message for PA ‘New Person’, as output by the reshaping tool from instance values entered by me, is shown in XUM form below:

[XUM-form instance not reproduced legibly in this copy.]
The same message, output by the tool from the same entered data, is shown in XML ITS form. [Only the leaf values survive in this copy: the person names Adelie Brownlow and George Brownlow, and the addresses 101 Ranelagh Gardens, Hitchin, Herts, HI4 D6B, and 23 New Street, Cambridge, Cambs, CB8 2QD.]
The sizes of the instances differ by about a factor of 3. The automatic translations will round-trip between these two forms.
    4. Possible Future Work

The RMIM reshaping tool has been developed as a proof of concept and lacks a number of features needed for production use. A few are mentioned below:




  1. Control of reshaping: Currently all reshaping operations are controlled interactively by the user, mainly through menu selections. There is a need to drive the tool non-interactively, through annotations in a MIF file. For instance, an HL7 technical committee might develop MIFs and while doing so, annotate the ways in which they are to be reshaped to make XUMs.

  2. Following Versions of RMIMs: as an RMIM goes through successive versions, it would be useful for any derived XUM to follow it and be changed incrementally, rather than having to be re-made from scratch. Driving the tool by MIF annotations might help in this respect; but other enhancements may be needed.

  3. Modular Reshaping, within model boundaries: It would be useful, for instance, to be able to produce XUMs for CMETs and use them when reshaping any RMIM that uses the CMET. This would not be difficult. The tool can already be used to reshape a CMET; basically all that would be needed is a way to import its reshaping tables into the reshaping mappings for any RMIM which uses it. There might be other complications, e.g. about choices at the top of a CMET.

  4. Output XUM XML Schemas: Doing a basic job would not be difficult.

  5. Port the reshaping tool into Eclipse: An initial job would not be difficult, followed by better integration with other Eclipse HL7 tools.

  6. Simpler generation of XSLT for message translation: the methods I currently use (with other tools) to generate XSLT for XUM-RMIM translation are a bit of an overkill; I think it could be done more directly within the tool itself.

  7. XMI output of the XUM class model: As a bridge to UML tools; not yet done, but not hard (if we could agree which variant of XMI).

  8. Usability Testing: By this I refer not to usability of the tool (it might in the end be non-interactive, driven by annotations) but to usability of the output XUMs, for mapping and building messages. We could for instance (a) map a V2 message to an RMIM, (b) map the same V2 message onto a XUM, and see how much easier (b) is than (a). We could do this for various different approaches to XUM-building (e.g. at the discretion of an implementor with any restrictions he wants on the RMIM, defined by a TC with fewer restrictions and wider application, defined by rules, etc.)

  9. Translation Testing: We could do some serious reshaping of large messages, generate serious message instances, and try out the translations (XSLT and run-time engine) on these instances – to test performance, accuracy and round-trip consistency.


  5. Using Fixed and Default Values in Instances


Whether fixed and default values need to be present in an instance (to support semantic processing) is currently under debate.
A significant portion of the current V3 message size represents fixed values from the message definition. Removing the fixed values (currently an accepted V3 practice, but not standing NHS practice) will significantly reduce message size and therefore improve performance in a number of respects.
The consequence of this, however, is that a message processing application must know the message definition (either by implicit hard coded knowledge or some form of consultation) in order to interpret the data and this can interfere with throughput performance and/or require more development.
Some applications are specifically developed with hard coded knowledge, and removal of fixed and default values from the instance makes sense for implementers. Other applications are developed to process the instances in a generic fashion, where processing is performed on the basis of RIM semantics observed in the instance.
For such generic (RIM-based) message processors, requiring access to a definition can slow down processing significantly; for example, one implementer performing generic processing will be unable to achieve their NHS service level agreements to process 2000 messages per second. (This is, however, implementation dependent – based on using PSVI to enable the generic message processing. An alternate implementation could actually speed up the processing because the definitions are pre-parsed, the instances are smaller, and merging the two may be quite quick if resources are spent preparing this implementation.)
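The pre-parsed-definition approach mentioned in the parenthesis above can be sketched as a simple merge. The definition and instance shapes here are hypothetical:

```python
# Minimal sketch: restore the fixed values from a pre-parsed message
# definition into a smaller wire instance before generic, RIM-based
# processing. The definition and instance shapes are made up.

DEFINITION_FIXED = {      # fixed values stripped from the wire format
    "classCode": "OBS",
    "moodCode":  "EVN",
}

def restore_fixed(instance: dict, fixed: dict) -> dict:
    """Merge definition-level fixed values back into a wire instance."""
    merged = dict(fixed)
    merged.update(instance)   # values present on the wire take priority
    return merged

wire = {"code": "xyz", "value": "72 kg"}
full = restore_fixed(wire, DEFINITION_FIXED)
# A generic processor can now branch on RIM semantics:
assert full["classCode"] == "OBS" and full["moodCode"] == "EVN"
```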
For these applications, should fixed values be removed from message instances, some other approach is needed to allow efficient semantic processing. One approach could be to stabilise XML tag names so that they do not change (if their definitions do not change) across versions of a message, or across related messages in the same release (i.e. allowing name-based semantic processing). For at least one domain (e-prescribing), the NHS has found it necessary to ‘force’ name stability across versions of the same message to support this implementation approach. Such stabilising across versions is much simpler than cross-domain name-fixing for semantic processing. In fact, the names are already fairly fixed – calculated from some of the fixed values – but this is not a solution to the problem, because not all the fixed values that really matter are (or can be) carried in the name.
We note that the software requirement for generic (i.e. multiple message type) processing depends on the domain. For example, in some highly transactional administrative domains such as scheduling, a requirement for generic processing doesn't often apply, because message data is isolated from other domains, is relatively simple, and is completely consumed in processing. Generic (i.e. not message-type-specific) semantic processing is more likely to be needed for applications that handle multiple types of messages (e.g. clinical) and access or re-use data structures over a relatively long period. Some clinical data models are shared between different domains, though it is not clear whether having different wire formats for the same data model in different domains is a bad idea.
The current V3 approach is to standardise a wire format that supports the more complex (generic) semantic processing requirement, but not to put in fixed or default values, so making generic processing impossible without consulting the definitions. This results in an anomaly in some situations, where a simple transaction requires the addition of some fixed values to instances (to conform to the standard) which are not used in the semantic processing of the message at either end of the exchange (for example, in NHS Choose & Book messaging). Whether the extra processing associated with the fixed and default values in or out of the instance is justified depends on a number of factors and a balanced decision must be made, as it is recognised that some interests are competing in this respect.
There is an interaction between this proposal (or lack of one!) and flattening the model. If the model is flattened, then the generic message processing is very difficult. If we cannot remove fixed values, can we still flatten messages?
Fixing and using business names has a relationship to this general issue, as well. For instance, if an act is called Weight, the act.code might appear to be unnecessary. Should a message processor, if searching for weight, look for the name Weight or the code value "xyz"? If it searched for weight as an act name, another weight put in an act called "Finding" would be missed.
Even if the proposal to use business names were extended with a proposal to standardise the names at some level, it is difficult to see how acts can usefully have highly meaningful names that are anything other than a very general guide, whether for human viewing or for parsing. Sometimes a false sense of security is dangerous. If implementers know that a certain code could be anywhere, XPaths may be complicated, but the value should always be found – though even this is risky when the codes come from multi-hierarchical terminologies. If implementers are fooled into thinking that a value may always be found via the act name, the learning curve may be shorter, but it would be wrong.
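The weight example above can be made concrete. This toy sketch uses the made-up code "xyz" from the text:

```python
# Illustration of the point above: searching by act name can miss data
# that searching by code finds. The acts and the code value "xyz" are
# made up, following the example in the text.
acts = [
    {"name": "Weight",  "code": "xyz", "value": "72 kg"},
    {"name": "Finding", "code": "xyz", "value": "80 kg"},  # weight recorded under "Finding"
]

by_name = [a for a in acts if a["name"] == "Weight"]
by_code = [a for a in acts if a["code"] == "xyz"]

assert len(by_name) == 1   # the weight under "Finding" is missed
assert len(by_code) == 2   # code-based search finds both
```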
So, if we cannot remove fixed values, we cannot use business names either – although association names are basically irrelevant in a semantic processing world, the attribute names need to be fixed, and the association names are the things closest to business names already.
We recognise that we cannot remove fixed and default values at this time across all the messages, as no other approach allows for processing based on the Reference Model, which is the explicit intent of the HL7 V3 approach, and which is already adopted by a number of NHS implementers.

However even in semantic processing, there are some fixed values that are not significant. In addition, as stated above, there are domains or applications where processing based on the reference model seems less appropriate. This invites us to explore our options, and several of the next few proposals explore issues related to this.


Benefits of including fixed and default values

Putting fixed and default values into the instance will allow applications to process instance content based only on knowledge of the RIM, data types and terminologies, rather than having to know the myriad models that are part of HL7 and other realms, or pay the performance/implementation price of consulting the definitions as instances are processed.


Cost / Risk of including fixed and default values

The cost is significantly bigger messages.


Performance with fixed and default values included

It very much depends on context. If the implementer is processing the instances based on particular models, fixed and default values harm performance. If the implementer is processing in a more generic manner based on the reference model etc, then performance is [probably] supported by the presence of these attributes.


Usability with fixed and default values included

Very much the same as performance – it depends on context. If the implementer is processing the instances based on particular message models, fixed and default values harm usability. If the implementer is processing in a more generic manner based on the reference model etc, then usability is supported by the presence of these attributes.






