Versioning made easy
with W3C XML Schema and Pipelines
Henry S. Thompson
Architecture Domain
World Wide Web Consortium
Markup Technology Ltd.
20 April 2004
What is the versioning problem?
- Applications grow and change
- XML document types for applications grow and change
- So we end up with multiple versions of the schema for the document type
- Old code is hard to eradicate
- So we end up with multiple versions of the code implementing the application
Analysis of versioning scenarios
- Two dimensions of the problem:
- Active vs. passive
- Active
- Schema author prepares for versioning, e.g. with wildcards (see the sketch below)
- Passive
- No such preparation
- Engaged vs. dis-engaged
- Engaged
- Close coupling between developer and users; regular upgrades of schemas
- Dis-engaged
- Loose coupling; no schema upgrades
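- A hypothetical sketch of the active approach (all names invented for illustration, xs bound to the XML Schema namespace): the schema author leaves room for later versions with a lax wildcard

      <xs:complexType name="AddressType">
        <xs:sequence>
          <xs:element name="street" type="xs:string"/>
          <xs:element name="city" type="xs:string"/>
          <!-- Accept trailing elements from other namespaces, validating
               them only if a declaration happens to be available -->
          <xs:any namespace="##other" processContents="lax"
                  minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>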
Versioning and Web Services
- David Orchard has articulated a detailed analysis of expected versioning scenarios for Web Services
- The bad news
- We're in the Passive+Dis-engaged quadrant
- The good news
- Changes are likely to be additive
- How can we get a solution for W3C XML Schema?
- New software?
- Changes to W3C XML Schema?
Exploiting partial validation
- A key aspect of W3C XML Schema is fine-grained, distributed validity information
- Every information item has validity information
- Two three-valued properties in the PSVI
- Validity
- valid; notKnown; invalid
- Validation attempted
- full; partial; none
- In the case of additive versioning, we get a PSVI like the sketch below
- It looks like we need to think about a tool that could walk the PSVI and look for valid sub-sequences and re-decorate as it goes
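- A hypothetical sketch of that PSVI (instance and element names invented for illustration): a v2 instance adds nad:country, which the v1 schema doesn't declare, and the per-item [validity, validation attempted] pairs come out roughly as

      <nad:address>                      <!-- [invalid,  full]: content model blown -->
        <nad:street>...</nad:street>     <!-- [valid,    full] -->
        <nad:city>...</nad:city>         <!-- [valid,    full] -->
        <nad:country>...</nad:country>   <!-- [notKnown, none]: no declaration found -->
      </nad:address>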
Exploiting partial validation, cont'd
- We could do that
- but it would be wrong :-)
- It would have to recapitulate almost all the semantics of a schema validator
- Lightbulb! Why not use a validator itself?
- If we could just get rid of those notKnown nodes and revalidate, that would be good
- So we need a three-step pipeline
- Validation
- Surgery
- Validation
- Fortunately, Markup Technology have developed a pipeline authoring and execution tool
- So it was easy to check this out
Introducing MT Pipeline
- The lack of a coherent XML processing model to support decomposition of complex XML processing tasks represents a serious bottleneck
- for enterprise use of XML in general
- for Web Services in particular
- All that's needed is support for the basic tool in the architect's armoury: Divide and Conquer
- In other words -- XML Pipelines
- Configurations of basic XML processing steps
- Some steps are relatively heavy
- XSLT-based transformation
- W3C XML Schema-based validation
- Others can be much simpler
- XPath-based extraction
- One-for-one renaming
What is missing?
- A standard for pipeline specification?
- Interop matters here just like everywhere else
- High performance?
- Pipelines need to be fast to be attractive
- We have a candidate standard: the Sun XML Pipeline W3C Note is a good starting point
- Published by W3C in February of 2002
- Edited by Eve Maler and Norm Walsh
- Many co-submitters, including Markup Technology
The Sun Pipeline design
- An XML document type for describing pipelines
- A pipeline is a sequence of steps, with specified input(s), output(s) and parameters
- The processing required to perform a step is named, not defined in detail
- Dependency-driven, in the mode of make and ant
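- From memory, a pipeline document in the Note's vocabulary looks roughly like this; the element and attribute names are reconstructed from the 2002 Note and may not be exact, and the process definitions are invented

      <pipeline xmlns="http://www.w3.org/2002/02/xml-pipeline">
        <processdef name="validate" definition="org.example.Validate"/>
        <processdef name="transform" definition="org.example.Transform"/>
        <!-- Steps are wired together by matching input/output labels,
             make-style, so document order does not fix execution order -->
        <process id="p2" type="transform">
          <input name="stylesheet" label="style.xsl"/>
          <input name="document" label="valid.xml"/>
          <output name="result" label="report.html"/>
        </process>
        <process id="p1" type="validate">
          <input name="document" label="source.xml"/>
          <output name="result" label="valid.xml"/>
        </process>
      </pipeline>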
Re-interpreting the Sun Pipeline Note
- We have re-interpreted the proposed document type
- Removing some limitations
- Enabling more efficient implementation
- We interpret it as simply specifying a configuration of operations on XML-encoded information
- Without dependency-driven semantics
- Allows intermediate results to be passed between components without serialisation
- So we think of pipelines more like shell scripts
- Mapping externally specified inputs to outputs
- Facilitates deploying pipelines
- For example in servers, where they can then operate on message-derived input to produce message-delivered output
- And we have a highly optimised implementation
Using MT Pipeline for V2S
- We're looking for a way to validate twice, with surgery in between
- Fortunately, MTPL already supports
- W3C XML Schema validation, with full PSVI output in the pipeline
- Surgery, that is, XPath-based elimination of elements and/or attributes from an infoset
- XPath extension functions to access the PSVI
- So we can just build the pipe and try it
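- A sketch of that pipe, again in the Note-style vocabulary with invented step names; psvi:validity stands in for whatever PSVI-access extension function MTPL actually provides

      <pipeline xmlns="http://www.w3.org/2002/02/xml-pipeline">
        <!-- Step 1: validate against the old schema, keeping the PSVI -->
        <process id="v1" type="validate">
          <input name="document" label="incoming.xml"/>
          <output name="result" label="first-pass.xml"/>
        </process>
        <!-- Step 2: surgery, deleting every element the schema didn't recognise -->
        <process id="s" type="delete">
          <param name="select" select="//*[psvi:validity(.) = 'notKnown']"/>
          <input name="document" label="first-pass.xml"/>
          <output name="result" label="trimmed.xml"/>
        </process>
        <!-- Step 3: revalidate the trimmed infoset against the same schema -->
        <process id="v2" type="validate">
          <input name="document" label="trimmed.xml"/>
          <output name="result" label="final.xml"/>
        </process>
      </pipeline>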
Schema design issues
- Will this always work?
- Alas, no. Several pre-conditions:
- Schema validators are not required to keep going after finding an error
- But most (all?) of them actually do
- In order for the REC's error recovery strategy to work, only top-level element declarations can be used (see the sketch after this list)
- Because once the content model is blown, we don't know which local declarations to use
- [Although some validators use them anyway]
- Will it always make sense?
- Likewise no. Syntactic additions are not always semantically clean
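- A hypothetical fragment illustrating the top-level precondition (names invented): after the content model is blown, a validator can still look up a stray city element by name at the top level, but a local declaration is invisible outside its own content model

      <!-- Recoverable: city is declared at the top level -->
      <xs:element name="city" type="xs:string"/>
      <xs:complexType name="AddressType">
        <xs:sequence>
          <xs:element ref="city"/>
        </xs:sequence>
      </xs:complexType>

      <!-- Not recoverable: once the model is blown, there is no
           independent declaration of city to fall back on -->
      <xs:complexType name="AddressType2">
        <xs:sequence>
          <xs:element name="city" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>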
The additive assumption
- David Orchard points to the success of HTML's "must ignore" strategy wrt unknown markup
- But Michael Sperberg-McQueen points out this is not the same as what's required for David's own examples
- HTML just ignores the tags it doesn't understand
- But it processes the content
- Our story needs to ignore the whole unknown subtree
- And additional elements are not necessarily purely additive in meaning
- Ignoring nad:country can get you in trouble:
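- A hypothetical illustration (instance content invented): a v2 name-and-address record where dropping nad:country changes the meaning of everything else

      <!-- v2 instance: nad:country is new in this version -->
      <nad:address>
        <nad:street>10 King St</nad:street>
        <nad:city>London</nad:city>
        <nad:postcode>N6A 1C3</nad:postcode>
        <nad:country>Canada</nad:country>
      </nad:address>
      <!-- A v1 consumer that silently drops nad:country will happily
           read this as a UK address: the addition changes the meaning
           of elements the consumer already understood -->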
Changing W3C XML Schema
- Lots of people use local element declarations
- They shouldn't be disenfranchised
- The W3C XML Schema WG was already looking at a change behind the scenes
- Re-interpreting local element declarations as just that:
- Declarations scoped to their enclosing type definition
- Conceptually, all content model particles would then be references to declarations by name
- Such a change would make local element declarations available for wildcards
- So this change would allow local declarations to work with V2S
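- A sketch of what the re-interpretation means in practice (the schema syntax itself is unchanged; only its reading shifts)

      <xs:complexType name="AddressType">
        <xs:sequence>
          <!-- Today: an anonymous particle that wildcards cannot match.
               Under the re-interpretation: a declaration named country,
               scoped to AddressType and referenced by name, which a
               lax wildcard could then find and use -->
          <xs:element name="country" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>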
Conclusions
- W3C XML Schema's partial validation is powerful
- Pipelines are cool
- Watch for free MT Pipeline beta
- E-mail me ([email protected]) if you are interested in participating in an alpha programme