<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org" />
<meta
content="As the use of schemas grows, the need for tools to manipulate schemas grows. The new Schema Infoset Model provides a complete modeling of schemas themselves, including the concrete representations as well as the abstract relationships within a schema or a set of schemas. This article will show some of the power of this library to easily query the model of a schema for detailed information about it; we could also update the schema to fix any problems found and write the schema back out. "
name="ABSTRACT" />
<meta
content="As the use of schemas grows, the need for tools to manipulate schemas grows. The new Schema Infoset Model provides a complete modeling of schemas themselves, including the concrete representations as well as the abstract relationships within a schema or a set of schemas. This article will show some of the power of this library to easily query the model of a schema for detailed information about it; we could also update the schema to fix any problems found and write the schema back out. "
name="DESCRIPTION" />
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1" />
<meta content="Analyzing XML schemas with the Schema Infoset Model"
name="TITLE" />
<meta content="Public" name="SECURITY" />
<meta content="text/xhtml" name="FORMAT" />
<meta content="Copyright (c) 2002 by IBM Corporation"
name="COPYRIGHT" />
<meta content="us" name="IBM.COUNTRY" />
<meta content="us" name="DOCUMENTCOUNTRYCODE" />
<meta content="en" name="DOCUMENTLANGUAGECODE" />
<meta
content="[shane, curcuru, xml, schemas, analyze, analyzing, infoset, model, java, ibm, developerworks, eclipse]"
name="keywords" />
<meta http-equiv="Expires" content="0" />
<meta content="MSHTML 5.50.4616.200" name="GENERATOR" />
<title>Analyzing XML schemas with the Schema Infoset Model</title>
<link href="images/../dwtip1/dw-style-r1.css" type="text/css"
rel="stylesheet" />
</head>
<body bgcolor="#ffffff" leftmargin="2" topmargin="2"
marginwidth="2" marginheight="2">
<a id="main" name="main"></a>
<p><span class="title">Analyzing XML schemas with the Schema
Infoset Model</span></p>
<table cellspacing="0" cellpadding="0" width="168" align="right"
border="0">
<tbody>
<tr>
<td width="8"><img height="21" alt="" src="images/../dwtip1/c.gif"
width="5" /></td>
<td width="160">
<table cellspacing="0" cellpadding="0" width="160" border="0">
<tbody>
<tr>
<td width="160" bgcolor="#000000" height="1"><img height="1" alt=""
src="images/../dwtip1/c.gif" width="160" /></td>
</tr>
<tr>
<td align="center" background="images/../dwtip1/bg-gold.gif"
height="5"><b>Contents:</b></td>
</tr>
<tr>
<td width="160" bgcolor="#666666" height="1"><img height="1" alt=""
src="images/../dwtip1/c.gif" width="160" /></td>
</tr>
<tr>
<td>
<table cellspacing="0" cellpadding="0" width="160" border="0">
<tbody>
<tr>
<td><a href="#h1">Example: Analyzing schemas</a></td>
</tr>
<tr>
<td height="1"><img height="5" alt="" src="images/../dwtip1/c.gif"
width="160" /></td>
</tr>
<tr>
<td><a href="#h2">Loading schemas</a></td>
</tr>
<tr>
<td height="1"><img height="5" alt="" src="images/../dwtip1/c.gif"
width="160" /></td>
</tr>
<tr>
<td><a href="#h3">Convenient schema querying</a></td>
</tr>
<tr>
<td height="1"><img height="5" alt="" src="images/../dwtip1/c.gif"
width="160" /></td>
</tr>
<tr>
<td><a href="#h4">Schema components model</a></td>
</tr>
<tr>
<td height="1"><img height="5" alt="" src="images/../dwtip1/c.gif"
width="160" /></td>
</tr>
<tr>
<td><a href="#h5">Your report: Types missing max/min
facets</a></td>
</tr>
<tr>
<td height="1"><img height="5" alt="" src="images/../dwtip1/c.gif"
width="160" /></td>
</tr>
<tr>
<td><a href="#conclusion">Conclusion</a></td>
</tr>
<tr>
<td height="1"><img height="5" alt="" src="images/../dwtip1/c.gif"
width="160" /></td>
</tr>
<tr>
<td><a href="#samplecode">Sample code</a></td>
</tr>
<tr>
<td height="1"><img height="5" alt="" src="images/../dwtip1/c.gif"
width="160" /></td>
</tr>
<!--Standard links for every article-->
<tr>
<td><a href="#resources">Resources</a></td>
</tr>
<tr>
<td height="1"><img height="5" alt="" src="images/../dwtip1/c.gif"
width="160" /></td>
</tr>
<tr>
<td><a href="#author1">About the author</a></td>
</tr>
<tr>
<td height="1"><img height="5" alt="" src="images/../dwtip1/c.gif"
width="160" /></td>
</tr>
<tr>
<td><img height="10" alt="" src="images/../dwtip1/c.gif"
width="160" /></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<table cellspacing="0" cellpadding="0" width="160" border="0">
<tbody>
<tr>
<td width="150" bgcolor="#000000" colspan="2" height="2"><img
height="2" alt="" src="images/../dwtip1/c.gif" width="160" /></td>
</tr>
<tr>
<td width="150" bgcolor="#ffffff" colspan="2" height="2"><img
height="2" alt="" src="images/../dwtip1/c.gif" width="160" /></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p><span class="atitle2">Easily perform complex queries on your
schemas with this model</span></p>
<table cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr valign="top" align="left">
<td>
<p>Level: Intermediate</p>
</td>
</tr>
</tbody>
</table>
<p><a href="#author1">Shane Curcuru</a> (<a
href="mailto:shane_curcuru@us.ibm.com">shane_curcuru@us.ibm.com</a>)<br />
Advisory Software Engineer, IBM<br />
July 2002</p>
<blockquote>As schemas come into wider use, so does the need for
tools that manipulate them. The new Schema Infoset Model provides a
complete model of schemas themselves, covering both their concrete
representations and the abstract relationships within a schema or a
set of schemas. This article shows how the library lets you easily
query the model of a schema for detailed information about it; you
could also use it to update the schema to fix any problems found and
write the schema back out.</blockquote>
<p style="font-style: italic"><b>Note:</b> This article assumes you
have a basic knowledge of schema documents; see <a
href="#resources">Resources</a> for links to schema documentation
and a tutorial.</p>
<p>Although a number of parsers and tools use schemas to validate or
analyze XML documents, tools for querying and manipulating schema
documents themselves are still being built. The Schema Infoset Model
(also known as org.eclipse.xsd.*, or just "the library") is a rich
API that models schemas: both their concrete representations
(perhaps in a schema.xsd file) and the abstract concepts in a schema
as defined by the specification. As anyone who has read the schema
specs knows, they're quite detailed, and this model strives to
expose every detail of any schema. That lets you manage your schema
collection efficiently and can power higher-level schema tools,
perhaps schema-aware parsers and transformers.</p>
<table cellspacing="0" cellpadding="5" width="50%" align="right"
border="1">
<tbody>
<tr>
<td background="images/../dwtip1/bg-gold.gif"><a id="sidebar"
name="sidebar"><b>Schema Infoset Model UML diagrams</b></a><br />
<p>The library includes various UML diagrams for the actual library
classes, which give a quick overview of the relationships and
attributes of common schema components.</p>
<p><b>Abstract Schema Component relationships</b><br />
<a href="images/../dwtip1/Relations.gif">This diagram</a> shows the
relations between Schema Infoset Components -- the abstract
relationships between schema objects as modeled in the library.
Black diamonds show strong composition or aggregation; open
diamonds show weak aggregation.</p>
<p><b>Abstract Schema Component attributes</b><br />
<a href="images/../dwtip1/Attributes.gif">This diagram</a> shows
some of the attributes of the abstract schema components as modeled
in the library, as well as part of the class hierarchy.</p>
<p><b>Schema Library class listing</b><br />
<a
href="../../../references/articles/dwtip1-scpw/sidebar-listing.html">
This listing</a> shows the core classes included in
org.eclipse.xsd.</p>
<p>These diagrams are included in the library's documentation,
along with several other UML diagrams for both the abstract and
concrete class trees.</p>
</td>
</tr>
</tbody>
</table>
<p>For an interface listing of the library showing all the schema
objects it models, see <a href="#sidebar">Schema Infoset Model
UML diagrams</a>. The library also includes the UML diagrams used
to design the library interfaces themselves; these diagrams show
the relationships between the library objects, which closely
mirror the concepts in the schema specifications.</p>
<p><a id="h1" name="h1"><span class="atitle2">Example: Analyzing
your schemas</span></a><br />
In this example, you'll check your schema for integer-derived types
that fail to specify bounds. This could be useful for ensuring that
all order quantities in purchase orders are bounded. Here the
schemas must be very specific, so you want to require that every
simple type derived from integer have both a lower bound (a
minInclusive or minExclusive facet) and an upper bound (a
maxInclusive or maxExclusive facet). A bound inherited from a type
that this type derives from still counts.</p>
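<p>To make the rule concrete, here is a small, library-free Java sketch of it. It does not use the Schema Infoset Model: the <code>SimpleTypeModel</code> class and its fields are hypothetical, and for simplicity it records only the lexical value of each bound a type declares, not whether that bound is inclusive or exclusive. A type's effective bound is the nearest one declared on the type itself or on one of its base types, which is why an inherited bound satisfies the rule. The <code>main</code> method models the three example types from the article's sample schema on top of the built-in chain integer, nonNegativeInteger (minInclusive 0), positiveInteger (minInclusive 1) from XML Schema Part 2.</p>

```java
// Hypothetical, library-free model of an integer-derived simple type.
// It records only the lexical value of any bound a type declares itself
// (null when none is declared), not whether that bound is inclusive or exclusive.
final class SimpleTypeModel {
    final String name;
    final SimpleTypeModel base;  // null for the root of the chain
    final String ownMin;         // minInclusive/minExclusive declared on this type
    final String ownMax;         // maxInclusive/maxExclusive declared on this type

    SimpleTypeModel(String name, SimpleTypeModel base, String ownMin, String ownMax) {
        this.name = name;
        this.base = base;
        this.ownMin = ownMin;
        this.ownMax = ownMax;
    }

    // Effective lower bound: the nearest one declared on this type or a base type
    String effectiveMin() {
        for (SimpleTypeModel t = this; t != null; t = t.base) {
            if (t.ownMin != null) {
                return t.ownMin;
            }
        }
        return null;
    }

    // Effective upper bound, found the same way
    String effectiveMax() {
        for (SimpleTypeModel t = this; t != null; t = t.base) {
            if (t.ownMax != null) {
                return t.ownMax;
            }
        }
        return null;
    }

    // The rule from the text: both bounds are required, and inherited ones count
    boolean isBounded() {
        return effectiveMin() != null && effectiveMax() != null;
    }

    public static void main(String[] args) {
        // Built-in chain: integer, nonNegativeInteger (min 0), positiveInteger (min 1)
        SimpleTypeModel integer = new SimpleTypeModel("integer", null, null, null);
        SimpleTypeModel nonNegative = new SimpleTypeModel("nonNegativeInteger", integer, "0", null);
        SimpleTypeModel positive = new SimpleTypeModel("positiveInteger", nonNegative, "1", null);
        // The three example types from the sample schema
        SimpleTypeModel noFacets = new SimpleTypeModel("integer-noFacets", integer, null, null);
        SimpleTypeModel inheritedMin = new SimpleTypeModel(
            "positiveInteger-inheritedMinFacet", positive, null, null);
        SimpleTypeModel both = new SimpleTypeModel(
            "positiveInteger-bothFacets", inheritedMin, null, "100");
        for (SimpleTypeModel t : new SimpleTypeModel[] {noFacets, inheritedMin, both}) {
            // Prints bounded=false, bounded=false, bounded=true in this order
            System.out.println(t.name + " bounded=" + t.isBounded());
        }
    }
}
```

<p>Walking the base-type chain is what makes <code>positiveInteger-bothFacets</code> pass even though its own restriction declares only an upper bound.</p>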
<p>While you can use XSLT or XPath to query a schema's concrete
representation in an <code>.xsd</code> file or inside some other
<code>.xml</code> content, it is much more difficult to discover
the type derivations and interrelationships among schema components
that way. Since the Schema Infoset Model library models both
the concrete representation and the abstract concept of the schema,
it can easily collect details about a schema's components, even
when the schema has deep type hierarchies or is spread across
multiple schema files.</p>
<p>In this simple schema, you will find some types that meet the
criteria of having max/min facets, and some that do not. (You can
find the full schema in FindTypesMissingFacets.xsd included in the
<a href="images/../dwtip1/xsdqcode.zip">zip file</a>.)</p>
<a id="code1" name="code1"><b>Listing 1. Sample schema</b></a>
<table cellspacing="0" cellpadding="5" width="100%"
bgcolor="#cccccc" border="1">
<tbody>
<tr>
<td>
<pre>
<code>&lt;xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.research.ibm.com/XML/NS/xsd"
xmlns="http://www.research.ibm.com/XML/NS/xsd"&gt;
&lt;!-- SimpleType missing both max/min facets --&gt;
&lt;xsd:simpleType name="integer-noFacets"&gt;
&lt;xsd:restriction base="xsd:integer"/&gt;
&lt;/xsd:simpleType&gt;
&lt;!-- Derived type has inherited min facet but missing max facet --&gt;
&lt;xsd:simpleType name="<span
class="boldcode">positiveInteger-inheritedMinFacet</span>"&gt;
&lt;xsd:restriction base="xsd:positiveInteger"/&gt;
&lt;/xsd:simpleType&gt;
&lt;!-- Derived type with both effective max/min facets --&gt;
&lt;xsd:simpleType name="positiveInteger-bothFacets"&gt;
&lt;xsd:restriction base="<span
class="boldcode">positiveInteger-inheritedMinFacet</span>"&gt;
&lt;xsd:maxExclusive value="100"/&gt;
&lt;/xsd:restriction&gt;
&lt;/xsd:simpleType&gt;
&lt;!-- etc... --&gt;
&lt;/xsd:schema&gt;
</code>
</pre>
</td>
</tr>
</tbody>
</table>
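<p>To see why the concrete representation alone is not enough, consider a sketch of the XPath approach mentioned above, using only the JDK's standard XML APIs on the three types from Listing 1. The class and method names are illustrative, not part of any library. The query asks for every simple type that has no <code>minInclusive</code> or <code>minExclusive</code> element written inside it.</p>

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Sketch of the concrete-only approach: XPath sees only the facets written
// inside each xsd:simpleType element, never facets inherited from base types.
final class XPathFacetCheck {
    // Listing 1, trimmed to its three example types
    static final String SCHEMA =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'"
        + " targetNamespace='http://www.research.ibm.com/XML/NS/xsd'"
        + " xmlns='http://www.research.ibm.com/XML/NS/xsd'>"
        + "<xsd:simpleType name='integer-noFacets'>"
        + "<xsd:restriction base='xsd:integer'/></xsd:simpleType>"
        + "<xsd:simpleType name='positiveInteger-inheritedMinFacet'>"
        + "<xsd:restriction base='xsd:positiveInteger'/></xsd:simpleType>"
        + "<xsd:simpleType name='positiveInteger-bothFacets'>"
        + "<xsd:restriction base='positiveInteger-inheritedMinFacet'>"
        + "<xsd:maxExclusive value='100'/></xsd:restriction></xsd:simpleType>"
        + "</xsd:schema>";

    // Names of simple types with no min facet written directly inside them
    static List<String> typesWithoutLocalMinFacet(String schemaText) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed for local-name() to match reliably
        Document doc = dbf.newDocumentBuilder().parse(
            new ByteArrayInputStream(schemaText.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList names = (NodeList) xpath.evaluate(
            "//*[local-name()='simpleType']"
            + "[not(.//*[local-name()='minInclusive' or local-name()='minExclusive'])]/@name",
            doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < names.getLength(); i++) {
            result.add(names.item(i).getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(typesWithoutLocalMinFacet(SCHEMA));
    }
}
```

<p>Run against Listing 1, this query flags all three types, even though <code>positiveInteger-inheritedMinFacet</code> and <code>positiveInteger-bothFacets</code> inherit a lower bound of 1 from <code>xsd:positiveInteger</code>. Only the derivation chain, which the library models, reveals that.</p>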
<p><a id="h2" name="h2"><span class="atitle2">Loading schemas into
the library</span></a><br />
The library can read and write schema objects from a variety of
sources. I'll use the org.eclipse.emf ResourceSet framework, which
makes it easy to load sets of schemas; you can also build schemas
from, or write them to, a DOM object that you manage yourself. The
library provides a custom <code>XSDResourceSet</code>
implementation that automatically loads sets of schemas related by
includes, imports, and redefines. The abstract relationships between
related schemas are also modeled in the library.</p>
<a id="code2" name="code2"><b>Listing 2. Loading a schema</b></a>
<table cellspacing="0" cellpadding="5" width="100%"
bgcolor="#cccccc" border="1">
<tbody>
<tr>
<td>
<pre>
<code>import java.util.Iterator;
import org.eclipse.emf.common.util.URI;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.resource.ResourceSet;
import org.eclipse.emf.ecore.resource.impl.ResourceSetImpl;
import org.eclipse.xsd.XSDSchema;
import org.eclipse.xsd.util.XSDResourceImpl;

// schemaURL is "FindTypesMissingFacets.xsd" or the URL of your own schema
public static XSDSchema loadSchema(String schemaURL)
{
// Create a resource set and load the main schema file into it.
ResourceSet resourceSet = new ResourceSetImpl();
resourceSet.getResource(URI.createDeviceURI(schemaURL), true);
// getResources() lists all the resources in the set: the main resource
// plus any schemas it includes, imports, or redefines.
for (Iterator resources = resourceSet.getResources().iterator();
resources.hasNext(); /* no-op */)
{
Resource resource = (Resource)resources.next();
// The first schema resource found is the main schema
// loaded from the provided schemaURL
if (resource instanceof XSDResourceImpl)
{
// getSchema() returns an org.eclipse.xsd.XSDSchema object
return ((XSDResourceImpl)resource).getSchema();
}
}
// No schema resource was loaded
return null;
}
</code>
</pre>
</td>
</tr>
</tbody>
</table>
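<p>The library follows includes, imports, and redefines for you. To illustrate what that involves, here is a rough, library-free sketch using the JDK's DOM parser: starting from the main schema, it follows each <code>schemaLocation</code> on <code>xsd:include</code>, <code>xsd:import</code>, and <code>xsd:redefine</code> until every referenced document has been loaded once, keeping the main schema first, in the same spirit as Listing 2 treating the first schema resource as the main one. The class name is hypothetical, the in-memory map of documents is a hypothetical stand-in for files or URLs, and the sketch neither resolves relative locations nor reports missing ones.</p>

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Library-free sketch: collect the transitive set of schema documents
// reachable from the main schema through include/import/redefine.
final class SchemaClosure {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    // documents maps a location to its schema text (hypothetical stand-in for I/O)
    static List<String> load(String mainLocation, Map<String, String> documents) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Set<String> loaded = new LinkedHashSet<>(); // keeps load order; main schema first
        Deque<String> pending = new ArrayDeque<>();
        pending.add(mainLocation);
        while (!pending.isEmpty()) {
            String location = pending.poll();
            String text = documents.get(location);
            if (text == null || !loaded.add(location)) {
                continue; // unknown location (silently skipped), or already loaded
            }
            Element root = dbf.newDocumentBuilder().parse(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8))).getDocumentElement();
            // Composition elements are top-level children of xsd:schema
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element && XSD_NS.equals(n.getNamespaceURI())) {
                    String kind = n.getLocalName();
                    String ref = ((Element) n).getAttribute("schemaLocation"); // "" if absent
                    if ((kind.equals("include") || kind.equals("import") || kind.equals("redefine"))
                            && !ref.isEmpty()) {
                        pending.add(ref);
                    }
                }
            }
        }
        return new ArrayList<>(loaded);
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> docs = new HashMap<>();
        docs.put("main.xsd", "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
            + "<xsd:include schemaLocation='types.xsd'/></xsd:schema>");
        // A cycle back to main.xsd is loaded only once
        docs.put("types.xsd", "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
            + "<xsd:include schemaLocation='main.xsd'/></xsd:schema>");
        System.out.println(load("main.xsd", docs)); // [main.xsd, types.xsd]
    }
}
```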
<p><a id="h3" name="h3"><span class="atitle2">Convenient schema
querying</span></a><br />
Now that you have an <code>XSDSchema</code> object, you need to
query it to find any types that are missing max/min facets. First,
you'll use some convenient library methods to quickly find all of
its <code>simpleTypeDefinition</code>s that derive from the
built-in integer type. Because the library provides a complete model
of the abstract meaning of a schema, this turns out to be very
straightforward: query the <code>XSDSchema</code> for its
<code>getTypeDefinitions()</code> list, and then keep only the
<code>XSDSimpleTypeDefinition</code>s that actually derive from
the built-in integer type.</p>
<a id="code3" name="code3"><b>Listing 3. Getting a list of specific
types</b></a>
<table cellspacing="0" cellpadding="5" width="100%"
bgcolor="#cccccc" border="1">
<tbody>
<tr>
<td>
<pre>
<code>// A handy convenience method quickly gets all
// typeDefinitions within the schema
List allTypes = schema.getTypeDefinitions();
ArrayList allIntegerTypes = new ArrayList();
for (Iterator iter = allTypes.iterator();
iter.hasNext(); /* no-op */)
{
XSDTypeDefinition typedef = (XSDTypeDefinition)iter.next();
// Keep only simple types...
if ((typedef instanceof XSDSimpleTypeDefinition)
// ... that derive from the built-in integer type, using a
// worker method in the very handy sample program
// org.eclipse.xsd.util.XSDSchemaQueryTools
&amp;&amp; XSDSchemaQueryTools.isTypeDerivedFrom(typedef,
schema.getSchemaForSchemaNamespace(), "integer"))
{
// The filter found one; save it and continue.
allIntegerTypes.add(typedef);
}
}
</code>
</pre>
</td>
</tr>
</tbody>
</table>
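<p>The derivation test in Listing 3 comes from the sample <code>XSDSchemaQueryTools</code> class, whose code is not reproduced here. The following is only a hypothetical, library-free sketch of the idea: walk from a type up its chain of base types and report whether the target type is reached. Types are keyed by a hypothetical <code>{namespace}localName</code> string, and the map of base types stands in for the derivation information the library computes for you.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a derivation check; not the code of the sample's
// XSDSchemaQueryTools.isTypeDerivedFrom, only the same idea on a plain map.
final class DerivationCheck {
    static final String XSD = "{http://www.w3.org/2001/XMLSchema}";

    // baseOf maps each type key to its base type key; the root has no entry.
    // Returns true if target is the type itself or any of its ancestors.
    static boolean isDerivedFrom(String type, String target, Map<String, String> baseOf) {
        // A well-formed chain visits at most baseOf.size() + 1 types;
        // the step limit guards against a malformed, cyclic map.
        for (int steps = 0; type != null && steps <= baseOf.size(); steps++) {
            if (type.equals(target)) {
                return true;
            }
            type = baseOf.get(type);
        }
        return false;
    }

    public static void main(String[] args) {
        String ns = "{http://www.research.ibm.com/XML/NS/xsd}";
        Map<String, String> baseOf = new HashMap<>();
        // Part of the built-in hierarchy from XML Schema Part 2
        baseOf.put(XSD + "integer", XSD + "decimal");
        baseOf.put(XSD + "nonNegativeInteger", XSD + "integer");
        baseOf.put(XSD + "positiveInteger", XSD + "nonNegativeInteger");
        // Types from the sample schema, plus a hypothetical decimal-based "price"
        baseOf.put(ns + "positiveInteger-inheritedMinFacet", XSD + "positiveInteger");
        baseOf.put(ns + "positiveInteger-bothFacets", ns + "positiveInteger-inheritedMinFacet");
        baseOf.put(ns + "price", XSD + "decimal");
        System.out.println(isDerivedFrom(ns + "positiveInteger-bothFacets", XSD + "integer", baseOf)); // true
        System.out.println(isDerivedFrom(ns + "price", XSD + "integer", baseOf)); // false
    }
}
```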
<p><a id="h4" name="h4"><span class="atitle2">The schema components
model</span></a><br />
Every component defined in the W3C schema specifications is modeled
in detail in the library. Now that you have a list of all
<code>XSDSimpleTypeDefinition</code>s that derive from an integer,
you can query this list for ones that are missing either their max
or min facets, and produce a report. Note that the library can
conveniently group the effective max/minExclusive or
max/minInclusive facets together for quick searching; it also
provides detailed access to each type, including the actual lexical
values if needed.</p>
<a id="code4" name="code4"><b>Listing 4. Querying XSDSimpleType
components</b></a>
<table cellspacing="0" cellpadding="5" width="100%"
bgcolor="#cccccc" border="1">
<tbody>
<tr>
<td>
<pre>
<code>for (Iterator iter = allIntegerTypes.iterator();
iter.hasNext(); /* no-op */)
{
XSDSimpleTypeDefinition simpleType = (XSDSimpleTypeDefinition)iter.next();
// First, exclude any UNION or LIST types, since
// the schema spec says they can't have min/max facets:
// Part 2: Datatypes in:
// <a
href="http://www.w3.org/TR/xmlschema-2/#defn-coss">'4.1.5 Constraints on Simple Type Definition Schema Components'</a>
if ((XSDVariety.LIST_LITERAL == simpleType.getVariety())
|| (XSDVariety.UNION_LITERAL == simpleType.getVariety()))
{
// Unions and lists cannot have min/max facets at all,
// so there's no need to report them
continue;
}
// Get the effective max/min facets for each type;
// these include facets declared on this type and
// facets inherited from its base types
XSDMaxFacet maxFacet = simpleType.getEffectiveMaxFacet();
XSDMinFacet minFacet = simpleType.getEffectiveMinFacet();
// If either bound is missing, report this type.
if ((null == maxFacet) || (null == minFacet))
{
if (null != simpleType.getName())
{
// A component's URI in the library is effectively
// its &lt;target namespace&gt;#&lt;name&gt;
System.out.println("Schema named component: " + simpleType.getURI() );
}
else
{
// It's an anonymous type, so ask the library
// to construct a default 'alias' for it
System.out.println("Schema anonymous component: " + simpleType.getAliasURI() );
}
System.out.print(" is missing these required facets: ");
if (null == maxFacet)
{
System.out.print(" XSDMaxFacet (either inclusive or exclusive) ");
}
if (null == minFacet)
{
System.out.print(" XSDMinFacet (either inclusive or exclusive) ");
}
// End the report line for this type
System.out.println();
// You could also report on the facets this type does have, like:
// if ((null != minFacet) &amp;&amp; minFacet.isExclusive()) {
//   System.out.println("minFacet.getValue=" + minFacet.getValue());
// }
}
}
</code>
</pre>
</td>
</tr>
</tbody>
</table>
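<p>Grouping inclusive and exclusive facets into one effective min facet and one effective max facet means a report only has to ask whether a bound exists, and can then check which kind it is. The hypothetical <code>Bound</code> class below, which is not part of the library, sketches that idea by describing a pair of effective bounds in interval notation.</p>

```java
// Hypothetical sketch, not a library class: one effective bound,
// whichever facet (inclusive or exclusive) supplied it.
final class Bound {
    final String lexicalValue;
    final boolean inclusive;

    Bound(String lexicalValue, boolean inclusive) {
        this.lexicalValue = lexicalValue;
        this.inclusive = inclusive;
    }

    // Describes effective min/max bounds in interval notation; null means no bound.
    static String describe(Bound min, Bound max) {
        String lower = (min == null)
            ? "(-infinity"
            : (min.inclusive ? "[" : "(") + min.lexicalValue;
        String upper = (max == null)
            ? "+infinity)"
            : max.lexicalValue + (max.inclusive ? "]" : ")");
        return lower + ", " + upper;
    }

    public static void main(String[] args) {
        // positiveInteger-bothFacets: inherited minInclusive 1, declared maxExclusive 100
        System.out.println(describe(new Bound("1", true), new Bound("100", false))); // [1, 100)
        // positiveInteger-inheritedMinFacet: no upper bound
        System.out.println(describe(new Bound("1", true), null)); // [1, +infinity)
    }
}
```

<p>A report built this way never needs separate branches for the four facet kinds; it only distinguishes "bound present" from "bound missing", exactly as Listing 4 does with <code>getEffectiveMinFacet()</code> and <code>getEffectiveMaxFacet()</code>.</p>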
<p><a id="h5" name="h5"><span class="atitle2">Your report: Types
missing max/min facets</span></a><br />
With just a little bit of code, you've discovered some fairly
detailed information about the schema. If you download the sample
code and run it against the provided schema file, you should see a
listing like this:</p>
<a id="code5" name="code5"><b>Listing 5. The output report</b></a>
<table cellspacing="0" cellpadding="5" width="100%"
bgcolor="#cccccc" border="1">
<tbody>
<tr>
<td>
<pre>
<code>Schema missing max/min facet report on: FindTypesMissingFacets.xsd
Schema named component: http://www.research.ibm.com/XML/NS/xsd#integer-minFacet
is missing these required facets: XSDMaxFacet (either inclusive or exclusive)
Schema named component: http://www.research.ibm.com/XML/NS/xsd#integer-noFacets
is missing these required facets: XSDMaxFacet (either inclusive or exclusive)
XSDMinFacet (either inclusive or exclusive)
Schema named component: http://www.research.ibm.com/XML/NS/xsd#positiveInteger-inheritedMinFacet
is missing these required facets: XSDMaxFacet (either inclusive or exclusive)
</code>
</pre>
</td>
</tr>
</tbody>
</table>
<p><a id="conclusion" name="conclusion"><span
class="atitle2">Conclusion</span></a><br />
Although this is a contrived example, it shows how the library's
detailed representation of a schema makes it easy to find exactly
the parts of a schema you need. The library provides setter methods
for the properties of schema components, so you could extend the
sample to fix each type it finds by adding the missing facets. And
because the library also models the schema's concrete
representation, you can then write the updated schema back out to
an <code>.xsd</code> file.</p>
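<p>With the library, its setters are the natural way to make such a fix. Purely to illustrate the fix-and-write-back round trip without the library, here is a sketch using the JDK's DOM and identity <code>Transformer</code>: for each simple type named in the report, it appends an <code>xsd:maxInclusive</code> facet to the type's <code>xsd:restriction</code> and serializes the result. The class name and the bound value passed in are hypothetical. Unlike the library, the sketch only edits the concrete XML and knows nothing about inherited facets, so it relies on the report to choose which types to fix, and it adds only upper bounds.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Plain-DOM sketch (not the Schema Infoset Model API) of adding a missing
// upper bound to named simple types and writing the schema back out.
final class AddMissingMaxFacet {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    // First direct child element of parent in the XSD namespace with this local name
    static Element firstChild(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && XSD_NS.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        return null;
    }

    static String fix(String schemaText, Set<String> typesToFix, String maxValue) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(
            new ByteArrayInputStream(schemaText.getBytes(StandardCharsets.UTF_8)));
        NodeList types = doc.getElementsByTagNameNS(XSD_NS, "simpleType");
        for (int i = 0; i < types.getLength(); i++) {
            Element type = (Element) types.item(i);
            if (!typesToFix.contains(type.getAttribute("name"))) {
                continue;
            }
            Element restriction = firstChild(type, "restriction");
            if (restriction == null) {
                continue; // list or union type: max facets are not allowed
            }
            // Reuse the restriction's prefix (for example "xsd") for the new facet
            String prefix = restriction.getPrefix();
            Element facet = doc.createElementNS(XSD_NS,
                (prefix == null ? "" : prefix + ":") + "maxInclusive");
            facet.setAttribute("value", maxValue);
            restriction.appendChild(facet); // facets may follow in any order
        }
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
            .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String schema = "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
            + "<xsd:simpleType name='positiveInteger-inheritedMinFacet'>"
            + "<xsd:restriction base='xsd:positiveInteger'/></xsd:simpleType>"
            + "</xsd:schema>";
        // The report says this type lacks only an upper bound; 1000 is an arbitrary value
        System.out.println(fix(schema,
            Collections.singleton("positiveInteger-inheritedMinFacet"), "1000"));
    }
}
```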
<p><a id="samplecode" name="samplecode"><span
class="atitle2">Sample code</span></a><br />
A sample program, <code>XSDFindTypesMissingFacets.java</code>,
shows the example in this article. It uses a schema document
<code>FindTypesMissingFacets.xsd</code> which has a number of types
with and without max/min facets.</p>
<p>You can download the sample program and the following sample
.java files in a <a href="images/../dwtip1/xsdqcode.zip">zip
file.</a></p>
<p>Copies of several other sample .java files normally shipped with
the Schema Infoset Model are also included in the zip file:</p>
<ul>
<li><code>XSDSchemaQueryTools.java</code> showcases a number of
other ways to perform advanced queries on schema objects.</li>
<li><code>XSDSchemaBuildingTools.java</code> provides convenience
methods for building schemas programmatically.</li>
<li><code>XSDPrototypicalSchema.java</code> uses the library to
build the ever-popular schema <a
href="http://www.w3.org/TR/xmlschema-0/#po.xsd">primer
PurchaseOrder sample</a>.</li>
</ul>
<p><a id="resources" name="resources"><span
class="atitle2">Resources</span></a></p>
<ul>
<li><a href="images/../dwtip1/xsdqcode.zip"><img
src="images/../dwtip1/icon-zip.gif" alt="zip" border="0" /></a>
Download a zip file with the <a
href="images/../dwtip1/xsdqcode.zip">sample code</a> for this
article, including the sample program, schema document, and other
.java files.<br />
<br />
</li>
<li>See a full <a
href="../../../references/articles/dwtip1-scpw/sidebar-listing.html">
schema library class listing</a>.<br />
<br />
</li>
<li>Read some of IBM's thoughts about <a
href="http://www.research.ibm.com/XML/schema/WD-XML-Schema-Infoset-API-Req.htm">
what makes a good schema API</a>.<br />
<br />
</li>
<li>Start with an <a
href="http://www.xml.com/pub/a/2000/11/29/schemas/part1.html">Introduction
to XML Schemas</a> by Eric van der Vlist.<br />
<br />
</li>
<li>See W3C's <a href="http://www.w3.org/XML/Schema">schema
specifications</a> (primer, datatypes, and structures).<br />
<br />
</li>
<li>Download Apache's <a
href="http://xml.apache.org/xerces2-j/">Xerces-J parser</a>, which
includes basic schema validation tools.<br />
<br />
</li>
<li>View or discuss a previous version of this article <a
href="http://www-106.ibm.com/developerworks/library/x-schemimj/index.html?loc=x">
at IBM's developerWorks</a>.<br />
<br />
</li>
</ul>
<p>This content was adapted from an article on IBM developerWorks
at <a
href="http://www.ibm.com/developerWorks/">http://www.ibm.com/developerWorks/</a>.</p>
<table cellspacing="0" cellpadding="0" width="100%" border="0">
<tbody>
<tr>
<td><a id="author1" name="author1"><span class="atitle2">About the
author</span></a><br />
Shane Curcuru has been a developer and quality engineer at Lotus
and IBM for 12 years and is a member of the Apache Software
Foundation. He has worked on such diverse projects as Lotus 1-2-3,
Lotus eSuite, Apache's Xalan-J XSLT processor, and a variety of XML
Schema tools. Questions about this article or about automated
testing can be sent to him at shane_curcuru@us.ibm.com.</td>
</tr>
</tbody>
</table>
<br clear="all" />
<img height="10" alt="" src="images/../dwtip1/c.gif" width="100"
border="0" /><br />
</body>
</html>