| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> |
| <html> |
| <head> |
| <title>Central content type catalog for Eclipse</title> |
| <link rel="stylesheet" href="../default_style.css" type="text/css"> |
| </head> |
| <body text="#000000" bgcolor="#ffffff"> |
| <h1>A central content type catalog for Eclipse</h1> |
| <p><font size="-1">Last modified: Feb 16, 2005</font></p> |
| <p><font size="-1"><strong>Note: this document has been updated/extended to depict |
| the solution actually implemented in the 3.0 release and changes made during |
| the 3.1 cycle. The <a href="http://dev.eclipse.org/viewcvs/index.cgi/%7Echeckout%7E/platform-core-home/documents/content_types.html?rev=1.4"> |
| original proposal |
| </a> written during the 3.0 development cycle, as well as <a href="http://dev.eclipse.org/viewcvs/index.cgi/platform-core-home/documents/content_types.html"> |
| all versions of this document</a> are also available</strong> (<img src="../images/changed.gif" width="12" height="12"> |
| marks interesting changes since the original proposal).</font></p> |
| <p><cite><strong>Plan item description:</strong> Content-type-based editor lookup. |
| The choice of editor is currently based on file name pattern. This is not very |
| flexible, and breaks down when fundamentally different types of content are |
| found in files with undistinguished file names or internal formats. For example, |
| many different models with specialized editors get stored in XML format files |
| named *.xml. Eclipse should support a notion of content type for files and resources, |
| and use these to drive decisions like which editor to use. This feature would |
| also be used by team providers when doing comparisons based on file type. The |
| several existing file-type registries in Eclipse should be consolidated. [Platform |
| Core, Platform UI] [Theme: User experience] (bug <a |
| href="http://bugs.eclipse.org/bugs/show_bug.cgi?id=37668">37668</a>, <a |
| href="http://dev.eclipse.org/bugs/show_bug.cgi?id=51791">51791</a>, <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=52784">52784</a>) |
| </cite></p> |
| <p>This plan item is about two important features: </p> |
| <ol> |
| <li>a single content-type repository to be provided by Eclipse on top of which |
| content-type related features provided by any plugins could be built upon, and</li> |
| <li>a mechanism for automatically determining the content type given a file name and/or its |
| contents.</li> |
| </ol> |
| <h2>Driving forces</h2> |
| <ul> |
| <li>the catalog must be extensible: plug-ins must be able to contribute new |
| content types;</li> |
| <li>content types must have an identity, a unique identifier by which they can |
| be unambiguously retrieved from the catalog;</li> |
| <li>content types should be hierarchical: new content types are very often specializations |
| of existing ones (example, Ant Scripts and Plugin manifests are kinds of XML |
| documents, XML is a kind of text document), so it should be possible for new |
| content types to inherit interesting properties from existing ones (see <a href="#FAQ-hierarchy">FAQ</a>);</li> |
| <li>content types have either a predominantly binary or text nature;</li> |
| <li>content types are associated with specific file names/extensions;</li> |
| <li>some level of automatic content/name based type discovery must be provided;</li> |
| <li>existing plug-in-specific content type registry could be replaced/built |
| upon the central catalog;</li> |
| <li>encoding determination is strongly related concern and should be taken into |
| consideration when sketching a solution.</li> |
| </ul> |
| <h3>On automatic content type detection</h3> |
| <p>Content types determine many properties and actions related to files such as |
| encoding, associated editors, etc. Automatic content type determination allows |
| content type specific actions without requiring the user to manually define |
| the content type for a given file. Content type detection is based on:</p> |
| <ul> |
| <li>file selection specifications</li> |
| <li>file contents</li> |
| </ul> |
| <p>Content type determination based on file name/extension ("file selection |
| specs") is the easiest one to compute. Each content type has a set of file |
| selection specs associated to it. Determining the content type corresponding |
| to a file selection spec is done by a simple lookup on the catalog. </p> |
| <p>Content type determination based on file contents is more complex, and requires |
| examining the contents. Since we are talking about an open set of possible content |
| types, this examination implies in delegation to content type detectors contributed |
| by other plug-ins (content describers).</p> |
| <h2>Solution</h2> |
| <h3>The proposed API</h3> |
| <p>The proposed API contains 4 new interfaces in a new package called <code>org.eclipse.core.runtime.content</code>:</p> |
| <ul> |
| <li><code><a href="#IContentType">org.eclipse.core.runtime.content.IContentType</a></code></li> |
| <li><code><a href="#IContentTypeManager">org.eclipse.core.runtime.content.IContentTypeManager</a></code></li> |
| <li><code><a href="#IContentDescription">org.eclipse.core.runtime.content.IContentDescription</a></code></li> |
| <li><code><a href="#IContentDescriber">org.eclipse.core.runtime.content.IContentDescriber</a></code></li> |
| </ul> |
| <p>Following is a brief description for each of them. </p> |
| <h4><code><a name="IContentType"></a>org.eclipse.core.runtime.content.IContentType</code></h4> |
| <p>Represents a content type in the platform. <code>IContentType</code> instances |
| are provided by the platform, built from extensions to the <code>org.eclipse.core.runtime.contentTypes</code> |
| extension point. Relevant properties for <code>IContentType</code> are:</p> |
| <ul> |
| <li>unique id (example: org.eclipse.core.runtime.xml), which is based on the |
| plug-in's unique identifier (the registry namespace);</li> |
| <li>user-friendly name (example: Text, or XML document, or ZIP file, );</li> |
| <li>file selection spec - comma-separated lists of associated file names and |
| extensions;</li> |
| <li>default charset (example: ISO-8859-1, for Java properties files);</li> |
| <li>content describer (see <code><a href="#IContentDescriber">org.eclipse.core.runtime.content.IContentDescriber</a></code>), |
| a class that knows how to recognize if a given stream of bytes contains compatible |
| to the content type, and how to extract other content-type specific information |
| from the stream.</li> |
| </ul> |
| <p>Also, <code>IContentType</code> provides methods that check whether the given |
| file name is matched by this content type file selection spec, or whether a |
| content type is a subtype of another content type.</p> |
| <h4><a name="IContentTypeManager"></a><code>org.eclipse.core.runtime.content.IContentTypeManager</code></h4> |
| <p>Represents the content type registry. Provides methods for obtaining the content |
| type associated to a file name, and for discovering the corresponding content |
| type for a stream of bytes. <code>IContentTypeManager</code> allows clients |
| to:</p> |
| <ul> |
| <li>retrieve the content type for a given id;</li> |
| <li>retrieve a set of content types associated to a given file name;</li> |
| <li>discover which content types recognize a given stream as a valid sample |
| for the corresponding file format;</li> |
| <li>obtain a description for a stream of bytes, including platform (such as |
| encoding) and custom (content type specific) properties.</li> |
| </ul> |
| <h4><code><a name="IContentDescriber"></a>org.eclipse.core.runtime.content.IContentDescriber</code></h4> |
| <p>Content-based content type detection and content description rely on specialized |
| content detectors associated to content types. When a content type is contributed |
| to the platform, a content describer class may be provided. Content describers |
| are able to detect if a given stream of bytes is conformant to the content type |
| file format, and may also be able to extract important properties from the contents, |
| such as what charset was used to encode the contents (for text files), and any |
| content type specific information that may be required.</p> |
| <p>The main method in <code>IContentDescriber</code> is:</p> |
| <p><code>int describe(InputStream contents, IContentDescription description, QualifiedName[] |
| options) throws IOException;</code></p> |
| <p>The first thing implementations for this method must do is to check if the |
| contents represent a valid sample for their corresponding content type file |
| format. If not (or if cannot be determined), this method should exit immediately, |
| returning <code>IContentDescription.INVALID</code> or <code>IContentDescription.INDETERMINATE</code>, |
| depending on how strict the file format is. Otherwise, this method should return |
| <code>IContentDescription.VALID</code>, but only after trying to provide all |
| required information (according to the specified options, if any) by reading |
| the contents and filling the <a href="#IContentDescription">content description</a> |
| provided.</p> |
| <p><strong><em>Note</em></strong><em>: it is essential that for this mechanism |
| to work in a suitable manner, the execution of content describers by the platform |
| should not cause the activation of the plugins providing them. In the Eclipse |
| 3.0 runtime, plug-ins that have built-in bundle manifests will be able to selectively |
| enable/disable auto-activation on a per-package basis (for more information, |
| see <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=52393">bug 52393</a>). |
| Although this will not be enforced by the platform, content describers <strong>must</strong> |
| be self-contained and not trigger auto-activation.</em></p> |
| <h4><code><a name="IContentDescription"></a>org.eclipse.core.runtime.content.IContentDescription</code></h4> |
| <p>Content descriptions are obtained by calling <code>IContentTypeManager#getDescriptionFor</code> |
| method. A content description contains interesting information (such as encoding) |
| about an arbitrary stream of bytes. These information are filled partially by |
| the platform and partially by the content describer elected (if any).</p> |
| <h3>Conflict resolution</h3> |
| <p>Content types are managed by the platform but plug-ins are in charge of contributing |
| content types. While this provides good flexibility, it also opens oportunities |
| for conflicts. There are a few scenarios where conflicts may arise:</p> |
| <ol> |
| <li><em>two content types provided by independently developed plug-ins intended |
| for the same file selection specification</em> - this may happen for popular |
| file types that are not provided by the platform. In this case, one of the |
| content types will be ellected arbitrarily. User intervention is required |
| (by disabling one of the content types, for instance) if automatic determination |
| does not choose the preferred content type If multiple content types can be |
| returned by the API, both will appear in the list.<img src="../images/changed.gif" width="12" height="12"> (see |
| bug <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=83004">83004</a>) <font size="-1"> |
| </font></li> |
| <li><em>two related (but different) content types that share a same file name/extension |
| specification</em> - think of general XML documents and Ant build scripts |
| (a sub-type of XML document, inheriting its file selection specifications). |
| This is different than the previous case here at most one content type will |
| be contributing file specs (the more basic type). In this scenario, for APIs |
| that return content-types based exclusively on file name, the ancestor will |
| be appear first in the returned array, since it is more general. Note that |
| when a general type specifies a file extension for it to be associated with, |
| and a subtype specifies a file name that has the same file extension, the |
| more specific type will appear before the general one. For APIs that do content-based |
| analysis, if both content type describers deem the contents as valid, the |
| more specific content type will also appear first. For two sibling content |
| types that deem a same set of bytes as valid, an arbitrary order will be chosen.</li> |
| <li><em>two completely unrelated content types that share a same file name/extension |
| specification</em> - this is the more unlikely scenario (imagine an image |
| file format and a word-processor file format sharing the same file extension). |
| In this case, one of the content types will be ellected arbitrarily. User |
| intervention is required (by disabling one of the content types, for instance) |
| if automatic determination does not choose the preferred content type.<img src="../images/changed.gif" width="12" height="12"> (see |
| bug <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=83004">83004</a>) <font size="-1"></li> |
| </ol> |
| <h3>Frequently Asked Questions</h3> |
| <ol> |
| <li> How will plugin providers benefit from a central content type catalog? |
| <p> Generally by using the same content type registry and sharing the same |
| concept of content/file type. Other examples are: |
| <ul> |
| <li>a builder could use its well-known content type to filter out files |
| whose names don't match with the content type file section spec;</li> |
| <li>the user interface could know what editors to offer for a given file |
| selected (associations between editors and content types should be kept |
| separately from the content type catalog);</li> |
| </ul></p> |
| </li> |
| <li><a name="FAQ-hierarchy"></a>Why are content types hierarchical? |
| <p>To allow important properties to be inherited by new specialized content |
| types: |
| <ul> |
| <li>the default charset</li> |
| <li>text/binary nature</li> |
| <li>content description</li> |
| <li>associations defined externally by plug-ins (for instance, any editors |
| associated with an ancestor should work with any descendants)</li> |
| </ul></p> |
| </li> |
| <li> What happens if the base type for a new content type is not present in |
| the platform (the plug-in that provides it is not available)? |
| <p>The content type (and consequently any descendants) will be deemed invalid |
| and ignored.</p> |
| </li> |
| <li><a name="FAQ-MIME"></a>Do Eclipse's content types have anything to do with |
| IANA's MIME Media Types? |
| <p>Not so far. The main reason is that not every file format with a content |
| type in Eclipse will have a MIME type, so we could not use it as the main |
| association mechanism between content types and applications. We considered |
| keeping MIME-types as an optional property for each content type, and provide |
| a method findContentTypesByMIMEType (or something like that), but decided |
| removing it since there was no sound use case for that (and the idea in |
| the initial proposal was to keep only the essential stuff to avoid distractions).<font size="-1"><strong><img src="../images/changed.gif" width="12" height="12"></strong></font></p> |
| </li> |
| <li>How can users customize the way content types are chosen? |
| <p>By associating additional file specs to existing content types.</p> |
| </li> |
| <li>Can plug-ins override the content type describer for an existing content |
| type? |
| <p>Not so far. It is up to the plugin provider to determine whether a content |
| type describer will be provided.</p> |
| </li> |
| <li>Can two completely unrelated content types be associated with the same file |
| spec? |
| <p>One of the content types (arbitrarily selected) will be chosen as the preferred |
| one (the other will also be associated). Priorities are also taken into |
| account.</p> |
| </li> |
| <!--li>How are conflicts (two different content types associated to the same file) |
| prevented? |
| <p>They are not. At least, not automatically. It is up to clients to decide |
| what to do when more than one content type is offered by the platform. A |
| client that does not care about which one is picked, will randomly choose |
| one of them. A client that cares but does not know which one to choose may |
| refuse to use any of them. User-guided code may ask the user what should |
| be done. The content type chosen may be marked as the default one for the |
| file spec, or the user may want to mark one as an alias for the other.</p> |
| </li--> |
| <li>What if a given file name is matched by two different file specs provided |
| by two completely unrelated content types? |
| <p> As seem above, the only way this can happen is when two <em>different</em> |
| file specs (for instance, a file name and a file extension) accept the same |
| file name (for instance, one content type is associated with a "xml" |
| file extension, other is associated with a "plugin.xml" file name. |
| ) File name specs have priority over file extension specs (so plugin.xml |
| is a plugin manifest before being a XML document). The normal case is that |
| the content type that defines a file name spec is based on the file type |
| that defines a file extension spec (a plugin manifest is a kind of XML document). |
| This ensures that actions applicable to general XML documents will be applicable |
| to a plugin manifest.</p> |
| </li> |
| <li>What are aliases for? <font size="-1"><strong><img src="../images/changed.gif" width="12" height="12"></strong></font> |
| (see bug <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=83004">83004</a>) |
| <p>It is a mechanism to prevent conflicts. When multiple plugins contribute |
| content types associated with the same file specs, we have a conflict. Conflicts |
| are bad because introduce ambiguity (what is the right content type?). Most |
| of times when such conflicts arise, it is a case of independently developed |
| plugins trying to contribute the same content type (semantically speaking). |
| Imagine a plug-in com.examples.foo that wants to be associated to the Java |
| Source content typ (org.eclipse.jdt.core.javaSource)e, provided by org.eclipse.jdt.core, |
| but that does not require org.eclipse.jdt.core to be present to work. Such |
| plug-in can contribute its own Java Source content type (com.examples.foo.java), |
| and mark it as an alias for the content type provided by JDT/Core. If JDT/Core |
| is present, the com.examples.foo.java will be omitted from the content type |
| registry, and all references to it will be automatically interpreted as |
| references to org.eclipse.jdt.core.javaSource instead.<font size="-1"></font></p> |
| </li> |
| <li>How do I prevent my specialized content-type to be disabled even if its |
| parent is not available? <font size="-1"><strong><img src="../images/changed.gif" width="12" height="12"></strong></font> |
| (see bug <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=83004">83004</a>) |
| <p> Sometimes a plugin A does not depend on plugin B, but declares a content |
| type which is intended to be a specialization of another content type declared |
| by B. To prevent the content type declared by A to be disabled: </p> |
| <ol> |
| <li>declare an alias content type having the intended parent as its target |
| (a placeholder);</li> |
| <li>make your specialized content type have this placeholder as its base |
| type;</li> |
| </ol> |
| <p>If the originally intended base type is available, your base type will |
| be marked as just an alias, and your specialized content type will be properly |
| attached to the intended content type. Otherwise, the placeholder will be |
| enabled, and although things might not be as great as intended (actions |
| associated to the original content type will not be available), your content |
| type will still be enabled.<font size="-1"></font></p> |
| </li> |
| <li>When should a file-association be contributed instead of declaring a new |
| (derived) content type? |
| <p>New content types should be created only if there is no existing content |
| type with the semantics required. Otherwise, when only additional file specs |
| must be provided, file associations are the way to go.</p> |
| </li> |
| <li>Are file specs inherited? |
| <p> Only if none is specified in the sub type. </p> |
| </li> |
| <li>How does a client figure out whether a given file is a text file or not? |
| <p>The proposed approach is to check if the file's content type is a kind |
| of the "org.eclipse.core.runtime.text" content type, which is |
| intended to be the ancestor for all text oriented content types. If it turns |
| out to be a very frequent idiom, we might consider proving a convenience |
| API to do that.</p> |
| </li> |
| <li>Do content types have to contribute content describers? |
| <p> No, although if the file has a identifiable signature/format, it is recommended, |
| because improves the overall quality of content-based content type lookups.</p> |
| </li> |
| </ol> |
| <p> <em>Note: comments are encouraged. Any questions/concerns not addressed here |
| should be discussed in the platform-core-dev list, or bug <a |
| href="http://bugs.eclipse.org/bugs/show_bug.cgi?id=37668">37668</a>. </em></p> |
| <h2>Addendum: issues to be addressed in the 3.1 cycle<font size="-1"><strong><img src="../images/changed.gif" width="12" height="12"></strong></font></h2> |
| <p>The solution described above was implemented and relatively succesful. Some |
| components took advantage of the new content type infrastructure, but still |
| in many cases file-association is being done in an ad-hoc manner. Also, no UI |
| was provided for customizing content types (such as changing the default encoding, |
| adding associations with files) so the user has no control on how the content |
| type detection mechanism works. Thus, the main issues to be addressed in the |
| 3.1 cycle are:</p> |
| <ul> |
| <li>ensure the content type framework works for clients that have not adopted |
| it yet, or at least we understand why it is not (cannot be made) suitable |
| to them. Examples are: Platform/UI, Platform/CVS, and products built on top |
| of Eclipse.</li> |
| <li>ensure users are granted appropriate means so they can customize the behavior |
| of content type detection so things just work for them.</li> |
| <li>ensure content type resolution works (or can be made to work) appropriately |
| in setups where multiple independently developed products contribute conflicting |
| content types. </li> |
| </ul> |
| <h3>Support more use cases</h3> |
| <p>Ensure the content type support works for the SDK plug-ins and for products |
| built on top of Eclipse (see bug <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=78654">78654</a>).</p> |
| <p><strong>Platform/UI - file/editor association</strong></p> |
| <p>Content type-editor association is definitely the most important use case for |
| the content type support. The basic idea is that for a given file or stream |
| of data, the UI should be able to:</p> |
| <ol> |
| <li>provide a default editor to be used when opening the file</li> |
| <li>present any other editors that are also able to handle the file</li> |
| <li>allow the user to pick a completely unrelated file not initially suggested</li> |
| <li>remember the user decision (if the user wants so) so it becomes the default |
| for that file</li> |
| </ol> |
| <p> <strong>1</strong>, <strong>2</strong> and <strong>4</strong> are currently |
| supported by the existing file-editor association mechanism. <strong>3</strong> |
| is being requested by users, and it is orthogonal (as 4 is) to the content type |
| support provided by runtime.</p> |
| <p>Content types add a level of indirection between files and editors. At a first |
| glance, there is no reason why changing the default editor would affect what |
| content type is assigned to a file, so users should be able to pick up any editors |
| without affecting content type detection.</p> |
| <p><strong>Platform/Team - binary vs ascii files</strong></p> |
| <p>The Team plug-in keeps a catalog of file extensions and their expected content |
| type (either binary or ASCII). If content types were broadly adopted throughout |
| the rest of Eclipse (so that most files dealt with by users have a content type), |
| couldn't the Team plug-in use content type based encoding determination to figure |
| out a good default for this setting? (see bug <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=85490">85490</a>)</p> |
| <h3>Give users more power</h3> |
| <p>Ensure users have means to customize how the content type detection works for |
| them. Provide UI for content types. May provide some way of showing related |
| objects for a given content type (editors, views, comparators, etc). Users cannot |
| provide content type detection code, so user-defined content types would be |
| useful only for cases where content type detection is file name based (like |
| for non-formatted text files, such as source files, configuration files, etc).</p> |
| <p>A simple plug-in that allows users to add/remove file associations is available |
| in the <a href="../downloads.html">downloads page</a>.</p> |
| <h3>Improve content type determination</h3> |
| <p>Ensure content type detection works (or can be made to work) appropriately |
| when incompatible products are deployed together.</p> |
| <ol> |
| <li><strong>project-specific content type resolution</strong> - to reduce chances |
| for collisions, there should be a way of constraining which content types |
| are available on a project basis. One possible solution would be to have content |
| type-nature associations. Natures could declare some preferred content types, |
| and those preferred content types would always win (see bug <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=69640">69640</a>)</li> |
| <li><strong>user-defined content type determination - </strong>two plug-ins |
| provide completely unrelated content types A and B for files with extension |
| ".foo". The current implementation will choose one of them (by priority, |
| depth, or arbitrarily if they have the same priority and depth). If A is automatically |
| chosen, the user might want to choose B instead as the default content type. |
| Instead of doing that on a file basis, it seems better to do it on a coarser |
| granularity, such as at the project level or at the workspace level (see bug |
| <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=85765">85765</a>). |
| </li> |
| <li><strong>product defined policies for overriding content types</strong> - |
| a product may want to override some of the existing content type definitions. |
| Products should be able to do that in a way that would circumvent the regular |
| conflict resolution.</li> |
| </ol> |
| <p>See also corresponding PR <a href="https://bugs.eclipse.org/bugs/show_bug.cgi?id=78654">78654</a> |
| - content type should be used universally</p> |
| </body></html> |