 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE reference PUBLIC "-//OASIS//DTD DITA Reference//EN" "reference.dtd" >
 <reference id="ref_inspections_component_report" xml:lang="en-us">
 	<title>Component Report</title>
 	<shortdesc>Analyze a component for possible memory waste and
 		other inefficiencies.</shortdesc>
 	<prolog>
 		<copyright>
 			<copyryear year=""></copyryear>
 			<copyrholder>
 				Copyright (c) 2008, 2010 SAP AG and others.
 			    All rights reserved. This program and the accompanying materials
 			    are made available under the terms of the Eclipse Public License v1.0
 			    which accompanies this distribution, and is available at
 			    http://www.eclipse.org/legal/epl-v10.html
 			</copyrholder>
 		</copyright>
 	</prolog>

 	<refbody>
 		<section id="introduction">
 			<title>Introduction</title>
 			<p>A heap dump contains millions of objects. But which of those
 				belong to your component? And what conclusions can you draw from
 				them? This is where the Component Report can help.</p>
 			<p>
 				Before starting, one has to decide what constitutes a component.
 				Typically, a component is either a set of classes in a
 				<b>common root package</b> or a set of classes loaded by the same
 				<b>class loader</b>.
 			</p>
 			<p>
 				Using this root set of objects, the component report calculates a
 				customized retained set. This retained set includes all objects kept
 				alive by the root set. Additionally, it assumes that all objects
 				that have become <i>finalizable</i> have actually been finalized
 				and that all soft references have been cleared.
 			</p>
 		</section>
 		<section id="run">
 			<title>Executing the Component Report</title>
 			<p> To run the report for a common root package, select the component
 				report from the tool bar and provide a regular expression to match
 				the package:</p>
 			<image href="component_report_package.png">
 				<alt>Regular expression to match common root package to be
 					used for the component report.</alt>
 			</image>
 			<p> Alternatively, one can group the class histogram by class loader
 				and then right-click the appropriate class loader and select the
 				component report:</p>
 			<image href="component_report_classloader.png">
 				<alt>Group histogram by class loader.</alt>
 			</image>
 		</section>
 		<section id="overview">
 			<title>Overview</title>
 			<p>The component report is rendered as HTML. It is stored in a ZIP
 				file next to the heap dump file.</p>
 			<image href="component_report_overview.png">
 				<alt>Overview section of the component report.</alt>
 			</image>
 			<p>
 				<ol outputclass="arrows">
 					<li>Details about the size, the number of classes, the
 						number of objects and the number of different class loaders.</li>
 					<li>The pie chart shows the size of the component relative to
 						the total heap size.</li>
 					<li>
 						The
 						<xref href="top_consumers.dita" scope="local">Top Consumers</xref>
 						section lists the biggest objects, classes, class loaders and
 						packages which are retained by the component. It provides a good
 						overview of what is actually kept alive by the component.
 					</li>
 					<li>
 						<xref href="retained_set.dita" scope="local">Retained Set
 						</xref>
 						displays all retained objects, grouped by class.
 					</li>
 				</ol>
 			</p>
 		</section>
 		<section id="strings">
 			<title>Duplicate Strings</title>
 			<p>
 				Duplicate strings are a prime example of memory waste: multiple
 				char arrays with identical content. To find the duplicates, the
 				report
 				<xref href="group_by_value.dita" scope="local">groups</xref>
 				the char arrays by their value. It lists all char arrays for which
 				10 or more instances have identical content.
 			</p>
 			<p>
 				The content of the char arrays typically suggests how to
 				reduce the duplicates:
 				<ul>
 					<li>
 						Sometimes the duplicate strings are used as
 						<b>keys or values in hash maps</b>. For example, when reading
 						heap dumps, MAT itself used to read the char constant denoting
 						the type of an attribute into memory. It turned out that the
 						heap was littered with many 'L's for references, 'B's for bytes,
 						'Z's for booleans, and so on. By replacing the
 						<codeph>char</codeph> with an <codeph>int</codeph>, MAT could
 						save some of the precious memory. Alternatively, enumerations
 						could do the same trick.
 					</li>
 					<li>
 						When reading <b>XML documents</b>, fragments like
 						<codeph>UTF-8</codeph>, tag names or tag content remain in
 						memory. Again, think about using enumerations for the repetitive
 						content.
 					</li>
 					<li>
 						Another option is
 						<xref
 							href="http://java.sun.com/javase/6/docs/api/java/lang/String.html#intern()"
 							format="html">interning</xref>
 						the String. This adds the string to a pool of strings which is
 						maintained privately by the class <codeph>String</codeph>. For
 						each unique string, the pool will keep one instance alive.
 						However, if you are interning, make sure to do it
 						<b>responsibly</b>: a big pool of strings has maintenance costs,
 						and one cannot rely on interned strings being garbage collected.
 					</li>
 				</ul>
 			</p>
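 			<p>The following is a minimal sketch of two of these remedies, not
 				code taken from MAT. The class <codeph>AttributeTypeExample</codeph>
 				and the enumeration <codeph>FieldType</codeph> are hypothetical names
 				chosen for illustration. The first part maps repeated type codes to
 				shared enumeration constants, the second part shows that interning
 				makes two equal strings share one instance.</p>
 			<codeblock outputclass="language-java"><![CDATA[import java.util.ArrayList;
import java.util.List;

public class AttributeTypeExample {

    /** Hypothetical enumeration standing in for repeated type-code strings. */
    enum FieldType {
        REFERENCE('L'), BYTE('B'), BOOLEAN('Z');

        private final char code;

        FieldType(char code) {
            this.code = code;
        }

        /** Returns the shared constant for a type code. */
        static FieldType fromCode(char code) {
            for (FieldType type : values()) {
                if (type.code == code) {
                    return type;
                }
            }
            throw new IllegalArgumentException("Unknown type code: " + code);
        }
    }

    public static void main(String[] args) {
        // Every lookup returns one of the three shared enum constants,
        // so no per-attribute string or char array is kept alive.
        List<FieldType> types = new ArrayList<>();
        for (char c : "LLBZL".toCharArray()) {
            types.add(FieldType.fromCode(c));
        }
        System.out.println(types.get(0) == types.get(1)); // prints "true"

        // Two separately created strings with identical content...
        String a = new String("UTF-8");
        String b = new String("UTF-8");
        System.out.println(a == b);                   // prints "false"
        // ...refer to the same pooled instance after interning.
        System.out.println(a.intern() == b.intern()); // prints "true"
    }
}]]></codeblock>
 			<p>Whether an enumeration or interning fits better depends on
 				whether the set of repeated values is known in advance.</p>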
 		</section>
 		<section id="emptycol">
 			<title>Empty Collections</title>
 			<p>Even if collections are empty, they usually consume memory
 				through their internal object array. Imagine a tree structure where
 				every node eagerly creates array lists to hold its children, but
 				only a few nodes actually possess children.</p>
 			<p>
 				One remedy is the lazy initialization of the collections: create the
 				collection only when it is actually needed. To find out who is
 				responsible for the empty collections, use the
 				<xref href="immediate_dominators.dita" scope="local">immediate
 					dominators</xref>
 				command.
 			</p>
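 			<p>Lazy initialization for the tree example above might look like
 				the following sketch. The class <codeph>LazyTreeNode</codeph> is a
 				hypothetical name used only for illustration.</p>
 			<codeblock outputclass="language-java"><![CDATA[import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LazyTreeNode {
    private final String name;

    // Stays null until the first child is added, so leaf nodes
    // never allocate a list or its backing array.
    private List<LazyTreeNode> children;

    public LazyTreeNode(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void addChild(LazyTreeNode child) {
        if (children == null) {
            children = new ArrayList<>();
        }
        children.add(child);
    }

    public List<LazyTreeNode> getChildren() {
        // Nodes without children hand out the shared immutable empty list.
        return children == null
                ? Collections.<LazyTreeNode>emptyList()
                : Collections.unmodifiableList(children);
    }

    public static void main(String[] args) {
        LazyTreeNode root = new LazyTreeNode("root");
        LazyTreeNode leaf = new LazyTreeNode("leaf");
        root.addChild(leaf);
        System.out.println(root.getChildren().size());    // prints "1"
        System.out.println(leaf.getChildren().isEmpty()); // prints "true"
    }
}]]></codeblock>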
 		</section>
 		<section id="colfillratio">
 			<title>Collection Fill Ratio</title>
 			<p>Just like empty ones, collections with only a few elements
 				also take up a lot of memory. Again, the backing array of the
 				collection is the main culprit. Examining the fill ratios in a
 				heap dump from a production system hints at which initial
 				capacity to use.</p>
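 			<p>As a sketch of how such a finding could be applied: the sizes
 				below (two list elements, three map entries) are assumptions made
 				up for illustration, and <codeph>InitialCapacityExample</codeph> is
 				a hypothetical class name.</p>
 			<codeblock outputclass="language-java"><![CDATA[import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InitialCapacityExample {

    // Assumed finding: most of these lists hold at most two elements.
    static List<String> smallList() {
        List<String> aliases = new ArrayList<>(2); // backing array of length 2
        aliases.add("primary");
        aliases.add("secondary");
        return aliases;
    }

    // Assumed finding: most of these maps hold at most three entries.
    static Map<String, String> smallMap() {
        // With a load factor of 0.75, a capacity of 4 holds three
        // entries before the map has to grow.
        Map<String, String> attributes = new HashMap<>(4, 0.75f);
        attributes.put("type", "L");
        attributes.put("name", "value");
        attributes.put("size", "16");
        return attributes;
    }

    // For a list that has stopped growing, release the unused
    // tail of the backing array.
    static ArrayList<Integer> trimmedList(int n) {
        ArrayList<Integer> ids = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            ids.add(i);
        }
        ids.trimToSize();
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(smallList().size());     // prints "2"
        System.out.println(smallMap().size());      // prints "3"
        System.out.println(trimmedList(11).size()); // prints "11"
    }
}]]></codeblock>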
 		</section>
 		<section id="softref">
 			<title>Soft Reference Statistics</title>
 			<p>
 				Soft references are cleared by the virtual machine in response to
 				memory demand. Usually, soft references are used to implement
 				caches: keep the objects around while there is sufficient memory,
 				and clear them when free memory runs low. This approach has some
 				drawbacks:
 				<ul>
 					<li>Usually objects are cached because they are expensive
 						to re-create. Across a whole application, softly referenced
 						objects might carry very different costs. However, the virtual
 						machine cannot know this and clears the objects based on some
 						least-recently-used algorithm. From the outside, this is very
 						unpredictable and difficult to fine-tune.</li>
 					<li>
 						Furthermore, soft references can impose a
 						<i>stop-the-world</i>
 						phase during garbage collection. Oversimplified, the GC marks the
 						object graph behind the soft references while the virtual machine
 						is stopped.
 					</li>
 				</ul>
 			</p>
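 			<p>For orientation, a cache built on soft references typically
 				resembles the following sketch. <codeph>SoftCache</codeph> is a
 				hypothetical class, not part of MAT or the JDK, and it omits
 				thread safety and reference queues for brevity.</p>
 			<codeblock outputclass="language-java"><![CDATA[import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

public class SoftCache<K, V> {
    // The map holds the keys strongly but the values only softly.
    private final Map<K, SoftReference<V>> entries = new HashMap<>();

    public void put(K key, V value) {
        entries.put(key, new SoftReference<>(value));
    }

    /** Returns the cached value, or null if absent or already cleared. */
    public V get(K key) {
        SoftReference<V> ref = entries.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            // The virtual machine cleared the referent; drop the stale entry.
            entries.remove(key);
        }
        return value;
    }

    public static void main(String[] args) {
        SoftCache<String, byte[]> cache = new SoftCache<>();
        byte[] payload = new byte[1024];
        cache.put("report", payload);
        // While 'payload' is still strongly reachable, it cannot be cleared.
        System.out.println(cache.get("report") == payload); // prints "true"
        System.out.println(cache.get("missing"));           // prints "null"
    }
}]]></codeblock>
 			<p>Callers have to treat every <codeph>null</codeph> result as a
 				cache miss and re-create the value, which is where the differing
 				re-creation costs mentioned above come into play.</p>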
 		</section>
 		<section id="finalizer">
 			<title>Finalizer Statistics</title>
 			<p>
 				Objects which implement the
 				<codeph>finalize</codeph>
 				method are included in the component report, because those objects
 				can have serious implications for the memory of a Java Virtual
 				Machine:
 				<ul>
 					<li>
 						Whenever an object with a finalizer is created, a corresponding
 						<codeph>java.lang.ref.Finalizer</codeph>
 						object is created. If the object is only reachable via its
 						finalizer, it is placed in the queue of the finalizer thread and
 						processed. Only then will the next garbage collection actually
 						free the memory. Therefore it takes at least two garbage
 						collections until the memory is freed.
 					</li>
 					<li>When using Sun's current virtual machine implementation,
 						the finalizer thread is a single thread processing the finalizer
 						objects sequentially. A blocked finalizer queue can therefore
 						easily keep big chunks of memory alive (all those other objects
 						ready to be finalized).</li>
 					<li>
 						Depending on the actual algorithm, finalizers may require a
 						<i>stop-the-world</i>
 						pause during garbage collection. This, of course, can have
 						serious implications for the responsiveness of the whole
 						application.
 					</li>
 					<li>Last but not least, the time at which a finalizer executes
 						is up to the VM and therefore unpredictable.</li>
 				</ul>
 			</p>
 		</section>
 		<section id="mapcollision">
 			<title>Map Collision Ratios</title>
 			<p>This section analyzes the collision ratios of hash maps. Maps
 				place the values in different buckets based on the hash code of the
 				keys. If several hash codes point to the same bucket, the elements
 				inside the bucket are typically compared linearly.</p>
 			<p>High collision ratios can indicate sub-optimal hash codes.
 				This is not a memory problem (a better hash code does not save
 				space) but rather a performance problem because of the linear
 				access inside the buckets.</p>
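 			<p>The effect of a sub-optimal hash code can be illustrated with
 				the following sketch. The key classes <codeph>PoorKey</codeph> and
 				<codeph>BetterKey</codeph> are hypothetical, and counting distinct
 				hash codes is only a rough proxy for how the keys spread over the
 				buckets of a real map.</p>
 			<codeblock outputclass="language-java"><![CDATA[import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class HashCollisionExample {

    /** Hypothetical key whose hash code ignores its state. */
    static final class PoorKey {
        final int x;
        final int y;

        PoorKey(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof PoorKey && ((PoorKey) o).x == x && ((PoorKey) o).y == y;
        }

        @Override
        public int hashCode() {
            return 42; // legal, but every key lands in the same bucket
        }
    }

    /** The same key with a hash code derived from both fields. */
    static final class BetterKey {
        final int x;
        final int y;

        BetterKey(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BetterKey && ((BetterKey) o).x == x && ((BetterKey) o).y == y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    /** Counts the distinct hash codes produced by an n-by-n grid of keys. */
    static int distinctHashes(boolean better, int n) {
        Set<Integer> hashes = new HashSet<>();
        for (int x = 0; x < n; x++) {
            for (int y = 0; y < n; y++) {
                hashes.add(better ? new BetterKey(x, y).hashCode()
                                  : new PoorKey(x, y).hashCode());
            }
        }
        return hashes.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctHashes(false, 10)); // prints "1"
        System.out.println(distinctHashes(true, 10));  // prints "100"
    }
}]]></codeblock>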
 		</section>
 	</refbody>
 </reference>