 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE html
   PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 <html>
 <head>
 <!-- /*******************************************************************************
  * Copyright (c) 2000, 2005 IBM Corporation and others.
  * All rights reserved. This program and the accompanying materials
  * are made available under the terms of the Eclipse Public License v1.0
  * which accompanies this distribution, and is available at
  * http://www.eclipse.org/legal/epl-v10.html
  *
  * Contributors:
  *     IBM Corporation - initial API and implementation
  *******************************************************************************/ -->
 <link rel="stylesheet" type="text/css" href="../../org.eclipse.wst.doc.user/common.css" />
 <title>Importing existing Web resources using HTTP or FTP</title>
 </head>
 <body id="twimpweb"><a name="twimpweb"><!-- --></a>

 <h1 class="topictitle1">Importing existing Web resources using HTTP or FTP</h1>
 <div><div class="skipspace"> <p>You can import existing Web resources

 using wizards that invoke HTTP or FTP. These import wizards automate the transfer

 of complete Web sites into Web projects by:</p>
 <ul><li>Enabling you to continue development of your Web applications while importing
 your Web resources</li>
 <li>Prompting you for information that helps you find and navigate through
 the Web site</li>
 <li>Providing you with options for limiting the scope of the import and eliminating
 pages that you do not want to import</li>
 </ul>
 <p>These import wizards can also import from Web servers that sit behind
 firewalls. Both HTTP and FTP import support proxies, and FTP import also
 supports SOCKS.</p>
 <p>To use the HTTP or FTP import wizards, you must designate an existing
 project into which to import the files. You can then view all the files from
 the imported Web site within the selected project folder.</p>
 <p>The HTTP import uses the HTTP protocol to crawl through the Web site,
 starting from an initial URL that you provide. The import action retrieves
 the HTML content available at that URL and parses it for HTTP links. It then
 repeats the process for each linked Web page that it encounters within the
 Web site, until all referenced content and links have been parsed. HTTP import
 cannot parse pages that contain servlets or programs that are executed when a
 form is posted, or that are embedded in JavaServer Pages (JSPs).</p>
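 <p>As a rough illustration only, the crawl described above can be sketched in
 Python. This is not the wizard's implementation: the names
 <code>crawl</code>, <code>LinkCollector</code>, and <code>fetch</code> are
 hypothetical, and the sketch assumes that depth is counted in link hops from
 the starting page and that only links on the same host are followed.</p>

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href and src attribute values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl(start_url, fetch, max_depth=None):
    """Breadth-first crawl from start_url; returns {url: html}.

    fetch(url) is a caller-supplied function (hypothetical) that returns
    the HTML text for a URL, or None if nothing could be retrieved.
    max_depth counts link hops from the start page; None means no limit.
    """
    host = urlparse(start_url).netloc
    pages = {}
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        if max_depth is not None and depth >= max_depth:
            continue  # page is saved, but its links are not followed
        parser = LinkCollector()
        parser.feed(html)
        for link in parser.links:
            # Resolve relative links and drop any #fragment.
            target = urldefrag(urljoin(url, link)).url
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return pages
```

 <p>Passing <code>fetch</code> as a parameter keeps the sketch independent of
 any particular HTTP client; a dictionary's <code>get</code> method is enough
 to try it against a small in-memory site.</p>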
 <p>The files transferred to your project represent a logical snapshot of the
 Web site at the URL that you provide. This means that your Web project is
 populated with the files returned in the HTML responses of the serving site.
 It also means that the physical resources on the serving site are not
 necessarily copied to your project. For example, an HTTP request for a JSP
 page returns a rendered HTML response, not the JSP page itself. It is
 recommended that you use HTTP import for static pages and for sites that do
 not have FTP access.</p>
 <p>To

 import existing Web resources into the Web project using HTTP, perform the

 following steps:</p>
 </div>
 <ol><li class="skipspace"><span>Using the <span class="uicontrol">New Web Project</span> wizard,
 create a project into which to import the Web resources.</span></li>
 <li class="skipspace"><span>If you intend to use an existing project, select the project in

 the Project Explorer view.</span></li>
 <li class="skipspace"><span>Select <span class="menucascade"><span class="uicontrol">File</span> &gt; <span class="uicontrol">Import</span></span>. </span></li>
 <li class="skipspace"><span>In the Import dialog, select <span class="uicontrol">HTTP</span> and click <span class="uicontrol">Next</span>.</span></li>
 <li class="skipspace"><span>In the <span class="uicontrol">Specify the destination folder and the resources
 to import</span> page, provide the required information. </span> <ul><li><span class="uicontrol">Folder</span> - The imported files are placed in the default
 location (the Web content folder). You can click the <span class="uicontrol">Browse</span> button
 to change the location for the imported files in your project.</li>
 <li><b>URL</b> - Type the HTTP URL in the <span class="uicontrol">URL</span> field.
 The URL should include the domain name and the starting directory or initial
 Web page.  <ul><li>If you enter a directory URL without a start page (for example, www.domain.net/Sports/),
 the default file name is used for the HTML content that the Web server returns
 (for example, index.html is used if you do not specify a default).</li>
 <li>HTTP crawling may create files that do not exist on the original server.

 For example, an HTTP reference to a directory may cause a Web server to respond

 with HTML content that describes the directory. The HTTP crawler saves this
 response as index.html.</li>
 <li>If you enter just a domain name (for example, www.domain.net), the Import

 wizard will try to find a default page in the document root directory. </li>
 </ul>
 To specify a proxy connection, click the <span class="uicontrol">Advanced</span> button
 to open the Advanced Settings dialog box.
 If you select the <span class="uicontrol">Use a proxy server</span> check box, you
 can select a SOCKS or HTTP proxy and supply the
 corresponding server and port values. </li>
 <li><b>Depth limit while following HTTP links</b> - You can limit the scope
 of the import by selecting one of the following radio buttons.
  <ul><li><span class="uicontrol">No limit</span> - The HTTP import parses
 all pages within the domain. </li>
 <li><span class="uicontrol">Limit to</span> - This option sets the number
 of link levels that are crawled. For example, if you choose 1, all Web pages
 within one link (level 1) of the starting page are navigated. If you choose 2,
 all level 1 pages and the pages linked directly from them are imported.
 <p>For example, suppose that you specify a crawl depth of 2 and an initial URL
 of http://host/initialLevel/index.html. If index.html has a reference to
 http://host/initialLevel/L2/L3/index2.html, then index2.html, which is at
 level 3, is filtered out and its content is not parsed for follow-on
 crawling. </p>
 </li>
 </ul>
 </li>
 </ul>
 </li>
 <li class="skipspace"><span>Click <span class="uicontrol">Next</span> for more options, or <span class="uicontrol">Finish</span> to

 import the Web site. </span></li>
 <li class="skipspace"><span>If you clicked <b>Next</b>, select from the choices on the <b>Specify appropriate import
 options</b> page.  </span> <ul><li><span class="uicontrol">Convert Links to document relative</span> - If you select
 this option, links within HTML files are rewritten as document-relative links,
 rather than as absolute links based on their new location in the file
 system.  </li>
 <li><span class="uicontrol">Overwrite existing resources without warning</span> -
 If you select this option, existing workbench files in your project are
 overwritten. If you do not select this option, existing files are not overwritten.
 You are not prompted to overwrite files selectively.  </li>
 <li><span class="uicontrol">Do not follow links to files in parent folders of the starting
 URL</span> - If you select this option, the HTTP import is prevented
 from crawling resources above the initial URL that you provided. For example, if the initial
 URL is http://host/l1/l2/index.html and a link within the page references
 http://host/index.html, this option determines whether the linked resource
 is included in the import. If you do not select this option,
 you risk crawling very large sites and importing huge volumes of
 files unnecessarily.</li>
 <li><span class="uicontrol">Connection timeout</span> - This option sets the
 HTTP connection timeout value, measured in milliseconds. The connection
 timeout specifies how long to wait for a response
 from the server before giving up.</li>
 </ul>
 </li>
 <li class="skipspace"><span>Click <span class="uicontrol">Finish</span> to import the Web site with
 the selected options. </span></li>
 <li class="skipspace"><span>Verify the resulting directory structure and the integrity of the file data
 in the newly populated project or folder. </span></li>
 </ol>
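 <p>Two of the import options in the steps above, restricting the crawl to
 the starting folder and converting links to document-relative form, can be
 illustrated with a short Python sketch. The function names
 <code>start_folder</code>, <code>is_under_start_folder</code>, and
 <code>to_document_relative</code> are hypothetical, and the logic is an
 assumption about how such options could behave, not a description of the
 wizard's code.</p>

```python
import posixpath
from urllib.parse import urlparse


def start_folder(url):
    """Return the folder part of a URL path, always ending with '/'."""
    path = urlparse(url).path or "/"
    if path.endswith("/"):
        return path
    return posixpath.dirname(path).rstrip("/") + "/"


def is_under_start_folder(start_url, link_url):
    """True if link_url is on the same host and at or below the start folder."""
    start, link = urlparse(start_url), urlparse(link_url)
    if start.netloc != link.netloc:
        return False
    return (link.path or "/").startswith(start_folder(start_url))


def to_document_relative(page_url, target_url):
    """Express target_url relative to the folder of the page that links to it.

    Assumes both URLs are on the same host.
    """
    page_dir = posixpath.dirname(urlparse(page_url).path) or "/"
    return posixpath.relpath(urlparse(target_url).path, start=page_dir)
```

 <p>With the starting URL http://host/l1/l2/index.html from the example above,
 <code>is_under_start_folder</code> rejects http://host/index.html because its
 path lies above /l1/l2/, while a link to http://host/l1/images/logo.gif would
 be rewritten by <code>to_document_relative</code> as ../images/logo.gif.</p>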
 </div>
 </body>
 </html>