<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<!-- /*******************************************************************************
 * Copyright (c) 2000, 2005 IBM Corporation and others.
 * All rights reserved. This program and the accompanying materials
 * are made available under the terms of the Eclipse Public License v1.0
 * which accompanies this distribution, and is available at
 * http://www.eclipse.org/legal/epl-v10.html
 * 
 * Contributors:
 *     IBM Corporation - initial API and implementation
 *******************************************************************************/ -->
<link rel="stylesheet" type="text/css" href="../../org.eclipse.wst.doc.user/common.css" />
<title>Importing existing Web resources using HTTP or FTP</title>
</head>
<body id="twimpweb"><a name="twimpweb"><!-- --></a>

<h1 class="topictitle1">Importing existing Web resources using HTTP or FTP</h1>
<div><div class="skipspace"> <p>You can import existing Web resources

using wizards that invoke HTTP or FTP. These import wizards automate the transfer

of complete Web sites into Web projects by:</p>
<ul><li>Enabling you to continue development of your Web applications while importing

your Web resources.</li>
<li>Prompting you for information that helps you find and navigate through
the Web site.</li>
<li>Providing you with options for limiting the scope of the import and eliminating
pages that you do not want to import.</li>
</ul>
<p>These import wizards also support importing from Web
servers that are behind firewalls. Both HTTP and FTP import support
proxies, and FTP import also supports SOCKS.</p>
<p>To use the HTTP or FTP
import wizards, you must designate an existing project into which
to import the files. You can view all the files from the imported
Web site within the selected project folder.</p>
<p>The HTTP import uses the
HTTP protocol to crawl through the Web site, starting from an initial URL that you
provide. The import action uses the URL to retrieve any HTML content available
and parses that content for HTTP links. The process repeats for each linked page
until the content and links of all pages encountered within
the Web site have been parsed. HTTP import cannot parse pages that contain servlets or programs
that are executed when a form is posted or embedded in JavaServer Pages (JSPs). </p>
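<p>The crawl described above can be sketched roughly as follows. This is only an illustration of the general technique, not the wizard's actual implementation. The names <code>LinkCollector</code>, <code>crawl</code>, and the <code>fetch</code> callable are hypothetical, and the sketch assumes that the crawl stays on the host of the initial URL.</p>

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href and src attribute values found in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl from start_url.

    fetch(url) is a caller-supplied (hypothetical) function that returns
    the HTML text for a URL, or None if nothing could be retrieved.
    """
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for link in collector.links:
            # Resolve relative links and drop any #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, link))
            if urlparse(absolute).netloc != start_host:
                continue  # assumption: stay within the starting site
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

<p>Because <code>fetch</code> is passed in, the sketch can be tried against an in-memory dictionary of pages before it is pointed at a real server.</p>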
<p>The
files transferred to your project represent a logical snapshot of the Web
site's URL. This means that your Web project is populated with files that
are acquired from the HTML responses of the serving site. It also means that
the physical resources on the serving site are not necessarily
copied to your project. For example, an HTTP request for a JSP page returns
a rendered HTML response, not the JSP page itself. It is recommended that
you use HTTP import for static pages and for sites that do not have FTP access.</p>
<p>To

import existing Web resources into the Web project using HTTP, perform the

following steps:</p>
</div>
<ol><li class="skipspace"><span>Using the <span class="uicontrol">New Web Project</span> wizard, create a new project into which
to import Web resources.</span></li>
<li class="skipspace"><span>Alternatively, if you intend to use an existing project, select the project in
the Project Explorer view.</span></li>
<li class="skipspace"><span>Select <span class="menucascade"><span class="uicontrol">File</span> &gt; <span class="uicontrol">Import</span></span>. </span></li>
<li class="skipspace"><span>In the Import dialog, select <span class="uicontrol">HTTP</span> and click <span class="uicontrol">Next</span>.</span></li>
<li class="skipspace"><span>In the <span class="uicontrol">Specify the destination folder and the resources

to import</span> page, type the requisite project information. </span> <ul><li><span class="uicontrol">Folder</span> - The imported files are placed in the default

location (the Web content folder). You can click the <span class="uicontrol">Browse</span> button

to change the location of the imported files in your project.</li>
<li><b>URL</b> - Type the HTTP URL in the <span class="uicontrol">URL</span> field.
The URL should include the domain name and the starting directory or initial
Web page.  <ul><li>If you enter a directory URL without a start page (for example, www.domain.net/Sports/),
the default file name is used when the Web server returns HTML content
(for example, if you do not specify a default, index.html is used). </li>
<li>HTTP crawling may create files that do not exist on the original server.

For example, an HTTP reference to a directory may cause a Web server to respond

with HTML content that describes the directory. The HTTP crawler saves this

response as index.html.</li>
<li>If you enter just a domain name (for example, www.domain.net), the Import

wizard will try to find a default page in the document root directory. </li>
</ul>
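<p>As a rough illustration of how a directory URL can end up as an index.html file in the project, the following sketch maps a URL to a relative file path. The function name <code>local_path_for</code> and the constant <code>DEFAULT_PAGE</code> are hypothetical, and the sketch assumes that the URL includes its scheme (for example, http://).</p>

```python
from urllib.parse import urlparse

# Assumed default file name, matching the index.html example in the text.
DEFAULT_PAGE = "index.html"


def local_path_for(url):
    """Map a URL to a relative file path inside the destination folder.

    URLs that name a directory (or only a domain) get DEFAULT_PAGE appended.
    """
    path = urlparse(url).path or "/"
    if path.endswith("/"):
        path += DEFAULT_PAGE
    return path.lstrip("/")
```

<p>With this sketch, http://www.domain.net/Sports/ maps to Sports/index.html and http://www.domain.net maps to index.html. A URL such as http://www.domain.net/Sports, with no trailing slash, is treated as a file name here; a real crawler would need the server's response to tell the difference.</p>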
If you click the <span class="uicontrol">Advanced</span> button, you can
specify a proxy connection in the Advanced Settings dialog box.
If you select the <span class="uicontrol">Use a proxy server</span> check box, you
can select a SOCKS or HTTP proxy and supply the
corresponding server and port values. </li>
<li><b>Depth limit while following HTTP links</b> - You can limit how far
the import follows links by selecting one of the following radio buttons.
 <ul><li><span class="uicontrol">No limit</span> - This option allows the HTTP import
to parse through all pages within the domain. </li>
<li><span class="uicontrol">Limit to</span> - This option sets the maximum number
of link levels that are crawled. For example, if you choose 1, all Web pages
that are one link (level 1) away from the starting page are
imported. If you limit it to 2, all level 1 pages and the pages linked
directly from level 1 pages are imported. <p>For example, you might
specify a crawl depth of 2 and an initial URL of http://host/initialLevel/index.html.
If index.html has a reference to http://host/initialLevel/L2/L3/index2.html,
then index2.html, which is at level 3, is filtered out and its content is
not parsed for follow-on crawling. </p>
</li>
</ul>
</li>
</ul>
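<p>The effect of a depth limit can be sketched as a breadth-first walk that stops following links once a page sits at the limit. This is a minimal sketch under stated assumptions: it counts link hops from the starting page, the function name <code>pages_within_depth</code> and the <code>links</code> mapping are hypothetical, and the wizard's own level calculation may differ.</p>

```python
from collections import deque


def pages_within_depth(start, links, depth_limit=None):
    """Return {page: level} for pages reachable within depth_limit link hops.

    links maps each page to the pages it references (hypothetical input).
    depth_limit=None corresponds to the "No limit" choice.
    """
    levels = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth_limit is not None and levels[page] >= depth_limit:
            continue  # at the limit: keep the page, do not follow its links
        for target in links.get(page, ()):
            if target not in levels:
                levels[target] = levels[page] + 1
                queue.append(target)
    return levels
```

<p>For a site where index links to a and b, a links to c, and c links to d, a limit of 1 yields index, a, and b; a limit of 2 also yields c; and no limit yields every page.</p>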
</li>
<li class="skipspace"><span>Click <span class="uicontrol">Next</span> for more options, or <span class="uicontrol">Finish</span> to

import the Web site. </span></li>
<li class="skipspace"><span>If you select <b>Next</b>, in the <b>Specify appropriate import

options</b> page, select among the choices provided.  </span> <ul><li><span class="uicontrol">Convert Links to document relative</span> - If you select
this option, links within HTML files are rewritten as document-relative links,
rather than as absolute links based on their new location in the file
system.  </li>
<li><span class="uicontrol">Overwrite existing resources without warning</span> -
If you select this option, existing workbench files in your project are
overwritten. If this option is not selected, existing files are not overwritten by imported files.
You are not prompted to overwrite files selectively.  </li>
<li><span class="uicontrol">Do not follow links to files in parent folders of the starting
URL</span> - If you select this option, you prevent the HTTP import
from crawling resources above the initial URL that you provided. For example, if the initial
URL is http://host/l1/l2/index.html and a link within the page references
http://host/index.html, this option determines whether the linked resource
is included in the import. If you do not select this option,
you run the risk of crawling very large sites and importing huge volumes of
files unnecessarily.</li>
<li><span class="uicontrol">Connection timeout</span> - This option sets the
HTTP connection timeout value, measured in milliseconds. The connection
timeout specifies how long to wait for a response
from the server before giving up.</li>
</ul>
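<p>Two of the options above, the parent-folder restriction and the conversion to document-relative links, can be sketched with plain path arithmetic. The function names <code>is_under_start_folder</code> and <code>document_relative</code> are hypothetical, and the sketch assumes that both URLs are absolute, on the same host, and have a non-empty path. It is not the wizard's own logic.</p>

```python
import posixpath
from urllib.parse import urlparse


def is_under_start_folder(start_url, link_url):
    """True if link_url is on the same host and inside the start URL's folder."""
    start, link = urlparse(start_url), urlparse(link_url)
    folder = posixpath.dirname(start.path).rstrip("/") + "/"
    return link.netloc == start.netloc and link.path.startswith(folder)


def document_relative(from_url, to_url):
    """Express to_url relative to the document at from_url (same host assumed)."""
    from_dir = posixpath.dirname(urlparse(from_url).path) or "/"
    return posixpath.relpath(urlparse(to_url).path, start=from_dir)
```

<p>Using the example URLs from the text, a link from http://host/l1/l2/index.html to http://host/index.html is rejected by <code>is_under_start_folder</code>, while a link to http://host/l1/images/logo.gif would be rewritten by <code>document_relative</code> as ../images/logo.gif.</p>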
</li>
<li class="skipspace"><span>Click <span class="uicontrol">Finish</span> to import the Web site with
the selected options. </span></li>
<li class="skipspace"><span>Verify the resulting directory structure and file data integrity
in the newly populated project or folder. </span></li>
</ol>
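<p>One possible way to support the final verification step outside the workbench is to list what was written to disk and flag empty files. The following is a minimal sketch; the function names <code>summarize_import</code> and <code>empty_files</code> are hypothetical, and the folder path you pass in is whatever location your project's Web content folder has on disk.</p>

```python
import os


def summarize_import(folder):
    """Return sorted (relative_path, size_in_bytes) pairs for files under folder."""
    entries = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            full = os.path.join(root, name)
            # Use forward slashes so the listing reads like site-relative paths.
            rel = os.path.relpath(full, folder).replace(os.sep, "/")
            entries.append((rel, os.path.getsize(full)))
    return sorted(entries)


def empty_files(folder):
    """Return the relative paths of zero-byte files, which may indicate failed transfers."""
    return [path for path, size in summarize_import(folder) if size == 0]
```

<p>A zero-byte file is only a hint that a transfer may have failed; compare the listing against the original site before drawing conclusions.</p>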
</div>
</body>
</html>