<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<!-- /*******************************************************************************
* Copyright (c) 2000, 2005 IBM Corporation and others.
* All rights reserved. This program and the accompanying materials
* are made available under the terms of the Eclipse Public License v1.0
* which accompanies this distribution, and is available at
* http://www.eclipse.org/legal/epl-v10.html
*
* Contributors:
* IBM Corporation - initial API and implementation
*******************************************************************************/ -->
<link rel="stylesheet" type="text/css" href="../../org.eclipse.wst.doc.user/common.css" />
<title>Importing existing Web resources using HTTP or FTP</title>
</head>
<body id="twimpweb"><a name="twimpweb"><!-- --></a>
<h1 class="topictitle1">Importing existing Web resources using HTTP or FTP</h1>
<div><div class="skipspace"> <p>You can import existing Web resources
using wizards that invoke HTTP or FTP. These import wizards automate the transfer
of complete Web sites into Web projects by:</p>
<ul><li>Enabling you to continue development of your Web applications while importing
your Web resources.</li>
<li>Prompting you for information that helps you find and navigate through
the Web site.</li>
<li>Providing you with options for limiting the scope of the import and eliminating
pages that you do not want to import.</li>
</ul>
<p>These import wizards can also import from Web
servers that are protected by firewalls. Both HTTP and FTP import support
proxies, while FTP import supports SOCKS.</p>
<p>To use the HTTP or FTP
import wizards, you must designate an existing project into which
to import the files. You can then view all the files from the imported
Web site within the selected project folder.</p>
<p>The HTTP import uses the
HTTP protocol to crawl through the Web site, starting from an initial URL that you
provide. The import action uses the URL to retrieve any available HTML content
and parses that content for HTTP links. The process repeats for each linked Web page
that is encountered within the Web site, until all content and links have been
parsed. HTTP import cannot parse pages that contain servlets or programs
that are executed when a form is posted or embedded in JavaServer Pages (JSPs). </p>
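<p>The crawl described above can be sketched as a breadth-first traversal. This is a
minimal illustration under stated assumptions, not the wizard's actual implementation:
the names crawl, LinkCollector, fetch, and depth_limit are hypothetical, fetch stands in
for the HTTP request, and depth_limit models the depth limit option by counting link
hops from the initial URL.</p>

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the href values of anchor tags found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, depth_limit=None):
    """Breadth-first crawl; returns {url: html} for every page retrieved.

    fetch is a hypothetical callable that returns the HTML for a URL,
    or None when the URL cannot be retrieved. depth_limit=None means
    that links are followed without a limit.
    """
    start_host = urlsplit(start_url).netloc
    pages = {}
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        # Do not follow links from pages that already sit at the depth limit.
        if depth_limit is not None and depth >= depth_limit:
            continue
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop any fragment identifier.
            target, _fragment = urldefrag(urljoin(url, href))
            if urlsplit(target).netloc != start_host:
                continue  # assumption: the crawl stays on the starting host
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return pages
```

<p>In this sketch, passing a dictionary's get method as fetch is enough to try the
traversal against a small in-memory site, and a depth_limit of 1 retrieves only the
initial page and the pages that it links to directly.</p>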
<p>The
files transferred to your project represent a logical snapshot of the Web
site's URL. This means that your Web project is populated with the files that
are returned in the HTML responses of the serving site. It also means that
the physical resources on the serving site are not necessarily
copied to your project. For example, an HTTP request for a JSP page returns
a rendered HTML response, not the JSP page itself. Use HTTP import
for static pages and for sites that do not have FTP access.</p>
<p>To
import existing Web resources into the Web project using HTTP, perform the
following steps:</p>
</div>
<ol><li class="skipspace"><span>Use the <span class="uicontrol">New Web Project</span> wizard to create the
project into which you want to import the Web resources.</span></li>
<li class="skipspace"><span>If you intend to use an existing project instead, select the project in
the Project Explorer view.</span></li>
<li class="skipspace"><span>Select <span class="menucascade"><span class="uicontrol">File</span> &gt; <span class="uicontrol">Import</span></span>. </span></li>
<li class="skipspace"><span>In the Import dialog, select <span class="uicontrol">HTTP</span> and click <span class="uicontrol">Next</span>.</span></li>
<li class="skipspace"><span>In the <span class="uicontrol">Specify the destination folder and the resources
to import</span> page, type the required information. </span> <ul><li><span class="uicontrol">Folder</span> - The imported files are placed in the default
location (the Web content folder). You can click the <span class="uicontrol">Browse</span> button
to change the location of the imported files in your project.</li>
<li><b>URL</b> - Type the HTTP URL in the <span class="uicontrol">URL</span> field.
The URL should include the domain name and the starting directory or initial
Web page. <ul><li>If you enter a directory URL without a start page (for example, www.domain.net/Sports/),
the default file name is used when the Web server returns HTML content.
For example, if you do not specify a default, index.html is used.</li>
<li>HTTP crawling may create files that do not exist on the original server.
For example, an HTTP reference to a directory may cause a Web server to respond
with HTML content that describes the directory. The HTTP crawler saves this
response as index.html.</li>
<li>If you enter just a domain name (for example, www.domain.net), the Import
wizard will try to find a default page in the document root directory. </li>
</ul>
If you click the <span class="uicontrol">Advanced</span> button, you have the
option of specifying a proxy connection in the Advanced Settings dialog box.
If you select the <span class="uicontrol">Use a proxy server</span> check box, you
will have the option of selecting a SOCKS or HTTP proxy, and supplying the
corresponding server and port values. </li>
<li><b>Depth limit while following HTTP links</b> - You can limit how far
the import follows links by selecting the appropriate radio button.
<ul><li><span class="uicontrol">No limit</span> - This option allows the HTTP import
to parse through all pages within the domain. </li>
<li><span class="uicontrol">Limit to</span> - This option sets the maximum number
of link levels that are crawled. For example, if you choose 1, all Web pages
within one link (level 1) of the starting page are
navigated. If you choose 2, all level 1 pages and the pages linked
directly from level 1 pages are imported. <p>For example, suppose that you
specify a crawl depth of 2 and an initial URL of http://host/initialLevel/index.html.
If index.html has a reference to http://host/initialLevel/L2/L3/index2.html,
then index2.html, which is at level 3, is filtered out and its content is
not parsed for follow-on crawling. </p>
</li>
</ul>
</li>
</ul>
</li>
<li class="skipspace"><span>Click <span class="uicontrol">Next</span> for more options, or <span class="uicontrol">Finish</span> to
import the Web site. </span></li>
<li class="skipspace"><span>If you clicked <b>Next</b>, select among the choices provided on the
<b>Specify appropriate import options</b> page. </span> <ul><li><span class="uicontrol">Convert Links to document relative</span> - If you select
this option, links within the imported HTML files are rewritten as document-relative links,
rather than as absolute links based on their new location in the file
system. </li>
<li><span class="uicontrol">Overwrite existing resources without warning</span> -
If you select this option, existing workbench files in your project are
overwritten. If you do not select this option, existing files are not overwritten by imported files.
There is no prompt for selectively overwriting files. </li>
<li><span class="uicontrol">Do not follow links to files in parent folders of the starting
URL</span> - If you select this option, you prevent the HTTP import
from crawling resources above the initial URL that you provided. For example, if the initial
URL is http://host/l1/l2/index.html and a link within the page references
http://host/index.html, this option determines whether the linked resource
is included in the import. If you do not select this option,
you risk crawling very large sites and importing huge volumes of
files unnecessarily.</li>
<li><span class="uicontrol">Connection timeout</span> - This option sets the
HTTP connection timeout value, measured in milliseconds. The connection
timeout specifies how long the import waits for a response
from the server before giving up.</li>
</ul>
</li>
<li class="skipspace"><span>Click <span class="uicontrol">Finish</span> to import the Web site with
the selected options. </span></li>
<li class="skipspace"><span>Verify the resulting directory structure and file data integrity
in the newly populated project or folder. </span></li>
</ol>
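<p>Two of the behaviors described in the steps above can be illustrated with a short
sketch: saving a directory URL under a default file name, and recognizing links that
point into parent folders of the starting URL. The function names local_path, folder_of,
and in_parent_folder are hypothetical, and the parent-folder test is only one literal
reading of the option; the wizard's own rules may differ.</p>

```python
import posixpath
from urllib.parse import urlsplit


def local_path(url, default_name="index.html"):
    """Map a URL to a relative file path inside the destination folder.

    Directory URLs (and bare domain names) are saved under default_name,
    which is an assumed default for this sketch.
    """
    path = urlsplit(url).path or "/"
    if path.endswith("/"):
        path += default_name
    return path.lstrip("/")


def folder_of(url):
    """Return the folder part of a URL path, always ending in a slash."""
    path = urlsplit(url).path or "/"
    return posixpath.dirname(path).rstrip("/") + "/"


def in_parent_folder(start_url, link_url):
    """Return True when link_url is a file in a parent folder of start_url."""
    start_dir = folder_of(start_url)
    link_dir = folder_of(link_url)
    same_host = urlsplit(start_url).netloc == urlsplit(link_url).netloc
    # A parent folder is a proper prefix of the starting folder.
    return same_host and link_dir != start_dir and start_dir.startswith(link_dir)
```

<p>With the example URLs from the steps above, in_parent_folder reports that
http://host/index.html lies in a parent folder of http://host/l1/l2/index.html, so a
crawler that honors the option would skip that link. Note that the URLs passed to these
functions must include the scheme (for example, http://).</p>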
</div>
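<p>The effect of converting links to document-relative form can also be sketched in a
few lines. This is an assumption-laden illustration rather than the wizard's algorithm:
document_relative is a hypothetical helper that rewrites a link relative to the folder
of the page that contains it, and it leaves links to other hosts unchanged.</p>

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def document_relative(page_url, href):
    """Rewrite href so that it is relative to the folder containing page_url.

    Query strings and fragments are dropped in this simplified sketch.
    """
    page = urlsplit(page_url)
    target = urlsplit(urljoin(page_url, href))
    if target.netloc != page.netloc:
        return href  # links to other hosts are left untouched
    page_dir = posixpath.dirname(page.path) or "/"
    return posixpath.relpath(target.path or "/", start=page_dir)
```

<p>For a page at http://host/l1/l2/index.html, this sketch rewrites the absolute link
http://host/l1/images/logo.gif as ../images/logo.gif, which continues to resolve after
the files are copied into a project folder.</p>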
</body>
</html>