<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<link rel="stylesheet" type="text/css" href="../../com.ibm.help.doc/swg_info_common.css" />
<title>Importing existing Web resources using HTTP or FTP</title>
<script language="JavaScript">
// Opens url in a fixed-size pop-up window without browser toolbars and gives it focus.
function popup_window( url, id, width, height )
{
popup = window.open( url, id, 'toolbar=no,scrollbars=no,location=no,statusbar=no,menubar=no,resizable=no,width=' + width + ',height=' + height + ',left=,top=' );
// window.open returns null when a pop-up blocker prevents the window from opening.
if ( popup ) popup.focus();
}
</script><script language="JavaScript" src="help/liveHelp.js"></script></head>
<body id="twimpweb"><a name="twimpweb"><!-- --></a>

<h1 class="topictitle1">Importing existing Web resources using HTTP or FTP</h1>
<div><div class="skipspace"> <p>You can import existing Web resources using wizards that invoke HTTP or FTP. These import wizards automate the transfer of complete Web sites into Web projects and offer the following features:</p>
<ul><li>You can continue developing your Web applications while your Web resources are being imported.</li>
<li>The wizards prompt you for the information needed to find and navigate through the Web site.</li>
<li>The wizards provide options for limiting the scope of the import and for excluding pages that you do not want to import.</li>
</ul>
<p>The import wizards also support importing from Web servers that are protected by firewalls. Both HTTP and FTP import support proxy servers, and FTP import also supports SOCKS.</p>
<p>To use the HTTP or FTP import wizard, you must designate an existing project into which the files are imported. After the import, you can view all the files from the imported Web site in the selected project folder.</p>
<p>HTTP import uses the HTTP protocol to crawl through the Web site, starting from an initial URL that you provide. The import retrieves the HTML content available at that URL and parses it for HTTP links. It then repeats the process for each linked page that it encounters within the Web site, until all the referenced content and links have been parsed. HTTP import cannot parse servlets, or programs that run when a form is posted or that are embedded in JavaServer Pages (JSP) files.</p>
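<p>The crawl described above can be pictured as a breadth-first traversal of linked pages. The following JavaScript is a minimal sketch, not the wizard's implementation: the crawl and fetchLinks functions and the in-memory site object are hypothetical, and fetchLinks stands in for requesting a URL over HTTP and extracting the href values from the returned HTML.</p>

```javascript
// Minimal breadth-first crawl sketch (hypothetical, not the wizard's code).
// fetchLinks(url) returns the href values found in the page's HTML, or null
// when no HTML content is available for that URL.
function crawl(startUrl, fetchLinks) {
  const seen = new Set([startUrl]); // URLs already queued, so each is fetched once
  const imported = [];              // URLs whose content was retrieved
  const queue = [startUrl];
  while (queue.length > 0) {
    const url = queue.shift();
    const links = fetchLinks(url);
    if (links === null) continue;   // nothing to save or parse
    imported.push(url);
    for (const href of links) {
      // Resolve relative links against the page that contains them.
      const absolute = new URL(href, url).href;
      if (!seen.has(absolute)) {
        seen.add(absolute);
        queue.push(absolute);
      }
    }
  }
  return imported;
}

// Hypothetical site: each page lists the href values found in its HTML.
const site = {
  "http://host/index.html": ["about.html", "docs/guide.html"],
  "http://host/about.html": ["index.html"],
  "http://host/docs/guide.html": ["../about.html", "faq.html"],
  "http://host/docs/faq.html": [],
};
const fetchLinks = (url) => (url in site ? site[url] : null);

console.log(crawl("http://host/index.html", fetchLinks));
```

<p>Because each URL is queued only once, pages that link back to each other (index.html and about.html here) do not cause an endless loop.</p>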
<p>The files transferred to your project represent a logical snapshot of the Web site as seen through its URLs: your Web project is populated with files built from the HTML responses of the serving site. The physical resources on the serving site are therefore not necessarily copied to your project. For example, an HTTP request for a JSP page returns the rendered HTML response, not the JSP page itself. Use HTTP import for static pages and for sites that do not provide FTP access.</p>
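<p>Because the project receives the server's responses rather than its physical files, each local file name has to be derived from the URL that was requested. The sketch below shows one plausible mapping. The function name localPathFor is hypothetical, and the sketch assumes that directory URLs are saved as index.html and that query strings are dropped; the wizard's exact naming rules may differ.</p>

```javascript
// Hypothetical mapping from a retrieved URL to a project-relative file path.
// ASSUMPTIONS: directory URLs use defaultPage, query strings are ignored.
function localPathFor(url, defaultPage = "index.html") {
  const { pathname } = new URL(url); // "/" for a bare domain name
  // A directory URL names no file, so the server's response is saved under
  // the default page name (such a file may not exist on the server at all).
  const path = pathname.endsWith("/") ? pathname + defaultPage : pathname;
  return path.replace(/^\//, ""); // strip the leading slash
}

console.log(localPathFor("http://www.domain.net/Sports/"));    // prints Sports/index.html
console.log(localPathFor("http://www.domain.net"));            // prints index.html
// The saved file holds the rendered HTML response, not the JSP source.
console.log(localPathFor("http://host/app/catalog.jsp?id=7")); // prints app/catalog.jsp
```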
<p>To import existing Web resources into a Web project using HTTP, complete the following steps:</p>
</div>
<ol><li class="skipspace"><span>To import into a new project, create the project using the <span class="uicontrol">New Web Project</span> wizard.</span></li>
<li class="skipspace"><span>If you intend to use an existing project, select the project in the Project Explorer view.</span></li>
<li class="skipspace"><span>Select <span class="menucascade"><span class="uicontrol">File</span> > <span class="uicontrol">Import</span></span>. </span></li>
<li class="skipspace"><span>In the Import dialog, select <span class="uicontrol">HTTP</span> and click <span class="uicontrol">Next</span>.</span></li>
<li class="skipspace"><span>On the <span class="uicontrol">Specify the destination folder and the resources to import</span> page, type the required information. </span> <ul><li><span class="uicontrol">Folder</span> - By default, the imported files are placed in the Web content folder of the project. To place them in a different location, click <span class="uicontrol">Browse</span>.</li>
<li><b>URL</b> - Type the HTTP URL of the initial Web page in the <span class="uicontrol">URL</span> field. The URL should include the domain name and the starting directory. <ul><li>If you enter a directory URL without a start page (for example, www.domain.net/Sports/), the HTML content that the Web server returns is saved under the default file name; if you do not specify a default, index.html is used.</li>
<li>HTTP crawling may create files that do not exist on the original server. For example, an HTTP reference to a directory may cause a Web server to respond with HTML content that describes the directory. The HTTP crawler saves this response as index.html.</li>
<li>If you enter only a domain name (for example, www.domain.net), the Import wizard tries to find a default page in the document root directory.</li>
</ul>
To specify a proxy connection, click <span class="uicontrol">Advanced</span>. In the Advanced Settings dialog box, select the <span class="uicontrol">Use a proxy server</span> check box, choose a SOCKS or HTTP proxy, and supply the corresponding server and port values.</li>
<li><b>Depth limit while following HTTP links</b> - Limit how far the import follows links by selecting one of the following radio buttons: <ul><li><span class="uicontrol">No limit</span> - HTTP import parses all pages within the domain.</li>
<li><span class="uicontrol">Limit to</span> - Sets the maximum number of link levels that are crawled. For example, if you specify 1, all Web pages within one link (level 1) of the starting page are crawled. If you specify 2, the level 1 pages and the pages linked directly from them are imported. <p>For example, you might specify a crawl depth of 2 and the initial URL http://host/initialLevel/index.html. If index.html references http://host/initialLevel/L2/L3/index2.html, then index2.html, which is at level 3, is filtered out, and its content is not parsed for further crawling.</p>
</li>
</ul>
</li>
</ul>
</li>
<li class="skipspace"><span>Click <span class="uicontrol">Next</span> for more options, or <span class="uicontrol">Finish</span> to import the Web site.</span></li>
<li class="skipspace"><span>If you clicked <b>Next</b>, select from the choices provided on the <b>Specify appropriate import options</b> page. </span> <ul><li><span class="uicontrol">Convert Links to document relative</span> - If you select this option, links within the imported HTML files are rewritten relative to the document that contains them, rather than as absolute links based on the files' new location in the file system.</li>
<li><span class="uicontrol">Overwrite existing resources without warning</span> - If you select this option, existing workbench files in your project are overwritten without warning. If you do not select this option, existing files are not overwritten by imported files. You are not prompted to overwrite files selectively.</li>
<li><span class="uicontrol">Do not follow links to files in parent folders of the starting URL</span> - If you select this option, HTTP import does not crawl resources above the starting URL. For example, if the starting URL is http://host/l1/l2/index.html and a link within that page references http://host/index.html, this option determines whether the linked resource is included in the import. If you do not select this option, you risk crawling very large sites and unnecessarily importing huge numbers of files.</li>
<li><span class="uicontrol">Connection timeout</span> - Sets the HTTP connection timeout, in milliseconds: how long the import waits for a response from the server before giving up.</li>
</ul>
</li>
<li class="skipspace"><span>Click <span class="uicontrol">Finish</span> to import the Web site with the selected options.</span></li>
<li class="skipspace"><span>Verify the directory structure and the integrity of the file data in the newly populated project or folder.</span></li>
</ol>
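<p>The depth limit and the parent-folder option can be pictured as a scope check applied to each link before it is crawled. This is a hedged sketch: makeScopeFilter is a hypothetical name, depth is counted in link hops from the starting page as in the Limit to description (the folder-based example in that step suggests the wizard may count levels differently), and the starting folder is assumed to be the starting URL's path up to its last slash.</p>

```javascript
// Hypothetical scope check combining "Limit to" and "Do not follow links to
// files in parent folders of the starting URL". ASSUMPTION: depth counts link
// hops from the starting page (depth 0); the wizard may count differently.
function makeScopeFilter(startUrl, { maxDepth = Infinity, stayBelowStart = true } = {}) {
  const start = new URL(startUrl);
  // Folder of the starting URL, e.g. "/l1/l2/" for "/l1/l2/index.html".
  const startFolder = start.pathname.slice(0, start.pathname.lastIndexOf("/") + 1);

  return function inScope(url, depth) {
    const target = new URL(url);
    if (depth > maxDepth) return false;           // beyond the "Limit to" depth
    if (target.host !== start.host) return false; // outside the domain
    if (stayBelowStart) {
      // Reject links that climb above the starting folder.
      if (!target.pathname.startsWith(startFolder)) return false;
    }
    return true;
  };
}

const inScope = makeScopeFilter("http://host/l1/l2/index.html", { maxDepth: 2 });
console.log(inScope("http://host/l1/l2/news.html", 1));     // true
console.log(inScope("http://host/index.html", 1));          // false (parent folder)
console.log(inScope("http://host/l1/l2/a/b/deep.html", 3)); // false (depth 3 exceeds 2)
console.log(inScope("http://other/l1/l2/page.html", 1));    // false (other domain)
```

<p>With stayBelowStart set to false, the same check would accept http://host/index.html, which is how a crawl can spread to the rest of a large site.</p>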
</div>
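<p>The Convert Links to document relative option rewrites each link relative to the page that contains it. The sketch below uses the path.posix.relative function from Node.js to compute such a link. The function name toDocumentRelative is hypothetical, and the sketch assumes that both URLs are on the same host and that each page is saved under a path that mirrors its URL path.</p>

```javascript
const path = require("path");

// Hypothetical helper illustrating "Convert Links to document relative".
// ASSUMPTIONS: both URLs are on the same host, each page is saved under a
// path that mirrors its URL path, and query strings and fragments are ignored.
function toDocumentRelative(pageUrl, linkHref) {
  const pagePath = new URL(pageUrl).pathname;           // e.g. "/Sports/news/today.html"
  const linkPath = new URL(linkHref, pageUrl).pathname; // href may be absolute or relative
  // Express the target relative to the folder that contains the page.
  return path.posix.relative(path.posix.dirname(pagePath), linkPath);
}

const page = "http://host/Sports/news/today.html";
console.log(toDocumentRelative(page, "http://host/Sports/scores.html"));       // ../scores.html
console.log(toDocumentRelative(page, "/index.html"));                          // ../../index.html
console.log(toDocumentRelative(page, "http://host/Sports/news/img/logo.gif")); // img/logo.gif
```

<p>Relative links keep working if the imported folder is later moved as a whole, which is the usual reason to prefer them over absolute file-system links.</p>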
<p>
(C) Copyright IBM Corporation 2000, 2005. All Rights Reserved.
</p>
</body>
</html>