WordPress Import Static HTML Site

June 20, 2014 Maithilish

Chapter 10. WordPress Import

10.1. WordPress Import Static HTML site

In the previous chapters, we covered WordPress installation and optimization. In this chapter, we cover importing a static HTML site or its pages as WordPress Posts or Pages, and managing the dynamic stylesheets of those pages in WordPress.

Sample HTML Files

To import a static HTML site or pages, we need access to the OpenShift shell through RHC or PuTTY (for Windows users). Appendix B, OpenShift RHC Client Tools, explains the RHC and PuTTY setup. Before continuing, set up these tools, then download and extract the sample HTML files to the OpenShift server as explained in the Appendix.

HTML Import 2 Plugin

To import a static HTML site or pages into WordPress, we use the HTML Import 2 Plugin by Stephanie Leary. Install the Plugin as explained in Chapter 6, WordPress Plugins.

Setup HTML Import

Before importing the pages, we need to set up the Plugin to adjust its behavior during the import.

Go to the HTML Import Settings page using Settings → HTML Import. On the settings page, select the Files tab. Set Directory to import to the directory that contains the sample HTML pages.
WordPress import static HTML site - WordPress HTML Import 2 Plugin Settings
The Plugin defaults this field to /var/lib/openshift/<app UUID>/app-root/data/current/html-files-to-import. Leave the path up to data as it is and change current/html-files-to-import to htmlimport, so that the final path looks something like /var/lib/openshift/<app UUID>/app-root/data/htmlimport. The screenshot shows an example UUID; replace it with the real UUID of the OpenShift application. Save the settings.
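The path edit described above can be sketched in a few lines of Python. This is only an illustration of how the final value is put together; the function name import_directory and the UUID in the example are made up, and the real UUID must come from the OpenShift application.

```python
from pathlib import PurePosixPath


def import_directory(app_uuid: str, folder: str = "htmlimport") -> str:
    """Build the 'Directory to import' value for an OpenShift application."""
    # The part up to app-root/data stays exactly as the Plugin suggests it.
    data_dir = PurePosixPath("/var/lib/openshift") / app_uuid / "app-root" / "data"
    # Only the trailing current/html-files-to-import part is replaced.
    return str(data_dir / folder)


# "0123456789abcdef" is a hypothetical UUID used only for illustration.
print(import_directory("0123456789abcdef"))
```

The printed value is what goes into the Directory to import field.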
Next, we need to set up how the Plugin should treat the contents. In HTML Import Settings, choose the Content tab.
WordPress import static HTML site - WordPress HTML Import 2 Plugin Settings Content Tab
To import only the content between the body tags of the HTML files, set Select content by to HTML tag and enter body in the Tag field. Leave the attribute and value fields blank. There is also an option to import the entire file, including the header, but this may affect the WordPress header.
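To make the effect of selecting content by the body tag concrete, here is a minimal Python sketch that keeps only the markup between the body tags and drops the head. It is not the Plugin's code (the Plugin is written in PHP); the BodyExtractor class, the extract_body function, and the sample page are assumptions made for illustration.

```python
from html.parser import HTMLParser


class BodyExtractor(HTMLParser):
    """Collect the markup found between <body> and </body>."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.in_body = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied once, unchanged.
        if self.in_body:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif self.in_body:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.in_body:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.in_body:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.in_body:
            self.parts.append(f"&#{name};")


def extract_body(html_text: str) -> str:
    parser = BodyExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


# A made-up page; the title in the head is left out of the result.
page = ("<html><head><title>History of Coffee</title></head>"
        "<body><h1>Coffee</h1><p>Origins &amp; trade.</p></body></html>")
print(extract_body(page))
```

Importing the entire file instead would carry the head section along, which is why that option can interfere with the header WordPress generates.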
In More Content Options, check the following items.
  • Import Linked Images – to import linked images to the Media Library.
  • Import Linked Documents – to import linked documents to the Media Library.
  • Update internal links – to rewrite internal links so that they use the site URL.
  • Use Meta Description as excerpts.
  • Convert Special Characters.
When importing HTML pages produced by Word or Frontpage, also check the Clean up bad option.
Save the settings.

Import Posts

With the HTML Import Plugin, we can import HTML pages either as WordPress Posts or as Pages. To start with, let’s import the sample pages as WordPress Posts.
Open HTML Import Settings, select the Titles & Metadata tab, and set Import Files as to Posts. We may also set the publish status, timestamp and author fields. Save the settings.
WordPress import static HTML site - WordPress HTML Import 2 Settings - Titles and Metadata Tab
By default, the HTML Import Plugin uses the page's title tag to set the permalink of the post. For example, when a page with the title WordPress Tutorial is imported to codetab.org, the permalink is set to http://www.codetab.org/wordpress-tutorial/. Before importing, give each page a proper title, because keywords in the URL matter for SEO.
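The mapping from title to permalink can be approximated with the short Python sketch below, which is handy for previewing the URLs before an import. It is a simplified stand-in, not WordPress's own slug logic, and it assumes a permalink structure that ends in the post name; the function names are hypothetical.

```python
import re
import unicodedata


def approximate_slug(title: str) -> str:
    """Lowercase the title and join its words with hyphens."""
    # Strip accents so that only plain ASCII characters remain.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def approximate_permalink(site_url: str, title: str) -> str:
    # Assumes the site uses a post-name permalink structure.
    return f"{site_url.rstrip('/')}/{approximate_slug(title)}/"


print(approximate_permalink("http://www.codetab.org", "WordPress Tutorial"))
# prints http://www.codetab.org/wordpress-tutorial/
```

A page with a vague title such as Page 1 would end up at /page-1/, which carries no useful keywords.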
Finally, to import the files, go to Tools → Import and select the HTML option from the Import list to open the HTML Import screen. Select a directory of files and click Submit to import the files in the app-root/data/htmlimport directory.
WordPress import static HTML site - WordPress HTML Import 2 - Import a directory of files
Alternatively, we can import a single file from the PC, but this option does not import the linked media files. At codetab.org, we usually use the first option.
Before the import, we can assign categories and tags to the posts in the Categories and Tags tab. We can also attach them after the import: imported Posts are listed under Posts → All Posts, where we can select the posts and do a Bulk Edit.

Import Pages

The HTML Import 2 Plugin can also import static HTML pages as WordPress Pages. To do so, before the import, go to HTML Import Settings, select the Titles & Metadata tab, and set Import Files as to Pages.
WordPress import static HTML site - WordPress HTML Import 2 settings to import pages
A WordPress Page cannot have categories or tags; however, it can have a parent page. The parent is set with the Import pages as children of option, where we select another page from the dropdown list as the parent of the imported pages. If this field is set to None (top level), the imported pages become top-level pages.
We can also set the parent after the import. Select the pages from Pages → All Pages and do a Bulk Edit to attach the parent page.
At codetab.org, we use the top-level page wordpress-tutorial as the parent page for the other pages of this tutorial. As already explained, such a parent page is known as a corner page in SEO parlance.

Sample HTML Pages

The htmlimport directory contains sample HTML pages in two directories:
  • directory tea contains HTML page – History of Tea in China
  • directory coffee contains HTML page – History of Coffee
When the HTML Import 2 Plugin imports them as WordPress Posts, it attaches them to the default category, Uncategorized. However, when we import them as WordPress Pages, the Plugin attaches each of them to a parent page named after its directory. For example, it creates a blank page named Coffee and attaches History of Coffee as its child page!
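The directory-to-parent behaviour described above can be pictured with the following Python sketch. It only mirrors what this section observes and is not taken from the Plugin's source; the function name, the capitalization rule, and the file names in the example are assumptions, since the actual file names of the sample pages are not given here.

```python
from pathlib import PurePosixPath
from typing import Optional


def parent_page_name(relative_path: str) -> Optional[str]:
    """Return the parent page title implied by a file's directory, if any."""
    parent = PurePosixPath(relative_path).parent
    if parent == PurePosixPath("."):
        # The file sits directly in the import directory: no parent page.
        return None
    # Assumption: the directory name is turned into a capitalized title.
    return parent.name.replace("-", " ").replace("_", " ").title()


# Hypothetical file names, relative to the htmlimport directory.
for path in ["coffee/history-of-coffee.html", "tea/history-of-tea.html", "index.html"]:
    print(path, "->", parent_page_name(path))
```

Running a check like this over the directory tree before importing as Pages gives a preview of which blank parent pages are likely to appear.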
Each set of imported static HTML pages may come with its own stylesheet, and managing these stylesheets is crucial to preserving the page style. In the next section, we learn about the WordPress Child theme to manage the dynamic stylesheets.