Taking responsibility for your web content

01/08/2009

A new website for a business is an amazing investment, and can be an incredible tool for generating revenue. A CMS is an excellent way for non-technical users to have a website that they can maintain and perpetuate without technical intervention, saving time and money.

For compatibility and future-proofing, it is important that the content of a website keeps to certain standards. In some cases these standards are a legal requirement, which makes them all the more important.

There are many tools available to aid the authoring of content on the web, giving a WYSIWYG interface to fields that can accept markup. However, these tools are not without their disadvantages. Internationalisation and functionality cause the applications to be relatively large, resulting in longer download times and decreased performance. On older computers this can, in the worst cases, result in an application becoming virtually unusable. They are also not error-proof. Close grouping of formatting or over-zealous deleting of text can easily render the markup generated by these tools invalid and, therefore, useless.

I am of the opinion that there is very little need for these types of WYSIWYG editor on the net. Markup languages, from a purely formatting point of view, are incredibly intuitive and the learning curve is shallow, so why not learn the basics (all you need) and really take charge of your site?

If you're still with me, I applaud you and welcome you to read my "Introduction to Markup Languages from a Non-Technical CMS Author's Perspective". Ah, brevity - that title's going to stay with you, I can feel it.

First and foremost, it's important to know a little about what a markup language actually is. If you've used the Internet at all, I don't doubt that you recognise the acronym "HTML". HTML is a type of markup language focused on organising data in a document for display on the web. Data is organised hierarchically using elements, many of which have their own default display properties in browsers. The way an element is used to separate data is very simple - you put a tag at the beginning of the data, and a corresponding one at the end. A set of tags is usually constructed in the following way:

<tagName>your content</tagName>

From an authoring perspective, there are very few tag names that you need to know about, and they are all concerned with visually formatting your data. For example, you can italicise a portion of text by enclosing it with the "em" tag:

<em>italicised text</em>

And by the same logic, you can embolden text using the "strong" tag:

<strong>emboldened text</strong>

It's really that simple. You now have an understanding of how visual data is displayed on nearly every page on the Internet (there are different types of markup that you don't need to worry about unless you're bored/inquisitive!). I bet you wondered what all the fuss was about! So do I—every day.

Now all that's left is to describe all of the data in the document you're creating. It's not just bold and italic content that needs to be described. The beginning and end of paragraphs, headings and lists need to be accounted for if you want them to display in the expected way. Then there are slightly different tags that are used to link to other documents and to embed images.

Displaying paragraphs and headings follows the same format as the aforementioned tags. The names you'll be needing are "p" for paragraphs, "h1" for main headings (only use one of these per page), "h2" for secondary headings and so forth up to "h6" for senary headings:

<h3>Tertiary Heading</h3>
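As a rough sketch of how headings and paragraphs sit together, a short piece of content might look like the following. The wording is invented purely for illustration:

<h1>About Our Company</h1>
<p>An opening paragraph introducing the page.</p>
<h2>Our History</h2>
<p>A paragraph about where the company came from.</p>

Each heading and paragraph is opened and closed before the next one begins, and the single "h1" comes first.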

Listing data requires a slightly different mindset. Firstly, you define a list ("ul" for unordered/bulleted lists and "ol" for numbered/alpha lists) as a group of list item ("li") elements:

<ul>
  <li>A list item</li>
  <li>A list item</li>
  <li>A list item</li>
</ul>

<ol>
  <li>A list item</li>
  <li>A list item</li>
  <li>A list item</li>
</ol>

Now that you know how to group elements, we can move on to something a little more involved (only a little, though, I promise!). As I said before, a link ("a" tag) is used to direct people to other documents on the web. To do this, the "a" tag uses an HTML attribute called "href". All attributes are space-separated and defined in the opening tag with the format attributeName="attributeValue", so the "href" attribute of an "a" tag becomes:

<a href="http://www.google.com/">Clickable text</a>

Now the "href" attribute contains a URL for google.com, and when rendered on your page, the text "Clickable text" will navigate your browser to http://www.google.com when clicked by a user (I told you it wasn't that much more difficult!).
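Because attributes are space-separated, more than one can sit in the same opening tag. As a small sketch, assuming you also want the standard "title" attribute (not otherwise covered in this article; most browsers show it as a tooltip), with a placeholder URL and made-up text:

<a href="http://www.example.com/" title="Visit the example site">Clickable text</a>

The order of the attributes does not matter, as long as each one follows the attributeName="attributeValue" format.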

Finally, an author needs to be able to embed images in their page. This introduces a new type of tag, and the only potential source of confusion when describing a document: the self-closing tag. A well-structured markup document closes every tag that it opens, in the reverse of the order in which it opened them. That might seem convoluted, but all it means is that if you open a "p" tag then an "em" tag, you must close the "em" tag before the "p", thus:

Correct: <p>Some content with an <em>italicised section</em>.</p>

Incorrect: <p>Some content with an <em>italicised section.</p></em>

Although it might seem unnecessary or obvious, it links back to the hierarchical nature of markup languages and the second example shows a contravention of the parent/child relationship implied in hierarchical data organisation.
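The same rule holds however deep the nesting goes. As an illustrative sketch (the sentence itself is made up), here an "em" sits inside a "strong", which sits inside a "p", and each tag is closed before the one that contains it:

<p>Some content with a <strong>bold and <em>italicised</em> section</strong>.</p>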

With this in mind, self-closing tags play an important part in maintaining the structure of a document. These tags do not format any text in a page, but they describe a non-text element. Examples are images ("img"), line breaks ("br") and horizontal rules ("hr"), and the format is as follows:

<br/>

Logically, a line break does not describe or format any text; it acts as its own entity. That being said, it is not recommended to use line break elements unless you absolutely must. An image is embedded in a document using a combination of the self-closing tag and the attribute "src" (abbreviated from "source", for ease of recollection in future!). This is the bare minimum that is required for a functioning image to be embedded. However, for accessibility and validation purposes, the "alt" attribute (short for "alternative text") must also be included. The value of this attribute describes the image to browsers or users that cannot render or see it as most users do. This could be visually impaired users or search engine spiders reading your site. A complete, valid image is embedded using the following code:

<img src="http://www.domain.com/path/to/image.jpg" alt="A description of the image"/>

This code embeds an image or, in the event that the image cannot be found, displays the text "A description of the image". To find the value for the "src" attribute, navigate to the page containing the image, right-click it and select the option that best fits "view image" or "copy image address", then paste the URL as the value of the "src" attribute.

Now that you're able to format any content you need to author on the web, all that's left is attention to detail.
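Before getting into the details, here is a rough sketch that pulls the tags covered so far into one piece of content. The text, the example.com URLs and the image path are all placeholders invented for illustration:

<h2>Our Opening Hours</h2>
<p>We are open <strong>every day</strong> except <em>bank holidays</em>.</p>
<ul>
  <li>Weekdays: 9am to 5pm</li>
  <li>Weekends: 10am to 4pm</li>
</ul>
<p>Find us using <a href="http://www.example.com/directions">these directions</a>.</p>
<p><img src="http://www.example.com/images/shopfront.jpg" alt="The front of our shop"/></p>

Every tag that is opened is closed again in reverse order, and the image uses the self-closing format with both "src" and "alt".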

Attribute values can be enclosed with either single or double quotes. Problems can arise when you need to put a single or double quote inside the value itself. Of course, if you only need one kind, just enclose the attribute with the other and you'll be fine, but when you need both there could be problems. HTML does not treat the backslash as an escape character, so putting one before the quote will not help. Instead, write the quote that matches the enclosing one as a character entity: &quot; for a double quote or &#39; for a single quote:

<img src="http://domain.com/path/to/image.png" alt="So I was, like, &quot;You're so weird!&quot;"/>

With the above code, the text So I was, like, "You're so weird!" will display if the image is not found, for whatever reason.

So there we have it, a simple but comprehensive introduction to everything you need to author content in your CMS or blog. Even though your site may have a WYSIWYG editor, knowledge of the language gives you increased control and decreases the chances that you'll break your site with the content you create.

Further Reading: Luckily, there is amazing documentation for HTML (check the list of basic tags at the top left of that page) that you can peruse once you're done with this article and need a quick reference.