Web Standards – A Crash Course (Part 1)

These are the notes for a presentation I delivered in May of 2006. You can see the accompanying slides as an interactive Quicktime movie (6.1 MB). I recommend opening the slides in a separate window so you can follow along.

Introduction

In this presentation, I’m going to be talking about web standards. We’ll discuss the philosophical basis of what web standards really mean, then move on to a few guidelines to follow to produce better websites. Lastly, we’ll step through some of the technical aspects of building compliant sites with solid HTML and CSS.

I’ve devoted the past several years to understanding web standards, and continue to learn new things every day. This topic is much deeper and broader than anything we can explore in just a few hours with a slideshow, so consider this a highlights reel. My hope is that you’ll come out of this with a better understanding of how the web works and how we can all benefit from following web standards, both as users and as developers.

What are web standards?

For starters, what are web standards? What does the term refer to? In a nutshell, web standards are the functional specifications for the core languages the web is built on. They define how user-agents should handle and interpret these languages, and how web authors should write them.

These languages have been mapped out by the World Wide Web Consortium — a multinational, non-profit organization of very smart and incredibly meticulous people who spend their time arguing about how things should work on the web. There are other standards, defined by other standards bodies, but for our purposes we’re going to focus on the W3C open standards for HTML and CSS.

Web standards are not laws, but a set of recommendations and best practices. They become the standards when we — browser makers and web authors alike — agree to follow the same set of rules.

Language is an agreement

Two people speaking the same language must agree to follow certain rules of grammar and sentence structure, and accept common definitions of their words in order to communicate with each other. Breaking those rules makes communication extremely difficult, even impossible.

The same applies when computers talk to each other — which is, after all, what the internet is all about. For one computer to understand another, they need to speak the same language and follow the same rules.

So you can think of web standards as the rules of grammar that make web languages understandable. Assuming that web browsers are programmed according to the specifications, we need to speak a language the browsers can understand to get our message across. By the same token, a browser manufacturer who wants their software to operate correctly needs to build it so it correctly interprets the language as it is written.

Luckily, the vast majority of browsers out there today are very fluent in the standardized forms of HTML and CSS. This wasn’t always the case. Even though these standards have been around for a decade or more, it’s taken that long for the browsers to catch on.

In the late 90s — right around the time the web was really taking off — Microsoft and Netscape were the two key players in the browser market, and they were locked in a bitter rivalry, competing for market dominance. In the name of innovation, both companies often ignored the existing standards and introduced their own proprietary features that weren’t part of the original language specifications. You could write a bit of code that would work in Internet Explorer but not in Netscape Navigator, and vice versa. But web authors who wanted their sites to function equally in both major browsers often had to go through a lot of extra effort to get the job done.

For example, we had to “fork” our JavaScript to work with proprietary object models, basically writing everything twice, and any browser that didn’t understand either of the two simply failed entirely.

Or if that was too complicated, we’d sometimes build a site for one browser and exclude any others. If you don’t fit our criteria for support, tough luck, we don’t want your business.

Or, perhaps the most common approach, we’d build sites in a way that was proven to work in different browsers, even if it meant writing a lot of complicated HTML. We used tables to lay out boxes and columns, invisible spacer images to force those columns to be the size we wanted, and lots of embedded font elements to change the appearance of text.

This is what was required if we wanted our pages to look nice in both Netscape and IE, and you still see this kind of old-school tag soup all the time. But this approach is messy, inaccessible, really hard to maintain, and fundamentally wrong.

Luckily for everyone the browser wars reached a kind of stalemate a few years ago (though Microsoft will claim victory) and since then we’ve been gradually reclaiming the scorched earth. Today, instead of writing a hundred lines of hard-to-maintain tag soup, we can write HTML the way we should have been writing it all along: according to agreed-upon standards.

A simple h1 element is understood perfectly by every version of every browser ever produced, and will remain understandable long into the future. All browsers recognize this as the markup for a top-level heading.
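To make the contrast concrete, here is a rough sketch. The heading text, the spacer.gif file name, and the specific attributes are illustrative and not taken from the original slides. The first fragment is the kind of presentational markup once used for a page title; the second is the standards-based equivalent.

```html
<!-- Old-school tag soup: a layout table, a spacer image, and a font element -->
<table width="100%" border="0" cellpadding="0" cellspacing="0">
  <tr>
    <td><img src="spacer.gif" width="20" height="1" alt=""></td>
    <td><font face="Arial" size="6" color="#336699"><b>Web Standards</b></font></td>
  </tr>
</table>

<!-- The same title as a top-level heading -->
<h1>Web Standards</h1>
```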

Of course, this markup won’t be especially attractive when rendered with a browser’s default style settings. But it’s not meant to be. HTML was never really intended to describe the way content looks. It merely provides structure and meaning to what would otherwise be plain text.

But nobody wants to look at default fonts at default sizes in a bland, top-down hierarchy. The web is all about communication, and a vital part of that is visual design. When the web became a commercial venue, graphic design became even more important as a vehicle for branding and visual identity. But our tools were limited, and so we resorted to all kinds of hacks and trickery in the HTML under the hood to make content look decent on screen.

What we needed was a method of affecting the presentation of content without complicating the markup behind it. And that’s exactly what Cascading Style Sheets do.

With a little bit of CSS, we can change the font and color of our ugly title, and give it some space and decoration to make it a bit more pleasant to look at. With some different CSS we can get a whole new look, and we can even add imagery to really make it pop.

But the key thing to notice here is that each of these examples is still the same, simple, understandable h1 element. The visual presentation of it lies on top of the markup in a separate layer.
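As a rough illustration of that separation, the sketch below styles a heading without touching its markup. The font, color, and spacing values are arbitrary choices for the example, not the ones used in the slides.

```html
<style type="text/css">
  /* This style element belongs in the document head.
     Illustrative values only; any look can be layered on the same element. */
  h1 {
    font-family: Georgia, "Times New Roman", serif;
    color: #336699;
    margin: 0 0 1em;
    padding-bottom: 0.25em;
    border-bottom: 1px solid #cccccc;
  }
</style>

<!-- The markup itself never changes -->
<h1>Web Standards</h1>
```

Swapping in a different set of rules gives a completely different look while the h1 stays exactly as it is.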

This was the big turning point a few years ago: the release of newer browsers with improved support of CSS, which has been an official standard for a decade but wasn’t reliably usable until 2000 or so. But today the older browsers are dropping away and the bulk of CSS is very well supported by the current generation. There is no longer any reason to keep building sites the way we did in the late 90s. We can finally create the web the way it was meant to be — HTML for structure and CSS for style.

Layers

A web page is comprised of layers.

The content layer, marked up with HTML (or XHTML), tells a user agent what the different parts of the document are and what they mean. This gives us our structural foundation.

The presentation layer, written in CSS, tells the browser how to visually render the document. It lies over the structured content, changing its appearance but leaving the structure intact.

The behavior layer is written in a client-side scripting language, usually JavaScript. This tells the browser how to react to certain events, letting us manipulate parts of the document to enhance functionality and interactivity.

These are different languages performing different tasks. When these facets of the page are all mixed together, the language becomes muddy and difficult to understand. But if we separate the layers, we end up with three distinct-yet-connected beings, each of which can be modified and maintained without interfering with the others. Each layer then becomes lighter and stronger.
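One common way to keep the three layers apart is to give each its own file. The sketch below shows only the relevant parts of a document; the file names styles.css and behavior.js are hypothetical.

```html
<head>
  <title>Three layers</title>
  <!-- Presentation layer: an external style sheet (hypothetical file name) -->
  <link rel="stylesheet" type="text/css" href="styles.css">
  <!-- Behavior layer: an external script (hypothetical file name) -->
  <script type="text/javascript" src="behavior.js"></script>
</head>
<body>
  <!-- Content layer: structure and meaning only -->
  <h1>Web Standards</h1>
  <p>HTML for structure, CSS for style, JavaScript for behavior.</p>
</body>
```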

Gotta Keep ’em Separated

Separation of content from presentation and behavior is one of the key concepts of developing with web standards, and it brings with it some clear benefits:

Ease of maintenance

With content and presentation moved to their own layers, each one becomes easier to deal with. We can change content without rebuilding the whole page to readjust the design, and we can redesign an entire site without touching a single tag of HTML.

Quicker development

Clean markup is much more human-readable, so it’s easier to figure out at a glance. You can open a document someone else built and get right to work without having to dig through a bunch of presentational tag soup. When pages share an external style sheet, you can build new pages and they’ll be designed automatically.

Interoperability and device independence

Content can be made understandable to every device and user agent capable of parsing HTML, whether it’s a desktop browser, a PDA, a cell phone, or a Bluetooth toaster. Those that also understand CSS will draw the page all nice and pretty without losing that fundamental understanding of the markup.

Faster page loads and lower bandwidth usage

A style sheet gets loaded once, then cached for use on subsequent pages. This means the only thing a browser needs to download from page to page is new content without a lot of heavy presentational markup. Simpler markup is also rendered faster, since the browser has to do less work to interpret it.

Improved accessibility

Not everyone who uses the web does so with eyes and a mouse. Many visually impaired people use assistive software that reads the content aloud, so it’s important to make that content understandable without visual cues. People with reduced motor functions use the keyboard to navigate, so it’s important to make a site usable without a mouse. When the markup is clean and meaningful the site will be much easier for disabled people to use.

Better search engine indexing

Search engine crawlers are user-agents that parse raw markup, and are only hindered by a lot of complicated presentation. When the content is well structured it lets the crawlers work more efficiently, making our pages more findable in search engines.

Q: What are web standards?

We know that web standards are the specifications for how web languages should be written and interpreted. All languages have rules, and it is those rules that make communication possible. On the web, we communicate to the browser with HTML, CSS, and JavaScript.

Q: Why are they important?

Standardization makes the web work better. By adhering to agreed-upon standards, we can make our sites faster to build, lighter to deliver, easier to maintain, forward and backward compatible with a wide range of devices, easier to find with search engines, and more accessible to more people.

Well, that’s all nice, but… Q: How do we do that?

This one is a bit harder to answer. While the rules of standards are pretty straightforward, there are a million and one ways to skin that particular cat.

Best Practices

These are unwritten standards, guiding principles to bear in mind as you’re building standards-compliant websites.

Separation

We’ve discussed separation already. At all times, try to maintain a separation of content from presentation. When you write any bit of HTML, ask yourself if what you’re adding is structural or presentational — in other words, does this enhance the meaning and utility of the content, or does it only affect the way it looks? Sometimes you have to compromise and use a little bit of presentational markup to get the desired effect, but do so only as a last resort. Every non-essential, presentational tag takes a little slice off the quality and integrity of the content. In many cases, a visual design that would necessitate bad markup may call for an adjustment of the design.

Accessibility

We’ve touched on this as well. As you’re building pages, always keep accessibility in mind. Consider how a screen-reader might cope with your markup, how the page may render without your style sheet, how it will degrade without JavaScript. Is it still readable and usable? Accessible content also tends to be more usable even to fully-abled people, and will also be more search engine friendly.

Validation

Validating our documents ensures that we’re following the rules. Learn to love the W3C validators: they’re like a grammar-checker for HTML and CSS, and will help you spot easy mistakes like unclosed elements or missing attributes. Validate your markup repeatedly and address the problems the validator points out. While most browsers are capable of coping with minor missteps, relying on their built-in error handling is a slippery slope.

Note that there’s a difference between “errors” and “warnings”; warnings are usually recommendations for things that may warrant a second look. Errors are things that are just plain wrong and should be fixed.

Of course, validation is not the be-all and end-all. You can make the most atrocious markup validate and it’ll still be atrocious markup. And just as easily, you can write impeccable, poetic markup and lose your validation badge with a single un-escaped ampersand that may be beyond your control. Still, strive for validity, because an invalid document is generally indicative of sloppy coding.

Semantics

Semantics is the study of meaning. When we speak to each other, the words we use have to mean something. If we don’t know what the words mean, we can’t communicate. It’s as simple as that.

Every element in HTML brings with it an inherent meaning and sense of purpose, and it passes that along to the content it surrounds. When you’re writing HTML and need to choose which tags to wrap around a particular segment of content, consider the meaning and purpose of that content, and select the most appropriate element to support that.

Is this string of words a heading? Mark it up as one. How important is that heading in the context of the whole document? Assign it the appropriate rank.

Is this group of items a list? Make it so. Is the order of the items significant, or will they mean the same thing in any sequence? The list will tell you what kind of list it wants to be.

Paragraphs should be paragraphs, quotes should be quotes, and tables should be tables. When your markup is semantically correct it will be easier to read, easier to change, easier to style, more flexible, more accessible, and more searchable.

Semantic markup is understandable markup. It makes more sense to us when we write it, makes sense to other coders when they have to work with what we’ve written, makes more sense to the browser, and ultimately makes more sense to the end user.

Markup — The Underlying Structure

HTML, as we know, is how we give structure to our content. It’s how a browser can tell the difference between a heading and a paragraph. So in truth, content and structure are different things, like two divisions of the same layer. But they’re inseparable. Content without structure is nonsense, and structure without content… frankly I’m not even sure that’s possible.

If content is the meat of our page, markup is the skeleton that supports it and makes it work.

Anatomy of a document

All web documents must conform to a basic architecture, with a few required components.

Jargon Alert

Before we go on, let’s get some terms straight.

The words “document” and “page” are sometimes used interchangeably, but they really refer to two different things.

Document
The plain text file containing the content and markup which is served to the browser.
Page
The content (text, images, objects) when rendered by a browser.

Just remember that a page is what you see, but the document is what you build.

And it looks something like this.
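The slide shown at this point isn’t reproduced in these notes. As a stand-in, a minimal document might look like the sketch below, assuming an HTML 4.01 Strict doctype. The title, heading, and paragraph text are placeholders.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
  "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <title>Document title</title>
</head>
<body>
  <h1>Page heading</h1>
  <p>Some content.</p>
</body>
</html>
```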

This is a complete, valid document, with all the vital pieces. Let’s go one by one:

DOCTYPE

This is the document type declaration. It announces to the browser what sort of document it’s about to receive and what set of rules should be used to interpret it.

A complete doctype is required, and must be the very first thing in the document. If it’s missing, or incomplete, or malformed, or if anything other than whitespace appears before it, the entire document is invalid.

Most browsers will still attempt to render the document, for the simple fact that a lot of documents out there neglect to include one. For backwards compatibility, modern browsers have a built in “doctype switch” which lets them fall back on a looser rendering mode when they’re dealing with outdated, poorly written documents. We call this “quirks mode” because it’s intended to compensate for the quirks of bad markup. Rendering in quirks mode is unpredictable and will be inconsistent between different browsers and even different versions of the same browser.

Quirks mode can be easily avoided, of course, by including a complete and correct doctype (which triggers so-called “standards mode”).

html

The html element encompasses the entire document. It’s also called the root element because it is the beginning of the entire document tree.

The html element is required and must be closed to mark the endpoint of the document. Nothing should be outside of this except the doctype.

head

This is the document head, which is also required. It contains information about the document we’re dealing with such as the title, meta elements, style sheets and scripts. The header’s contents aren’t rendered by the browser, with the exception of the title element…

title

Which appears in the window title bar on desktop browsers. The title is also usually the default title when a page is bookmarked, and will be the title that shows in search engine results. A title is required, and can only appear inside the head.

body

After the head, we get to the body element, which contains everything that will be rendered in the browser window. This is also a required element, which must have a closing tag to mark its ending.

Jargon Alert

Tag
This is the base unit of HTML, and it marks the beginning of an element. Most tags have a matching counterpart to mark the end of an element, called its “closing tag.”
Element
An element is made up of an opening tag and a closing tag, and everything in between. The only exceptions to this are empty elements.
Empty element
Specific elements which do not, and in fact cannot, contain any text content. In HTML, empty elements should not be closed, but in the stricter rules of XHTML all elements must be closed, so empty elements are terminated with a “trailing slash.”

HTML or XHTML?

Come with me on a brief tangent, delving deep into standardista geekery. In the 15 years that HTML has been around there have been several different versions of the language. The most recent (and final) version was HTML 4.01, published as a standard in 1999.

Since then, focus has shifted to XHTML, which is actually a version of XML. Think of XHTML as a predefined set of XML tags that mirror the tags defined in HTML. When writing XHTML you must follow the more stringent rules of XML.

XHTML                                      | HTML
All elements must be closed                | Some elements do not require closing tags
All tags and attributes must be lowercase  | Tags and attributes are case-insensitive
All attributes must have a value           | Attribute minimizing is allowed
Attribute values must be quoted            | Quotes are optional
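As a small illustration of the differences listed above, here is the same fragment written both ways. The checkbox and its name are made up for the example.

```html
<!-- HTML 4.01: optional closing tags, mixed case, minimized and unquoted attributes -->
<P>Subscribe to the newsletter
<INPUT type=checkbox name=subscribe checked>
<BR>

<!-- XHTML 1.0: everything closed, lowercase, attribute values present and quoted -->
<p>Subscribe to the newsletter
<input type="checkbox" name="subscribe" checked="checked" />
<br /></p>
```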

If you’ve ever worked with XML, you know that it’s very intolerant of errors. As soon as a well-formedness error is encountered, the document will not be parsed any further and the entire thing falls down.

XHTML, just like XML, can be extended, parsed, transformed, and converted a thousand ways, and so XHTML is the recommendation for building future-proof websites. Unfortunately, we’re not quite there yet because of one lingering problem: Internet Explorer for Windows.

XHTML should correctly be served with a MIME type of “application/xhtml+xml”, which means it will be parsed as XML with all the strictness that entails. However, Internet Explorer for Windows, the browser used by more than 80% of the browsing public, does not support documents served as application/xhtml+xml, which effectively prevents us from using XHTML as it was intended. Last I heard, IE7 still won’t handle XHTML served as XML when it is eventually released.

But in the interest of backwards compatibility, an XHTML document can be optionally served as “text/html”, which loosens the rules and treats it as plain HTML. As such, XHTML served as HTML is still perfectly valid and well-supported, even by older browsers, and will still render just dandy. At that point we lose the extensibility of XML, so the real benefits of using XHTML are minimal except to the anal-retentive author who just wants to keep things tidy and future-proof.

Then there’s the matter of the XML declaration, which comes before the doctype in XHTML documents. IE chokes on it, falls into quirks mode, and renders everything badly. So we end up omitting the declaration for IE (which XML permits when the document is encoded as UTF-8 or UTF-16), and in the end all we’re writing is HTML anyway.

But HTML 4.01 isn’t going anywhere so there’s no reason not to use it if that’s what you prefer. That said, I still like to follow the basic rules of XHTML even when writing HTML, and to follow strict rules even when the doctype is transitional. This means writing tags and attributes in lowercase, quoting attribute values, nesting elements correctly, avoiding deprecated elements and attributes, and closing every non-empty element. Empty elements in HTML don’t require a trailing slash, and in fact a trailing slash can set off an avalanche of validation errors because it will be interpreted as a premature closing of the previous non-empty element.

So, after all that alphabet soup technobabble, the main rule of thumb when choosing a markup language is to simply pick one and stick with it — follow the rules of your chosen language.

But let’s get back to markup and talk about some common elements and how they should be used with semantics in mind.

Headings

A heading usually indicates the beginning of a new section of content on the page. There are six levels, h1 through h6, arranged in order of importance. Visual browsers will, by default, render them at different sizes (with h1 being the largest), but don’t abuse this trait.

<h1> means “this is the most important heading,” NOT “this text is large and bold.”

Since h1 is the most important heading, there should be only one on a page, which will usually be the name of the site or the page title.

But below that we can have any number of subheadings (h2 through h6). These headings create a hierarchy, so they should come in the correct order if possible: avoid jumping from h2 to h5. In all likelihood you’ll rarely use a heading below h4, but it’s nice to have those extra options. Just remember to use them to indicate the relative importance of the content and its position in the hierarchy. Don’t concern yourself with its size or weight; that’s what CSS is for.
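A heading hierarchy along those lines might be sketched like this. The heading text is borrowed from these notes, and the indentation is only there to make the ranks easier to see.

```html
<h1>Web Standards: A Crash Course</h1>
  <h2>What are web standards?</h2>
    <h3>Language is an agreement</h3>
  <h2>Best practices</h2>
    <h3>Separation</h3>
    <h3>Accessibility</h3>
```

Each heading sits one rank below the section that contains it, so no levels are skipped.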

And just a personal pet peeve that I’m compelled to harp on: there is a difference between headings and headers. A heading is a ranked title, a header is a segment of a document that contains data about that document. If you use the same word for both things you’re just asking for confusion.

Paragraphs

This one is pretty self explanatory, but worth mentioning because it’s important for semantically rich content. A paragraph consists of one or more sentences that encompass one complete thought or idea.

In the old days, we’d often fake the appearance of paragraphs by using double line-breaks (<br><br>), but this has no semantic meaning — it doesn’t conceptually distinguish two separate ideas and is purely presentational. Don’t do it.
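A quick side-by-side, using placeholder sentences, shows the difference:

```html
<!-- Faked paragraphs: line breaks only add visual space -->
First idea, stated in a sentence or two.<br><br>
Second idea, stated in a sentence or two.

<!-- Real paragraphs: each complete thought gets its own element -->
<p>First idea, stated in a sentence or two.</p>
<p>Second idea, stated in a sentence or two.</p>
```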

Lists

A list is composed of one or more related items. While a list of one is valid and may even be semantically correct in some cases, lists usually group several items together.

There are three types of lists in HTML:

<ol> indicates an ordered list
<ul> indicates an unordered list
<dl> indicates a definition list

Items in an ordered list come in a deliberate sequence and will usually be displayed with a number by each item. Items in an unordered list can be arranged in any order without changing their semantic meaning, and will usually be rendered with bullets.

Of course, we can control the appearance of these with CSS, switching Arabic numerals to Roman numerals or English letters, and replacing default bullets with other symbols or fancy images. CSS lets us do almost anything with the presentation, so once again, focus on the meaning of the content first. For example, a navigation menu is essentially a list of links, so why not mark it up as one? It will give it more structure and meaning.
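Here is a rough sketch of both list types, with the presentation handled in CSS. The list items, the link targets, and the nav id are all invented for the example.

```html
<style type="text/css">
  /* This style element belongs in the document head */
  ol { list-style-type: upper-roman; } /* Roman numerals instead of Arabic */
  #nav { list-style-type: none; }      /* no bullets on the menu */
</style>

<!-- Ordered: the sequence matters -->
<ol>
  <li>Write the content</li>
  <li>Mark it up</li>
  <li>Style it</li>
</ol>

<!-- Unordered: a navigation menu is a list of links -->
<ul id="nav">
  <li><a href="/">Home</a></li>
  <li><a href="/articles/">Articles</a></li>
  <li><a href="/contact/">Contact</a></li>
</ul>
```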

Definition lists are a bit different. Each item in a definition list consists of at least two items:

<dt> is the definition term
<dd> is the definition description

There is a semantic relationship between the term and its definition and these two elements are a bound couple: one cannot live without the other. A term must have at least one definition (but can have more), and a definition can only relate to a single term. With multiple terms in a list, the appearance of a new <dt> terminates the previous sequence of definitions and begins a new segment of the list.

Because of this semantic symbiosis, definition lists are often used for things other than definitions. An image might be wrapped in a <dt>, and a caption describing the image would be the corresponding <dd>. I’ve used definition lists for FAQs, with the question taking the place of the term and its answer posing as a definition.
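As a sketch, here is a conventional definition list next to the FAQ variation. The wording of the terms and answers is adapted from these notes purely for illustration.

```html
<!-- A conventional definition list: each term followed by its description -->
<dl>
  <dt>Document</dt>
  <dd>The plain text file containing the content and markup.</dd>
  <dt>Page</dt>
  <dd>The content when rendered by a browser.</dd>
</dl>

<!-- The FAQ variation: question as the term, answer as the description -->
<dl>
  <dt>What are web standards?</dt>
  <dd>The specifications for how web languages should be written and interpreted.</dd>
</dl>
```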

There’s some debate about the correctness of this kind of usage because it’s straining the limits of the element’s intended purpose. But semantic markup is all about using the “most semantically appropriate element” for the content at hand. In my book, establishing that meaningful connection between the elements is what counts, and we can sometimes conveniently ignore what the “d” in <dl> stands for. In the end, use your own judgement.

Blockquotes

A blockquote is an extended quote, usually from an external source. The contents of the blockquote element can be almost any other HTML, but a blockquote must contain at least one block-level element, usually a paragraph; blockquote by itself cannot have inline children.

By default, most browsers display blockquotes as indented text. Do not abuse this. In the old days we’d often nest multiple blockquotes just to create wider margins around our text, but this is just plain wrong. Use CSS for margins; quotes should be quotes.
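The sketch below shows one way to keep the two jobs apart. The inset class name and the example.com address are hypothetical.

```html
<style type="text/css">
  /* This style element belongs in the document head; the class name is hypothetical */
  .inset { margin-left: 4em; margin-right: 4em; }
</style>

<!-- A real quotation: block-level content inside, optional cite attribute for the source URI -->
<blockquote cite="http://example.com/source">
  <p>Web standards are not laws, but a set of recommendations and best practices.</p>
</blockquote>

<!-- Text that only needs wider margins: a class and CSS, not nested blockquotes -->
<p class="inset">This paragraph is indented by the style sheet, not by the markup.</p>
```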

Jargon Alert

Block-level
Elements that form containing boxes around their contents and appear on their own line. Some can contain other block-level elements.
Inline
Elements that appear alongside text and do not appear on their own line. Inline elements may only contain other inline elements, and must themselves be contained by a block-level parent.

All the elements I’ve mentioned so far are block-level elements. Headings, paragraphs and lists are the basic building blocks of readable prose.

Images

The img element is inline and is one of those empty elements we mentioned earlier. The src attribute (abbreviated form of “source”) is required, and its value is the URI where the image file lives on the server.

An important attribute for images is alt (it’s an attribute, don’t call it a tag), which provides alternate text to display if the image is unavailable, if images are disabled in the browser, or in text-only browsers like Lynx. It will also be read aloud by screen-reading software, giving unsighted users an informative description of the image. This attribute is required by every HTML 4.01 and XHTML 1.0 doctype, not just the strict ones.

If your image is presentational, meaning it’s only decorative and doesn’t really mean anything in context, you can effectively “hide” it from screen readers with an empty alt attribute. But purely decorative images probably don’t belong on your content layer anyway; they’re better handled by CSS.

IE for Windows inexplicably displays alternate text as a tooltip when the mouse pointer lingers over the image, so some web designers have misappropriated the alt attribute for its tooltip effect. Or even worse, they’ll omit the alt attribute entirely to prevent IE from showing its tooltip when they don’t want one.

Most browsers, including IE, will display the contents of a title attribute as a tooltip (though Opera shows it in the status line). When alt and title are both present, the title trumps the alt. If, for some reason, you don’t want to include a title, unwanted tooltips can be prevented in IE by including an empty title attribute.

Some shady SEO types recommend stuffing keywords in the alt and title attributes because they’re not normally displayed to users, and are thus a way of cloaking words in a place where searchbots will find them. This will likely get you banned from Google, but even worse, it makes the image completely meaningless and inaccessible. You’re hurting the Internet, don’t do it.

Because the alt attribute is intended as a text equivalent for the image, it should describe the image itself, as if you were describing the picture to someone over the phone. The title attribute can be more descriptive of the image’s greater purpose or meaning in the context of the page, like a caption or, indeed, a title.

Other optional but highly recommended attributes are height and width, specified in pixels. These attributes allow a browser to reserve space on the page as the text is drawn around it, and the image will download into that space. If these attributes are missing the image will still display, but the browser won’t know its dimensions until it has finished downloading, which can make content jump around unpleasantly.

There are some other presentational attributes you’ll often see on image elements: align, hspace, vspace, and border. These are all deprecated in HTML 4.01 and absent from the strict doctypes of both HTML and XHTML. They’re purely presentational and should be avoided. Their effects are better handled by CSS.
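Pulling the recommended attributes together, an image element might look like the sketch below. The file paths, dimensions, and text are invented for the example.

```html
<!-- Content image: src and alt, plus optional title, width and height (in pixels) -->
<img src="/images/toaster.jpg"
     alt="A chrome two-slice toaster with a small antenna"
     title="The fabled Bluetooth toaster"
     width="320" height="240">

<!-- Purely decorative image: an empty alt attribute so screen readers skip it -->
<img src="/images/flourish.gif" alt="" width="100" height="12">
```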

Jargon Alert

Deprecation
Marking an element, attribute, property or method as outdated in the standard recommendations, usually in favor of a newer alternative. Deprecated features should be phased out.

An example that has been heavily abused is the font element, which has been deprecated since HTML 4.0 in 1997. It’s outdated, it’s invalid under a strict doctype, and it’s presentational. Don’t use it.

Emphasis: <em> and <strong>

Quite simply, em indicates emphasis and strong indicates strong emphasis. Every graphical browser I know of displays them as italicized and boldface, respectively.

But they do more than just style text: these elements add another level of meaning to the content they surround, so they’re preferable to the presentational i and b elements. Those have a similar visual effect, but it’s only visual and adds no semantic value.
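A short sketch, with a made-up sentence, shows the two approaches side by side:

```html
<!-- Semantic: the emphasis is part of the meaning -->
<p>Validate your markup <em>before</em> you publish, and <strong>never</strong> rely on quirks mode.</p>

<!-- Presentational: looks similar in a graphical browser, but says nothing about meaning -->
<p>Validate your markup <i>before</i> you publish, and <b>never</b> rely on quirks mode.</p>
```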

<b> means “this text should appear bold” while <strong> means “this text is strongly emphasized.”

Division

The div element is used to group segments of content into logical divisions. The element itself is semantically neutral, but not entirely meaningless. A div simply says “these things belong together, and are separate from those other things.”

The div element is a content organization tool, not a page layout tool. The fact that they act as very convenient hooks for CSS is just a side benefit. But because divs make it so easy to manipulate blocks of content with CSS, they’re often and easily abused for presentational purposes.

Think about semantics, and when you choose to use a div try to ensure it makes sense to use one. Don’t use a presentational div when you may have a more meaningful element available.

Span

The div’s inline cousin is the span element. It’s semantically neutral as well, and just serves to distinguish one string of text from the text that surrounds it. Don’t misuse this one either; save it for when no other element makes semantic sense.

Tables

In the early days of the web, before CSS was widely supported, designers figured out how to use tables for page layout. Tables have a strong allure for the visual designer because they provide a tidy grid of rows and columns that we can exert precise control over. The problem is that it leads to messy, unmaintainable, inaccessible, presentational markup that is impossible to understand. Using tables for page layout was always an unfortunate hack, and there is no longer any need for it.

But it’s a terrible misconception that web standards forbid the use of tables. Rather, web standards urge the correct use of tables as they are semantically intended: for presenting tabular data.

Tables in HTML consist of several separate elements, some of which are required, and some of them must contain other required elements. Here’s a simplified example of a table of data:
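
The table shown on the original slide isn’t part of these notes, so the markup below is only a stand-in. It’s a minimal sketch with invented data that uses the same pieces discussed in the rest of this section: a summary attribute, a thead with scoped th cells, a tfoot with a colspan, and a tbody of rows and cells.

```html
<!-- Stand-in example: every name and number here is invented for illustration -->
<table summary="Average monthly rainfall for two cities, in millimeters">
  <thead>
    <tr>
      <th scope="col">City</th>
      <th scope="col">January</th>
      <th scope="col">July</th>
    </tr>
  </thead>
  <!-- In HTML 4 and XHTML 1.0 the footer comes before the body -->
  <tfoot>
    <tr>
      <td colspan="3">All figures are made up for this example.</td>
    </tr>
  </tfoot>
  <tbody>
    <tr>
      <td>City A</td>
      <td>80</td>
      <td>25</td>
    </tr>
    <tr>
      <td>City B</td>
      <td>40</td>
      <td>110</td>
    </tr>
  </tbody>
</table>
```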

table

A table element lives between, surprise, a pair of <table> tags. A table must consist of at least one row, and that row must contain at least one cell, which makes good sense.

Summary

I’ve included a summary attribute in the opening <table> tag, which is a nice accessibility enhancement; it will be read to a blind user to describe the table they’re about to dive into, and they can skip it if it doesn’t interest them.

thead

The next section is the thead element, which as you can guess is the table header. This is where we label our columns by using the th element.

th

I’ve also added a scope="col" attribute to indicate that this is the heading for a column. Another possible value for the scope attribute is “row” — guess what that means.

tfoot

After we close the table header we come to the table footer. This seems completely unintuitive, but it comes before the rest of the table so a browser can render the footer before it receives all the data. This is good for very long tables with lots of data that will take a while to download.

In long tables it’s customary to repeat the column names in the footer, but here I’m using it simply for a footnote labeling the entire table. Semantically, this would be better accomplished with the caption element and the caption-side property in CSS to place it at the bottom, but that has somewhat dodgy support, so I tend to only use caption for titles above a table. Using tfoot for a table’s footnote is an example of making a minor semantic compromise for the sake of presentation.

The footer, like the header, must contain at least one row and at least one cell.

Colspan

The colspan attribute allows one cell to span more than one column, as you could have guessed.

tbody

Then we come to the table body. The tbody element is usually optional, since the browser simply assumes it’s there and will actually insert one on its own. We can declare it ourselves and take some of that burden off the browser. And as it turns out, if you’re including a thead and a tfoot you must also include a tbody so the browser can tell things apart.

tr

A table row is indicated by the tr element. Not much else to say about that.

td

The row is split into cells with the td element, which stands for “table data.” A row must contain a cell, and a cell must be contained in a row; these elements cannot survive on their own.

Tables are block-level elements, so they naturally begin on a new line. However, they don’t naturally occupy the full available width as other block elements do. A table will, by default, only be as wide as its contents.

There are a lot of extra attributes that can be added to table elements: align, cellspacing, cellpadding, border, background, bgcolor, nowrap, and width. These are presentational and not necessary; their jobs can be outsourced to CSS. Several are even outright deprecated.

A special message about “height”

There is no height attribute for tables. There never has been in any version of HTML, going all the way back to when tables were first standardized in the mid-1990s.

Height is a proprietary attribute invented by Netscape that other browser makers copied. But it’s nonstandard and invalid. A modern browser rendering pages in standards compliance mode will rightfully ignore a height attribute.

The height of a table is dictated by the amount of data it contains. If you must specify a height, do so with CSS. There is no height attribute.
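
If you really do need one, a sketch of the CSS approach might look like the following. The id and the 20em value are arbitrary choices of mine, and in practice the rule would more likely live in an external style sheet than in the page itself:

```html
<style type="text/css">
  /* Hypothetical id and an arbitrary height, for illustration only */
  table#schedule { height: 20em; }
</style>

<table id="schedule" summary="A placeholder table used to show a CSS height">
  <tr>
    <td>Placeholder cell</td>
  </tr>
</table>
```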

Forms

A form is defined by the <form> tag (noticing a pattern yet?) and is a block-level element. Its one required attribute is action, though you’ll almost always want to specify method as well. Method describes what type of request will be made when the form is submitted, which is either get (the default) or post. Get requests output, post submits input. The action is the URI of the application that will be doing the work, which may be another document on the site, a backend script, or even the same document if it’s been built with programming logic, as is the case with ASP.NET.

The functional parts of a form are its inputs, which come in several types. Most of these are self-explanatory enough that I’m not going to cover each one in depth. Just take note that the input element is empty, so it must be closed with a trailing slash in XHTML. Image inputs require an alt attribute like any other inline image. Since these are almost always buttons, the alternate text should be the button text.
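
Putting those pieces together, a bare-bones form could be sketched like this. The action URI, field name and image file are hypothetical:

```html
<!-- Hypothetical form: the action URI, field name and image file are placeholders -->
<form method="post" action="/subscribe">
  <p>
    <!-- input is an empty element, so it is closed with a trailing slash in XHTML -->
    <input type="text" name="email" />
    <!-- An image input acting as a button: the alt text is the button text -->
    <input type="image" src="subscribe-button.gif" alt="Subscribe" />
  </p>
</form>
```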

But in the spirit of semantics and accessibility, let’s look at some other form elements that are very handy.

fieldset

The fieldset element defines a set of related fields. It’s a block-level element which most browsers display with a border and a bit of padding, but we can change that with CSS if we like.

legend

The legend element acts as a title for the fieldset, and should describe its purpose or function. It’s not required, but is nice to include. Unfortunately, it can be difficult to style with CSS because its rendering is inconsistent between different browsers. If you want to do something fancy with the legend, you may be better off using a heading instead.

label

The label element is one of the most useful and underrated elements in all of HTML. As its name implies, it defines the label for an input. The two are connected by the input’s id and the label’s for attribute.

In most browsers, an associated label becomes a clickable object that will focus the cursor on the specified control. This is great for checkboxes and radio buttons, since it increases the clickable target area of those tiny little widgets.
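
Here’s a small sketch showing fieldset, legend and label working together. The field names, ids and action URI are invented; what matters is that each label’s for value matches the id of the input it describes:

```html
<!-- Hypothetical preferences form; all names, ids and the action URI are placeholders -->
<form method="post" action="/preferences">
  <fieldset>
    <legend>Newsletter preferences</legend>
    <p>
      <label for="email">Email address</label>
      <input type="text" name="email" id="email" />
    </p>
    <p>
      <!-- Clicking the label text should toggle the checkbox in most browsers -->
      <input type="checkbox" name="weekly" id="weekly" />
      <label for="weekly">Send me the weekly digest</label>
    </p>
    <p>
      <input type="submit" value="Save" />
    </p>
  </fieldset>
</form>
```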

General Attributes

Id

Id is what it says it is: an identifier. By giving an element a unique id it becomes easily findable by scripts and by CSS. Any element can have an id attribute, but that id can be used for only one element in a document.

In XHTML, id has supplanted the name attribute — name is deprecated for most elements. An element’s id can act as a target for anchored links.

Ids are case-sensitive, and can contain any letters and numbers, plus hyphens, underscores, colons and periods: no spaces or other symbols allowed. The only other rule is that an id must begin with a letter. Don’t ask me why; that’s just the rule.
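
As an illustration, here’s an id doing double duty as a link target. The id value and link text are made up, and the comment lists a couple of values that would break the naming rules:

```html
<!-- A link that jumps to the element whose id is "contact-info" -->
<p><a href="#contact-info">Skip to contact details</a></p>

<!-- A unique id: starts with a letter and contains only allowed characters -->
<h2 id="contact-info">Contact details</h2>

<!-- Not allowed: "2nd-section" (starts with a digit), "contact info" (contains a space) -->
```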

Class

The class attribute can be added to any element, and declares that the element belongs to a particular group or is of a particular type. Classes are most often used as CSS selectors but they can also carry additional semantic information, laying extra meaning on top of the element’s natural meaning.

For example, a link with class="external" not only lets us visually style that link differently from other links, but also communicates that this link points to an external resource. Microformats are a growing field that seeks to standardize such semantic conventions, making extensive use of the class attribute.

One element can belong to multiple classes; just separate the class names with white space.
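
A short sketch of both ideas follows. The "external" class comes from the example above, while "note" and "important" are names I invented to show one element carrying two classes:

```html
<!-- One class: marks the link as pointing to an external resource -->
<a href="http://www.example.org/" class="external">An example site</a>

<!-- Two classes on one element, separated by white space (hypothetical names) -->
<p class="note important">Back up your files before upgrading.</p>
```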

Style

The style attribute allows you to declare CSS properties directly in the markup for one single element. It’s an undesirable mixing of layers so do your best to avoid it.

It’s All In Your Head

To wrap up the HTML portion, let’s cover a few of the non-rendered elements that belong in a document’s header.

script

The script element, as you know, declares a script. The script can either be embedded in the document itself or linked from another location with the src attribute. The type attribute is required for the script element, and in almost all cases its value will be “text/javascript,” except when it’s not. We used to use the language attribute but it’s non-standard and unnecessary in modern browsers.
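
For reference, here’s roughly what the two forms look like. The file path and the variable are hypothetical:

```html
<!-- Linked script: the path is a placeholder -->
<script type="text/javascript" src="scripts/menu.js"></script>

<!-- Embedded script: the same required type attribute, with the code inline -->
<script type="text/javascript">
  // Hypothetical variable, just to show where inline code goes
  var pageLoadedAt = new Date();
</script>
```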

link

The link element defines a reference to an external resource. It’s most often used to link to an external style sheet, but can also link to other pages or to alternate versions of the same document.

Some link elements have a type attribute, defining the type of document being linked to so the browser knows how to treat it.

Also worth noting is the rel attribute, which identifies the relationship between the current document and the linked document. There’s also a counterpart rev attribute, which defines the reverse relationship, from the linked document to the current one.

The rel and rev attributes can be applied to anchors as well, and their value can be pretty much anything you like. Browsers understand a few common values like the ones here, but we’re free to make up any other value and a browser will simply ignore those it doesn’t understand. Like the class attribute, some microformats use rel and rev to inject even more semantic meta information into an already meaningful element, giving us another data point to manipulate.
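
The examples from the slide aren’t reproduced in these notes, so here is a stand-in sketch with a few commonly seen values. All of the URLs and file names are placeholders:

```html
<!-- An external style sheet, the most common use of link -->
<link rel="stylesheet" type="text/css" href="styles/main.css" media="screen" />

<!-- An alternate version of this document, here a feed -->
<link rel="alternate" type="application/rss+xml" title="RSS feed" href="/feed.xml" />

<!-- Another page in a sequence -->
<link rel="next" href="part2.html" />

<!-- rev describes the reverse relationship: the linked resource "made" this document -->
<link rev="made" href="mailto:author@example.com" />

<!-- rel on an anchor, in the style of the rel-license microformat -->
<a href="http://www.example.org/license" rel="license">License terms</a>
```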

meta

The meta element contains additional information about the document, such as the author’s name, copyright information, the document’s character encoding, or the natural language the content is written in.
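
A few typical meta elements might look like this. The author name and copyright line are placeholders, not real values:

```html
<!-- Character encoding and natural language of the document -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Language" content="en" />

<!-- Descriptive information; these values are placeholders -->
<meta name="author" content="Jane Doe" />
<meta name="copyright" content="Copyright 2006 Jane Doe" />
```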

style

The style element allows you to embed CSS within the document, for use only on that page. This is mixing presentation with our structure, and so is less than optimal, but this at least puts it all together in one place instead of scattered inline throughout our document.

This one also requires a type attribute. The media attribute is optional but recommended (as we also saw on the link element). Media instructs the user agent which set of styles to use for different media types, such as one set for screen and another for print. Without this attribute, it defaults to a value of “all” and the browser will attempt to use the same style sheet for every application.

You can have any number of style elements, but they must be inside the head.
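
As a sketch, here are two style elements in one head, each aimed at a different media type. The rules themselves and the #navigation id are arbitrary examples:

```html
<head>
  <title>Example page</title>

  <!-- Used when the page is displayed on a screen -->
  <style type="text/css" media="screen">
    body { font-family: Verdana, sans-serif; }
  </style>

  <!-- Used when the page is printed; #navigation is a hypothetical id -->
  <style type="text/css" media="print">
    body { font-family: Georgia, serif; }
    #navigation { display: none; }
  </style>
</head>
```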

Semantic markup is good for you

As we’ve seen, most elements and attributes have pretty self-explanatory names. They do just what they say on the tin, so it’s usually a pretty easy decision which element to use for a particular slice of content. Using the right element for the right content is what semantics is all about.

Clean markup is understandable to a browser, but it’s also understandable to a human coder. If you know the elements and how they behave, you can look at a well-made document and quickly make sense of it. You’ll work faster and collaborate better.

Constructing our documents with semantics in mind will make our pages stronger, lighter, more flexible, and more accessible. Good markup gives us strong bones to support the content.

While a valid, accessible, semantically rich and well-formed HTML document is a thing of beauty in its own right, she ain’t much to look at. That’s where CSS comes into the picture, and that is a whole other story.