Common Web Site Structuring Mistakes

When browsing through web sites, especially corporate web sites, there are a few things that make me go “Eurghh!”. Surprisingly, not many of these things are related to aesthetics; to me they are more of a structural issue. Here’s my list of Common Web Site Structuring Mistakes, in no particular order whatsoever.

  • 100% Flash content
  • Overly Complex Dynamic URLs
  • Hardcoded Presentational Layer
  • Lack Of Contrast
  • False Standards Compliance Advertisement
  • Overly Restrictive robots.txt
  • Not Updating the Front Page
  • Use of Frames/Iframes
  • Non-descriptive Titles
  • Using Popups for Important Pages

Mistake 1: 100% Flash Content

First of all, I don’t have any deep-seated hatred for all things Flash. Wait a minute… I do! I hate flying widgets and Flash banners with a passion.

However, I’ll be the first to concede that Flash has its uses and if used properly, it’s an excellent presentational tool. The problem with Flash is that it’s often used for all the wrong reasons. Flash advertisements are irritating. I’ve yet to find someone who disagrees with this statement.

Creating an entire web site with Flash is very much possible. Hell, I have to maintain one myself, mostly due to bureaucratic reasons (I am making progress in convincing the relevant parties to convert it to XHTML though). It provides pleasing eye candy and, if constructed properly, can offer an easily navigable interface.

The major problem with constructing a web site entirely in Flash is that it is not search engine friendly. Search engines don’t “understand” Flash content, to put it simply. That is why it is important to have descriptive text content to go along with your Flash content.

The Flash content is for humans whereas the text content is for search engine spiders. If for some reason you don’t want the text content to be seen by your visitors, you can use the CSS property display: none to hide it from them. Search engines would still be able to see it though, and that is what matters the most.
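
To make that arrangement concrete, here is a rough sketch of what such markup could look like; the file name, heading and wording are purely illustrative and not taken from any real site:

    <!-- Flash movie for human visitors ("intro.swf" is an illustrative file name): -->
    <object type="application/x-shockwave-flash" data="intro.swf" width="600" height="400">
      <param name="movie" value="intro.swf" />
    </object>

    <!-- Equivalent text content aimed at search engine spiders, hidden from visitors: -->
    <div style="display: none;">
      <h1>Super Duper Company - Product Catalogue</h1>
      <p>Super Duper Company sells gadgets, widgets and accessories. Browse the catalogue or contact our sales team.</p>
    </div>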

Mistake 2: Overly Complex Dynamic URLs

Granted, this was an old-school search engine indexing limitation. All popular search engines can now spider dynamic URLs. However, I’ve yet to see a search result with more than five embedded variables.

To illustrate this, let me give some examples:

  • URL A: http://www.example.com/prod.php?item=1&category=12&vendor=13&code=88&promo=xmas
  • URL B: http://www.example.com/prod.php?item=1
  • URL C: http://www.example.com/1/brand-new-gadget/

If you were to bookmark the URLs above, which one would you be most comfortable with? I’d wager that most people would go for URL C. It’s short, simple and descriptive. Another plus is that if you somehow forgot to bookmark it earlier, search engines tend to find it more easily than dynamic URLs such as URL A and URL B.

By the way, URL A has five embedded variables whereas URL B only has one. Either way, stay away from such URLs whenever possible. If you have some basic programming background, you might want to check out mod_rewrite if your web sites are hosted on Apache. If you’re on IIS, then I’m afraid I can’t help you much; I’ve yet to see a free URL rewriting module for it.
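
As an illustration of the mod_rewrite approach, a minimal .htaccess sketch along these lines would map URL C onto URL B behind the scenes (this assumes Apache with mod_rewrite enabled, and that prod.php can look an item up from its item parameter alone; the exact pattern is illustrative):

    # Rewrite /1/brand-new-gadget/ to prod.php?item=1; the text slug is purely cosmetic.
    RewriteEngine On
    RewriteRule ^([0-9]+)/[a-z0-9-]+/?$ prod.php?item=$1 [L,QSA]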

Mistake 3: Hardcoded Presentational Layer

This is quite a common phenomenon. My only question is, “WHY?!”. I mean, if this were 1997 then yeah, tables are a perfectly good way to place the elements of a web page. And yes, <font color="blue" size="10">this is way cool</font>.

The good news is, it’s no longer 1997 and we have nice toys like CSS, so we no longer need to mash our presentational layer in with our content. The bad news is, too many people are still trapped in the mid-90s. This is further compounded by the fact that most WYSIWYG HTML editors don’t output valid XHTML or integrate proper CSS support. Just when you thought it couldn’t get any worse, tens of thousands of “budding web designers” sharpen their “skills” on such tools, and see web standards as “getting in the way of getting ‘real’ work done”.
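
To make the contrast concrete, here is a small sketch of the same snippet done the 1997 way and then with the presentation pulled out into a stylesheet (the class name and font size are arbitrary choices of mine):

    <!-- Presentation mashed into the content, the 1997 way: -->
    <font color="blue" size="10">this is way cool</font>

    <!-- The same content with its presentation moved into CSS: -->
    <style type="text/css">
      .way-cool { color: blue; font-size: 2em; }
    </style>
    <p class="way-cool">this is way cool</p>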

My advice is that if you’re serious about authoring web sites, then you have to be serious about fulfilling W3C standards as well. In all honesty, if you’re arsed enough to actually learn something, it wouldn’t hurt to learn it properly now, would it? For instance, you could still drive a car in first gear alone… but is that the proper way to drive?

Mistake 4: Lack Of Contrast

This one practically drives me blind! I’ve seen more than enough dark gray fonts on black backgrounds, as well as light silver fonts on white backgrounds, to last me a lifetime. Believe me, there’s nothing “l33t” or “artistic” about those stupid colour schemes! There are also web sites that are way too colourful even for a kindergarten activity room.

The human eye makes sense of a presented message, be it pictures or text, by comparing the primary patterns with their surroundings. The easier it is to differentiate the “main message” from the “backdrop”, the easier it becomes for our brain to make sense of what the eyes are feeding it.
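
As a rough illustration (the colour values below are made up for the example, not measured from any real site), compare a rule like the first one with the second:

    /* Dark gray text on a black background: barely any contrast */
    body { color: #333333; background-color: #000000; }

    /* Light text on the same dark background: far easier to read */
    body { color: #eeeeee; background-color: #000000; }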

If you feel that you have a daft sense of colour, you might want to check out Steel Dolphin Color Scheme Tool. It will help you pick a colour scheme for your web site based on a primary colour of your choice.

I’d personally go for the Complementary, Split Complementary or Triadic schemes most of the time. You don’t really have to follow the generated scheme strictly. Slight hue variations of the generated scheme might be more to your taste. The important thing is to experiment… sanely!

Mistake 5: False Standards Compliance Advertisement

Seen any of these images recently?

  • Valid XHTML 1.0 Button
  • Valid CSS Button

You’d be surprised how few of the pages that display the buttons above actually validate! Try clicking those images the next time you see them. Chances are, many of the web sites that display them don’t really use valid XHTML and/or CSS. To me, this is one of the biggest forms of fraud plaguing the internet, especially the Malaysian blogosphere.

I mean, if you know that your pages don’t validate, then why the heck do you still stick those links on your site? It’s like sticking an “EVO IV” sticker on your Proton. Only an idiot wouldn’t notice the difference, but to the rest of the population, you are the idiot.

So please, before you proudly display those standards compliance images on your web pages, double check to make sure they are really compliant.

Mistake 6: Overly Restrictive robots.txt

First of all, in order to understand robots.txt I’d recommend that you visit this page.

As you can see, robots.txt is a very important tool that determines how your pages are indexed by search engine spiders. Now, the problem is that a lot of people build their web sites on downloadable CMS scripts or other site management tools. Some of these scripts bundle their own robots.txt, and more often than not the default settings are far too restrictive for most web sites.
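
For illustration, an overly restrictive bundled default often boils down to something like this (a hypothetical example, not lifted from any particular CMS), which tells every robot to stay away from the entire site:

    User-agent: *
    Disallow: /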

Another funny thing about robots.txt is that in theory, it’s supposed to “protect” certain areas of your web sites from nosey search engine robots and other crawlers. However, more often than not, it provides nosey visitors hints on where to start digging on your web sites for “goodies”.

Let’s take PHPNuke, one of the most popular CMSs available on the web, as an example. Take a look at its robots.txt contents. You’ll notice that important areas of PHPNuke are disallowed to robots. However, we humans now know exactly where they are located.

Perhaps a more interesting example would be Whitehouse.gov’s robots.txt. My oh my oh my…

Mistake 7: Not Updating the Front Page

The front page of your web site, often called the home page, is the doorway to your other pages. It should provide links not only to the key pages of your web site, but preferably also summaries of important areas that would entice a visitor to click through to them. However, many web sites out there don’t make proper use of their front pages.

Imagine the following scenario: you have a web site at http://www.example.com and a “News” section at http://www.example.com/news/. In your news section you have an entry about your company’s recent acquisition of a competitor at http://www.example.com/news/2005/12/27/we-acquired-xyz-company/.

Wouldn’t you think that this is very notable news that should be announced to the whole wide world? Then why would you “hide” it under a few layers of links? Don’t you think it should be placed prominently on your front page?

Therefore, it is of absolute importance to update your front page with current news, even though you might have a dedicated page just for that purpose.
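
As a rough sketch of what such a front page teaser could look like for the scenario above (the markup and wording are illustrative only):

    <div class="front-page-news">
      <h2>We have acquired XYZ Company</h2>
      <p>Read the full announcement in our
        <a href="http://www.example.com/news/2005/12/27/we-acquired-xyz-company/">News section</a>.</p>
    </div>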

Mistake 8: Use of Frames/Iframes

This is another web technology that’s considered passé. Whatever a <frame> or an <iframe> can do, <div>s can do just about as well. If you need dynamic updating of their contents, then a little bit of AJAX can emulate that behaviour pretty well.
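
As a rough sketch of that AJAX approach (the element id, function name and URL are all illustrative), something along these lines loads a document fragment into a <div> instead of a frame:

    <div id="content">Initial content goes here.</div>
    <script type="text/javascript">
    // Fetch a document fragment and place it inside the element with the given id.
    // Note: older versions of Internet Explorer need an ActiveXObject fallback,
    // which this sketch omits for brevity.
    function loadInto(id, url) {
      var xhr = new XMLHttpRequest();
      xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
          document.getElementById(id).innerHTML = xhr.responseText;
        }
      };
      xhr.open("GET", url, true);
      xhr.send(null);
    }
    // Example: loadInto("content", "/news/latest.html");
    </script>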

If you need more convincing on why you should drop frames from your web pages, consider the following scenario: a page that uses two frames will need at least three HTML files:

  1. The frameholder (where you define the <frameset> element).
  2. Contents of the first frame.
  3. Contents of the second frame.

In addition to this, link targets need to be defined properly. A simple mistake could result in an external link being opened in a site menu frame. I’m sure you’ve encountered this problem every once in a while. It can be quite irritating to see a web page squashed into a small frame.

Furthermore, search engine spiders tend to index the contents of your <noframes> tag better than the actual frames themselves!

Mistake 9: Non-descriptive Titles

This is perhaps one of the easier issues to fix. The contents of your <title> element are very important; they should provide a brief description of the page they’re on. Many lazy web authors template the titles of all their pages, making them identical for every page on their web site.

A common misconception about the page title is that it’s unimportant because it’s merely displayed in your browser window’s title area, not on the page itself. Hence the preference to template it rather than make it dynamic.

Some even go to the extent of making it cartoonishly dynamic by manipulating the title with JavaScript to produce irritating ASCII animations! This will irritate most users to no end, so please steer clear of doing so!

From a visitor’s point of view, the page title is virtually invisible. However, for search engines, the title element is all-important. It should concisely describe the page that’s being returned as a search query result. Here are some examples:

  • Super Duper Company Pte Ltd Web Site (across the entire site): Templating, a very bad way to use titles
  • Super Duper Company Pte Ltd. The best company which only sells super, duper stuff like super duper cars, super duper trucks, super duper whatever, all the super duper stuff you want!: Title spamming, definitely a bad way to use titles. Furthermore, search engines only index on average the first 64 characters of the title anyway.
  • Super Duper Company – Description of Current Page: This is one of the more popular title formats. Clear, descriptive, concise… However, there’s a better way to structure a title.
  • Description of Current Page – Super Duper Company: In my opinion, this is the best title structure a web site can use (see the markup sketch below). The page description should appear before the company name because that is what a search result shows first. In addition, the title should focus more on the current page’s content than on the overall description of the web site.
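
In markup, the recommended structure boils down to something like this (the page and product names are of course illustrative):

    <!-- On a product page: -->
    <title>Brand New Gadget - Super Duper Company</title>

    <!-- On the contact page: -->
    <title>Contact Us - Super Duper Company</title>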

Mistake 10: Using Popups for Important Pages

This is yet another well-documented web irritation that is used for all the wrong reasons. The common reasoning of incompetent web designers is that popups temporarily shift the focus from the main content to the popup’s content.

What this group of designers fails to see is that after years of being plagued by popups, even the most inexperienced web surfers have developed a habit of automatically closing popups (especially unrequested ones) or of enabling popup-blocking features in their browsers. Hell, most modern browsers have this option enabled by default.

To make matters worse, a lot of corporate web sites use popups to announce important news. Sadly, this information is unlikely to reach the targeted recipients, either because of the “close all popups by reflex” syndrome or because of popup-blocking browser features/software.

Footnotes

I hope you’ve picked up some useful hints and tips on designing a more structurally sound web site from what I’ve written. The purpose of this article is not to preach. Hell, I’ve made almost all of the mistakes mentioned above (the only exception is mistake #1, probably because I can’t develop anything useful in Flash, let alone an entire web site).

I appreciate any comments and/or feedback. You may do so by using the comment form below, or if you prefer, you can send me private feedback via this contact form. I will strive to respond in a timely manner.

Please Digg this article if you enjoy it.

10 responses to “Common Web Site Structuring Mistakes”.

  1. farking Says:

    some good example 🙂

  2. suanie Says:

    this is good… i’ll be referencing off it for some stuff that i do… but your xmas background… is a bit hard to read ar…

  3. WTJ Says:

    well said

  4. Site Admin Azmeen Says:

    Sorry for the Xmas theme… removed now.

    I should have tested it on CRTs as well as LCD monitors.

    Thanks for the comments everyone 🙂

  5. HTNet Says:

    Alternative to Using robots.txt

    In my post Common Web Site Structuring Mistakes, I did mention that robots.txt tend to deceive its function of restricting access to certain areas of your web site by search engine crawlers, more commonly known as robots. Instead, the robots.txt file …

  6. Jim Says:

    > Mistake 5: False Standards Compliance Advertisement

    Take this site, for example. You use an XHTML doctype, but you aren’t writing XHTML. Two problems are immediately obvious – your Javascript would break, and you are only using your Javascript to hide more non-compliance from the validator anyway.

    > Another funny thing about robots.txt is that in theory, it’s supposed to “protect” certain areas of your web sites from nosey search engine robots and other crawlers. However, more often than not, it provides nosey visitors hints on where to start digging on your web sites for “goodies”.

    No, it’s supposed to stop wasting resources on robots that would otherwise recursively follow an infinite/excessive number of links. It’s nothing to do with keeping people from being nosy.

    > Mistake 8: Use of Frames/Iframes

    iframes are the only way to make certain types of Javascript usable, due to a bug in Internet Explorer. Avoiding iframes altogether would significantly decrease the quality of many websites.

    > The contents of your title tag is very important.

    You mean “element”, not “tag”.

  7. Site Admin Azmeen Says:

    > Take this site, for example. You use an XHTML doctype, but you aren’t writing XHTML. Two problems are immediately obvious – your Javascript would break, and you are only using your Javascript to hide more non-compliance from the validator anyway.

    I maintain that all my XHTML is valid. Your point about the Javascripts is correct though; I should have enclosed them in proper <![CDATA[ sections. However, I am NOT using Javascript to “hide more non-compliance”, as most of the Javascript on this site is from external parties, and there’s not much editing I can do on it.

    Using the “comment in javascript” method is perfectly logical, since the javascript contents are for human use only, and if the user-agent doesn’t support it, it won’t break the page’s function in any way whatsoever.

    > No, it’s supposed to stop wasting resources on robots that would otherwise recursively follow an infinite/excessive number of links. It’s nothing to do with keeping people from being nosy.

    I never said it was for keeping people from being nosy. Furthermore, all “real” robots won’t follow an infinite/excessive number of links anyway, because 1) people can’t link infinitely, and 2) modern robots have limits on the number of links they will crawl on a page.

    > iframes are the only way to make certain types of Javascript usable, due to a bug in Internet Explorer. Avoiding iframes altogether would significantly decrease the quality of many websites.

    What sort of Javascript would be so important? And you do all this because of a buggy browser which is the bane of all web page authors due to its poor support of W3 standards? Avoiding IE altogether would significantly improve the quality of many people’s browsing experience.

    > You mean “element”, not “tag”.

    Ooops, corrected. Thanks for pointing it out.

  8. Jim Says:

    > However I am NOT using Javascript to “hide more non-compliance”,

    You certainly are. You are using a Strict document type, and because you can’t use the target attribute with Strict document types, you dynamically add the attribute with Javascript so that the validator won’t see it. The attribute is still there, and you are still using it with a document type that doesn’t allow it. Just because the validator can’t see it, it doesn’t mean it’s any more compliant.

    > as most of the Javascript on this site are from external parties, and there’s not much editing I can do on them.

    I’m talking about the inline Javascript present on this site.

    > Using the “comment in javascript” method is a perfectly logical method

    Let’s put this in perspective. Browsers as old as Netscape 2.0 don’t need that comment hack. However it’s a practice that breaks with XHTML, something you claim to be authoring.

    If you are authoring XHTML, then it’s a completely nonsensical thing to do. If you are authoring HTML, it’s unnecessary. Either way, it doesn’t strike me as perfectly logical, especially as it’s less efficient than external scripts with dynamic pages such as yours anyway.

    > I never said it was for keeping people from being nosy.

    I quote:

    > Another funny thing about robots.txt is that in theory, it’s supposed to “protect” certain areas of your web sites from nosey search engine robots and other crawlers.

    Fact is, it’s about resource allocation, not privacy protection.

    > Furthermore, all “real” robots won’t follow an infinite/excessive number of links anyway

    That’s not the point. Recursively following links to the point where the bot gives up is not only an utter waste of resources, but can prevent your actual content from being indexed, in favour of blank pages.

    > people can’t link infinitely

    Sorry, you’re wrong. One example I recall causing problems was common weblog software that allowed you to browse the archives by date. Of course, the calendar had a ‘next month’ and ‘previous month’ link on it, and in the absence of robots.txt, a bot will dutifully follow those links until it gives up a few thousand years from now. These kinds of infinite loops aren’t difficult to create, either by design or by accident.

    > What sort of Javascript would be so important?

    When you link to a fragment identifier on the current page, Internet Explorer needs an iframe hack to update the history list. This is very important when doing Ajax-type things. Otherwise, the user hits their back button, and ends up in totally the wrong place.

    > And you do all this because of a buggy browser which is the bane of all web page authors due to its poor support of W3 standards?

    It’s better than causing problems for most users by sticking my head in the sand.

  9. Site Admin Azmeen Says:

    > You certainly are. You are using a Strict document type, and because you can’t use the target attribute with Strict document types, you dynamically add the attribute with Javascript so that the validator won’t see it. The attribute is still there, and you are still using it with a document type that doesn’t allow it. Just because the validator can’t see it, it doesn’t mean it’s any more compliant.

    I don’t use the target attribute at all! If you had just told me which function did it, I would have disabled it. It’s from an external plugin I used, and yes, I’ve disabled it immediately.

    I have absolutely no need for any rel=’external’ links to open in a separate window. And if you even bothered clicking on my rel=’external’ links, they don’t anyway.

    As for the robots.txt thing, again you choose to quote me. If you had spent a little more than two seconds reading it, you’d see that I wrote nosey robots, as you oh so accurately copied and pasted, and not nosy people as you implied.

    > That’s not the point. Recursively following links to the point where the bot gives up is not only an utter waste of resources, but can prevent your actual content from being indexed, in favour of blank pages.

    It is a point, because it practically cannot happen. Furthermore, no robot would index blank pages.

    > Sorry, you’re wrong. One example I recall causing problems was common weblog software that allowed you to browse the archives by date. Of course, the calendar had a ‘next month’ and ‘previous month’ link on it, and in the absence of robots.txt, a bot will dutifully follow those links until it gives up a few thousand years from now. These kinds of infinite loops aren’t difficult to create, either by design or by accident.

    For one thing, that’s software, not a person. For another, in all the weblog software I’ve seen with this feature, it can definitely be disabled. For yet another thing, most people won’t want to display such a long history list, nor are they able to blog until “a few thousand years from now”. Heck, even the internet has only been in existence since when, the late 80s?

    Sure, mistakes do happen, but to the point where they accumulate into a list spanning “a few thousand years” before the author realises it? You’ve got to be kidding me.

    > When you link to a fragment identifier on the current page, Internet Explorer needs an iframe hack to update the history list. This is very important when doing Ajax-type things. Otherwise, the user hits their back button, and ends up in totally the wrong place.

    Well, since I’ve done a few Ajax related things myself, if somehow your “user” feels the need to use the “Back” button, you should focus on why they needed to do so in the first place, rather than work around it; i.e. the UI is probably unintuitive.

    > It’s better than causing problems for most users by sticking my head in the sand.

    To me, I call it putting on constructive pressure for the developer to improve. To each his own. I’m not writing to “win” anything anyway, just to share my views. Some might happen to share the same views and others won’t. Both are perfectly acceptable to me.

  10. Jim Says:

    > I don’t use the target attribute at all!

    Not any more, not now you’ve deleted that Javascript that was adding it.

    > As for the robots.txt thing, again you choose to quote me. If you had just spent a little bit more than 2 seconds reading it you’ll see that I wrote nosey robots as you oh so accurately copy and pasted, and not nosy people like you implied.

    That’s irrelevant to the point. The point you were making was that robots.txt was intended to protect privacy and fails at that. That’s an absurd point because robots.txt isn’t meant to protect privacy.

    > It is a point, because it practically cannot happen.

    I gave you a perfectly reasonable example of how it could.

    > Furthermore, no robot would index blank pages.

    “Blank” in terms of “no content”, not “blank” as in zero bytes. Did my example not make that clear? Yes, robots can and will read pages with no content, and because you admit that they will not carry on reading pages indefinitely, you have to concede that when a blank page is read, it reduces the number of pages with content that will be read.

    > For one thing, that’s a software, not a person.

    Er, so? A search engine doesn’t care if the page was generated dynamically or whether it was written by hand.

    > For another, in all weblog software I’ve seen having this feature, it definitely can be disabled.

    Again – so? I’m giving an example of why robots.txt exists. The fact that some software can be configured to avoid this scenario is not relevant to the point.

    > For yet another thing, most people won’t want to display such a long history list nor are able to blog until “a few thousand years from now”.

    I’m sorry, did you miss my point entirely? Let me explain it again.

    1. You claimed that robots.txt is somehow intended to protect privacy.

    2. I disagreed and said that it was to preserve resources, for example stopping robots from following an infinite/excessive number of links.

    3. You said that you can’t have an infinite number of links.

    4. I gave an example of a calendar being able to generate an infinite number of links.

    Now how on earth is somebody not being able to blog for thousands of years remotely relevant to that? My point is that it’s easy for software to have an infinite number of links, not that a person can keep writing until they have more content than a robot can deal with.

    > Well, since I’ve done a few Ajax related stuff myself, if somehow your “user” feels the need to use the “Back” button, you should focus on why they needed to do so in the first place, rather than work around it. ie. probably the UI is unintuitive.

    Using the back button is perfectly reasonable and doesn’t mean there’s been a UI mistake.

    > To me, I call it putting on constructive pressure for the developer to improve.

    At the users’ expense.