A brief history of HTML
Coverage: HTML 1.0, HTML 3.2, HTML 4.0, HTML 4.01, cHTML, XHTML 1.0, XHTML 1.1, modularisation and profiles
Background
HTML was created by Tim Berners-Lee in the early 1990s (the first web page was written in 1990) as a way of sharing documents between computers. It provided a simple, public language for describing things such as ‘paragraphs’ and ‘headings’. It also introduced ‘links’, which let a document point to other documents that a reader can open with a click. The linking feature (and the exchange of documents between computers) was possible only because the Internet already existed.
The original HTML was defined in SGML, a general-purpose language for defining other languages. It is important to note that the purpose of HTML was to define what the elements on a page were and how they related to each other, not the appearance (rendering) of the page. This meant that the same structure could be shown on a screen, printed on paper or read aloud: it was independent of device and medium.
HTML was enormously popular and spread quickly. The combination of this language and a protocol for exchanging hypertext documents (HTTP) resulted in the World Wide Web, which is one way of using the Internet. The first browser to become widely popular, Mosaic, was written at NCSA (the National Center for Supercomputing Applications) and first appeared in 1993. Because it came from an academic environment, all the original tools were for Unix, Apple or NeXT systems.
Companies quickly began developing browsers for the web. In 1994 members of the NCSA team left to form Mosaic Communications Corporation, which was soon renamed Netscape. Many other companies were developing browsers for all sorts of computers.
Microsoft joined in during 1995, once it had decided that the Internet was important, and released Internet Explorer 1.0, 2.0 and 3.0 (the first version that really worked well) within about 12 months.
With the spread of the web, it became very important that everyone agreed on exactly what HTML was. This meant developing publicly available, agreed standards that defined the language. A first draft specification, written in 1993, can be found at http://www.w3.org/MarkUp/draft-ietf-iiir-html-01.txt and includes a DTD giving a formal description of the language. In 1994, the World Wide Web Consortium (W3C) was founded to take development forward independently of any particular player or company.
HTML era
The draft definition of HTML 1.0 is clean and simple. Almost anyone who writes HTML today would recognise it as a subset of the current language.
The W3C started work on an updated version of HTML, intended to become an agreed standard. Unfortunately, standards bodies move slowly, while the companies building browsers were quick to spot what users wanted and to add non-standard features to their browsers.
This period (often referred to as the ‘browser wars’) saw Netscape (the market leader) and Microsoft going head-to-head, adding directly competing features, copying each other and moving away from the ‘open standard’ model of the web. Authors increasingly had to write parts of their pages differently for each browser.
In terms of HTML, the main changes concerned rendering control and interaction. Elements for fonts, colours, images, tables and forms were all added through an ad hoc process by the competing companies. As a consequence, the HTML 2.0 specification (published through the IETF in 1995) was obsolete almost as soon as it was written. At this point the W3C's role was mostly to get the companies to agree on a common set of elements, in an attempt to restore the standards-based idea of the web.
HTML 3.2 was published in 1997 as a W3C Recommendation. Because of the way it was created, it is much larger and less consistent than earlier versions. It moves away from the idea of keeping structure separate from rendering, and so makes HTML less accessible and less device-independent.
HTML 4.0 (1997), revised as HTML 4.01 in 1999, attempted to address one of these issues: rendering style. It introduced the ability to define styles in a separate language (such as CSS, first defined in 1996). This made it possible, once again, to write content in which the elements and structure were separated from the rendering. However, it did not remove the presentational elements introduced in HTML 3.2, so authors could use either approach and the language remained bloated.
To encourage authors to move away from features that were likely to be phased out, the W3C produced three versions of the DTD. In addition to the normal DTD (generally referred to as ‘transitional’), there was a ‘strict’ DTD that omitted all the presentational elements, resulting in a much cleaner language. There was also a separate ‘frameset’ DTD for authors who still wanted to use frames in their documents. Best practice is to use the ‘strict’ DTD.
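A document chooses one of these DTDs through the DOCTYPE declaration at its top. As a rough illustration, the Python sketch below maps the public identifiers of the three HTML 4.01 DTDs to their variant names. The helper `dtd_variant` and the sample page are hypothetical, and the regular expression only handles the common `<!DOCTYPE HTML PUBLIC "...">` form, not every syntax SGML allows.

```python
import re
from typing import Optional

# Public identifiers of the three HTML 4.01 DTDs.
HTML401_VARIANTS = {
    "-//W3C//DTD HTML 4.01//EN": "strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "transitional",
    "-//W3C//DTD HTML 4.01 Frameset//EN": "frameset",
}

# Captures the public identifier in a declaration such as
# <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+HTML\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE)


def dtd_variant(document: str) -> Optional[str]:
    """Return 'strict', 'transitional' or 'frameset', or None if not recognised."""
    match = DOCTYPE_RE.search(document)
    if match is None:
        return None
    return HTML401_VARIANTS.get(match.group(1))


strict_page = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">\n'
    "<html><head><title>Example</title></head><body><p>Hello</p></body></html>"
)
print(dtd_variant(strict_page))
```

Only the public identifier is used for the lookup; the system identifier (the `.dtd` URL) is ignored in this sketch.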
XHTML and XML
Up to this point, all versions of HTML were defined in SGML. In a parallel development, the W3C was working on a simplified version of SGML that was better suited to a web-based computing environment. This meta-language, which became a W3C Recommendation in 1998, is XML. Because many tools exist for parsing and manipulating XML, any language defined in XML can be processed in versatile ways with comparative ease.
Once XML was accepted, one of the first tasks was to redefine HTML in XML. The result is XHTML. At one level, the two can be considered equivalent, since XHTML uses the same elements as the language it replaces (with small variations where XML is stricter about syntax than SGML: for example, every element must be closed and element names must be lowercase). It even retains the three DTDs (strict, transitional, frameset).
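One practical consequence of the stricter syntax is that an XHTML document must be well-formed XML, whereas SGML-based HTML 4.01 lets authors omit some end tags. The Python sketch below uses the standard library's ElementTree parser as a stand-in for a general XML processor; the two snippets and the `is_well_formed` helper are illustrative assumptions, not taken from either specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML 4.01 style: SGML rules allow </p> to be omitted,
# and the empty element <br> has no closing slash.
html_style = "<body><p>First line<br>Second line<p>Next paragraph</body>"

# XHTML style: every element is explicitly closed and empty elements end in "/>".
xhtml_style = "<body><p>First line<br />Second line</p><p>Next paragraph</p></body>"

print(is_well_formed(html_style))
print(is_well_formed(xhtml_style))
```

The HTML-style snippet is rejected because the parser reaches `</body>` while `<br>` and the second `<p>` are still open; an SGML-aware HTML parser would accept it.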
However, because XHTML is defined in XML, two very important differences apply to user agents (browsers) and other tools that process it.
The first difference is that these tools are normally written as general XML processors, so they can handle not only XHTML but other XML languages too. This makes it possible to extend the browsing experience by mixing other languages into your web pages. These could be generally agreed languages (e.g. SVG, Scalable Vector Graphics) or languages you have defined yourself. This extensibility is a key feature.
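XML namespaces are what let a single document mix languages: each element is tagged with the namespace of the language it belongs to. The Python sketch below embeds an SVG drawing in a minimal XHTML page and counts the elements from each language using the standard library's ElementTree. The page content and the `count_by_namespace` helper are hypothetical; the two namespace URIs are the standard ones for XHTML and SVG.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# A minimal XHTML page with an SVG drawing embedded in its body.
page = f"""<html xmlns="{XHTML_NS}">
  <head><title>Demo</title></head>
  <body>
    <p>A circle drawn with SVG:</p>
    <svg xmlns="{SVG_NS}" width="100" height="100">
      <circle cx="50" cy="50" r="40" fill="red" />
    </svg>
  </body>
</html>"""


def count_by_namespace(markup: str) -> dict:
    """Count the elements belonging to each XML namespace in the markup."""
    counts = {}
    for elem in ET.fromstring(markup).iter():
        # ElementTree writes a namespaced tag as "{namespace-uri}localname".
        namespace = elem.tag[1:].split("}", 1)[0] if elem.tag.startswith("{") else ""
        counts[namespace] = counts.get(namespace, 0) + 1
    return counts


for namespace, n in count_by_namespace(page).items():
    print(namespace, n)
```

A browser built as a general XML processor can route each namespace to the appropriate renderer in a similar way; this sketch only shows that the parser keeps the two languages distinguishable.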
The second difference is that XML makes it easier to modularise language definitions. Instead of defining one big DTD for XHTML, you can break it down into a number of modules and then restrict the language by selecting subgroups of modules. XHTML 1.1 is a modular version of the specification.
Special versions
Before modularisation, there was a demand for a smaller version of HTML with fewer features, suited specifically to devices such as mobile phones (and other ‘small appliances’). Driven primarily by companies in Japan, a subset of HTML called cHTML (Compact HTML) was defined in 1998. It omitted anything that would not work on a small black-and-white, text-based screen, such as JPEG images, font control, tables and colour. In many ways it is closer to the original HTML 1.0 specification than any other version. Although it never became a formal standard, cHTML was instrumental in the success of i-mode devices, which became hugely important in the Japanese market.
The need for subset languages on small devices (such as phones, digital TV boxes and PDAs), combined with the introduction of modularity, made it possible to define principled subsets of XHTML for special cases. Currently there are XHTML Basic (from the W3C) and XHTML Mobile Profile (from OMA).
Other mark-up languages are also beginning to add ‘Basic’ and even ‘Tiny’ profiles for mobiles (SVG, for example, has SVG Basic and SVG Tiny).
Relationship to WML
It is important to point out that WML is a mark-up language which, while clearly related to HTML, is neither a derivative nor a subset of it. It was developed and managed outside the W3C process, and was designed specifically for wireless devices.
WML is part of a much larger package of protocols and specifications called WAP (Wireless Application Protocol), which defines how information is handled across mobile networks. WAP was originally managed by the WAP Forum and is now managed by the Open Mobile Alliance (OMA).
WML first appeared as part of the WAP 1.0 specification in the late 1990s. WAP 1.2 (2000) and 1.3 (2002) were significant revisions. The current version is WML 2.0, associated with WAP 2.0.
While some elements (such as P and A) are the same in WML, it has a number of significantly different elements. It also differs from HTML in two major ways. First, it does not follow HTML's ‘one document = one page’ model. Instead, a document (called a deck) consists of many cards (each of which can be thought of as a ‘screenful’); the device downloads the complete deck in a single request and then follows links between its cards locally on the handset. Second, WML has a defined compact binary encoding (WBXML), so it can be transmitted much more efficiently than HTML, saving time and cost.
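The deck-of-cards model can be sketched in a few lines. The Python example below parses a small, hypothetical two-card WML deck once (standing in for the single download) and then moves between cards by resolving `#id` links against the already-parsed deck, with no further requests. The deck content and the `load_deck` and `follow_link` helpers are illustrative assumptions; the DOCTYPE is the commonly used WML 1.1 one, and real WAP browsers would typically receive the deck in its binary WBXML form rather than as text.

```python
import xml.etree.ElementTree as ET

# A hypothetical two-card deck.
DECK = """<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN"
  "http://www.wapforum.org/DTD/wml_1.1.xml">
<wml>
  <card id="home" title="Home">
    <p>Welcome. <a href="#news">Read the news</a></p>
  </card>
  <card id="news" title="News">
    <p>No news today. <a href="#home">Back</a></p>
  </card>
</wml>"""


def load_deck(markup: str) -> dict:
    """Parse the whole deck once and index its cards by their id attribute."""
    root = ET.fromstring(markup)
    return {card.get("id"): card for card in root.iter("card")}


def follow_link(cards: dict, current_id: str) -> str:
    """Follow the first link on a card; '#id' links resolve within the deck."""
    href = cards[current_id].find(".//a").get("href")
    if href.startswith("#"):
        return href[1:]  # another card in the same deck: no network request
    raise ValueError(f"{href} would require fetching a new deck")


cards = load_deck(DECK)
here = follow_link(cards, "home")
print(cards[here].get("title"))
```

Keeping every card in memory after one download is what makes navigation within a deck fast on a slow wireless link, at the cost of downloading cards the user may never view.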