The Hypertext Markup Language (HTML) was not the first descriptive
markup language. Computer programmers had long
used formatting codes, control codes, or macros in software to
direct document formatting. During the 1960s, generic coding emerged,
using descriptive tags rather than cryptic codes: for example,
heading instead of format-17. Many people were working
on similar ideas during the 1960s, including scientists at
IBM, where the Generalized Markup Language (GML) was developed as
a means of allowing the text editing, formatting, and information
retrieval subsystems to share documents. GML introduced the
concept of a formally-defined document type with an explicit
nested element structure. GML was implemented on
mainframe computers around the late 1960s. At that time IBM
was the world’s second-largest publisher, and it produced over
90 percent of its documents with GML. Over the next few
years, several new concepts were developed such as short references,
concurrent document types, and link processes.
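The idea of a formally defined document type with a nested element structure can be illustrated with a minimal sketch. This is not GML syntax; the element names and the conforms function are hypothetical, chosen only to show how a document type constrains which elements may appear inside which others.

```python
# Hypothetical document type: each element name maps to the set of
# element names allowed directly inside it.
DOCUMENT_TYPE = {
    "document": {"heading", "paragraph", "list"},
    "list": {"item"},
    "heading": set(),
    "paragraph": set(),
    "item": set(),
}


def conforms(element, doc_type=DOCUMENT_TYPE):
    """Recursively check a (name, children) tuple against the document type."""
    name, children = element
    if name not in doc_type:
        return False
    for child in children:
        # The child must be allowed inside this element...
        if child[0] not in doc_type[name]:
            return False
        # ...and must itself be well structured.
        if not conforms(child, doc_type):
            return False
    return True


# A document that follows the declared structure.
memo = ("document", [
    ("heading", []),
    ("paragraph", []),
    ("list", [("item", []), ("item", [])]),
])

# An item placed outside a list violates the document type.
bad_memo = ("document", [("item", [])])

print(conforms(memo))
print(conforms(bad_memo))
```

Because the tags describe what each part of the text is rather than how it should look, the same document could be handed to an editor, a formatter, or a retrieval system without change.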
During the 1970s, the American National Standards
Institute (ANSI) established a committee to develop a standardized
markup language. This work became the Standard
Generalized Markup Language (SGML), which was eventually
adopted in 1986 by the International Organization for Standardization
(ISO) as ISO 8879. SGML offers a detailed system for marking up documents
so that their appearance is independent of specific software
applications. It is a stable and well-defined meta-language
that allows other markup languages to be created. SGML is
very powerful and flexible due to the many options included.
Early adopters of SGML included the U.S. Internal Revenue
Service (IRS) and the U.S. Department of Defense.
However, it became apparent that SGML’s sophistication
made it unsuitable for quick and easy Web publishing. A simplified
markup language was needed so that anyone could learn
it quickly. The result was the Hypertext Markup Language
(HTML), which is essentially one specific SGML document
type, defined by a Document Type Definition (DTD). Early Web browsers
supported HTML, and it quickly became the de facto language
of the burgeoning Web and a significant reason
for the rapid growth of the Internet’s popularity.
As good as HTML is, there are still problems with it. In many
cases it is too simple. It served its purpose in the early days of
the Web, when almost everything was a text-based document, but
ran out of horsepower when Web authors started using multimedia
and advanced page designs. Image maps (images with
embedded hyperlinks), text attributes, tables, frames, and
dynamic pages all added complexity. Competition among browser
developers guaranteed incompatibilities, whether through proprietary features
or through different solutions to the same feature. Over the years Microsoft
added tags that work only in Internet Explorer, and Netscape
added tags that work only in Navigator, leaving the Web
author caught in the middle. Standards were attempted but
never gained full industry-wide support. The biggest problem
is that HTML is not extensible. This gave rise to Java,
JavaScript, and Active Server Pages. Each such addition, like
Cascading Style Sheets (CSS), adds flexibility to
Web designs, but these are really just patches that mask the underlying problem:
no standard extensibility. It is ironic that HTML grew out
of SGML, which is fully extensible.
As extensible as SGML is, it is also extremely complex, and
customizing it for a set of documents is time-consuming. A new
approach was needed to bridge the gap between SGML and
HTML. The answer is the Extensible Markup Language (XML),
proposed in late 1996 to the World Wide Web Consortium
(W3C). XML was designed to keep the power of SGML while avoiding
its complexity. Whereas HTML is merely one SGML document type,
XML is a new meta-language, a simplified version of the parent
language itself. Like SGML, XML has the power to define other
markup languages.
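The contrast between HTML's fixed tag set and XML's extensibility can be shown with a small sketch. The recipe vocabulary below is hypothetical and invented only for illustration; the point is that a generic XML parser (here Python's standard xml.etree.ElementTree module) reads author-defined tags without any special support.

```python
import xml.etree.ElementTree as ET

# A hypothetical "recipe" vocabulary: none of these tags exist in HTML,
# yet a generic XML parser can read them because the document is
# well-formed XML.
RECIPE_XML = """
<recipe name="Pancakes">
  <ingredient quantity="2" unit="cup">flour</ingredient>
  <ingredient quantity="1" unit="cup">milk</ingredient>
  <step>Mix the ingredients.</step>
  <step>Fry on a hot pan.</step>
</recipe>
"""


def summarize(xml_text):
    """Return the recipe name, the ingredient names, and the number of steps."""
    root = ET.fromstring(xml_text.strip())
    ingredients = [item.text for item in root.findall("ingredient")]
    steps = root.findall("step")
    return root.get("name"), ingredients, len(steps)


name, ingredients, step_count = summarize(RECIPE_XML)
print(name, ingredients, step_count)
```

An HTML browser would have no way to interpret tags such as ingredient or step, whereas an XML application can define whatever vocabulary suits its documents.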