Project Gutenberg Thesaurus 1911
Thesaurus-1911 Version 1.02
(supplemented: July 18, 1991)
An electronic thesaurus derived from the version of Roget's Thesaurus
published in 1911. This thesaurus has been prepared by MICRA, Inc.
(May 1991).
- MICRA, Inc. makes no representation that the original 1911
printed work on which this is based is now in the public domain in any
particular country. However, MICRA, Inc. makes no proprietary claims
regarding this electronic version of the 1911 thesaurus. If the 1911
work is currently in the public domain, this electronic version can
also be treated as being in the public domain.
- If any commercial use is made of this work prior to January 1,
1993, it is suggested that an appropriate donation be made to MICRA,
Inc. to assist in preparing other such texts which may be useful.
Supplement
Note that this version of Thesaurus-1911 has been supplemented with
over 1,000 words not present in the original 1911 edition, but many
modern words are still missing. About 1,500 of the 6,500 verbs found in
an 80,000-word spell-checker's word list are absent from this work. The
shortfall in nouns is probably much worse, especially on technical
topics. Of the 40,000 unique words contained in the original
text, 12,000 are not recognized by a spell-checker. Most of these are
foreign words (primarily Latin), and many are obsolete. In this
version, these words are marked as such by comments in square
brackets. Although this version has been proof-read, there are
doubtless numerous residual transcription errors, some of which may be
obvious even without reference to the original text. We will be
grateful if any of these are brought to our attention; the corrections
will appear in subsequent versions.
The original arrangement has also been modified slightly in several
places, in particular by splitting one entry into two. A version of
the 1911 thesaurus which is almost identical to the original (with only
a small number of additions) has also been prepared by MICRA, Inc., and
likewise carries no restrictions from MICRA. Copies of either version
may be purchased for $40.00 from MICRA, Inc., or from the Austin Code
Works, Austin, Texas.
The following applies only to the non-hypertext version
- In this file, comments which are not a proper part of the thesaurus
itself are contained within arrow brackets thus: <-- comment -->.
- Section headings, which are not an actual part of the thesaurus
proper, are included between percent (%) markers.
- Occasional references to numbers starting with "@" are the first
beginnings of a reorganized version, described below. A few comments are
also enclosed in curly brackets {}.
- Last edit 12-20-91.
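As a rough sketch of how these conventions might be used, the Python below strips the annotation markers so that only the thesaurus proper remains, and collects the "@" reference numbers. It assumes each marker is written exactly as described above (comments as <-- ... -->, headings between a pair of % signs on one line, notes in {}, references as "@" followed by digits) and that "%" and curly brackets are not used anywhere else in the text. The function names and the sample line are invented for illustration.

```python
import re

# Patterns for the annotation conventions listed above. The exact layout of
# each marker is an assumption based on the descriptions, not on the file.
COMMENT_RE = re.compile(r"<--.*?-->", re.DOTALL)  # <-- comment -->
HEADING_RE = re.compile(r"%[^%\n]*%")            # %section heading% on one line
CURLY_RE = re.compile(r"\{[^{}]*\}")             # {curly-bracket comment}
AT_REF_RE = re.compile(r"@(\d+)")                # "@" reference numbers


def strip_annotations(text: str) -> str:
    """Remove comments, section headings and curly-bracket comments,
    leaving only the thesaurus proper (the "@" numbers are left in place)."""
    for pattern in (COMMENT_RE, HEADING_RE, CURLY_RE):
        text = pattern.sub("", text)
    return text


def find_at_refs(text: str) -> list:
    """Return every number that follows an "@" sign, in order of appearance."""
    return AT_REF_RE.findall(text)


if __name__ == "__main__":
    # Invented sample text, not taken from the actual file.
    sample = "%CLASS I%\n#1. Existence. <-- p.  1 -->\nbeing @12 {a note}\n"
    print(repr(strip_annotations(sample)))
    print(find_at_refs(sample))
```

Comments are removed before headings so that a stray "%" inside a comment cannot pair with a heading delimiter.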
Differences between this and the 1911 thesaurus
The following additional differences between this version and the
original printed 1911 edition should be noted:
- the space-saving abbreviations in the original, using hyphens to
represent common words, prefixes or suffixes, have been expanded into the
full words or phrases.
- the side-by-side format for words and their opposites has been
abandoned. Entries are instead listed in order of their entry number.
- each of the 1,035 main entries has a pound sign "#" in front of its
number to facilitate computerized search.
- Greek words and phrases are transliterated and, in the etext,
enclosed in brackets in the format <gr/greek word/gr>.
In this HTML version, such phrases are marked up in italics.
- where italics occurred in the original, italics are used in the
Microsoft Word format file. In the plain ASCII file, this formatting is
lost.
- in the original book, words which were obsolete (in 1911) were
marked with a dagger. In this version, those words are marked with a
vertical bar ("|").
Some words which were still current in 1911 but are no longer found in
a current college-size dictionary (that is, words which are now
obsolete), or which are no longer used in the indicated sense, have been
marked with a bar followed by an exclamation point ("|!"). However, this
marking process has only just begun, and only a small portion of the
words which are now obsolete have been marked so far. Most, though not
all, of the foreign-language phrases are now obsolete.
The "obsolete" notation ([obs3] in the ASCII version, [obs] in this
HTML version) indicates that the preceding word (or some word in the
preceding phrase) is not recognized by the word processor's spelling
checker, and also either is NOT in a modern college-size dictionary or
is noted there as "ARCHAIC".
- the approximate location of the bottom of each page in the original
1911 printed book is indicated by a comment of the form: <-- p.  23 -->.
To search for a page, note that there are two spaces between the "p."
and the page number. (These comments were removed for the hypertext
edition.)
- This file contains only the main body of the thesaurus; neither the
outline nor the index is included. The outline, which gives an overview
of how the concepts are organized, is in a separate file,
"outline.doc", on the distribution disk.
Quality
This first edition of this supplemented 1911 thesaurus (June 1991) is
much less complete than the latest editions of commercial thesauri and
is probably not suitable for use as an adjunct to word-processing
programs, but it has no proprietary claims attached to it by MICRA, Inc.,
and does not contain any material published commercially after 1911.
Future (copyrighted) versions of this thesaurus are planned, which
will be reorganized hierarchically to take the fullest advantage of the
inheritance of semantic characteristics from higher categories. The
objective is to create a database of words organized by semantic
categories, suitable for use in natural-language understanding
programs. This is a very small-scale project that will not compete with
large academic or commercial efforts such as the CYC project; it is
intended instead to provide a convenient resource that individuals or
small groups can use for experimentation in natural-language processing.
Anyone currently engaged in or contemplating a similar thesaurus or
dictionary project who would be willing to collaborate on this one is
encouraged to contact us, so that unnecessary duplication of effort can
be avoided. We would also appreciate being notified of typos, errors, or
omissions in any version. Send inquiries or comments to:
Patrick Cassidy
MICRA Inc.
735 Belvidere Ave.
Plainfield, NJ 07062-2054
voice: (908) 668-5252
fax: (908) 668-5904
(If no one answers, please leave a message.)