Abstract of the Book

A Nontechnical Informal View of the Book

THE ECONOMICS OF INFORMATION

GNOSTIC METHODS FOR THE TREATMENT OF UNCERTAIN DATA

Quantitative information (``how many'', ``how much'', ``at what cost'', etc.)
is one of the necessary elements of cognition^{[3]}. Information of this nature results from
numeric data which depict (*quantify*) the quantitative features of
real objects and processes. Quantification is always perturbed by
uncertainties; therefore, methods that can suppress
the embedded uncertainty must be used to extract information from such data. The need
for information has been growing at an increasing rate, and both data and their processing
are expensive.

Given the classical definition of an *economic good* as a good which is scarce
relative to the total amount desired, and of *economic efficiency* as producing such goods
at the lowest possible cost, the idea of the Economics of Information
as the production of the maximum output of information at a given cost (of data and of their
treatment) is reasonable and leads to economic efficiency as well.

The first requirement of a methodology for treating uncertain data is that it must **extract
the maximum amount of information from a given collection of data.** Such an objective
can be achieved through the use of **gnostic methods.**

This notion of the economics of information can be considered only if the amount of information is measurable. Because the harvesting of information is directly related to decreasing uncertainty, it is necessary to have at hand, and to be able to use, a scientific model of uncertainty.

There are several concepts of uncertainty and of its *paradigm*^{[4]}, the most popular of
which is the statistical paradigm. However, the statistical paradigm---like other related
concepts---is tied to the uncertainty of mass events: a statistical evaluation of the quantity
of information is possible only for a large ``family'' of events, not for a single event or for
a small number of events. Moreover, in order to successfully undertake such a task, an a priori
model must be available, and when the model does not fit the reality of the data, significant damage
to the quality of the estimates can occur.

It is natural to think that each piece of data is ``seeded'' with its own quantity of
uncertainty.
It follows that the data's informative value should be evaluated datum by datum, so as
to make the most use of the contribution made by each element. A mathematical model of **individual**
data uncertainty is a principal distinctive feature of *mathematical gnostics*, also known as the
gnostic theory of uncertain data. As with all mathematical theories, gnostics is a
system of definitions and axioms; theorems and other results derived from the axioms are proved by
means of consistent mathematical methods. Mathematics lives in its own abstract environment where
everything is subjected to specific laws which are independent of the real world and its regularities.

But the modeling of real objects and processes is also one of the important tasks of mathematics. Hence, mathematical models must be based on assumptions (definitions, axioms) which correspond to the reality of the subject being considered. Limitations of this nature substantially cut down the freedom ordinarily enjoyed by mathematicians in developing ``pure'' mathematics. Serious problems begin with the notion of the data itself.

Data are numbers but their nature as mathematical objects differs markedly from abstractly
defined
numbers. In reality, data are the numerical images of quantities which exist objectively and as
such, they are the result of special technologies called *counting* (which produces integers)
or *measuring* (resulting in rational numbers). Taken together, these technologies, defined as
quantification, have strict rules which must be observed to ensure the consistency of the mapping
``*real quantity* <=> *number*.'' These rules, associated with the creation of
markets and trading, have evolved over thousands of years and resulted in the development of
measurement theory which demonstrates that real data form special mathematical structures. It is
one of the characteristics of gnostic theory that it respects this special nature of data by
creating a suitable model of *ideal quantification*, i.e., of quantification which is perfectly
precise.

Another important feature of real data is that they are more or less contaminated by
uncertainties.
To take this into account, gnostics models real data as couplets of the true and uncertain elements
of the observed data. The first gnostic axiom states that real data form a bi-dimensional structure,
the abstract ancestor of which is the mathematical structure called a **2-algebra**.

To establish a theory of uncertain data, it is necessary to accept one of several potential
interpretations of the nature of uncertainty. Over three centuries ago, Gottfried Wilhelm Leibniz
explained that the difficulty in generalizing from samples of data arises from nature's complexity,
not from its waywardness. This view of the world may also be taken to mean that data uncertainty
is not caused by ``randomness'' but by the omission of important factors which affect the data
values. Gnostics is based on the idea that the uncertainty manifested in data is caused by
**insufficient knowledge** of the process which produced the data. A pragmatic consequence of
this
point of view is obvious: insufficient knowledge can be cured by better information while
randomness is beyond control.

The uncertain components imbedded in data are thus the consequence of real and recognizable (but
not yet identified) factors. These also enter the quantification process and they could, at least
theoretically, also be quantified by using an ideal quantification technology. For this reason,
gnostics designs the model of real quantification as a **symmetric** couplet of ideal
quantification processes.

Uncertainty reveals itself as data error. To minimize uncertainty, it is necessary to measure its value. The magnitude of such an error is taken as the distance between the true (unknown) and the observed data value. To measure a distance, one needs a geometry. There are a great number of different geometries, not only the Euclidean one which is taught to every child at school. Which, then, of these many geometries should be used for measuring uncertainty? An important feature of gnostics is that the ``proper'' geometry is derived from the data themselves. This is the idea of ``letting the data speak for themselves.'' Note that the geometry used for measuring uncertainty in gnostics is ordinarily of a non-Euclidean type and applies non-linear formulae for measuring errors/distances.
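As a purely illustrative sketch (the particular non-Euclidean geometry that gnostics derives from the data is developed in the book, not reproduced here), the contrast between a linear Euclidean error and a non-linear error measure can be shown as follows; the bounded, tanh-shaped measure is an assumption chosen only to illustrate what a non-linear error formula looks like:

```python
import math

def euclidean_error(observed, true_value):
    """Euclidean geometry: the error is the plain (linear) difference."""
    return observed - true_value

def bounded_error(observed, true_value, scale=1.0):
    """A hypothetical non-Euclidean error measure, non-linear in the
    deviation and bounded, so that distant outliers cannot dominate.
    Illustrative only; not the geometry derived in the book."""
    return math.tanh((observed - true_value) / scale)

# For small deviations the two measures nearly agree;
# for large deviations the bounded measure saturates near 1.
print(euclidean_error(0.01, 0.0), bounded_error(0.01, 0.0))
print(euclidean_error(10.0, 0.0), bounded_error(10.0, 0.0))
```

The practical point is that the choice of geometry decides how strongly a large deviation counts, which is exactly why it matters for suppressing uncertainty.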

The changes in data values caused by uncertainty can be viewed as a virtual movement along a
path within a bi-dimensional space. The form of this quantification path uniquely results from the
first axiom, and it is proved to be endowed with an extremal feature: when the datum driven by the
uncertainty passes along the path, the effect of the uncertainty is larger than it would be along
any alternative path between the same end points. Such a feature is similar to certain laws of
physics and is called the ``variational principle.'' It can be thought of as a game between an
Observer and Nature. The Observer wishes to discover Nature's secret by applying quantification.
Nature defends itself by introducing uncertainty; her strategy is to maximize the damage to the
data value given the uncertainty `invested.' Nature's `move' is hostile, but the strategy is not
malicious: it is honest, fixed once and for all. This provides the opportunity for the Observer's
countermove: he wishes to return as closely as possible to the original unspoiled data value, to
*estimate* it. Having learned from Nature, he applies the opposite strategy: to return the
contaminated datum to its original state, he chooses an estimation path which leads to the opposite
extreme: to minimize the damage caused by uncertainty. The quantification and estimation paths
form the gnostic cycle which discloses a fact of fundamental significance: contamination of data
by uncertainty can be minimized (by using formulae which are based on the optimum gnostic cycle)
but **never completely removed**.

These variational features are proved for a single uncertain data value. Notions of the **entropy
and information of an individual datum** are introduced in a natural manner to measure the amount of
the datum's uncertainty. Both are also subject to the variational principle, which means that
application of the gnostic formulae **minimizes both the information loss and the increase in
entropy caused by the uncertainty.** This optimality, proved for individual data, is then extended
to data samples.

A gnostic analysis of the consequences of the first axiom reveals a surprising correspondence
between this virtual movement of data and the real movement of free particles within the framework
of relativistic mechanics. This correspondence is really far-reaching: it binds not only the
virtual kinematics of the uncertainty with the real kinematics of relativistic particles but also
ties the dynamic characteristics of individual moving particles such as energy and momentum to the
entropy and information of individual data. The algebraic statement of the first axiom of gnostics
also leads to these characteristics. Moreover, this correspondence between gnostics and
relativistic theory is invariant with respect to the class of (Lorentz) transformations of
coordinates^{[5]}.

The theoretical consequences of this correspondence are of fundamental importance because they
prompt the second axiom of gnostics, the **composition law** for uncertain data^{[6]}. This motivation is
strong because it is derived from the energy-momentum conservation law of relativistic physics.
This composition law for uncertain data is necessary to ensure that composed data are consistently
mapped onto the composed physical objects.

The correspondence between gnostics and relativistic mechanics might seem far-fetched, but only
at first sight. A correspondence has existed for centuries between classical (Newtonian) mechanics
and the main characteristics of uncertainty that are used in statistics: the arithmetic mean of
data corresponds to the center of mass of a system of (Newtonian) particles, variance of data
corresponds to the kinetic energy of the particles and covariances correspond to particles'
momenta. Under these circumstances, the additive composition law used in statistics for both data
and the first and second statistical moments can be viewed as having been based on the composition
law of Newtonian mechanics. This correspondence did not come about by chance: to fit the assumed
(elliptical) orbits of the planets to the observations of their actual positions, it was
**natural** to minimize errors expressed in terms of the energy and momenta of the deviations of
the
theoretical orbits from the observed points. Let us be consistent and also accept the adjective
``natural'' for the gnostic axioms because they not only conform to the nature of data but also
lead to such unexpected consequences as entropy and information changes caused by uncertainty and
their strong correspondence with the (experimentally verified) relativistic mechanics.
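The classical correspondence described above can be checked directly in a few lines: the arithmetic mean is exactly the value that minimizes the sum of squared deviations (the moment-of-inertia analogue for unit-mass particles), and the variance is that minimal quadratic ``energy'' per data point. A minimal sketch:

```python
def arithmetic_mean(data):
    return sum(data) / len(data)

def sum_sq_dev(data, c):
    """Sum of squared deviations about c: the analogue of the moment
    of inertia of unit-mass particles about the point c."""
    return sum((x - c) ** 2 for x in data)

data = [1.0, 2.0, 4.0, 7.0]
m = arithmetic_mean(data)

# The mean minimizes the sum of squared deviations: any shift raises it,
# just as the moment of inertia is smallest about the centre of mass.
for shift in (-0.5, -0.1, 0.1, 0.5):
    assert sum_sq_dev(data, m) < sum_sq_dev(data, m + shift)

# The variance is this minimal quadratic ``energy'' per data point.
variance = sum_sq_dev(data, m) / len(data)
print(m, variance)   # 3.5 5.25
```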

These and other theoretical consequences of the two gnostic axioms lead to results which can be summarized in the following way:

- Using the formulae and procedures of mathematical gnostics maximizes the information quality of results.
- Probability and entropy as measures of uncertainty are **results** of the gnostic theory and are applicable even to individual data. This frees gnostic methods from the necessity of using a priori models of data.
- The variational features of the Ideal Gnostic Cycle show that a close correspondence exists between thermodynamics and information processes.
- A second-order partial differential equation is proved which describes the conversion of the information conveyed by an individual datum into its entropy (during quantification) and back into information (during estimation). The old and well-known idea of `Maxwell's demon' thus obtains a precise mathematical form.
- The inability of the estimation process to completely remove the uncertainty introduced during quantification can be interpreted as a parallel to the Second Law of Thermodynamics: a machine which generates information cannot exist, and the mining of information from uncertain data is an irreversible process.
- The Lorentz-invariant isomorphism of the quantification process with an operation of relativistic mechanics demonstrates the integrity of the idea that information is as much a dimension of natural processes as the space and time coordinates.
- The natural, inherent robustness of the gnostic characteristics of data uncertainty is proved: they provide a choice between robustness with respect to outlying or to inlying data (emphasis on either central or peripheral data).
- A close connection between the entropy and information content and the probability distribution of an **individual** datum is proved.
- It is demonstrated that in the case of very weak uncertainty (nearly precise data), the gnostic characteristics of uncertainty approach the statistical ones. In other words, the basic statistical formulae provide the same results as those obtained from gnostics in the case of ``good'' data; in the case of ``bad'' data, however, gnostic formulae should be used so as to minimize the uncertainty.
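The last point can be illustrated with a generic robust location estimate: an iteratively re-weighted mean with a bounded, tanh-shaped influence function (a standard robust-statistics device chosen here purely for illustration, not the gnostic formulae themselves). On nearly precise data it reproduces the arithmetic mean; one gross outlier separates the two:

```python
import math

def arithmetic_mean(data):
    return sum(data) / len(data)

def robust_mean(data, scale=1.0, iters=50):
    """Illustrative robust location estimate: an iteratively re-weighted
    mean with a bounded, tanh-shaped influence function.  A generic
    M-estimator sketch, not the book's gnostic formulae."""
    loc = arithmetic_mean(data)
    for _ in range(iters):
        weights = []
        for x in data:
            r = (x - loc) / scale
            # psi(r)/r with psi = tanh: weight -> 1 for tiny residuals,
            # -> 0 for gross outliers
            weights.append(math.tanh(r) / r if abs(r) > 1e-12 else 1.0)
        loc = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return loc

clean = [9.98, 10.01, 10.02, 9.99]   # nearly precise data
dirty = clean + [50.0]               # the same data plus one gross outlier

# ``Good'' data: robust and classical estimates essentially coincide.
print(arithmetic_mean(clean), robust_mean(clean))
# ``Bad'' data: the outlier drags the mean far more than the robust estimate.
print(arithmetic_mean(dirty), robust_mean(dirty))
```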

The application of these theoretical results, both to the individual datum and to data samples, is based on the composition axiom mentioned above, which enables the following gnostic methods to be derived:

- Robust estimation of scale and location parameters of data samples.
- Robust estimation of bounds for the data support.
- Robust and objective test of whether a datum is a ``true member'' of a data sample. The result of this test is unique, and dependent only on the data.
- Robust test for the homogeneity of a data sample.
- Robust estimation of covariances and correlations.
- Robust test for degrees of similarity between data samples.
- Several types of estimated probability distribution functions (d.f.) and densities for a data sample suitable for both uncensored and censored data:
  - A global d.f. which characterizes the overall behavior of a **homogeneous** data sample. This d.f. also manifests unique features of robustness, which can be chosen as either:
    - robustness with respect to the **outlying** data (strong disturbances), or
    - robustness with respect to the **inlying** data (inner ``noise'').
  - A local d.f. which is suitable for application to **inhomogeneous** data samples.
- Robust ``cross-section'' filtering of the data of a sample.
- Marginal (unidimensional) cluster analysis and decomposition of an inhomogeneous data sample into several homogeneous ones.
- Robust filtering and decision making for time series.
- Robust uni- and multivariable, linear and non-linear modeling. Models can be either explicit or implicit types, characterizing the interdependences between variables or between the probabilities of variables.
- Robust multidimensional cluster analysis and decomposition of multidimensional inhomogeneous data samples into homogeneous ones.
- Robust ordering of multidimensional objects which is not based on the subjective choice of a multidimensional criterion function.
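The robust modeling item in the list above can be sketched, again only generically, with iteratively re-weighted least squares using a bounded, redescending weight; the book's own gnostic estimators are developed in Parts I and II, and nothing here reproduces them:

```python
def robust_line_fit(xs, ys, iters=30, scale=1.0):
    """Fit y = a*x + b by iteratively re-weighted least squares.
    Residuals are down-weighted by a bounded Cauchy-type weight
    1/(1 + r**2), so a few gross outliers cannot dominate the fit.
    A generic robust-statistics sketch, not the book's gnostic models."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        ws = [1.0 / (1.0 + ((y - (a * x + b)) / scale) ** 2)
              for x, y in zip(xs, ys)]
        # closed-form weighted least squares for a straight line
        sw = sum(ws)
        sx = sum(w * x for w, x in zip(ws, xs))
        sy = sum(w * y for w, y in zip(ws, ys))
        sxx = sum(w * x * x for w, x in zip(ws, xs))
        sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
        det = sw * sxx - sx * sx
        a = (sw * sxy - sx * sy) / det
        b = (sxx * sy - sx * sxy) / det
    return a, b

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.0, 8.1, 30.0]   # roughly y = 2x, with one gross outlier

a, b = robust_line_fit(xs, ys)

# Ordinary least squares, for contrast, is dragged by the outlier.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
ols_a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))

print(a, b)      # robust slope stays near 2
print(ols_a)     # OLS slope is pulled to about 6
```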

The gnostic theory and the theoretical background of its methods are presented in Part I (The Gnostic Theory of Individual Uncertain Data) and in Part II (The Gnostic Theory of Data Samples) of the book.

Part III provides examples of the application of gnostic methods to real data taken from financial statement analysis and from equity and foreign currency markets. The study includes the cross-sectional financial analysis of an industry, the analysis and prediction of a multivariable series of economic indicators and indexes, estimation of the actual (inner) price of shares, decision making as applied to time series etc.

The application fields of gnostic methods are as broad as those of statistics. Indeed, worthwhile results can be obtained by using this methodology to analyze production quality assessment, automatic control systems working under heavy-duty conditions, issues in medicine, biology, and psychology, as well as military monitoring systems. The theoretical and software parts of the book have universal applicability; however, the focus of the illustrations is on economic problems, since these data are, on the one hand, among the least reliable as well as expensive to acquire, and, on the other hand, decisions based on their interpretation have serious financial consequences. Examples of the power of the gnostic methodology when applied to economic problems include (but are not limited to) advanced financial statement analysis, which in combination with the robust multidimensional models allows, among other tasks:

- a really `true and fair' estimate of the financial position of firms to be obtained,
- the under- or overestimation of a market stock price to be recognized and forecast a quarter ahead with a probability of success of over 0.7,
- multidimensional ordering of financial positions of a group of firms (objective---mathematical---rating of firms),
- on-line multidimensional monitoring of a firm's financial dynamics to derive timely signals of dangerous situations,
- financial managers to obtain reliable recommendations as to proper intervention to control the firm's finances,
- market analyses for optimum decision making by a firm's senior management,
- clusters of economically comparable firms for both intra- and interindustrial comparisons to be identified and to use the cluster's models for efficient decision making,
- the quality of production or of services to be assessed so as to both robustly and sensitively identify problems and also to efficiently recommend ways to solve them.

A characteristic feature of gnostics is its pragmatism: the theory has been used to create algorithms for solving important problems in various application fields, examples of which are illustrated in Part III.

**Footnotes**

[1] | Former senior scientist of the Institute of Information Theory and Automation of the Czech Academy of Sciences, Prague |

[2] | Visiting Scholar, The George Washington University, School of Business and Public Management |

[3] | Cognition is used in the sense of the process used to obtain knowledge. |

[4] | The notion of ``paradigm'' is understood to mean the prevailing opinion of the scientific community as to the arrangement and operation of things within a certain scientific field. |

[5] | Mathematical manipulations to translate coordinates in one frame of reference into coordinates in a second frame, conceived in the 19th century by the Dutch physicist Hendrik A. Lorentz to explain optical and electromagnetic phenomena. In our case, the frames are respectively that of the unknown ideal value and that of the observed value. The Lorentz transformation rotates the radius vector pointing to the image of the datum, but only during the quantification process. For estimation, the transformation is a Euclidean rotation. |

[6] | A composition law describes how elements belonging to a mathematical structure are to be combined (generally by addition and multiplication of elements or of their functions). |
