Abstract of the Book

A Nontechnical Informal View of the Book

by Pavel Kovanic[1] and Marcel B. Humber[2]




Quantitative information (``how many'', ``how much'', ``at what cost'', etc.) is one of the necessary elements of cognition[3]. Information of this nature comes from numeric data, which depict (quantify) the quantitative features of real objects and processes. Quantification is always perturbed by uncertainty, so extracting information from such data requires methods that can suppress the embedded uncertainty. The need for information is growing at an increasing rate, and both data and their processing are expensive.

The classical definition treats an economic good as a good that is scarce relative to the total amount desired, and economic efficiency as producing such goods at the lowest possible cost. By analogy, the idea of an Economics of Information, understood as producing the maximum output of information for a given cost (of data and of their treatment), is reasonable and likewise leads to economic efficiency.

The first requirement of a methodology for treating uncertain data is that it must extract the maximum amount of information from a given collection of data. Such an objective can be achieved through the use of gnostic methods.

This notion of economics of information can be considered only if the amount of information is measurable. Because the harvesting of information is directly related to decreasing uncertainty, it is necessary to have at hand, and to be able to use, a scientific model of uncertainty.

There are several concepts of uncertainty and of its paradigm[4], the most popular of which is the statistical paradigm. However, the statistical paradigm, like other related concepts, is tied to the uncertainty of mass events: statistical evaluation of a quantity of information is possible only for a large ``family'' of events, not for a single event or for a small number of events. Moreover, such a task can succeed only if an a priori model is available, and when the model does not fit the reality of the data, the quality of the estimates can be significantly damaged.
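
The sensitivity of model-based estimates to data that violate the model can be illustrated with a minimal sketch. It is not taken from the book: the sample values are invented for illustration, and the median serves only as a familiar stand-in for a robust estimator, not as a gnostic method.

```python
import statistics

def location_estimates(sample):
    """Return (arithmetic mean, median) of a sample of numbers."""
    return statistics.fmean(sample), statistics.median(sample)

# Hypothetical measurements of a quantity whose true value is near 10.
clean = [9.8, 10.1, 10.0, 9.9, 10.2]

# The same sample with the last value replaced by one gross error.
contaminated = clean[:-1] + [100.0]

mean_clean, median_clean = location_estimates(clean)
mean_bad, median_bad = location_estimates(contaminated)

print(f"clean:        mean={mean_clean:.2f}  median={median_clean:.2f}")
print(f"contaminated: mean={mean_bad:.2f}  median={median_bad:.2f}")
```

The mean, which is the natural estimate when the errors are assumed to be Gaussian, is pulled far from 10 by a single datum that does not fit that assumption, while the median is unaffected in this example.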

It is natural to think that each piece of data is ``seeded'' with its own quantity of uncertainty. It follows that the data's informative value should be evaluated datum by datum so as to make the best use of the contribution made by each element. A mathematical model of individual data uncertainty is a principal distinctive feature of mathematical gnostics, also known as the gnostic theory of uncertain data. As with all mathematical theories, gnostics is a system of definitions and axioms; theorems and other results derived from the axioms are proved by means of consistent mathematical methods. Mathematics lives in its own abstract environment where everything is subjected to specific laws which are independent of the real world and its regularities.

But the modeling of real objects and processes is also one of the important tasks of mathematics. Hence, mathematical models must be based on assumptions (definitions, axioms) which correspond to the reality of the subject being considered. Limitations of this nature substantially cut down the freedom ordinarily enjoyed by mathematicians in developing ``pure'' mathematics. Serious problems begin with the notion of the data itself.

Data are numbers, but their nature as mathematical objects differs markedly from that of abstractly defined numbers. In reality, data are the numerical images of quantities which exist objectively and, as such, they are the result of special technologies called counting (which produces integers) or measuring (resulting in rational numbers). Taken together, these technologies, defined as quantification, have strict rules which must be observed to ensure the consistency of the mapping ``real quantity <=> number.'' These rules, associated with the creation of markets and trading, have evolved over thousands of years and resulted in the development of measurement theory, which demonstrates that real data form special mathematical structures. It is one of the characteristics of gnostic theory that it respects this special nature of data by creating a suitable model of ideal quantification, i.e., of quantification which is perfectly precise.

Another important feature of real data is that they are more or less contaminated by uncertainties. To take this into account, gnostics models real data as couplets of the true and uncertain elements of observed data. The first gnostic axiom states that real data form a bi-dimensional structure the abstract ancestor of which is called the 2-algebra in mathematics.

To establish a theory of uncertain data, it is necessary to accept one of several potential interpretations of the nature of uncertainty. Over three centuries ago, Gottfried Wilhelm Leibniz explained that the difficulty in generalizing from samples of data arises from nature's complexity, not from its waywardness. This view of the world may also be taken to mean that data uncertainty is not caused by ``randomness'' but by the omission of important factors which affect the data values. Gnostics is based on the idea that the uncertainty manifested in data is caused by insufficient knowledge of the process which produced the data. A pragmatic consequence of this point of view is obvious: insufficient knowledge can be cured by better information, while randomness is beyond control.

The uncertain components embedded in data are thus the consequence of real and recognizable (but not yet identified) factors. These also enter the quantification process and they could, at least theoretically, also be quantified by using an ideal quantification technology. For this reason, gnostics designs the model of real quantification as a symmetric couplet of ideal quantification processes.

Uncertainty reveals itself as data error. To minimize uncertainty, it is necessary to measure its value. The magnitude of such error is taken as the distance between the true (unknown) and the observed data value. To measure a distance, one needs a geometry. There are a great number of different geometries, not only the Euclidean one which is taught to every child at school. Which, then, of these many geometries should be used for measuring uncertainty? An important feature of gnostics is that the data themselves determine the ``proper'' geometry. This is the idea of ``letting the data speak for themselves.'' Note that the geometry used for measuring uncertainty in gnostics is ordinarily of a non-Euclidean type and applies non-linear formulae for measuring errors/distances.

The changes in data values caused by uncertainty can be viewed as a virtual movement along a path within a bi-dimensional space. The form of this quantification path follows uniquely from the first axiom, and it is proved to have an extremal property: when the datum driven by the uncertainty passes along the path, the effect of the uncertainty is larger than it would be along any alternative path between the same end points. Such a feature is similar to certain laws of physics and is called the ``variational principle.'' It can be thought of as a game between an Observer and Nature. The Observer wishes to discover Nature's secret by applying quantification. Nature defends herself by introducing uncertainty; her strategy is to maximize the damage to the data value given the uncertainty `invested.' Nature's `move' is hostile but the strategy is not malicious: it is honest and fixed once and for all. This provides the opportunity for the Observer's countermove: he wishes to return as closely as possible to the original unspoiled data value, that is, to estimate it. Having learned from Nature, he applies the opposite strategy: to return the contaminated datum to its original state, he chooses an estimation path which leads to the opposite extreme and minimizes the damage caused by uncertainty. The quantification and estimation paths form the gnostic cycle, which discloses a fact of fundamental significance: contamination of data by uncertainty can be minimized (by using formulae which are based on the optimum gnostic cycle) but never completely removed.

These variational features are proved for a single uncertain data value. Notions of entropy and information of an individual datum are introduced in a natural manner to measure the amount of the datum's uncertainty. Both are also subject to the variational principle, which means that application of the gnostic formulae minimizes both the information loss and the increase in entropy caused by the uncertainty. This optimality, proved for individual data, is then extended to data samples.

A gnostic analysis of the consequences of the first axiom reveals a surprising correspondence between this virtual movement of data and the real movement of free particles within the framework of relativistic mechanics. This correspondence is far-reaching: it binds not only the virtual kinematics of the uncertainty with the real kinematics of relativistic particles but also ties the dynamic characteristics of individual moving particles, such as energy and momentum, to the entropy and information of individual data. The algebraic statement of the first axiom of gnostics also leads to these characteristics. Moreover, this correspondence gnostics <-> relativistic theory is invariant with respect to the class of (Lorentz) transformations of coordinates[5].
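
Footnote [5] contrasts the Lorentz transformation with a Euclidean rotation. The following sketch illustrates only the generic geometric difference between the two in a plane; it does not reproduce the gnostic formulae, and the function names and test coordinates are assumptions made for this illustration. A Lorentz-type (hyperbolic) rotation preserves the quantity x^2 - y^2, whereas a Euclidean rotation preserves x^2 + y^2.

```python
import math

def hyperbolic_rotation(x, y, phi):
    """Lorentz-type rotation of the point (x, y) by the hyperbolic angle phi."""
    return (x * math.cosh(phi) + y * math.sinh(phi),
            x * math.sinh(phi) + y * math.cosh(phi))

def euclidean_rotation(x, y, phi):
    """Ordinary rotation of the point (x, y) by the angle phi."""
    return (x * math.cos(phi) - y * math.sin(phi),
            x * math.sin(phi) + y * math.cos(phi))

def minkowski_interval(x, y):
    """Quantity left unchanged by a hyperbolic rotation."""
    return x * x - y * y

def euclidean_norm_squared(x, y):
    """Quantity left unchanged by a Euclidean rotation."""
    return x * x + y * y

x, y, phi = 2.0, 1.0, 0.7  # arbitrary illustrative values

hx, hy = hyperbolic_rotation(x, y, phi)
ex, ey = euclidean_rotation(x, y, phi)

print(minkowski_interval(x, y), minkowski_interval(hx, hy))
print(euclidean_norm_squared(x, y), euclidean_norm_squared(ex, ey))
```

Up to rounding, each printed pair shows the same value before and after the corresponding rotation, which is what is meant by a quantity being invariant under a class of transformations.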

The theoretical consequences of this correspondence are of fundamental importance because they prompt the second axiom of gnostics, the composition law for uncertain data[6]. This motivation is strong because it is derived from the energy-momentum conservation law of relativistic physics. This composition for uncertain data is necessary to ensure that composed data are consistently mapped onto the composed physical objects.

The correspondence between gnostics and relativistic mechanics might seem far-fetched, but only at first sight. A correspondence has existed for centuries between classical (Newtonian) mechanics and the main characteristics of uncertainty that are used in statistics: the arithmetic mean of data corresponds to the center of mass of a system of (Newtonian) particles, variance of data corresponds to the kinetic energy of the particles, and covariances correspond to particles' momenta. Under these circumstances, the additive composition law used in statistics for both data and the first and second statistical moments can be viewed as having been based on the composition law of Newtonian mechanics. This correspondence did not come about by chance: to fit the assumed (elliptical) orbits of the planets to the observations of their actual positions, it was natural to minimize errors expressed in terms of the energy and momenta of the deviations of the theoretical orbits from the observed points. Let us be consistent and also accept the adjective ``natural'' for the gnostic axioms, because they not only conform to the nature of data but also lead to such unexpected consequences as entropy and information changes caused by uncertainty and their strong correspondence with the (experimentally verified) relativistic mechanics.

These and other theoretical consequences of the two gnostic axioms lead to results which can be summarized in the following way:

The application of these theoretical results to both the individual datum and data samples is based on the composition axiom mentioned above, which enables the following gnostic methods to be derived:

  1. Robust estimation of scale and location parameters of data samples.
  2. Robust estimation of bounds for the data support.
  3. Robust and objective test of whether a datum is a ``true member'' of a data sample. The result of this test is unique, and dependent only on the data.
  4. Robust test for the homogeneity of a data sample.
  5. Robust estimation of covariances and correlations.
  6. Robust test for degrees of similarity between data samples.
  7. Several types of estimated probability distribution functions (d.f.) and densities for a data sample suitable for both uncensored and censored data:
    1. A global d.f. which characterizes the overall behavior of a homogeneous data sample. This d.f. also manifests unique features of robustness which can be chosen:
      1. robustness with respect to the outlying data (strong disturbances),
      2. robustness with respect to the inlying data (inner ``noise'').
    2. A local d.f. which is suitable for application to inhomogeneous data samples.
  8. Robust ``cross-section'' filtering of data of a sample.
  9. Marginal (unidimensional) cluster analysis and decomposition of an inhomogeneous data sample into several homogeneous ones.
  10. Robust filtering and decision making for time series.
  11. Robust uni- and multivariable, linear and non-linear modeling. Models can be of either explicit or implicit type, characterizing the interdependences between variables or between the probabilities of variables.
  12. Robust multidimensional cluster analysis and decomposition of multidimensional inhomogeneous data samples into homogeneous ones.
  13. Robust ordering of multidimensional objects which is not based on the subjective choice of a multidimensional criterion function.

The gnostic theory and the theoretical background of its methods are presented in Part I (The Gnostic Theory of Individual Uncertain Data) and in Part II (The Gnostic Theory of Data Samples) of the book.

Part III provides examples of the application of gnostic methods to real data taken from financial statement analysis and from equity and foreign currency markets. The study includes the cross-sectional financial analysis of an industry, the analysis and prediction of a multivariable series of economic indicators and indexes, estimation of the actual (inner) price of shares, decision making as applied to time series etc.

The application fields of gnostic methods are as broad as those of statistics. Indeed, worthwhile results can be obtained by using this methodology in production quality assessment, automatic control systems working under heavy-duty conditions, medicine, biology, psychology, and military monitoring systems. The theoretical and software parts of the book are universally applicable; however, the illustrations focus on economic problems, since these data are, on the one hand, the least reliable as well as expensive to acquire and, on the other hand, decisions based on their interpretation have serious financial consequences. Examples of the power of the gnostic methodology when it is applied to economic problems include (but are not limited to) advanced financial statement analysis which, in combination with the robust multidimensional models, allows among other tasks:

A characteristic feature of gnostics is its pragmatism: the theory has been used to create algorithms for solving important problems in various application fields, examples of which are illustrated in Part III.


[1] Former senior scientist of the Institute of Information Theory and Automation of the Czech Academy of Sciences, Prague
[2] Visiting Scholar, The George Washington University, School of Business and Public Management
[3] Cognition is used in the sense of the process used to obtain knowledge.
[4] The notion of ``paradigm'' is understood to mean the prevailing opinion of the scientific community as to the arrangement and operation of things within a certain scientific field.
[5] Mathematical manipulations that translate coordinates in one frame of reference to coordinates in a second frame, conceived in the 19th century by the Dutch physicist Hendrik A. Lorentz to explain optical and electromagnetic phenomena. In our case, the frames are respectively that of the unknown ideal value and of the observed value. The Lorentz transformation rotates the radius vector pointing to the image of the datum, but only during the quantification process. For estimation, the transformation is a Euclidean rotation.
[6] A composition law describes how elements belonging to a mathematical structure are to be combined (generally by addition and multiplication of elements or of their functions).

Friday, 09-Sep-2005 10:19:28 CEST