The article is published under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).

Historical informatics

Cite this article as:

Thaller M. Modeling of History: Today and in the Future // Historical Informatics (Историческая информатика). 2017. No. 3. P. 7-19. DOI: 10.7256/2585-7797.2017.3.24731 URL: https://nbpublish.com/library_read_article.php?id=24731

Modeling of History: Today and in the Future

Thaller Manfred

Doctor of History

Professor, Department of Historical and Cultural Information Processing, University of Cologne

2, Albertus-Magnus-Platz, Köln, Nordrhein-Westfalen, Germany, D-50931

manfred.thaller50@mail.ru

DOI:

10.7256/2585-7797.2017.3.24731

Date submitted to the editors:

10-11-2017

Date of publication:

17-11-2017

Abstract: The article interprets the term “modeling”, which has a long tradition in the development of computer applications in historical research. For instance, the international conference of the Association for History and Computing held in Moscow in 1996 chose modeling as its main theme. The author notes the special role of Willard McCarty, who shaped the understanding that modeling is central to all attempts to use information technology in the humanities as a whole. The author avoids the generality of “digital humanities” and restricts the discussion to the application of information technology to historical studies of an analytical character. The article addresses the principal approaches to modeling in history, including methodological aspects of modeling, models as implementations of computational algorithms, models as computational devices, models of text, models of meaning and models for computer-supported historical research. All time-tested approaches to modeling in history are discussed together for the first time. The author notes that the term “modeling” is now very prominent but still not very clear, and that McCarty’s original concept may still be the clearest definition of modeling as a prerequisite for applying computational methods in the humanities.

Keywords:

models of meaning, semantic technologies, textual content, digital humanities, quantification, markup, simulation, computing, information technologies, modeling

“Modelling” is a term which has a long tradition in the development of computer applications in historical studies. Leaving aside the appearance of the term in individual papers, one of the first volumes produced by the “workshops” of the international Association for History and Computing was dedicated to it ^[6], and the international conference of the Association in Moscow in 1996 had “modelling” as its conference theme ^[2].

In the wider interdisciplinary domain of applications of information technology to the Humanities, the term first appears prominently in the very visible Companion to Digital Humanities of 2004, in the chapter written by Willard McCarty ^{[10, pp. 254 – 270]}; the same author established the importance of the term in the following year with his highly influential Humanities Computing ^[11]. Indeed, he has since created the implicit and explicit understanding that “modelling” is at the heart of all attempts to employ information technology at any but the most trivial levels in the Humanities in general. As is frequently the case with things on which an implicit consensus has been established, this has made the term modelling very prominent, but not necessarily very clear – despite the ubiquity of the term in recent literature, the original concept of McCarty may still be the clearest definition of modelling as a prerequisite for computational methods in the Humanities.

In the following we would like to avoid the generality of the “Digital Humanities”, which is probably too vague as a term of reference, and restrict ourselves to the application of information technology to historical studies – and restrict ourselves even further by considering only those applications which claim analytic implications. For this domain we would like to differentiate between the understandings of the term “model” that have been used throughout the development of history and computing during the last few decades.

I. The epistemic ubiquity of models

The type of historical research which has always been most suspect to traditional historians has doubtless been Cliometrics, the application of methods derived from the canon of the economic sciences to the past. It is inseparably connected to Robert William Fogel, winner of the 1993 Nobel Prize in Economic Sciences. As one of the most visible protagonists of Cliometrics, he engaged in the eighties in a discussion with Geoffrey R. Elton, one of the most outspoken critics of all attempts to open historical research to interdisciplinary approaches, especially approaches involving quantitative methods. This resulted in a book confronting their methodological viewpoints, which contains the following quote from Elton, attacking Fogel:

Models do dictate the terms of reference, define the parameters, direct the research, and thus are very liable to pervert the search for empirical evidence by making it selective. ... One would feel happier if those models were derived from a study of the evidence and not borrowed from supposedly scientific work in the social sciences – if, that is, historical method were allowed to control the borrowing. ^[3]

The interesting thing in this quotation is not the unsurprising discovery that Elton dislikes economic models, but that he is indeed willing to accept the need for models in principle, albeit only those he considers constructed according to his understanding of historical methodology.

We do not have the space to follow this decidedly non-quantitative and non-formalized understanding of modelling throughout the methodological literature of historical research – though we cannot avoid pointing to Max Weber's ideal type (Idealtypus), boldly claiming for history a sociologist who wrote his doctoral thesis on The History of Commercial Partnerships in the Middle Ages and his Habilitation on Roman Agrarian History and its Significance for Public and Private Law.

We feel encouraged to skip proving the statement that modelling, in one form or another, is deeply embedded in historical analysis. Linguists who are not bogged down in syntax but focus on semantics have claimed that all our thinking is enabled by the capability to understand metaphors, which are the most lightweight type of model ^[8]; and cognitive science holds that all our cognitive abilities rest on the fundamental capability to connect separate conceptual spaces, making sense of one by interpreting it in the light of the other ^[4].

As soon as we dig into such broad fields as cognitive science, we are of course relatively far away from the practical needs of quantitative or any other kind of formal analysis. To move back to these: one of the earliest pioneers of computer applications in archaeology, Jean-Claude Gardin, describes the impact of the requirements that any kind of computer application imposes on the Humanities as follows:

The reproduction of certain types of reasoning on the computer imposes a preliminary analysis of mental processes in terms and at a level of precision which is rarely encountered in the Humanities. It often results in cruel discoveries as to the credibility of theories or ‘constructions’ which are the products of such reasoning ... ^[5]

And some of his later work could be summarized as claiming that the point of computer applications in archaeology is not so much the resulting analysis, but the more precise formulation of the categories on which this analysis rests [my summary of the arguments in Jean-Claude Gardin: Le Calcul et la Raison, Paris, 1991].

As a first intermediate summary:

(1) There is good reason to assume that we cannot meaningfully understand reality, past or present, if we do not have some conceptual notion of how individual phenomena probably interact: a conceptual model.

(2) Any attempt to apply computer methods to help in that process requires a precision which goes beyond that of the kind of model we permanently apply unconsciously.

II. Models as computational trivia

What all courses in statistics and programming have in common is that the notion of a “variable” turns up sometime during the very first lecture or chapter. Defining your variables can easily be seen as the acquisition of the additional precision required from a model fit for computational purposes, beyond the conceptual one, as diagnosed in the previous section.

Indeed, most historians (or, indeed, humanists) who apply a statistical procedure or a computational technique for the first time become so intrigued by the requirement to define their variables that the resulting set is quite frequently given a prominent place in conference papers by researchers new to the field. For the eighties and nineties it is almost impossible to open the proceedings of a conference without encountering the schema of the database employed by a project or the variables used for it. Similarly, from the nineties onwards, there are very few proceedings in which one does not find examples of the markup schemes used in a project. We are not changing the subject here: the decision to mark up a specific property – a topographical name, for instance – is exactly the same as the decision to define a variable in a statistical data set for the purpose of examining the geographical dimension of a historical phenomenon.

And many of the authors of both, database tables in the eighties and markup schemes more recently, will claim that the set of tables for their database or the markup scheme of their collection of texts represents the “model” they use in their study. This is, of course, a misunderstanding. When we look at phenomena of social history, the “model” we try to implement by the variable “occupation” is not the set of terms allowed in a controlled vocabulary, but the abstract dimension for which we consider an occupation to be an indicator. The “model” which leads to the definition of a variable “occupation” is represented by the researcher's decision between a concept of society as governed by strata, or by classes, or by the abstract categories of another theory of societal interactions. Whether the variable used for that purpose is a field of twenty characters or a code number is a mainly technical decision; this does not constitute the increase of precision required by the application of computational technologies. Similarly, the question whether you encode two different characteristics of a portion of a text by two different XML tags or by two attributes of the same tag is independent of the reason why you want to indicate the presence of the textual property represented by these two characteristics in the first place.

Well … The decision to encode an occupation by a numerical code rather than a character string may of course indicate whether you assume, at the beginning of the study, that you already know which categories you will encounter when you examine a historical source, or whether you decide to postpone the assignment of a term to an abstract category to a later stage of your analysis, when you know a bit more about the terms which actually occur. The decision to encode a textual property by an attribute of a general tag is a decision for a solution which makes the introduction of additional characteristics easier; the decision to use different tags effectively represents a claim that you know, before you start, all the relevant characteristics which will appear.
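The two markup decisions just described can be made concrete in a minimal sketch. The element and attribute names (`place`, `occupation`, `kin`, `seg`, `type`) and the sample sentences are invented for illustration and are not taken from any encoding standard or project; only Python's standard `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Variant A: one dedicated tag per characteristic. The set of tags is
# closed: it claims that all relevant characteristics are known in advance.
variant_a = "<p>born in <place>Köln</place>, later a <occupation>cooper</occupation></p>"

# Variant B: one general tag; the characteristic is an attribute value.
# The set of values is open: new characteristics need no new tag.
variant_b = ('<p>born in <seg type="place">Köln</seg>, '
             'later a <seg type="occupation">cooper</seg></p>')


def properties_a(xml_text, known_tags=("place", "occupation")):
    """Collect (characteristic, text) pairs for the tags known in advance."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.text) for el in root.iter() if el.tag in known_tags]


def properties_b(xml_text):
    """Collect (characteristic, text) pairs from the general tag."""
    root = ET.fromstring(xml_text)
    return [(el.get("type"), el.text) for el in root.iter("seg")]


# A characteristic nobody anticipated ("kin") turns up in the source.
variant_a_extended = "<p><place>Köln</place>, <kin>Hans</kin></p>"
variant_b_extended = '<p><seg type="place">Köln</seg>, <seg type="kin">Hans</seg></p>'

print(properties_a(variant_a_extended))  # the unanticipated tag is not picked up
print(properties_b(variant_b_extended))  # the new attribute value is picked up
```

Both variants carry the same information for the anticipated characteristics; they differ only in how they react to an unanticipated one, which is the conceptual assumption hidden in the technical choice.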

Or, to summarize:

(3) Schemes of variables and markup implement a conceptual model; they are not themselves a model.

(4) Technical details of the definition of a variable or a markup scheme nevertheless depend on conceptual assumptions.
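Points (3) and (4) can be illustrated by a small sketch. The records, the occupational terms and both category schemes below are invented for illustration only; they stand for two competing conceptual models (strata versus classes) applied to the same variable.

```python
from collections import Counter

# Hypothetical source entries: the occupation is stored as the term found
# in the source, so the assignment to a category is postponed.
records = [
    {"name": "Anna", "occupation": "weaver"},
    {"name": "Jakob", "occupation": "day labourer"},
    {"name": "Maria", "occupation": "merchant"},
    {"name": "Peter", "occupation": "weaver"},
]

# The conceptual model lives in the mapping, not in the field type.
# Both mappings are assumptions made up for this example.
STRATA = {
    "weaver": "artisans",
    "day labourer": "labouring poor",
    "merchant": "urban elite",
}
CLASSES = {
    "weaver": "petty producers",
    "day labourer": "wage labour",
    "merchant": "owners of capital",
}


def categorise(records, scheme):
    """Count records per abstract category under a given conceptual scheme."""
    return Counter(scheme.get(r["occupation"], "unclassified") for r in records)


print(categorise(records, STRATA))
print(categorise(records, CLASSES))
```

Had the occupation been stored as a code number for a stratum from the start, the second scheme could no longer be applied to the same data: the technical decision would have fixed the conceptual one.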

III. Models as computational devices

A set of variables is no model, but it may implement one. The consistency of this implementation is reflected by the possibilities a model opens up.

In social demography / history of the family, for example, you can usually at least describe a phenomenon, such as the influence of a position in the societal system upon the age at marriage. You may be able to test hypotheses about this influence if the derivation of age from the sources is sufficiently consistent for you to be sure that age differences are not only statistically significant, but also exceed the numerical fuzziness created by the habit of rounding ages, found in demographically relevant sources right up to the end of the 19th century. Test these hypotheses, that is, by the usual statistical methods based on probability theory and the notion of the significance of a result derived from it.

There is of course a long tradition of tests going well beyond that: already in 1978 Kenneth Wachter, Eugene Hammel and Peter Laslett published the results of a microsimulation, in which they simulated the demographic developments in historical communities and compared the predicted frequency of family types against the empirically observed occurrence of these types ^[17]. The difference in scope between the basic testing of isolated hypotheses and the testing of a complete model by a simulation can scarcely be overestimated. Nevertheless, one has to observe that the number of examples of such studies is quite small. And those that exist have made little impact: while The World We Have Lost ^[9] justified the existence of the Cambridge group as a centre of family history and demography for decades, the simulation study we mentioned was not even noticed much in the family history community ^[15].

This is somewhat frustrating from a computer science point of view, as only in a simulation does a “model”, as a consistent test of assumptions about the interaction of the set of variables representing each observation, acquire sufficient computational substance to observe the dynamics of a development. Data models which allow one to study a snapshot of a historical development are necessary to do anything with information technology, but they just model a static view, or a series of static views, not a dynamic development or process.

One of the reasons that the microsimulations of 1978 never received the visibility of the presentation of snapshots of a changing system has of course been that understanding them required the willingness to engage in a rather challenging methodological discussion of quantitative results. It is interesting that more recently multimedia simulations which test intuitive assumptions have enjoyed much greater visibility. The best known example of this is still the Virtual St. Paul's Cathedral Project [ https://vpcp.chass.ncsu.edu/ - accessible September 12th 2017 ], which uses a combination of a visual “model” of the (pre-1666) St. Paul's Cathedral and an acoustic model of the effects of its geometry upon a sermon preached in the context of environmental noise, to recreate a soundscape of a historically significant event.
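The difference between a static data model and a simulated process can be sketched in a deliberately tiny example. It is not the procedure of the 1978 study: the co-residence rule, the function name `simulate_extended_share`, all probabilities and the "observed" value are assumptions invented for illustration.

```python
import random


def simulate_extended_share(n_households, years, p_parent_at_start,
                            annual_death_prob, seed=0):
    """Toy microsimulation: share of households with a co-resident parent.

    Assumed rule (invented): a household is 'extended' while a parent who
    lived with the couple at the start is still alive; each year every such
    parent dies with the same probability.
    """
    rng = random.Random(seed)
    parent_present = [rng.random() < p_parent_at_start
                      for _ in range(n_households)]
    share_per_year = []
    for _ in range(years):
        # A parent stays present only if already present and surviving the year.
        parent_present = [p and rng.random() >= annual_death_prob
                          for p in parent_present]
        share_per_year.append(sum(parent_present) / n_households)
    return share_per_year


predicted = simulate_extended_share(5_000, years=20, p_parent_at_start=0.6,
                                    annual_death_prob=0.08)

# Invented "empirical" share of extended households in a census taken
# ten years after the starting point.
observed_share_year_10 = 0.15
gap = predicted[9] - observed_share_year_10
print(f"predicted {predicted[9]:.3f}, observed {observed_share_year_10:.3f}, gap {gap:+.3f}")
```

The static data model would record only the census snapshot; the simulation states the assumed process explicitly, so that a large gap between prediction and observation counts against the assumptions themselves.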

The concept of a “model” is more complicated here than it looks at first, specifically because two quite different models are combined. On the one hand, we have a “model” as a set of assumptions about the acoustic results of an environment with, among others: echo effects upon the voice of a speaker; the distribution of noise in such an environment created by large groups of people listening, but not being completely quiet; the effect of other environmental sources of sound. This is a strictly dynamic model, which implements assumptions about the interaction of variables depicting a process. Here a verification of previous assumptions is at least partially possible. If, according to our knowledge of acoustics, speakers cannot be heard by their supposed audience, the reasons for their influence upon such an audience must be different from the rhetorical brilliance ascribed to them.

On the other hand, there is a “model” in the project which derives a 3D projection of the geometry of a building. This “model”, however, is in no way the model of a process, but simply a geometrical drawing covered by various 2D textures. Unlike the acoustic model, it does not generate a result from a set of assumptions about the way in which the object has been created, but simply displays a graphic. 3D models which could compare the results of a simulation of a building process, reflecting contemporary technology or assumptions of architectural intent, are still far off. So the fact that you can show an elaborate 3D model of a vanished building is no proof that contemporary building techniques were able to build it.

All of which we have mentioned because:

(5) Models may simply be understood as a framework for the process by which a part of reality is depicted in the digital sphere: a 3D image on the screen just renders a (possibly only hypothetical) geometrical description.

(6) Closely related models may, however, also be seen as the basis of a dynamic process, which progresses from a well understood starting condition and delivers a prediction of a result, the effect of which can be compared with the assumptions of previous interpretations.
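The kind of prediction meant in (6) can be sketched with the audibility argument made above for the acoustic model. The sketch assumes free-field spherical spreading (the inverse-square law, roughly 6 dB loss per doubling of distance) and ignores echo and reverberation entirely, so it is far cruder than anything the project mentioned above does. The function names and all decibel values are invented for illustration.

```python
import math


def level_at_distance(level_ref_db, r_ref_m, r_m):
    """Sound level at distance r_m, given the level at r_ref_m.

    Assumes free-field spherical spreading: L(r) = L_ref - 20 * log10(r / r_ref).
    """
    return level_ref_db - 20 * math.log10(r_m / r_ref_m)


def audible(level_ref_db, r_ref_m, r_m, noise_db, margin_db=0.0):
    """True if the speaker's level at r_m reaches the noise level plus a margin."""
    return level_at_distance(level_ref_db, r_ref_m, r_m) >= noise_db + margin_db


# Invented values: a raised voice of 70 dB at 1 m, crowd noise of 55 dB.
for distance in (2, 4, 10, 30):
    level = level_at_distance(70, 1, distance)
    print(f"{distance:>3} m: {level:5.1f} dB, audible: {audible(70, 1, distance, 55)}")
```

Even this crude rule delivers a testable statement: if a source places listeners at a distance where the predicted level falls below the noise, either the source or the assumptions must be revised.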

IV. Models of text

As we mentioned in the introduction, the extremely high visibility the term “modelling” currently enjoys in interdisciplinary discussions comes mainly from the Digital Humanities, predominantly connected to philological studies. Indeed, McCarty's diagram showing the stages of modelling between a Humanities question and the support for its solution by computer science starts from a “cultural artefact (poem, painting &c)” and leads via the “artefact as system of discrete components and relations” to the “machine as an operational model of the system” ^{[11, figure 4.2, p. 197]}.

This is an important observation, as it may indicate a difference between what modelling means to a historian and what it means following McCarty. For a historian, at least in the definition of history employed by the author, a “cultural artefact” is not studied as a system but as an indication of the state of the societal or cultural system which produced it. McCarty would probably protest against that interpretation of his intentions, as his scope of modelling is certainly much wider and also considers the conceptual models employed by a discipline for the interpretation of information derived from artefacts. But that it is so easy to understand a hierarchy of models as starting with an artefact and arriving at a model of the “system” represented by the variables describing that artefact is probably the reason why the bulk of the current discussion in the Digital Humanities finds it extremely difficult to differentiate between modelling and encoding. Indeed, many discussions about modelling in the Digital Humanities lead directly into rules of how to apply the encoding instructions of the Text Encoding Initiative (TEI), which is valuable for many things, but has so far no recognizable underlying model of what constitutes a text that would be independent of the description of the tags one might embed into it [I notice with interest that the energetic defence of the TEI against all misrepresentations, most recently presented by James C. Cummings at the DH2017 conference, still does not claim that it has an underlying abstract model: James C. Cummings: “A World of Difference. Myths and Misconceptions about the TEI”, in: Digital Humanities 2017. Conference abstracts, pp. 208-210, https://dh2017.adho.org/abstracts/DH2017-abstracts.pdf accessible Sept. 12th, 2017].

This reflects a tradition which philological studies certainly had for a long time: the focus on the canon of the great masterworks of literature. Whether they still have it depends on which representative of these disciplines you talk to, some of them arguing emphatically that it is a thing of the past. The more you subscribe to the notion that a literary artefact is unique, the more obvious it is that a model of that artefact must emphasize the uniqueness of this specific one. Only if you are interested in that literary artefact as the result of an intellectual climate – or indeed, of a process within it – at a specific stage of development can there be an interest in a model which goes beyond the individual item. This was described already in the early nineties: Jerome McGann in his influential Radiant Textuality mentions that the great scepticism of literary scholars against the notion of an encoding standard of any type was derived from their understanding that it was the very definition of a literary work that made it different from any other ^{[12, pp. 139 ff.]}.

The discussion about the encoding of texts as a prerequisite for their analysis, or at least processing, by computers has therefore focused mainly on the most appropriate way of preparing an individual text for such processing, which in the loose categorisation of models we have derived so far would definitely come under the heading of “models as computational trivia”. A very interesting development beyond that, when one looks at the epistemological effects of information technology within philology, is the focus on “distant reading” which has arisen in recent years.

Summarizing a school of research within seven lines is always dangerous. The following paragraph is my interpretation, not necessarily that of one of the representatives of “distant reading” as a currently highly visible trend in the Digital Humanities.

You can appreciate and analyse a work of literature as a unique item. To understand it better, you may look at other literary items secondarily, be they other contemporary literary creations or precursors or successors in a tradition. This describes literary studies so far. On the other hand, you can try to get a feeling for what is common in a large body of literature – thousands or tens of thousands of texts – as the result of a process responsible for their production, and use that understanding to interpret the position of an individual literary item. The latter is my attempt at defining “distant reading”.

“Distant reading” as such is only possible with the help of information technology; to make it possible, you have to have thousands or tens of thousands of texts available in machine-readable form – and there is no way to get trends from them unless you apply quantitative and statistical summaries of the textual features in those millions of pages.
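What a "quantitative summary of textual features" can look like at its very simplest is sketched below: the relative frequency of one term per decade. The three-sentence corpus and the function names are invented for illustration; an actual study would work on thousands of texts and on far less naive features.

```python
import re
from collections import defaultdict

# Invented miniature corpus: (year of publication, text).
corpus = [
    (1851, "the city and the railway"),
    (1858, "a letter from the country"),
    (1873, "the railway reached the city"),
]


def tokens(text):
    """Very naive tokenizer: lower-cased runs of the letters a-z."""
    return re.findall(r"[a-z]+", text.lower())


def relative_frequency_by_decade(corpus, term):
    """Share of all tokens in each decade that are equal to `term`."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in corpus:
        decade = year // 10 * 10
        words = tokens(text)
        totals[decade] += len(words)
        hits[decade] += words.count(term)
    return {d: hits[d] / totals[d] for d in sorted(totals) if totals[d]}


print(relative_frequency_by_decade(corpus, "railway"))
```

The summary says nothing about any individual text; it only becomes meaningful as a trend across the whole body of texts, which is the point of the approach.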

“Distant reading” therefore starts with the statement that traditional literary studies are actually ignoring most of the existing literature ^[13]. As here the general concepts, beyond the individual item, are of primary concern, it is not really astonishing that the author who invented distant reading as a concept is also the author who has so far produced the most consistent attempt at models of textual content beyond the individual text ^[14].

Whether a more general abstract model of texts, which is as close to the implementation of individual technical solutions as the TEI, arises out of this more recent development remains to be seen. From the point of view of quantitative methods it is so far a bit disconcerting that there is a veritable flood of studies which currently try to implement distant reading primarily by various visualizations. Disconcerting, as one should remember that the once famous title How to Lie with Statistics ^[7] did not, strictly speaking, treat statistical methods at all, but only the ways of visualizing the results produced by them. On a more abstract level, that the current visualizations are usually not grounded in probability theory is no real consolation either, as the various tools are based on idiosyncratic heuristics and are not method invariant. That the huge majority of end-user visualizers seem to be ignorant of the problem of method invariance does not really improve the situation.

Some historians, at least, have actually been aware of the problem that the sources they consulted are only the tip of the iceberg: Theodore Zeldin's monumental history of France between 1848 and 1945 ^[18] consisted of 2000 pages, in which he went through numerous strands of French history: politics, society, education and many more. In all of these he described the traditional view and then showed in some detail that this view was based on an extremely tiny (and presumably highly biased) selection of the existing sources. Unfortunately, in his time information technology was simply not up to an attempt at “distant history”. Such an attempt – minus the zeal for questionable visualizations – would be a major hope for historical research.

Summarizing:

(7) As textual scholars have so far focused on the uniqueness of texts, an abstract model of text beyond rather trivial considerations of processing does not exist.

(8) Understanding that information technology allows us to do away with the constraints of textual canons may help us to arrive at such models.

V. Models of Meaning

The Semantic Web ^[1] is one of the greatest promises of information technology. It describes a world where information on the internet is seamlessly integrated on the fly, with all existing sources of information automatically and dynamically referencing each other. And then, maybe it has been one of the greatest promises. Of the various layers of technologies which were supposed to realize it, the first four – Unicode + URIs; XML + xmlschema; RDF + rdfschema; ontologies – became operative within four or five years of the seminal paper of 2001. Layer five – logic – exists in academic papers, and layers six and seven – proof; trust – are only marginally less shadowy now than seventeen years ago.

This seems to be a harsh statement. When one looks at the programs of recent conferences in all branches of history and the Humanities, papers on various semantic technologies and derived activities – linked open data, most prominently – abound. Ontologies for many domains of knowledge exist and continue to be developed further. Nevertheless, the wide visions of 2001 look only slightly less visionary today. But maybe we should ignore the vision and concentrate on the question of why the semantic technologies as such are so obviously popular with Humanities scholars, and only later come back to what restrictions the less encouraging wider view may impose on their further development within history and the Humanities.

The most prominent achievement of the semantic technologies within the Humanities is certainly the CIDOC Conceptual Reference Model (CRM) [ http://www.cidoc-crm.org/versions-of-the-cidoc-crm accessible Sept. 12th 2017. No authors are given, as the authors of the current version, which is not directly addressable, change over time], which arose in the cultural heritage domain as an international standard for the controlled exchange of cultural heritage information. More prosaically: as a possibility to connect information contained in the various types of catalogs of libraries, archives and museums. It has in the meantime been extended well beyond the originally addressed users and is now also used to encode “knowledge bases” which are only loosely connected to the original finding aids of the individual institutions: say, structured biographies of persons who can be found as authors in the catalog entry for a book or an archival document, or as the artist of a piece in a museum – or as a person mentioned in the text of a book, scanned and converted by OCR into a full text available for analysis. These descriptions are organized in a way promising that as soon as the biography of an author is changed in one of these biographical knowledge bases, all the catalogs (or other knowledge bases) which refer to this person can immediately use the more complete information.

This is a service which should be available to everybody who knows how to address the individual ontologies – i.e., catalogs and supporting knowledge bases. Everybody: that is, also the individual researcher who wants to let his or her private database look for information on objects and persons appearing as part of a research project, without the necessity of looking for the individual reference object explicitly. As a promise: whenever I enter a new person into my database, the software administering it will look worldwide for what information is known about this person.

The enthusiasm for this approach and the undoubted success it has achieved within larger information systems of the cultural heritage domain (much more rarely in private databases, so far) are rooted in the conceptual model underlying the different ontologies. That the CRM is by far the most important one for history and the Humanities is related to an extremely intelligent basic decision: while the definition of what constitutes the basic unit of any kind of dictionary or catalog can vary widely, the CRM simply assumes that it orders “somethings” which have existed in time and been involved in different events over the period of their existence.

This extremely simple model of a “something”, which has a history, has proven astonishingly flexible and allows indeed for very successful lookup services.
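The idea of a “something” with a history can be sketched in a few lines. The class names `Something` and `Event`, the methods and the sample data are invented for illustration; they are not the classes and properties of the CRM itself, merely a paraphrase of its basic decision as described above.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """An event in which a 'something' was involved (hypothetical structure)."""
    label: str
    year: int


@dataclass
class Something:
    """Anything that existed in time: a person, a book, a museum object."""
    label: str
    events: list = field(default_factory=list)

    def history(self):
        """Events in chronological order."""
        return sorted(self.events, key=lambda e: e.year)

    def documented_in(self, year):
        """True if `year` lies between the first and last recorded event."""
        years = [e.year for e in self.events]
        return bool(years) and min(years) <= year <= max(years)


# The same structure serves a book and a person (invented data).
book = Something("a printed sermon", [
    Event("catalogued", 1890),
    Event("printed", 1620),
    Event("acquired by a library", 1750),
])
author = Something("its author", [Event("born", 1580), Event("died", 1640)])

print([e.label for e in book.history()])
print(author.documented_in(1620))
```

Because nothing in the structure depends on what kind of thing is described, a lookup service only has to know about things and events, which is one way to read the flexibility noted above.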

We have mentioned initially however, that the original vision of the semantic web is stuck since some time; stuck actually at exactly the level, where the undoubted successes of the CRM are happening: the layer, where the semantic information ordered in the form of ontologies shall be used by higher order services for inferences.

The problem here, at least in the opinion of this author, is a simple, but subtle one. An ontology is a model describing some fragment of reality with the help of categories (in the case of the CRM: “classes” and “properties”) which can be assigned values. I.e., they are variables from a computational point of view. And, as we have observed in section II above, the values a variable can take, reflect a conceptual model of the phenomenon described. So an ontology – even one as powerful as the CRM is undoubtedly – describes relationships between instances of variables, which are meaningful only within the semantics of the person or process selecting the values of these variables.

In the case of the semantic technologies, this problem is theoretically solved, as the value assigned to a class or property can itself be a reference to another ontology. So, if all social historians agree upon one conceptual model of the societal system reflected in occupational terms, the semantic technologies provide a way to implement that model. If they do not agree, however, their differing ontologies cannot be interconnected correctly. Or rather: only with an effort which so far has proven prohibitive.
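A minimal Python sketch may illustrate why the technical solution depends on prior conceptual agreement. Everything in it is hypothetical: the two occupational classifications `scheme_a` and `scheme_b`, their terms, and the single crosswalk entry stand in for real ontologies and real mappings.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    """A value expressed as a reference into a named ontology."""
    ontology: str
    term: str


# Hypothetical crosswalk: an entry exists only where the communities behind
# two classifications have explicitly agreed on an equivalence.
CROSSWALK: dict[tuple[Concept, str], Concept] = {
    (Concept("scheme_a", "craftsman"), "scheme_b"): Concept("scheme_b", "artisan"),
}


def translate(value: Concept, target: str) -> Optional[Concept]:
    """Express `value` in the target ontology, or None if nothing is agreed."""
    if value.ontology == target:
        return value
    return CROSSWALK.get((value, target))


def same_meaning(a: Concept, b: Concept) -> Optional[bool]:
    """True or False if the two values are comparable, None if they are not."""
    translated = translate(a, b.ontology)
    if translated is None:
        return None  # technically linkable, conceptually undecidable
    return translated == b
```

The third outcome, `None`, is the interesting one: the references are technically valid, but without an agreed mapping between the underlying conceptual models no inference can be drawn from them.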

In the world at large, this is probably the reason why the “semantic web” as such is stuck. How large the scope of semantic agreement between historians needs to be to arrive at true integration remains to be seen.

Most importantly:

(9) The semantic technologies, most notably the part of them connected to the creation of ontologies, provide a model for the representation of semantics which is interconnectable at the technical level.

(10) In practical implementation, ontologies provide this technical interconnectivity only for categories at a relatively high level, while the values of the technically precisely modelled categories rather soon turn out to be ones which reflect an often only implicit conceptual model, not a precise and explicit technical one.

VI. Models for computer supported historical research

Many of the types of “models” – or maybe, more loosely: the usages of the term “models” – which occur and have occurred in the last few decades of applying information technologies to the Humanities in general and historical research in particular, are very close parallels to problems in information technology in general. In the last section we already mentioned the problem that inconsistencies between idiosyncratic semantic descriptions in an ontology tend to be pushed down beneath several conceptually clean and unambiguous layers of surface categories. This becomes the more problematic the wider the scope – which explains why what may be promising in the relatively narrow domain of history or the Humanities may be hopeless in the still-not-semantic web.

I would not want to close, however, without pointing to a topic which over the years has seemingly interested me more than most other people: the question whether there are some properties of information in historical research which are different from information as processed in information technology more generally. Some of that can become quite abstract and possibly look esoteric at first glance ^[16]. So let me restrict myself to a rather small, seemingly trivial example.

Time. In the first newsletter, with which I started my career in 1979, I described the necessity of implementing solutions in historical data bases that handle temporal information – calendar data – differently than in contemporary data bases. Historical sources contain dates in strange formats – quoting the feast of a saint, rather than the day of a month; in many sources dates have to be modified when used in computations – say in sources mixing Julian and Gregorian dates; virtually all historical data bases contain time spans – June 15th – July 10th 1870; many disciplines use epochs – second half of the 16th century. In 1979 I proposed a technical solution for this within the software I was working upon.

Since then I have listened to and read papers describing solutions to subsets and supersets of the same problems innumerable times, usually by authors who were not even aware that others had been working on these problems before.

This endless reinvention of a wheel which could have been rolling smoothly for a long time can be stopped only if we arrive at a situation where the technical model for the concept of time implemented in computer technology – “an integer offset from a day zero” – is modified to allow the kind of temporal formats and queries which historical disciplines need. And this model has to be hidden at the same low level in the technology stack as the current one. Only then can historians use the concept of time they need as easily as time for current purposes can be used in computer systems today.
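What such an extended model of time might look like can be sketched in a few lines of Python. This is not the solution proposed in 1979, which is not described here. It is a minimal sketch under stated assumptions: every temporal statement becomes an interval of day numbers rather than a single integer, and the names `TimeSpan`, `on`, `between`, `half_century`, `feast` and the two-entry `FEASTS` table are hypothetical. The day-number arithmetic is the standard Julian Day Number computation.

```python
from __future__ import annotations

from dataclasses import dataclass


def day_number(year: int, month: int, day: int, calendar: str = "gregorian") -> int:
    """Julian Day Number of a date in the (proleptic) Gregorian or Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    base = day + (153 * m + 2) // 5 + 365 * y + y // 4
    if calendar == "gregorian":
        return base - y // 100 + y // 400 - 32045
    if calendar == "julian":
        return base - 32083
    raise ValueError(f"unknown calendar: {calendar!r}")


@dataclass(frozen=True)
class TimeSpan:
    """A temporal statement as a closed interval of day numbers."""
    earliest: int
    latest: int

    def overlaps(self, other: TimeSpan) -> bool:
        return self.earliest <= other.latest and other.earliest <= self.latest

    def certainly_before(self, other: TimeSpan) -> bool:
        return self.latest < other.earliest


def on(year: int, month: int, day: int, calendar: str = "gregorian") -> TimeSpan:
    """An exact date is simply a span of one day."""
    n = day_number(year, month, day, calendar)
    return TimeSpan(n, n)


def between(start: TimeSpan, end: TimeSpan) -> TimeSpan:
    """A time span such as 'June 15th - July 10th 1870'."""
    return TimeSpan(start.earliest, end.latest)


def half_century(century: int, half: int, calendar: str = "gregorian") -> TimeSpan:
    """An epoch such as 'second half of the 16th century' (half is 1 or 2)."""
    first_year = (century - 1) * 100 + 1 + (half - 1) * 50
    return TimeSpan(day_number(first_year, 1, 1, calendar),
                    day_number(first_year + 49, 12, 31, calendar))


# Illustrative table of fixed feasts only; movable feasts need more machinery.
FEASTS = {"Michaelmas": (9, 29), "Martinmas": (11, 11)}


def feast(name: str, year: int, calendar: str = "julian") -> TimeSpan:
    """A date quoted as the feast of a saint in a given year."""
    month, day = FEASTS[name]
    return on(year, month, day, calendar)
```

With this representation, a query such as `half_century(16, 2).overlaps(on(1582, 10, 5, "julian"))` treats an epoch, a Julian date and a Gregorian date uniformly. The sketch works at the application level, of course; the argument above is that such a model would have to sit where the integer offset sits today.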

As mentioned, this is an intentionally trivial example of a problem which can become quite fundamental: how far is the model of information underlying current information technology appropriate for handling information as handled in historical studies?

Or:

(11) Information technology today is built upon a model of the information to be processed, which is derived from engineering and the hard sciences.

(12) Only if we manage to replace or extend it by a more general model, one that also reflects the requirements of information as handled by the Humanities in general and history in particular, will we progress beyond existing limits.

VII. Summary

“Modelling” is a term which has enjoyed great popularity in the discussions of all applications of information technology during the last decade, which has not necessarily contributed to the clarity of its meaning. The various ways in which we have proposed to use it in the sections above can be seen as an attempt at clarification. They could also be seen as an attempt to find a common thread running through the development of the field over the last decades.

That conceptual models are a prerequisite for thinking about the past cannot really be doubted by most schools of thinking in historical methodology. That historians using computational tools for analytic purposes are forced to use greater precision in the variables in which they implement their models than those who do not may be the major difference between the two approaches.

While most sets of variables used in historical research so far implement models which allow only the study of relationships within a snapshot of an historical process described by these models, simulation uses models to test not a snapshot, but a conceptual model of the process producing that snapshot. Such models are more difficult to implement, though they have existed for a long time. That they are difficult to implement may not be the major reason for their scarcity, however: difficult as they are to implement, making them appreciated by most audiences is even more difficult. This may change radically when we use such models in a way where they create results to be communicated by multimedia.

Beyond these developments, which are deeply rooted in classical quantitative methods, the more recent decades have seen two new modelling problems appear. Original attempts to encode texts by agreed upon tags lacking a conceptual model may be developed further into a model of text, which goes beyond the individual textual witness to an understanding of text as the result of a societal process. Attempts to make rather mundane finding aids interconnected may lead to a more general model of the semantics needed to describe the past. Both developments will be major driving forces of the future of historical computing.

How easy these developments will be will largely depend on whether we find ourselves able, as a discipline, to adapt the current models of information underlying information technology as a whole to models supporting data of the past more generally.

Bibliography

1. Berners-Lee, Tim and Hendler, James and Lassila, Ora: “The Semantic Web”, in: Scientific American Magazine, May 17th, 2001, https://www.scientificamerican.com/magazine/sa/2001/05-01/#article-the-semantic-web accessed Sept. 12th 2017.
2. Borodkin, Leonid and Doorn, Peter (eds): Data Modelling, Modelling History, Moscow, 2000.
3. Elton, Geoffrey R.: “Two kinds of History”, in: Robert W. Fogel and Geoffrey R. Elton: Which Road to the Past? Two Views of History, New Haven, 1983, pp. 73-129, here: pp. 119-120.
4. Fauconnier, Gilles and Turner, Mark: The Way we Think. Conceptual Blending and the Mind's Hidden Complexities, 2002.
5. Gardin, Jean Claude et al.: Artificial Intelligence and Expert Systems, Chichester etc., 1988.
6. Greenstein, Daniel (ed.): Modelling Historical Data. Towards a Standard for Encoding and Exchanging Machine-Readable Texts. St. Katharinen, 1991.
7. Huff, Darrel: How to Lie with Statistics, New York, 1954.
8. Lakoff, George and Johnson, Mark: Metaphors we Live by, Chicago, 1980.
9. Laslett, Peter: The world we have lost, New York, 1965.
10. McCarty, Willard: “Modelling: A Study in Word and Meanings”, in: Susan Schreibman, Ray Siemens and John Unsworth (eds.): A Companion to Digital Humanities, Blackwell, 2004.
11. McCarty, Willard: Humanities Computing, Palgrave, 2005.
12. McGann, Jerome: Radiant Textuality, Palgrave, 2001.
13. Moretti, Franco: “The Slaughterhouse of Literature”, in F. Moretti: Distant Reading, London, New York, 2013, pp. 63-89. Originally published in 2000.
14. Moretti, Franco: Graphs, Maps, Trees, London, New York, 2005.
15. Ruggles, Steven: Prolonged Connections. The Rise of the Extended Family, Madison etc., 1987, pp. 74 ff.
16. Thaller, Manfred: “Between the Chairs: An Interdisciplinary Career”, in: Historical Social Research Supplement 29 (2017), pp. 7 – 106; here: pp. 82-94. DOI: 10.12759/hsr.suppl.29.2017.7-106
17. Wachter, Kenneth W. and Hammel, Eugene A. and Laslett, Peter: Statistical Studies of Historical Social Structure, New York etc., 1978.
18. Zeldin, Theodore: A History of French Passions, Vol. I and II, Oxford, 1973 and 1977.
