Automatic Detection of Corpus Incoherence Through Causal Knowledge Graph

In this paper, we describe a method for detecting corpus incoherence from the point of view of multiple related indicators.

Other papers available on www.causalitylink.com describe how our universal data model of finance concepts and our natural language processing system extract data structures from a growing corpus of 84 million documents. Four types of structure matter for this paper: indicators, trends, events and causal links.

The indicators and trends describe the constant variations of what can be called the “signal”: the myriad ever-changing data used to describe the financial world, such as the GDP of countries; the revenues of companies, by country or product; the demand, production and price of commodities; and a large number of other variables. These data can be past measures or forecasts from different authors, so each carries two dates: the date at which it was published, and the date at which it became true or will become known.
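As a rough illustration of the two dates attached to each data point, the sketch below models a single record in Python. The class name `Observation`, its field names, and the rule that a value published before its effective date counts as a forecast are assumptions made for this example, not the authors' actual data model; the values are placeholders, not real figures.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """One data point about an indicator (hypothetical structure)."""
    indicator: str   # free-text label, e.g. "Ford revenue"
    value: float     # placeholder number, not real data
    published: date  # date the document stating the value appeared
    effective: date  # date the value became true, or will become known

    def is_forecast(self) -> bool:
        # Assumption: a value published before its effective date is a
        # forecast; otherwise it reports a past measure.
        return self.published < self.effective


# Two records published on the same day: one looks back, one looks ahead.
past = Observation("Ford revenue", 100.0, date(2020, 2, 1), date(2019, 12, 31))
outlook = Observation("Ford revenue", 110.0, date(2020, 2, 1), date(2020, 12, 31))
```

Keeping both dates on every record makes it possible to compare what different authors believed about the same indicator at the same moment.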

The causal links represent the “model”: the causal relationships between these indicators that people have built up over time and expressed in the documents we analyzed. Causal links can represent accounting rules, such as “the growth of the sales of Ford explains its increasing profit”; manufacturing constraints, such as “the increase in the price of steel has increased the costs of goods sold by Ford”; and many other relationships that humans have discovered between indicators.

The automatically generated knowledge graph, which connects our indicators (in the tens of millions as of today) through causal links (about 6 million today), is large and can be used for multiple purposes, including explaining and potentially predicting the movements of the different indicators.
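To make the idea of indicators connected by causal links concrete, here is a minimal sketch of such a graph in Python. It is only an illustration under stated assumptions: the function names `build_graph` and `upstream` are hypothetical, indicators are reduced to plain strings, and the link from cost of goods sold to profit is added for the example rather than taken from the corpus.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def build_graph(links: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each effect indicator to the set of its direct causes."""
    causes: Dict[str, Set[str]] = defaultdict(set)
    for cause, effect in links:
        causes[effect].add(cause)
    return causes


def upstream(causes: Dict[str, Set[str]], indicator: str) -> Set[str]:
    """Collect every indicator that directly or indirectly drives `indicator`."""
    seen: Set[str] = set()
    queue = deque([indicator])
    while queue:
        current = queue.popleft()
        # .get avoids creating empty entries in the defaultdict.
        for cause in causes.get(current, ()):
            if cause not in seen:
                seen.add(cause)
                queue.append(cause)
    return seen


# (cause, effect) pairs; the third link is an assumption for illustration.
links = [
    ("steel price", "Ford cost of goods sold"),
    ("Ford sales", "Ford profit"),
    ("Ford cost of goods sold", "Ford profit"),
]
graph = build_graph(links)
drivers = upstream(graph, "Ford profit")
```

Here `drivers` contains the sales, the cost of goods sold and, transitively, the steel price, which is the kind of traversal that explaining the movement of one indicator from others would rely on.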

One such usage is the subject of this paper: the detection of some types of transient incoherence in the graph.
