InfoChem's "ChemProspector - Intelligent search of chemical structures in documents" is one of the winners of the "THESEUS MITTELSTAND" technology contest. THESEUS is a flagship initiative of the German Ministry of Economics and Technology. The goal of the contest is to open up opportunities for small and medium-sized enterprises to help create the Internet of the future.

Why ChemProspector?

Chemical compounds are mostly patented as generic chemical structures (so-called Markush structures) in order to cover the widest possible range of different substances. A Markush structure usually consists of an invariable core structure with attached variable groups (e.g., R1, R2, M or X), together with lists of all possible substituents for the variables R1, R2, etc. The possible combinations of these substituents often cover an enormous number of different substances.
With ChemProspector, InfoChem aims to develop a platform for storing and retrieving generic chemical structures contained in documents, whether those documents are held in local or in external databases.
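To make the combinatorics concrete: the number of specific compounds a Markush structure covers is the product of the lengths of its substituent lists. The following Python sketch illustrates this under stated assumptions. The core is a hypothetical SMILES-like template with placeholders, the substituent lists are invented, the helper names `count_members` and `enumerate_members` are hypothetical, and plain string substitution stands in for a real cheminformatics toolkit.

```python
from itertools import product
from math import prod

# Hypothetical Markush structure: a fixed core written as a SMILES-like
# template with placeholders, plus a substituent list per variable group.
CORE = "{R1}c1ccc({R2})cc1"  # para-disubstituted benzene ring (illustrative only)
SUBSTITUENTS = {
    "R1": ["Cl", "Br", "F"],
    "R2": ["C", "O", "N", "CC"],
}


def count_members(substituents):
    """Number of specific compounds covered: the product of the list sizes."""
    return prod(len(options) for options in substituents.values())


def enumerate_members(core, substituents):
    """Yield every specific structure by filling each placeholder with one option."""
    names = list(substituents)
    for choice in product(*(substituents[n] for n in names)):
        yield core.format(**dict(zip(names, choice)))


if __name__ == "__main__":
    print(count_members(SUBSTITUENTS))  # → 12
    for structure in enumerate_members(CORE, SUBSTITUENTS):
        print(structure)
```

Because the count grows multiplicatively, ten variable groups with twenty options each already yield 20**10, more than 10^13 compounds, which is one reason full enumeration quickly becomes impractical.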

Project description

The goal of the ChemProspector project is the automatic extraction of chemical information from patents, including the Markush structures they contain. The core structures will be recognized from the graphical depictions, and the generic fragments from the text; the two will then be automatically merged into complete chemical structures in an appropriate data format. These representations will be stored in a structured database in which exact structures, substructures and similar structures can be searched through an intuitive graphical user interface. This approach makes it easier to find relevant patents.
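The text does not say which similarity measure ChemProspector uses for "similar structures". A common choice in chemical similarity search is the Tanimoto coefficient over structural fingerprints: the number of features two structures share divided by the number of features present in either. The sketch below assumes fingerprints are already available as sets of feature indices; the fingerprints, document names, the function names and the 0.7 threshold are all hypothetical.

```python
# Minimal sketch of fingerprint-based similarity search (assumed technique;
# the source does not specify ChemProspector's similarity measure).


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two feature sets."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return len(a & b) / len(a | b)


def similarity_search(query: set[int], database: dict[str, set[int]], threshold: float = 0.7):
    """Return (name, score) pairs with score >= threshold, best match first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: h[1], reverse=True)


# Hypothetical fingerprints keyed by document identifier.
DATABASE = {
    "patent-A": {1, 2, 3, 4},
    "patent-B": {1, 2, 3, 5},
    "intranet-C": {1, 2, 3, 4, 6},
}

if __name__ == "__main__":
    print(similarity_search({1, 2, 3, 4}, DATABASE))
    # → [('patent-A', 1.0), ('intranet-C', 0.8)]
```

Here "patent-B" scores 3/5 = 0.6 and falls below the threshold. In a real system the fingerprints would come from a cheminformatics toolkit rather than hand-written sets.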

However, patent information by itself is often insufficient, because it does not cover a compound's publication in the scientific literature, its possible synthesis methods, or its physicochemical data. Therefore, a further objective of the project is the creation of an integrated research platform which, in addition to the graphical user interface mentioned above, enables the retrieval of chemical structures from other chemically relevant data sources, for example documents stored in-house in company intranets as well as in external systems. With ChemProspector, substances that have already been published in patents or elsewhere will be found faster and more comprehensively than with currently available systems. By revealing such prior publications, ChemProspector can substantially reduce expensive, unnecessary development of new drugs or chemicals and minimize economic and legal risks.

Last modification: June 10, 2015.

