NOESIS


Noesis' Technology

Outline

Noesis develops data analysis software that extracts knowledge from chemical and biological data produced during the drug discovery process, communicates that knowledge in a format easily understood by human experts, and exports it in a standard form so that it can be stored and shared within an interactive knowledge management system. In a subsequent step, the knowledge, whether newly extracted or retrieved from a knowledge base, is exploited to improve the design of downstream drug discovery experiments. The purpose of these systems is to help human experts better understand the binding mechanism of ligand-target complexes and to support them during lead discovery and optimization, specifically in deciding which compounds to select, synthesize and test given numerous biological and chemical endpoints, where available.

More specifically, our aims are:

  1. Design and implement algorithms for chemical data organization. All organization methods employ substructure-mining techniques and scaffold-based reasoning, which ensures the generation of interpretable results in line with chemists' thinking and expectations.
  2. Design and implement algorithms for molecular library design. The produced library of compounds is representative of the pharmacologically interesting space indicated by the results of a screening experiment. The library may also be designed so that its compounds satisfy additional constraints. The constraints may be as simple as molecular weight or more complex, such as measured (or calculated) toxicity or even compound docking score. The compounds may be selected from larger collections of real or virtual compounds, or proposed by the software itself, i.e. through combinatorial library generation.
  3. Design and implement algorithms for ligand design. The algorithms perform virtual synthesis of compounds with increased chances of exhibiting favorable biological behavior in later experimental rounds.
  4. Design and implement a knowledge management system where knowledge may be stored permanently in a consistent format, queried, and retrieved for use on an as-needed basis. The system is initially populated with knowledge produced by our in-house analyses and can be further populated or modified by expert users.
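
As an illustration of the library design aim (item 2), the sketch below filters a collection by a molecular weight cap and then picks a diverse subset greedily. It is only a minimal example under stated assumptions, not Noesis' algorithm: the feature sets, compound names, weights, and the greedy MaxMin strategy with Tanimoto similarity are all hypothetical choices made for the illustration.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto similarity between two feature sets (1.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def select_library(compounds: Dict[str, Tuple[FrozenSet[str], float]],
                   size: int, max_weight: float) -> List[str]:
    """Pick up to `size` mutually dissimilar compounds under a weight cap."""
    # Constraint step: discard compounds above the molecular weight cap.
    pool = {name: feats for name, (feats, weight) in compounds.items()
            if weight <= max_weight}
    if not pool or size <= 0:
        return []
    names = sorted(pool)
    chosen = [names[0]]  # deterministic seed: first name alphabetically
    while len(chosen) < min(size, len(names)):
        remaining = [n for n in names if n not in chosen]
        # MaxMin step: take the candidate whose most similar already-chosen
        # compound is as dissimilar as possible.
        best = min(remaining,
                   key=lambda n: max(tanimoto(pool[n], pool[c]) for c in chosen))
        chosen.append(best)
    return chosen


# Hypothetical collection: name -> (substructure features, molecular weight)
collection = {
    "A": (frozenset({"ring", "OH"}), 150.0),
    "B": (frozenset({"ring", "OH", "Cl"}), 180.0),
    "C": (frozenset({"chain", "NH2"}), 120.0),
    "D": (frozenset({"ring", "NH2"}), 600.0),  # exceeds the weight cap
    "E": (frozenset({"chain", "OH"}), 130.0),
}
library = select_library(collection, size=3, max_weight=500.0)
print(library)
```

In this toy collection, compound D is excluded by the weight constraint and B is skipped because it is too similar to A. More complex constraints such as toxicity or docking score would enter as additional filters or as further objectives.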

The core information technology is developed by combining knowledge extraction and optimization (single- and multi-objective) algorithms with traditional statistical methods and analytical techniques. A database facility that can store valuable knowledge, such as user-defined chemical and biological constraints and/or information about successful drugs for related pharmaceutical targets, is in place to assist compound selection and the virtual synthesis of compounds. Our methodology produces screening collections of compounds (real or virtual) with an improved pharmacological (lead-like) profile and thus increases the chance of resulting hits surviving further preclinical development.


Key Technology Features

Multi-Objective Optimization

Multi-objective optimization (MOOP) methods introduce an approach to optimization that is founded on compromises and trade-offs among the various objectives. The aim of MOOP methods is to discover a set of satisfactory compromises and, through them, the global optimal solution by optimizing numerous dependent properties simultaneously. The major benefit of MOOP methods is that local optima corresponding to one objective can be avoided by considering all the objectives simultaneously, thereby escaping single-objective dead ends and leading to a more efficient overall process. Compared with single-objective approaches, where each of a series of objectives is optimized sequentially, MOOP can produce a more representative set of equivalent solutions faster, allowing users to make informed decisions related to all objectives under consideration.
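
The "set of satisfactory compromises" is commonly formalized as the Pareto front: the solutions that no other solution beats on every objective at once. The following minimal sketch computes it by brute force for a handful of candidates. The two objectives (predicted toxicity and docking score, both minimized) and all numeric values are assumptions chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly
    better on at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates scored as (predicted toxicity, docking score);
# lower is better for both objectives.
candidates = [(0.2, -7.1), (0.5, -9.3), (0.4, -6.0), (0.2, -8.0), (0.9, -9.3)]
front = pareto_front(candidates)
print(front)
```

Here two candidates survive: one with the lowest toxicity and one with the best docking score. Neither is better than the other on both objectives, which is the kind of trade-off a user is then asked to decide between.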


Manipulating Molecular Graphs

Molecules are in reality 3D flexible structures assuming one of various possible conformations based on environmental conditions. Besides size and shape, the surface of a molecule is characterized by a range of electrostatic properties which are the result of the type of the underlying atoms and bonds of the molecule. Commonly, molecules are represented as labeled, undirected 2D or 3D graphs where atoms correspond to vertices, and chemical bonds correspond to edges. In this context, drug discovery can be viewed as a search for a graph with the appropriate size and shape, and the right chemical features at the correct positions. The constraints imposed on the specific shape and features of a drug candidate graph are defined by the morphology of the receptor (pharmaceutical target) targeted by the drug.
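
A labeled, undirected molecular graph of this kind can be sketched in a few lines. The class below is a hypothetical minimal data structure, not the representation used in Noesis software; it stores heavy atoms only (hydrogens are implicit) and ignores 3D coordinates.

```python
from collections import defaultdict
from typing import Dict, List


class MolecularGraph:
    """Labeled, undirected graph: atoms are vertices, bonds are edges."""

    def __init__(self) -> None:
        self.atoms: Dict[int, str] = {}                  # vertex id -> element
        self.bonds: Dict[int, Dict[int, int]] = defaultdict(dict)  # id -> {neighbor: order}

    def add_atom(self, idx: int, element: str) -> None:
        self.atoms[idx] = element

    def add_bond(self, i: int, j: int, order: int = 1) -> None:
        # Store the edge in both directions because the graph is undirected.
        self.bonds[i][j] = order
        self.bonds[j][i] = order

    def neighbors(self, idx: int) -> List[int]:
        return sorted(self.bonds[idx])


# Ethanol, heavy atoms only: C0 - C1 - O2
ethanol = MolecularGraph()
for idx, element in enumerate(["C", "C", "O"]):
    ethanol.add_atom(idx, element)
ethanol.add_bond(0, 1)
ethanol.add_bond(1, 2)

print(ethanol.neighbors(1))                               # atoms bonded to the middle carbon
print([ethanol.atoms[n] for n in ethanol.neighbors(1)])   # their element labels
```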

Often, computational tools that manipulate molecular data resort to generating a descriptor vector for each molecule using a predefined (e.g. MACCS) or adaptively calculated (e.g. atom-pairs) set of molecular descriptors. This method, although simple and widely popular, loses topological information about the actual shape of the molecular graph and about how the calculated descriptors relate to one another topologically. An alternative is to represent and manipulate molecules as graphs. This method operates directly on molecular graphs, without simplifications that tend to lose information about the structure represented. However, graph-based operations incur a higher computational cost, so software employing such techniques must be designed carefully with performance in mind.
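
The loss of topological information can be seen on a toy case. The sketch below uses a deliberately crude descriptor, a plain element count, which is far simpler than MACCS keys or atom pairs and is an assumption made purely for illustration. Two different molecules receive the same vector, while even a simple graph-derived view (the multiset of labeled bonds) tells them apart.

```python
from collections import Counter

# Heavy-atom graphs of two isomers with formula C2H6O (hydrogens implicit).
ethanol = {"atoms": {0: "C", 1: "C", 2: "O"}, "bonds": [(0, 1), (1, 2)]}         # C-C-O
dimethyl_ether = {"atoms": {0: "C", 1: "O", 2: "C"}, "bonds": [(0, 1), (1, 2)]}  # C-O-C


def element_counts(mol):
    """Crude descriptor vector: how many atoms of each element."""
    return Counter(mol["atoms"].values())


def labeled_bonds(mol):
    """Graph-derived view: multiset of bonds labeled by their end elements."""
    return Counter(
        tuple(sorted((mol["atoms"][i], mol["atoms"][j]))) for i, j in mol["bonds"]
    )


same_descriptor = element_counts(ethanol) == element_counts(dimethyl_ether)
same_bonds = labeled_bonds(ethanol) == labeled_bonds(dimethyl_ether)
print(same_descriptor, same_bonds)
```

Richer descriptor sets reduce such collisions but do not remove the underlying issue: a fixed-length vector records which features are present, not how they are connected.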

Noesis uses graph representations of molecules throughout its software. Of prime importance is the application of "Chemical Substructure Mining". Chemical Substructure Mining amounts to processing numerous undirected, labeled graphs and discovering common subgraphs of substantial size. The method takes as input a set of molecules and detects substructures that occur frequently in large subsets of the ligands. The identified substructures have multiple uses, including organizing large chemical datasets and scaffold-based clustering, subgraph-based molecular similarity assessment, identification of pharmacophores and privileged substructures (combined with appropriate profiling of the substructures), and scaffold-based modeling and molecular library design. Noesis software does not use any predefined list of chemical fragments in its search. Rather, our methods adaptively learn the substructures from the compounds in the set using a novel method that enables the processing of large chemical datasets while providing robust performance.
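
The idea of learning frequent substructures from the data, rather than from a predefined fragment list, can be illustrated with a much simplified sketch. The code below mines only linear paths of up to three atoms and counts in how many molecules each occurs; real substructure mining handles arbitrary subgraphs (rings, branches) and far larger datasets, and the Noesis method itself is not described here. The function names, the toy molecules, and the support threshold are all hypothetical.

```python
from collections import defaultdict


def path_fragments(atoms, bonds, max_atoms=3):
    """Canonical element sequences of all simple paths with up to max_atoms atoms."""
    adjacency = defaultdict(set)
    for i, j in bonds:
        adjacency[i].add(j)
        adjacency[j].add(i)
    found = set()

    def extend(path):
        labels = tuple(atoms[i] for i in path)
        # A path and its reverse are the same fragment; keep one canonical form.
        found.add(min(labels, labels[::-1]))
        if len(path) < max_atoms:
            for nxt in adjacency[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in atoms:
        extend([start])
    return found


def frequent_fragments(molecules, min_support, max_atoms=3):
    """Fragments present in at least min_support molecules, with their support."""
    support = defaultdict(int)
    for atoms, bonds in molecules:
        for fragment in path_fragments(atoms, bonds, max_atoms):
            support[fragment] += 1  # counted once per molecule
    return {f: n for f, n in support.items() if n >= min_support}


# Toy heavy-atom graphs: ethanol (C-C-O), 1-propanol (C-C-C-O), ethylamine (C-C-N)
molecules = [
    ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),
    ({0: "C", 1: "C", 2: "C", 3: "O"}, [(0, 1), (1, 2), (2, 3)]),
    ({0: "C", 1: "C", 2: "N"}, [(0, 1), (1, 2)]),
]
frequent = frequent_fragments(molecules, min_support=2)
print(frequent)
```

With a support threshold of two molecules, the C-C-O path shared by the two alcohols is reported, while nitrogen-containing fragments, seen only in ethylamine, are not.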

Scaffold-based Reasoning

Scaffold-based reasoning involves exploiting identified substructures and related accessible knowledge to focus the chemical space search on regions satisfying user requirements. The process assumes that substructure mining and scaffold identification have taken place. For Noesis, scaffold-based reasoning consists of (1) the systematic extraction of knowledge related to scaffolds, (2) the appropriate representation and storage of that knowledge, and (3) the retrieval of all potentially useful scaffold-related knowledge from the knowledge management (KM) system and its use in preparing a solution that meets all problem objectives. Among the benefits of the approach are:

  • the exploration of the chemical space defined by the scaffolds and the knowledge about them to guide the search and retrieve solutions faster
  • the establishment and use of an integrated knowledge management system related to scaffolds, and
  • the generation of interpretable results in line with chemists' thinking
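
The three steps of scaffold-based reasoning described above (extract, store, retrieve and use) can be sketched as follows. This is a hypothetical toy example, not the Noesis KM system: the record format, the in-memory store, the scaffold names, and the activity values are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def extract_scaffold_knowledge(screening):
    """Step 1: summarize measured activity per scaffold.
    `screening` is a list of (compound, scaffold, activity) tuples."""
    by_scaffold = defaultdict(list)
    for _compound, scaffold, activity in screening:
        by_scaffold[scaffold].append(activity)
    return {s: {"n": len(v), "mean_activity": mean(v)}
            for s, v in by_scaffold.items()}


class KnowledgeStore:
    """Step 2: keep scaffold knowledge in one consistent record format."""

    def __init__(self):
        self._records = {}

    def put(self, scaffold, record):
        # Merge new fields into any existing record for this scaffold.
        self._records.setdefault(scaffold, {}).update(record)

    def get(self, scaffold):
        return dict(self._records.get(scaffold, {}))

    def scaffolds(self):
        return sorted(self._records)


def prioritize(store, min_mean_activity):
    """Step 3: retrieve stored knowledge and rank scaffolds meeting the objective."""
    hits = [(s, store.get(s).get("mean_activity", 0.0)) for s in store.scaffolds()]
    hits = [(s, a) for s, a in hits if a >= min_mean_activity]
    return [s for s, _ in sorted(hits, key=lambda item: -item[1])]


# Hypothetical screening results: (compound, scaffold, normalized activity)
screening = [
    ("c1", "indole", 0.5), ("c2", "indole", 1.0),
    ("c3", "quinoline", 0.25), ("c4", "benzimidazole", 0.5),
]
store = KnowledgeStore()
for scaffold, record in extract_scaffold_knowledge(screening).items():
    store.put(scaffold, record)
store.put("indole", {"note": "annotation added by an expert user"})

ranked = prioritize(store, min_mean_activity=0.5)
print(ranked)
```

Because every record is keyed by scaffold, the ranking can be traced back to a chemically meaningful unit, which is what keeps the result interpretable for a chemist.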

At Noesis we use a scaffold-based reasoning methodology for molecular library design as well as for ligand design via proprietary algorithms able to search for solutions satisfying multiple objectives concurrently. For more information please contact info (at) noesisinformatics.com

 
 
2007-2010 © Noesis ChemoInformatics