Proteochemometrics - a new bioinformatics
approach to the study of molecular recognition
Jarl ES Wikberg
Pharmaceutical Biosciences
Uppsala University and Genetta Soft Aktiebolag,
Sweden
Over the last few years we have developed a new
technology termed proteochemometrics. It
constitutes a collection of bioinformatics
methods that are applied over groups of proteins
and which can provide models that give deep
insight into molecular recognition processes.
Proteochemometrics has found use in mapping
molecular recognition processes of drug receptors
and in the recognition processes and kinetics of
enzymes. It has also been used for a priori drug
design based on structure and physicochemical
properties, and for a priori protein engineering.
Proteochemometrics does not require information
on the 3D structure of the proteins (although 3D
information is quite useful when available).
Instead, it relies on sampling the biological
activity of collections of related target
proteins and on physicochemical property
characterizations of the targets, derived from
their primary amino acid sequences, together with
the structural and/or physicochemical properties
of the interacting entities. This is formulated in
mathematical derivations termed “interaction
spaces”. Such interaction spaces can be easily
sampled using standard techniques of molecular
biology.
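As a minimal sketch of how one point in such an
interaction space might be represented
numerically, the Python snippet below describes
each protein-ligand pair by the protein
descriptors, the ligand descriptors and their
pairwise cross-terms, and then fits a linear
model to the activities. All names, descriptor
counts and the synthetic data are assumptions
made for illustration only; ordinary least
squares is used merely as a simple stand-in for
whatever regression method is applied in
practice.

```python
import numpy as np

def interaction_descriptors(protein_desc, ligand_desc):
    """Return one row per (protein, ligand) pair.

    Each row holds the protein descriptors, the ligand descriptors,
    and all protein x ligand cross-terms (hypothetical layout).
    """
    rows = []
    for p in protein_desc:
        for l in ligand_desc:
            cross = np.outer(p, l).ravel()  # every protein-ligand product
            rows.append(np.concatenate([p, l, cross]))
    return np.array(rows)

rng = np.random.default_rng(0)
# Synthetic stand-ins: 4 proteins with 3 sequence-derived descriptors,
# 5 ligands with 2 physicochemical descriptors (assumed sizes).
protein_desc = rng.normal(size=(4, 3))
ligand_desc = rng.normal(size=(5, 2))

X = interaction_descriptors(protein_desc, ligand_desc)  # shape (20, 11)
A = np.column_stack([np.ones(len(X)), X])               # add an intercept

# Synthetic, noise-free activities generated from known coefficients.
w_true = rng.normal(size=A.shape[1])
activity = A @ w_true

# Fit the linear model; the cross-term coefficients indicate which
# protein/ligand property combinations drive the activity.
w_hat, *_ = np.linalg.lstsq(A, activity, rcond=None)
```

In this toy setting the fitted coefficients can be
compared with `w_true`; with real data the
activities would come from measurements rather
than from a known model.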
Because simple standard protocols for data
acquisition can be used,
proteochemometrics has the potential to be
readily applied over large groups of proteins and
even entire proteomes. We use the term
“large-scale proteochemometrics” for projects
that apply the technology to broad classes of
proteins, the ultimate goal being to provide
functional interaction maps of the entire
proteomes of species. A key factor in such
a project is the application of so-called
experimental design, in which the interaction
space is sampled in an optimal fashion: the
entities actually studied are limited to a small,
practically manageable fraction, while
predictions for the remaining entities stay valid
over as large a volume as possible of the
interaction space of biological interest.
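The idea of measuring only a small, well-spread
fraction of the interaction space can be
illustrated with the sketch below. It uses a
greedy maximin (space-filling) selection, which
is only one simple stand-in for statistical
experimental design and is not claimed to be the
procedure used in the work described here. The
function name, the candidate matrix and the
subset size are hypothetical.

```python
import numpy as np

def maximin_subset(points, k):
    """Greedily pick k rows of `points` that are far from each other."""
    points = np.asarray(points, dtype=float)
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of candidates")

    # Start from the candidate farthest from the centre of the space.
    centre = points.mean(axis=0)
    first = int(np.argmax(np.linalg.norm(points - centre, axis=1)))
    chosen = [first]

    # Distance from every candidate to its nearest chosen point.
    nearest = np.linalg.norm(points - points[first], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(nearest))  # least well covered candidate
        chosen.append(nxt)
        nearest = np.minimum(
            nearest, np.linalg.norm(points - points[nxt], axis=1)
        )
    return chosen

rng = np.random.default_rng(1)
# 200 hypothetical protein-ligand pairs, 11 descriptors each (synthetic).
candidates = rng.normal(size=(200, 11))
selected = maximin_subset(candidates, 20)  # indices of pairs to measure
```

Only the selected pairs would then be assayed; a
model fitted to them is used to predict the
remaining candidates.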
With the aid of proteochemometrics we can, for
example, provide detailed maps (i.e., down to the
physicochemical property and amino acid level) of
the ligand binding sites of a drug receptor.
Using proteochemometrics we can also design new
organic compounds (e.g., drug candidates) and
simultaneously predict their biological
activities over many different targets. We can also predict
the functional properties of a protein, something
that finds use in functional genomics and protein
engineering. For instance, we used
proteochemometrics to predict the functions of
orphan G-protein coupled receptors; more than 97%
of the predictions proved correct when compared
with properties determined experimentally long
after the predictions had been made.
The lecture will give a technical background to
the proteochemometric technology, give examples
of its application, and detail the key steps
required to apply it on a broad scale.
7.05.2004