Composition Profiler


Composition Profiler is a tool for semi-automatic discovery of enrichment or depletion of amino acids, either individually or grouped by their physico-chemical or structural properties, such as aromaticity, charge, polarity, hydrophobicity, flexibility, surface exposure, solvation potential, interface propensity, normalized frequency of occurrence in α helices, β structures, and coils, linker and disorder propensities, size and bulkiness.

The program takes two samples of amino acids as input: a query sample and a reference sample. The latter provides a suitable background amino acid distribution, and should be chosen according to the nature of the query sample, for example, a standard protein database (e.g. SwissProt, PDB Select 25, DisProt, or surface residues of monomeric proteins), a representative sample of proteins from the organism under study, or a group of proteins with a contrasting functional annotation. The results of the analysis of amino acid composition differences are summarized in textual and graphical form.
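The comparison of a query sample against a reference sample can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the function names `composition` and `fractional_difference` are hypothetical, and the fractional difference (query minus reference, divided by reference) is assumed here as the measure of enrichment or depletion.

```python
from collections import Counter

# The twenty standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences):
    """Return the relative frequency of each amino acid in a sample.

    `sequences` is an iterable of protein sequence strings. Characters
    outside the twenty standard residues are ignored.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def fractional_difference(query, reference):
    """Assumed measure: (C_query - C_reference) / C_reference per residue.

    Positive values indicate enrichment in the query sample, negative
    values indicate depletion. Residues absent from the reference
    sample yield NaN because the ratio is undefined.
    """
    q = composition(query)
    r = composition(reference)
    return {
        aa: (q[aa] - r[aa]) / r[aa] if r[aa] > 0 else float("nan")
        for aa in AMINO_ACIDS
    }
```

For instance, `fractional_difference(["AAAC"], ["AACC"])` gives 0.5 for alanine (enriched) and -0.5 for cysteine (depleted).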

The graphical output is a bar chart composed of twenty data points (one for each amino acid), where bar heights indicate enrichment or depletion. The output is designed to assist the discovery of statistically significant composition anomalies by color-coding and sorting residues according to their physico-chemical properties. For example, if the property being tested is flexibility, the tool will group rigid amino acids on the left-hand side of the plot and flexible amino acids on the right-hand side.
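The idea of ordering the bars by a property scale can be illustrated with a small text-only sketch. Everything here is an assumption for illustration: the helper names are hypothetical, and the Kyte-Doolittle hydropathy index stands in for whichever scales the tool actually uses (hydrophobicity is used in place of the flexibility example simply because its values are widely tabulated).

```python
# Kyte-Doolittle hydropathy index (illustrative choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def order_by_property(scale):
    """Return residues sorted from lowest to highest property value."""
    return sorted(scale, key=scale.get)


def text_bar_chart(differences, scale, width=20):
    """Render one line per residue, ordered by the property scale.

    `differences` maps residues to enrichment (+) or depletion (-)
    values; residues missing from it are drawn with a zero-length bar.
    """
    lines = []
    for aa in order_by_property(scale):
        value = differences.get(aa, 0.0)
        bar = "#" * round(abs(value) * width)
        sign = "+" if value >= 0 else "-"
        lines.append(f"{aa} {sign}{bar}")
    return lines
```

With this scale, the least hydrophobic residue (arginine) lands at one end of the ordering and the most hydrophobic (isoleucine) at the other, so a property-linked trend shows up as a run of bars pointing the same way.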

As an exploratory data mining tool, our software can be used to guide feature selection for protein function or structure predictors. For classes of proteins with significant differences in frequencies of amino acids having particular physico-chemical (e.g. hydrophobicity or charge) or structural (e.g. α-helix propensity) properties, Composition Profiler can be used as a rough, lightweight visual classifier.

Acknowledgements

Composition Profiler software was developed by Vladimir Vacic and Stefano Lonardi (University of California, Riverside), and Vladimir N. Uversky and A. Keith Dunker (Indiana University School of Medicine, Indianapolis).

In citing Composition Profiler, please refer to:

Vacic V, Uversky VN, Dunker AK, Lonardi S. (2007) "Composition Profiler: A tool for discovery and visualization of amino acid composition differences." BMC Bioinformatics. 8:211.

Composition Profiler was rewritten from scratch in Python, using NumPy for sampling and statistical computation and Flask for the web front end. This greatly simplified the codebase and made installation a breeze.

(The original version was written in Ruby, with some routines in C and numerical approximation functions from Stephen L. Moshier's Cephes Math Library.)

Also, computers have become much faster since 2007, making bootstrap and permutation sampling an efficient way to approximate non-analytical distributions (and more accurate to boot).

What a difference 18 years make.
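As a rough illustration of the permutation sampling mentioned above, a two-sample permutation test for a single residue might look like the sketch below. This is not taken from the Composition Profiler source: the function name, its parameters, and the choice of test statistic (difference in residue frequency) are assumptions made for the example.

```python
import numpy as np


def permutation_p_value(query, reference, residue,
                        n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for one residue's frequency.

    `query` and `reference` are strings of one-letter residue codes.
    Sample labels are shuffled repeatedly to build the null
    distribution of the difference in frequency of `residue`.
    """
    if not query or not reference:
        raise ValueError("both samples must be non-empty")
    rng = np.random.default_rng(seed)
    # Indicator vectors: 1.0 where the position holds the residue.
    q = np.array([c == residue for c in query], dtype=float)
    r = np.array([c == residue for c in reference], dtype=float)
    observed = q.mean() - r.mean()
    pooled = np.concatenate([q, r])
    n_q = len(q)
    hits = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        diff = shuffled[:n_q].mean() - shuffled[n_q:].mean()
        # Small tolerance guards against floating-point ties.
        if abs(diff) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)
```

The smallest p-value this can report is 1 / (n_permutations + 1), so the number of permutations sets the resolution of the test.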

Source Code

Composition Profiler source code is available on GitHub: https://github.com/vvaccicc/cprofiler