In silico surfaceome paper was recommended by F1000
The article "The in silico human surfaceome" was recently published in PNAS and has now been recommended by F1000 member Alejandro Schaffer as being of special significance in its field.
Alejandro Schaffer's recommendation:
Predicting which proteins are expressed on the cell surface is important for drug development because a disproportionate number of drugs targeting single proteins actually target surface proteins, such as G-protein-coupled receptors (GPCRs). The authors used the machine learning technique random forests to predict which human proteins are expressed at the cell surface, from among a set of over 5000 plausible candidates (mostly proteins likely to have transmembrane helices). With high confidence, as measured by a score called SURFY, they assigned 2886 proteins as surface proteins, which comprised 2756 predicted transmembrane proteins and 130 GPI-anchored proteins. Another 528 proteins merit further investigation because they had intermediate SURFY scores and could not be confidently assigned to either the ‘on surface’ class or to the ‘not on surface’ class. Testing with a high-confidence set of true positives and true negatives showed that the new SURFY method performs much better than a previous method published in the same journal {1}. Two positive controls suggesting that the set of assigned proteins may be reasonably complete are that a) almost all GPCRs were assigned as ‘on surface’ and b) almost all proteins assigned a CD code were predicted as ‘on surface’. The most important feature in proteins with high SURFY scores was the presence of a larger number of possible N-glycosylation sites, which are defined by the three-position motif N-X-S/T (where X means any amino acid and S/T means serine or threonine). Of interest to cancer researchers, this paper includes secondary analyses showing which ones among the 2886 predicted on-surface proteins are expressed in each of 610 cancer cell lines. The proportion of surface proteins expressed in 1/610 cell lines is high (23%) and the proportion of surface proteins expressed in almost all cell lines is low (10%).
These findings and additional analyses in the paper suggest that the expression of cell surface proteins is a useful tool with which to distinguish cancer types and subtypes.
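As an illustration of the N-X-S/T motif mentioned in the recommendation, the following minimal Python sketch counts candidate N-glycosylation sites in a protein sequence given in one-letter code. It is not the SURFY implementation and does not reproduce the paper's feature extraction; the function name `count_nxst_sites` and the example sequences are hypothetical. It follows the definition quoted above, where X is any amino acid; stricter definitions of the motif commonly exclude proline at position X.

```python
import re

# Zero-width lookahead so that overlapping motifs (e.g. in "NNSS") are each counted.
# [A-Z] stands in for "any amino acid" in one-letter code (an assumption of this sketch).
NXST_MOTIF = re.compile(r"(?=N[A-Z][ST])")


def count_nxst_sites(sequence: str) -> int:
    """Count occurrences of the three-position motif N-X-S/T in a sequence."""
    return len(NXST_MOTIF.findall(sequence.upper()))


if __name__ == "__main__":
    # Hypothetical toy sequence containing two motifs: N-G-S and N-K-T.
    print(count_nxst_sites("MNGSANKTQ"))
```

A count like this would be only one possible input feature for a classifier; how the authors actually derived and weighted their features is described in the paper itself.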