Some Notes describing the construction and presentation of the table
Harnessing the Cellular Immune System to the Gene Prediction Cart. About the table:
The table shows the results of comparing the NPP sequences with human sequences from all relevant databases. NPPs were extracted from the SYFPEITHI database, and an extensive search was carried out to find them in the public databases. We found that they either matched known genes, demonstrating that these genes are expressed at the protein level, or, in a few cases, were found on hypothetical or unpredicted genes, where they serve to prove that these genes exist and are expressed. We reviewed most of the NPPs and summarized what can be learned from each of them in the last column of the table. This information can be used to update the annotation of the human genome.
The rules we followed in building the table:
The protein definition is according to the SYFPEITHI database of NPPs.
Each peptide is assigned a "hit string" that describes its matches in the following databases (in this order):
-: no hit (hits with more than one mismatch were not considered).
The top three hits for each database can be viewed via the hyperlinks of the "hit string".
The Top Hit Details column shows representative hits (up to three per database) according to the following rules:
The hits from the first database according to the above order are always shown.
If the best hit for a given database has a mismatch, and a perfect hit was found in a successive database, its hits are presented as well.
If the first hits presented were from the human genome, and hits were found in either nr or nt, they are presented as well, in order to provide definitions.
Model RefSeq hits aren't shown if curated RefSeq hits exist.
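The selection rules above can be sketched in code. This is only an illustration of one possible reading of the rules, not the procedure actually used to build the table. The database labels ("genome", "nr", "nt", "refseq_curated", "refseq_model"), the Hit record, and the function name are hypothetical, and the database order is passed in as a parameter because it is not reproduced here. In particular, the sketch assumes that "the first database" means the first database in the order that has any usable hit, and that the mismatch rule applies to that first database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    database: str    # hypothetical database label
    accession: str   # hypothetical identifier of the matched sequence
    mismatches: int  # 0 = perfect hit, 1 = one mismatch


def select_top_hits(hits_by_db, db_order, genome_db="genome",
                    definition_dbs=("nr", "nt"),
                    curated_db="refseq_curated", model_db="refseq_model",
                    per_db=3):
    """Return [(database, hits)] to display, following the rules as read above."""
    usable = {}
    for db in db_order:
        # Hits with more than one mismatch are not considered; best hits first.
        hits = sorted((h for h in hits_by_db.get(db, []) if h.mismatches <= 1),
                      key=lambda h: h.mismatches)
        if hits:
            usable[db] = hits[:per_db]  # up to three hits per database

    # Model RefSeq hits are dropped when curated RefSeq hits exist.
    if curated_db in usable:
        usable.pop(model_db, None)

    ordered = [db for db in db_order if db in usable]
    if not ordered:
        return []

    # The first database (in the given order) with hits is always shown.
    shown = [ordered[0]]

    # If its best hit has a mismatch, also show the first later database
    # that has a perfect hit (assumed interpretation).
    if usable[shown[0]][0].mismatches > 0:
        for db in ordered[1:]:
            if usable[db][0].mismatches == 0:
                shown.append(db)
                break

    # If the first hits shown are genomic, add nr/nt hits to provide definitions.
    if shown[0] == genome_db:
        for db in definition_dbs:
            if db in usable and db not in shown:
                shown.append(db)

    return [(db, usable[db]) for db in shown]
```

A call such as select_top_hits(hits, ["refseq_curated", "refseq_model", "genome", "nr", "nt"]) would return the databases to display together with at most three hits each. The order given here is made up for the example.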
Some notes regarding the table:
When running BLAST, we retrieved only the top ten hits. In some cases, however, there were more than ten hits with up to one mismatch. Therefore, for peptides for which three perfect hits are shown, it is likely that there are more.
All peptides were also searched for in the UCSC human genome draft; however, these hits are not presented. When a discrepancy arose between the UCSC hits and the NCBI hits, it is noted in the Information column.
Some peptides overlap, and all are shown.
Note that the top three hits can be from the same sequence (e.g. NPPs APDTRPAP, STLHLVLRL).
We concentrated on peptides with at least one perfect match in one of the 8 databases; however, 68 peptides were found only with a mismatch (with no perfect hit in any of the databases). These sequences were not systematically reviewed, and at least some of them might be informative, for example the peptide NYGGGNYGSGSY.
We observed some discrepancies between the annotated mapping of mRNA in the NCBI database and the mapping of the same mRNA in the UCSC database (for example, RRLRNHMAV). We didn't look for these discrepancies (with regard to the mapping of the peptides).
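The distinction drawn in these notes between peptides with a perfect match, peptides found only with one mismatch, and peptides with no hit can be illustrated with a small sketch. The actual searches were run with BLAST. The code below swaps in a plain ungapped sliding-window comparison (Hamming distance) purely for illustration, and all function names and labels other than "-" are hypothetical.

```python
def mismatches(peptide, window):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(peptide, window))


def best_mismatch_count(peptide, sequence):
    """Fewest mismatches over all ungapped placements of peptide in sequence."""
    n = len(peptide)
    if n == 0 or len(sequence) < n:
        return None  # the peptide cannot be placed on this sequence
    return min(mismatches(peptide, sequence[i:i + n])
               for i in range(len(sequence) - n + 1))


def classify(peptide, sequences, max_mismatches=1):
    """Return 'perfect', 'mismatch', or '-' (no hit) for one database."""
    counts = [c for c in (best_mismatch_count(peptide, s) for s in sequences)
              if c is not None]
    best = min(counts, default=None)
    # Hits with more than one mismatch are not considered.
    if best is None or best > max_mismatches:
        return "-"
    return "perfect" if best == 0 else "mismatch"


def split_by_best_hit(peptides, databases):
    """Group peptides by their best result across all databases.

    databases maps a (hypothetical) database label to a list of sequences.
    """
    groups = {"perfect": [], "mismatch_only": [], "no_hit": []}
    for peptide in peptides:
        results = {classify(peptide, seqs) for seqs in databases.values()}
        if "perfect" in results:
            groups["perfect"].append(peptide)
        elif "mismatch" in results:
            groups["mismatch_only"].append(peptide)
        else:
            groups["no_hit"].append(peptide)
    return groups
```

In this reading, the "mismatch_only" group corresponds to the peptides that were found with a mismatch but have no perfect hit in any database.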