UniProtKB/Swiss-Prot protein knowledgebase release 57.4 statistics
1. INTRODUCTION
Release 57.4 (16-Jun-09) of UniProtKB/Swiss-Prot contains 470369 sequence entries,
comprising 166709888 amino acids abstracted from 180531 references.
Since release 57.3, 1563 sequences have been added, the sequence data of
362 existing entries has been updated, and the annotations of
430466 entries have been revised.
Number of fragments: 8407
Number of additional sequences produced by alternative splicing, initiation or promoter usage, or ribosomal frameshifting: 27711
Protein existence (PE): entries %
1: Evidence at protein level 65026 13.8%
2: Evidence at transcript level 65985 14.0%
3: Inferred from homology 323911 68.9%
4: Predicted 13990 3.0%
5: Uncertain 1457 0.3%
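The percentage column can be reproduced from the entry counts and the release total. The following is a minimal Python sketch, assuming the percentages are rounded to one decimal place; the variable names are illustrative and not part of the release files.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "Evidence at protein level": 65026,
    "Evidence at transcript level": 65985,
    "Inferred from homology": 323911,
    "Predicted": 13990,
    "Uncertain": 1457,
}

# The five levels together cover every entry in the release.
total_entries = sum(pe_counts.values())

# Share of each PE level, rounded to one decimal place (assumed convention).
pe_percent = {
    level: round(100 * count / total_entries, 1)
    for level, count in pe_counts.items()
}

for level, pct in pe_percent.items():
    print(f"{level}: {pct:.1f}%")
```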
The growth of the database is summarized below.
2. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/Swiss-Prot: 11798
The twenty most represented species account for 105168 sequences, or 22.4% of the total
number of entries.
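The share quoted above follows from the frequencies of ranks 1 to 20 in table 2.2 and the release total. A short Python sketch of that arithmetic, with illustrative variable names:

```python
# Frequencies of the twenty most represented species (table 2.2, ranks 1-20).
top20 = [
    20330, 16140, 8338, 7384, 6552, 5672, 4957, 4341, 3805, 3793,
    3239, 3060, 2989, 2498, 2210, 2196, 2125, 1984, 1782, 1773,
]
total_entries = 470369  # sequence entries in release 57.4

top20_sequences = sum(top20)
# Percentage of the whole release, rounded to one decimal place.
top20_share = round(100 * top20_sequences / total_entries, 1)

print(top20_sequences, top20_share)
```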
2.1 Table of the frequency of occurrence of species
Species represented 1x: 5202
2x: 1714
3x: 889
4x: 567
5x: 414
6x: 319
7x: 238
8x: 195
9x: 174
10x: 102
11- 20x: 556
21- 50x: 357
51-100x: 180
>100x: 891
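A table of this shape can be derived by assigning each species to a frequency class and counting the classes. The sketch below is one possible way to do this in Python; the function name `frequency_class` and the example counts are hypothetical and are not taken from the release.

```python
from collections import Counter


def frequency_class(n_entries: int) -> str:
    """Map a per-species entry count to a class label like those in table 2.1."""
    if n_entries < 1:
        raise ValueError("a represented species has at least one entry")
    if n_entries <= 10:
        return f"{n_entries}x"
    if n_entries <= 20:
        return "11-20x"
    if n_entries <= 50:
        return "21-50x"
    if n_entries <= 100:
        return "51-100x"
    return ">100x"


# Hypothetical per-species entry counts, for illustration only.
example_counts = {
    "species A": 1,
    "species B": 1,
    "species C": 7,
    "species D": 42,
    "species E": 20330,
}

# Number of species falling into each frequency class.
table = Counter(frequency_class(n) for n in example_counts.values())
print(table)
```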
2.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 20330 Homo sapiens (Human)
2 16140 Mus musculus (Mouse)
3 8338 Arabidopsis thaliana (Mouse-ear cress)
4 7384 Rattus norvegicus (Rat)
5 6552 Saccharomyces cerevisiae (Baker's yeast)
6 5672 Bos taurus (Bovine)
7 4957 Schizosaccharomyces pombe (Fission yeast)
8 4341 Escherichia coli (strain K12)
9 3805 Bacillus subtilis
10 3793 Dictyostelium discoideum (Slime mold)
11 3239 Caenorhabditis elegans
12 3060 Xenopus laevis (African clawed frog)
13 2989 Drosophila melanogaster (Fruit fly)
14 2498 Danio rerio (Zebrafish) (Brachydanio rerio)
15 2210 Pongo abelii (Sumatran orangutan)
16 2196 Oryza sativa subsp. japonica (Rice)
17 2125 Gallus gallus (Chicken)
18 1984 Escherichia coli O157:H7
19 1782 Methanocaldococcus jannaschii (Methanococcus jannaschii)
20 1773 Haemophilus influenzae
21 1744 Salmonella typhimurium
22 1661 Escherichia coli O6
23 1656 Shigella flexneri
24 1466 Mycobacterium tuberculosis
25 1403 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
26 1353 Sus scrofa (Pig)
27 1331 Salmonella typhi
28 1266 Pseudomonas aeruginosa
29 1202 Mycobacterium bovis
30 1151 Macaca fascicularis (Crab eating macaque) (Cynomolgus monkey)
31 1012 Synechocystis sp. (strain PCC 6803)
32 990 Archaeoglobus fulgidus
33 987 Yersinia pestis
34 933 Vibrio cholerae
35 915 Salmonella paratyphi A
36 912 Staphylococcus aureus (strain N315)
37 912 Staphylococcus aureus (strain Mu50 / ATCC 700699)
38 909 Acanthamoeba polyphaga mimivirus (APMV)
39 906 Rhizobium meliloti (Sinorhizobium meliloti)
40 886 Staphylococcus aureus (strain COL)
41 883 Oryctolagus cuniculus (Rabbit)
42 883 Staphylococcus aureus (strain MW2)
43 878 Staphylococcus aureus (strain MSSA476)
44 875 Staphylococcus aureus (strain MRSA252)
45 863 Salmonella choleraesuis
46 861 Escherichia coli O6:K15:H31 (strain 536 / UPEC)
47 852 Yersinia pseudotuberculosis
48 851 Shigella sonnei (strain Ss046)
49 812 Escherichia coli O9:H4 (strain HS)
50 806 Shigella boydii serotype 4 (strain Sb227)
51 803 Escherichia coli O139:H28 (strain E24377A / ETEC)
52 800 Ashbya gossypii (Yeast) (Eremothecium gossypii)
53 799 Escherichia coli (strain UTI89 / UPEC)
54 787 Vibrio parahaemolyticus
55 784 Shigella dysenteriae serotype 1 (strain Sd197)
56 782 Escherichia coli (strain ATCC 8739 / DSM 1576 / Crooks)
57 781 Candida albicans (Yeast)
58 773 Kluyveromyces lactis (Yeast) (Candida sphaerica)
59 769 Pasteurella multocida
60 765 Aquifex aeolicus
61 762 Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
62 761 Canis familiaris (Dog)
63 756 Neurospora crassa
64 746 Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
65 745 Staphylococcus epidermidis (strain ATCC 12228)
66 732 Shigella flexneri serotype 5b (strain 8401)
67 730 Candida glabrata (Yeast) (Torulopsis glabrata)
68 730 Streptomyces coelicolor
69 726 Photorhabdus luminescens subsp. laumondii
70 725 Vibrio vulnificus
71 718 Bacillus halodurans
72 710 Vibrio vulnificus (strain YJ016)
73 706 Bacillus anthracis
74 706 Yersinia enterocolitica serotype O:8 / biotype 1B (strain 8081)
75 703 Escherichia coli (strain SMS-3-5 / SECEC)
76 699 Yersinia pestis bv. Antiqua (strain Nepal516)
77 697 Staphylococcus aureus (strain NCTC 8325)
78 692 Yersinia pestis bv. Antiqua (strain Antiqua)
79 691 Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)
80 688 Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)
81 687 Mycoplasma pneumoniae
82 686 Escherichia coli (strain DH10B)
83 684 Escherichia coli O1:K1 / APEC
84 682 Pan troglodytes (Chimpanzee)
85 677 Enterobacter sp. (strain 638)
86 675 Pseudomonas syringae pv. tomato
87 673 Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)
88 672 Anabaena sp. (strain PCC 7120)
89 665 Pseudomonas putida (strain KT2440)
90 654 Mycobacterium leprae
91 652 Staphylococcus aureus (strain USA300)
92 651 Escherichia coli O45:K1 (strain S88 / ExPEC)
93 650 Escherichia coli O8 (strain IAI1)
94 649 Yersinia pestis (strain Pestoides F)
95 648 Escherichia coli (strain SE11)
96 648 Escherichia coli O157:H7 (strain EC4115 / EHEC)
97 647 Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)
98 647 Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)
99 645 Escherichia coli O7:K1 (strain IAI39 / ExPEC)
100 642 Zea mays (Maize)
101 641 Escherichia coli
102 641 Bradyrhizobium japonicum
103 640 Salmonella enteritidis PT4 (strain P125109)
104 634 Salmonella heidelberg (strain SL476)
105 633 Salmonella paratyphi A (strain AKU_12601)
106 630 Salmonella newport (strain SL254)
107 629 Staphylococcus aureus (strain bovine RF122 / ET3-1)
108 629 Salmonella schwarzengrund (strain CVM19633)
109 628 Serratia proteamaculans (strain 568)
110 628 Salmonella agona (strain SL483)
111 625 Bacillus cereus (strain ATCC 14579 / DSM 31)
112 623 Salmonella dublin (strain CT_02021853)
113 617 Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)
114 614 Treponema pallidum
115 613 Agrobacterium tumefaciens (strain C58 / ATCC 33970)
116 612 Shewanella oneidensis
117 607 Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)
118 606 Salmonella gallinarum (strain 287/91 / NCTC 13346)
119 605 Ralstonia solanacearum (Pseudomonas solanacearum)
120 600 Methanobacterium thermoautotrophicum
121 598 Rhizobium loti (Mesorhizobium loti)
122 596 Staphylococcus haemolyticus (strain JCSC1435)
123 594 Escherichia fergusonii (strain ATCC 35469 / DSM 13698 / CDC 0568-73)
124 591 Staphylococcus saprophyticus subsp. saprophyticus
125 590 Photobacterium profundum (Photobacterium sp. (strain SS9))
126 588 Listeria monocytogenes
127 588 Klebsiella pneumoniae (strain 342)
128 586 Enterobacter sakazakii (strain ATCC BAA-894)
129 584 Emericella nidulans (Aspergillus nidulans)
130 584 Xanthomonas campestris pv. campestris
131 584 Rickettsia prowazekii
132 583 Yersinia pseudotuberculosis serotype O:3 (strain YPIII)
133 583 Helicobacter pylori (Campylobacter pylori)
134 580 Listeria innocua
135 576 Lactococcus lactis subsp. lactis (Streptococcus lactis)
136 576 Yersinia pseudotuberculosis serotype IB (strain PB1/+)
137 576 Yarrowia lipolytica (Candida lipolytica)
138 575 Debaryomyces hansenii (Yeast) (Torulaspora hansenii)
139 575 Neisseria meningitidis serogroup B
140 573 Bacillus cereus (strain ATCC 10987)
141 572 Buchnera aphidicola subsp. Acyrthosiphon pisum
142 567 Brucella melitensis
143 566 Brucella suis
144 564 Helicobacter pylori J99 (Campylobacter pylori J99)
145 562 Buchnera aphidicola subsp. Schizaphis graminum
146 552 Bacillus thuringiensis subsp. konkukian
147 551 Neisseria meningitidis serogroup A
148 548 Xanthomonas axonopodis pv. citri (Citrus canker)
149 544 Bacillus cereus (strain ZK / E33L)
150 544 Pseudomonas syringae pv. syringae (strain B728a)
151 541 Pseudomonas aeruginosa (strain UCBPP-PA14)
152 540 Bacillus licheniformis (strain DSM 13 / ATCC 14580)
153 540 Oceanobacillus iheyensis
154 540 Vibrio fischeri (strain ATCC 700601 / ES114)
155 539 Yersinia pestis bv. Antiqua (strain Angola)
156 539 Caulobacter crescentus (Caulobacter vibrioides)
157 539 Clostridium acetobutylicum
158 533 Pseudomonas fluorescens (strain Pf0-1)
159 531 Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
160 524 Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
161 522 Listeria monocytogenes serotype 4b (strain F2365)
162 517 Bordetella bronchiseptica (Alcaligenes bronchisepticus)
163 515 Xylella fastidiosa
164 513 Streptococcus pneumoniae
165 507 Buchnera aphidicola subsp. Baizongia pistaciae
166 507 Vibrio cholerae serotype O1 (strain ATCC 39541 / Ogawa 395 / O395)
167 506 Xylella fastidiosa (strain Temecula1 / ATCC 700964)
168 505 Thermotoga maritima
169 503 Bordetella parapertussis
170 503 Sodalis glossinidius (strain morsitans)
171 502 Chromobacterium violaceum
172 501 Bordetella pertussis
173 498 Haemophilus ducreyi
174 495 Rickettsia conorii
175 494 Brucella abortus
176 491 Pseudomonas aeruginosa (strain PA7)
177 488 Staphylococcus aureus (strain Newman)
178 488 Deinococcus radiodurans
179 488 Pseudomonas entomophila (strain L48)
180 485 Clostridium perfringens
181 484 Geobacillus kaustophilus
182 483 Mycoplasma genitalium
183 483 Haemophilus influenzae (strain 86-028NP)
184 482 Xanthomonas campestris pv. campestris (strain 8004)
185 482 Bacillus clausii (strain KSM-K16)
186 480 Vibrio harveyi (strain ATCC BAA-1116 / BB120)
187 479 Corynebacterium glutamicum (Brevibacterium flavum)
188 478 Burkholderia pseudomallei (Pseudomonas pseudomallei)
189 478 Shewanella sp. (strain MR-7)
190 477 Mannheimia succiniciproducens (strain MBEL55E)
191 476 Streptomyces avermitilis
192 475 Shewanella sp. (strain MR-4)
193 473 Methanosarcina acetivorans
194 472 Oryza sativa subsp. indica (Rice)
195 469 Synechococcus elongatus (strain PCC 7942) (Anacystis nidulans R2)
196 468 Staphylococcus aureus (strain Mu3 / ATCC 700698)
197 467 Brucella abortus (strain 2308)
198 467 Thermosynechococcus elongatus (strain BP-1)
199 462 Bacillus amyloliquefaciens (strain FZB42)
200 462 Aspergillus fumigatus (Sartorya fumigata)
201 461 Pyrococcus horikoshii
202 461 Enterococcus faecalis (Streptococcus faecalis)
203 458 Burkholderia sp. (strain 383) (Burkholderia cepacia
204 458 Pseudomonas putida (strain F1 / ATCC 700007)
205 457 Pyrococcus abyssi
206 456 Burkholderia mallei (Pseudomonas mallei)
207 456 Erwinia tasmaniensis (strain DSM 17950 / Et1/99)
208 455 Acinetobacter sp. (strain ADP1)
209 454 Rhodopseudomonas palustris
210 454 Anabaena variabilis (strain ATCC 29413 / PCC 7937)
211 454 Methanosarcina mazei (Methanosarcina frisia)
212 453 Xanthomonas campestris pv. vesicatoria (strain 85-10)
213 453 Shewanella sp. (strain ANA-3)
214 452 Shewanella frigidimarina (strain NCIMB 400)
215 452 Halobacterium salinarium (Halobacterium halobium)
216 450 Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
217 450 Lactobacillus plantarum
218 450 Rickettsia felis (Rickettsia azadi)
219 450 Pseudomonas putida (strain GB-1)
220 448 Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus)
221 445 Ralstonia eutropha (Cupriavidus necator
222 444 Thermoanaerobacter tengcongensis
223 444 Streptococcus mutans
224 442 Ovis aries (Sheep)
225 442 Shewanella baltica (strain OS185)
226 442 Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
227 439 Staphylococcus aureus (strain JH1)
228 439 Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
229 439 Chlamydia trachomatis
230 438 Pyrococcus furiosus
231 437 Streptococcus pyogenes serotype M6
232 437 Rickettsia bellii (strain RML369-C)
233 436 Methylococcus capsulatus
234 434 Nicotiana tabacum (Common tobacco)
235 434 Hahella chejuensis (strain KCTC 2396)
236 434 Aeromonas hydrophila subsp. hydrophila (strain ATCC 7966 / NCIB 9240)
237 433 Staphylococcus aureus (strain JH9)
238 433 Caenorhabditis briggsae
239 431 Pseudomonas aeruginosa (strain LESB58)
240 431 Campylobacter jejuni
241 430 Pseudomonas mendocina (strain ymp)
242 430 Pseudoalteromonas haloplanktis (strain TAC 125)
243 428 Shewanella baltica (strain OS195)
244 427 Borrelia burgdorferi (Lyme disease spirochete)
245 427 Colwellia psychrerythraea (strain 34H / ATCC BAA-681) (Vibrio psychroerythus)
246 426 Shewanella sp. (strain W3-18-1)
247 426 Aeromonas salmonicida (strain A449)
248 426 Shewanella putrefaciens (strain CN-32 / ATCC BAA-453)
249 425 Proteus mirabilis (strain HI4320)
250 424 Mycobacterium paratuberculosis
2.3 Taxonomic distribution of the sequences
Kingdom sequences (% of the database)
Archaea 16278 ( 3%)
Bacteria 286134 ( 61%)
Eukaryota 153806 ( 33%)
Viruses 14151 ( 3%)
Within Eukaryota:
Category sequences (% of Eukaryota) (% of the complete database)
Human 20331 ( 13%) ( 4%)
Other Mammalia 44210 ( 29%) ( 9%)
Other Vertebrata 15433 ( 10%) ( 3%)
Viridiplantae 27824 ( 18%) ( 6%)
Fungi 23846 ( 16%) ( 5%)
Insecta 6574 ( 4%) ( 1%)
Nematoda 3932 ( 3%) ( 1%)
Other 11656 ( 8%) ( 2%)
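Each eukaryotic category carries two percentages with different denominators: the Eukaryota subtotal and the complete database. A minimal Python sketch of both columns, assuming rounding to whole percent; the helper `pct` is illustrative.

```python
total_entries = 470369      # all entries in the release
eukaryota_entries = 153806  # Eukaryota row of the kingdom table

# Sequence counts per category within Eukaryota, copied from the table above.
eukaryote_categories = {
    "Human": 20331,
    "Other Mammalia": 44210,
    "Other Vertebrata": 15433,
    "Viridiplantae": 27824,
    "Fungi": 23846,
    "Insecta": 6574,
    "Nematoda": 3932,
    "Other": 11656,
}


def pct(part: int, whole: int) -> int:
    """Percentage rounded to a whole number (assumed table convention)."""
    return round(100 * part / whole)


# (percent of Eukaryota, percent of the complete database) per category.
rows = {
    name: (pct(n, eukaryota_entries), pct(n, total_entries))
    for name, n in eukaryote_categories.items()
}

for name, (of_eukaryota, of_database) in rows.items():
    print(f"{name}: {of_eukaryota}% of Eukaryota, {of_database}% of the database")
```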
3. SEQUENCE SIZE
Distribution of the sequences by size (excluding fragments)
From To Number From To Number
1- 50 7782 1001-1100 3222
51- 100 35975 1101-1200 2194
101- 150 50830 1201-1300 1721
151- 200 50701 1301-1400 1688
201- 250 49822 1401-1500 1326
251- 300 43243 1501-1600 606
301- 350 42826 1601-1700 480
351- 400 37550 1701-1800 395
401- 450 30612 1801-1900 380
451- 500 24950 1901-2000 313
501- 550 17626 2001-2100 187
551- 600 12695 2101-2200 257
601- 650 10711 2201-2300 265
651- 700 7541 2301-2400 166
701- 750 6355 2401-2500 126
751- 800 4511 >2500 983
801- 850 3840
851- 900 4430
901- 950 3319
951-1000 2334
The average sequence length in UniProtKB/Swiss-Prot is 354 amino acids.
The shortest sequence is GWA_SEPOF (P83570): 2 amino acids.
The longest sequence is TITIN_MOUSE (A2ASS6): 35213 amino acids.
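The quoted average is consistent with dividing the total number of amino acids by the total number of entries given in the introduction. The sketch below assumes that this is how the figure was obtained; it also derives the number of non-fragment entries covered by the size table.

```python
total_amino_acids = 166709888  # from the introduction
total_entries = 470369         # from the introduction
fragments = 8407               # from the introduction

# Mean length over all entries (assumption: fragments are included here).
average_length = total_amino_acids / total_entries

# Entries covered by the size distribution, which excludes fragments.
non_fragment_entries = total_entries - fragments

print(round(average_length), non_fragment_entries)
```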
4. JOURNAL CITATIONS
Note: the following citation statistics reflect the number of distinct
journal citations.
Total number of journals cited in this release of UniProtKB/Swiss-Prot: 1997
4.1 Table of the frequency of journal citations
Journals cited 1x: 646
2x: 280
3x: 131
4x: 107
5x: 76
6x: 60
7x: 36
8x: 45
9x: 34
10x: 23
11- 20x: 156
21- 50x: 157
51-100x: 93
>100x: 153
4.2 List of the most cited journals in UniProtKB/Swiss-Prot
Nb Citations Journal name
-- --------- -------------------------------------------------------------
1 17051 Journal of Biological Chemistry
2 7948 Proceedings of the National Academy of Sciences of the U.S.A.
3 4895 Journal of Bacteriology
4 4458 Gene
5 4340 Biochemical and Biophysical Research Communications
6 4240 Nucleic Acids Research
7 3849 FEBS Letters
8 3662 Biochemistry
9 3607 The EMBO Journal
10 3255 Molecular and Cellular Biology
11 3104 Nature
12 3058 European Journal of Biochemistry
13 2916 Biochimica et Biophysica Acta
14 2881 Journal of Molecular Biology
15 2527 Cell
16 2459 Genomics
17 2097 Biochemical Journal
18 1997 Science
19 1878 Journal of Virology
20 1690 Molecular Microbiology
21 1498 Journal of Cell Biology
22 1461 Plant Molecular Biology
23 1302 Virology
24 1298 Molecular and General Genetics
25 1285 Genes and Development
26 1270 Nature Genetics
27 1252 Human Molecular Genetics
28 1213 Plant Physiology
29 1171 The American Journal of Human Genetics
30 1137 Oncogene
31 1137 Journal of Biochemistry
32 1070 Development
33 993 Human Mutation
34 952 Journal of Immunology
35 947 Molecular Biology of the Cell
36 941 Genetics
37 845 Infection and Immunity
38 841 Structure
39 832 Journal of General Virology
40 797 The Plant Cell
41 792 Archives of Biochemistry and Biophysics
42 747 Yeast
43 743 Blood
44 724 Molecular Cell
45 717 Microbiology
46 674 Developmental Biology
47 668 The Plant Journal
48 661 Journal of Cell Science
49 636 FEMS Microbiology Letters
50 629 Cancer Research
51 585 Human Genetics
52 583 Current Biology
53 576 Nature Structural Biology
54 563 Mechanisms of Development
55 520 Current Genetics
56 507 Acta Crystallographica, Section D
57 501 Applied and Environmental Microbiology
58 501 Journal of Neuroscience
59 498 Protein Science
60 487 Toxicon
61 485 Journal of Clinical Investigation
62 476 Neuron
63 466 Mammalian Genome
64 434 American Journal of Physiology
65 429 Immunogenetics
66 426 The Journal of Experimental Medicine
67 424 Molecular Endocrinology
68 413 Molecular and Biochemical Parasitology
69 391 Journal of Neurochemistry
70 372 Endocrinology
71 371 The Journal of Clinical Endocrinology and Metabolism
72 369 Journal of Molecular Evolution
73 361 DNA and Cell Biology
74 352 DNA Sequence
75 346 Molecular Biology and Evolution
76 339 Bioscience, Biotechnology, and Biochemistry
77 329 Journal of Medical Genetics
78 321 Proteins
79 310 Brain Research. Molecular Brain Research
80 289 Biological Chemistry Hoppe-Seyler
81 273 Cytogenetics and Cell Genetics
82 273 Peptides
83 272 Comparative Biochemistry and Physiology
84 267 Journal of Investigative Dermatology
85 267 Plant and Cell Physiology
86 267 Antimicrobial Agents and Chemotherapy
87 255 Nature Cell Biology
88 253 Molecular Pharmacology
89 253 Experimental Cell Research
90 248 Biology of Reproduction
91 245 Journal of General Microbiology
92 236 Genome Research
93 228 Virus Research
94 227 Neurology
95 223 RNA
96 218 Developmental Dynamics
97 215 Hoppe-Seyler's Zeitschrift für Physiologische Chemie
98 201 DNA Research
99 199 Developmental Cell
100 198 Molecular Plant-Microbe Interactions
101 197 European Journal of Immunology
102 194 Biochimie
103 187 Annals of Neurology
104 187 Planta
105 184 European Journal of Human Genetics
106 183 Tissue Antigens
107 180 Eukaryotic Cell
108 178 Genes to Cells
109 175 Journal of Human Genetics
110 170 Immunity
111 166 Molecular and Cellular Endocrinology
112 165 The New England Journal of Medicine
113 164 American Journal of Medical Genetics
114 163 Molecular Phylogenetics and Evolution
115 161 Archives of Microbiology
116 159 DNA
117 153 Hemoglobin
118 152 Insect Biochemistry and Molecular Biology
119 148 Bioorganicheskaia Khimiia
120 147 Investigative Ophthalmology and Visual Science
121 146 Diabetes
122 145 Molecular Reproduction and Development
123 140 Glycobiology
124 139 Molecular Immunology
125 136 Archives of Virology
126 135 Animal Genetics
127 135 EMBO Reports
128 133 General and Comparative Endocrinology
129 130 International Journal of Cancer
130 129 Clinical Genetics
131 128 Nature Structural and Molecular Biology
132 128 The FASEB Journal
133 128 Molecular and Cellular Neuroscience
134 123 Molecular Genetics and Metabolism
135 122 British Journal of Haematology
136 121 The FEBS Journal
137 119 Agricultural and Biological Chemistry
138 117 Molecular Genetics and Genomics
139 116 Journal of Cellular Biochemistry
140 114 Biological Chemistry
141 113 Journal of Protein Chemistry
142 112 Thrombosis and Haemostasis
143 111 Journal of Lipid Research
144 110 American Journal of Medical Genetics. Part A
145 108 Journal of the American Chemical Society
146 107 Journal of Neuroscience Research
147 106 Nature Immunology
148 105 Neuroscience Letters
149 104 Circulation Research
150 104 Journal of Molecular Endocrinology
5. STATISTICS FOR SOME LINE TYPES
The following table summarizes the total number of some UniProtKB/Swiss-Prot lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
References (RL) 835238 1.78
Journal 663060 355628 1.41 1
Submitted to EMBL/GenBank/DDBJ 159845 148651 0.34 2
Submitted to other databases 10320 9048 0.02 3
Book citation 624 613 <0.01 4
Plant Gene Register 557 545 <0.01 5
Thesis 390 388 <0.01 6
Unpublished observations 290 286 <0.01 7
Patent 146 144 <0.01 8
Worm Breeder's Gazette 6 6 <0.01 9
Total number of distinct authors cited in UniProtKB/Swiss-Prot: 275280
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
Comments (CC) 1959656 4.17
ALLERGEN 455 455 <0.01 26
ALTERNATIVE PRODUCTS 18180 18180 0.04 12
BIOPHYSICOCHEMICAL PROPERTIES 2605 2605 0.01 22
BIOTECHNOLOGY 245 243 <0.01 28
CATALYTIC ACTIVITY 192237 175775 0.41 5
CAUTION 6334 6205 0.01 19
COFACTOR 85129 78245 0.18 7
DEVELOPMENTAL STAGE 8197 8197 0.02 16
DISEASE 4583 3146 0.01 20
DISRUPTION PHENOTYPE 1842 1842 <0.01 23
DOMAIN 27968 24791 0.06 11
ENZYME REGULATION 7175 7175 0.02 18
FUNCTION 343696 329686 0.73 2
INDUCTION 10499 10499 0.02 15
INTERACTION 11587 11587 0.02 14
MASS SPECTROMETRY 4096 3100 0.01 21
MISCELLANEOUS 28503 26255 0.06 10
PATHWAY 110216 100774 0.23 6
PHARMACEUTICAL 81 81 <0.01 29
POLYMORPHISM 740 712 <0.01 24
PTM 32782 26635 0.07 8
RNA EDITING 576 576 <0.01 25
SEQUENCE CAUTION 12048 12048 0.03 13
SIMILARITY 545626 446020 1.16 1
SUBCELLULAR LOCATION 269535 264836 0.57 3
SUBUNIT 194877 194877 0.41 4
TISSUE SPECIFICITY 31398 31398 0.07 9
TOXIC DOSE 400 392 <0.01 27
WEB RESOURCE 8046 6355 0.02 17
Total number of comment topics: 29
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
Features (FT) 2897364 6.16
ACT_SITE 115573 68595 0.25 8
BINDING 164635 47568 0.35 4
CA_BIND 3604 1459 0.01 35
CARBOHYD 92024 23768 0.20 12
CHAIN 476839 466069 1.01 1
COILED 17004 11453 0.04 26
COMPBIAS 45373 24295 0.10 18
CONFLICT 113517 39694 0.24 9
CROSSLNK 4317 2838 0.01 34
DISULFID 90166 23711 0.19 13
DNA_BIND 10359 9526 0.02 29
DOMAIN 133849 78185 0.28 6
HELIX 112915 11601 0.24 10
INIT_MET 13171 13171 0.03 27
LIPID 10209 6532 0.02 30
METAL 234870 58055 0.50 3
MOD_RES 141455 50909 0.30 5
MOTIF 29496 19071 0.06 22
MUTAGEN 27311 6556 0.06 24
NON_CONS 1568 629 <0.01 36
NON_STD 345 270 <0.01 38
NON_TER 11421 8665 0.02 28
NP_BIND 88134 59561 0.19 14
PEPTIDE 8053 5008 0.02 32
PROPEP 10037 8411 0.02 31
REGION 78174 43539 0.17 16
REPEAT 83858 12445 0.18 15
SIGNAL 32087 32077 0.07 21
SITE 33081 19390 0.07 20
STRAND 116480 10970 0.25 7
TOPO_DOM 109856 22435 0.23 11
TRANSIT 6070 5984 0.01 33
TRANSMEM 313713 64226 0.67 2
TURN 27868 9326 0.06 23
UNSURE 1053 344 <0.01 37
VAR_SEQ 37918 16179 0.08 19
VARIANT 73817 15813 0.16 17
ZN_FING 27144 11567 0.06 25
Total number of feature keys: 38
Total Number of Average
Line type / subtype number entries per entry Rank Category
------------------------------------ -------- --------- --------- ---- -------------------------------------------
Cross-references (DR) 10074921 21.42
2DBase-Ecoli 84 84 <0.01 104 2D gel databases
Aarhus/Ghent-2DPAGE 126 96 <0.01 101 2D gel databases
AGD 806 800 <0.01 79 Organism-specific databases
ANU-2DPAGE 23 23 <0.01 111 2D gel databases
ArrayExpress 54662 54662 0.12 31 Gene expression databases
Bgee 35635 35582 0.08 36 Gene expression databases
BindingDB 297 297 <0.01 95 Other
BioCyc 159611 151459 0.34 16 Enzyme and pathway databases
BRENDA 65131 62338 0.14 28 Enzyme and pathway databases
BuruList 312 312 <0.01 94 Organism-specific databases
CAZy 5492 4884 0.01 54 Protein family/group databases
CGD 535 533 <0.01 84 Organism-specific databases
CleanEx 30246 29597 0.06 38 Gene expression databases
COMPLUYEAST-2DPAGE 59 59 <0.01 106 2D gel databases
Cornea-2DPAGE 67 67 <0.01 105 2D gel databases
CYGD 6628 6522 0.01 52 Organism-specific databases
dictyBase 3908 3793 0.01 65 Organism-specific databases
DIP 9026 8976 0.02 47 Protein-protein interaction databases
DisProt 397 394 <0.01 88 3D structure databases
DOSAC-COBS-2DPAGE 150 150 <0.01 100 2D gel databases
DrugBank 5317 1626 0.01 55 Other
EchoBASE 4159 4124 0.01 62 Organism-specific databases
ECO2DBASE 351 299 <0.01 92 2D gel databases
EcoGene 4330 4327 0.01 61 Organism-specific databases
EMBL 782022 461034 1.66 3 Sequence databases
Ensembl 68530 67194 0.15 27 Genome annotation databases
euHCVdb 55 44 <0.01 107 Organism-specific databases
FlyBase 4716 4343 0.01 58 Organism-specific databases
Gene3D 210339 174523 0.45 13 Family and domain databases
GeneCards 21175 19892 0.05 39 Organism-specific databases
GeneDB_Spombe 5003 4954 0.01 57 Organism-specific databases
GeneFarm 2571 2550 0.01 71 Organism-specific databases
GeneID 421861 402743 0.90 6 Genome annotation databases
GenomeReviews 322889 304098 0.69 9 Genome annotation databases
GermOnline 41953 41343 0.09 34 Gene expression databases
GlycoSuiteDB 280 280 <0.01 96 PTM databases
GO 1954928 438072 4.16 1 Ontologies
Gramene 4131 4131 0.01 63 Organism-specific databases
H-InvDB 11258 9564 0.02 46 Organism-specific databases
HAMAP 269327 269190 0.57 11 Family and domain databases
HGNC 19432 19262 0.04 41 Organism-specific databases
HOGENOM 208250 208250 0.44 14 Phylogenomic databases
HOVERGEN 76755 76755 0.16 25 Phylogenomic databases
HPA 6106 4958 0.01 53 Organism-specific databases
HSC-2DPAGE 85 85 <0.01 103 2D gel databases
HSSP 84961 84961 0.18 24 3D structure databases
IntAct 20453 20453 0.04 40 Protein-protein interaction databases
InterPro 1259651 443283 2.68 2 Family and domain databases
IPI 86672 62450 0.18 23 Sequence databases
KEGG 394044 372718 0.84 8 Genome annotation databases
LegioList 743 741 <0.01 80 Organism-specific databases
Leproma 657 654 <0.01 83 Organism-specific databases
ListiList 1169 1161 <0.01 77 Organism-specific databases
MaizeGDB 469 464 <0.01 86 Organism-specific databases
MEROPS 8382 8124 0.02 49 Protein family/group databases
MGI 16021 15970 0.03 43 Organism-specific databases
MIM 15699 12391 0.03 45 Organism-specific databases
MypuList 202 202 <0.01 99 Organism-specific databases
NextBio 48401 48399 0.10 33 Other
NMPDR 125956 125926 0.27 17 Genome annotation databases
OGP 378 378 <0.01 90 2D gel databases
OMA 321967 321967 0.68 10 Phylogenomic databases
Orphanet 3443 2030 0.01 68 Organism-specific databases
PANTHER 171548 157826 0.36 15 Family and domain databases
Pathway_Interaction_DB 4569 1666 0.01 60 Enzyme and pathway databases
PDB 60284 14654 0.13 30 3D structure databases
PDBsum 60284 14654 0.13 29 3D structure databases
PeptideAtlas 5167 5167 0.01 56 Proteomic databases
PeroxiBase 668 656 <0.01 82 Protein family/group databases
Pfam 619250 432244 1.32 4 Family and domain databases
PharmGKB 15839 15827 0.03 44 Organism-specific databases
PHCI-2DPAGE 245 245 <0.01 98 2D gel databases
PhosphoSite 19335 19335 0.04 42 PTM databases
PhosSite 267 267 <0.01 97 PTM databases
PhotoList 726 726 <0.01 81 Organism-specific databases
PIR 113996 104145 0.24 21 Sequence databases
PIRSF 70983 70983 0.15 26 Family and domain databases
PMAP-CutDB 1396 1396 <0.01 74 Other
PMMA-2DPAGE 52 52 <0.01 108 2D gel databases
PptaseDB 34 34 <0.01 109 Protein family/group databases
PRIDE 36154 36154 0.08 35 Proteomic databases
PRINTS 121792 104599 0.26 18 Family and domain databases
ProDom 118627 115367 0.25 19 Family and domain databases
ProMEX 434 434 <0.01 87 Proteomic databases
PROSITE 418146 265669 0.89 7 Family and domain databases
PseudoCAP 1205 1196 <0.01 75 Organism-specific databases
Rat-heart-2DPAGE 28 28 <0.01 110 2D gel databases
Reactome 4621 2750 0.01 59 Enzyme and pathway databases
REBASE 354 345 <0.01 91 Protein family/group databases
RefSeq 437641 403015 0.93 5 Sequence databases
REPRODUCTION-2DPAGE 1030 942 <0.01 78 2D gel databases
RGD 7270 7266 0.02 50 Organism-specific databases
SagaList 384 383 <0.01 89 Organism-specific databases
SGD 6640 6537 0.01 51 Organism-specific databases
Siena-2DPAGE 102 102 <0.01 102 2D gel databases
SMART 115729 89183 0.25 20 Family and domain databases
SMR 51251 51251 0.11 32 3D structure databases
SubtiList 3742 3740 0.01 66 Organism-specific databases
SWISS-2DPAGE 1182 1182 <0.01 76 2D gel databases
TAIR 8421 8307 0.02 48 Organism-specific databases
TCDB 3107 3072 0.01 70 Protein family/group databases
TIGR 33270 32519 0.07 37 Genome annotation databases
TIGRFAMs 248186 231929 0.53 12 Family and domain databases
TubercuList 1494 1458 <0.01 73 Organism-specific databases
UniGene 86835 79839 0.18 22 Sequence databases
VectorBase 349 338 <0.01 93 Genome annotation databases
World-2DPAGE 503 503 <0.01 85 2D gel databases
WormBase 3728 3643 0.01 67 Organism-specific databases
WormPep 3965 3230 0.01 64 Organism-specific databases
Xenbase 3372 3302 0.01 69 Organism-specific databases
ZFIN 2430 2414 0.01 72 Organism-specific databases
Total number of cross-referenced databases: 111
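The "Average per entry" column in the tables of this section appears to be the total number of lines divided by the number of entries in the whole release, not by the number of entries that carry such a line. A Python sketch under that assumption, using the four line-type totals:

```python
total_entries = 470369  # sequence entries in release 57.4

# Total number of lines per line type, copied from the tables above.
line_totals = {
    "References (RL)": 835238,
    "Comments (CC)": 1959656,
    "Features (FT)": 2897364,
    "Cross-references (DR)": 10074921,
}

# Average number of lines per entry, rounded to two decimal places.
average_per_entry = {
    line_type: round(total / total_entries, 2)
    for line_type, total in line_totals.items()
}

for line_type, average in average_per_entry.items():
    print(f"{line_type}: {average:.2f}")
```

The same division reproduces the subtype rows; for example, 663060 journal references over 470369 entries gives 1.41.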
6. AMINO ACID COMPOSITION
6.1 Composition in percent for the complete database
Ala (A) 8.22 Gln (Q) 3.95 Leu (L) 9.67 Ser (S) 6.56
Arg (R) 5.52 Glu (E) 6.75 Lys (K) 5.86 Thr (T) 5.33
Asn (N) 4.06 Gly (G) 7.06 Met (M) 2.42 Trp (W) 1.08
Asp (D) 5.43 His (H) 2.27 Phe (F) 3.87 Tyr (Y) 2.92
Cys (C) 1.38 Ile (I) 5.97 Pro (P) 4.72 Val (V) 6.85
Asx (B) 0.000 Glx (Z) 0.000 Xaa (X) 0.00
6.2 Classification of the amino acids by their frequency
Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
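The ordering above can be reproduced by sorting the percentages of table 6.1 in descending order. A minimal Python sketch, restricted to the twenty standard amino acids:

```python
# Composition in percent for the complete database (table 6.1).
composition = {
    "Ala": 8.22, "Arg": 5.52, "Asn": 4.06, "Asp": 5.43, "Cys": 1.38,
    "Gln": 3.95, "Glu": 6.75, "Gly": 7.06, "His": 2.27, "Ile": 5.97,
    "Leu": 9.67, "Lys": 5.86, "Met": 2.42, "Phe": 3.87, "Pro": 4.72,
    "Ser": 6.56, "Thr": 5.33, "Trp": 1.08, "Tyr": 2.92, "Val": 6.85,
}

# Most frequent first; no two percentages are equal, so the order is unambiguous.
ranking = sorted(composition, key=composition.get, reverse=True)

print(", ".join(ranking))
```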
7. MISCELLANEOUS STATISTICS
4442 entries are encoded on a mitochondrion, and 3503 are encoded on a plasmid.
12079 entries are encoded on a plastid,
of which 21 are encoded on apicoplasts,
11522 on chloroplasts,
43 on organellar chromatophores,
145 on cyanelles,
149 on non-photosynthetic plastids and
199 on unspecified types of plastid.
Number of entries with at least one sequence correction: 66353