
                    SWISS-PROT RELEASE 28.0 RELEASE NOTES


                               1. INTRODUCTION

   1.1  Evolution

   Release 28.0  of SWISS-PROT  contains 36000 sequence entries, comprising
   12'496'420 amino acids abstracted from 33903 references. This represents
   an increase of 8.0% in the number of entries over release 27. The recent
   growth of the data bank is summarized below.

   Release    Date   Number of entries     Nb of amino acids

   3.0        11/86               4160               969 641
   4.0        04/87               4387             1 036 010
   5.0        09/87               5205             1 327 683
   6.0        01/88               6102             1 653 982
   7.0        04/88               6821             1 885 771
   8.0        08/88               7724             2 224 465
   9.0        11/88               8702             2 498 140
   10.0       03/89              10008             2 952 613
   11.0       07/89              10856             3 265 966
   12.0       10/89              12305             3 797 482
   13.0       01/90              13837             4 347 336
   14.0       04/90              15409             4 914 264
   15.0       08/90              16941             5 486 399
   16.0       11/90              18364             5 986 949
   17.0       02/91              20024             6 524 504
   18.0       05/91              20772             6 792 034
   19.0       08/91              21795             7 173 785
   20.0       11/91              22654             7 500 130
   21.0       03/92              23742             7 866 596
   22.0       05/92              25044             8 375 696
   23.0       08/92              26706             9 011 391
   24.0       12/92              28154             9 545 427
   25.0       04/93              29955            10 214 020
   26.0       07/93              31808            10 875 091
   27.0       10/93              33329            11 484 420
   28.0       02/94              36000            12 496 420


   1.2  Source of data

   Release 28.0  has been  updated using protein sequence data from release
   38.0 of  the PIR (Protein Identification Resource) protein data bank, as
   well as translation of nucleotide sequence data from release 37.0 of the
   EMBL Nucleotide Sequence Database.

   As an  indication to  the source  of the sequence data in the SWISS-PROT
   data bank we list here the statistics concerning the DR (Database cross-
   references) pointer lines:

   Entries with pointer(s) to only PIR entri(es):            4624
   Entries with pointer(s) to only EMBL entri(es):           5593
   Entries with pointer(s) to both EMBL and PIR entri(es):  25048
   Entries with no pointer lines:                             735


      2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 27


   2.1  Sequences and annotations

   About 2700 sequences have been added since release 27, the sequence data
   of 379  existing entries  has been  updated and  the annotations of 5700
   entries have been revised.

   In particular, we have started a process to 'clean up' the various
   representations of domains in the feature lines (especially the usage of
   the feature keys "DOMAIN", "REPEAT", "DNA_BIND", and "SITE"). We have
   also undertaken an overall revision of the CC topics "SUBCELLULAR
   LOCATION", "SUBUNIT" and "CAUTION". Most of the work has already been
   carried out for this release and we plan to finish this major annotation
   revamping for the next release.

   2.2  What's happening with the model organisms

   As we  announced in  the last  release we  have  selected  a  number  of
   organisms that  are the  target  of  genome  sequencing  and/or  mapping
   projects and for which we intend to:

   -  Be as  complete as  possible. All sequences available at a given time
      should be  immediately included  in SWISS-PROT.  This  also  includes
      sequence corrections and updates.
   -  Provide a high level of annotations.
   -  Provide cross-references to specialized database(s) that contain,
      among other data, some genetic information about the genes that code
      for these proteins.
   -  Provide specific indices or documents.

   Thanks to  a collaborative  effort with Douglas Smith and Bill Loomis of
   UCSD we  have added  a fifth  organism, Dictyostelium  discoideum (slime
   mold), to  our list of model organisms. Many new sequences were added at
   this  release  and  a  new  document  file  (DICTY.TXT)  lists  all  the
   D.discoideum sequence entries in SWISS-PROT and their corresponding gene
   names.

   At this release we have also started our collaboration with the group at
   the Sanger Genome Center in Hinxton (UK) and we have added 516 new
   C.elegans sequences, most of which are translations of sequencing data
   from the genome project. A new document file (CELEGANS.TXT) lists all
   the C.elegans sequence entries in SWISS-PROT and their corresponding
   gene names and, when appropriate, their cosmid-derived names.

   Here is the current status of the five model organisms:

   Organism         Database                Index file        Number of
                    cross-referenced                          sequences
   --------------   ----------------------  --------------    ---------
   C.elegans        WormPep                 CELEGANS.TXT            672
   D.discoideum     DictyDB                 DICTY.TXT               183
   D.melanogaster   FlyBase                 In preparation          600
   E.coli           EcoGene                 ECOLI.TXT              2555
   S.cerevisiae     LISTA (in preparation)  YEAST.TXT              1731


   2.3  The ExPASy World-Wide Web server

        2.3.1  Background information

   The World-Wide Web (WWW), which originated at CERN, is a powerful global
   information  system   merging  networked   information   retrieval   and
   hypertext. It  gives access, using hypertext links, to the documents and
   information contained  in all the existing WWW servers around the world,
   as well  as to  the data  obtainable through other information retrieval
   systems like WAIS, Gopher, X500, etc. To access a WWW server, one has to
   run on a local computer a client program (a WWW browser), which displays
   hypertext documents.  The user  can then either request a keyword search
   or jump to another document by following a hypertext link. WWW has the
   outstanding advantages of extending the hypertext model to the whole
   world (by allowing hypertext jumps to documents anywhere on the
   Internet) and of being device and user-interface independent (browsers
   exist for a variety of computers and user interfaces, including Unix
   workstations running X Windows, Macintoshes and PCs with Microsoft
   Windows).

   The ExPASy  WWW server  allows access, using the user-friendly hypertext
   model, to  the SWISS-PROT  and SWISS-2DPAGE  databases and,  through any
   SWISS-PROT protein  sequence entry,  to other  databases such  as  EMBL,
   PROSITE, REBASE, FlyBase, PDB, OMIM and Medline. Using a browser that
   can display images, one can also remotely access 2D gel image data
   from SWISS-2DPAGE.

   A WWW  server can  be accessed  on  the  internet  through  its  Uniform
   Resource Locator  (URL), the addressing system defined by the WWW model.
   The URL for the ExPASy WWW server is:

                           http://expasy.hcuge.ch/
   or
                            http://129.195.254.61/

   To access a WWW server, you need to run a browser (or client) program on
   your local computer. Browsers exist for a variety of machines and may be
   obtained by  anonymous ftp. ExPASy can be used with any WWW browser, but
   we recommend NCSA Mosaic. It is a very flexible and powerful browser
   with a graphical user interface, available for Unix boxes using
   X11/Motif, for Apple Macintoshes and for Microsoft Windows. You can get
   it from the FTP site: ftp.ncsa.uiuc.edu.

   To access  all the  data available  from SWISS-2DPAGE,  the user's local
   computer needs  to run  an image  viewing program.  For most browsers on
   Unix workstations  the default  program is  xv, a shareware application.
   Similar Windows  or Apple  shareware or  public domain  applications are
   also available.

   For more  information on  the  ExPASy  WWW  server,  you  can  read  the
   following article:

      Appel R.D., Sanchez J.-C., Bairoch A., Golaz O., Miu M., Pasquali C.,
      Reynaldo Vargas J., Hughes G.J., Hochstrasser D.F.
      Electrophoresis 14:1232-1238(1993).

   Or you can contact Dr. Ron Appel:

      Email: appel@cih.hcuge.ch
      Fax: +41-22-372 61 98

        2.3.2  Changes to the WWW ExPASy server

   There have been quite a number of changes to the server in the last three
   months. We want to list specifically the following enhancements:

   -  It is  now possible to retrieve the Medline abstract of any reference
      in SWISS-PROT.
   -  Full text searches of SWISS-PROT have been implemented.
   -  The data  available on the server includes the latest full release of
      SWISS-PROT as well as the cumulative weekly updates.
   -  Most SWISS-PROT documents, such as the new indices for the model
      organisms, are available as hypertext documents.
   -  The SWISS-2DPAGE  part of  the server  has been greatly enhanced with
      new functionalities.

   2.4  Changes in the DR line

   We have  added cross-references  to the  Dictyostelium discoideum genome
   database (DictyDB)  (see section  2.2  of  these  notes).  These  cross-
   references are present in the DR lines:

   Data bank identifier:  DICTYDB
   Primary identifier:    Unique identifier  attributed by  DictyDB to  the
                          gene coding for the protein.
   Secondary identifier:  The gene  designation (name).  A "-"  is  present
                          when no gene name has yet been assigned.
   Example:               DR   DICTYDB; DD01047; MYOA.
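
   As an illustration of the three-field layout described above, here is
   a minimal Python sketch that splits a DR line into its data bank
   identifier, primary identifier and secondary identifier. The names
   CrossReference and parse_dr_line are hypothetical and are not part of
   any SWISS-PROT software; the sketch assumes that the fields are
   separated by semicolons and that the line ends with a period, as in
   the example above.

```python
from typing import NamedTuple


class CrossReference(NamedTuple):
    database: str      # data bank identifier, e.g. "DICTYDB"
    primary_id: str    # primary identifier, e.g. "DD01047"
    secondary_id: str  # secondary identifier (gene name), or "-" if none


def parse_dr_line(line: str) -> CrossReference:
    """Split one DR line into its three fields (hypothetical helper)."""
    if not line.startswith("DR"):
        raise ValueError("not a DR line: %r" % line)
    # Drop the two-letter line code, surrounding blanks and the final period.
    body = line[2:].strip().rstrip(".")
    fields = [field.strip() for field in body.split(";")]
    if len(fields) != 3:
        raise ValueError("expected three fields in %r" % line)
    return CrossReference(*fields)


ref = parse_dr_line("DR   DICTYDB; DD01047; MYOA.")
print(ref.database, ref.primary_id, ref.secondary_id)
```

   A secondary identifier of "-" can then be tested directly to find the
   entries for which no gene name has been assigned yet.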


   2.5  Weekly updates of SWISS-PROT

   Since release 24, we have provided weekly updates of SWISS-PROT.

   [Note: due  to the fact that we were in the process of 'cleaning up' the
   annotations of  many entries  (see section  2.1), we temporarily stopped
   providing weekly updates from December 1993 to February 1994.]

   The weekly  updates are  available by  anonymous FTP.  Three  files  are
   updated at each update:

   new_seq.dat    Contains all the new entries since the last full release.
   upd_seq.dat    Contains the entries for which the sequence data has been
                  updated since the last release.
   upd_ann.dat    Contains the  entries for  which one  or more  annotation
                  fields have been updated since the last release.
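
   For users who want to process these files locally, the following
   Python sketch lists the entry names found in an update file. It
   assumes that the update files use the usual flat-file layout in which
   every entry starts with an ID line and ends with a "//" line; the
   function names and the two sample entries are made up for the
   illustration.

```python
import io
from typing import Iterator, List, TextIO


def iter_entries(handle: TextIO) -> Iterator[List[str]]:
    """Yield each entry as a list of lines, without the '//' terminator."""
    entry: List[str] = []
    for raw in handle:
        line = raw.rstrip("\n")
        if line.strip() == "//":
            if entry:
                yield entry
            entry = []
        elif line.strip():
            entry.append(line)
    if entry:  # tolerate a last entry that lacks its terminator
        yield entry


def entry_names(handle: TextIO) -> List[str]:
    """Return the entry name found on the ID line of every entry."""
    names = []
    for entry in iter_entries(handle):
        for line in entry:
            if line.startswith("ID"):
                names.append(line.split()[1])
                break
    return names


# Two made-up, heavily shortened entries standing in for new_seq.dat.
sample = io.StringIO(
    "ID   EXAMPLE1_TEST\nSQ   ...\n//\n"
    "ID   EXAMPLE2_TEST\nSQ   ...\n//\n"
)
print(entry_names(sample))
```

   The same functions could be applied to new_seq.dat, upd_seq.dat or
   upd_ann.dat by passing an open file instead of the in-memory sample.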

   Currently these  files are  available on  the  following  anonymous  ftp
   servers:

   Organization   ExPASy (Geneva University Expert Protein Analysis System)
   Address        expasy.hcuge.ch  (or 129.195.254.61)
   Directory      /databases/swiss-prot/updates

   Organization   National Center for Biotechnology Information (NCBI)
   Address        ncbi.nlm.nih.gov (or 130.14.20.1)
   Directory      /repository/swiss-prot/updates

   Organization   EMBL ftp server
   Address        ftp.embl-heidelberg.de (or 192.54.41.33)
   Directory      /pub/databases/swissprot/new

   !!! Important notes !!!

   Although we try to follow a regular schedule, we do not promise to
   update these files every week. In some cases two weeks will elapse
   between two updates.

   Due to the current mechanism used to build a release, the entries that
   are provided in these updates are not guaranteed to be error-free. Also,
   for the same reason, new entries do not contain an OC (Organism
   Classification) line.



                            3. ENZYME AND PROSITE

   3.1  The ENZYME data bank

   Release 15.0  of the  ENZYME data bank is distributed with release 28 of
   SWISS-PROT. ENZYME  release 15.0  contains information  relative to 3489
   enzymes.


   3.2  The PROSITE data bank

        3.2.1  Release 11.1

   Release 11.1 of the PROSITE data bank is distributed with release 28 of
   SWISS-PROT. Release 11.1 contains 715 documentation chapters that
   describe 926 different patterns. Release 11.1 does not really represent
   a new release; the only changes between releases 11.0 and 11.1 are
   updates to the pointers to the SWISS-PROT entries whose names have been
   modified between releases 27 and 28. The next release of PROSITE (12.0)
   will be distributed with release 29 of SWISS-PROT.

        3.2.2  Future developments

   Starting with the next major release (12.0 of June 1994), PROSITE will
   be extended to include weight matrices (also known as profiles). There
   are a number of protein families as well as functional or structural
   domains that cannot be detected using patterns due to their extreme
   sequence divergence. Typical examples of important functional domains
   which are weakly conserved are the immunoglobulin domains, the SH2 and
   SH3 domains, or the fibronectin type III domain. In such domains there
   are only a few sequence positions which are well conserved. Any attempt
   to build a consensus pattern for such regions will either fail to pick
   up a significant proportion of the protein sequences that contain such
   a region (false negatives) or will pick up too many proteins that do
   not contain the region (false positives). The use of techniques based
   on weight matrices or profiles allows the detection of such proteins or
   domains. Dr. Philipp Bucher at ISREC in Lausanne and I are
   collaborating to incorporate such methods into PROSITE. This
   collaboration also includes other participants such as Roland Luethy
   (AMGEN), Michael Gribskov (SDSC) and Steve Altschul (NCBI). If you are
   interested in participating in this project please contact Philipp
   Bucher at:

                          pbucher@isrec-sun1.unil.ch

   Important notice for software developers: the integration of profiles
   into PROSITE will not "break" the current format. The profile entries
   in the PROFILE.DAT file will be tagged with the token "MATRIX" on the
   "ID" line (currently, only "PATTERN" and "RULE" are used as tokens); a
   new line type "MA" will be used in these entries to store all the
   parameters specific to weight matrices. The format of the PROSITE.DOC
   file will not be changed.
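
   To show what this means for existing parsers, here is a short Python
   sketch that reads the token on the "ID" line and tells profile entries
   apart from the other entry types. The function names are hypothetical,
   the entry name EXAMPLE_PATTERN is made up, and the sketch assumes that
   the "ID" line always has the form "ID   name; token." as in the
   example entry given later in these notes.

```python
from typing import Tuple

# Tokens named in these notes; "MATRIX" is the new one for profiles.
KNOWN_TOKENS = ("PATTERN", "RULE", "MATRIX")


def parse_id_line(line: str) -> Tuple[str, str]:
    """Return (entry name, token) from an ID line (hypothetical helper)."""
    if not line.startswith("ID"):
        raise ValueError("not an ID line: %r" % line)
    body = line[2:].strip().rstrip(".")
    parts = [part.strip() for part in body.split(";")]
    if len(parts) != 2:
        raise ValueError("expected 'name; token' in %r" % line)
    name, token = parts
    if token not in KNOWN_TOKENS:
        raise ValueError("unknown entry type %r" % token)
    return name, token


def is_profile(line: str) -> bool:
    """True when the ID line announces a weight matrix (profile) entry."""
    return parse_id_line(line)[1] == "MATRIX"


print(parse_id_line("ID   SH3; MATRIX."))
print(is_profile("ID   EXAMPLE_PATTERN; PATTERN."))
```

   A program that only handles patterns and rules could use such a test
   to skip the profile entries, including their "MA" lines.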

   The full  description of  the format  of the  PROSITE profile extensions
   will be  available in  a couple  of weeks as a user's manual (file name:
   PROFILE.TXT) that will be posted on the ExPASy and NCBI file servers.

   Organization   ExPASy (Geneva University Expert Protein Analysis System)
   Address        expasy.hcuge.ch  (or 129.195.254.61)
   Directory      /databases/prosite

   Organization   National Center for Biotechnology Information (NCBI)
   Address        ncbi.nlm.nih.gov (or 130.14.20.1)
   Directory      /repository/prosite


   Here is an example of a PROSITE profile:


ID   SH3; MATRIX.
AC   PS90001;
DT   JUN-1994 (CREATED); JUN-1994 (DATA UPDATE); JUN-1994 (INFO UPDATE).
DE   SH3 domain profile. 
CC   /TAXO-RANGE=??E??; /MAX-REPEAT=2;
MA   /GENERAL_SPEC: ALPHABET='ACDEFGHIKLMNPQRSTVWY';
MA   /DISJOINT: DEFINITION=PROTECT; N1=1; N2=53.
MA   /NORMALIZATION: MODE=1; FUNCTION=GRIBSKOV; R1=2.97; R2=-0.0035;
MA      R3=0.7386; R4=-1.001; R5=0.208; TEXT='ZScore';
MA   /NORMALIZATION: MODE=2; FUNCTION=LINEAR; R1=0.0; R2=100.0;
MA      TEXT='OrigScore';
MA   /CUT_OFF: LEVEL=0; SCORE=600; N_SCORE=7.0; MODE=1;
MA   /DEFAULT: MI=-26; I=-3; IM=0; MD=-26; D=-3; DM=0;
MA   /M: SY='F';M=-2,-3,-3,-4,2,-3,-2,1,-2,0,-1,-2,-3,-3,-4,-2,-1,0,-5,2;
MA   /M: SY='I';M=-1,-5,-2,-3,-2,-3,0,1,1,-1,1,-1,-2,-1,1,-1,0,1,-4,-4;
MA   /M: SY='A';M=2,-3,1,0,-5,2,-2,-1,-1,-3,-2,1,1,0,-2,2,2,0,-8,-5;
MA   /M: SY='L';M=-3,-8,-5,-4,2,-6,-2,2,-4,6,4,-3,-3,-2,-3,-3,-2,1,-3,0;
MA   /M: SY='Y';M=-4,-2,-6,-6,9,-7,0,-1,-5,-1,-3,-3,-6,-5,-6,-4,-4,-4,-1,11;
MA   /M: SY='D';M=1,-6,3,3,-7,0,0,-2,-1,-4,-3,2,0,1,-2,0,0,-2,-9,-6;
MA   /M: SY='Y';M=-5,-3,-6,-6,10,-7,-1,-1,-2,-1,-2,-3,-6,-5,-5,-4,-4,-4,-1,11;
MA   /M: SY='K';M=-1,-6,1,1,-4,-2,0,-2,2,-3,-1,1,-1,1,1,0,0,-3,-7,-6;
MA   /M: SY='A';M=1,-4,1,0,-5,1,-1,-1,0,-3,-1,1,0,0,0,1,1,-1,-7,-6;
MA   /M: SY='R';M=0,-5,0,0,-5,-1,0,-1,1,-3,-1,1,0,1,1,0,0,-2,-5,-5;
MA   /M: SY='R';M=0,-5,1,1,-6,0,1,-2,1,-4,-2,1,0,1,2,1,0,-2,-5,-5;
MA   /M: SY='E';M=1,-6,2,2,-6,0,0,-2,-1,-4,-2,1,1,1,-1,0,0,-3,-8,-6;
MA   /M: SY='D';M=0,-6,2,2,-6,0,1,-3,0,-5,-3,2,-1,2,-1,0,0,-4,-7,-4;
MA   /M: SY='D';M=0,-8,4,3,-6,0,0,-2,-1,-3,-2,2,-2,2,-2,0,-1,-3,-9,-6;
MA   /M: SY='L';M=-2,-8,-5,-5,2,-5,-3,3,-4,7,5,-4,-3,-3,-4,-3,-2,3,-4,-2;
MA   /M: SY='S';M=1,-4,1,1,-5,1,0,-2,1,-4,-2,1,0,0,0,1,1,-2,-6,-5;
MA   /M: SY='F';M=-3,-7,-6,-6,6,-5,-3,3,-2,5,3,-4,-5,-4,-5,-4,-3,1,-3,3;
MA   /M: SY='Q';M=-1,-6,0,0,-3,-2,1,-1,1,-2,0,0,-1,1,1,-1,0,-1,-6,-4;
MA   /M: SY='K';M=-1,-8,0,1,-3,-2,0,-2,3,-3,0,1,0,2,2,0,0,-3,-6,-6;
MA   /M: SY='G';M=2,-5,1,0,-7,7,-3,-4,-2,-6,-4,1,-1,-2,-4,2,0,-2,-10,-8;
MA   /M: SY='D';M=1,-7,5,4,-8,1,1,-3,0,-5,-3,2,-1,2,-2,0,0,-4,-10,-6;
MA   /M: SY='I';M=0,-5,-1,-2,-2,-2,-1,2,0,0,1,-1,-2,0,0,-1,0,1,-6,-5;
MA   /M: SY='L';M=-2,-6,-5,-5,3,-5,-3,4,-3,6,4,-4,-4,-3,-4,-3,-2,3,-5,0;
MA   /M: SY='Q';M=-1,-5,-1,-1,-3,-2,0,0,0,-2,-1,0,-1,0,0,-1,0,-1,-6,-3;
MA   /M: SY='V';M=0,-4,-3,-4,-1,-3,-3,5,-3,3,3,-2,-2,-2,-3,-2,0,5,-8,-4;
MA   /M: SY='L';M=-1,-6,-3,-3,-1,-3,-2,2,-3,3,2,-2,-2,-2,-3,-2,-1,2,-5,-3;
MA   /M: SY='D';M=0,-6,3,3,-6,0,1,-3,2,-5,-2,2,-1,2,1,0,0,-4,-7,-5;
MA   /M: SY='K';M=-1,-6,0,0,-2,-1,0,-3,3,-4,-1,1,-1,0,1,0,0,-3,-6,-4;
MA   /M: SY='N';M=1,-4,1,1,-5,0,0,-2,0,-3,-2,1,1,0,-1,1,1,-1,-7,-5;
MA      /I: MI=0; I=-1; MD=0; /M SY='X'; M=0; D=-1;
MA   /M: SY='G';M=1,-5,0,0,-5,1,-2,-1,-2,-3,-2,0,0,-1,-2,0,0,-1,-8,-6;
MA   /M: SY='G';M=1,-6,3,3,-7,3,0,-4,-1,-5,-4,2,-1,1,-2,1,0,-3,-10,-6;
MA   /M: SY='W';M=-9,-12,-9,-11,1,-11,-4,-8,-5,-3,-6,-6,-8,-7,3,-4,-8,-9,26,0;
MA   /M: SY='W';M=-7,-9,-9,-9,0,-9,-4,-5,-5,-1,-4,-6,-7,-6,2,-3,-6,-6,18,-1;
MA   /M: SY='K';M=-1,-7,0,0,-3,-2,0,-2,2,-3,-1,1,-1,1,2,0,-1,-3,-5,-5;
MA   /M: SY='G';M=2,-3,0,-1,-6,3,-3,-2,-3,-4,-3,0,0,-2,-3,1,0,0,-10,-6;
MA   /M: SY='Q';M=-2,-6,0,0,-3,-3,1,-2,0,-2,-1,0,-2,1,1,-1,-1,-3,-5,-3;
MA      /I: MI=0; I=-2; MD=0; /M SY='X'; M=0; D=-2;
MA   /M: SY='T';M=0,-4,-1,-1,-4,0,-2,0,-1,-2,0,0,-1,-1,-1,0,1,0,-7,-5;
MA   /M: SY='T';M=0,-5,0,0,-3,-1,-1,-1,1,-3,-1,1,-1,0,0,1,1,-1,-6,-4;
MA   /M: SY='G';M=0,-5,0,-1,-5,3,-2,-3,-1,-5,-3,0,-1,-1,-1,1,0,-2,-7,-6;
MA   /M: SY='K';M=0,-6,1,1,-5,-1,1,-2,2,-4,-1,1,-1,2,2,0,0,-3,-6,-6;
MA   /M: SY='R';M=-1,-6,-1,-1,-5,-3,1,-1,1,-3,-1,0,-1,1,3,-1,-1,-2,-2,-6;
MA   /M: SY='G';M=1,-5,0,0,-6,6,-3,-3,-3,-5,-4,0,-1,-2,-4,1,0,-2,-10,-6;
MA   /M: SY='W';M=-5,-5,-5,-5,2,-6,-2,-2,-4,-1,-3,-3,-6,-5,-3,-3,-4,-4,4,3;
MA   /M: SY='F';M=-3,-5,-6,-6,6,-5,-3,4,-1,3,2,-4,-4,-5,-4,-3,-2,2,-4,3;
MA   /M: SY='P';M=2,-4,-1,-1,-7,-1,0,-3,-2,-4,-3,-1,8,0,0,1,0,-2,-8,-7;
MA   /M: SY='G';M=1,-3,0,0,-4,2,-1,-2,0,-3,-2,0,0,-1,-1,1,1,-1,-6,-5;
MA   /M: SY='N';M=1,-5,2,1,-5,0,1,-2,1,-4,-2,2,0,0,0,1,1,-2,-7,-4;
MA   /M: SY='Y';M=-5,-1,-7,-7,10,-8,-1,-1,-5,-1,-3,-3,-7,-6,-6,-4,-4,-5,0,13;
MA   /M: SY='V';M=0,-3,-3,-5,-2,-2,-3,5,-3,2,2,-2,-2,-3,-4,-1,0,5,-8,-5;
MA   /M: SY='E';M=1,-6,2,3,-6,0,0,-2,1,-4,-2,1,0,2,0,0,0,-3,-8,-6;
MA   /M: SY='P';M=0,-5,-1,-1,-2,-2,-1,-2,-1,-3,-2,0,1,-1,-2,0,-1,-2,-6,-3;
//
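
   To give a feeling for how the "/M:" rows of such an entry can be
   used, here is a deliberately simplified Python sketch. It reads the
   match rows and adds up the scores of a sequence segment that has
   exactly one residue per row. It assumes that the 20 values of each row
   follow the order of the ALPHABET given on the /GENERAL_SPEC line. It
   ignores the "/I:" positions, the /DEFAULT gap parameters, the
   normalization and the cut-off, so it is not the profile search method
   itself; the function names are hypothetical.

```python
import re
from typing import Dict, List

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # copied from the /GENERAL_SPEC line

# Matches lines such as: MA   /M: SY='F';M=-2,-3,...,2;
M_LINE = re.compile(r"^MA\s+/M:\s*SY='(.)';\s*M=([-\d,]+);")


def parse_match_rows(lines: List[str]) -> List[Dict[str, int]]:
    """Return one {residue: score} table per '/M:' line, in file order."""
    rows = []
    for line in lines:
        found = M_LINE.match(line)
        if not found:
            continue  # /I:, /DEFAULT:, /CUT_OFF: and other lines are skipped
        scores = [int(value) for value in found.group(2).split(",")]
        if len(scores) != len(ALPHABET):
            raise ValueError("expected %d scores in %r" % (len(ALPHABET), line))
        rows.append(dict(zip(ALPHABET, scores)))
    return rows


def ungapped_score(rows: List[Dict[str, int]], segment: str) -> int:
    """Sum the match scores of a segment aligned to the rows without gaps."""
    if len(segment) != len(rows):
        raise ValueError("segment and matrix must have the same length")
    return sum(row[residue] for row, residue in zip(rows, segment))


# The /DEFAULT line and the first three match rows of the SH3 example.
example = [
    "MA   /DEFAULT: MI=-26; I=-3; IM=0; MD=-26; D=-3; DM=0;",
    "MA   /M: SY='F';M=-2,-3,-3,-4,2,-3,-2,1,-2,0,-1,-2,-3,-3,-4,-2,-1,0,-5,2;",
    "MA   /M: SY='I';M=-1,-5,-2,-3,-2,-3,0,1,1,-1,1,-1,-2,-1,1,-1,0,1,-4,-4;",
    "MA   /M: SY='A';M=2,-3,1,0,-5,2,-2,-1,-1,-3,-2,1,1,0,-2,2,2,0,-8,-5;",
]
rows = parse_match_rows(example)
print(ungapped_score(rows, "FIA"))  # prints 5 (2 + 1 + 2)
```

   Unlike a pattern, every residue contributes a graded score at every
   position, which is what allows weakly conserved domains to be picked
   up.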




                             WE NEED YOUR HELP !

   We welcome feedback from our users. We would especially appreciate it
   if you would notify us when you find that sequences belonging to your
   field of expertise are missing from the data bank. We would also like
   to be notified about annotations to be updated, for example if the
   function of a protein has been clarified or if new post-translational
   information has become available.


                         APPENDIX A: SOME STATISTICS



   A.1  Amino acid composition

        A.1.1  Composition in percent for the complete data bank

   Ala (A) 7.62   Gln (Q) 4.02   Leu (L) 9.20   Ser (S) 7.13
   Arg (R) 5.23   Glu (E) 6.26   Lys (K) 5.83   Thr (T) 5.82
   Asn (N) 4.47   Gly (G) 6.98   Met (M) 2.36   Trp (W) 1.29
   Asp (D) 5.28   His (H) 2.25   Phe (F) 4.01   Tyr (Y) 3.22
   Cys (C) 1.77   Ile (I) 5.59   Pro (P) 5.01   Val (V) 6.52

   Asx (B) 0.005  Glx (Z) 0.005  Xaa (X) 0.02


        A.1.2  Classification of the amino acids by their frequency

   Leu, Ala, Ser, Gly, Val, Glu, Lys, Thr, Ile, Asp, Arg, Pro, Asn, Gln,
   Phe, Tyr, Met, His, Cys, Trp
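
   The classification above follows directly from the percentages given
   in A.1.1. As a small worked example, the following Python sketch sorts
   the twenty standard amino acids by decreasing frequency; the variable
   names are arbitrary and the values are copied from the table in A.1.1.

```python
# Percentages from section A.1.1 (B, Z and X are left out).
composition = {
    "Ala": 7.62, "Arg": 5.23, "Asn": 4.47, "Asp": 5.28, "Cys": 1.77,
    "Gln": 4.02, "Glu": 6.26, "Gly": 6.98, "His": 2.25, "Ile": 5.59,
    "Leu": 9.20, "Lys": 5.83, "Met": 2.36, "Phe": 4.01, "Pro": 5.01,
    "Ser": 7.13, "Thr": 5.82, "Trp": 1.29, "Tyr": 3.22, "Val": 6.52,
}

# Sort the three-letter codes from the most to the least frequent.
ranked = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranked))
```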



   A.2  Distribution of the sequences by their organism of origin

   Total number of species represented in this release of SWISS-PROT: 4283

        A.2.1 Table of the frequency of occurrence of species

        Species represented 1x: 1939
                            2x:  703
                            3x:  398
                            4x:  250
                            5x:  179
                            6x:  160
                            7x:   94
                            8x:   68
                            9x:   87
                           10x:   47
                       11- 20x:  170
                       21- 50x:  111
                       51-100x:   36
                         >100x:   41


        A.2.2  Table of the most represented species


    Number   Frequency          Species
         1        2663          Human
         2        2555          Escherichia coli
         3        1731          Baker's yeast (Saccharomyces cerevisiae)
         4        1578          Mouse
         5        1453          Rat
         6         679          Bovine
         7         672          Caenorhabditis elegans
         8         600          Fruit fly (Drosophila melanogaster)
         9         529          Bacillus subtilis
        10         515          Chicken
        11         380          African clawed frog (Xenopus laevis)
        12         367          Rabbit
        13         352          Salmonella typhimurium
        14         327          Pig
        15         251          Vaccinia virus (strain Copenhagen)
        16         236          Maize
        17         211          Arabidopsis thaliana (Mouse-ear cress)
        18         200          Bacteriophage T4
        19         193          Human cytomegalovirus (strain AD169)
        20         183          Slime mold (Dictyostelium discoideum)
                   183          Vaccinia virus (strain WR)
        22         182          Rice
        23         176          Pseudomonas aeruginosa
        24         170          Tobacco
        25         169          Pea
        26         165          Wheat
        27         164          Fission yeast (Schizosaccharomyces pombe)
        28         149          Barley
        29         146          Variola virus
        30         138          Dog
        31         136          Soybean
        32         131          Staphylococcus aureus
                   131          Sheep
        34         129          Spinach
        35         122          Neurospora crassa
        36         120          Marchantia polymorpha (Liverwort)
        37         118          Rhodobacter capsulatus
        38         116          Pseudomonas putida
        39         112          Klebsiella pneumoniae
        40         108          Agrobacterium tumefaciens


   A.3  Distribution of the sequences by size



               From   To  Number             From   To   Number
                  1-  50    2081             1001-1100      342
                 51- 100    3595             1101-1200      227
                101- 150    4999             1201-1300      176
                151- 200    3459             1301-1400      103
                201- 250    3057             1401-1500      102
                251- 300    2681             1501-1600       51
                301- 350    2516             1601-1700       49
                351- 400    2572             1701-1800       40
                401- 450    1943             1801-1900       43
                451- 500    2065             1901-2000       31
                501- 550    1413             2001-2100       17
                551- 600    1014             2101-2200       43
                601- 650     714             2201-2300       50
                651- 700     533             2301-2400       18
                701- 750     514             2401-2500       22
                751- 800     393             >2500          114
                801- 850     305
                851- 900     319
                901- 950     199
                951-1000     200




   Currently the ten longest sequences are:


                            HTS1_COCCA  5217 a.a.
                             FAT_DROME  5147 a.a.
                            RYNR_RABIT  5037 a.a.
                            RYNR_HUMAN  5032 a.a.
                            RYNC_RABIT  4969 a.a.
                            DYHC_DICDI  4725 a.a.
                             APB_HUMAN  4563 a.a.
                            APOA_HUMAN  4548 a.a.
                            RRPA_CVMJH  4488 a.a.
                            DYHC_TRIGR  4466 a.a.


                         APPENDIX B: ON-LINE EXPERTS



   B.1  List of on-line experts for PROSITE and SWISS-PROT


Field of expertise            Name               Email address
---------------------------   ------------------ ----------------------------
Alcohol dehydrogenases        Joernvall H.       hans.jornvall@k1m.ki.se
                              Persson B.         bengt.persson@embl-heidelberg.de
Aldehyde dehydrogenases       Joernvall H.       hans.jornvall@k1m.ki.se
                              Persson B.         bengt.persson@embl-heidelberg.de
Alpha-crystallins/HSP-20      Leunissen J.A.M.   jackl@caos.caos.kun.nl
                              de Jong W.         u629000@hnykun11.bitnet
Alpha-2-macroglobulins        Van Leuven F.      fred@blekul13.bitnet
AA-tRNA synthetases class II  Leberman R.        leberman@frembl51.bitnet
Apolipoproteins               Boguski M.S.       boguski@ncbi.nlm.nih.gov
AraC family HTH proteins      Ramos J.L.         jlramos@cnbvx3.cnb.uam.es
Arrestins                     Kolakowski L.F.Jr. lfk@receptor.mgh.harvard.edu
Asparaginase / glutaminase    Gribskov M.        gribskov@sdsc.edu
ATP synthase c subunit        Recipon H.         recipon@ncbi.nlm.nih.gov
Band 4.1 family proteins      Rees J.            jrees@vax.oxford.ac.uk
Beta-lactamases               Brannigan J.       jab5@vaxa.york.ac.uk
Beta-transducin family        Boguski M.S.       boguski@ncbi.nlm.nih.gov
C-type lectin domain          Drickamer K.       drick@cuhhca.hhmi.columbia.edu
Chalcone/stilbene synthases   Schroeder J.       raf@sun1.ruf.uni-freiburg.de
Chaperonins cpn10/cpn60       Georgopoulos C.    georgopo@cmu.unige.ch
Chaperonins TCP1 family       Willison K.R.      willison@icr.ac.uk
Chitinases                    Henrissat B.       bernie@cermav.grenet.fr
Clusterin                     Peitsch M.C.       mcp13936@ggr.co.uk
Cold shock domain             Landsman D.        landsman@ncbi.nlm.nih.gov
CTF/NF-I                      Mermod N.          nmermod@ulys.unil.ch
                              Gronostajski R.    gronosr@ccsmtp.ccf.org
Cytochromes P450              Holsztynska E.J.   ela@netcom.uucp
                                                 netcom!ela@apple.com
DEAD-box helicases            Linder P.          linder@urz.unibas.ch
Deoxyribonuclease I           Peitsch M.C.       mcp13936@ggr.co.uk
dnaJ family                   Kelley W.          kelley@cmu.unige.ch
EF-hand calcium-binding       Cox J.A.           cox@sc2a.unige.ch
                              Kretsinger R.H.    rhk5i@virginia.bitnet
Elongation factor 1           Amons R.           wmbamons@rulgl.leidenuniv.nl
Enoyl-CoA hydratase           Hofmann K.O.       khofmann@biomed.biolan.uni-koeln.de
Fatty acid desaturases        Piffanelli P.      piffanelli@jii.afrc.ac.uk
fruR/lacI family HTH proteins Reizer J.          jreizer@ucsd.edu
GATA-type zinc-fingers        Boguski M.S.       boguski@ncbi.nlm.nih.gov
GDP/GTP dissociation stimul.  Boguski M.S.       boguski@ncbi.nlm.nih.gov
GltP family of transporters   Hofmann K.O.       khofmann@biomed.biolan.uni-koeln.de
Glucanases                    Henrissat B.       bernie@cermav.grenet.fr
                              Beguin P.          phycel@pasteur.bitnet
Glutamine synthetase          Tateno Y.          ytateno@genes.nig.ac.jp
G-protein coupled receptors   Chollet A.         arc3029@ggr.co.uk
                              Attwood T.K.       bph6tka@biovax.leeds.ac.uk
                              Kolakowski L.F.Jr. lfk@receptor.mgh.harvard.edu
GTPase-activating proteins    Boguski M.S.       boguski@ncbi.nlm.nih.gov
HIT family                    Seraphin B.        seraphin@embl-heidelberg.de
HMG1/2 and HMG-14/17          Landsman D.        landsman@ncbi.nlm.nih.gov
Inorganic pyrophosphatases    Kolakowski L.F.Jr. lfk@receptor.mgh.harvard.edu
Integrases                    Roy P.H.           2020000@saphir.ulaval.ca
Kringle domain                Ikeo K.            kikeo@genes.nig.ac.jp
Lipocalins                    Boguski M.S.       boguski@ncbi.nlm.nih.gov
                              Peitsch M.C.       mcp13936@ggr.co.uk
lysR family HTH proteins      Henikoff S.        henikoff@sparky.fhcrc.org
MAC components / perforin     Peitsch M.C.       mcp13936@ggr.co.uk
Malic enzymes                 Glynias M.         mglynias@ncsa.uiuc.edu
MAM domain                    Bork P.            bork@embl-heidelberg.de
MIP family proteins           Reizer J.          jreizer@ucsd.edu
Myelin proteolipid protein    Hofmann K.O.       khofmann@biomed.biolan.uni-koeln.de
Pancreatic trypsin inhibitor  Ikeo K.            kikeo@genes.nig.ac.jp
PEP requiring enzymes         Reizer J.          jreizer@ucsd.edu
pfkB carbohydrate kinases     Reizer J.          jreizer@ucsd.edu
Phosphomannose isomerases     Proudfoot A.E.I.   aep6830@ggr.co.uk
Phytochromes                  Partis M.D.        partis@afrc.ac.uk
Plant viruses icosahedral     Koonin E.V.        koonin@ncbi.nlm.nih.gov
capsid proteins
Protein kinases               Quinn A.M.         quinn@biomed.med.yale.edu
                              Hunter T.          hunter@salk-sc2.sdsc.edu
PTS proteins                  Reizer J.          jreizer@ucsd.edu
Restriction-modification      Bickle T.          bickle@urz.unibas.ch
            enzymes           Roberts R.J.       roberts@neb.com
Ribosomal protein S15         Ellis S.R.         srelli01@ulkyvm.bitnet
Ring-cleavage dioxygenases    Harayama S.        sharayam@ddbj.nig.ac.jp
Signal sequence peptidases    von Heijne G.      gvh@csb.ki.se
                              Dalbey R.E.        rdalbey@magnus.acs.ohio-state.edu
Sodium symporters             Reizer J.          jreizer@ucsd.edu
Subtilases                    Brannigan J.       jab5@vaxa.york.ac.uk
                              Siezen R.J.        nizo@caos.caos.kun.nl
Thiol proteases               Turk B.            turk@ijs.ac.mail.yu
Thiol proteases inhibitors    Turk B.            turk@ijs.ac.mail.yu
TNF family                    Jongeneel C.V.     vjongene@isrecmail.unil.ch
TPR repeats                   Boguski M.S.       boguski@ncbi.nlm.nih.gov
Transit peptides              von Heijne G.      gvh@csb.ki.se
Type-II membrane antigens     Levy S.            levy@cellbio.stanford.edu
Uracil-DNA glycosylase        Aasland R.         aasland@embl-heidelberg.de
Vitamin K-depend. Gla domain  Price P.A.         pprice@ucsd.edu
XPGC protein                  Clarkson S.G.      clarkson@cmu.unige.ch
Xylose isomerase              Jenkins J.         jenkins@frira.afrc.ac.uk
WAP-type domain               Claverie J.-M.     jmc@ncbi.nlm.nih.gov
ZP domain                     Bork P.            bork@embl-heidelberg.de

African swine fever virus     Yanez R.J.         ryanez@cbm2.uam.es
Bacteriophage P4              Halling C.         chh9@midway.uchicago.edu
Caenorhabditis elegans        Sonnhammer E.      esr@mrc-molecular.biology.cam.ac.uk
Chloroplast encoded proteins  Hallick R.B.       hallick@arizona.edu
Dictyostelium discoideum      Smith D.W.         dsmith@ucsd.edu
Drosophila                    Ashburner M.       ma11@phx.cam.ac.uk
Escherichia coli              Rudd K.            rudd@ncbi.nlm.nih.gov
Salmonella typhimurium        Rudd K.            rudd@ncbi.nlm.nih.gov
Snakes                        Stocklin R.        stocklin@cmu.unige.ch



   B.2  Requirements for becoming an on-line expert

   An expert should be a scientist who works with one or more specific
   families of proteins (or specific domains) and who would:

   a) Review the  protein sequences in SWISS-PROT and the patterns/matrices
      in PROSITE relevant to their field of research.
   b) Agree to be contacted by people who have obtained new sequence(s)
      that seem to belong to "their" family(ies) of proteins.
   c) Have access  to electronic  mail and be willing to use it to send and
      receive data.

   If you are willing to take part in this scheme, please contact Amos
   Bairoch at the following electronic mail address:

                             bairoch@cmu.unige.ch


           APPENDIX C: RELATIONSHIPS BETWEEN BIOMOLECULAR DATABASES

   The current  status of the relationships (cross-references) between some
   biomolecular databases is shown in the following schematic:


                                                       **********************
                        ***********************        * EPD [Euk. Promot.] *
                        *  EMBL Nucleotide    * <----> **********************
                        *  Sequence Data      *
******************      *  Library            *        **********************
* FLYBASE        * <--> *********************** <----- * ECD [E. coli map]  *
* [Drosophila    *                ^      ^             **********************
* genomic d.b.]  * <--------+     |      |
******************          |     |      +------------ **********************
                            |     |                    * TFD [Trans. fact.] *
                            |     |      +-----------> **********************
******************          |     |      |
* WormPep        *          |     |      |             **********************
* [C.elegans]    * <----+   |     |      |    +------> * DictyDB [D.disco.] *
******************      |   |     |      |    |        **********************
                        |   |     |      |    |
******************      |   v     v      v    v        **********************
* REBASE         *      ***********************        * ENZYME [Nomencl.]  *
* [Restriction   * <--- *  SWISS-PROT         * <----- **********************
*  enzymes]      *      *  Protein Sequence   *            |
******************      *  Data Bank          *            v
                        ***********************        **********************
******************       ^  ^  |  |  ^   ^  |          * OMIM   [Diseases]  *
* EcoGene/EcoSeq *       |  |  |  |  |   |  +--------> **********************
* [E. coli]      * <-----+  |  |  |  |   |
******************          |  |  |  |   +-----------> **********************
                            |  |  |  |                 * ECO2DBASE     [2D] *
                            |  |  |  |                 **********************
******************          |  |  |  |
* PROSITE        * <--------+  |  |  +---------------> **********************
* [Patterns]     *             |  |                    * SWISS-2DPAGE  [2D] *
******************             |  +---------------+    **********************
             |                 v                  |
             |          ***********************   |    **********************
             +--------> * PDB [3D structures] *   +--> * Aarhus/Ghent  [2D] *
                        ***********************        **********************













