
                    SWISS-PROT RELEASE 32.0 RELEASE NOTES


                               1. INTRODUCTION

   1.1  Evolution

   Release 32.0 of SWISS-PROT contains 49'340 sequence entries, comprising
   17'385'503 amino acids abstracted from 43'056 references. This
   represents an increase of 13.5% in the number of entries over release
   31. The recent growth of the data bank is summarized below.

   Release    Date   Number of entries     Nb of amino acids

   2.0        09/86               3939               900 163
   3.0        11/86               4160               969 641
   4.0        04/87               4387             1 036 010
   5.0        09/87               5205             1 327 683
   6.0        01/88               6102             1 653 982
   7.0        04/88               6821             1 885 771
   8.0        08/88               7724             2 224 465
   9.0        11/88               8702             2 498 140
   10.0       03/89              10008             2 952 613
   11.0       07/89              10856             3 265 966
   12.0       10/89              12305             3 797 482
   13.0       01/90              13837             4 347 336
   14.0       04/90              15409             4 914 264
   15.0       08/90              16941             5 486 399
   16.0       11/90              18364             5 986 949
   17.0       02/91              20024             6 524 504
   18.0       05/91              20772             6 792 034
   19.0       08/91              21795             7 173 785
   20.0       11/91              22654             7 500 130
   21.0       03/92              23742             7 866 596
   22.0       05/92              25044             8 375 696
   23.0       08/92              26706             9 011 391
   24.0       12/92              28154             9 545 427
   25.0       04/93              29955            10 214 020
   26.0       07/93              31808            10 875 091
   27.0       10/93              33329            11 484 420
   28.0       02/94              36000            12 496 420
   29.0       06/94              38303            13 464 008
   30.0       10/94              40292            14 147 368
   31.0       02/95              43470            15 335 248
   32.0       11/95              49340            17 385 503


      2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 31

   2.1  Sequences and annotations

   5'959 sequences  have been  added since release 31, the sequence data of
   921 existing  entries has  been updated  and the  annotations of  10'691
   entries have been revised.

   Major annotation and sequence updates have been made in preparation for
   the changes that will take place in release 33 (see section 3.1 of these
   notes).



   2.2  What's happening with the model organisms

   We have  selected a  number of  organisms that  are the target of genome
   sequencing and/or mapping projects and for which we intend to:

   -  Be as  complete as  possible. All sequences available at a given time
      should be  immediately included  in SWISS-PROT.  This  also  includes
      sequence corrections and updates;
   -  Provide a higher level of annotation;
   -  Provide cross-references  to specialized  database(s)  that  contain,
      among other  data, some genetic information about the genes that code
      for these proteins;
   -  Provide specific indices or documents.

   What was  done since  the last  release or  in preparation  for the next
   release concerning model organisms:

   -  We have added two species to the list of model organisms:

      Haemophilus influenzae. Haemophilus influenzae is the first bacterial
      genome to be completely sequenced. Its 1'830 Kb sequence was recently
      (Science 269:496-512(1995)) determined by a team from The Institute
      for Genomic Research (TIGR). This bacterial genome codes for an
      estimated 1'740 protein sequences. We have already annotated and
      incorporated about 85% of this data into SWISS-PROT. The remainder
      will be made available in the coming weeks.

      Candida albicans. We have added Candida albicans to the list of model
      organisms because  of the  extensive work  being done by Stew Scherer
      and colleagues at the Department of Microbiology of the University of
      Minnesota to  organize data  from that fungal organism. Their data is
      available from a WWW server:

                    http://alces.med.umn.edu/Candida.html
      
      We currently have in SWISS-PROT all the publicly available C.albicans
      protein sequences.

   -  We have started a major effort to catch up with the backlog of
      sequences from Arabidopsis thaliana. About 150 entries have been
      added since release 31. This effort will be continued and expanded in
      the coming months.

   -  We have added to SWISS-PROT all the sequences from yeast chromosome
      VI. All the data from yeast chromosome X is also in preparation and
      will be available in a few days with the first weekly update of
      SWISS-PROT. Yeast sequence entries are now cross-referenced to both
      LISTA and SGD (see section 2.3). We plan to work on chromosome XII
      and XIII entries very soon.

   -  We are  regularly adding  data coming  from the  S.pombe chromosome I
      sequencing project.  About  180  S.pombe  entries  were  added  since
      release 31.

   -  Although we  added 234 entries from C.elegans, we have not yet caught
      up with  the backlog  of  sequence  data  produced  from  the  genome
      sequencing project of that organism. We hope to be able to clear up a
      significant part of that backlog for release 33.

   -  We are  almost up  to date  concerning Bacillus subtilis (306 entries
      added),  Escherichia   coli  (317   entries  added)   and  Salmonella
      typhimurium (35 entries added).

   -  A big effort is still needed to take care of human (214 entries
      added) and Drosophila (57 entries added) sequences.

   -  We plan  to add Mycoplasma genitalium (the second bacterial genome to
      be completely sequenced) as a model organism in release 33.

   Here is the current status of the model organisms:

   Organism         Database               Index file       Number of
                    cross-referenced                        sequences
   --------------   ---------------------  --------------   ---------
   A.thaliana       None yet               In preparation         432
   B.subtilis       SubtiList              SUBTILIS.TXT          1389
   C.albicans       None yet               CALBICAN.TXT           100
   C.elegans        WormPep                CELEGANS.TXT           924
   D.discoideum     DictyDB                DICTY.TXT              213
   D.melanogaster   FlyBase                In preparation         768
   E.coli           EcoGene                ECOLI.TXT             3468
   H.influenzae     None yet               HAEINFLU.TXT          1575
   H.sapiens        MIM                    MIMTOSP.TXT           3281
   S.cerevisiae     LISTA/SGD              YEAST.TXT             3391
   S.typhimurium    StyGene                SALTY.TXT              603
   S.pombe          None yet               POMBE.TXT              460
   S.solfataricus   None yet               None yet                61


   2.3  Changes in the DR line and other news about cross-references

   We have added cross-references from SWISS-PROT to the Saccharomyces
   Genome Database (SGD) (previously known as SacchDB) prepared under the
   supervision of Michael Cherry at the Stanford University School of
   Medicine. These cross-references are present in the DR lines:

   Data bank identifier: SGD
   Primary identifier:   Unique identifier attributed by  SGD to  the  gene
                         coding for the protein
   Secondary identifier: The gene designation (name)
   Example:              DR   SGD; L0000008; AAR2.
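
   As an illustration only, a DR line of this form can be split into its
   three fields with a few lines of Python. The function and field names
   below are assumptions made for this sketch; they are not part of any
   SWISS-PROT software.

```python
from typing import NamedTuple


class CrossReference(NamedTuple):
    databank: str       # data bank identifier, e.g. SGD
    primary_id: str     # primary identifier
    secondary_id: str   # secondary identifier


def parse_dr_line(line: str) -> CrossReference:
    """Split a three-field DR line into its data bank and identifiers."""
    if not line.startswith("DR "):
        raise ValueError(f"not a DR line: {line!r}")
    # Drop the two-letter line code, surrounding blanks and the final period.
    body = line[2:].strip().rstrip(".")
    fields = [field.strip() for field in body.split(";")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, found {len(fields)}: {line!r}")
    return CrossReference(*fields)


xref = parse_dr_line("DR   SGD; L0000008; AAR2.")
print(xref.databank, xref.primary_id, xref.secondary_id)  # SGD L0000008 AAR2
```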



   We very recently started to receive pre-releases of protein
   3D-structure entries directly from PDB. Thanks to this new development,
   we will be able to keep the cross-references between SWISS-PROT and PDB
   up to date. Currently there are 920 SWISS-PROT entries that are
   cross-referenced to PDB, but we need to catch up with a small backlog
   corresponding to the significant increase in the number of PDB entries
   in the last six months. We plan to be synchronized with PDB starting
   with release 33.

   There are  currently 174'439  DR lines in SWISS-PROT, an average of 3.53
   cross-references per entry.


   2.4  Replacement of RM line by RX line

   In this  release, the RM (Reference Medline) line has been replaced by a
   more 'generic'  line called  RX (Reference cross-references). The format
   of that line is:

   RX   BIBLIOGRAPHIC_DATABASE_NAME; IDENTIFIER.

   As of this release, the only "bibliographic_database_name" that is used
   is "MEDLINE" and the associated "identifier" is the eight-digit Medline
   Unique Identifier (UID). However, it is 'rumored' that additional
   bibliographic databases are interested in being linked to the sequence
   databases.

   Example:

   RM   91002678

   has been changed to:

   RX   MEDLINE; 91002678.

   There are currently 64'668 Medline cross-references (RX) in SWISS-PROT.
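
   Software that still reads files in the old format can apply the change
   mechanically. The following Python sketch is a hypothetical helper (the
   name rm_to_rx is ours); it assumes one RM line per reference and an
   eight-digit UID, as described above, and leaves every other line
   untouched.

```python
import re

# An old-style line: the RM line code followed by an eight-digit Medline UID.
RM_LINE = re.compile(r"^RM\s+(\d{8})\s*$")


def rm_to_rx(line: str) -> str:
    """Return the RX equivalent of an RM line; other lines pass through."""
    match = RM_LINE.match(line)
    if match is None:
        return line
    return f"RX   MEDLINE; {match.group(1)}."


print(rm_to_rx("RM   91002678"))  # RX   MEDLINE; 91002678.
```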



   2.5  Status of the documentation files

   SWISS-PROT is distributed with a large number of documentation files.
   Some of these files have been available for a long time (the user
   manual, release notes, the various indices for authors, citations,
   keywords, etc.), but many have been created recently and we are
   continuously adding new files. Since release 31, we have added 8 new
   document files. The following table lists all the documents that are
   either currently available or that we plan to add in the next few
   months.

   USERMAN .TXT   User manual
   RELNOTES.TXT   Release notes
   SHORTDES.TXT   Short description of entries in SWISS-PROT
   JOURLIST.TXT   List of abbreviations for journals cited
   KEYWLIST.TXT   List of keywords in use
   SPECLIST.TXT   List of organism identification codes
   EXPERTS .TXT   List of on-line experts for PROSITE and SWISS-PROT
   SUBMIT  .TXT   Submission of sequence data to the SWISS-PROT data bank [1]

   ACINDEX .TXT   Accession number index
   AUTINDEX.TXT   Author index
   CITINDEX.TXT   Citation index
   KEYINDEX.TXT   Keyword index
   SPEINDEX.TXT   Species index

   7TMRLIST.TXT   List of 7-transmembrane G-linked receptors entries
   AATRNASY.TXT   List of aminoacyl-tRNA synthetases [1]
   ALLERGEN.TXT   Nomenclature and index of allergen sequences [1]
   CALBICAN.TXT   Index of Candida albicans entries and their corresponding
                  gene designations [1]
   CDLIST  .TXT   CD nomenclature for surface proteins of human leucocytes
   CELEGANS.TXT   Index of Caenorhabditis elegans entries and their
                  corresponding gene designations and WormPep cross-
                  references
   DICTY   .TXT   Index of Dictyostelium discoideum entries and their
                  corresponding gene designations and DictyDB cross-
                  references
   EC2DTOSP.TXT   Index of Escherichia coli Gene-protein database entries
                  referenced in SWISS-PROT
   ECOLI   .TXT   Index of Escherichia coli K12 chromosomal entries and
                  their corresponding EcoGene cross-reference
   EMBLTOSP.TXT   Index of EMBL Database entries referenced in SWISS-PROT
   EXTRADOM.TXT   Nomenclature of extracellular domains
   GLYCOSYL.TXT   Index of glycosyl hydrolases classified by families on the
                  basis of sequence similarities [2]
   HAEINFLU.TXT   Index of Haemophilus influenzae RD chromosomal entries [1]
   HOXLIST .TXT   Vertebrate homeotic Hox proteins: nomenclature and index
   HUMCHR21.TXT   Index of protein sequence entries encoded on human
                  chromosome 21
   HUMCHR22.TXT   Index of protein sequence entries encoded on human
                  chromosome 22 [1]
   HUMCHRY .TXT   Index of protein sequence entries encoded on human
                  chromosome Y
   MIMTOSP .TXT   Index of MIM entries referenced in SWISS-PROT
   NOMLIST .TXT   List of nomenclature related references for proteins
   PDBTOSP .TXT   Index of Brookhaven PDB entries referenced in SWISS-PROT
   PEPTIDAS.TXT   Classification of peptidase families and index of peptidases
                  entries [1]
   PLASTID .TXT   List of chloroplast and cyanelle encoded proteins
   POMBE   .TXT   Index of Schizosaccharomyces pombe entries in SWISS-PROT
                  and their corresponding gene designations
   RESTRIC .TXT   List of restriction enzymes and methylases entries
   RIBOSOMP.TXT   Index of ribosomal proteins classified by families on the
                  basis of sequence similarities [2]
   SALTY   .TXT   Index of Salmonella typhimurium  LT2 chromosomal entries
                  and their corresponding StyGene cross-references
   SUBTILIS.TXT   Index of Bacillus subtilis 168 chromosomal entries and
                  their corresponding SubtiList cross-references
   YEAST   .TXT   Index of Saccharomyces cerevisiae entries and their
                  corresponding gene designations [3]
   YEAST1  .TXT   Yeast Chromosome I entries
   YEAST2  .TXT   Yeast Chromosome II entries
   YEAST3  .TXT   Yeast Chromosome III entries
   YEAST5  .TXT   Yeast Chromosome V entries
   YEAST6  .TXT   Yeast Chromosome VI entries [1]
   YEAST8  .TXT   Yeast Chromosome VIII entries
   YEAST9  .TXT   Yeast Chromosome IX entries
   YEAST10 .TXT   Yeast Chromosome X entries [2]
   YEAST11 .TXT   Yeast Chromosome XI entries

   Notes:

   [1]  New in release 32.
   [2]  Will be available starting with release 33 in February 1996.
   [3]  The format of that file was changed to add cross-references to SGD.


   We have also started to include in SWISS-PROT document files listings
   of World-Wide Web (WWW) sites relevant to the subject under
   consideration. For example, in the "POMBE.TXT" file, you will find the
   following lines:

   More  information   on  Schizosaccharomyces,  its  genome,  biology  and
   genetics, is available from the following WWW pages:

   NIH : http://www.nih.gov/sigs/yeast/fission.html
   Salk: http://flosun.salk.edu/users/forsburg/lab.html
   UCL : http://t-chappell.mcbl.ucl.ac.uk/


   2.6  The Expasy World-Wide Web server

        2.6.1  Background information

   The most efficient and user-friendly way to browse interactively in
   SWISS-PROT, PROSITE, ENZYME, SWISS-2DPAGE and other databases is to use
   the World-Wide Web (WWW) molecular biology server ExPASy. WWW is a
   global information retrieval system merging the power of world-wide
   networks, hypertext and multimedia. Through hypertext links, it gives
   access to documents and information available on thousands of servers
   around the world. To access a WWW server one needs a WWW browser.
   Popular browsers available for most computer platforms include
   Mosaic(TM), developed at the National Center for Supercomputing
   Applications (NCSA) of the University of Illinois at Urbana-Champaign
   (it may be obtained by anonymous ftp from ftp.ncsa.uiuc.edu) and
   Netscape Navigator(TM) from Netscape Communications Corp. (available
   from ftp.netscape.com). Using a WWW browser, one has access to all the
   hypertext documents stored on the ExPASy server as well as many other
   WWW servers.

   The ExPASy server was made available to the public in September 1993. In
   November 1995 a cumulative total of 3 million connections was attained.
   The server may be accessed through its Uniform Resource Locator (URL -
   the addressing system defined in WWW), which is:

        http://expasy.hcuge.ch/

   The ExPASy WWW server allows access, using the user-friendly hypertext
   model, to the SWISS-PROT, PROSITE, ENZYME, SWISS-2DPAGE and
   SWISS-3DIMAGE databases and, through any SWISS-PROT protein sequence
   entry, to other databases such as EMBL, EcoCyc, FlyBase, GCRDb, LISTA,
   MaizeDB, SubtiList, OMIM, PDB, HSSP, ProDom, REBASE, SGD, YEPD and
   Medline. Using a browser which is able to display images one can also
   remotely access 2D gel image data from SWISS-2DPAGE. ExPASy also offers
   many tools for the analysis of protein sequences and 2D gels.

   For more  information on  the  ExPASy  WWW  server,  you  can  read  the
   following article:

      Appel R.D., Bairoch A., Hochstrasser D.F.
      A new  generation of  information retrieval tools for biologists: the
      example of the ExPASy WWW server.
      Trends Biochem. Sci. 19:258-260(1994).

   Or you can contact Dr. Ron Appel:

      Email: ron.appel@dim.hcuge.ch
      Fax: +41-22-372 61 98


        2.6.2  SWISS-SHOP

   Thanks to the work of Manuel Peitsch from the Geneva Glaxo Institute for
   Molecular Biology, we can provide, on ExPASy, a service called
   SWISS-SHOP. SWISS-SHOP allows any user of SWISS-PROT to indicate which
   proteins he/she is interested in. This can be done using various
   criteria that can be combined:

   -  By entering  one  or  more  words  that  should  be  present  in  the
      description line;
   -  By entering one or more species name(s) or taxonomic division(s);
   -  By entering one or more keywords;
   -  By entering one or more author names;
   -  By entering the accession number (or entry name) of a PROSITE pattern
      or a user-defined sequence pattern;
   -  By entering  the accession  number (or  entry name)  of  an  existing
      SWISS-PROT entry or by entering a "private" sequence.

   Every week,  the new  sequences entered  in SWISS-PROT are automatically
   compared with all the criteria that have been defined by the users. If a
   sequence corresponds  to the  selection criteria defined by a user, that
   sequence is sent by electronic mail.
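
   The weekly comparison can be pictured with the following Python sketch.
   It is not the SWISS-SHOP implementation: the Entry and Criteria
   classes, the entry names and the mail address are hypothetical, only
   the four text-based criteria are covered (not pattern or sequence
   similarity), and we assume that combined criteria must all be
   satisfied.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical, simplified view of a new SWISS-PROT entry."""
    name: str
    description: str
    species: str
    keywords: set = field(default_factory=set)
    authors: set = field(default_factory=set)


@dataclass
class Criteria:
    """Hypothetical user request; an empty field places no restriction."""
    description_words: set = field(default_factory=set)
    species: set = field(default_factory=set)
    keywords: set = field(default_factory=set)
    authors: set = field(default_factory=set)


def matches(entry: Entry, wanted: Criteria) -> bool:
    """Assumption: combined criteria must all hold (logical AND)."""
    words = set(entry.description.upper().split())
    checks = [
        not wanted.description_words
        or {w.upper() for w in wanted.description_words} <= words,
        not wanted.species or entry.species in wanted.species,
        not wanted.keywords or bool(wanted.keywords & entry.keywords),
        not wanted.authors or bool(wanted.authors & entry.authors),
    ]
    return all(checks)


def weekly_mailing(new_entries, requests):
    """Map each user address to the names of the new entries to send."""
    return {
        user: [e.name for e in new_entries if matches(e, wanted)]
        for user, wanted in requests.items()
    }


entries = [
    Entry("TEST1_YEAST", "SPLICING FACTOR", "Saccharomyces cerevisiae"),
    Entry("TEST2_CANAL", "CYTOCHROME B", "Candida albicans"),
]
requests = {
    "user@example.org": Criteria(
        description_words={"cytochrome"}, species={"Candida albicans"}
    )
}
print(weekly_mailing(entries, requests))
# {'user@example.org': ['TEST2_CANAL']}
```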





        2.6.3  What is new on ExPASy

   Since  the   last  release,  there  has  been  a  large  number  of  new
   developments on the ExPASy WWW server. Here are some highlights of these
   changes:

   -  A new option has been introduced that allows you to search in
      SWISS-PROT, PROSITE and SeqAnalRef by citation. When you call this
      option, you are prompted to enter the name of a journal and
      optionally a volume number and/or a year. The program is written in
      such a way that you can enter either the full name of a journal or
      its official abbreviation (as listed in the JOURLIST.TXT file), with
      or without periods. It is also able to recognize special
      abbreviations such as JBC, NAR, PNAS, etc. So, for example, you can
      enter any of the following:

      Journal of Biological Chemistry
      J. Biol. Chem.
      J Biol Chem
      JBC

      If you  do not  enter a  valid journal  name or abbreviation, it will
      show you the list of those that could potentially match your input.


   -  We have  improved the  options that allow you to search in SWISS-PROT
      by 'description' or by 'full text':

        If your search criteria return a list that contains more than two
        entries, you now have the option to save these SWISS-PROT entries
        into a file which is stored (for up to a week) on the ExPASy
        anonymous FTP server. Thus it is now possible for users to create
        custom subsets of the database and to download them to their
        computer.

        If your search criteria do not return any entry, you can, if you
        believe that the sequence(s) that you are looking for are currently
        missing in SWISS-PROT, send a message to the SWISS-PROT team so
        that they can take steps to ensure that these sequence(s) are added
        to the database.

   -  The Journal  of Biological  Chemistry (JBC)  has a  WWW server  where
      abstracts and  full text of articles are made available. We are happy
      to announce  the implementation  of what  we believe  to be the first
      direct link  in a  sequence database between a reference and the full
      text version  of a cited article. Recent JBC references in SWISS-PROT
      and PROSITE  are directly  linked to the corresponding entry point in
      the JBC server.

   -  ProtParam is a new tool which we have implemented and that allows the
      computation of  various physical  and chemical parameters for a given
      protein stored  in SWISS-PROT  or for  a user  entered sequence.  The
      computed parameters  include the  molecular weight,  theoretical  pI,
      amino acid  composition, extinction coefficient, estimated half-life,
      instability index and aliphatic index.

   -  RandSeq is  a new  tool which generates random protein sequences. You
      can choose the length of the sequence to be created as well as choose
      between four  different options  for the composition of the generated
      sequence: equal  composition for all amino acids; use the composition
      of  a   specific  sequence   from  SWISS-PROT;   average  amino  acid
      composition (computed from SWISS-PROT); user specified composition in
      percent.

   -  WWW links have been implemented between SWISS-PROT yeast entries and
      SGD (see section 2.3), as well as between Escherichia coli K12
      chromosomal entries and the EcoCyc database, the Encyclopedia of
      E. coli Genes and Metabolism.

   -  Most SWISS-PROT  documents are  now directly  linked to  relevant WWW
      servers or specific documents (see section 2.5).

   -  Many other changes have been made to all parts of the server.
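
   The journal name matching used by the search-by-citation option (first
   item of the list above) could work along the following lines. This
   Python sketch is only an illustration and not the code running on
   ExPASy: the two-journal table is a hypothetical excerpt of what
   JOURLIST.TXT would supply, and the substring rule used to propose
   candidates is an assumption.

```python
import re

# Hypothetical excerpt: official abbreviation -> full journal name.
JOURNALS = {
    "J. Biol. Chem.": "Journal of Biological Chemistry",
    "Trends Biochem. Sci.": "Trends in Biochemical Sciences",
}
# Special short forms, mapped to the official abbreviation.
SPECIAL = {"JBC": "J. Biol. Chem."}


def normalise(text: str) -> str:
    """Fold case, drop periods and collapse runs of blanks."""
    return re.sub(r"\s+", " ", text.replace(".", " ")).strip().upper()


def build_index() -> dict:
    """Index every accepted spelling under its normalised form."""
    index = {}
    for abbreviation, full_name in JOURNALS.items():
        index[normalise(abbreviation)] = abbreviation
        index[normalise(full_name)] = abbreviation
    for short, abbreviation in SPECIAL.items():
        index[normalise(short)] = abbreviation
    return index


INDEX = build_index()


def lookup(user_input: str):
    """Return (official abbreviation, []) or (None, list of candidates)."""
    key = normalise(user_input)
    if key in INDEX:
        return INDEX[key], []
    # No exact match: propose every journal with a spelling containing the input.
    candidates = sorted({abbr for k, abbr in INDEX.items() if key and key in k})
    return None, candidates


for text in ("Journal of Biological Chemistry", "J. Biol. Chem.",
             "J Biol Chem", "JBC"):
    print(lookup(text))   # ('J. Biol. Chem.', []) in all four cases
print(lookup("Bio"))      # (None, ['J. Biol. Chem.', 'Trends Biochem. Sci.'])
```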


   2.7  Weekly updates of SWISS-PROT

   Weekly updates of SWISS-PROT are available by anonymous FTP. Three files
   are updated at each update:

   new_seq.dat    Contains all the new entries since the last full release;
   upd_seq.dat    Contains the entries for which the sequence data has been
                  updated since the last release;
   upd_ann.dat    Contains the  entries for  which one  or more  annotation
                  fields have been updated since the last release.

   Currently these  files are  available on  the  following  anonymous  ftp
   servers:

   Organization   ExPASy (Geneva University Expert Protein Analysis System)
   Address        expasy.hcuge.ch  (or 129.195.254.61)
   Directory      /databases/swiss-prot/updates

   Organization   National Center for Biotechnology Information (NCBI)
   Address        ncbi.nlm.nih.gov (or 130.14.20.1)
   Directory      /repository/swiss-prot/updates

   Organization   European Bioinformatics Institute (EBI)
   Address        ftp.ebi.ac.uk (or 193.62.196.6)
   Directory      /pub/databases/swissprot/new

   Organization   Bioinformatics Unit, Weizmann Institute of Science (WIS)
   Address        bioinformatics.weizmann.ac.il (or 132.76.55.12)
   Directory      /pub/databases/swiss-prot/updates

   !!! Important notes !!!

   Although we try to follow a regular schedule, we do not promise to
   update these files every week. In some cases two weeks will elapse
   between two updates.

   Due to  the current  mechanism used  to build a release the entries that
   are provided in these updates are not guaranteed to be error free.



                      3. IMPORTANT FORTHCOMING CHANGES

   3.1  Major changes to the cross-references to EMBL

   In the  next release,  the format  of the  DR (Database cross-Reference)
   lines pointing  to EMBL  Nucleotide Sequence  Database entries  will  be
   changed from:

   DR   EMBL; ACCESSION_NUMBER; ENTRY_NAME.

   to:

   DR   EMBL; ACCESSION_NUMBER; PID; STATUS_IDENTIFIER.

   Here 'PID' stands for the "Protein IDentification" number. It is a
   number that you will find from EMBL release 45 onwards (and GenBank
   release 94.0 onwards) in a qualifier called "/db_xref" which is attached
   to every CDS in the nucleotide database. Example:

   FT   CDS            54..1382
   FT                  /note="ribulose-1,5-bisphosphate carboxylase/
   FT                  oxygenase activase precursor"
   FT                  /db_xref="PID:g1006835"

   When an EMBL database CDS exists as a sequence report in SWISS-PROT, the
   DR lines of the corresponding SWISS-PROT entry will be updated by citing
   the PID as secondary identifier. In all cases where a PID has been
   integrated into SWISS-PROT, a "/db_xref" qualifier citing the
   corresponding SWISS-PROT entry will be added to the EMBL database CDS
   labeled with this PID. Example:

   FT   CDS             14556..15696
   FT                   /gene="cytochrome b"
   FT                   /codon_start=1
   FT                   /product="apoprotein"
   FT                   /db_xref="PID:g463170"
   FT                   /db_xref="SWISS-PROT:P12778"

   This approach  enables us  to point  precisely from  a given  SWISS-PROT
   entry to one of potentially many CDS in the corresponding EMBL entry and
   vice versa.  This change  will allow  the development  of software tools
   that automatically retrieve the part of a nucleotide sequence entry that
   codes for  a specific  protein. This  will be  especially useful  in the
   context of  World-Wide Web  as  it  will  render  obsolete  the  current
   situation where,  for  example,  one  needs  to  retrieve  the  complete
   sequence of  a yeast  chromosome when  one wants the nucleotide sequence
   coding for a specific protein encoded on that chromosome.
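
   A software tool of the kind mentioned above would first have to collect
   the "/db_xref" qualifiers attached to each CDS. The following Python
   sketch shows one possible way to do so. It is an illustration under
   simplifying assumptions (a feature key starts in column 6 of an FT
   line, and each qualifier fits on one line); the function name is
   hypothetical.

```python
import re

# A qualifier of the form /db_xref="DATABASE:IDENTIFIER".
DB_XREF = re.compile(r'/db_xref="([^":]+):([^"]+)"')


def cds_cross_references(feature_lines):
    """Return a list of (location, {database: identifier}) pairs, one per CDS."""
    features = []
    current = None  # cross-references of the CDS being read, if any
    for line in feature_lines:
        if not line.startswith("FT"):
            continue
        if line[5:6].strip():
            # A non-blank column 6 starts a new feature: "KEY   LOCATION".
            key, _, location = line[5:].strip().partition(" ")
            current = None
            if key == "CDS":
                current = {}
                features.append((location.strip(), current))
        elif current is not None:
            # Qualifier line belonging to the current CDS.
            for database, identifier in DB_XREF.findall(line):
                current[database] = identifier
    return features


example = [
    'FT   CDS             14556..15696',
    'FT                   /gene="cytochrome b"',
    'FT                   /codon_start=1',
    'FT                   /product="apoprotein"',
    'FT                   /db_xref="PID:g463170"',
    'FT                   /db_xref="SWISS-PROT:P12778"',
]
print(cds_cross_references(example))
# [('14556..15696', {'PID': 'g463170', 'SWISS-PROT': 'P12778'})]
```

   With the location and the PID in hand, a tool can extract only the
   stretch of the nucleotide entry that codes for the protein of interest.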

   This major change has been in preparation for the last six months; it
   is one of the reasons that release 32 was delayed so long. In the course
   of cross-referencing at the level of the "PID", we had to manually check
   thousands of problem cases. This led to many sequence and annotation
   updates.

   An additional important principle of the PID system is that whenever a
   change made to the nucleotide entry, or to the annotations of that
   entry, produces a modification in the translated protein sequence, the
   PID number corresponding to the modified CDS is replaced by a
   completely new number. The old number will be kept in a special field
   attached to the CDS. The exact syntax of this field is under discussion
   among the international nucleotide databases.

   The  new   cross-referencing   system   will   allow   a   much   closer
   interconnection between  SWISS-PROT  and  the  international  nucleotide
   sequence databases.  For example, it will allow us to automatically take
   into account  sequence updates  made to  the nucleotide entry when these
   updates have an impact on the derived protein sequence(s).

   It should also be noted that the "PID" numbers in the context of GenBank
   replace the  "NCBI gi" numbering system which was present in the "/note"
   qualifier. The "gi" identifiers for the nucleic acid sequences have been
   replaced by "NID" (nucleic acid identifier) numbers.

   The 'STATUS_IDENTIFIER'  provides  information  about  the  relationship
   between the  sequence in  the  SWISS-PROT  entry  and  the  CDS  in  the
   corresponding EMBL entry.

   a) In  most cases  the translation  of the  EMBL nucleotide sequence CDS
   results in  the same  sequence as  shown in the corresponding SWISS-PROT
   entry or  the differences  are mentioned  in the SWISS-PROT feature (FT)
   lines as  CONFLICT, VARIANT  or VARSPLIC  and in  the RP lines. In these
   cases the status identifier shows a dash ("-").

   Example:

   DR   EMBL; Y00312; G63880; -.

   b) In some cases the translation of the EMBL nucleotide sequence CDS
   results in a sequence different from the sequence shown in the
   corresponding SWISS-PROT entry, and the differences are either not
   mentioned in the SWISS-PROT feature (FT) lines as CONFLICT, VARIANT or
   VARSPLIC and in the RP lines, or simply do not meet the criteria for
   such situations.

   1) If the  difference is  due to a different start of the sequence (e.g.
      SWISS-PROT believes  that the  start of  the sequence  is upstream or
      downstream of  the site annotated as the start of the sequence in the
      EMBL database),  the status  identifier shows the comment "ALT_INIT".
      Example:

        DR   EMBL; L29151; G466334; ALT_INIT.

   2) If the  difference is  due to a different termination of the sequence
      (e.g. SWISS-PROT  believes that  the termination  of the  sequence is
      upstream or  downstream of  the site  annotated as  the  end  of  the
      sequence in  the EMBL  database), the  status  identifier  shows  the
      comment "ALT_TERM". Example:

        DR   EMBL; L20562; G398099; ALT_TERM.


   3) If the  difference is  due to  frameshifts in  the EMBL sequence, the
      status identifier shows the comment "ALT_FRAME". Example:

        DR   EMBL; M95935; G146416; ALT_FRAME.


   4) If the difference is not due to the cases mentioned above (e.g. wrong
      intron-exon boundaries  given in  the EMBL  entry) or to a mixture of
      the cases  mentioned above,  the status  identifier shows the comment
      "ALT_SEQ". Example:

        DR   EMBL; X79206; G809602; ALT_SEQ.

   c) In some cases the nucleotide sequence of a complete CDS is divided
   into exons present in different EMBL entries. We point to the
   exon-containing EMBL entries by citing the PID as secondary identifier
   and adding the comment "JOINED" in the status identifier. These EMBL
   entries do not contain a CDS feature; they contain exons joined to a
   CDS feature which is labeled with the given PID.

   Example:

   DR   EMBL; M63397; G177196; -.
   DR   EMBL; M63395; G177196; JOINED.
   DR   EMBL; M63396; G177196; JOINED.

   In the  above example  the SWISS-PROT  sequence is  derived from the CDS
   labeled with  the PID G177196. This CDS feature can be found in the EMBL
   entry M63397.  Exons belonging  to this  CDS are  not only found in EMBL
   entry M63397, but also in the EMBL entries M63395 and M63396.

   d) In  some cases  there is  no CDS  feature key  annotating  a  protein
   translation in  an EMBL entry and thus no PID for that CDS. Therefore it
   is not  possible for  us to point to a PID as a secondary identifier. In
   these cases  we point  to the  relevant EMBL entries by including a dash
   ("-") in  the position  of the  missing PID and "NOT_ANNOTATED_CDS" into
   the status identifier.

   Example:

   DR   EMBL; J04126; -; NOT_ANNOTATED_CDS.
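
   To summarize cases a) to d), the following Python sketch reads
   new-style DR EMBL lines, checks the status identifier against the
   values listed above, and groups the EMBL entries that share a PID. The
   function names and the returned structure are assumptions made for
   this illustration, and the final syntax may still change before
   release 33.

```python
from collections import defaultdict

# Status identifiers described in cases a) to d) above.
STATUS_IDENTIFIERS = {
    "-", "ALT_INIT", "ALT_TERM", "ALT_FRAME", "ALT_SEQ",
    "JOINED", "NOT_ANNOTATED_CDS",
}


def parse_embl_dr(line):
    """Return (accession, pid, status); pid is None when no CDS is annotated."""
    if not line.startswith("DR "):
        raise ValueError(f"not a DR line: {line!r}")
    fields = [f.strip() for f in line[2:].strip().rstrip(".").split(";")]
    if len(fields) != 4 or fields[0] != "EMBL":
        raise ValueError(f"not a new-style DR EMBL line: {line!r}")
    _, accession, pid, status = fields
    if status not in STATUS_IDENTIFIERS:
        raise ValueError(f"unknown status identifier: {status!r}")
    return accession, (None if pid == "-" else pid), status


def group_by_pid(lines):
    """Map each PID to the entry holding the CDS and those holding joined exons."""
    groups = defaultdict(lambda: {"cds": None, "joined": []})
    for line in lines:
        accession, pid, status = parse_embl_dr(line)
        if pid is None:
            continue  # case d): no PID to group under
        if status == "JOINED":
            groups[pid]["joined"].append(accession)
        else:
            groups[pid]["cds"] = accession
    return dict(groups)


dr_lines = [
    "DR   EMBL; M63397; G177196; -.",
    "DR   EMBL; M63395; G177196; JOINED.",
    "DR   EMBL; M63396; G177196; JOINED.",
]
print(group_by_pid(dr_lines))
# {'G177196': {'cds': 'M63397', 'joined': ['M63395', 'M63396']}}
```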








   3.2  TREMBL - a supplement to SWISS-PROT

   The ongoing genome sequencing and mapping projects have dramatically
   increased the number of protein sequences to be incorporated into
   SWISS-PROT. Since we do not want to dilute the quality standards of
   SWISS-PROT by incorporating sequences without proper sequence analysis
   and annotation, we cannot speed up the incorporation of new incoming
   data indefinitely. But as we also want to make the sequences available
   as fast as possible, we will introduce with SWISS-PROT a
   computer-annotated supplement to SWISS-PROT. This supplement consists
   of entries in SWISS-PROT-like format derived from the translation of
   all coding sequences (CDS) in the EMBL nucleotide sequence database,
   except the CDS already included in SWISS-PROT.

   We name  this supplement  TREMBL  (TRanslation  from  EMBL),  since  the
   translation tools  used to  create the translations of the CDS are based
   on the  program  'trembl'  written  by  Thure  Etzold  at  the  EMBL  in
   Heidelberg.

   We will translate all CDS in the EMBL Nucleotide Sequence Database into
   TREMBL preentries. Preentries that are already present as sequence
   reports in SWISS-PROT will be excluded from TREMBL. The remaining
   entries will then be merged automatically whenever possible to reduce
   redundancy in TREMBL. This step will lead to approximately 90'000
   TREMBL entries, which supplement SWISS-PROT.

   We will split TREMBL into two main sections, SP-TREMBL and REM-TREMBL:

   SP-TREMBL (SWISS-PROT TREMBL) will contain the entries (about 75'000)
   which should be incorporated into SWISS-PROT. SP-TREMBL will be
   partially redundant with SWISS-PROT, since approximately 40'000 of
   these SP-TREMBL entries will only be additional sequence reports of
   proteins already in SWISS-PROT. We will try to merge these sequence
   reports as fast as possible with the existing SWISS-PROT entries for
   these proteins, so as to make SWISS-PROT and TREMBL completely
   nonredundant.

   REM-TREMBL (REMaining  TREMBL) will  contain the  entries (about 15'000)
   that we  do not  want to  include in  SWISS-PROT. This  section will  be
   organized in four subsections:

   1) Most REM-TREMBL entries will be immunoglobulins and T-cell receptors.
      We stopped entering immunoglobulins and T-cell receptors into
      SWISS-PROT because we only want to keep the germ line gene derived
      translations of these proteins in SWISS-PROT, and not all known
      somatically recombined variants of these proteins. We expect more
      than 10'000 immunoglobulins and T-cell receptors in TREMBL. We
      would like to create a specialized database dealing with these
      sequences as a further supplement to SWISS-PROT and keep only a
      representative cross-section of these proteins in SWISS-PROT.

   2) Another category of data which will not be included in SWISS-PROT is
      synthetic sequences. Again, we do not want to leave these entries in
      TREMBL. Ideally one should build a specialized database for
      artificial sequences as a further supplement to SWISS-PROT.



<PAGE>


   3) A third subsection consists of fragments with fewer than seven amino
      acids.

   4) The last subsection consists of CDS translations for which we have
      strong evidence that the CDS does not code for a real protein.

   The first full release of TREMBL will be distributed with release 34 of
   SWISS-PROT. However, we will make a beta release available with release
   33, so that users and software developers can send us feedback about
   this new supplement to SWISS-PROT.



   3.3  Introduction of a new CC line-type topic (MASS SPECTROMETRY)

   We will  introduce in  the next  release a  new 'topic' for the comments
   (CC) line-type: MASS SPECTROMETRY. This topic will be used to report the
   exact molecular  weight of  a protein or part of a protein as determined
   by mass spectrometric methods. The syntax of this new topic will be:

   CC   -!- MASS SPECTROMETRY: MW=XX[; MW_ERR=XX]; METHOD=XX[; RANGE=XX-XX].

   Where:

   -  "MW=XX" is the determined molecular weight (MW);
   -  "MW_ERR=XX" (optional) is the accuracy or error range of the MW
      measurement;
   -  "METHOD=XX" is the mass spectrometric method: "ELECTROSPRAY" is used
      for electrospray ionization (ESI) and "MALDI" is used for
      matrix-assisted laser desorption/ionization;
   -  "RANGE=XX-XX" (optional) is used to indicate what part of the protein
      sequence entry corresponds to the molecular weight. If this qualifier
      is not  present, the  MW value  corresponds to the full length of the
      protein sequence.

   Examples of its usage:

   CC   -!- MASS SPECTROMETRY: MW=13423.3; METHOD=ELECTROSPRAY.
   CC   -!- MASS SPECTROMETRY: MW=71890; MW_ERR=7; METHOD=ELECTROSPRAY.
   CC   -!- MASS SPECTROMETRY: MW=8597.5; METHOD=ELECTROSPRAY; RANGE=40-119.

   It should  be noted  that the  syntax of this topic may evolve in future
   releases as  we expect  feedback from groups using mass spectrometry for
   protein identification on 2D gels, MW determination and characterization
   of post-translational modifications.
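
   As an illustration of the syntax described in this section, the
   following Python sketch splits a MASS SPECTROMETRY comment into its
   qualifiers. It assumes the qualifier names MW, MW_ERR, METHOD and RANGE
   given in the syntax definition; the names MassSpec and parse_mass_spec
   are hypothetical, and the sketch may need adapting if the syntax
   evolves.

```python
import re
from typing import NamedTuple, Optional, Tuple


class MassSpec(NamedTuple):
    mw: float                                  # determined molecular weight
    mw_err: Optional[float]                    # optional error of the measurement
    method: str                                # e.g. "ELECTROSPRAY" or "MALDI"
    residue_range: Optional[Tuple[int, int]]   # optional part of the sequence


def parse_mass_spec(line: str) -> MassSpec:
    """Parse one 'CC   -!- MASS SPECTROMETRY: ...' comment line."""
    match = re.match(r"CC\s+-!-\s+MASS SPECTROMETRY:\s*(.+)$", line.strip())
    if match is None:
        raise ValueError("not a MASS SPECTROMETRY comment: %r" % line)
    body = match.group(1)
    if body.endswith("."):                     # drop the terminating period
        body = body[:-1]
    qualifiers = {}
    for item in body.split(";"):
        key, sep, value = item.strip().partition("=")
        if not sep:
            raise ValueError("malformed qualifier: %r" % item)
        qualifiers[key] = value
    if "MW" not in qualifiers or "METHOD" not in qualifiers:
        raise ValueError("MW and METHOD are mandatory")
    residue_range = None
    if "RANGE" in qualifiers:
        start, _, end = qualifiers["RANGE"].partition("-")
        residue_range = (int(start), int(end))
    mw_err = float(qualifiers["MW_ERR"]) if "MW_ERR" in qualifiers else None
    return MassSpec(float(qualifiers["MW"]), mw_err,
                    qualifiers["METHOD"], residue_range)
```

   When RANGE is absent the residue_range field is None, which mirrors the
   rule that the MW value then corresponds to the full length of the
   protein sequence.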



   3.4  Change in the syntax of the SQ line

   The SQ  (SeQuence header)  line marks the beginning of the sequence data
   and gives  a quick  summary of its content. The format of the SQ line is
   currently:

   SQ   SEQUENCE  XXXX AA; XXXXX MW;  XXXXX CN;



<PAGE>



   The line contains the length of the sequence in amino acids (AA),
   followed by the molecular weight (MW) rounded to the nearest gram and a
   checking number (CN), as shown in the example:

   SQ   SEQUENCE 104 AA; 11530 MW; 54319 CN;

   Starting with the next release, we will replace the checking number (CN)
   by a 32-bit CRC (Cyclic Redundancy Check) value. The new syntax will be:

   SQ   SEQUENCE  XXXX AA; XXXXX MW;  XXXXXXXX CRC32;

   Example:

   SQ   SEQUENCE   104 AA;  11530 MW;  7A70363C CRC32;
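
   Software that reads SWISS-PROT entries may have to accept both forms of
   the SQ line during the transition. The following Python sketch shows
   one way to do this; it only extracts the fields and does not compute
   or verify the checksum. The names SqLine and parse_sq_line are
   hypothetical.

```python
import re
from typing import NamedTuple

SQ_PATTERN = re.compile(
    r"^SQ\s+SEQUENCE\s+(\d+)\s+AA;\s+(\d+)\s+MW;\s+([0-9A-Fa-f]+)\s+(CN|CRC32);\s*$"
)


class SqLine(NamedTuple):
    length: int          # number of amino acids
    mol_weight: int      # molecular weight as given on the line
    checksum: str        # checksum text exactly as written
    checksum_type: str   # "CN" (old syntax) or "CRC32" (new syntax)


def parse_sq_line(line: str) -> SqLine:
    """Parse an SQ line in either the CN or the CRC32 syntax."""
    match = SQ_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("unrecognized SQ line: %r" % line)
    length, mol_weight, checksum, checksum_type = match.groups()
    return SqLine(int(length), int(mol_weight), checksum, checksum_type)
```

   The checksum is kept as text because the checking number is decimal
   while the CRC32 value is written in hexadecimal; a caller can convert
   the latter with int(checksum, 16) when checksum_type is "CRC32".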




                            4. ENZYME AND PROSITE


   4.1  The ENZYME data bank

        4.1.1  Content of the release

   Release 19.0 of the ENZYME data bank is distributed with release 32 of
   SWISS-PROT. ENZYME release 19.0 contains information on 3601 enzymes.
   We have updated the data bank with new information released by the
   Nomenclature Committee of IUBMB.


        4.1.2  Improvements in the ENZYME section of the ExPASy WWW server

   On ExPASy, the display of ENZYME entries has been completely redesigned
   to make it more readable. One of the changes is that each compound
   listed in a reaction is presented on a separate line. Example:

   UDP-GLUCOSE + 2 NAD(+) + H(2)O = UDP-GLUCURONATE + 2 NADH.

   is now shown as:

         UDP-GLUCOSE
      +  2 NAD(+)
      +  H(2)O
     <=> UDP-GLUCURONATE
      +  2 NADH.
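
   The transformation shown above can be sketched in a few lines of
   Python. This is only an illustration, not the code used by the ExPASy
   server; it assumes that compounds are separated by " + " and the two
   sides of the reaction by " = ", both with surrounding spaces, so that
   the "+" inside names such as "NAD(+)" is left alone. The function name
   format_reaction is hypothetical.

```python
def format_reaction(reaction: str) -> str:
    """Return the reaction with one compound per line."""
    left, separator, right = reaction.strip().partition(" = ")
    if not separator:
        raise ValueError("no ' = ' found in reaction: %r" % reaction)
    lines = []
    for side_index, side in enumerate((left, right)):
        for compound_index, compound in enumerate(side.split(" + ")):
            if compound_index > 0:
                prefix = " +  "     # further compound on the same side
            elif side_index == 1:
                prefix = "<=> "     # first compound of the right-hand side
            else:
                prefix = "    "     # first compound of the reaction
            lines.append(prefix + compound)
    return "\n".join(lines)


print(format_reaction(
    "UDP-GLUCOSE + 2 NAD(+) + H(2)O = UDP-GLUCURONATE + 2 NADH."))
```

   All prefixes have the same width, so the compound names line up in a
   single column as in the example.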


   Links have been added to the Klotho database of metabolic compounds
   maintained by Toni Kazic at the Institute for Biomedical Computing at
   Washington University in St. Louis.






<PAGE>


   4.2  The PROSITE data bank

        4.2.1  Statistics for release 13

   Release 13.0  of the PROSITE data bank is distributed with release 32 of
   SWISS-PROT. This  release of  PROSITE contains 889 documentation entries
   that describe  1'167 different  patterns, rules  and  profiles/matrices.
   Since the  last full  release (12.0  of June  1994)  we  added  104  new
   documentation entries  and updated  499 entries.  Therefore 68%  of  all
   PROSITE entries are either new or updated.

   Out of a total of 49'340 entries in SWISS-PROT, 24'137 are
   cross-referenced in PROSITE (excluding the false positives). This
   accounts for approximately 49% of the sequences in SWISS-PROT.
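
   The two percentages quoted in this section follow directly from the
   counts given above. A short Python check, using only those counts
   (the variable names are ours):

```python
documentation_entries = 889      # PROSITE documentation entries
new_entries = 104                # added since release 12.0
updated_entries = 499            # updated since release 12.0
swissprot_entries = 49340        # entries in SWISS-PROT release 32
cross_referenced = 24137         # SWISS-PROT entries cross-referenced in PROSITE

new_or_updated_pct = 100 * (new_entries + updated_entries) / documentation_entries
cross_referenced_pct = 100 * cross_referenced / swissprot_entries

print("new or updated:   %.1f%%" % new_or_updated_pct)    # 67.8%
print("cross-referenced: %.1f%%" % cross_referenced_pct)  # 48.9%
```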

        4.2.2  List of the new entries in release 13

      C1q domain signature
      Death domain profile
      Forkhead-associated (FHA) domain profile
      PH domain profile
      Src homology 2 (SH2) domain profile
      Src homology 3 (SH3) domain profile
      WW/rsp5/WWP domain signature and profile
      S-layer homology domain signature
      Prokaryotic dksA/traR C4-type zinc finger
      Copper-fist domain
      Bacterial regulatory proteins, iclR family signature
      Bacterial regulatory proteins, marR family signature
      Bacterial regulatory proteins, tetR family signature
      Sigma-70 factors ECF subfamily signature
      Ribosomal protein L10 signature
      Ribosomal protein L24 signature
      Ribosomal protein L31 signature
      Ribosomal protein L7Ae signature
      Ribosomal protein L13e signature
      Ribosomal protein L18e signature
      Ribosomal protein L24e signature
      Ribosomal protein L27e signature
      Ribosomal protein L31e signature
      Ribosomal protein L34e signatures
      Ribosomal protein L35Ae signature
      Ribosomal protein L37e signature
      Ribosomal protein S6 signature
      Homoserine dehydrogenase signature
      Aspartate-semialdehyde dehydrogenase signature
      Pyridoxamine 5'-phosphate oxidase signature
      Respiratory-chain NADH dehydrogenase 20 Kd subunit signature
      Respiratory-chain NADH dehydrogenase 24 Kd subunit signature
      NNMT/PNMT/TEMT family of methyltransferases signature
      Ribosomal RNA adenine dimethylases signature
      Squalene and phytoene synthases signatures
      ROK family signature





<PAGE>



      Casein kinase II regulatory subunit signature
      Shikimate kinase signature
      Prokaryotic diacylglycerol kinase signature
      Acetate and butyrate kinases family signatures
      RNA polymerases H / 23 Kd subunits signature
      RNA polymerases L / 13 to 16 Kd subunits signature
      RNA polymerases N / 8 Kd subunits signature
      RNA polymerases RPB6 / 6 Kd subunits signature
      Lipolytic enzymes "G-D-S-L" family, serine active site
      Class A bacterial acid phosphatases signature
      Phosphatidylinositol-specific phospholipase C profiles
      DNA/RNA non-specific endonucleases active site
      Thermonuclease family signature
      Chitinases family 18 signature
      Glycosyl hydrolases family 45 active site
      ATP-dependent serine proteases, lon family, serine active site
      Interleukin-1 beta converting enzyme family active sites
      Hydroxymethylglutaryl-coenzyme A lyase active site
      DNA photolyases class 2 signatures
      Adenylate cyclases class-I signatures
      Ribulose-phosphate 3-epimerase family signatures
      PpiC-type peptidyl-prolyl cis-trans isomerase signature
      Glucosamine/galactosamine-6-phosphate isomerases signature
      Terpene synthases signature
      SAICAR synthetase signatures
      NAD-dependent DNA ligase signatures
      Transposases, IS30 family, signature
      Molybdenum cofactor biosynthesis proteins signatures
      Radical activating enzymes signature
      Electron transfer flavoprotein beta-subunit signature
      Heavy-metal-associated domain
      Bacterial extracellular solute-binding proteins, family 1 signature
      Bacterial extracellular solute-binding proteins, family 3 signature
      Bacterial extracellular solute-binding proteins, family 5 signature
      Sulfate transporters signature
      Xanthine/uracil permeases family signature
      OmpA-like domain
      GPR1/FUN34/yaaH family signature
      FtsZ protein signatures
      Kinesin light chain repeat
      Bacterial microcompartiments proteins signature
      Flagella transport protein fliP family signatures
      Macrophage migration inhibitory factor family signature
      Scorpion short toxins signature
      GrpE protein signature
      Bacterial type II secretion system protein C signature
      Bacterial type II secretion system protein N signature
      Protein secE/sec61-gamma signature
      Fimbrial biogenesis outer membrane usher protein signature
      Apoptosis regulator proteins, Bcl-2 family signature
      GTP-binding nuclear protein ran signature
      Elongation factor Ts signatures
      Translation initiation factor SUI1 signature
      Calponin family repeat



<PAGE>



      CAP protein signatures
      Hydrogenases expression/synthesis hupF/hypC family signature
      NOL1/NOP2/fmu family signature
      Hypothetical SUA5/yciO/yrdC family signature
      Hypothetical YBL055c/yjjV family signatures
      Hypothetical YBR002c family signature
      Hypothetical YBR177c/yheT family signature
      Hypothetical YER057c/yjgF family signature
      Hypothetical YKL151c/yjeF family signatures
      Hypothetical hesB/yadR/yfhF family signature
      Hypothetical yabO/yceC/yfiI family signature
      Hypothetical yciL/yejD/yjbC family signature
      Hypothetical yedF/yeeD/yhhP family signature
      Hypothetical yhdG/yjbN/yohI family signature



        4.2.3  Status of profiles in PROSITE

   This is  the second  release of PROSITE to include weight matrices (also
   known as  profiles). The last release included only two profile entries;
   this release includes 16 profiles. Seven of these profiles are described
   by documentation entries that are linked to both a signature pattern and
   a profile.

   Since a profile is, in general, much more sensitive than a pattern, you
   should try to make use of the profile if you have access to the
   necessary software tools to do so.

   Many new profiles are being prepared and will be progressively added to
   PROSITE. We also plan to upgrade some unsatisfactory pattern entries to
   profiles.

        4.2.4  Software to make use of the profiles

   A set of two programs (for Unix systems) has been developed by Philipp
   Bucher to make use of the PROSITE profile entries:

   pfscan    scans a single sequence for the occurrences of several
             PROSITE profile entries.
   pfsearch  searches a sequence database for occurrences of a single
             PROSITE profile entry.

   These programs  are  available  from  the  ISREC  anonymous  ftp  server
   "ulrec3.unil.ch"; the files are located in the directory "/pub/pftools".

   From WWW, you can use "ProfileScan", an ISREC service that allows you
   to scan a sequence against the profile entries in PROSITE; the URL for
   this service is:

               http://ulrec3.unil.ch/software/profilescan.html

   A link to this tool is also provided by the ExPASy WWW server.




<PAGE>



        4.2.5  Changes in the format of the PROSITE.DAT file

   In the  NR line  (Numerical  Results)  we  changed  the  format  of  the
   "/FALSE_NEG" qualifier and added a new qualifier, "/PARTIAL".

   The syntax  of the  "/FALSE_NEG" qualifier  which reports  the number of
   known  missed   sequences  used  to  be:  "/FALSE_NEG=x(y);"  where  `x'
   represented the  number of  hits and  `y' the  number of  sequences;  we
   simplified this  syntax to  "/FALSE_NEG=y;"  where  `y'  represents  the
   number of sequences.

   The new qualifier "/PARTIAL" is used to indicate the number of partial
   sequences which belong to the set under consideration, but which are
   not hit by the pattern or profile because they are partial (fragment)
   sequences. Its syntax is "/PARTIAL=y;" where `y' represents the number
   of sequences.


   Example of a complete block of NR lines:

   NR   /RELEASE=32,49340;
   NR   /TOTAL=123(56); /POSITIVE=115(51); /UNKNOWN=5(2); /FALSE_POS=3(3);
   NR   /FALSE_NEG=3; /PARTIAL=2;


   In the above example the scan for the pattern (or profile) was done on
   release 32 of SWISS-PROT, which contains 49'340 sequence entries. The
   pattern (or profile) was found 123 times in 56 different sequences
   (/TOTAL). Out of those 123 `hits', 115 were produced by 51 sequences
   that belong to the set under consideration (/POSITIVE), 5 hits were
   produced by two sequences which could possibly belong to the set
   (/UNKNOWN) and 3 hits were produced by 3 other sequences (/FALSE_POS).
   That particular pattern missed 3 sequences (/FALSE_NEG), and there were
   two partial sequences that belong to the set under consideration but
   do not include the region that contains that pattern (or profile)
   (/PARTIAL).
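
   A block of NR lines in this format can be read with a short routine.
   The Python sketch below assumes that every qualifier has the form
   "/NAME=value;" and that a value is either "x(y)" (hits and sequences),
   a single number of sequences, or, for /RELEASE, a release number
   followed by the number of entries. The function name parse_nr_block
   and the dictionary keys are hypothetical.

```python
import re

QUALIFIER = re.compile(r"/([A-Z_]+)=([^;]+);")
HITS_AND_SEQUENCES = re.compile(r"(\d+)\((\d+)\)")


def parse_nr_block(lines):
    """Collect the qualifiers of a block of NR lines into a dictionary."""
    result = {}
    for line in lines:
        if not line.startswith("NR"):
            continue                      # ignore other line types
        for name, value in QUALIFIER.findall(line):
            pair = HITS_AND_SEQUENCES.fullmatch(value)
            if name == "RELEASE":
                release, entries = value.split(",")
                result[name] = {"release": release, "entries": int(entries)}
            elif pair:
                result[name] = {"hits": int(pair.group(1)),
                                "sequences": int(pair.group(2))}
            else:
                result[name] = {"sequences": int(value)}
    return result


block = [
    "NR   /RELEASE=32,49340;",
    "NR   /TOTAL=123(56); /POSITIVE=115(51); /UNKNOWN=5(2); /FALSE_POS=3(3);",
    "NR   /FALSE_NEG=3; /PARTIAL=2;",
]
numbers = parse_nr_block(block)
```

   In the example block the hits reported under /POSITIVE, /UNKNOWN and
   /FALSE_POS add up to the /TOTAL figure (115 + 5 + 3 = 123), and so do
   the sequence counts (51 + 2 + 3 = 56).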



        4.2.6  New feature in the PROSITE.DOC file

   Starting with release 13, we have added a new form of reference in the
   PROSITE documentation file (PROSITE.DOC). These references are of the
   form "[En]", where "n" is a number. These references are used to point
   to electronic documents available on the World-Wide Web. Example:












<PAGE>



   {BEGIN}
   ********************************
   * AAA-protein family signature *
   ********************************

   A large  family of  ATPases has  been described  [1 to  5,E1] whose  key
   feature is that they  share  a conserved region of about 220 amino acids
   that contains an ATP-binding site. This  family  is now called AAA,  for
   'A'TPases 'A'ssociated

   ..Lots of lines deleted..

   [ 5] Confalonieri F., Duguet M.
        BioEssays 17:639-650(1995).
   [E1] http://yeamob.pci.chemie.uni-tuebingen.de/AAA/Description.html
   {END}

   When displaying a PROSITE documentation entry on the ExPASy WWW server,
   it is of course possible to access these electronic references
   directly. While this change seems minor, we consider it the first step
   in the establishment of an on-line decentralized encyclopedia for
   protein families.
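
   Programs that process PROSITE.DOC can collect the electronic references
   of an entry by looking for reference lines that start with "[En]". The
   Python sketch below assumes that each such reference occupies one line
   and that the URL is the first word after the label, as in the example
   above; the function name electronic_references is hypothetical.

```python
import re

# "[E1] http://..." at the start of a line, possibly indented.
E_REFERENCE = re.compile(r"^\s*\[E(\d+)\]\s+(\S+)", re.MULTILINE)


def electronic_references(entry_text: str) -> dict:
    """Map labels such as 'E1' to the URL given in the reference list."""
    return {"E" + number: url
            for number, url in E_REFERENCE.findall(entry_text)}


entry = """\
[ 5] Confalonieri F., Duguet M.
     BioEssays 17:639-650(1995).
[E1] http://yeamob.pci.chemie.uni-tuebingen.de/AAA/Description.html
"""
references = electronic_references(entry)
```

   Ordinary literature references such as "[ 5]" are not matched, so only
   the electronic references are returned.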



                             WE NEED YOUR HELP !

   We welcome feedback from our users. We would especially appreciate it
   if you would notify us when you find that sequences belonging to your
   field of expertise are missing from the data bank. We would also like
   to be notified about annotations that need updating, for example when
   the function of a protein has been clarified or new post-translational
   information has become available.
























<PAGE>



                         APPENDIX A: SOME STATISTICS



   A.1  Amino acid composition

        A.1.1  Composition in percent for the complete data bank

   Ala (A) 7.58   Gln (Q) 4.03   Leu (L) 9.29   Ser (S) 7.17
   Arg (R) 5.17   Glu (E) 6.31   Lys (K) 5.91   Thr (T) 5.76
   Asn (N) 4.52   Gly (G) 6.88   Met (M) 2.36   Trp (W) 1.26
   Asp (D) 5.30   His (H) 2.23   Phe (F) 4.04   Tyr (Y) 3.21
   Cys (C) 1.70   Ile (I) 5.70   Pro (P) 4.92   Val (V) 6.52

   Asx (B) 0.001  Glx (Z) 0.001  Xaa (X) 0.02


        A.1.2  Classification of the amino acids by their frequency

   Leu, Ala, Ser, Gly, Val, Glu, Lys, Thr, Ile, Asp, Arg, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Cys, Trp
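
   The classification in A.1.2 is simply the table of A.1.1 sorted by
   decreasing percentage, leaving out the ambiguity codes Asx, Glx and
   Xaa. A short Python sketch that reproduces it from the values above
   (the variable names are ours):

```python
composition = {
    "Ala": 7.58, "Gln": 4.03, "Leu": 9.29, "Ser": 7.17,
    "Arg": 5.17, "Glu": 6.31, "Lys": 5.91, "Thr": 5.76,
    "Asn": 4.52, "Gly": 6.88, "Met": 2.36, "Trp": 1.26,
    "Asp": 5.30, "His": 2.23, "Phe": 4.04, "Tyr": 3.21,
    "Cys": 1.70, "Ile": 5.70, "Pro": 4.92, "Val": 6.52,
}

# Sort the residue names from most frequent to least frequent.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```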



   A.2  Distribution of the sequences by their organism of origin

   Total number of species represented in this release of SWISS-PROT: 4921

        A.2.1  Table of the frequency of occurrence of species

        Species represented 1x: 2231
                            2x:  776
                            3x:  441
                            4x:  272
                            5x:  200
                            6x:  198
                            7x:  117
                            8x:   95
                            9x:  103
                           10x:   50
                       11- 20x:  194
                       21- 50x:  147
                       51-100x:   49
                         >100x:   48













<PAGE>



        A.2.2  Table of the most represented species

    Number   Frequency          Species
         1        3468          Escherichia coli
         2        3391          Baker's yeast (Saccharomyces cerevisiae)
         3        3281          Human
         4        1978          Mouse
         5        1773          Rat
         6        1575          Haemophilus influenzae
         7        1389          Bacillus subtilis
         8         924          Caenorhabditis elegans
         9         800          Bovine
        10         768          Fruit fly (Drosophila melanogaster)
        11         605          Chicken
        12         603          Salmonella typhimurium
        13         479          African clawed frog (Xenopus laevis)
        14         460          Fission yeast (Schizosaccharomyces pombe)
        15         432          Rabbit
                   432          Arabidopsis thaliana (Mouse-ear cress)
        17         376          Pig
        18         282          Maize
        19         275          Bacteriophage T4
        20         251          Vaccinia virus (strain Copenhagen)
        21         236          Rice
        22         232          Pseudomonas aeruginosa
        23         213          Slime mold (Dictyostelium discoideum)
        24         205          Tobacco
        25         193          Human cytomegalovirus (strain AD169)
        26         190          Pea
        27         183          Vaccinia virus (strain WR)
                   183          Wheat
        29         173          Barley
        30         165          Staphylococcus aureus
        31         161          Soybean
        32         160          Pseudomonas putida
                   160          Dog
        34         157          Rhodobacter capsulatus
        35         155          Neurospora crassa
        36         154          Autographa californica nuclear polyhedrosis virus
        37         150          Marchantia polymorpha (Liverwort)
        38         148          Sheep
                   148          Klebsiella pneumoniae
        40         146          Variola virus
                   146          Bacillus stearothermophilus
        42         138          Spinach
        43         130          Tomato
        44         124          Potato
        45         122          Rhizobium meliloti
                   122          Mycobacterium leprae
        47         117          Lactococcus lactis (subsp. lactis)
        48         116          Agrobacterium tumefaciens
        49         100          Candida albicans
                   100          Chlamydomonas reinhardtii
                   100          Streptomyces coelicolor



<PAGE>



   A.3  Distribution of the sequences by size

               From   To  Number             From   To   Number
                  1-  50    2622             1001-1100      445
                 51- 100    4679             1101-1200      318
                101- 150    6342             1201-1300      239
                151- 200    4810             1301-1400      151
                201- 250    4339             1401-1500      134
                251- 300    3837             1501-1600       83
                301- 350    3650             1601-1700       61
                351- 400    3624             1701-1800       60
                401- 450    2762             1801-1900       64
                451- 500    2777             1901-2000       40
                501- 550    1982             2001-2100       23
                551- 600    1412             2101-2200       51
                601- 650    1036             2201-2300       56
                651- 700     782             2301-2400       23
                701- 750     713             2401-2500       30
                751- 800     568             >2500          145
                801- 850     431
                851- 900     457
                901- 950     322
                951-1000     272


   A.4  Longest sequences

   The longest sequences (>=4000 residues) are listed here:

                               HTS1_COCCA  5217
                               FAT_DROME   5147
                               RYNR_RABIT  5037
                               RYNR_PIG    5035
                               RYNR_HUMAN  5032
                               RYNC_RABIT  4969
                               DYHC_DICDI  4725
                               DYHC_RAT    4644
                               DYHC_DROME  4639
                               APB_HUMAN   4563
                               APOA_HUMAN  4548
                               RRPA_CVMJH  4488
                               DYHC_TRIGR  4466
                               DYHC_ANTCR  4466
                               GRSB_BACBR  4451
                               PKSK_BACSU  4447
                               PKSL_BACSU  4427
                               YP73_CAEEL  4385
                               DYHC_NEUCR  4367
                               DYHC_EMENI  4344
                               PLEC_RAT    4140
                               DYHC_YEAST  4092
                               RRPA_CVH22  4085





<PAGE>


   A.5  List of the most cited journals in SWISS-PROT

   Citations            Journal abbreviation

   4793                 J. BIOL. CHEM.
   3162                 NUCLEIC ACIDS RES.
   3037                 PROC. NATL. ACAD. SCI. U.S.A.
   2087                 J. BACTERIOL.
   1706                 GENE
   1644                 FEBS LETT.
   1535                 EUR. J. BIOCHEM.
   1394                 EMBO J.
   1323                 NATURE
   1304                 BIOCHEM. BIOPHYS. RES. COMMUN.
   1235                 BIOCHEMISTRY
   1023                 BIOCHIM. BIOPHYS. ACTA
    973                 J. MOL. BIOL.
    956                 CELL
    923                 MOL. CELL. BIOL.
    786                 MOL. GEN. GENET.
    716                 PLANT MOL. BIOL.
    705                 VIROLOGY
    684                 BIOCHEM. J.
    610                 SCIENCE
    570                 MOL. MICROBIOL.
    551                 J. BIOCHEM.
    452                 J. VIROL.
    404                 J. GEN. VIROL.
    316                 J. CELL BIOL.
    304                 GENOMICS
    287                 GENES DEV.
    258                 YEAST
    253                 BIOL. CHEM. HOPPE-SEYLER
    250                 CURR. GENET.
    233                 PLANT PHYSIOL.
    232                 ARCH. BIOCHEM. BIOPHYS.
    229                 J. IMMUNOL.
    223                 INFECT. IMMUN.
    213                 HOPPE-SEYLER'S Z. PHYSIOL. CHEM.
    212                 MOL. BIOCHEM. PARASITOL.
    197                 J. GEN. MICROBIOL.
    179                 MOL. ENDOCRINOL.
    175                 HUM. MOL. GENET.
    169                 J. CLIN. INVEST.
    167                 ONCOGENE
    156                 FEMS MICROBIOL. LETT.
    151                 AM. J. HUM. GENET.
    145                 DNA
    136                 J. EXP. MED.
    129                 J. MOL. EVOL.
    129                 GENETICS
    115                 BLOOD
    112                 DEVELOPMENT
    108                 NEURON
    108                 HEMOGLOBIN
    102                 AGRIC. BIOL. CHEM.


<PAGE>

           APPENDIX B: RELATIONSHIPS BETWEEN BIOMOLECULAR DATABASES

   The current  status of the relationships (cross-references) between some
   biomolecular databases is shown in the following schematic:

                         ***********************
******************       *  EMBL Nucleotide    *       **********************
* EPD [Euk.Prom] * <---> *  Sequence Database  * <---- * ECDC [E.coli map]  *
******************       *       [EBI]         *       **********************
                         ***********************
                          ^  ^ ^  ^  ^ ^ ^  ^
******************        |  | |  I  | | |  |
* FlyBase        * <------+  | |  I  | | |  |          **********************
* [D.melanogas.] *        |  | |  I  | | |  +--------> * GCRDb [7TM recep.] *
******************        |  | |  I  | | |  |          **********************
                          |  | |  I  | | |  |
******************        |  | |  I  | | |  |          **********************
* SubtiList      * <---------+ |  I  | | +-----------> * EcoGene [E.coli]   *
* [B.subtilis]   *        |  | |  I  | | |  |          **********************
******************        |  | |  I  | | |  |
                          |  | |  I  | | |  |          **********************
******************        |  | |  I  +---------------> * LISTA [Yeast]      *
* MaizeDb        * <-----------+  I  | | |  |          **********************
* [Zea mays]     *        |  | |  I  | | |  |
******************        |  | |  I  | | |  |          **********************
                          |  | |  I  | +-------------> * SGD [Yeast]        *
******************        |  | |  I  | | |  |          **********************
* WormPep        *        |  | |  I  | | |  |
* [C.elegans]    * <----+ |  | |  I  | | |  |          **********************
******************      | |  | |  I  | | |  | +------> * DictyDB [D.disco.] *
                        | |  | |  I  | | |  | |        **********************
******************      | v  v v  v  v v v  v v
* REBASE         *      ***********************        **********************
* [Restriction   * <--- *  SWISS-PROT         * <----- * ENZYME [Nomencl.]  *
*  enzymes]      *      *  Protein Sequence   *        **********************
******************      *  Data Bank          *            v
                        ***********************        **********************
******************      ^ ^ ^  ^ ^  ^ | ^ ^ |          * OMIM [Human]       *
* StyGene        *      | | |  | |  | | | | +--------> **********************
* [S.Typhimurium]* <----+ | |  | |  | | | |
******************        | |  | |  | | | |            **********************
                          | |  | |  | | | +----------> * ECO2DBASE     [2D] *
******************        | |  | |  | | |              **********************
* Transfac       * <------+ |  | |  | | |
******************          |  | |  | | |              **********************
                            |  | |  | | +------------> * SWISS-2DPAGE  [2D] *
******************          |  | |  | |                **********************
* PROSITE        * <--------+  | |  | |
* [Patterns and  *             | |  | |                **********************
* profiles]      *             | |  | +--------------> * Aarhus/Ghent  [2D] *
******************             | |  |                  **********************
             |                 | |  |
             |                 | |  +----------------> **********************
             |                 | |                     * YEPD [Yeast]  [2D] *
             |                 | +-----------------+   **********************
             |                 v                   |
             |          ***********************    +-> **********************
             +--------> * PDB [3D structures] * <----- * HSSP [3D similar.] *
                        ***********************        **********************
<PAGE>
