chado doc

git-svn-id: svn+ssh://svn.internal.sanger.ac.uk/repos/svn/pathsoft/artemis/trunk@9762 ee4ac58c-ac51-4696-9907-e4b3aa274f04

Commit b52fa99c authored 16 years ago by tjc
--- a/docs/chado/admin.shtml
+++ b/docs/chado/admin.shtml
+<!--#set var="banner" value="Artemis - Chado Admin/Developer Documentation"-->
+<!--#include virtual="/perl/header"-->
+
+
+<p>
+This page describes some of the development background and admin of
+using and setting up an Artemis and ACT connection with a Chado database.
+
+<ul>
+<li><a href="#DATABASEMANAGER">Opening the Database Manager</a>
+<li><a href="#OPENART">Opening the main Artemis/ACT window</a>
+<li><a href="#CONFIG">Option File Configuration</a>
+<li><a href="#GENEBUILDER">Opening the Standalone Gene Builder</a>
+</ul>
+
+
+
+<h3><a name="DATABASEMANAGER">Opening the Database Manager</a></h3>
+<p>
+To open the Artemis Database Manager panel (from which the browser is launched), 
+Artemis looks initially for the existence of the cvterm.name = 'top_level_seq' which 
+belongs to cv.name = 'genedb_misc'. If these exist it follows method A:
+
+<ol type="A">
+<li>
+-call 'getTopLevelOrganisms' (in Organism.xml mapping). This relies on the source 
+features (e.g. chromosome) having a featureprop with a type_id corresponding to 'top_level_seq'.
+<p>
+If the 'top_level_seq' is not implemented in the database it then follows method B:
+<p>
+</li>
+
+<li>
+-call 'getOrganismsContainingSrcFeatures' (in Organism.xml mapping). This searches for those 
+organisms that contain sequences with residues and have a type_id that 
+corresponds to a cvterm name that matches:
+<p>
+*chromosome*, *sequence*, supercontig, ultra_scaffold, golden_path_region, or contig
+<p>
+</li>
+</ol>
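+<p>
+The name matching used by method B can be illustrated with a short sketch. It is written in Python for brevity and is not the SQL in Organism.xml; the function name is hypothetical, and treating the match as case sensitive is an assumption.

```python
from fnmatch import fnmatchcase

# Patterns copied from the list above; '*' stands for any run of characters.
SOURCE_FEATURE_PATTERNS = [
    "*chromosome*",
    "*sequence*",
    "supercontig",
    "ultra_scaffold",
    "golden_path_region",
    "contig",
]


def is_source_feature_type(cvterm_name):
    """Return True if a cvterm name matches one of the source feature patterns.

    Case-sensitive matching is an assumption made for this sketch.
    """
    return any(fnmatchcase(cvterm_name, pattern)
               for pattern in SOURCE_FEATURE_PATTERNS)
```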
+When the organisms with the source feature have been identified these are displayed. When 
+a user clicks on an organism it then builds the types (e.g. chromosome, contig) of source 
+feature and the underlying features that have residues (getResidueFeatures in Feature.xml).
+<p>
+<h3><a name="OPENART">Opening the main Artemis/ACT window</a></h3>
+<p>
+The organismprop's are loaded lazily when a sequence is opened. If an organismprop 
+is of type 'translationTable' the value of the organismprop is then used as the 
+translation table when Artemis opens a sequence from that organism.
+<p>
+When a sequence is double clicked to open it in Artemis, most things for that sequence
+are read from the database. The iBatis statement calls made when reading an entry are
+summarised below.
+<p>
+<table border=1 cellspacing=1 cellpadding=2>
+<tr><td>Statement ID</td><td>SQL Mapping File</td><td>Description</td></tr>
+<tr><td>getFeature                         </td><td>(Feature.xml)</td> 
+   <td>Retrieves all the features and their featureloc's, featureprop's, feature_relationship's and primary dbxref</td></tr>
+<tr><td>getFeatureDbXRefsBySrcFeature      </td><td>(FeatureDbXRef.xml)</td>      
+    <td>Retrieves all secondary dbxref's</td></tr>
+<tr><td>getFeatureSynonymsBySrcFeature     </td><td>(FeatureSynonym.xml)</td>
+    <td>Retrieves feature synonyms</td></tr>
+<tr><td>getFeatureCvTermsBySrcFeature      </td><td>(FeatureCvTerm.xml)</td>
+    <td>Retrieves feature_cvterm's, feature_cvtermprop (evidence code, extra qualifiers, date).</td></tr>
+<tr><td>getFeatureCvTermDbXRefBySrcFeature </td><td>(FeatureCvTermDbXRef.xml)</td>
+    <td>Retrieves feature_cvterm_dbxref (WITH/FROM column).</td></tr>
+<tr><td>getFeatureCvTermPubBySrcFeature    </td><td>(FeatureCvTermPub.xml)</td>
+    <td>Retrieves feature_cvterm_pub's.</td></tr>
+</table>
+<p>
+Artemis constructs an internal GFF3 stream from these calls for the selected sequence. 
+This is then read in the same way as a GFF3 file, as an Artemis DatabaseDocumentEntry
+(which extends GFFDocumentEntry), creating GFFStreamFeatures.
+<p>
+If the lazy load option is selected from the Database Manager's File menu, then only
+getFeature is called. The resulting GFFStreamFeature object is marked as lazy loading
+and FeatureDbXRefs, FeatureSynonyms, FeatureCvTerms, FeatureCvTermDbXRefs
+and FeatureCvTermPubs are read from the database for a feature when the Gene Builder is opened.
+<p>
+The feature_relationship (from getFeature) is used to create the gene hierarchy; 'part_of'
+and 'derives_from' relationships become Parent and Derives_from in GFF3 terms. If the 
+feature_relationship type_id does not correspond to one of these terms (derives_from, 
+part_of, proper_part_of, partof, producedby) then the object_id is recorded as a qualifier 
+value. This is used to read orthologous_to and paralogous_to relations. The qualifier 
+values for these are lazily stored (as ClusterLazyQualifierValue.java). When Artemis 
+displays these qualifiers in the Gene Builder it then queries the database further to 
+list the related genes.
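+<p>
+The relationship handling described above can be sketched roughly as follows. This is an illustration in Python, not the Artemis code: the function name is hypothetical, and the grouping of 'proper_part_of' and 'partof' with Parent and of 'producedby' with Derives_from is an assumption, since the text only states the mapping for 'part_of' and 'derives_from'.

```python
# Relationship names are taken from the text above. How the three extra
# names are split between the two groups is an assumption of this sketch.
PARENT_TYPES = {"part_of", "proper_part_of", "partof"}
DERIVES_FROM_TYPES = {"derives_from", "producedby"}


def gff3_attribute(relationship_type, object_id):
    """Map one feature_relationship row to a (key, value) pair.

    object_id stands in for whatever identifier of the related feature
    is written to the GFF3 stream.
    """
    if relationship_type in PARENT_TYPES:
        return ("Parent", object_id)
    if relationship_type in DERIVES_FROM_TYPES:
        return ("Derives_from", object_id)
    # Any other type (e.g. orthologous_to, paralogous_to) is kept as a
    # qualifier whose value is the object_id, to be resolved lazily.
    return (relationship_type, object_id)
```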
+<p>
+Other properties that have a featureloc association with a feature
+are found by calling getLazySimilarityMatches (Feature.xml). Artemis then
+constructs lazy loading qualifiers (QualifierLazyLoading.java) from this that query
+the database further only when that qualifier is needed. This is used for
+blast/fasta similarity and polypeptide_domains.
+
+<p>
+The gene hierarchy is stored internally by the ChadoCanonicalGene.java object and is based
+on the Parent/Derives_from relationships. It stores the related children of the gene.
+The spliced features (exon, pseudogenic_exon) are combined into a single Artemis 
+Feature. The joined exons become an Artemis CDS feature (GFFStreamFeature), which stores 
+the uniquenames of the original exons in the database.
+
+<p>
+<h3><a name="CONFIG">Option File Configuration</a></h3>
+<p>
+Artemis combines the exons stored in chado and describes it as an 'exon-model' 
+feature by default (defined by uk.ac.sanger.artemis.util.DatabaseDocument.EXONMODEL).
+The chado_exon_model flag in the options file allows this to be changed. 
+<p>
+When a gene model is created in Artemis it creates the transcript as a 'mRNA' 
+feature by default (defined by uk.ac.sanger.artemis.util.DatabaseDocument.TRANSCRIPT).
+The chado_transcript flag in the options file allows this to be changed.
+<p>
+A list of available databases can be configured in the options file (chado_servers flag).
+For each database an alias is given, followed by its location (host:port/database?user). Each alias
+is displayed in a drop down menu in the login box.
+<p>
+Below is an example configuration option:
+<pre>
+# chado gene model features default types
+chado_exon_model=CDS
+#chado_transcript=transcript
+
+# provide a list of available servers
+chado_servers = \
+   test localhost:5432/test?userName \
+   genedb_ro db.genedb.org:5432/snapshot?genedb_ro
+</pre>
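+<p>
+As an illustration only, the sketch below shows how a <i>chado_servers</i> value of the form shown above could be split into aliases and locations. It is written in Python for brevity and is not the Artemis options parser; the function names and the returned dictionary layout are assumptions made for this example.

```python
def parse_chado_location(location):
    """Split 'host:port/database?user' into its four parts."""
    address, _, user = location.partition("?")
    host_port, _, database = address.partition("/")
    host, _, port = host_port.partition(":")
    return {"host": host, "port": int(port),
            "database": database, "user": user}


def parse_chado_servers(value):
    """Parse alternating 'alias location' tokens into a dict keyed by alias.

    The value is expected with line continuations already joined, i.e. as
    a single whitespace-separated string.
    """
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alias/location pairs")
    return {alias: parse_chado_location(location)
            for alias, location in zip(tokens[0::2], tokens[1::2])}
```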
+
+<a name="GENEBUILDER"></a><h3>Opening the Standalone Gene Builder</h3>
+<p>
+The Gene Builder can be launched on its own without opening up Artemis. The following
+opens up a window which lets you type in a gene name to be opened:
+
+ <pre>
+ java -mx500m -Dibatis -Dchado="localhost:5432/database?" \
+      -Djdbc.drivers=org.postgresql.Driver -classpath artemis.jar \
+      uk.ac.sanger.artemis.components.genebuilder.GeneEdit
+ </pre>
+
+Alternatively the gene name can be given as an argument:
+ <pre>
+ java -mx500m -Dibatis -Dchado="db.genedb.org:5432/snapshot?genedb_ro" \
+      -Djdbc.drivers=org.postgresql.Driver -Dshow_log -Dread_only \
+      -classpath jar_build/artemis.jar:etc \
+       uk.ac.sanger.artemis.components.genebuilder.GeneEdit PFA0010c
+ </pre>
+
+
--- a/docs/chado/index.shtml
+++ b/docs/chado/index.shtml
+<!--#set var="banner" value="Artemis Connecting to Chado Databases"-->
+<!--#include virtual="/perl/header"-->
+
+	<a href="http://www.sanger.ac.uk/Software/Artemis/"><b>Artemis</b></a> and
+        <a href="http://www.sanger.ac.uk/Software/ACT/"><b>ACT</b></a> can be used
+	to connect to <a href="http://www.gmod.org/"><b>Chado</b></a> databases.
+        They are being developed to read and write to the database and perform the
+        same functions as the standard Artemis and ACT. 
+	
+	<p>
+	An example read-only database can be found
+	<a href="/Software/Artemis/databases/">here</a>.
+	<p>
+	<h3>Online documentation</h3>
+	<p>
+	<ul>
+	<li><a href="overview.shtml">Overview Document</a></li>
+	<li><a href="admin.shtml">Admin Document</a></li>
+	<li><a href="storage.html">Data storage</a> (for the Pathogen Group)</li>
+	</ul>
+      
+<p>
+
+<!--#set var="pubmed_tabulate" value="yes"-->
+<!--#set var="pubmed_ids" value="18845581"-->
+<!--#include virtual="/perl/utils/pubmedalizer"-->
+
+
--- a/docs/chado/overview.shtml
+++ b/docs/chado/overview.shtml
+<!--#set var="banner" value="Artemis - Chado Overview"-->
+<!--#include virtual="/perl/header"-->
+
+	<p>This overview covers:
+
+<ul>
+<li><a href="#CONNECT">Connecting to a Chado Database</a>
+<li><a href="#READ">Reading From the Database</a>
+<li><a href="#IBATIS">iBatis Database Mapping</a>
+<li><a href="#GENE">Gene Representation</a>
+<li><a href="#GENEBUILDING">Gene Building</a>
+<li><a href="#MERGE+SPLIT">Gene merging and splitting</a>
+<li><a href="#WRITE">Writing To The Database</a>
+<li><a href="#GENEBUILDER">Opening the Standalone Gene Builder</a>
+<li><a href="#COMMUNITY+ANNOTATION">Community Annotation</a>
+</ul>
+		<a NAME="CONNECT"></a><h2>Connecting to a Chado Database</h2>
+		The following java flags are used when running Artemis when connecting to a
+		database. These options currently are all needed.
+		<ol>
+			<li><b><pre>-Dchado</pre></b>
+				this tells Artemis to look for a database.
+				The address of the database (hostname, port and name) can be conveniently
+				included as follows:
+				<br><pre>-Dchado="hostname:port/test?username"</pre>
+				so that these details are already completed in the popup login pane.
+				<br><br>
+				<img src="login.gif" align="middle" alt="login"/>
+				<br>
+			</li>
+			<li><b><pre>-Djdbc.drivers=org.postgresql.Driver</pre></b>
+				this is used to define the <a href="http://jdbc.postgresql.org/">
+					JDBC postgres driver
+				</a>.
+			</li>
+			<li><b><pre>-Dibatis</pre></b>
+				use the <a href="http://ibatis.apache.org/" title="iBatis">iBATIS</a>
+				Data Mapper
+			</li>
+		</ol>
+		<br>
+		So the command line will look something like this example:
+        <pre> ./art -Dchado="localhost:2996/test?tjc" -Dibatis \
+             -Djdbc.drivers=org.postgresql.Driver</pre>
+		
+        <a NAME="READ"></a><h2>Reading From the Database</h2>
+		On a successful login a database and file manager window will open up. 
+        The database manager will display "Database Loading...". The organisms
+		in the database with residues are shown in an expandable tree. Double
+		clicking on the sequence names opens them up in Artemis.
+		<br><br>
+		<img src="databasemanager.gif"/>
+		<p>
+		A sequence can be opened in Artemis from the command line (without going
+		through the database manager). This is done by supplying a command line argument
+		with the organism and chromosome (or source feature):
+		<br><pre>Pfalciparum:Pf3D7_09</pre>
+		and optionally a range can be included to just display features within it:
+		<br><pre>Pfalciparum:Pf3D7_09:92000..112000</pre>
+		This can be combined with the <i>-Doffset=base</i> flag (<i>e.g.
+		-Doffset=10000</i>) to open Artemis at a particular position in the sequence.
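+		<p>
+		The argument format above can be illustrated with a small sketch. It is written in Python for brevity and is not the Artemis argument parser; the function name and the returned tuple are assumptions made for this example.

```python
def parse_entry_argument(argument):
    """Split 'organism:srcfeature[:start..end]' into its parts.

    Returns (organism, src_feature, base_range), where base_range is a
    (start, end) tuple of ints, or None when no range was given.
    """
    parts = argument.split(":")
    if len(parts) not in (2, 3):
        raise ValueError("expected organism:srcfeature[:start..end]")
    organism, src_feature = parts[0], parts[1]
    base_range = None
    if len(parts) == 3:
        start, separator, end = parts[2].partition("..")
        if not separator:
            raise ValueError("range must be written as start..end")
        base_range = (int(start), int(end))
    return organism, src_feature, base_range
```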
+
+        <p>
+        To reduce the number of transactions to the database, all of the sequence is
+        read into Artemis. This includes most of the feature qualifiers. There are some
+        qualifiers (ortho/paralog and similarity qualifiers) that lazily load their data
+        as and when it is needed, <i>i.e.</i> when opened for viewing in the gene builder. 
+        This lazy loading improves the performance of reading data from the database 
+        for sequences with a large number of features.
+       
+
+		<a NAME="IBATIS"></a><h2>iBatis Database Mapping</h2>
+		<a href="http://ibatis.apache.org/" title="iBatis">iBatis</a> data mapper
+		framework has been used to facilitate the communication with the database
+		from Artemis. It uses XML descriptors to couple the SQL statements with the
+		Java objects that Artemis understands. The XML maps are in the '<i>artemis_sqlmap</i>'
+		in the Artemis distribution. These are divided up into files based on the
+		Chado table names.
+		<p>
+		The SQL statements can be seen in the Artemis <a href=
+        "http://www.sanger.ac.uk/Software/Artemis/manual/launch-window.html#LAUNCH-WINDOW-OPTIONS-SHOW-LOG">
+		Log Viewer</a> window:<br>
+		<img src="logviewer.gif" width="100%"/>
+		<br><br>This is mainly useful for debugging and tracking problems with reading
+		from and writing to the database. Artemis uses 
+        <a href="http://logging.apache.org/log4j/">log4j</a> to produce logging
+        and the configuration file for this is in the file '<i>etc/log4j.properties</i>'.
+
+		<a NAME="GENE"></a><h2>Gene Representation</h2>
+		Below is an illustration of how the features are stored in Chado
+		in the Sanger PSU.
+		<p>
+		<center><b><i>Gene Model</i></b>
+		<br>
+		<img src="chado_gene_model.gif"/></center>
+		<p>The names (in red) are the internal database uniquenames. These names are
+		automatically generated by the gene builder from an ID provided by the
+		user. <i>N.B.</i> in our data model UTRs are represented as distinct from 
+        exons.
+		<a NAME="GENEBUILDING"></a><h3>Gene Building</h3>
+		A gene can be created in Artemis (or ACT) by highlighting a base range and selecting
+        from the '<i>Create</i>' menu the '<i>Gene Model From Base Range</i>' option. 
+        This prompts for a unique ID and this corresponds to the names in the above 
+        gene model representation. The basic constituent features are created; <i>i.e.</i> 
+        gene, transcript, CDS and polypeptide. <i>N.B.</i> Artemis joins the exon 
+        features and represents them as a CDS feature. These are shown on the frame 
+        lines in the feature display window.
+		<p>
+		A gene builder for a selected gene feature can be opened from the '<i>Edit</i>' menu
+		by selecting the '<i>Selected Feature in Editor</i>' option or simply using the '<i>E</i>'
+		shortcut key.
+		<center><p><b><i>The Artemis Gene Builder</i></b><br>
+		<img src="editor.gif"></center>
+		<p>There are two distinct parts to the gene builder window. The top part shows
+		the <b><i>gene hierarchy and structure</i></b>. The bottom part shows the 
+        <b><i>annotation</i></b> associated with one of the constituent features. 
+        These two parts of the gene builder are described below.
+		<ol>
+			<li><b>Gene Hierarchy and Structure</b><br>
+            The top left hand side is a tree structure of the gene model. To the right
+			of this is a graphical representation of the features. A feature can be selected
+			from either the tree or the graphical view. The annotation for the selected 
+            feature is displayed in the bottom part of the gene builder.
+			<p>Structural changes can be carried out in the graphical view. The feature ends
+			can be dragged to adjust their coordinates. On right clicking on this area there
+			is a popup menu for adding and deleting features in the gene model.
+			<center><p><b><i>Editing the Gene Model In the Gene Builder</i></b><br>
+			<img src="genebuilder2.gif" border="1"></center>
+			<p>
+			Additional transcripts can be added from here. The checkbox to the right of
+			the above CDS is used to hide and show the associated CDS in the Artemis 
+            feature display. This can make structural edits clearer for multiple transcripts.
+			<p>
+			<li><b>Annotation</b><br>
+            There are 4 (Properties, Core, Controlled Vocabulary and Match) sections in 
+            the annotation part of the gene builder. These are described below. These can 
+            be viewed in a scrollable view or in a tabbed view. There is a check box at 
+            the bottom of the gene builder to change between these views.
+            <p>
+			<ul>
+				<li><b>Properties</b><br>
+					This contains properties such as the synonyms, time last modified and the
+					internal ID. Synonyms are added as a controlled vocabulary (these are in 
+                    a cv named '<i>genedb_synonym_type</i>').
+
+                    <center><p><b><i>Properties section</i></b><br>
+                    <img src="Properties.gif" border="1"></center>
+
+				</li><p>
+				<li><b>Core</b><br>
+					The core annotation contains any other annotation that does not fit into the
+					other sections, <i>e.g.</i> comments, literature and Dbxref. Hyperlinks are 
+                    provided for SWALL, EMBL, UniProt, PMID, PubMed, InterPro and Pfam; these open
+                    in a local browser. 
+				</li><p>
+				<li><b>Controlled Vocabulary (CV)</b><br>
+                    The CV module in Chado is concerned with controlled vocabularies or ontologies.
+                    Therefore, Chado can use the biological ontologies and this makes it very
+                    expressive.
+                    <p>
+					This section in the gene builder provides a form for adding and deleting GO, 
+                    controlled curation, product, Riley class annotation. CV terms are added by 
+                    clicking the 'ADD' button. When adding a term to a feature the user is 
+                    prompted for the CV name and then keyword. The term to be added is then 
+                    selected from a drop down list of terms containing the word or phrase. 
+                    To further assist in finding the CV term from the list, typing in the 
+                    text will start to autocomplete and scroll to the first matching term.
+
+                    <center><p><b><i>CV section</i></b><br>
+                    <img src="CV.gif" border="1"></center>
+
+                    <p>GO terms are selected from molecular_function, biological_process 
+                    or cellular_component CV's.
+
+                    <p>Products are stored in Chado as a CV (<i>i.e.</i> in cvterm in 
+                    a cv named '<i>genedb_products</i>').
+
+                    <p>Other generic controlled curations can be found by Artemis and shown 
+                    if their CV name in Chado is prefixed with '<i>CC_</i>' (<i>e.g.</i> 
+                    CC_controlledcuration, CC_workshop). These then appear in a drop down 
+                    list when adding CV terms to a feature.
+
+                    <p>Adding new terms to the database can also be done from this section. 
+                    In the drop down selection of CV's there is an 'Add term...' option. 
+                    This opens an input panel for new terms.
+                    <center><p><b><i>Adding a new CV term</i></b><br>
+                    <img src="addterm.gif"></center>
+				</li><p>
+				<li><b>Match</b><br>
+					This section allows the user to add ortholog/paralog links to other genes
+					in the database. 
+                    <p>
+                    The ortholog/paralog tables provide links for opening the gene editor or
+                    an Artemis window for each entry. The '<i>VIEW</i>' button opens a 
+                    separate Artemis displaying the gene ortholog or paralog and the 
+                    surrounding features.
+                    <p>
+                    In addition similarity qualifiers can be added here from matches to 
+                    blast and fasta searches carried out in Artemis. These are added
+					from the Artemis Object Editor.
+				</li>
+			</ul>
+		</ol>
+
+       <a NAME="MERGE+SPLIT"></a><h3>Gene merging and splitting</h3>
+
+       To merge gene models, select the CDS segments that are to be merged. Then use
+       the menu option:
+       <p>
+       <pre>Edit->Selected Feature(s)->Merge</pre>
+       <p>
+       The annotation and names from the segment first selected are maintained and 
+       the CDS features from the second gene model are added to the first selected gene model. 
+       The second gene model is deleted automatically.
+       <p>
+       To unmerge (split) a gene model into two gene models, select consecutive segments 
+       in the CDS. This is done by clicking on the first segment and 
+       then pressing SHIFT and clicking on the second segment. Then use the menu option:
+       <p>
+       <pre>Edit->Selected Feature(s)->Unmerge</pre>
+       <p>
+       On unmerging the annotation and synonyms are maintained in both gene models. 
+       The second gene model component features are given a new internal ID (uniquename) 
+       based on the original and prefixed with DUP1-. 
+
+       <a NAME="WRITE"></a><h3>Writing To The Database</h3>
+       When a feature or qualifier is changed, added or deleted the '<i>Commit</i>' button (on 
+       the top tool bar) changes colour to red. Changes in Artemis only get written back to the
+       database when this button is clicked. 
+       <center><p><b><i>Commit Button</i></b><br>
+       <img src="commit.gif" border="1"></center>
+       <p>
+       There is also an option under the '<i>File</i>' menu to '<i>Commit To 
+       Database</i>'. Note in ACT there is no commit button and the '<i>Commit To 
+       Database</i>' menu option is used to write back to the database.
+       <p>
+       If there is an error during the commit then Artemis will provide the option to
+       force commit. This means it will commit what it can. Naturally this can be potentially
+       problematic. Therefore, <b>committing back to the database frequently is encouraged</b>. 
+       Any errors are reported in the log viewer.
+
+       <a NAME="COMMUNITY+ANNOTATION"></a><h3>Community Annotation</h3>
+       Multiple users can launch Artemis and query the database. This has been stress 
+       tested and used in the malaria re-annotation exercise with 30+ Artemis clients 
+       connecting to the database.
+       <p>
+       Artemis records the time a feature was last modified (<i>timelastmodified</i>). Before 
+       changing a feature it checks this time stamp against the database record of the 
+       <i>timelastmodified</i>.
+       If the corresponding feature in the database has been changed by another user, it
+       asks whether to continue with the commit process.
+
--- a/docs/chado/storage.html
+++ b/docs/chado/storage.html
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" dir="ltr" lang="en"><head>
+
+  
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
+    <meta name="KEYWORDS" content="Chado Data Storage">
+<meta name="robots" content="index,follow">
+<title>Chado Data Storage - SangerWiki</title>
+    
+</head><body class="ns-0">
+    <div id="globalWrapper">
+      <div id="column-content">
+	<div id="content">
+	  <a name="top" id="top"></a>
+	  	  <h1 class="firstHeading">Chado Data Storage</h1>
+	  <div id="bodyContent">
+	    <div id="contentSub"></div>
+	    	    	    <!-- start content -->
+	    <table id="toc" class="toc"><tbody><tr><td><div id="toctitle"><h2>Contents</h2> </div>
+<ul>
+<li class="toclevel-1"><a href="#Chado_Canonical_Gene"><span class="tocnumber">1</span> <span class="toctext">Chado Canonical Gene</span></a></li>
+<li class="toclevel-1"><a href="#Chado_Pseudogene"><span class="tocnumber">2</span> <span class="toctext">Chado Pseudogene</span></a></li>
+<li class="toclevel-1"><a href="#Gene_Model"><span class="tocnumber">3</span> <span class="toctext">Gene Model</span></a></li>
+<li class="toclevel-1"><a href="#Qualifier_Storage"><span class="tocnumber">4</span> <span class="toctext">Qualifier Storage</span></a>
+<ul>
+<li class="toclevel-2"><a href="#Note"><span class="tocnumber">4.1</span> <span class="toctext">Note</span></a></li>
+<li class="toclevel-2"><a href="#codon_start"><span class="tocnumber">4.2</span> <span class="toctext">codon_start</span></a></li>
+<li class="toclevel-2"><a href="#Similarity"><span class="tocnumber">4.3</span> <span class="toctext">Similarity</span></a></li>
+<li class="toclevel-2"><a href="#Controlled_Vocabulary_Qualifiers"><span class="tocnumber">4.4</span> <span class="toctext">Controlled Vocabulary Qualifiers</span></a>
+<ul>
+<li class="toclevel-3"><a href="#GO"><span class="tocnumber">4.4.1</span> <span class="toctext">GO</span></a></li>
+<li class="toclevel-3"><a href="#controlled_curation"><span class="tocnumber">4.4.2</span> <span class="toctext">controlled_curation</span></a></li>
+<li class="toclevel-3"><a href="#product"><span class="tocnumber">4.4.3</span> <span class="toctext">product</span></a></li>
+<li class="toclevel-3"><a href="#class_.28Riley_classification.29"><span class="tocnumber">4.4.4</span> <span class="toctext">class (Riley classification)</span></a></li>
+</ul>
+</li>
+<li class="toclevel-2"><a href="#Dbxref"><span class="tocnumber">4.5</span> <span class="toctext">Dbxref</span></a></li>
+<li class="toclevel-2"><a href="#EC_number"><span class="tocnumber">4.6</span> <span class="toctext">EC_number</span></a></li>
+<li class="toclevel-2"><a href="#literature"><span class="tocnumber">4.7</span> <span class="toctext">literature</span></a></li>
+<li class="toclevel-2"><a href="#Search_and_Results_files"><span class="tocnumber">4.8</span> <span class="toctext">Search and Results files</span></a>
+<ul>
+<li class="toclevel-3"><a href="#.2Fblast_file"><span class="tocnumber">4.8.1</span> <span class="toctext">/blast_file</span></a></li>
+<li class="toclevel-3"><a href="#.2Fblastn_file"><span class="tocnumber">4.8.2</span> <span class="toctext">/blastn_file</span></a></li>
+<li class="toclevel-3"><a href="#.2Fblastp.2Bgo_file"><span class="tocnumber">4.8.3</span> <span class="toctext">/blastp+go_file</span></a></li>
+<li class="toclevel-3"><a href="#.2Fblastp_file"><span class="tocnumber">4.8.4</span> <span class="toctext">/blastp_file</span></a></li>
+<li class="toclevel-3"><a href="#.2Fblastx_file"><span class="tocnumber">4.8.5</span> <span class="toctext">/blastx_file</span></a></li>
+<li class="toclevel-3"><a href="#.2Ffasta_file"><span class="tocnumber">4.8.6</span> <span class="toctext">/fasta_file</span></a></li>
+<li class="toclevel-3"><a href="#.2Ffastx_file"><span class="tocnumber">4.8.7</span> <span class="toctext">/fastx_file</span></a></li>
+<li class="toclevel-3"><a href="#.2Ftblastn_file"><span class="tocnumber">4.8.8</span> <span class="toctext">/tblastn_file</span></a></li>
+<li class="toclevel-3"><a href="#.2Ftblastx_file"><span class="tocnumber">4.8.9</span> <span class="toctext">/tblastx_file</span></a></li>
+<li class="toclevel-3"><a href="#.2Fclustalx_file"><span class="tocnumber">4.8.10</span> <span class="toctext">/clustalx_file</span></a></li>
+<li class="toclevel-3"><a href="#.2Fsigcleave_file"><span class="tocnumber">4.8.11</span> <span class="toctext">/sigcleave_file</span></a></li>
+<li class="toclevel-3"><a href="#.2Fpepstats_file"><span class="tocnumber">4.8.12</span> <span class="toctext">/pepstats_file</span></a></li>
+</ul>
+</li>
+<li class="toclevel-2"><a href="#Synonyms"><span class="tocnumber">4.9</span> <span class="toctext">Synonyms</span></a>
+<ul>
+<li class="toclevel-3"><a href="#.2Freserved_name"><span class="tocnumber">4.9.1</span> <span class="toctext">/reserved_name</span></a></li>
+<li class="toclevel-3"><a href="#.2Fsynonym"><span class="tocnumber">4.9.2</span> <span class="toctext">/synonym</span></a></li>
+<li class="toclevel-3"><a href="#.2Fprimary_name"><span class="tocnumber">4.9.3</span> <span class="toctext">/primary_name</span></a></li>
+<li class="toclevel-3"><a href="#.2Fprotein_name"><span class="tocnumber">4.9.4</span> <span class="toctext">/protein_name</span></a></li>
+<li class="toclevel-3"><a href="#.2Fsystematic_id"><span class="tocnumber">4.9.5</span> <span class="toctext">/systematic_id</span></a></li>
+<li class="toclevel-3"><a href="#.2Ftemporary_systematic_id"><span class="tocnumber">4.9.6</span> <span class="toctext">/temporary_systematic_id</span></a></li>
+</ul>
+</li>
+<li class="toclevel-2"><a href="#colour"><span class="tocnumber">4.10</span> <span class="toctext">colour</span></a></li>
+<li class="toclevel-2"><a href="#ortholog.2Fparalog.2Fcluster"><span class="tocnumber">4.11</span> <span class="toctext">ortholog/paralog/cluster</span></a></li>
+</ul>
+</li>
+</ul>
+</td></tr></tbody></table>
+<p><script type="text/javascript"> if (window.showTocToggle) { var tocShowText = "show"; var tocHideText = "hide"; showTocToggle(); } </script>
+</p>
+<a name="Chado_Canonical_Gene"></a><h2> Chado Canonical Gene </h2>
+<pre>gene
+|
+|- part_of mRNA
+     |
+     |---- part_of exon
+     |
+     |---- derives_from polypeptide
+</pre>
+<a name="Chado_Pseudogene"></a><h2> Chado Pseudogene </h2>
+<pre>pseudogene
+|
+|- part_of pseudogenic_transcript
+     |
+     |---- part_of pseudogenic_exon
+     |
+     |---- derives_from polypeptide
+</pre>
+<a name="Gene_Model"></a><h2> Gene Model </h2>
+<p><a href="Chado_gene_model.gif" class="image" title="Image:Chado_gene_model.gif"><img src="Chado_gene_model.gif" alt="Image:Chado_gene_model.gif" height="540" width="720"></a>
+</p>
+<a name="Qualifier_Storage"></a><h2> Qualifier Storage </h2>
+<a name="Note"></a><h3> Note </h3>
+<p>-stored as a FeatureProp with CvTerm = comment
+</p>
+<a name="codon_start"></a><h3> codon_start </h3>
+<p>This is loaded as phase in the FeatureLoc table.
+</p><p>phase = 0 =&gt; codon_start = 1;<br>
+phase = 1 =&gt; codon_start = 2;<br>
+phase = 2 =&gt; codon_start = 3
+</p>
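+<p>
+The mapping above is a simple offset of one. A minimal sketch, in Python for brevity, with hypothetical function names that are not part of Artemis or the GeneDB loader:

```python
def phase_to_codon_start(phase):
    """Convert a FeatureLoc phase (0, 1 or 2) to a codon_start (1, 2 or 3)."""
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    return phase + 1


def codon_start_to_phase(codon_start):
    """Convert a codon_start (1, 2 or 3) back to a phase (0, 1 or 2)."""
    if codon_start not in (1, 2, 3):
        raise ValueError("codon_start must be 1, 2 or 3")
    return codon_start - 1
```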
+<a name="Similarity"></a><h3> Similarity </h3>
+<p><i>e.g.</i>: /similarity="fasta; SWALL:O85168
+(EMBL:AF047828);Pseudomonas syringae; syringomycin synthetase;
+syrE;length 9376 aa; id=31.93%; ungapped id=35.04%;E()=1.5e-105;&nbsp;;
+6198 aa overlap; query 36-6020 aa; subject 2593-8452 aa"
+</p>
+<pre>                             analysis ------ fasta
+                                 |
+                                 |
+                                 |
+                          analysisfeature ---- raw score (null), evalue, id
+                                 |
+                                 |
+                                 |          |---featureprop---ungapped id (35.04)
+                                 |          |
+                            Matchfeature ---| 
+                             /   |          |
+                            /    |          |---featureprop---overlap (6198)
+rank=0                     /     |
+(srcfeature_id=product    /      |
+FeatureId)               /       |
+(subject 2593-8452)featureloc  featureloc (query 36-6020) srcfeature_id=queryFeatureId rank=1
+                |                |
+                |                |
+  featuredbxref |                |
+     (AF04782)  |                |
+            \   |                |
+             \  |                |
+              \ |                |
+               \|                |
+(dbxref=O85168) |                |
+(seqlen=9376)feature          feature (polypeptide if protein match |   transcript if nucleotide match)
+               /|\          
+              / | \   
+             /  |  \    
+            /   |   \
+           /    |    \
+          /     |     \       
+         / featureprop \       
+ featureprop    |     featureprop
+     |          |           |  
+     |       product        |  
+     | (syringomycin s)     |
+     |                      |
+     |                      | 
+ organism                  gene(syrE) 
+(Pseudomonas syringae)
+
+</pre>
+<p>N.B. For now the match feature is entered as CvTerm = '<b>region</b>'.
+The Cv '<b>genedb_misc</b>' is used for Cvterms like 'ungapped id'  found in /similarity.
+</p>
+<a name="Controlled_Vocabulary_Qualifiers"></a><h3> Controlled Vocabulary Qualifiers </h3>
+<p>These qualifiers are all FeatureCvTerm's.
+</p>
+<a name="GO"></a><h4> GO </h4>
+<p><br>
+/GO="aspect=;GOid=;term=;qualifier=;evidence=;db_xref=;with=;date="
+</p>
+<p>GO annotation can be attached at different levels of the hierarchy.
+The GeneDB loader attaches it by default to the polypeptide as that
+seems to be the most typical case.
+</p>
+<p>Each GO entry has a CvTerm and a DbXRef associated with it. The GO
+term should be looked up by its DbXRef, i.e. GO:123456, to get the
+corresponding CvTerm. A FeatureCvTerm links this CvTerm to the
+Feature. The FeatureCvTerm may already exist, so it needs to be looked up first. The
+qualifier NOT is treated specially, as a field in FeatureCvTerm,
+because it reverses the meaning of the assignment rather than adding
+more detail as most qualifiers do. The FeatureCvTerm may have a number
+of associated FeatureCvTermProp's. These provide general storage for the GO
+evidence code, extra qualifiers and the date of the assignment. (A hack
+for the evidence code would be possible, using a CvTerm to represent
+key and evidence code, but it wouldn't work for the date.) One or more
+FeatureCvTermDbXRef's can be associated with the FeatureCvTerm; these
+correspond to the WITH/FROM column in GO. The dbxref value in this
+case corresponds to publications, so the primary Pub is linked via
+FeatureCvTerm.pub_id. One or more FeatureCvTermPub's can be associated
+with the FeatureCvTerm; these correspond to any IDs after the pipe
+symbol in the publication column.
+</p>
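+<p>As an illustration only, the sketch below splits a /GO qualifier value into
+its fields and separates NOT from the other qualifiers, since NOT goes in a
+FeatureCvTerm field while the rest would become FeatureCvTermProp's. It is not
+the GeneDB loader code: the function names, the example values and the use of
+'|' to separate multiple qualifiers are all assumptions made for this example.
+</p>

```python
def parse_go_qualifier(value):
    """Split a /GO qualifier value into a dict of its key=value fields."""
    fields = {}
    for part in value.strip().strip('"').split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, val = part.partition("=")
        fields[key.strip()] = val.strip()
    return fields


def split_not_qualifier(fields):
    """Separate NOT from the other qualifiers.

    Assumption: several qualifiers are separated by '|'.
    NOT would be stored on the FeatureCvTerm itself; the remaining
    qualifiers would be stored as FeatureCvTermProp's.
    """
    quals = [q.strip() for q in fields.get("qualifier", "").split("|") if q.strip()]
    is_not = any(q.upper() == "NOT" for q in quals)
    others = [q for q in quals if q.upper() != "NOT"]
    return is_not, others


# Hypothetical example value, made up for this sketch.
example = ("aspect=F;GOid=GO:0003674;term=molecular_function;"
           "qualifier=NOT|contributes_to;evidence=ISS;date=20070101")
go = parse_go_qualifier(example)
is_not, other_qualifiers = split_not_qualifier(go)
print(go["GOid"], is_not, other_qualifiers)
```

+<p>The GOid field is the value to use for the DbXRef lookup described above;
+the evidence and date fields would be stored as FeatureCvTermProp's.
+</p>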
+<a name="controlled_curation"></a><h4> controlled_curation </h4>
+<p>/controlled_curation="term=;cv=; qualifier=;evidence=;db_xref=;residue=; attribution=;date="
+</p><p>Storage is similar to GO. The db_xref is stored either as a FeatureCvTermDbXref or a FeatureCvTermPub:
+</p>
+<ol><li> If the value is a PMID, e.g. PMID:12345, it is stored in the pub
+table. A dummy dbxref is created with accession = 12345, and a
+pubdbxref is created to link the pub with the dbxref.
+</li><li> If the value refers to another database, e.g. UNIPROT:23456, it is
+stored in the dbxref table with accession = 23456, and an entry is also
+created in the feature_cvterm_dbxref table to link the featurecvterm
+and the dbxref.
+</li></ol>
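+<p>The two cases above can be sketched as a small routing function. This is a
+hypothetical helper, not part of Artemis or the loader: it only returns the
+names of the tables that would be written to, using the names as they appear
+in this section, and it assumes that every db_xref has the form DB:accession.
+</p>

```python
def route_db_xref(value):
    """Return (accession, tables) for one db_xref value such as 'PMID:12345'.

    'tables' lists, in order, the tables that would receive a row.
    """
    db, sep, accession = value.partition(":")
    if not sep or not db.strip() or not accession.strip():
        raise ValueError("expected DB:accession, got %r" % (value,))
    accession = accession.strip()
    if db.strip().upper() == "PMID":
        # Publication: pub row, dummy dbxref, pubdbxref link between them.
        return accession, ["pub", "dbxref", "pubdbxref"]
    # Any other database: dbxref row plus the link to the featurecvterm.
    return accession, ["dbxref", "feature_cvterm_dbxref"]


for xref in ("PMID:12345", "UNIPROT:23456"):
    print(xref, route_db_xref(xref))
```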
+<a name="product"></a><h4> product </h4>
+<p>This is stored as a FeatureCvTerm with a CvTerm from the '<b>genedb_products</b>' Cv.
+</p>
+<a name="class_.28Riley_classification.29"></a><h4> class (Riley classification) </h4>
+<p><i>e.g.</i> /class=6.2.2
+</p>
+<p>
+These are linked to the Feature as below:
+
+</p><pre>              Feature
+                 |
+           FeatureCvTerm--name='anti sigma factor' 
+                 |
+              CvTerm
+                 |
+             -----------
+             |         |
+      RILEY--Cv       DbXRef--accession=6.2.2
+                       |
+                       Db-name=RILEY
+</pre>
+
+<a name="Dbxref"></a><h3> Dbxref </h3>
+<p>- stored as a FeatureDbXRef
+</p>
+<a name="EC_number"></a><h3> EC_number </h3>
+<p>- stored as a FeatureProp
+</p>
+<a name="literature"></a><h3> literature </h3>
+<p>- stored as FeaturePub
+</p>
+<a name="Search_and_Results_files"></a><h3> Search and Results files </h3>
+<p>The following are stored as FeatureProp's:<br>
+</p>
+<a name=".2Fblast_file"></a><h4> /blast_file </h4>
+<a name=".2Fblastn_file"></a><h4> /blastn_file </h4>
+<a name=".2Fblastp.2Bgo_file"></a><h4> /blastp+go_file </h4>
+<a name=".2Fblastp_file"></a><h4> /blastp_file </h4>
+<a name=".2Fblastx_file"></a><h4> /blastx_file </h4>
+<a name=".2Ffasta_file"></a><h4> /fasta_file </h4>
+<a name=".2Ffastx_file"></a><h4> /fastx_file </h4>
+<a name=".2Ftblastn_file"></a><h4> /tblastn_file </h4>
+<a name=".2Ftblastx_file"></a><h4> /tblastx_file </h4>
+<a name=".2Fclustalx_file"></a><h4> /clustalx_file </h4>
+<a name=".2Fsigcleave_file"></a><h4> /sigcleave_file </h4>
+<a name=".2Fpepstats_file"></a><h4> /pepstats_file </h4>
+<a name="Synonyms"></a><h3> Synonyms </h3>
+<p>The following qualifiers are loaded in the Synonym table:<br>
+</p>
+<a name=".2Freserved_name"></a><h4> /reserved_name </h4>
+<a name=".2Fsynonym"></a><h4> /synonym </h4>
+<a name=".2Fprimary_name"></a><h4> /primary_name </h4>
+<a name=".2Fprotein_name"></a><h4> /protein_name </h4>
+<a name=".2Fsystematic_id"></a><h4> /systematic_id </h4>
+<a name=".2Ftemporary_systematic_id"></a><h4> /temporary_systematic_id </h4>
+<p>and these are linked to the Feature via FeatureSynonym's. These
+Synonym's take their types from the '<b>genedb_synonym_type</b>' Cv.
+FeatureSynonym.is_current is used to distinguish current synonyms from
+previous/obsolete ones.
+</p>
+<a name="colour"></a><h3> colour </h3>
+<p>Presumably a FeatureProp (at least for now). Additional qualifiers
+are being proposed: status (containing information about functional
+annotation and whether the annotation is manual or automatic) and
+evidence. These are likely to be FeatureProp's.
+</p>
+<a name="ortholog.2Fparalog.2Fcluster"></a><h3> ortholog/paralog/cluster </h3>
+<p>Orthologue/paralogue clusters are stored in a similar way to <a href="#Similarity" title="">/similarity</a>. 
+As input we have:
+<br>a) manually curated orthologues, which simply list other genes' 
+systematic ids and the relationship type
+<br>b) auto-generated clusters of genes, which also have associated data such as clustering method, cut-off/score etc.
+<br>However, the features are linked to each other by
+feature_relationship's (rather than the featureloc's used with
+/similarity). The feature_relationship's are given the type_id =
+'orthologous_to' or 'paralogous_to'. For manually curated
+ortholog/paralog data the analysisfeature and analysis are not required
+and are not added.
+</p>
+<pre>analysis
+ |
+ + analysisfeature
+    |
+    feature (type_id == protein_match)
+      |
+ +----+-------+------------+
+ |            |            |
+ |            |            |
+feature1    feature2    feature3
+gene        gene        gene
+
+</pre>
+<p>The bottom links are feature_relationships of SO type orthologous_to.
+</p>
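+<p>To make the diagram concrete, here is a minimal sketch that finds the other
+genes in a cluster by following feature_relationship rows. It uses an in-memory
+SQLite database as a stand-in for Chado and a heavily simplified schema: the
+type is stored as text rather than as a type_id pointing at a cvterm, and the
+direction of the relationship (gene as subject, match feature as object) is an
+assumption. The feature names are invented.
+</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT,
    type       TEXT   -- simplification: real Chado uses type_id -> cvterm
);
CREATE TABLE feature_relationship (
    subject_id INTEGER REFERENCES feature(feature_id),
    object_id  INTEGER REFERENCES feature(feature_id),
    type       TEXT   -- simplification, as above
);
INSERT INTO feature VALUES (1, 'cluster_1', 'protein_match');
INSERT INTO feature VALUES (2, 'geneA', 'gene');
INSERT INTO feature VALUES (3, 'geneB', 'gene');
INSERT INTO feature VALUES (4, 'geneC', 'gene');
-- Assumed direction: gene (subject) orthologous_to match feature (object).
INSERT INTO feature_relationship VALUES (2, 1, 'orthologous_to');
INSERT INTO feature_relationship VALUES (3, 1, 'orthologous_to');
INSERT INTO feature_relationship VALUES (4, 1, 'orthologous_to');
""")


def orthologues_of(conn, uniquename):
    """Return the other features sharing a match feature with 'uniquename'."""
    rows = conn.execute("""
        SELECT other.uniquename
        FROM feature AS me
        JOIN feature_relationship AS r1 ON r1.subject_id = me.feature_id
        JOIN feature_relationship AS r2 ON r2.object_id = r1.object_id
        JOIN feature AS other ON other.feature_id = r2.subject_id
        WHERE me.uniquename = ?
          AND r1.type = 'orthologous_to'
          AND r2.type = 'orthologous_to'
          AND other.feature_id != me.feature_id
        ORDER BY other.uniquename
    """, (uniquename,)).fetchall()
    return [name for (name,) in rows]


print(orthologues_of(conn, "geneA"))
```

+<p>Against a real Chado database the same join would go through cvterm to
+resolve the relationship type, and paralogues would be found by using
+'paralogous_to' instead.
+</p>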
+
+      <div id="footer">
+      </div>
+    </div>
+  </body></html>