Commit b52fa99c authored by tjc

chado doc

git-svn-id: svn+ssh://svn.internal.sanger.ac.uk/repos/svn/pathsoft/artemis/trunk@9762 ee4ac58c-ac51-4696-9907-e4b3aa274f04
parent 071715df
<!--#set var="banner" value="Artemis - Chado Admin/Developer Documentation"-->
<!--#include virtual="/perl/header"-->
<p>
This page describes the development background and the administration involved in
setting up and using an Artemis or ACT connection to a Chado database.
<ul>
<li><a href="#DATABASEMANAGER">Opening the Database Manager</a>
<li><a href="#OPENART">Opening the main Artemis/ACT window</a>
<li><a href="#CONFIG">Option File Configuration</a>
<li><a href="#GENEBUILDER">Opening the Standalone Gene Builder</a>
</ul>
<h3><a name="DATABASEMANAGER">Opening the Database Manager</a></h3>
<p>
To open the Artemis Database Manager panel (from which the browser is launched),
Artemis first checks for a cvterm with name 'top_level_seq' that belongs to the cv
named 'genedb_misc'. If this term exists it follows method A:
<ol type="A">
<li>
-call 'getTopLevelOrganisms' (in Organism.xml mapping). This relies on the source
features (e.g. chromosome) having a featureprop with a type_id corresponding to 'top_level_seq'.
<p>
If the 'top_level_seq' is not implemented in the database it then follows method B:
<p>
</li>
<li>
-call 'getOrganismsContainingSrcFeatures' (in Organism.xml mapping). This searches for those
organisms that contain sequences with residues and have a type_id that
corresponds to a cvterm name that matches:
<p>
*chromosome*, *sequence*, supercontig, ultra_scaffold, golden_path_region, or contig
<p>
</li>
</ol>
Once the organisms with source features have been identified they are displayed. When
a user clicks on an organism, Artemis builds the list of source feature types (e.g. chromosome,
contig) and the underlying features that have residues (getResidueFeatures in Feature.xml).
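<p>
The choice between the two methods, and the type-name matching used by method B, can be
sketched as follows. This is an illustrative Python sketch and not the Artemis implementation:
the function names are hypothetical, and reading the asterisks in the list above as shell-style
wildcards is an assumption.
</p>

```python
from fnmatch import fnmatchcase

# Type-name patterns copied from the list above. Reading "*" as a
# shell-style wildcard is an assumption made for this sketch.
SOURCE_TYPE_PATTERNS = (
    "*chromosome*",
    "*sequence*",
    "supercontig",
    "ultra_scaffold",
    "golden_path_region",
    "contig",
)


def choose_organism_query(has_top_level_seq):
    """Return the statement ID to call: method A if the term exists, else B."""
    if has_top_level_seq:
        return "getTopLevelOrganisms"
    return "getOrganismsContainingSrcFeatures"


def is_source_feature_type(cvterm_name):
    """Return True if a cvterm name matches one of the method B patterns."""
    return any(fnmatchcase(cvterm_name, p) for p in SOURCE_TYPE_PATTERNS)
```

<p>
For example, <code>is_source_feature_type("mitochondrial_chromosome")</code> is true under
these assumptions, while <code>is_source_feature_type("gene")</code> is not.
</p>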
<p>
<h3><a name="OPENART">Opening the main Artemis/ACT window</a></h3>
<p>
The organismprop's are loaded lazily when a sequence is opened. If an organismprop
is of type 'translationTable' the value of the organismprop is then used as the
translation table when Artemis opens a sequence from that organism.
<p>
When a sequence is double clicked to open it in Artemis, most things for that sequence
are read from the database. The iBatis statement calls made when reading an entry are
summarised below.
<p>
<table border=1 cellspacing=1 cellpadding=2>
<tr><td>Statement ID</td><td>SQL Mapping File</td><td>Description</td></tr>
<tr><td>getFeature </td><td>(Feature.xml)</td>
<td>Retrieves all the features and their featureloc's, featureprop's, feature_relationship's and primary dbxref</td></tr>
<tr><td>getFeatureDbXRefsBySrcFeature </td><td>(FeatureDbXRef.xml)</td>
<td>Retrieves all secondary dbxref's</td></tr>
<tr><td>getFeatureSynonymsBySrcFeature </td><td>(FeatureSynonym.xml)</td>
<td>Retrieves feature synonyms</td></tr>
<tr><td>getFeatureCvTermsBySrcFeature </td><td>(FeatureCvTerm.xml)</td>
<td>Retrieves feature_cvterm's, feature_cvtermprop (evidence code, extra qualifiers, date).</td></tr>
<tr><td>getFeatureCvTermDbXRefBySrcFeature </td><td>(FeatureCvTermDbXRef.xml)</td>
<td>Retrieves feature_cvterm_dbxref (WITH/FROM column).</td></tr>
<tr><td>getFeatureCvTermPubBySrcFeature </td><td>(FeatureCvTermPub.xml)</td>
<td>Retrieves feature_cvterm_pub's.</td></tr>
</table>
<p>
Artemis constructs an internal GFF3 stream from these calls for the selected sequence.
This is then read in the same way as a GFF3 file, as an Artemis DatabaseDocumentEntry
(which extends GFFDocumentEntry), creating GFFStreamFeatures.
<p>
If the lazy load option is selected from the Database Manager's File menu, then only
getFeature is called. The resulting GFFStreamFeature object is marked as lazy loading
and FeatureDbXRefs, FeatureSynonyms, FeatureCvTerms, FeatureCvTermDbXRefs
and FeatureCvTermPubs are read from the database for a feature when the Gene Builder is opened.
<p>
The feature_relationship (from getFeature) is used to create the gene hierarchy; 'part_of'
and 'derives_from' relationships become Parent and Derives_from in GFF3 terms. If the
feature_relationship type_id does not correspond to one of these terms (derives_from,
part_of, proper_part_of, partof, producedby) then the object_id is recorded as a qualifier
value. This is used to read orthologous_to and paralogous_to relations. The qualifier
values for these are lazily stored (as ClusterLazyQualifierValue.java). When Artemis
displays these qualifiers in the Gene Builder it then queries the database further to
list the related genes.
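<p>
The mapping from feature_relationship types to GFF3 attributes described above can be
sketched in Python as below. This is a hypothetical illustration, not the Artemis code.
In particular, grouping proper_part_of and partof with Parent, and producedby with
Derives_from, is an assumption; the text only states the mapping for part_of and derives_from.
</p>

```python
# Assumed grouping of the five hierarchy terms listed above.
PARENT_TYPES = {"part_of", "proper_part_of", "partof"}
DERIVES_FROM_TYPES = {"derives_from", "producedby"}


def relationship_to_gff3(rel_type, object_id):
    """Map a feature_relationship to a (GFF3 key, value) pair.

    Hierarchy relationships become Parent or Derives_from. Any other
    type (e.g. orthologous_to) is kept as a qualifier named after the
    relationship, with the object_id as its value.
    """
    if rel_type in PARENT_TYPES:
        return ("Parent", object_id)
    if rel_type in DERIVES_FROM_TYPES:
        return ("Derives_from", object_id)
    return (rel_type, object_id)
```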
<p>
Other properties that have a featureloc association with a feature
are found by calling getLazySimilarityMatches (Feature.xml). From the results Artemis
constructs lazy loading qualifiers (QualifierLazyLoading.java), which query
the database further only when the qualifier is needed. This is used for
blast/fasta similarity and polypeptide_domains.
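<p>
The general idea of a lazily loaded qualifier value can be sketched as follows. This is a
minimal Python illustration of the pattern only; the class name and the loader callback are
hypothetical and do not mirror QualifierLazyLoading.java.
</p>

```python
class LazyQualifierValue:
    """Hold a qualifier value that is fetched only on first access."""

    def __init__(self, loader):
        # loader is any zero-argument callable that performs the
        # (expensive) database query and returns the value.
        self._loader = loader
        self._value = None
        self._loaded = False

    def get(self):
        """Return the value, running the loader the first time only."""
        if not self._loaded:
            self._value = self._loader()
            self._loaded = True
        return self._value
```

<p>
The loader is not called until <code>get()</code> is first used, and repeated calls reuse the
cached result.
</p>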
<p>
The gene hierarchy is stored internally by the ChadoCanonicalGene.java object and is based
on the Parent/Derives_from relationships. It stores the related children of the gene.
The spliced features (exon, pseudogenic_exon) are combined into a single Artemis
Feature. The joined exons become an Artemis CDS feature (GFFStreamFeature), which stores
the uniquenames of the original exons in the database.
<p>
<h3><a name="CONFIG">Option File Configuration</a></h3>
<p>
Artemis combines the exons stored in chado and describes them as an 'exon-model'
feature by default (defined by uk.ac.sanger.artemis.util.DatabaseDocument.EXONMODEL).
The chado_exon_model flag in the options file allows this to be changed.
<p>
When a gene model is created in Artemis it creates the transcript as a 'mRNA'
feature by default (defined by uk.ac.sanger.artemis.util.DatabaseDocument.TRANSCRIPT).
The chado_transcript flag in the options file allows this to be changed.
<p>
A list of available databases can be configured in the options file (chado_servers flag).
For each database an alias is given, followed by its location (host:port/database?user). Each alias
is displayed in a drop down menu in the login box.
<p>
Below is an example configuration option:
<pre>
# chado gene model features default types
chado_exon_model=CDS
#chado_transcript=transcript
# provide a list of available servers
chado_servers = \
test localhost:5432/test?userName \
genedb_ro db.genedb.org:5432/snapshot?genedb_ro
</pre>
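<p>
To make the location format concrete, the following Python sketch splits a
host:port/database?user string into its parts and reads a chado_servers value as
alias/location pairs. It is an illustration only: the function and class names are
hypothetical, and it assumes the backslash line continuations have already been joined
by whatever reads the options file.
</p>

```python
from typing import NamedTuple


class ChadoLocation(NamedTuple):
    host: str
    port: int
    database: str
    user: str


def parse_location(location):
    """Split 'host:port/database?user' into its four parts."""
    host_port, slash, rest = location.partition("/")
    host, colon, port = host_port.partition(":")
    database, _, user = rest.partition("?")
    if not (slash and colon and host and database):
        raise ValueError(f"not a host:port/database?user location: {location!r}")
    return ChadoLocation(host, int(port), database, user)


def parse_servers(value):
    """Turn whitespace-separated alias/location pairs into a dict."""
    tokens = value.split()
    if len(tokens) % 2:
        raise ValueError("expected alias/location pairs")
    return {
        alias: parse_location(loc)
        for alias, loc in zip(tokens[0::2], tokens[1::2])
    }
```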
<a name="GENEBUILDER"></a><h3>Opening the Standalone Gene Builder</h3>
<p>
The Gene Builder can be launched on its own without opening up Artemis. The following
opens up a window which lets you type in a gene name to be opened:
<pre>
java -mx500m -Dibatis -Dchado="localhost:5432/database?" \
-Djdbc.drivers=org.postgresql.Driver -classpath artemis.jar \
uk.ac.sanger.artemis.components.genebuilder.GeneEdit
</pre>
Alternatively the gene name can be given as an argument:
<pre>
java -mx500m -Dibatis -Dchado="db.genedb.org:5432/snapshot?genedb_ro" \
-Djdbc.drivers=org.postgresql.Driver -Dshow_log -Dread_only \
-classpath jar_build/artemis.jar:etc \
uk.ac.sanger.artemis.components.genebuilder.GeneEdit PFA0010c
</pre>
<!--#set var="banner" value="Artemis Connecting to Chado Databases"-->
<!--#include virtual="/perl/header"-->
<a href="http://www.sanger.ac.uk/Software/Artemis/"><b>Artemis</b></a> and
<a href="http://www.sanger.ac.uk/Software/ACT/"><b>ACT</b></a> can be used
to connect to <a href="http://www.gmod.org/"><b>Chado</b></a> databases.
They are being developed to read and write to the database and perform the
same functions as the standard Artemis and ACT.
<p>
An example read-only database can be found
<a href="/Software/Artemis/databases/">here</a>.
<p>
<h3>Online documentation</h3>
<p>
<ul>
<li><a href="overview.shtml">Overview Document</a></li>
<li><a href="admin.shtml">Admin Document</a></li>
<li><a href="storage.html">Data storage</a> (for the Pathogen Group)</li>
</ul>
<p>
<!--#set var="pubmed_tabulate" value="yes"-->
<!--#set var="pubmed_ids" value="18845581"-->
<!--#include virtual="/perl/utils/pubmedalizer"-->
<!--#set var="banner" value="Artemis - Chado Overview"-->
<!--#include virtual="/perl/header"-->
<p>This overview covers:
<ul>
<li><a href="#CONNECT">Connecting to a Chado Database</a>
<li><a href="#READ">Reading From the Database</a>
<li><a href="#IBATIS">iBatis Database Mapping</a>
<li><a href="#GENE">Gene Representation</a>
<li><a href="#GENEBUILDING">Gene Building</a>
<li><a href="#MERGE+SPLIT">Gene merging and splitting</a>
<li><a href="#WRITE">Writing To The Database</a>
<li><a href="#GENEBUILDER">Opening the Standalone Gene Builder</a>
<li><a href="#COMMUNITY+ANNOTATION">Community Annotation</a>
</ul>
<a NAME="CONNECT"></a><h2>Connecting to a Chado Database</h2>
The following Java flags are used when running Artemis with a database connection.
All of these options are currently required.
<ol>
<li><b><pre>-Dchado</pre></b>
This tells Artemis to look for the database.
The address of the database (hostname, port and name) can be conveniently
included as follows:
<br><pre>-Dchado="hostname:port/test?username"</pre>
so that these details are already completed in the popup login pane.
<br><br>
<img src="login.gif" align="middle" alt="login"/>
<br>
</li>
<li><b><pre>-Djdbc.drivers=org.postgresql.Driver</pre></b>
this is used to define the <a href="http://jdbc.postgresql.org/">
JDBC postgres driver
</a>.
</li>
<li><b><pre>-Dibatis</pre></b>
use the <a href="http://ibatis.apache.org/" title="iBatis">iBATIS</a>
Data Mapper
</li>
</ol>
<br>
So the command line will look something like this example:
<pre> ./art -Dchado="localhost:2996/test?tjc" -Dibatis \
-Djdbc.drivers=org.postgresql.Driver</pre>
<a NAME="READ"></a><h2>Reading From the Database</h2>
On a successful login a database and file manager window will open.
The database manager will display "Database Loading...". The organisms
in the database with residues are shown in an expandable tree. Double
clicking on a sequence name opens it in Artemis.
<br><br>
<img src="databasemanager.gif"/>
<p>
A sequence can be opened in Artemis from the command line (without going
through the database manager). This is done by supplying a command line argument
with the organism and chromosome (or source feature):
<br><pre>Pfalciparum:Pf3D7_09</pre>
and optionally a range can be included to just display features within it:
<br><pre>Pfalciparum:Pf3D7_09:92000..112000</pre>
This can be used in combination with the <i>-Doffset=base</i> flag (<i>e.g.
-Doffset=10000</i>) to open Artemis at a particular section of a sequence.
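<p>
The shape of this command line argument can be illustrated with a small Python parser.
This is a hypothetical sketch of the organism:source_feature[:start..end] format shown above,
not the parsing code used by Artemis.
</p>

```python
def parse_entry_argument(arg):
    """Parse 'organism:srcfeature' or 'organism:srcfeature:start..end'.

    Returns (organism, source_feature, range) where range is a
    (start, end) tuple of ints, or None when no range was given.
    """
    parts = arg.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected argument: {arg!r}")
    organism, source_feature = parts[0], parts[1]
    feature_range = None
    if len(parts) == 3:
        start, sep, end = parts[2].partition("..")
        if not sep:
            raise ValueError(f"range must look like start..end: {parts[2]!r}")
        feature_range = (int(start), int(end))
    return organism, source_feature, feature_range
```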
<p>
To reduce the number of transactions to the database, all of the sequence is
read into Artemis. This includes most of the feature qualifiers. There are some
qualifiers (ortho/paralog and similarity qualifiers) that lazily load their data
as and when it is needed, <i>i.e.</i> when opened for viewing in the gene builder.
This lazy loading improves the performance of reading data from the database
for sequences with a large number of features.
<a NAME="IBATIS"></a><h2>iBatis Database Mapping</h2>
The <a href="http://ibatis.apache.org/" title="iBatis">iBatis</a> data mapper
framework is used to handle communication between Artemis and the
database. It uses XML descriptors to couple the SQL statements with the
Java objects that Artemis understands. The XML maps are in the '<i>artemis_sqlmap</i>' directory
in the Artemis distribution. These are divided up into files based on the
Chado table names.
<p>
The SQL statements can be seen in the Artemis <a href=
"http://www.sanger.ac.uk/Software/Artemis/manual/launch-window.html#LAUNCH-WINDOW-OPTIONS-SHOW-LOG">
Log Viewer</a> window:<br>
<img src="logviewer.gif" width="100%"/>
<br><br>This is mainly useful for debugging and tracking problems with reading
from and writing to the database. Artemis uses
<a href="http://logging.apache.org/log4j/">log4j</a> to produce logging
and the configuration file for this is in the file '<i>etc/log4j.properties</i>'.
<a NAME="GENE"></a><h2>Gene Representation</h2>
Below is an illustration of how the features are stored in Chado
in the Sanger PSU.
<p>
<center><b><i>Gene Model</i></b>
<br>
<img src="chado_gene_model.gif"/></center>
<p>The names (in red) are the internal database uniquenames. These names are
automatically generated by the gene builder from an ID provided by the
user. <i>N.B.</i> in our data model UTRs are represented as distinct from
exons.
<a NAME="GENEBUILDING"></a><h3>Gene Building</h3>
A gene can be created in Artemis (or ACT) by highlighting a base range and selecting
from the '<i>Create</i>' menu the '<i>Gene Model From Base Range</i>' option.
This prompts for a unique ID and this corresponds to the names in the above
gene model representation. The basic constituent features are created; <i>i.e.</i>
gene, transcript, CDS and polypeptide. <i>N.B.</i> Artemis joins the exon
features and represents them as a CDS feature. These are shown on the frame
lines in the feature display window.
<p>
A gene builder for a selected gene feature can be opened from the '<i>Edit</i>' menu
by selecting the '<i>Selected Feature in Editor</i>' option or simply using the '<i>E</i>'
shortcut key.
<center><p><b><i>The Artemis Gene Builder</i></b><br>
<img src="editor.gif"></center>
<p>There are two distinct parts to the gene builder window. The top part shows
the <b><i>gene hierarchy and structure</i></b>. The bottom part shows the
<b><i>annotation</i></b> associated with one of the constituent features.
These two parts of the gene builder are described below.
<ol>
<li><b>Gene Hierarchy and Structure</b><br>
The top left hand side is a tree structure of the gene model. To the right
of this is a graphical representation of the features. A feature can be selected
from either the tree or the graphical view. The annotation for the selected
feature is displayed in the bottom part of the gene builder.
<p>Structural changes can be carried out in the graphical view. The feature ends
can be dragged to adjust their coordinates. On right clicking on this area there
is a popup menu for adding and deleting features in the gene model.
<center><p><b><i>Editing the Gene Model In the Gene Builder</i></b><br>
<img src="genebuilder2.gif" border="1"></center>
<p>
Additional transcripts can be added from here. The checkbox to the right of
the above CDS is used to hide and show the associated CDS in the Artemis
feature display. This can make structural edits clearer for multiple transcripts.
<p>
<li><b>Annotation</b><br>
There are four sections in the annotation part of the gene builder: Properties,
Core, Controlled Vocabulary and Match. These are described below. They can
be viewed in a scrollable view or in a tabbed view. There is a check box at
the bottom of the gene builder to change between these views.
<p>
<ul>
<li><b>Properties</b><br>
This contains properties such as the synonyms, time last modified and the
internal ID. Synonyms are added as a controlled vocabulary (these are in
a cv named '<i>genedb_synonym_type</i>').
<center><p><b><i>Properties section</i></b><br>
<img src="Properties.gif" border="1"></center>
</li><p>
<li><b>Core</b><br>
The core annotation contains any annotation that does not fit into the
other sections, <i>e.g.</i> comments, literature and Dbxref. Hyperlinks are
provided for SWALL, EMBL, UniProt, PMID, PubMed, InterPro and Pfam; these open
in a local browser.
</li><p>
<li><b>Controlled Vocabulary (CV)</b><br>
The CV module in Chado is concerned with controlled vocabularies or ontologies.
Therefore, Chado can use the biological ontologies and this makes it very
expressive.
<p>
This section in the gene builder provides a form for adding and deleting GO,
controlled curation, product and Riley class annotation. CV terms are added by
clicking the 'ADD' button. When adding a term to a feature the user is
prompted for the CV name and then keyword. The term to be added is then
selected from a drop down list of terms containing the word or phrase.
To further assist in finding the CV term from the list, typing in the
text will start to autocomplete and scroll to the first matching term.
<center><p><b><i>CV section</i></b><br>
<img src="CV.gif" border="1"></center>
<p>GO terms are selected from molecular_function, biological_process
or cellular_component CV's.
<p>Products are stored in Chado as a CV (<i>i.e.</i> in cvterm in
a cv named '<i>genedb_products</i>').
<p>Other generic controlled curations can be found by Artemis and shown
if their CV name in Chado is prefixed with '<i>CC_</i>' (<i>e.g.</i>
CC_controlledcuration, CC_workshop). These then appear in a drop down
list when adding CV terms to a feature.
<p>Adding new terms to the database can also be done from this section.
In the drop down selection of CV's there is an 'Add term...' option.
This opens an input panel for new terms.
<center><p><b><i>Adding a new CV term</i></b><br>
<img src="addterm.gif"></center>
</li><p>
<li><b>Match</b><br>
This section allows the user to add ortholog/paralog links to other genes
in the database.
<p>
The ortholog/paralog tables provide links for opening the gene editor or
an Artemis window for each entry. The '<i>VIEW</i>' button opens a
separate Artemis displaying the gene ortholog or paralog and the
surrounding features.
<p>
In addition similarity qualifiers can be added here from matches to
blast and fasta searches carried out in Artemis. These are added
from the Artemis Object Editor.
</li>
</ul>
</ol>
<a NAME="MERGE+SPLIT"></a><h3>Gene merging and splitting</h3>
To merge gene models, select the CDS segments that are to be merged. Then use
the menu option:
<p>
<pre>Edit->Selected Feature(s)->Merge</pre>
<p>
The annotation and names from the segment first selected are maintained and
the CDS features from the second gene model are added to the first selected gene model.
The second gene model is deleted automatically.
<p>
To unmerge (split) a gene model into two gene models, select consecutive segments
in the CDS. This is done by clicking on the first segment and
then pressing SHIFT and clicking on the second segment. Then use the menu option:
<p>
<pre>Edit->Selected Feature(s)->Unmerge</pre>
<p>
On unmerging the annotation and synonyms are maintained in both gene models.
The second gene model component features are given a new internal ID (uniquename)
based on the original and prefixed with DUP1-.
<a NAME="WRITE"></a><h3>Writing To The Database</h3>
When a feature or qualifier is changed, added or deleted the '<i>Commit</i>' button (on
the top tool bar) changes colour to red. Changes in Artemis only get written back to the
database when this button is clicked.
<center><p><b><i>Commit Button</i></b><br>
<img src="commit.gif" border="1"></center>
<p>
There is also an option under the '<i>File</i>' menu to '<i>Commit To
Database</i>'. Note in ACT there is no commit button and the '<i>Commit To
Database</i>' menu option is used to write back to the database.
<p>
If there is an error during the commit then Artemis will provide the option to
force commit. This means it will commit what it can, which is potentially
problematic. Therefore, <b>committing back to the database frequently is encouraged</b>.
Any errors are reported in the log viewer.
<a NAME="COMMUNITY+ANNOTATION"></a><h3>Community Annotation</h3>
Multiple users can launch Artemis and query the database. This has been stress
tested and used in the malaria re-annotation exercise with 30+ Artemis clients
connecting to the database.
<p>
Artemis records the time a feature was last modified (<i>timelastmodified</i>). Before
changing a feature it will check this time stamp against the database record of the
<i>timelastmodified</i>.
If the corresponding feature in the database has been changed by another user it will
ask whether to continue with the commit process.
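<p>
The time stamp check described above is a form of optimistic concurrency control. A minimal
Python sketch of the comparison is given below; the function name is hypothetical, and
treating a newer database time stamp as the sign of a conflicting change is an assumption
about how the comparison is made.
</p>

```python
from datetime import datetime


def changed_by_another_user(loaded_time, database_time):
    """Return True if the database record is newer than the loaded copy.

    loaded_time:   timelastmodified read when the feature was opened
    database_time: timelastmodified currently stored in the database
    """
    return database_time > loaded_time


# When this returns True the user would be asked whether to continue
# with the commit; otherwise the commit can go ahead.
loaded = datetime(2008, 11, 3, 10, 0, 0)
current = datetime(2008, 11, 3, 10, 5, 0)
needs_confirmation = changed_by_another_user(loaded, current)
```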
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" dir="ltr" lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="KEYWORDS" content="Chado Data Storage">
<meta name="robots" content="index,follow">
<title>Chado Data Storage - SangerWiki</title>
</head><body class="ns-0">
<div id="globalWrapper">
<div id="column-content">
<div id="content">
<a name="top" id="top"></a>
<h1 class="firstHeading">Chado Data Storage</h1>
<div id="bodyContent">
<div id="contentSub"></div>
<!-- start content -->
<table id="toc" class="toc"><tbody><tr><td><div id="toctitle"><h2>Contents</h2> </div>
<ul>
<li class="toclevel-1"><a href="#Chado_Canonical_Gene"><span class="tocnumber">1</span> <span class="toctext">Chado Canonical Gene</span></a></li>
<li class="toclevel-1"><a href="#Chado_Pseudogene"><span class="tocnumber">2</span> <span class="toctext">Chado Pseudogene</span></a></li>
<li class="toclevel-1"><a href="#Gene_Model"><span class="tocnumber">3</span> <span class="toctext">Gene Model</span></a></li>
<li class="toclevel-1"><a href="#Qualifier_Storage"><span class="tocnumber">4</span> <span class="toctext">Qualifier Storage</span></a>
<ul>
<li class="toclevel-2"><a href="#Note"><span class="tocnumber">4.1</span> <span class="toctext">Note</span></a></li>
<li class="toclevel-2"><a href="#codon_start"><span class="tocnumber">4.2</span> <span class="toctext">codon_start</span></a></li>
<li class="toclevel-2"><a href="#Similarity"><span class="tocnumber">4.3</span> <span class="toctext">Similarity</span></a></li>
<li class="toclevel-2"><a href="#Controlled_Vocabulary_Qualifiers"><span class="tocnumber">4.4</span> <span class="toctext">Controlled Vocabulary Qualifiers</span></a>
<ul>
<li class="toclevel-3"><a href="#GO"><span class="tocnumber">4.4.1</span> <span class="toctext">GO</span></a></li>
<li class="toclevel-3"><a href="#controlled_curation"><span class="tocnumber">4.4.2</span> <span class="toctext">controlled_curation</span></a></li>
<li class="toclevel-3"><a href="#product"><span class="tocnumber">4.4.3</span> <span class="toctext">product</span></a></li>
<li class="toclevel-3"><a href="#class_.28Riley_classification.29"><span class="tocnumber">4.4.4</span> <span class="toctext">class (Riley classification)</span></a></li>
</ul>
</li>
<li class="toclevel-2"><a href="#Dbxref"><span class="tocnumber">4.5</span> <span class="toctext">Dbxref</span></a></li>
<li class="toclevel-2"><a href="#EC_number"><span class="tocnumber">4.6</span> <span class="toctext">EC_number</span></a></li>
<li class="toclevel-2"><a href="#literature"><span class="tocnumber">4.7</span> <span class="toctext">literature</span></a></li>
<li class="toclevel-2"><a href="#Search_and_Results_files"><span class="tocnumber">4.8</span> <span class="toctext">Search and Results files</span></a>
<ul>
<li class="toclevel-3"><a href="#.2Fblast_file"><span class="tocnumber">4.8.1</span> <span class="toctext">/blast_file</span></a></li>
<li class="toclevel-3"><a href="#.2Fblastn_file"><span class="tocnumber">4.8.2</span> <span class="toctext">/blastn_file</span></a></li>
<li class="toclevel-3"><a href="#.2Fblastp.2Bgo_file"><span class="tocnumber">4.8.3</span> <span class="toctext">/blastp+go_file</span></a></li>
<li class="toclevel-3"><a href="#.2Fblastp_file"><span class="tocnumber">4.8.4</span> <span class="toctext">/blastp_file</span></a></li>
<li class="toclevel-3"><a href="#.2Fblastx_file"><span class="tocnumber">4.8.5</span> <span class="toctext">/blastx_file</span></a></li>
<li class="toclevel-3"><a href="#.2Ffasta_file"><span class="tocnumber">4.8.6</span> <span class="toctext">/fasta_file</span></a></li>
<li class="toclevel-3"><a href="#.2Ffastx_file"><span class="tocnumber">4.8.7</span> <span class="toctext">/fastx_file</span></a></li>
<li class="toclevel-3"><a href="#.2Ftblastn_file"><span class="tocnumber">4.8.8</span> <span class="toctext">/tblastn_file</span></a></li>
<li class="toclevel-3"><a href="#.2Ftblastx_file"><span class="tocnumber">4.8.9</span> <span class="toctext">/tblastx_file</span></a></li>
<li class="toclevel-3"><a href="#.2Fclustalx_file"><span class="tocnumber">4.8.10</span> <span class="toctext">/clustalx_file</span></a></li>
<li class="toclevel-3"><a href="#.2Fsigcleave_file"><span class="tocnumber">4.8.11</span> <span class="toctext">/sigcleave_file</span></a></li>
<li class="toclevel-3"><a href="#.2Fpepstats_file"><span class="tocnumber">4.8.12</span> <span class="toctext">/pepstats_file</span></a></li>
</ul>
</li>
<li class="toclevel-2"><a href="#Synonyms"><span class="tocnumber">4.9</span> <span class="toctext">Synonyms</span></a>
<ul>
<li class="toclevel-3"><a href="#.2Freserved_name"><span class="tocnumber">4.9.1</span> <span class="toctext">/reserved_name</span></a></li>
<li class="toclevel-3"><a href="#.2Fsynonym"><span class="tocnumber">4.9.2</span> <span class="toctext">/synonym</span></a></li>
<li class="toclevel-3"><a href="#.2Fprimary_name"><span class="tocnumber">4.9.3</span> <span class="toctext">/primary_name</span></a></li>
<li class="toclevel-3"><a href="#.2Fprotein_name"><span class="tocnumber">4.9.4</span> <span class="toctext">/protein_name</span></a></li>
<li class="toclevel-3"><a href="#.2Fsystematic_id"><span class="tocnumber">4.9.5</span> <span class="toctext">/systematic_id</span></a></li>
<li class="toclevel-3"><a href="#.2Ftemporary_systematic_id"><span class="tocnumber">4.9.6</span> <span class="toctext">/temporary_systematic_id</span></a></li>
</ul>
</li>
<li class="toclevel-2"><a href="#colour"><span class="tocnumber">4.10</span> <span class="toctext">colour</span></a></li>
<li class="toclevel-2"><a href="#ortholog.2Fparalog.2Fcluster"><span class="tocnumber">4.11</span> <span class="toctext">ortholog/paralog/cluster</span></a></li>
</ul>
</li>
</ul>
</td></tr></tbody></table>
<a name="Chado_Canonical_Gene"></a><h2> Chado Canonical Gene </h2>
<pre>gene
  |
  |- part_of mRNA
              |
              |---- part_of exon
              |
              |---- derives_from polypeptide
</pre>
<a name="Chado_Pseudogene"></a><h2> Chado Pseudogene </h2>
<pre>pseudogene
  |
  |- part_of pseudogenic_transcript
              |
              |---- part_of pseudogenic_exon
              |
              |---- derives_from polypeptide
</pre>
<a name="Gene_Model"></a><h2> Gene Model </h2>
<p><a href="Chado_gene_model.gif" class="image" title="Image:Chado_gene_model.gif"><img src="Chado_gene_model.gif" alt="Image:Chado_gene_model.gif" height="540" width="720"></a>
</p>
<a name="Qualifier_Storage"></a><h2> Qualifier Storage </h2>
<a name="Note"></a><h3> Note </h3>
<p>-stored as a FeatureProp with CvTerm = comment
</p>
<a name="codon_start"></a><h3> codon_start </h3>
<p>This is loaded as phase in the FeatureLoc table.
</p><p>phase = 0 =&gt; codon_start = 1;<br>
phase = 1 =&gt; codon_start = 2;<br>
phase = 2 =&gt; codon_start = 3
</p>
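<p>The phase to codon_start mapping above is a simple offset of one. As a small Python
illustration (the function names are hypothetical):
</p>

```python
def codon_start_from_phase(phase):
    """Convert a featureloc phase (0, 1 or 2) to codon_start (1, 2 or 3)."""
    if phase not in (0, 1, 2):
        raise ValueError(f"phase must be 0, 1 or 2, got {phase!r}")
    return phase + 1


def phase_from_codon_start(codon_start):
    """Convert codon_start (1, 2 or 3) back to a featureloc phase."""
    if codon_start not in (1, 2, 3):
        raise ValueError(f"codon_start must be 1, 2 or 3, got {codon_start!r}")
    return codon_start - 1
```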
<a name="Similarity"></a><h3> Similarity </h3>
<p><i>e.g.</i>: /similarity="fasta; SWALL:O85168
(EMBL:AF047828);Pseudomonas syringae; syringomycin synthetase;
syrE;length 9376 aa; id=31.93%; ungapped id=35.04%;E()=1.5e-105;&nbsp;;
6198 aa overlap; query 36-6020 aa; subject 2593-8452 aa"
</p>
<pre>
                                   analysis ------ fasta
                                       |
                                       |
                                       |
                                analysisfeature ---- raw score (null), evalue, id
                                       |
                                       |
                                       |         |---featureprop---ungapped id (35.04)
                                       |         |
                                 Matchfeature ---|
                                /      |         |
                               /       |         |---featureprop---overlap (6198)
                     rank=0   /        |
     (srcfeature_id=product  /         |
                 FeatureId) /          |
(subject 2593-8452)featureloc      featureloc (query 36-6020) srcfeature_id=queryFeatureId rank=1
                       |               |
                       |               |
     featuredbxref     |               |
      (AF047828)       |               |
                   \   |               |
                    \  |               |
                     \ |               |
                      \|               |
     (dbxref=O85168)   |               |
       (seqlen=9376)feature         feature (polypeptide if protein match | transcript if nucleotide match)
                      /|\
                     / | \
                    /  |  \
                   /   |   \
                  /    |    \
                 /     |     \
                / featureprop \
     featureprop       |       featureprop
          |            |            |
          |         product         |
          |    (syringomycin s)     |
          |                         |
          |                         |
      organism                 gene(syrE)
(Pseudomonas syringae)
</pre>
<p>N.B. For now the match feature is entered as CvTerm = '<b>region</b>'.
The Cv '<b>genedb_misc</b>' is used for CvTerms like 'ungapped id' found in /similarity.
</p>
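<p>To show how the fields of the example /similarity value line up with the diagram, the
following Python sketch splits the qualifier text into named values. It is only an
illustration: the function name is hypothetical, and the field order (program, database
reference, organism, product, gene, then the self-describing fields) is inferred from the
single example above and is not a documented format.
</p>

```python
import re

# (key, pattern, converter) for the self-describing fields of the example.
FIELD_PATTERNS = (
    ("length", re.compile(r"length (\d+) aa"), lambda m: int(m.group(1))),
    ("id", re.compile(r"id=([\d.]+)%"), lambda m: float(m.group(1))),
    ("ungapped_id", re.compile(r"ungapped id=([\d.]+)%"),
     lambda m: float(m.group(1))),
    ("evalue", re.compile(r"E\(\)=(\S+)"), lambda m: float(m.group(1))),
    ("overlap", re.compile(r"(\d+) aa overlap"), lambda m: int(m.group(1))),
    ("query", re.compile(r"query (\d+)-(\d+) aa"),
     lambda m: (int(m.group(1)), int(m.group(2)))),
    ("subject", re.compile(r"subject (\d+)-(\d+) aa"),
     lambda m: (int(m.group(1)), int(m.group(2)))),
)


def parse_similarity(value):
    """Split a /similarity value into a dict (field order is assumed)."""
    fields = [f.strip() for f in value.split(";")]
    if len(fields) < 5:
        raise ValueError("expected at least five ';'-separated fields")
    result = {
        "program": fields[0],
        "organism": fields[2],
        "product": fields[3],
        "gene": fields[4],
    }
    # Primary reference, optionally followed by a secondary one in brackets.
    ref = re.fullmatch(r"(\S+)(?:\s*\((\S+)\))?", fields[1])
    if ref is None:
        raise ValueError(f"unexpected database reference: {fields[1]!r}")
    result["dbxref"] = ref.group(1)
    result["secondary_dbxref"] = ref.group(2)
    # Remaining fields are matched by pattern; empty or unknown ones are skipped.
    for field in fields[5:]:
        for key, pattern, convert in FIELD_PATTERNS:
            match = pattern.fullmatch(field)
            if match:
                result[key] = convert(match)
                break
    return result
```

<p>In the diagram, the values under <code>query</code> and <code>subject</code> are the two
featureloc's of the match feature, while <code>ungapped_id</code> and <code>overlap</code> are
its featureprop's.
</p>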
<a name="Controlled_Vocabulary_Qualifiers"></a><h3> Controlled Vocabulary Qualifiers </h3>
<p>These qualifiers are all FeatureCvTerm's.
</p>
<a name="GO"></a><h4> GO </h4>
<p><br>
/GO="aspect=;GOid=;term=;qualifier=;evidence=;db_xref=;with=;date="
</p>
<p>GO annotation can be attached at different levels of the hierarchy.
The GeneDB loader attaches it by default to the polypeptide as that
seems to be the most typical case.
</p>
<p>Each GO entry has a CvTerm and a DbXRef associated with it. The GO
term should be looked up by its DbXRef, i.e. GO:123456, to get the
corresponding CvTerm. A FeatureCvTerm links this CvTerm to the
Feature. The FeatureCvTerm may already exist, so it needs to be looked up first. The
qualifier NOT is treated specially, as a field in FeatureCvTerm,
because it reverses the meaning of the assignment, rather than adding
more details as most qualifiers do. The FeatureCvTerm may have a number
of associated FeatureCvTermProp's. These provide general storage for the GO
evidence code, extra qualifiers and the date of the assignment. (A hack
for the evidence code would be possible, using a CvTerm to represent
key and evidence code, but it wouldn't work for the date). One or more
FeatureCvTermDbXRef's can be associated with the FeatureCvTerm; these
correspond to the WITH/FROM column in GO. The db_xref value of the
qualifier corresponds to publications, so the primary Pub is linked to the
FeatureCvTerm.pub_id. One or more FeatureCvTermPub's can be associated
with the FeatureCvTerm; these correspond to any IDs after the pipe
symbol in the publication column.
</p>
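<p>As a rough illustration of how a /GO value breaks down before storage, the Python sketch
below splits the key=value fields and separates out the pieces that the paragraph above
treats specially. The function names and the example values in the usage note are
hypothetical, and splitting the qualifier field on the pipe symbol is an assumption.
</p>

```python
def parse_qualifier(value):
    """Split a 'key=value;key=value' qualifier string into a dict."""
    fields = {}
    for part in value.split(";"):
        key, sep, val = part.partition("=")
        if sep:
            fields[key.strip()] = val.strip()
    return fields


def go_storage_hints(fields):
    """Pick out the parts of a parsed /GO value that are stored specially."""
    # Splitting the qualifier field on "|" is an assumption of this sketch.
    qualifiers = [q for q in fields.get("qualifier", "").split("|") if q]
    # The first publication is the primary Pub; IDs after the pipe are extras.
    pubs = [p for p in fields.get("db_xref", "").split("|") if p]
    return {
        "is_not": "NOT" in qualifiers,            # field on FeatureCvTerm
        "other_qualifiers": [q for q in qualifiers if q != "NOT"],
        "primary_pub": pubs[0] if pubs else None,  # FeatureCvTerm.pub_id
        "extra_pubs": pubs[1:],                    # FeatureCvTermPub's
    }
```

<p>For a made-up value such as
<code>aspect=F;GOid=GO:0003677;term=DNA binding;qualifier=NOT;db_xref=PMID:111|PMID:222</code>,
the hints mark the assignment as negated, give PMID:111 as the primary publication and list
PMID:222 as an extra publication.
</p>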
<a name="controlled_curation"></a><h4> controlled_curation </h4>
<p>/controlled_curation="term=;cv=; qualifier=;evidence=;db_xref=;residue=; attribution=;date="
</p><p>Storage is similar to GO. The db_xref is stored either as a FeatureCvTermDbXref or a FeatureCvTermPub:
</p>
<ol><li> If the value is a PMID (e.g. PMID:12345) then it is stored in the pub
table. A dummy dbxref is created with the 'accession' = 12345. A
pubdbxref is created to link the pub with the dbxref.
</li><li> If the value refers to another database (e.g. UNIPROT:23456) then it is
stored in the dbxref table with accession=23456, and an entry is also
created in the feature_cvterm_dbxref table to link the featurecvterm
and the dbxref.
</li></ol>
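<p>The two cases above amount to a simple routing decision on the database prefix of the
db_xref value. A minimal Python sketch, with a hypothetical function name and return shape:
</p>

```python
def route_db_xref(value):
    """Decide where a controlled_curation db_xref such as 'PMID:12345' goes.

    Returns (link, database, accession), where link names the kind of
    record that ties the reference to the FeatureCvTerm.
    """
    database, sep, accession = value.partition(":")
    if not sep or not accession:
        raise ValueError(f"expected DB:accession, got {value!r}")
    if database == "PMID":
        # Stored as a pub, with a dummy dbxref linked through pubdbxref.
        return ("FeatureCvTermPub", database, accession)
    # Any other database is stored as a dbxref and linked directly.
    return ("FeatureCvTermDbXRef", database, accession)
```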
<a name="product"></a><h4> product </h4>
<p>This is stored as a FeatureCvTerm with a CvTerm from the '<b>genedb_products</b>' Cv.
</p>
<a name="class_.28Riley_classification.29"></a><h4> class (Riley classification) </h4>
<p><i>e.g.</i> /class=6.2.2
</p>
<p>
These are linked to the Feature as below:
</p><pre>
            Feature
               |
         FeatureCvTerm--name='anti sigma factor'
               |
            CvTerm
               |
          -----------
          |         |
   RILEY--Cv     DbXRef--accession=6.2.2
                    |
                   Db-name=RILEY
</pre>
<a name="Dbxref"></a><h3> Dbxref </h3>
<p>- stored as a FeatureDbXRef
</p>
<a name="EC_number"></a><h3> EC_number </h3>
<p>- stored as a FeatureProp
</p>
<a name="literature"></a><h3> literature </h3>
<p>- stored as FeaturePub
</p>
<a name="Search_and_Results_files"></a><h3> Search and Results files </h3>
<p>The following are stored as FeatureProp's:<br>
</p>
<a name=".2Fblast_file"></a><h4> /blast_file </h4>
<a name=".2Fblastn_file"></a><h4> /blastn_file </h4>
<a name=".2Fblastp.2Bgo_file"></a><h4> /blastp+go_file </h4>
<a name=".2Fblastp_file"></a><h4> /blastp_file </h4>
<a name=".2Fblastx_file"></a><h4> /blastx_file </h4>
<a name=".2Ffasta_file"></a><h4> /fasta_file </h4>
<a name=".2Ffastx_file"></a><h4> /fastx_file </h4>
<a name=".2Ftblastn_file"></a><h4> /tblastn_file </h4>
<a name=".2Ftblastx_file"></a><h4> /tblastx_file </h4>
<a name=".2Fclustalx_file"></a><h4> /clustalx_file </h4>
<a name=".2Fsigcleave_file"></a><h4> /sigcleave_file </h4>
<a name=".2Fpepstats_file"></a><h4> /pepstats_file </h4>
<a name="Synonyms"></a><h3> Synonyms </h3>
<p>The following qualifiers are loaded in the Synonym table:<br>
</p>
<a name=".2Freserved_name"></a><h4> /reserved_name </h4>
<a name=".2Fsynonym"></a><h4> /synonym </h4>
<a name=".2Fprimary_name"></a><h4> /primary_name </h4>
<a name=".2Fprotein_name"></a><h4> /protein_name </h4>
<a name=".2Fsystematic_id"></a><h4> /systematic_id </h4>
<a name=".2Ftemporary_systematic_id"></a><h4> /temporary_systematic_id </h4>
<p>and these are linked to the Feature via FeatureSynonym's. These
Synonym's are in the '<b>genedb_synonym_type</b>' Cv.
FeatureSynonym.is_current is used to mark previous/obsolete
synonyms.
</p>
<a name="colour"></a><h3> colour </h3>
<p>Presumably a FeatureProp (at least for now). Additional qualifiers
are being proposed: status (containing information about functional
annotation and whether the annotation is manual or automatic) and
evidence. These are likely to be FeatureProp's.
</p>
<a name="ortholog.2Fparalog.2Fcluster"></a><h3> ortholog/paralog/cluster </h3>
<p>Orthologue/paralogue clusters are stored in a similar way to <a href="#Similarity" title="">/similarity</a>.
As input we have:
<br>a) manually curated orthologues, which simply list other genes'
systematic ids and the relationship type
<br>b) auto-generated clusters of genes, which also have associated data like clustering method, cut-off/score etc.
<br>However, the features are linked to each other by
feature_relationship's (rather than the featureloc's that are used with
/similarity). The feature_relationship's are given the type_id =
'orthologous_to' or 'paralogous_to'. For manually curated
ortholog/paralog data the analysisfeature and analysis are not required
and are not added.
</p>
<pre>analysis
   |
   + analysisfeature
         |
      feature (type_id == protein_match)
         |
    +----+-------+------------+
    |            |            |
    |            |            |
feature1     feature2     feature3
  gene         gene         gene
</pre>
<p>The bottom links are feature_relationships of SO type orthologous_to.
</p>
<div id="footer">
</div>
</div>
</body></html>