<SECT1 ID="VALIDATOR">
<TITLE>Annotation Validation</TITLE>
<SECT2 ID="VALIDATOR-RUN">
<TITLE>How to Run Validation Checks</TITLE>
<PARA>
&prog; can carry out validation checks to help minimise annotation errors.
These checks can be carried out in the following ways:
</PARA>
<ORDEREDLIST ID="VALID-CHECK">
<LISTITEM ID="VALID-CHECK-1">
<PARA>
Click on the tick button
<inlinemediaobject>
<imageobject>
<imagedata fileref="tick.png">
</imageobject>
</inlinemediaobject>
found at the top right-hand side of Artemis to validate all features. When validation is complete, a
report window opens highlighting any features that have failed the checks.
</PARA>
</LISTITEM>
<LISTITEM ID="VALID-CHECK-2">
<PARA>
Select the features to be checked in &prog;, open the popup menu by right-clicking on the feature display,
and select the 'Validation report ...' option.
</PARA>
</LISTITEM>
<LISTITEM ID="VALID-CHECK-3">
<PARA>
From the View menu, select the 'Feature Filters' menu item and then the 'Validation checks...' option. This opens
a feature list window for each type of check carried out, containing the features that have
failed that check.
</PARA>
</LISTITEM>
<LISTITEM ID="VALID-CHECK-4">
<PARA>
For organisms in a chado database, the validator can be run from the 'Database and File Manager' window from the
'File' menu by selecting the 'Validate Selected Sequence / Organism' option.
</PARA>
</LISTITEM>
</ORDEREDLIST>
</SECT2>
<SECT2 ID="VALIDATOR-CHECKS-ALL">
<TITLE>Validation Checks For All File Types</TITLE>
<PARA>
The following checks are made on all file types (e.g. EMBL, GFF3):
</PARA>
<ITEMIZEDLIST SPACING="compact">
<LISTITEM>
<PARA>
CDS have no internal stop codon
</PARA>
</LISTITEM>
<LISTITEM>
<PARA>
CDS have a valid stop codon
</PARA>
</LISTITEM>
</ITEMIZEDLIST>
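<PARA>
As an illustration only, the two CDS checks above might be sketched in Python as follows. This is
not the &prog; implementation: the function names are hypothetical, the CDS is assumed to be a
spliced, forward-strand nucleotide string, and the stop codons are those of the standard genetic
code (other translation tables differ).
</PARA>
<PROGRAMLISTING>
```python
# Assumption: standard genetic code; other translation tables use different stops.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(cds):
    """Split a forward-strand CDS string into complete codons."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def has_internal_stop(cds):
    """True if any codon before the last one is a stop codon."""
    return any(c in STOP_CODONS for c in codons(cds)[:-1])


def has_valid_stop(cds):
    """True if the CDS is a whole number of codons and ends in a stop codon."""
    triplets = codons(cds)
    return len(cds) % 3 == 0 and bool(triplets) and triplets[-1] in STOP_CODONS
```
</PROGRAMLISTING>
<PARA>
With these definitions, has_valid_stop("ATGAAATAA") is true, and has_internal_stop("ATGTAAAAATAA")
is true because the second codon is a stop codon.
</PARA>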
<PARA>
Additionally &prog; checks GO annotation for:
</PARA>
<ITEMIZEDLIST SPACING="compact">
<LISTITEM>
<PARA>
unexpected white space in with/from and dbxref columns
</PARA>
</LISTITEM>
<LISTITEM>
<PARA>
the WITH/FROM field must be empty when using the IDA, NAS, ND, TAS or EXP evidence codes
</PARA>
</LISTITEM>
<LISTITEM>
<PARA>
GO:0005515 can only have IPI evidence code
</PARA>
</LISTITEM>
<LISTITEM>
<PARA>
IEP is not allowed for molecular_function and cellular_component terms
</PARA>
</LISTITEM>
<LISTITEM>
<PARA>
the WITH/FROM field must be filled when using ISS, ISA, ISO and ISM codes
</PARA>
</LISTITEM>
</ITEMIZEDLIST>
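<PARA>
The GO rules listed above can be expressed as a small rule table. The Python sketch below is a
hedged illustration rather than the code used by &prog;: the function name and its parameters are
hypothetical, and it assumes that any white space at all in the with/from or dbxref values counts
as unexpected.
</PARA>
<PROGRAMLISTING>
```python
import re

# Evidence codes taken from the rules listed in this section.
WITH_MUST_BE_EMPTY = {"IDA", "NAS", "ND", "TAS", "EXP"}
WITH_MUST_BE_FILLED = {"ISS", "ISA", "ISO", "ISM"}
PROTEIN_BINDING = "GO:0005515"


def check_go(go_id, aspect, evidence, with_from="", dbxref=""):
    """Return a list of problems found in one GO annotation (hypothetical helper)."""
    problems = []
    # Assumption: any white space in these columns is treated as unexpected.
    if re.search(r"\s", with_from) or re.search(r"\s", dbxref):
        problems.append("white space in with/from or dbxref")
    if evidence in WITH_MUST_BE_EMPTY and with_from:
        problems.append("with/from must be empty for " + evidence)
    if go_id == PROTEIN_BINDING and evidence != "IPI":
        problems.append(PROTEIN_BINDING + " can only have IPI evidence")
    if evidence == "IEP" and aspect in ("molecular_function", "cellular_component"):
        problems.append("IEP is not allowed for " + aspect)
    if evidence in WITH_MUST_BE_FILLED and not with_from:
        problems.append("with/from must be filled for " + evidence)
    return problems
```
</PROGRAMLISTING>
<PARA>
An annotation that satisfies every rule yields an empty list; each broken rule adds one message.
</PARA>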
</SECT2>
<SECT2 ID="VALIDATOR-CHECKS-GFF">
<TITLE>Validation Checks For GFF3</TITLE>
<PARA>
The following are checks for GFF3 and Chado entries only:
</PARA>
<ITEMIZEDLIST SPACING="compact">
<LISTITEM>
<PARA>
check that the gene model comprises at least a gene and a transcript feature
</PARA>
</LISTITEM>
<LISTITEM>
<PARA>
check that the boundaries of the features making up a gene model are consistent
</PARA>
</LISTITEM>
<LISTITEM>
<PARA>
check that all the features in a gene model are on the same strand
</PARA>
</LISTITEM>
<LISTITEM>
<PARA>
check that CDS features have a phase
</PARA>
</LISTITEM>
<LISTITEM>
<PARA>
check the attribute column to ensure that qualifiers have a value (not empty) and
that only reserved tags start with an uppercase character
</PARA>
</LISTITEM>
<LISTITEM>
<PARA>
check that partial qualifiers are consistent within a gene model
</PARA>
</LISTITEM>
<LISTITEM>
<PARA>
check that the gene name prefix is consistent within a gene model
</PARA>
</LISTITEM>
</ITEMIZEDLIST>
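<PARA>
To make the attribute column check concrete, here is a minimal Python sketch. It is an
illustration under stated assumptions, not the validator used by &prog;: the function name is
hypothetical, the reserved tags are those defined by the GFF3 specification, and the attribute
string is assumed to be the raw ninth column of a GFF3 line with no escaped semicolons.
</PARA>
<PROGRAMLISTING>
```python
# Tags reserved by the GFF3 specification; only these may start with an uppercase letter.
RESERVED_TAGS = {
    "ID", "Name", "Alias", "Parent", "Target", "Gap",
    "Derives_from", "Note", "Dbxref", "Ontology_term", "Is_circular",
}


def check_attributes(column9):
    """Return a list of problems found in a GFF3 attribute column (hypothetical helper)."""
    problems = []
    for pair in column9.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        tag, _, value = pair.partition("=")
        if not value:
            problems.append(f"{tag}: empty value")
        if tag[:1].isupper() and tag not in RESERVED_TAGS:
            problems.append(f"{tag}: non-reserved tag starts with uppercase")
    return problems
```
</PROGRAMLISTING>
<PARA>
For example, the attribute string "ID=gene1;Product=;colour=3" produces two problems, both for the
Product tag: it has no value and it is not a reserved tag.
</PARA>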
</SECT2>
<SECT2 ID="VALIDATOR-REPORT">
<TITLE>Validation Report</TITLE>
<PARA>
The validation report window displays a summary for the features that have failed one or
more of the annotation checks above. The title bar of the window displays the number of
features that have passed and the number that have failed the validation checks. The
problems identified are highlighted in red.
</PARA>
<PARA>
<MEDIAOBJECT>
<IMAGEOBJECT>
<IMAGEDATA FORMAT="png" FILEREF="validation_report.png"></IMAGEOBJECT>
</MEDIAOBJECT>
</PARA>
<PARA>
Some of the errors can be fixed automatically. The 'Auto-Fix' button opens a window
in which the fixes available for the entry type loaded in &prog; are enabled. For example, it will
attempt to fix CDS features that have been found not to end in stop codons. If the last codon is not a
stop codon, but the very next codon is a stop codon, then the end of the feature is
extended by three bases.
</PARA>
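<PARA>
The stop codon fix described above can be sketched in Python as follows. This is a simplified
illustration, not the code used by &prog;: the function name is hypothetical, the feature is
assumed to lie on the forward strand, coordinates are assumed to be 0-based with an exclusive end,
and the stop codons are those of the standard genetic code.
</PARA>
<PROGRAMLISTING>
```python
# Assumption: standard genetic code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def extend_to_stop(sequence, end):
    """Return the new end of a forward-strand CDS (0-based, exclusive end).

    The end moves by three bases only when the last codon is not a stop
    codon and the very next codon is one; otherwise it is left unchanged.
    """
    sequence = sequence.upper()
    last_codon = sequence[end - 3:end]
    next_codon = sequence[end:end + 3]
    if last_codon not in STOP_CODONS and next_codon in STOP_CODONS:
        return end + 3
    return end
```
</PROGRAMLISTING>
<PARA>
For the sequence "ATGAAACCCTAAGG" and a CDS ending at position 9, the next codon is TAA, so the
sketch returns 12. A CDS that already ends in a stop codon is returned unchanged.
</PARA>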
<PARA>
For GFF3 and chado entries, &prog; will also attempt to fix problems it finds with gene
boundaries, and if a phase is absent a default phase of 0 is assigned. Once these are
fixed the results window will automatically update and remove the problems it has