TAXODIUM
utility web-page
The TAXODIUM utility is designed for building three-item statement (3TS) matrices from binary, ordered multistate, and unordered multistate characters, with fractional and uniform weighting of the resulting statements.
TAXODIUM does not require installation. Run the program without arguments to see the command reference.
The first argument must always be the name of the CSV file containing the input matrix. One or more options may follow the input file name; the order of the options is not important. Table 1 lists the available options.
Table 1 TAXODIUM v1.2 options
Option | Description
Input symbols |
-ib | input: binary (default)
-iom | input: ordered multistate
-ium | input: unordered multistate
-idna | input: DNA/RNA
-ip | input: protein
Conversion method |
-m3 | method: 3TS (default, G-conversion = the value of the outgroup exhaustive)
Output symbols |
-ob | output: binary (default)
-om | output: multistate
-odna | output: DNA/RNA
-op | output: protein
Fractional weights and outgroups |
-mus | unique statements per input statement only (default: off)
-fw | print fractional weights and save all 3TSs in matrix (default: off)
-og | print outgroup (in case of G-conversion, default: off)
Output formats |
-phy | enable PHYLIP output (default: on if no other output selected)
-nex | enable NEXUS output (default: off)
-csv | enable CSV output (default: off)
An example of the input matrix format is shown below:
taxonA,0,0
taxonB,=,0
taxonC,>,3
taxonD,@,4
taxonE,@,6
The first (leftmost) column contains the names of the taxa; all following columns contain characters.
The symbols allowed for each input option are shown in Table 2.
Table 2 Input file symbols
Input option | Symbols
Binary | 0 1
Ordered multistate | 0 1 2 3 4 5 6 7 8 9 : < = > @ A B C D E F G H I J K
Unordered multistate | 0 1 2 3 4 5 6 7 8 9 : < = > @ A B C D E F G H I J K
DNA/RNA | A C G T U R Y S W K M B D H V
Protein | A C D E F G H I K L M N P Q R S T V W Y
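Before running the utility, it can be useful to check that an input file uses only the symbols listed in Table 2. The following Python sketch is not part of TAXODIUM; the function name find_invalid_cells is hypothetical, and it assumes that every cell holds exactly one symbol with no surrounding spaces.

```python
# Symbol sets copied from Table 2, keyed by the input options from Table 1.
MULTISTATE = "0123456789:<=>@ABCDEFGHIJK"
ALLOWED_SYMBOLS = {
    "-ib": set("01"),
    "-iom": set(MULTISTATE),
    "-ium": set(MULTISTATE),
    "-idna": set("ACGTURYSWKMBDHV"),
    "-ip": set("ACDEFGHIKLMNPQRSTVWY"),
}


def find_invalid_cells(lines, input_option="-ib"):
    """Return (taxon, column, symbol) for every cell not allowed by Table 2."""
    allowed = ALLOWED_SYMBOLS[input_option]
    problems = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, *cells = line.split(",")
        if name == "Out":  # reserved outgroup line, not a data row
            continue
        # Columns are numbered from 1, counting character columns only.
        for column, symbol in enumerate(cells, start=1):
            if symbol not in allowed:
                problems.append((name, column, symbol))
    return problems


example = ["taxonA,0,0", "taxonB,=,0", "taxonC,>,3", "taxonD,@,4", "taxonE,@,6"]
print(find_invalid_cells(example, "-iom"))  # sample matrix read as multistate
print(find_invalid_cells(example, "-ib"))   # same matrix read as binary
```

Read as ordered multistate, the sample matrix above passes the check; read as binary, every cell other than 0 or 1 is reported.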
Additionally, the input file can contain a predefined outgroup taxon name. It must always be the last line of the input file, in the following format:
Out,taxonB
In the example above, “Out” is a reserved keyword; no real taxon may be given that name in the user’s input files. “taxonB” is the name of the outgroup taxon. If an outgroup taxon is found in the input file, the utility performs the following operations for each input character individually:
1. Find which symbol the requested outgroup taxon contains (taxonB in this example).
2. Write output statements only if their outgroup matches the symbol found in step 1.
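The two steps above can be sketched in Python as follows. This is an illustration only, not TAXODIUM's own code: the Statement record and all function names are hypothetical, and the sketch assumes that each statement already carries the symbol of its outgroup.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    """Hypothetical record for one 3TS; not TAXODIUM's internal structure."""
    character: int        # 0-based index of the source character
    outgroup_symbol: str  # symbol carried by the statement's outgroup
    taxa: tuple           # the three taxa involved


def read_matrix(lines):
    """Split CSV lines into {taxon: [symbols]} and the optional outgroup name."""
    rows = [line.strip().split(",") for line in lines if line.strip()]
    outgroup = None
    if rows and rows[-1][0] == "Out":  # reserved keyword, last line only
        outgroup = rows.pop()[1]
    matrix = {row[0]: row[1:] for row in rows}
    return matrix, outgroup


def outgroup_symbols(matrix, outgroup):
    """Step 1: the outgroup taxon's symbol for each character."""
    return list(matrix[outgroup])


def filter_statements(statements, symbols):
    """Step 2: keep statements whose outgroup matches the step-1 symbol."""
    return [s for s in statements if s.outgroup_symbol == symbols[s.character]]


lines = ["taxonA,0,0", "taxonB,=,0", "taxonC,>,3", "Out,taxonB"]
matrix, outgroup = read_matrix(lines)
symbols = outgroup_symbols(matrix, outgroup)  # ['=', '0'] for taxonB

candidates = [
    Statement(0, "=", ("taxonB", "taxonA", "taxonC")),  # kept: '=' matches
    Statement(0, "0", ("taxonA", "taxonB", "taxonC")),  # dropped: '0' != '='
    Statement(1, "0", ("taxonA", "taxonB", "taxonC")),  # kept: '0' matches
]
kept = filter_statements(candidates, symbols)
print(len(kept))
```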
Example: G-conversion of a standard DNA matrix into a binary 3TS matrix in simplified NEXUS format, with the outgroup added and all 3TSs fractionally weighted:
taxodium.exe input.csv -idna -ob -og -fw -nex
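When the utility is called from a script, the argument list can be assembled and checked against Table 1 before the program is launched. The helper below is a hypothetical Python sketch, not something shipped with TAXODIUM; the executable name taxodium.exe is taken from the example above and may differ on other systems.

```python
# Option names copied from Table 1 (TAXODIUM v1.2).
KNOWN_OPTIONS = {
    "-ib", "-iom", "-ium", "-idna", "-ip",
    "-m3",
    "-ob", "-om", "-odna", "-op",
    "-mus", "-fw", "-og",
    "-phy", "-nex", "-csv",
}


def build_command(input_csv, *options, executable="taxodium.exe"):
    """Return the argument list: executable, input file first, then options."""
    unknown = [opt for opt in options if opt not in KNOWN_OPTIONS]
    if unknown:
        raise ValueError("options not listed in Table 1: " + ", ".join(unknown))
    return [executable, input_csv, *options]


command = build_command("input.csv", "-idna", "-ob", "-og", "-fw", "-nex")
print(" ".join(command))
# To actually run it: import subprocess; subprocess.run(command, check=True)
```

The input file name is always placed first, as the utility requires; the options keep the order in which they were given, although that order is not significant.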
Note that the command-line interface may change in future versions; see the documentation provided with each version of the utility for complete details.
Currently, the input matrix may contain at most 5000 taxa and 100000 characters. These limits can be modified in the source code if necessary. The output matrix is constructed entirely in the computer's RAM before being written to disk. If the computer has enough RAM to hold the entire output matrix, processing runs at the maximum possible speed. If RAM is insufficient, a typical operating system (such as Windows or Linux) will attempt to use disk swapping. This degrades performance severely, but the program will still finish processing. Finally, if the disk swap file is too small, TAXODIUM reports a memory allocation error and shows the amount of memory required to hold the output matrix. In that case, the user should increase the size of the disk swap file and rerun the utility.
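To get a feel for how quickly the output matrix grows, the number of statements can be estimated in advance. The Python sketch below is a rough illustration under stated assumptions, not TAXODIUM's own accounting: it covers binary characters only, uses the standard three-item count (two taxa that differ from the outgroup symbol combined with one taxon that carries it), assumes one output column per statement with no duplicates removed, and the function names are hypothetical.

```python
from math import comb


def count_statements(column, outgroup_symbol):
    """Three-item statements implied by one binary character.

    With n taxa, of which k differ from the outgroup symbol, each statement
    pairs two of those k taxa with one of the remaining n - k taxa,
    giving C(k, 2) * (n - k) statements.
    """
    n = len(column)
    k = sum(1 for symbol in column if symbol != outgroup_symbol)
    return comb(k, 2) * (n - k)


def estimate_cells(columns, outgroup_symbols):
    """Assumed output size in cells: taxa times total statements."""
    n_taxa = len(columns[0])
    total = sum(count_statements(col, out)
                for col, out in zip(columns, outgroup_symbols))
    return n_taxa * total


# One binary character scored for five taxa, with '0' as the outgroup symbol.
print(count_statements("00111", "0"))    # 6 statements
print(estimate_cells(["00111"], ["0"]))  # 5 taxa * 6 statements = 30 cells
```

Because the count grows roughly with the cube of the number of taxa, matrices near the 5000-taxon limit can require far more memory than the input file size suggests.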
Evgeny_at_ufl_dot_edu