PAUP* FAQ: Answers

Last updated 22 February 2007

Why are all the answers given in terms of commands and not menu choices?

Primarily to maintain consistency. All versions of PAUP* have a command-line interface, whereas only a few have a menu system; if answers were given in terms of menu choices, users of the UNIX and DOS versions would be out of luck. Also, many users prefer to put all of the commands for a particular analysis in a PAUP block directly in the data file itself. This maintains a complete record of how the analysis was carried out, which is useful later when writing the "Methods" section of a paper. The commands presented here can all be used within PAUP blocks as well as on the command line, which makes it easy to build PAUP blocks from them.

If I don't find it here, does that mean that it doesn't exist?

This FAQ is written as the need arises, so it will continue to grow in completeness each week. It is not intended to be a replacement for the PAUP* manual, but we hope it is a useful surrogate until the program and manual are officially published. The FAQ's authors frequently receive questions (usually by email) about using PAUP*, and the FAQ provides a convenient mechanism for responding to the questions we receive over and over again. Please feel free to submit candidate questions for inclusion in the FAQ.

Can I submit questions that I think should be part of this FAQ?

Please do. We welcome submission of candidate questions for the PAUP* FAQ, but be aware that the decision to include any particular question resides with the authors of the FAQ. The questions most likely to make it into the FAQ are those that we feel would benefit a large proportion of PAUP* users. Answers in the form of a series of PAUP* commands are of course very much appreciated. Please refrain from abbreviating commands, because abbreviations change over time as more commands are added to PAUP*. Also, if you find answers that are incorrect or ambiguous, please let us know!

I just updated PAUP* using the updater on your web site, yet when I try to run PAUP* I still get the message that PAUP* has expired. Why?

Occasionally this happens because a user's computer is not set to the correct date, or because the user is clicking on an icon that is not linked to the beta 8 binary. Because PAUP* is sensitive to both the creation date and the expiration date, back-dating your computer to a time before the program was created will also generate the expiration notice. After checking your system date, make sure that you are executing the beta 8 binary.

Is PAUP* Year 2000 Compliant?

Yes, PAUP* is "Year 2000 Compliant." The only time PAUP* uses dates is to output them to the main display and/or log file for the user's information. If the host operating system returns the correct date when PAUP* requests it, then PAUP* will show the correct date in its output. Even if the host operating system fails to return the correct date in the year 2000, the only consequence is that the date will not be shown correctly by PAUP* in its display output and log files.

What is a batch file?

A batch file contains commands that you would otherwise issue interactively (i.e., from pull-down menus or the command line). For example, using the pull-down menus in the Mac version of PAUP* you could:
1) open the data file combine2.dat from the file menu and execute it
2) exclude the charactersets cytb and junk2 from the data menu
3) start a heuristic search from the analysis menu
You could obtain the same result by executing a simple text file containing the following PAUP block.
begin paup;
execute d:\data\combine2.dat;
exclude cytb junk2;
hsearch;
end; 


I'm using a beta version of PAUP* 4.0. How should I cite the program?

Swofford, D. L. 2003. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sinauer Associates, Sunderland, Massachusetts.

Note: Because there are a number of beta and test versions of the program, you should mention the specific version of PAUP* that you used somewhere in the methods.


Is there a version of PAUP* that will run a search in parallel on a multiple processor machine or a cluster of machines?

Right now the answer is no. PAUP* is a single-threaded application that will only take advantage of one processor at a time. Dave is in the process of parallelizing the code for the portable (Unix) version of PAUP*, but it will be a while before a general parallel version of PAUP* is available.

Could you recommend some textbooks that will help me to learn more about the analyses that can be done in PAUP*?

There are a number of good books that deal with the subject of phylogenetic analysis. The selection below is just a few of the textbooks that I find myself referring to.

What are the maximum dimensions (i.e., characters x sequences) of a data matrix that PAUP* will read?

The maximum number of sequences (AKA taxa) is 16384. The maximum number of characters (AKA positions or sites) depends on the type of computer you are using. If your machine uses a 32-bit processor the maximum is 2^30 (2 raised to the power of 30), whereas machines with 64-bit processors can read a maximum of 2^62 characters.

What is the maximum number of character states that can be assigned to a character in PAUP*?

16 for a 16-bit machine
32 for a 32-bit machine
64 for a 64-bit machine
This limit stems from the use of bit manipulation to perform the state-set calculations in parsimony, and corresponds to the "word length" of the computer--usually 32 bits (e.g., most x86 PCs) but occasionally 64 bits (e.g., Alpha, G5, etc).
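To see why the word length caps the number of states, here is a minimal Python sketch of the general bit-set technique (it is an illustration, not PAUP*'s actual code). Each state gets one bit in a machine word, so a state set is a single integer and the Fitch parsimony step reduces to a bitwise AND or OR. The names WORD_BITS, encode, and fitch_step are hypothetical, and a 32-bit word is assumed.

```python
WORD_BITS = 32  # assumed word length; one bit per character state


def encode(states):
    """Pack state indices (0 .. WORD_BITS-1) into one integer bit set."""
    word = 0
    for s in states:
        if not 0 <= s < WORD_BITS:
            raise ValueError(f"state {s} does not fit in a {WORD_BITS}-bit word")
        word |= 1 << s
    return word


def fitch_step(left, right):
    """Combine two child state sets; return (parent_set, extra_steps)."""
    common = left & right
    if common:                 # children share a state: keep the intersection
        return common, 0
    return left | right, 1     # disjoint: take the union and count one step


a = encode([0])        # state 0 only
b = encode([0, 2])     # state 0 or state 2
c = encode([1])        # state 1 only
print(fitch_step(a, b))  # shared state, no extra step
print(fitch_step(a, c))  # disjoint sets, one extra step
```

A 33rd state has no bit to occupy in a 32-bit word, which is the limit described above.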

Why doesn't PAUP* allow me to set the criterion to likelihood after I execute my data set?

To use the maximum likelihood criterion in PAUP* your dataset must be composed of DNA, Nucleotide, or RNA characters and the "datatype" option under the "format" command must also be set to one of these values. For example:
Begin characters;
Dimensions nchar=200;
Format datatype=dna interleave;


How do I tell PAUP* I want to use the likelihood criterion?

set criterion=likelihood;


How do I tell PAUP* I want to use the parsimony criterion?

set criterion=parsimony;


How do I tell PAUP* I want to use the minimum evolution criterion?

set criterion=distance;
dset objective=me;


How do I tell PAUP* I want to use the least-squares criterion?

set criterion=distance;
dset objective=lsfit;

The default least-squares objective function is for weighted least squares, with the weights equal to the reciprocal of the square of the distance between each pair of taxa (see below).

How do I tell PAUP* I want to use the unweighted least-squares criterion?

set criterion=distance;
dset objective=lsfit power=0;

In general, the "power" specifies the power to which the reciprocal of the distance between each pair of taxa is raised. Raising this value to the zero(th) power is equivalent to weighting all pairwise deviations by the constant "1".
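The effect of "power" can be illustrated with a small Python sketch. It assumes the usual weighted least-squares form, in which each squared deviation between an observed distance and the corresponding path length on the tree is multiplied by (1/distance) raised to "power". The function names and the numbers are made up for illustration and are not PAUP* output.

```python
def ls_weight(distance, power=2):
    """Weight for one pairwise deviation: (1 / distance) ** power."""
    return (1.0 / distance) ** power


def ls_fit(observed, path_lengths, power=2):
    """Weighted sum of squared deviations between observed and tree distances."""
    return sum(ls_weight(d, power) * (d - p) ** 2
               for d, p in zip(observed, path_lengths))


observed = [0.1, 0.2, 0.4]     # made-up pairwise distances
on_tree = [0.12, 0.18, 0.44]   # made-up path lengths on some tree

print(ls_fit(observed, on_tree, power=2))  # weighted fit (power=2)
print(ls_fit(observed, on_tree, power=0))  # unweighted fit: every weight is 1
```

With power=2 small distances dominate the fit, because their weights are largest; with power=0 all pairs count equally.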

Which non-NEXUS file formats will PAUP* import?


Where can I find examples of non-NEXUS file formats that PAUP* will import?

Sample non-NEXUS files are given at http://paup.csit.fsu.edu/nfiles.html.

How do I import non-NEXUS formatted files into PAUP*?

To import non-NEXUS formatted files into PAUP* you need to use the tonexus command. For example:
tonexus format=gcg fromfile=mygcgfile.gcg tofile=mynexusfile.nex;

If you are using the Mac interface you can get to the import dialog box by selecting File and then Import data...

How do I tell PAUP* to ignore certain taxa in further analyses?

The following lines show five alternative ways of telling PAUP* to ignore the taxa P._articulata, P._gracilis, P._fimbriata, and P._robusta (we'll assume these were numbers 2, 3, 4 and 7 in the data matrix, respectively) in further analyses.
delete P._articulata P._gracilis P._fimbriata P._robusta;
delete 'P. articulata' 'P. gracilis' 'P. fimbriata' 'P. robusta';
delete 'P. articulata'-'P. fimbriata' P._robusta;
delete 2 3 4 7;
delete 2-4 7;

Note: If you plan to refer to a set of taxa frequently, you may find it convenient to set up a taxset. Sets are defined in a sets block. For the four taxa given above, defining a taxset would look like this:
begin sets;
taxset junk = P._articulata P._gracilis P._fimbriata P._robusta;
end;

After the taxset is defined, simply refer to the taxset to ignore these taxa in further analyses. For example:
delete junk;


How do I tell PAUP* to use taxa that I previously told it to ignore?

The following lines show five alternative ways to tell PAUP* to reinstate the four taxa previously deleted (see above).
restore P._articulata P._gracilis P._fimbriata P._robusta;
restore 'P. articulata' 'P. gracilis' 'P. fimbriata' 'P. robusta';
restore 'P. articulata'-'P. fimbriata' P._robusta;
restore 2 3 4 7;
restore 2-4 7;

Note: If you've defined a taxset then you can use the following syntax:
restore junk;


How do I tell PAUP* to ignore certain characters (sites) in further analyses?

The following lines show five alternative ways of telling PAUP* to ignore the characters leaf_length, leaf_width, stamen_number, and carpel_number (we'll assume these were characters number 2, 3, 4 and 7 in the data matrix, respectively) in further analyses.
exclude leaf_length leaf_width stamen_number carpel_number;
exclude 'leaf length' 'leaf width' 'stamen number' 'carpel number';
exclude leaf_length-stamen_number 'carpel number';
exclude 2 3 4 7;
exclude 2-4 7;

If you plan to exclude these characters frequently, it would be a good idea to define them in a character set. This way you can exclude them by referencing the character set. For example:
charset foo = 2-4 7;
exclude foo;

Here's how to tell PAUP* to ignore nucleotide sites 359 to 367, 586 to 588 and 693 to the last site in further analyses.
exclude 359-367 586-588 693-.;

Here's how to tell PAUP* to ignore every third nucleotide site in further analyses (starting with the third site).
exclude 3-.\3;


How do I tell PAUP* to use characters (sites) that I previously told it to ignore?

The following lines show five alternative ways to tell PAUP* to reinstate the four characters previously excluded (see above).
include leaf_length leaf_width stamen_number carpel_number;
include 'leaf length' 'leaf width' 'stamen number' 'carpel number';
include leaf_length-stamen_number 'carpel number';
include 2 3 4 7;
include 2-4 7;

Here's how to tell PAUP* to include previously excluded nucleotide sites 359 to 367, 586 to 588 and 693 to the last site in further analyses.
include 359-367 586-588 693-.;

Here's how to tell PAUP* to include every third nucleotide site (starting with site number 1) in further analyses.
include 1-.\3;


How do I exclude all the constant characters?

exclude constant;


How do I exclude all constant as well as autapomorphic characters?

exclude uninf; 


How do I combine different data sets into a single NEXUS file?

In the example below protein and nucleotide data are combined in a single interleaved data set. Notice that character sets are used to distinguish the two data sets.
#NEXUS 
Begin data;
Dimensions ntax=5 nchar=20;
Format datatype=protein interleave symbols="ACGT" gap=-;
Matrix
t1     VKYPNTNEEG
t2     VKYPNTNEEG
t3     VKYPNTNEEG
t4     VKYPNTNEDG
t5     VKYPNTNEDG

t1     AGCTAAACCT
t2     AGCTAGACCT
t3     AGCTAGACTT
t4     AGCTAGACTT
t5     AGCTAAACTT
;
end;

Begin Assumptions;
charset protein = 1-10;
charset dna = 11-.;

  
usertype 5_1 stepmatrix = 4 acgt
- 5 1 5 
5 - 5 1 
1 5 - 5 
5 1 5 - 
;
end;

Begin paup;
outgroup t2 t3;
ctype 5_1:dna;
hsearch addseq=random;
end;


How do I code indels so that they are not treated as missing data?

If you are confident about the homology of the indels then you might consider setting up an additional character for each site in the original matrix that contains an indel. The new sites would be represented by a binary character. The syntax for doing this looks like this:
begin data;
dimensions ntax=4 nchar=10;
format datatype=dna gap=- interleave symbols="01";
options gapmode=missing;
matrix
one    ATGGT--
two    AtggT--
three  A-GGTTG
four   A-GGTAG
one    011
two    011
three  100
four   100
;
end;


What are data partitions and why are they useful?

Data partitions divide the characters in your data matrix into two or more groups. This is useful for performing the partition homogeneity test or for estimating site-specific rates by maximum likelihood.

How do I define and name a data partition?

Here a partition is created and named codons. The partition divides sites into first, second and third codon positions. The first partition, named firstpos, includes every third site (the \3 means every third site) starting from site 1 and ending with the last site (the period means last character). The second and third partitions, named secondpos and thirdpos respectively, are defined similarly, except they have different starting points.
charpartition codons = firstpos:1-.\3, secondpos:2-.\3, thirdpos:3-.\3;
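As a quick check on which sites each subset receives, the following Python sketch expands the start-.\step notation for a hypothetical 12-site alignment. The helper name sites and the value of nchar are assumptions made for illustration; PAUP* performs this expansion internally.

```python
def sites(start, nchar, step=1):
    r"""Expand 'start-.\step': every step-th site from start through the last site (1-based)."""
    return list(range(start, nchar + 1, step))


nchar = 12  # hypothetical alignment length
codons = {
    "firstpos": sites(1, nchar, 3),
    "secondpos": sites(2, nchar, 3),
    "thirdpos": sites(3, nchar, 3),
}
for name, members in codons.items():
    print(name, members)
```

Every site falls into exactly one of the three subsets, which is what a character partition requires.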


How do I do a partition homogeneity test?

First you'll need to set up a partition. For this example, I'll set up a partition called genes for two partial gene sequences.
charpartition genes = gene1:1-210, gene2:230-.;

Next I'll need to exclude the characters contained in the NEXUS data set but not defined in either of the two partitions -- gene1 or gene2.
exclude 211-229;

Now I can use the partition homogeneity test.
hompart partition=genes;


What are topological constraints?

Topological constraints are unresolved trees used to filter out trees discovered during the search that do not match a particular topological criterion. One possible use of a topological constraint is to force a particular group to be convex (i.e., monophyletic if the tree is rooted outside the group). This type of topological constraint is referred to as a monophyly constraint. Monophyly constraint trees contain all the taxa but are unresolved to some degree. A second type of constraint is called a backbone constraint. Backbone constraint trees are normally fully resolved, but are missing one or more taxa. A tree encountered during a search is consistent with a backbone constraint tree so long as pruning all taxa not in the constraint tree yields the constraint tree topology. One may wish to compare the support of the data for the best tree obtained under the constraint to the best tree without the constraint. Note that PAUP* offers much more flexibility in terms of topological constraints than is indicated here; the manual for version 3.1 explains constraints thoroughly.

How do I define and name a topological constraint?

Suppose you are studying bot flies that parasitize either lagomorphs or rodents depending on the species. You may be interested in finding the best tree in which the lagomorph-infecting species of bot flies form a monophyletic group. Assume that there are 10 taxa, and taxa 2, 3, 5, 7 and 9 are lagomorph-infecting species, while the others (1, 4, 6, 8 and 10) are rodent-infecting species.
constraints lagomorph (monophyly) = (1,4,6,8,10,(2,3,5,7,9));

Here, the word lagomorph is the name of the topological constraint, and the word monophyly is a keyword indicating the type of constraint (the other possible type is specified using the keyword backbone). Note that taxa connected directly to the root node do not have to be specified explicitly in constraint-tree definitions, and monophyly constraints are the default. The above example could thus also be written:
constraints lagomorph = ((2,3,5,7,9));


How do I load a topological constraint in the form of a tree file?

Suppose one or more constraint trees exist as tree definitions in a tree file named "foo.tree" (the names of the trees in the tree file will become the names of the corresponding constraint definitions when the treefile is loaded).
loadconstr file=foo.tree;

If the trees in "foo.tree" are to be considered backbone constraints, then the keyword "asbackbone" must be included (otherwise the trees are considered to be monophyly constraints):
loadconstr file=foo.tree asbackbone;


How do I apply a previously-defined topological constraint to a search?

The command below will perform a heuristic search using all default options except that the (predefined) topological constraint named lagomorph will be enforced:
hsearch constraints=lagomorph enforce=yes;

Other search-related commands for which the constraints and enforce options are available are illustrated in the examples below:
nj constraints=lagomorph enforce=yes;       [neighbor-joining]
alltrees constraints=lagomorph enforce=yes; [exhaustive search]
bandb constraints=lagomorph enforce=yes;    [branch-and-bound search]


How do I get a single majority-rule bootstrap consensus tree from the results of multiple bootstrap runs performed at different times or on different machines?

First, save the trees found during each bootstrap run. By default, PAUP* uses the system clock to seed the random number generator; thus, provided you do not change the value of bseed, characters will be sampled differently from run to run. After the bootstrap runs have completed, retrieve the tree files and compute the consensus tree using the options given below.
begin paup;
execute my_nexus_file.nex; 
bootstrap treefile=futz1.out nreps=10 bseed=0 search=heuristic;
end;
...
begin paup;
execute my_nexus_file.nex;
bootstrap treefile=futz3.out nreps=10 bseed=0 search=heuristic;
end;
begin paup;
execute my_nexus_file.nex;
gettrees file=futz1.out StoreTreeWts=yes mode=3;
gettrees file=futz2.out StoreTreeWts=yes mode=7;
gettrees file=futz3.out StoreTreeWts=yes mode=7;
contree all/strict=no majrule=yes usetreewts=yes;
end;


How do I tell PAUP* to save the trees currently in memory to a file?

Here's how to save the trees (and the estimated branch lengths) to the file 'foo.trees':
savetrees file=foo.trees brlens;


How do I tell PAUP* to read in trees previously saved in a file?

Here's how to load into memory the trees saved in the file 'foo.trees':
gettrees file=foo.trees;


Why can't I get PAUP* to save branch lengths on the bootstrap consensus tree?

The bootstrap tree is a consensus of the trees found for each replicate sample of the data. Since each replicate tree has a different set of branch lengths, none are displayed or saved on the bootstrap consensus tree.

How can I limit the number of rearrangements PAUP* evaluates during a heuristic search?

There are several different ways to go about this. First, there is a "rearrlimit=n" option on the hsearch command, which limits the total number of rearrangements for each search to n. Second, there is a "timelimit=n" option, where n is the number of seconds that PAUP* will use to search for a tree. Note that if you use these options in conjunction with random-addition-sequence searches, the "limitperrep=y|n" option determines whether the limit applies per replicate or overall. You can also specify reconlimit=n, where n is the maximum "reconnection distance" for an SPR or TBR reconnection (1 is equivalent to NNI, infinity to TBR, and values in between restrict the size of the neighborhood of trees that are tested).

Why are fractions listed in the bootstrap bipartition table when 100 bootstrap replicates are performed?

In some cases PAUP* might find multiple optimal trees for a given replicate. If it does, PAUP* gives each of those trees a weight equal to the reciprocal of the number of trees found in the replicate. You can see this for yourself if you use the treefile option of the bootstrap command to save all trees found during the search. For example:
bootstrap treefile=bstrees.tre;
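The arithmetic behind the fractional values can be sketched in Python. This is only an illustration of the reciprocal weighting described above, with made-up replicates and hypothetical names (bipartition_support, replicates); it does not read PAUP* tree files.

```python
from collections import defaultdict
from fractions import Fraction


def bipartition_support(replicates):
    """Sum tree weights per bipartition.

    replicates: list of replicates; each replicate is a list of trees;
    each tree is a set of bipartition labels.
    """
    support = defaultdict(Fraction)
    for trees in replicates:
        weight = Fraction(1, len(trees))  # reciprocal of the number of optimal trees
        for tree in trees:
            for split in tree:
                support[split] += weight
    return dict(support)


# Three made-up replicates; the second one found two equally optimal trees.
replicates = [
    [{"AB|CDE", "ABC|DE"}],
    [{"AB|CDE", "ABC|DE"}, {"AB|CDE", "ABD|CE"}],
    [{"AB|CDE", "ABD|CE"}],
]
for split, total in sorted(bipartition_support(replicates).items()):
    print(split, float(total))
```

A bipartition present in only one of the two tied trees picks up half a replicate, which is how non-integer counts arise even with exactly 100 replicates.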


How do I ask PAUP* to examine every possible tree topology?

Here's how to do this, but keep in mind that the number of possible unrooted bifurcating tree topologies increases factorially with the number of taxa. This means that for even a 14 taxon problem, it will take PAUP* several centuries to complete this analysis! It probably is not a good idea to try this command if you have more than ten taxa currently included.
alltrees;
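To get a feel for how quickly the search space grows, the number of unrooted bifurcating topologies for n taxa is the double factorial (2n - 5)!!, that is, 3 x 5 x ... x (2n - 5). The short Python sketch below evaluates it; the function name is hypothetical.

```python
def unrooted_tree_count(ntax):
    """Number of unrooted bifurcating topologies for ntax >= 3 taxa: (2*ntax - 5)!!"""
    if ntax < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * ntax - 4, 2):  # odd numbers 3, 5, ..., 2*ntax - 5
        count *= k
    return count


for n in (4, 5, 10, 14):
    print(n, unrooted_tree_count(n))
```

Ten taxa already give 2,027,025 topologies, and 14 taxa give 316,234,143,225.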


How do I evaluate 500 random-addition replicates but prevent PAUP* from branch swapping on each one?

hsearch addseq=random nreps=500 swap=none;


How do I set a maxtree limit for each random addition sequence replicate?

If you are doing a number of random addition sequence replicates, you'll need a way to get around the problem of hitting the maxtree limit on the first replicate and hence aborting the search before PAUP* gets to the remaining replicates. For example, if you want to apply a maxtree limit of 100 to each of 10 random addition sequence replicates, then you will need to set the maxtree limit to 1000 (10 replicates x 100 trees) and use two options of the hsearch command. The syntax looks like this:
set maxtrees = 1000 increase=no; 
hsearch addseq=random nreps=10 nchuck=100 chuckscore=1; 


I have performed a heuristic likelihood search and specified 100 replicates within the hsearch command. When I examine the progress reports, it looks like PAUP* is finding many different tree islands; however, the summary at the end says that only one island was found and that island was hit 100 times. What is going on here?

The problem is that PAUP* makes progress reports only once per minute by default. Once PAUP* encounters a tree in the same island as trees it has found previously, it immediately abandons the current replicate and begins working on the next replicate. Thus, even if you set the progress report interval to 1 second as follows:
set dstatus=1;

you will probably never catch PAUP* at just the moment when it is finishing one replicate and about to begin the next. As a result, it is very common for the last entry of a replicate to report a likelihood score that is worse than the best likelihood score found thus far.

Do you have equations for estimating the relative (or actual) time required for heuristic searches for sequences of different length and for different numbers of sequences?

Unfortunately, the time required to complete a heuristic search cannot be estimated based on the size of a data set. There are a number of reasons why this is so; however, one important reason has to do with the quality of the data (i.e., how homoplastic the data are).
Another important reason is that there is no simple expression for calculating the number of tree bisection-reconnection (TBR) or subtree pruning-regrafting (SPR) rearrangements that will be made on a given tree. That is, the shape of a starting tree will determine the total number of rearrangements that can be made using one of the aforementioned swapping techniques. The problem is further complicated by the fact that it is not known how many suboptimal trees will be found during a search before optimal trees are found, and what portion of potential rearrangements of a given tree will be performed before a better tree is found.

Is there a version of PAUP* that will work on my new Intel-based Mac?

Yes, Mac users who have upgraded to an Intel-based Mac must follow the instructions on this page to get a version of PAUP* that will work on this platform.

Is there a version of PAUP* that will work natively under Mac OS X?

Yes, we have compiled a command-line only version of PAUP* 4.0 beta that will run on Mac OS X in a terminal window. Note, this version takes full advantage of Mac OS X's memory protection and preemptive multitasking but LACKS a Graphical User Interface (GUI). Starting with the forthcoming release of Beta 11, Mac users will be given a choice to install the command-line version of PAUP* as well as the classic Mac GUI version. Work is currently underway to "carbonize" the GUI Mac version of PAUP*; however, at this time, we cannot speculate on when this version will be available. The Mac GUI version of PAUP* is compatible with Mac OS X when run in the classic layer. If you are only interested in the command-line version of PAUP* then you may purchase the portable version http://www.paup.csit.fsu.edu/port.html

I just purchased a new Mac and Classic support is not installed on the system. How do I run PAUP* without classic support?

You have two choices. The first is to install classic support on your machine. While Apple no longer installs classic support by default on new systems, you can install it yourself with very little effort. A classic support installer is included on the "Additional Software & Apple Hardware Test" CD. This CD is included with your set of system CDs. Open the file labeled "About the Additional Software & Apple Hardware Test Disc" on the "Additional Software & Apple Hardware Test" CD and you will find concise instructions for installing classic support.

Your second choice is to use the command-line version of PAUP* for OS X. Starting with the forthcoming release of Beta 11, Mac users can use a command-line version of PAUP* in addition to the classic Mac GUI version. The command-line version runs on Mac OS X in a terminal window and takes full advantage of Mac OS X's memory protection and preemptive multitasking but LACKS a Graphical User Interface (GUI). The Beta 11 installer and updater will automatically add the command-line program to your system path. To start the command-line program, type "paup" in a terminal window. See the quick-start document http://paup.csit.fsu.edu/quickstart.pdf for more details regarding the use of the command-line version of PAUP*.

I get an error when I try to print from the classic version of PAUP*. How do I print from the classic version of PAUP*?

This is probably happening because you do not have a printer set up for the classic layer. A complete description of how to set up printing can be found in Apple's "Knowledge Base". The short version is:
  1. If you plan to use an AppleTalk printer then you will need to turn on AppleTalk. Go to your System Preferences > Network > Configure ... Select the AppleTalk tab and then the "Make AppleTalk active" toggle.
  2. Open the Desktop Printer Utility. This is typically located in the Utilities folder within the Applications (Mac OS 9) folder. A window named "New Desktop Printer" should open after a few seconds (give it some time).
  3. Select the printer type that you would like to use and follow the instructions.
If you are only interested in using the Mac tree preview window in PAUP*, you can also set up a "dummy" printer. Open the Desktop Printer Utility. Under "Create Desktop ..." select Translator and then click OK. After you do this you should be in business.

How do I increase the amount of memory available to PAUP*?

This is pretty much straight out of Mac's online help: First, quit PAUP* if it is open. Click the program's icon to select it. (Make sure to click the program icon itself, not an alias.) Open the File menu and choose Get Info. For Mac OS 8.1 and below, double-click the "Preferred size" box and type a new number. For Mac OS 8.5 and up, you'll need to select memory under the "Show" pull-down menu to get to the "Preferred Size" box. The program can use this amount of memory if enough memory is available.

Can I download a Mac updater to a PC and transfer the updater to a Mac that is not online?

Yes, download the BinHexed updater for the appropriate Mac version. From your PC click the BinHex link. Your browser will ask you if you want to save the file or run it. Select the save option. You'll get another dialog box allowing you to select a save location. Save the updater to a PC-formatted floppy disk. Mount the floppy on your Mac's desktop. If your Mac doesn't already have one, you'll need a utility to decode the BinHexed file. After the file is decoded, double-click the updater icon and you should be good to go.

How do I tell PAUP to automatically close the heuristic search status window at the end of the search?

set autoclose=yes;


How do I keep information from scrolling off the screen before I have read it?

If the PAUSE option of the SET command equals Silent, Beep, or Msg, the output will stop after every screenful and wait for you to press the return key.
set pause=No|Silent|Beep|Msg;


How do I recall a PAUP* command?

We strongly recommend using the public domain command-line editor CED, which provides command-line editing and recall capabilities within PAUP*.

How do I print trees using the Windows Interface?

To print an ASCII tree, direct the general output to a file using the log command, issue the showtrees command, stop the log, and print the log file using your favorite text editor.
log file=tree.log;
showtrees 1;
log stop;

Note: The Windows interface of PAUP* 4.0 does not print graphical trees. We plan to make graphical printing a part of the Windows package, but this feature will not be available in 4.0. The program TreeView, written by Rod Page, is an excellent program for creating and manipulating graphical trees from NEXUS files. To output NEXUS trees from any version of PAUP*, use the savetrees command.
savetrees file=mytree.trees;


How does PAUP* deal with missing characters under the parsimony criterion?

PAUP* deals with missing characters under the parsimony criterion by assigning to the taxon the character state that would be most parsimonious given its placement on the tree. Therefore, only the characters with no missing data will affect the placement of the taxon.
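The following Python sketch illustrates the idea with the textbook Fitch algorithm for a single unordered character. It is not PAUP*'s implementation: the nested-tuple tree, the taxon names, and the function names are all hypothetical. A missing entry ("?") is treated as compatible with every state, so it never adds a step.

```python
ALL_STATES = frozenset("ACGT")


def state_set(symbol):
    """Map a matrix symbol to a state set; '?' (missing) matches every state."""
    return ALL_STATES if symbol == "?" else frozenset(symbol)


def fitch_length(tree, tips):
    """Fitch downpass on a nested-tuple rooted binary tree; returns (state set, steps)."""
    if isinstance(tree, str):                 # a tip: look up its state
        return state_set(tips[tree]), 0
    (lset, lsteps), (rset, rsteps) = (fitch_length(sub, tips) for sub in tree)
    common = lset & rset
    if common:
        return common, lsteps + rsteps
    return lset | rset, lsteps + rsteps + 1   # disjoint sets cost one step


tree = (("t1", "t2"), ("t3", "t4"))
_, steps_scored = fitch_length(tree, {"t1": "A", "t2": "G", "t3": "A", "t4": "A"})
_, steps_missing = fitch_length(tree, {"t1": "A", "t2": "?", "t3": "A", "t4": "A"})
print(steps_scored, steps_missing)
```

When t2 is scored as G the character costs one step, but when t2 is missing it costs none, so this character says nothing about where t2 belongs.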

What options are available in PAUP* for dealing with multi-state taxa?

Under the "Set" or "Pset" commands you are given an option to change the way in which PAUP* deals with multi-state taxa. When the data set below is analyzed under the parsimony criterion, changing the designation of multi-state taxa to uncertain (default), variable, and polymorphic gives three different scores: 5, 6, and 7, respectively. For "Pset mstaxa=uncertain" PAUP* picks, from the states listed, the one that minimizes the tree length; for "Pset mstaxa=polymorphic" PAUP* assumes that a taxon with multiple states is a heterogeneous terminal group; and for "Pset mstaxa=variable" PAUP* treats the states inside curly braces as uncertain and those inside parentheses as polymorphic.

NOTE: For display reasons, the curly braces are replaced by square brackets. To get the results described above, replace the square brackets with curly braces.

#NEXUS
begin data;
dimensions ntax=4 nchar=4;
format symbols="012";
matrix
t1 11    00
t2 1[12] 10
t3 02    1(01)
t4 00    11
;
end;


How do I define multistate characters as ordered in PAUP?

There are several ways to assign character types to specific characters in the data matrix. One way is to define a typeset in an assumptions block and then use the assume command to set the character type. For example:
begin assumptions;
typeset myTypesetName = ord: 1 4 5;
end;
begin paup;
assume typeset = myTypesetName;
end;

You can skip the assume command and set the character type from within the assumptions block if you precede the typeset name with an asterisk ("*"). For example:
begin assumptions;
typeset *myTypesetName = ord: 1 4 5;
end;

Yet another way to set character types is by using the ctype command from within a paup block or at the command line. For example the following command has the same effect as those given above:
ctype ord:1 4 5;

Back to questions

If a patristic distance is the sum of branch lengths on a path between a pair of taxa, why do the summed branch lengths between a pair of taxa not add up to the patristic distance reported under the "describetrees" command?

The most likely reason for this is that you have unordered multistate characters in your data matrix. PAUP does not include unordered multistate characters in the patristic distance calculation, because reconstruction of these characters can be ambiguous. To calculate branch lengths, and by extension the entire tree length, PAUP will arbitrarily accept one of the possible ancestral state assignments. Therefore, the sum of the branch lengths is greater than the patristic distance, because the branch length calculations included the multistate characters.

If you don't care about what ancestral states PAUP has used there is a way to get a patristic distance for all of the characters in your data set. First, save the tree in matrix representation including the branch lengths as a weight set.

matrixrep brlens=yes file=mytreefile.nex;

Next, open the matrix tree file and apply the weight set to all of the characters.
execute mytreefile.nex;
assume wtset=brlens;

Finally, rebuild the tree and generate the patristic distance matrix:
hs;
describetrees 1/ patristic=yes;

The patristic distances will now equal the summed branch lengths.
Back to questions

I did a search under the parsimony criterion and got two trees that look just alike. Why does PAUP consider them to be different?

The answer involves how PAUP collapses zero-length branches. The default collapsing rule is that a branch is retained if it is supported under at least one most-parsimonious reconstruction (MPR) of the ancestral states, for at least one character.

Here is a simple data matrix that will generate this result.

characters
taxa  1 23 45
-----------------
A     0 00 00
B     0 11 11
C     0 11 11
D     1 00 11
E     1 00 00
F     1 00 11

Analysis of this matrix using PAUP gives two most-parsimonious trees:

:     A  B   C D   F  E
:      \  \ /   \ /  /
:       \  *     *  /
:        \  \   /  /
:         \  \ /  /
:  tree1   \  *  /
:           \ | /
:            \|/
:             *
:
:     A  B   C   D   F  E
:      \  \ /   /   /  /
:       \  *   /   /  /
:        \  \ /   /  /
:         \  *   /  /
:          \  \ /  /
:  tree2    \  *  /
:            \ | /
:             \|/
:              *


Character 1 requires two steps on tree1, and there are two MPRs that achieve this:

:     A  B   C D   F  E    A  B   C D   F  E
:     0  0   0 1   1  1    0  0   0 1   1  1
:      \  \ /   \ /  /      \  \ /   \ /  /
:       \  0     1  /        \  0     1  /
:        \  \   /  /          \  \   /  /
:         \  \ /  /            \  \ /  /
:          \  0  /              \  1  /
:           \ | /                \ | /
:            \|/                  \|/
:             0                    1

Because one of these two MPRs assigns a change leading to the group DF, PAUP does not collapse the branch connecting DF to the remainder of the tree.

On the other hand, tree2 has only a single MPR for character 1:

:   A  B   C   D   F  E
:   0  0   0   1   1  1
:    \  \ /   /   /  /
:     \  0   /   /  /
:      \  \ /   /  /
:       \  1   /  /
:        \  \ /  /
:         \  1  /
:          \ | /
:           \|/
:            1

This character does not provide support for the BCD group, and since there are no other characters that support it, the branch leading to BCD is collapsed, yielding the tree:

:     A  B   C   D       F  E
:     0  0   0   1       1  1
:      \  \ /    |      /  /
:       \  0     |     /  /
:        \  \    |    /  /
:         \  \   |   /  /
:          \  \  |  /  /
:           \  \ | /  /
:            \  \|/  /
:             \  1  /
:              \ | /
:               \|/
:                1

PAUP considers both of these trees to be distinct, recognizing that there is a tree for which the group DF receives support (albeit ambiguous support) and another tree for which DF receives no support.

Back to questions

How do I perform a Kishino-Hasegawa test to see if the support for the first and second trees stored in memory is significantly different?

set criterion=parsimony;
pscores 1-2 / khtest;

Back to questions

How do I perform a partition homogeneity (congruence) test?

The following example uses the partition definition named "foo", specifies 1000 randomizations using the random number seed 1234567, and uses a branch and bound search to obtain the sum of tree lengths for each partition.
set criterion=parsimony;
hompart partition=foo nreps=1000 seed=1234567 search=bandb;

Back to questions

How do I downweight third position transitions only in a parsimony analysis?

First you need to identify the codon positions. Probably the most efficient way to do this is to set up a codons block where the reading frame for the coding genes is identified. Then you need to define the weighting for transitions and transversions by creating a step matrix within an assumptions block. Finally, use the ctype command within a paup block to apply the stepmatrix to 3rd position sites only.
begin assumptions;
charset coding = 2-457 660-896;
charset noncoding = 1 458-659 897-898;
charset 1stpos = 2-457\3 660-896\3;
charset 2ndpos = 3-457\3 661-896\3;
charset 3rdpos = 4-457\3 662-.\3;
usertype 5_1 stepmatrix = 4
    a c g t
    - 5 1 5
    5 - 5 1
    1 5 - 5
    5 1 5 -
;
end;
begin paup;
ctype 5_1:3rdpos;
end;
Back to questions

How do I weight specific character positions in my alignment?

You can give different weights to different character positions by using the "weights" command. There are several ways to identify the characters to be weighted. One efficient way to identify characters is to include them in a character set, which must be defined within an assumptions block. For example:
begin assumptions;
charset coding = 2-457 660-896;
charset noncoding = 1 458-659 897-898;
end;
Next, you can issue the "weights" command at the command line or within a paup block. In the example below, the first "weights" command assigns a weight of three to all characters defined as coding. The second "weights" command does the same thing, except that the characters are identified directly.
begin paup;
weights 3:coding;
end;
or
weights 3:2-457, 3:660-896;

Back to questions

Do stepmatrices for character state transformations have to be symmetric?

User-defined stepmatrices do not need to be symmetric. The only requirement imposed on a stepmatrix is that it may not violate the triangle inequality.
Back to questions

Why does PAUP* tell me that my stepmatrix violates the triangle inequality?

The triangle inequality requires that a single edge of a triangle not be greater than the sum of the other edges. In terms of step matrices this means that

d(ac) <= d(ct) + d(at)

According to this rule the stepmatrix given below qualifies. For example, d(ac) = 3, d(ct) = 1, and d(at) = 3, so

3 <= 1 + 3
Stepmatrix "asym" (asymmetric):

       TO:  a  c  g  t

  FROM: a   -  3  1  3

        c   2  -  3  1

        g   1  4  -  3

        t   3  1  3  -

Whereas the following matrix would be inconsistent with the triangle inequality:

Stepmatrix "asymNT" (asymmetric triangle violation):

       TO:  a  c  g  t

  FROM: a   -  5  1  3

        c   2  -  3  1

        g   1  4  -  3

        t   3  1  3  -

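In this matrix the direct a to c cost (5) exceeds the cost of going from a to t (3) and then from t to c (1), so PAUP* would adjust the a to c transformation from 5 to 4.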
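
The check described above can be sketched in Python. This is an illustration only and not PAUP*'s own code: the function names (make_matrix, violations, adjust) are hypothetical, and the adjustment shown simply lowers each cost to the cheapest path between the two states, which is an assumption about how the correction is made that happens to reproduce the 5 to 4 adjustment.

```python
STATES = "acgt"


def make_matrix(rows):
    """Build cost[from][to] from rows listed in the order a, c, g, t."""
    return {i: dict(zip(STATES, row)) for i, row in zip(STATES, rows)}


def violations(cost):
    """List (i, j, k) where going i -> j directly costs more than i -> k -> j."""
    found = []
    for i in STATES:
        for j in STATES:
            for k in STATES:
                if len({i, j, k}) == 3 and cost[i][j] > cost[i][k] + cost[k][j]:
                    found.append((i, j, k))
    return found


def adjust(cost):
    """Lower each cost to the cheapest path between the two states."""
    fixed = {i: dict(row) for i, row in cost.items()}
    for k in STATES:  # intermediate state
        for i in STATES:
            for j in STATES:
                if i != j and fixed[i][j] > fixed[i][k] + fixed[k][j]:
                    fixed[i][j] = fixed[i][k] + fixed[k][j]
    return fixed


# The two matrices from the text; the diagonal "-" is entered as 0.
asym = make_matrix([[0, 3, 1, 3], [2, 0, 3, 1], [1, 4, 0, 3], [3, 1, 3, 0]])
asym_nt = make_matrix([[0, 5, 1, 3], [2, 0, 3, 1], [1, 4, 0, 3], [3, 1, 3, 0]])

print(violations(asym))
print(violations(asym_nt))
print(adjust(asym_nt)["a"]["c"])
```

Because the check loops over ordered pairs of states, it works for asymmetric matrices as well as symmetric ones.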
Back to questions

Why does PAUP* warn me that the stepmatrix supplied in Xu and Miranker (2004, "A metric model of amino acid substitution", Bioinformatics 20:1214-1221) is "internally inconsistent"?

Symmetric stepmatrices in PAUP* are required to satisfy the triangle inequality. If they fail to do so, a warning is issued and the costs in the matrix are adjusted until the triangle inequality is satisfied for all possible triplets of states. Unfortunately, the matrix given in the paper by Xu and Miranker contained a minor error. A corrected matrix is available at the following location: http://www.cs.utexas.edu/users/mobios/Publications/mPAMErrata.pdf.
Back to questions

What do the indices under the "pscores" command mean?

PAUP outputs several indices that measure the "fit" of characters to particular trees. The indices can be defined in terms of the following three parameters:
  1. s= length (number of steps) required by the character on the tree being evaluated
  2. m= minimum amount of change that the character may show on any conceivable tree
  3. g= maximum possible amount of change that a character could require on any conceivable tree (i.e., the length of the character on a completely unresolved bush).
You can calculate a value for each character using the following formulae:

ci= m/s

ri= (g-s)/(g-m)

rc= ri*ci

hi= 1-ci

To get the overall value for a suite of characters, simply calculate the sums of s, m, and g for all the characters in the suite and use the summed values in the equations described above.
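
As a worked illustration of the formulae above, here is a short Python sketch. The function names and the (s, m, g) values are made up for the example; the sketch assumes s is greater than zero and returns None for ri and rc when g equals m, since ri is then undefined.

```python
def fit_indices(s, m, g):
    """Return ci, ri, rc and hi for step counts s, m and g (assumes s > 0)."""
    ci = m / s
    # ri is undefined when g == m (the character cannot vary in length).
    ri = (g - s) / (g - m) if g != m else None
    rc = ri * ci if ri is not None else None
    hi = 1 - ci
    return {"ci": ci, "ri": ri, "rc": rc, "hi": hi}


def ensemble_indices(characters):
    """Sum s, m and g over (s, m, g) tuples, then apply the same formulae."""
    s = sum(c[0] for c in characters)
    m = sum(c[1] for c in characters)
    g = sum(c[2] for c in characters)
    return fit_indices(s, m, g)


characters = [(3, 1, 4), (1, 1, 3), (2, 1, 2)]  # made-up (s, m, g) values
single = fit_indices(*characters[0])
overall = ensemble_indices(characters)
print(single)
print(overall)
```

For the three made-up characters the sums are s=6, m=3, and g=9, so the overall ci is 3/6 and the overall ri is (9-6)/(9-3). Note that the overall values are computed from the sums, not by averaging the per-character indices.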

Back to questions

How does PAUP* deal with missing characters under the likelihood criterion?

The likelihood is computed by summing the likelihoods over each possible assignment of A, C, G, or T to the taxon with the missing datum. Generally, if all of the nearby taxa have the same state, this sum will be dominated by the term with this same state assigned to the "missing" value, but each of the other states will contribute some small, nonzero, value to the likelihood. On the other hand, if there is considerable ambiguity in the sense that the surrounding taxa have different states, or the branch leading to a missing-data taxon is very long, each of the possible assignments makes a larger contribution to the total likelihood. It's all in the same spirit as likelihood in the absence of missing data--there are lots of ways that the pattern of nucleotides at the tips of the tree could have been generated, and all of them contribute something to the total likelihood (generally some much more than others). With missing data, there are several states that a taxon might have taken if an insertion/deletion event had not happened (or an ambiguity in the sequencing hadn't occurred) and likelihood considers the probability of each of those alternatives.
Back to questions

How do I tell PAUP* I want to use the JC69 model (Jukes & Cantor, 1969)?

set criterion=likelihood;
lset nst=1 basefreq=equal;

Back to questions

How do I tell PAUP* I want to use the K2P model (Kimura, 1980)?

set criterion=likelihood;
lset nst=2 basefreq=equal;

Back to questions

How do I tell PAUP* I want to use the F81 model (Felsenstein, 1981)?

set criterion=likelihood;
lset nst=1 basefreq=empirical;

Back to questions

How do I tell PAUP* I want to use the F84 model (i.e., the model used in DNAML)?

set criterion=likelihood;
lset nst=2 basefreq=empirical variant=f84;

Back to questions

How do I tell PAUP* I want to use the HKY model (Hasegawa, Kishino, & Yano, 1985)?

set criterion=likelihood;
lset nst=2 basefreq=empirical variant=hky;

Back to questions

How do I tell PAUP* I want to use the GTR model (i.e., the general time reversible model)?

set criterion=likelihood;
lset nst=6 basefreq=empirical;

Back to questions

How do I obtain likelihoods for all trees in memory?

lscores all;

Notes: you must first instruct PAUP* to use the likelihood criterion and you may also wish to change the current substitution model before issuing the above command.
Back to questions

How do I obtain likelihoods corresponding to each individual nucleotide site in my data using the first tree in memory?

lscores 1 / sitelikes;

Notes: you must first instruct PAUP* to use the likelihood criterion and you may also wish to change the current substitution model before issuing the above command.
Back to questions

How do I force PAUP* to use the branch lengths I specify when computing site likelihoods?

Assuming that you have a tree file (for example, "foo.tre") in which descriptions of trees contain branch length information, you could read in the trees from this file and preserve the branch length information as follows:
gettrees file=foo.tre storebrlens;
lscores 1 / sitelikes userbrlens;

Notes: you must first instruct PAUP* to use the likelihood criterion and you may also wish to change the current substitution model before issuing the above commands.

An example of a tree file containing one unrooted tree with branch length information is shown below. In this example, all branches in the four-taxon unrooted tree have length 0.1 except for the central branch, which has length 0.2.

#nexus
begin trees;
utree best = (taxonA:0.1,taxonB:0.1,(taxonC:0.1,taxonD:0.1):0.2);
end;

Back to questions

How do I perform a Kishino-Hasegawa test to see if the support for the first and second trees stored in memory is significantly different?

lscores 1-2 / khtest;

Notes: you must first instruct PAUP* to use the likelihood criterion and you may also wish to change the current substitution model before issuing the above command.
Back to questions

What is the difference between the transition/transversion ratio and the transition/transversion rate ratio?

The transition/transversion rate ratio is simply the instantaneous rate of transitions divided by the instantaneous rate of transversions. I will refer to this quantity as k. If k is 1.0, transitions are occurring at the same rate as transversions. The transition/transversion ratio, however, is the probability of any transition (over a single unit of time) divided by the probability of any transversion (over a single unit of time). To find the probability of any transition during a single unit of time, one must consider each of the ways a transition can occur (i.e., A to G, G to A, C to T, and T to C) and add together the probabilities of each (note that this will be a sum of four terms). Likewise, finding the probability of any transversion during a single unit of time involves a sum of eight terms (i.e., A to C, A to T, G to C, G to T, C to A, C to G, T to A, and T to G).

The probability of the specific transition A to G can be determined as follows: it is the probability that one begins in state A and changes from state A to state G in a single unit of time. Using the Felsenstein 1981 substitution model, the probability of the second part of the above statement, namely the probability of changing from state A to state G, can be written as pGb. The first part of the statement, namely the probability of starting with state A, is simply the equilibrium nucleotide frequency of A, or pA. The transition/transversion ratio, then, involves the equilibrium base frequencies, whereas the transition/transversion rate ratio does not.

Still another definition of transition/transversion ratio exists: the observed number of transitions between two sequences divided by the observed number of transversions between two sequences. This definition is problematic because the magnitude of this measure depends on the amount of time separating the two sequences being considered. It is thus difficult to compare meaningfully transition/transversion ratios obtained in this way across different pairs of sequences, since these will generally be separated by different amounts of time. Also, one should be aware that the symbol k has been used in other contexts; for example, k as used in the model implemented in the program DNAML is not comparable to k as described here.
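
The relationship between the two quantities can be sketched numerically in Python. The sketch assumes an HKY-style parameterization, in which the probability of changing to base y in a unit of time is py*b for a transversion and k*py*b for a transition; that extension of the F81 expression above, along with the function names, is an assumption made for illustration.

```python
PURINES = {"A", "G"}


def is_transition(x, y):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (x in PURINES) == (y in PURINES)


def ti_tv_ratio(k, freqs, b=1.0):
    """Transition/transversion ratio implied by rate ratio k and base frequencies."""
    ti = tv = 0.0
    for x in "ACGT":
        for y in "ACGT":
            if x == y:
                continue
            # Probability of starting in x and changing to y (before scaling by k).
            p = freqs[x] * freqs[y] * b
            if is_transition(x, y):
                ti += k * p  # four terms in total
            else:
                tv += p      # eight terms in total
    return ti / tv


equal = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(ti_tv_ratio(1.0, equal))
print(ti_tv_ratio(2.0, equal))
```

With equal base frequencies and k = 1.0 the ratio is 0.5 rather than 1.0, because there are four kinds of transition but eight kinds of transversion. The parameter b cancels in the ratio.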
Back to questions

How do I tell PAUP* to estimate the transition/transversion ratio when using the HKY substitution model?

set criterion=likelihood;
lset nst=2 basefreq=empirical variant=hky;
lset tratio=estimate;

Back to questions

How do I take account of rate heterogeneity across sites using a discrete gamma distribution, four rate categories, and a shape value of 0.2?

set criterion=likelihood;
lset rates=gamma ncat=4 shape=0.2;

Back to questions

How do I estimate the shape parameter when I am using a four-category discrete gamma distribution to account for heterogeneity in rates across sites?

set criterion=likelihood;
lset rates=gamma ncat=4 shape=estimate;

Back to questions

How do I tell PAUP* to estimate the proportion of invariant sites?

set criterion=likelihood;
lset pinvar=estimate;

Back to questions

How do I tell PAUP* to assume there are no invariant sites?

set criterion=likelihood;
lset pinvar=0;

Back to questions

How do I tell PAUP* to estimate the proportion of invariant sites and estimate the shape parameter of a discrete, four-category gamma distribution applied to the sites that are not invariant?

set criterion=likelihood;
lset pinvar=estimate;
lset rates=gamma ncat=4 shape=estimate;

Back to questions

I think most of the rate heterogeneity in my sequences is the result of codon structure. How can I tell PAUP* to assume a different rate for each codon position (i.e., estimate site-specific rates)?

set criterion=likelihood;
charpartition codons = firstpos:1-.\3, secondpos:2-.\3, thirdpos:3-.\3;
lset rates=sitespec siterates=partition:codons;


At this point, any command that causes likelihoods to be computed will make use of the charpartition named codons and a different rate will be estimated for each codon position class of sites.
Back to questions

How do I tell paup to use site-specific rates that I have already estimated?

You can do this a couple of different ways. The first way is to estimate the rates on a given tree and then apply the estimated rates by using the "siterates=previous" option. In the following example, a character partition defines three genes and the site-specific rates for each gene are estimated on a neighbor-joining tree. Finally, a heuristic search is executed using the site-specific rates estimated on the neighbor-joining tree.
charpartition genes=g1:1-300, g2:301-600, g3:601-700;
nj;
lscore 1/rates=sitespec siterates=partition:genes;
lset rates=sitespec siterates=previous;
hsearch;

The second way to use previously estimated site-specific rates is to define them explicitly in a rate set. In the following example 1st, 2nd, and 3rd positions are assigned rates of 2, 1, and 3, respectively. Character sets are used to define which characters represent the codon positions.
charset 1stpos = 2-457\3 660-896\3;
charset 2ndpos = 3-457\3 661-896\3;
charset 3rdpos = 4-457\3 662-.\3;
rateset codonrates = 2.0:1stpos, 1.0:2ndpos, 3.0:3rdpos; 
lscore / rates=sitespec siterates = rateset:codonrates;

Back to questions

When I estimate the shape parameter of the gamma-distributed rates model and the proportion of invariable sites simultaneously, PAUP tells me that pinvar is zero even though the empirical number of invariable sites is about 30 percent. Why?

When you use gamma-distributed rates, invariable sites can sometimes be accommodated by the left tail of the gamma distribution (i.e., while these sites are technically not "invariable", they are changing slowly enough that a fair number of constant sites are expected when the gamma shape parameter is small). The two parameters are highly correlated; often similar likelihood scores can be achieved with a small pinv and small gamma shape or a larger pinv with a correspondingly larger gamma shape. When the gamma shape parameter is larger, fewer low-rate sites are expected, and the pinv must increase to account for the presence of these low-rate sites. The following article deals with this issue in more depth:

Sullivan, J.; Swofford, D. L., and Naylor, G. J. P. The effect of taxon sampling on estimating rate heterogeneity parameters of maximum-likelihood models. Molecular Biology and Evolution. 1999; 16:1347-1356.

Back to questions

How do I get the relative probabilities for each ancestral base assignment?

There are basically two steps to getting the relative probabilities of each base assignment. First you need to tell PAUP to display the values when characters are reconstructed, and then you need to reconstruct the characters. The following block shows how this may be done.
begin paup;
nj;
set crit=like;
lset allprobs=yes;
describetrees 1/plot=no xout=internal;
end;

Back to questions

How do I import a pairwise distance matrix from another program into PAUP*?

The easiest way to do this is to include the custom distance in a NEXUS formatted distance block. For example, below is a distance matrix for four sequences followed by a paup block that uses the distances to build a neighbor joining tree.
#NEXUS
[!user defined distances]
Begin distances;
Dimensions ntax=4;
format nodiagonal;
matrix
t1
t2 4
t3 3 4
t4 2 3 4 ;
end;
[! nj with user defined distances]
Begin paup;
dset distance=user;
nj;
end;

A more detailed description of the distance block is given in the command reference pdf document.
Back to questions

How does PAUP* distribute missing or ambiguous changes proportionally to unambiguous changes?

Take for example the following sequences:
t1 aaaaaccg
t2 tgca-gtt
t3 tgcaagtt

The p-distance, or dissimilarity, between sequences t1 and t3 is pretty easy to calculate: 6 of the 8 comparisons do not match, so the p-distance between t1 and t3 is 3/4 or .75. If you chose to ignore missing sites, the comparison between sequences t1 and t2 would be equally straightforward: 6 of the 7 comparisons do not match, giving a p-distance of .85714. Deciding to distribute the missing comparisons to the unambiguous changes tells PAUP* to look at all the "a" pairs between sequences t1 and t2. For the example above these are:
1 a-t
1 a-g
1 a-c
1 a-a

Distributing the changes proportionally to each unambiguous change would give 1/4 to each "a" comparison. Therefore if we tallied the number of comparisons between sequence t1 and t2 we would get a matrix that looked like this:
.   a    c    g    t
a 1.25 1.25 1.25 1.25
c      0    1.00 1.00
g           0    1.00
t                0
To get the p-distance we add up the off diagonals to get 6.75 differences out of 8 comparisons or .84375.
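
The arithmetic above can be reproduced with a short Python sketch. It follows the description in this answer rather than PAUP*'s actual code: the function name p_distance, its parameters, and the decision to skip a missing comparison when no unambiguous pair shares the known base are all assumptions made for illustration.

```python
from collections import Counter


def p_distance(seq1, seq2, missing="-?", distribute=True):
    """p-distance between two aligned sequences (needs at least one comparison)."""
    counts = Counter()  # unambiguous (base1, base2) comparisons
    partial = []        # comparisons in which exactly one base is missing
    for x, y in zip(seq1, seq2):
        x_missing, y_missing = x in missing, y in missing
        if x_missing and y_missing:
            continue
        if x_missing or y_missing:
            partial.append((x, y))
        else:
            counts[(x, y)] += 1
    total = Counter(counts)
    if distribute:
        for x, y in partial:
            # Find the unambiguous pairs that share the known base.
            if x in missing:
                matches = {pair: n for pair, n in counts.items() if pair[1] == y}
            else:
                matches = {pair: n for pair, n in counts.items() if pair[0] == x}
            weight = sum(matches.values())
            # Spread one comparison over those pairs in proportion to their counts.
            for pair, n in matches.items():
                total[pair] += n / weight
    compared = sum(total.values())
    differences = sum(n for (x, y), n in total.items() if x != y)
    return differences / compared


t1, t2, t3 = "aaaaaccg", "tgca-gtt", "tgcaagtt"
print(p_distance(t1, t3))
print(p_distance(t1, t2, distribute=False))
print(p_distance(t1, t2))
```

For t1 and t2 the single missing comparison is split into four quarters, one for each of the a-t, a-g, a-c, and a-a pairs, giving 6.75 differences out of 8 comparisons.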
Back to questions

How do I do a likelihood search on a UNIX machine with a general time reversible model (I+Gamma), i.e. some sites assumed to be invariable with gamma distributed rates at variable sites, using a heuristic search with 10 random addition sequence replicates and TBR branch swapping?

begin paup;
set criterion=likelihood;
lset nst=6 basefreq=empirical;
lset pinvar=estimate;
lset rates=gamma ncat=4 shape=estimate;
hsearch nreps=10 addseq=random swap=tbr;
end;

Notes: This analysis would be expected to take a very long time if more than four or five taxa are included in the analysis. Simply using the GTR model is going to cost a lot in terms of computation time, since there are many more rate parameters that need estimating in GTR compared with HKY or even simpler models. The amount of time could be reduced considerably by not estimating both the gamma shape parameter and the pinvar parameter. Instead of pinvar=estimate, for example, use pinvar=0.1, and instead of shape=estimate, use shape=0.25. These values need not come out of thin air, however. One could supply a pretty good tree, estimate these parameters using that tree, and then set the pinvar and shape parameters to those estimates for purposes of conducting a search. Once the search is finished, these parameters could be estimated again to see if they change much. If so, it might be worth redoing the search using the new, better estimates.
Back to questions

I have a sequence data set for which I would like to infer the phylogeny. What is a sequence of analyses that I can perform that will cover most potential pitfalls I am likely to encounter?

begin paup;
log file=log.txt start;
set criterion=parsimony;
hsearch nreps=10 addseq=random swap=tbr;
savetrees file=mp.tre brlens;
set criterion=distance;
dset distance=logdet objective=me;
hsearch nreps=10 addseq=random swap=tbr;
savetrees file=me.tre brlens;
set criterion=likelihood;
lset nst=2 basefreq=empirical rates=gamma ncat=4;
lset tratio=estimate shape=estimate;
lscore 1;
lset tratio=previous shape=previous;
hsearch nreps=1 swap=tbr start=1;
savetrees file=ml.tre brlens;
log stop;
end;

Notes: This PAUP block infers phylogeny using three different optimality criteria and stores all the output in a log file named log.txt. The first analysis uses the criterion of maximum parsimony to obtain a tree (or set of trees), which are then saved to a tree file named mp.tre. The second analysis uses the minimum evolution criterion in conjunction with LogDet/paralinear pairwise distances and saves the resulting tree(s) in a tree file named me.tre. The third analysis makes use of the maximum likelihood criterion in conjunction with the HKY-gamma substitution model. Estimates of the tratio (the transition/transversion ratio) parameter and the gamma shape parameter are obtained using the LogDet tree already in memory. Then, these two parameters are fixed at these estimated values for the duration of the heuristic search. The tree(s) resulting from the hsearch command are saved in the tree file ml.tre.

Each phylogeny method has its Achilles Heel. Maximum parsimony can be misled if there is too much heterogeneity in substitution rates among lineages (the classic "long edges attract" problem) in the underlying true phylogeny. Minimum evolution using LogDet distances can be misled if there is too much site-to-site rate heterogeneity, or if some of the pairwise distances are undefined (use the "showdist" command to check). Maximum likelihood under the HKY-gamma model can be misled if parameters that are assumed to be constant across the phylogeny (such as the tratio or base frequencies) actually vary among lineages in the true phylogeny. Because of these inherent weaknesses in individual methods, it is a good idea to try several methods that have strengths in different areas.

If you get the same tree under all methods, then you are in good shape because apparently there are no major pitfalls in your data. Of course, there may be a major unknown pitfall affecting all methods, but there is not much you can do about that. You may get trees that are not identical, but are also not significantly different (in terms of data support) from one another. The Kishino-Hasegawa test can be used to see whether one tree is supported significantly less by the data than a second tree. The last possibility is that you get truly different trees from the different methods. In this case, it is in your best interest to examine these trees carefully for evidence that a particular method has fallen victim to its particular Achilles Heel. For example, if your log.txt file shows that there is strong rate heterogeneity in your data (let's say the shape parameter is estimated to be 0.01), then the LogDet and parsimony trees fall under a certain degree of suspicion compared to the likelihood tree, which should be relatively immune to this pitfall since the model used allows for rate heterogeneity. If the parsimony tree differs from the LogDet and likelihood trees, look for evidence of long branch (edge) attraction in the parsimony tree. If the LogDet tree differs from the parsimony and likelihood trees, see if the base frequencies vary considerably between tip taxa (a useful tool for this purpose is the basefreq command). In other words, use PAUP* as a tool for discovering what evolutionary factors are at work in your particular set of sequences, and use this knowledge to make an intelligent choice between the alternatives presented to you by different phylogeny methods.
Back to questions