1) open the data file combine2.dat from the file menu and execute it

2) exclude the character sets cytb and junk2 from the data menu

3) start a heuristic search from the analysis menu

You could obtain the same result by executing a simple text file containing the following PAUP block.

begin paup; execute d:\data\combine2.dat; exclude cytb junk2; hsearch; end;

Note: Because there are a number of beta and test versions of the program you should mention the specific version of PAUP* somewhere in the methods.

- Felsenstein, J. 2002. Inferring Phylogenies. Sinauer Associates. Sunderland, Massachusetts.
- Li, W. 1997. Molecular Evolution. Sinauer Associates. Sunderland, Massachusetts.
- Nei, M. and Kumar, S. 2000. Molecular Evolution and Phylogenetics. Oxford University Press, New York, New York.
- Page, R. D. and Holmes, E. C. 1998. Molecular Evolution: A Phylogenetic Approach. Blackwell Science, Oxford
- Hillis, D. M., Moritz, C., and Mable, B. Molecular Systematics (2nd ed.) Sinauer Associates. Sunderland, Massachusetts.

32 for a 32-bit machine

64 for a 64-bit machine

This limit stems from the use of bit manipulation to perform the state-set calculations in parsimony, and corresponds to the "word length" of the computer--usually 32 bits (e.g., most x86 PCs) but occasionally 64 bits (e.g., Alpha, G5, etc).
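As a rough illustration of why the word length caps the number of states, the sketch below stores each state set as one bit per symbol and combines two child state sets with bitwise operations, in the style of Fitch parsimony. This is a generic sketch of the technique, not PAUP*'s actual code; the names WORD_BITS, state_mask, and fitch_combine are made up for the example.

```python
# Sketch only: illustrates bit-set state calculations in general,
# not PAUP*'s internal implementation.
WORD_BITS = 32  # assumed word length; would be 64 on a 64-bit machine


def state_mask(states, symbols):
    """Encode a set of character states as one bit per symbol."""
    if len(symbols) > WORD_BITS:
        raise ValueError("at most %d states fit in one word" % WORD_BITS)
    mask = 0
    for s in states:
        mask |= 1 << symbols.index(s)
    return mask


def fitch_combine(left, right):
    """Return (parent state set, extra steps) for two child state sets."""
    common = left & right          # intersection of the two state sets
    if common:
        return common, 0           # children agree on at least one state
    return left | right, 1         # no overlap: take the union, count a step


symbols = "ACGT"
a = state_mask("A", symbols)
g = state_mask("G", symbols)
parent, steps = fitch_combine(a, g)      # union {A, G}, one step
grand, more = fitch_combine(parent, a)   # intersection {A}, no step
```

Because each state occupies one bit, a single machine word can hold no more states than it has bits, which is the limit described above.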

Begin characters; Dimensions nchar=200; Format datatype=dna interleave;

set criterion=likelihood;

set criterion=parsimony;

set criterion=distance; dset objective=me;

set criterion=distance; dset objective=lsfit;

The default least-squares objective function is for weighted least squares, with the weights equal to the reciprocal of the square of the distance between each pair of taxa (see below).

set criterion=distance; dset objective=lsfit power=0;

In general, the "power" specifies the power to which the reciprocal of the distance between each pair of taxa is raised. Raising this value to the zero(th) power is equivalent to weighting all pairwise deviations by the constant "1".
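The effect of the power setting can be sketched numerically. The snippet below assumes the usual weighted least-squares form, a sum over taxon pairs of weight times the squared difference between the observed distance and the path length on the tree, with weight = 1/d^power. The function names and the example numbers are illustrative only and are not taken from PAUP*.

```python
def ls_weight(distance, power=2.0):
    """Weight for one pair of taxa: reciprocal of distance raised to 'power'."""
    return 1.0 / distance ** power


def weighted_ls_fit(observed, path_lengths, power=2.0):
    """Sum of weighted squared deviations over all pairs (assumed objective)."""
    return sum(
        ls_weight(d, power) * (d - p) ** 2
        for d, p in zip(observed, path_lengths)
    )


observed = [2.0, 4.0]       # hypothetical pairwise distances
path_lengths = [1.0, 2.0]   # hypothetical tree path lengths for the same pairs

weighted = weighted_ls_fit(observed, path_lengths)              # power=2
unweighted = weighted_ls_fit(observed, path_lengths, power=0)   # all weights 1
```

With power=0 every weight is 1, so the large deviation on the long distance dominates; with power=2 the same deviation is down-weighted by the square of the distance.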

- FreqPars
- GCG MSF
- Hennig86
- MEGA
- NBRF-PIR
- Phylip 3.X
- Simple text
- Tab-delimited text

tonexus format=gcg fromfile=mygcgfile.gcg tofile=mynexusfile.nex;

If you are using the Mac interface you can get to the import dialog box by selecting File and then Import data...

delete P._articulata P._gracilis P._fimbriata P._robusta;
delete 'P. articulata' 'P. gracilis' 'P. fimbriata' 'P. robusta';
delete 'P. articulata'-'P. fimbriata' P._robusta;
delete 2 3 4 7;
delete 2-4 7;

Note: If you plan to refer to a set of taxa frequently, you may find it convenient to set up a taxset:

begin sets;
  taxset junk = P._articulata P._gracilis P._fimbriata P._robusta;
end;

After the taxset is defined, simply refer to the taxset to ignore these taxa in further analyses. For example:

delete junk;

restore P._articulata P._gracilis P._fimbriata P._robusta;
restore 'P. articulata' 'P. gracilis' 'P. fimbriata' 'P. robusta';
restore 'P. articulata'-'P. fimbriata' P._robusta;
restore 2 3 4 7;
restore 2-4 7;

Note: If you've defined a taxset then you can use the following syntax:

restore junk;

exclude leaf_length leaf_width stamen_number carpel_number;
exclude 'leaf length' 'leaf width' 'stamen number' 'carpel number';
exclude leaf_length-stamen_number 'carpel number';
exclude 2 3 4 7;
exclude 2-4 7;

If you plan to exclude these characters frequently, it is a good idea to define them in a character set. This way you can exclude them by referencing the character set. For example:

charset foo = 2-4 7;
exclude foo;

Here's how to tell PAUP* to ignore nucleotide sites 359 to 367, 586 to 588 and 693 to the last site in further analyses.

exclude 359-367 586-588 693-.;

Here's how to tell PAUP* to ignore every third nucleotide site in further analyses (starting with the third site).

exclude 3-.\3;

include leaf_length leaf_width stamen_number carpel_number;
include 'leaf length' 'leaf width' 'stamen number' 'carpel number';
include leaf_length-stamen_number 'carpel number';
include 2 3 4 7;
include 2-4 7;

Here's how to tell PAUP* to include previously excluded nucleotide sites 359 to 367, 586 to 588 and 693 to the last site in further analyses.

include 359-367 586-588 693-.;

Here's how to tell PAUP* to include every third nucleotide site (starting with site number 1) in further analyses.

include 1-.\3;
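To make the range notation concrete, here is a small sketch that expands a single range such as 3-.\3 into explicit site numbers. It assumes that "." stands for the last character and that the number after the backslash is the step, as in the examples above. The helper expand_range is hypothetical; it is not part of PAUP* and handles only one numeric range at a time.

```python
def expand_range(spec, nchar):
    """Expand one numeric range like '359-367' or '3-.\\3' into site numbers.

    'nchar' is the total number of characters, used to resolve '.'.
    """
    body, _, step_text = spec.partition("\\")
    step = int(step_text) if step_text else 1
    first_text, _, last_text = body.partition("-")
    first = int(first_text)
    if not last_text:
        last = first            # a single site such as '7'
    elif last_text == ".":
        last = nchar            # '.' means the last character
    else:
        last = int(last_text)
    return list(range(first, last + 1, step))


third_positions = expand_range(r"3-.\3", 12)   # [3, 6, 9, 12]
first_positions = expand_range(r"1-.\3", 10)   # [1, 4, 7, 10]
```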

exclude constant;

exclude uninf;

#NEXUS
Begin data;
  Dimensions ntax=5 nchar=20;
  Format datatype=protein interleave symbols="ACGT" gap=-;
  Matrix
  t1 VKYPNTNEEG
  t2 VKYPNTNEEG
  t3 VKYPNTNEEG
  t4 VKYPNTNEDG
  t5 VKYPNTNEDG

  t1 AGCTAAACCT
  t2 AGCTAGACCT
  t3 AGCTAGACTT
  t4 AGCTAGACTT
  t5 AGCTAAACTT
  ;
end;

Begin Assumptions;
  charset protein = 1-10;
  charset dna = 11-.;

  usertype 5_1 stepmatrix = 4
    acgt
    - 5 1 5
    5 - 5 1
    1 5 - 5
    5 1 5 -
  ;
end;

Begin paup;
  outgroup t2 t3;
  ctype 5_1:dna;
  hsearch addseq=random;
end;

begin data;
  dimensions ntax=4 nchar=10;
  format datatype=dna gap=- interleave symbols="01";
  options gapmode=missing;
  matrix
  one   ATGGT--
  two   AtggT--
  three A-GGTTG
  four  A-GGTAG

  one   011
  two   011
  three 100
  four  100
  ;
end;

charpartition codons = firstpos:1-.\3, secondpos:2-.\3, thirdpos:3-.\3;

charpartition genes = gene1:1-210, gene2:230-.;

Next I'll need to exclude the characters contained in the NEXUS data set but not defined in either of the two partitions -- gene1 or gene2.

exclude 211-229;

Now I can use the partition homogeneity test.

hompart partition=genes;

constraints lagomorph (monophyly) = (1,4,6,8,10,(2,3,5,7,9));

Here, the word

constraints lagomorph = ((2,3,5,7,9));

loadconstr file=foo.tree;

If the trees in "foo.tree" are to be considered

loadconstr file=foo.tree asbackbone;

hsearch constraints=lagomorph enforce=yes;

Other search-related commands for which the constraints and enforce options are available are illustrated in the examples below:

nj constraints=lagomorph enforce=yes;       [neighbor-joining]
alltrees constraints=lagomorph enforce=yes; [exhaustive search]
bandb constraints=lagomorph enforce=yes;    [branch-and-bound search]

begin paup;
  execute my_nexus_file.nex;
  bootstrap treefile=futz1.out nreps=10 bseed=0 search=heuristic;
end;
...
begin paup;
  execute my_nexus_file.nex;
  bootstrap treefile=futz3.out nreps=10 bseed=0 search=heuristic;
end;

begin paup;
  execute my_nexus_file.nex;
  gettrees file=futz1.out StoreTreeWts=yes mode=3;
  gettrees file=futz2.out StoreTreeWts=yes mode=7;
  gettrees file=futz3.out StoreTreeWts=yes mode=7;
  contree all/strict=no majrule=yes usetreewts=yes;
end;

savetrees file=foo.trees brlens;

gettrees file=foo.trees;

bootstrap treefile=bstrees.tre;

alltrees;

hsearch addseq=random nreps=500 swap=none;

set maxtrees = 1000 increase=no; hsearch addseq=random nreps=10 nchuck=100 chuckscore=1;

set dstatus=1;

you will probably never catch PAUP* at just the moment when it is finishing one replicate and about to begin the next. As a result, it is very common for the last entry of a replicate to report a likelihood score that is worse than the best likelihood score found thus far.

Another important reason is that there is no simple expression for calculating the number of tree bisection-reconnection (TBR) or subtree pruning-regrafting (SPR) rearrangements that will be made on a given tree. That is, the shape of a starting tree will determine the total number of rearrangements that can be made using one of the aforementioned swapping techniques. The problem is further complicated by the fact that it is not known how many suboptimal trees will be found during a search before optimal trees are found, and what portion of potential rearrangements of a given tree will be performed before a better tree is found.

Your second choice is to use the command-line version of PAUP* for OS X. Starting with the forthcoming release of Beta 11, Mac users can use a command-line version of PAUP* in addition to the classic Mac GUI version. The command-line version runs on Mac OS X in a terminal window and takes full advantage of Mac OS X's memory protection and preemptive multitasking but LACKS a Graphical User Interface (GUI). The Beta 11 installer and updater will automatically add the command-line program to your system path. To start the command-line program, type "paup" in a terminal window. See the quick-start document http://paup.csit.fsu.edu/quickstart.pdf for more details regarding the use of the command-line version of PAUP*.

- If you plan to use an AppleTalk printer then you will need to turn on AppleTalk. Go to your System Preferences > Network > Configure ... Select the AppleTalk tab and then the "Make AppleTalk active" toggle.
- Open the Desktop Printer Utility. This is typically located in the Utilities folder within the Applications (Mac OS 9) folder. A window named "New Desktop Printer" should open after a few seconds (give it some time).
- Select the printer type that you would like to use and follow the instructions.

set autoclose=yes;

set pause=No|Silent|Beep|msg

log file=tree.log; showtree 1; log stop;

Note: The Windows interface of PAUP* 4.0 does not print graphical trees. We plan to make graphical printing a part of the Windows package but this feature will not be available in 4.0. The program TreeView written by Rod Page is an excellent program for creating and manipulating graphical trees from NEXUS files. To output NEXUS trees from any version of PAUP* use the savetrees command.

savetrees file=mytree.trees;

NOTE: For display reasons, the curly braces are replaced by square brackets. To get the results described above, replace the square brackets with curly braces.

#NEXUS
begin data;
  dimensions ntax=4 nchar=4;
  format symbols="012";
  matrix
  t1 11    00
  t2 1[12] 10
  t3 02    1(01)
  t4 00    11
  ;
end;

begin assumptions;
  typeset myTypesetName = ord: 1 4 5;
end;
begin paup;
  assume typeset = myTypesetName;
end;

You can skip the assume command and set the character type from within the assumptions block if you precede the typeset name with an asterisk ("*"). For example:

begin assumptions;
  typeset *myTypesetName = ord: 1 4 5;
end;

Yet another way to set character types is by using the ctype command from within a paup block or at the command line. For example the following command has the same effect as those given above:

ctype ord:1 4 5;

If you don't care about what ancestral states PAUP has used there is a way to get a patristic distance for all of the characters in your data set. First, save the tree in matrix representation including the branch lengths as a weight set.

matrixrep brlens=yes file=mytreefile.nex;

Next, open the matrix tree file and apply the weight set to all of the characters.

execute mytreefile.nex; assume wtset=brlens;

Finally, rebuild the tree and generate the patristic distance matrix:

hs; describetrees 1/ patristic=yes;

The patristic distances will now equal the summed branch lengths.
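For readers unfamiliar with the term, a patristic distance between two taxa is the sum of the branch lengths along the path that connects them on the tree. The sketch below computes it for a small hypothetical rooted tree stored as a child-to-parent map; the tree, its branch lengths, and the function name are invented for illustration and say nothing about how PAUP* stores trees.

```python
def patristic(parent, a, b):
    """Sum branch lengths on the path between nodes a and b.

    'parent' maps each non-root node to (parent node, branch length).
    """
    # Distance from a to each of its ancestors (including a itself).
    dist_from_a = {a: 0}
    node, total = a, 0
    while node in parent:
        node, length = parent[node][0], parent[node][1]
        total += length
        dist_from_a[node] = total

    # Climb from b until we reach a node that is also an ancestor of a.
    node, total = b, 0
    while node not in dist_from_a:
        up, length = parent[node]
        total += length
        node = up
    return dist_from_a[node] + total


# Hypothetical tree ((A:1,B:2):3,(C:1,D:1):2)
tree = {
    "A": ("n1", 1), "B": ("n1", 2), "n1": ("root", 3),
    "C": ("n2", 1), "D": ("n2", 1), "n2": ("root", 2),
}

ab = patristic(tree, "A", "B")   # 1 + 2
ac = patristic(tree, "A", "C")   # 1 + 3 + 2 + 1
```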

Here is a simple data matrix that will generate this result.

        characters
taxa    1  23  45
-----------------
A       0  00  00
B       0  11  11
C       0  11  11
D       1  00  11
E       1  00  00
F       1  00  11

Analysis of this matrix using PAUP gives two most-parsimonious trees:

:   A  B   C D   F  E
:    \  \ /   \ /  /
:     \  *     *  /
:      \  \   /  /
:       \  \ /  /
: tree1  \  *  /
:         \ | /
:          \|/
:           *
:
:   A  B   C   D   F  E
:    \  \ /   /   /  /
:     \  *   /   /  /
:      \  \ /   /  /
:       \  *   /  /
:        \  \ /  /
: tree2   \  *  /
:          \ | /
:           \|/
:            *

A most-parsimonious reconstruction (MPR) of character 1 on tree1 requires two steps, and there are two such reconstructions:

:   A  B   C D   F  E     A  B   C D   F  E
:   0  0   0 1   1  1     0  0   0 1   1  1
:    \  \ /   \ /  /       \  \ /   \ /  /
:     \  0     1  /         \  0     1  /
:      \  \   /  /           \  \   /  /
:       \  \ /  /             \  \ /  /
:        \  0  /               \  1  /
:         \ | /                 \ | /
:          \|/                   \|/
:           0                     1

Because one of these two MPRs assigns a change leading to the group DF, PAUP does not collapse the branch connecting DF to the remainder of the tree.

On the other hand, tree2 has only a single MPR for character 1:

:   A  B   C   D   F  E
:   0  0   0   1   1  1
:    \  \ /   /   /  /
:     \  0   /   /  /
:      \  \ /   /  /
:       \  1   /  /
:        \  \ /  /
:         \  1  /
:          \ | /
:           \|/
:            1

This character does not provide support for the BCD group, and since there are no other characters that support it, the branch leading to BCD is collapsed, yielding the tree:

:   A  B   C   D       F  E
:   0  0   0   1       1  1
:    \  \ /    |      /  /
:     \  0     |     /  /
:      \  \    |    /  /
:       \  \   |   /  /
:        \  \  |  /  /
:         \  \ | /  /
:          \  \|/  /
:           \  1  /
:            \ | /
:             \|/
:              1

PAUP considers both of these trees to be distinct, recognizing that there is a tree for which the group DF receives support (albeit ambiguous support) and another tree for which DF receives no support.

set criterion=parsimony; pscores 1-2 / khtest;

set criterion=parsimony; hompart partition=foo nreps=1000 seed=1234567 search=bandb;

begin assumptions;
  charset coding = 2-457 660-896;
  charset noncoding = 1 458-659 897-898;
  charset 1stpos = 2-457\3 660-896\3;
  charset 2ndpos = 3-457\3 661-896\3;
  charset 3rdpos = 4-457\3 662-.\3;

  usertype 5_1 stepmatrix = 4
    acgt
    - 5 1 5
    5 - 5 1
    1 5 - 5
    5 1 5 -
  ;
end;

begin paup;
  ctype 5_1:3rdpos;
end;

begin assumptions;
  charset coding = 2-457 660-896;
  charset noncoding = 1 458-659 897-898;
end;

Next, you can issue the "weights" command at the command line or within a paup block. In the example below, the first "weights" command assigns a weight of three to all characters defined as coding. The second "weights" command does the same thing except that the characters are identified directly.

begin paup;
  weights 3:coding;
end;

or

weights 3:2-457, 3:660-896;

d(ac) <= d(at) + d(tc)

According to this rule the stepmatrix given below qualifies; for the a to c change, 3 < 3+1.

Stepmatrix "asym" (asymmetric):

       TO:  a  c  g  t
FROM:  a    -  3  1  3
       c    2  -  3  1
       g    1  4  -  3
       t    3  1  3  -

Whereas the following matrix would be inconsistent with the triangle inequality:

Stepmatrix "asymNT" (asymmetric triangle violation):

       TO:  a  c  g  t
FROM:  a    -  5  1  3
       c    2  -  3  1
       g    1  4  -  3
       t    3  1  3  -

Here the direct a to c cost (5) exceeds the cost of going from a to t and then from t to c (3+1 = 4), and PAUP* would adjust the a to c transformation from 5 to 4.
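One way to check a stepmatrix against the triangle inequality is to compare every direct cost with the cheapest indirect route, which is an all-pairs shortest-path calculation. The sketch below does this for the "asymNT" matrix above. It is an assumption that this reproduces the adjustment PAUP* makes; the function name and the dictionary layout are invented for the example.

```python
def enforce_triangle_inequality(cost):
    """Lower any cost that exceeds the cheapest indirect route.

    'cost' maps from-state -> {to-state: cost}; diagonal entries are omitted.
    Returns an adjusted copy (Floyd-Warshall style relaxation).
    """
    states = list(cost)
    fixed = {a: dict(cost[a]) for a in states}
    for k in states:                 # intermediate state
        for i in states:             # from-state
            for j in states:         # to-state
                if i == j or k == i or k == j:
                    continue
                via = fixed[i][k] + fixed[k][j]
                if via < fixed[i][j]:
                    fixed[i][j] = via
    return fixed


asym_nt = {
    "a": {"c": 5, "g": 1, "t": 3},
    "c": {"a": 2, "g": 3, "t": 1},
    "g": {"a": 1, "c": 4, "t": 3},
    "t": {"a": 3, "c": 1, "g": 3},
}

adjusted = enforce_triangle_inequality(asym_nt)
# adjusted["a"]["c"] is 4 (a -> t -> c costs 3 + 1); all other entries are unchanged.
```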

- s= length (number of steps) required by the character on the tree being evaluated
- m= minimum amount of change that the character may show on any conceivable tree
- g= maximum possible amount of change that a character could possibly require on any conceivable tree (i.e., the length of the character on a completely unresolved bush).

ci= m/s

ri= (g-s)/(g-m)

rc= ri*ci

hi= 1-ci

To get the overall value for a suite of characters, simply calculate the sums of s, m, and g for all the characters in the suite and use the summed values in the equations described above.
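The formulas above translate directly into a few lines of code. The sketch below computes the four indices for one character and for a suite of characters by summing s, m, and g first. The function names and the example values are hypothetical; the retention index is undefined when g equals m, and this sketch simply assumes g > m.

```python
def fit_indices(s, m, g):
    """Return ci, ri, rc and hi from s, m and g (assumes s > 0 and g > m)."""
    ci = m / s
    ri = (g - s) / (g - m)
    return {"ci": ci, "ri": ri, "rc": ri * ci, "hi": 1 - ci}


def ensemble_indices(characters):
    """Overall indices for a suite: sum s, m and g, then apply the formulas."""
    s = sum(c[0] for c in characters)
    m = sum(c[1] for c in characters)
    g = sum(c[2] for c in characters)
    return fit_indices(s, m, g)


# Hypothetical (s, m, g) values for two characters
single = fit_indices(2, 1, 3)                    # ci = 0.5, ri = 0.5
suite = ensemble_indices([(2, 1, 3), (1, 1, 2)])  # sums: s=3, m=2, g=5
```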

set criterion=likelihood; lset nst=1 basefreq=equal;

set criterion=likelihood; lset nst=2 basefreq=equal;

set criterion=likelihood; lset nst=1 basefreq=empirical;

set criterion=likelihood; lset nst=2 basefreq=empirical variant=f84;

set criterion=likelihood; lset nst=2 basefreq=empirical variant=hky;

set criterion=likelihood; lset nst=6 basefreq=empirical;

lscores all;

Notes: you must first instruct PAUP* to use the likelihood criterion and you may also wish to change the current substitution model before issuing the above command.

lscores 1 / sitelikes;

Notes: you must first instruct PAUP* to use the likelihood criterion and you may also wish to change the current substitution model before issuing the above command.

gettrees file=foo.tre storebrlens; lscores 1 / sitelikes userbrlens;

Notes: you must first instruct PAUP* to use the likelihood criterion and you may also wish to change the current substitution model before issuing the above commands.

An example of a tree file containing one unrooted tree with branch length information is shown below. In this example, all branches in the four-taxon unrooted tree have length 0.1 except for the central branch, which has length 0.2.

#nexus
begin trees;
  utree best = (taxonA:0.1,taxonB:0.1,(taxonC:0.1,taxonD:0.1):0.2);
end;

lscores 1-2 / khtest;

Notes: you must first instruct PAUP* to use the likelihood criterion and you may also wish to change the current substitution model before issuing the above command.

set criterion=likelihood; lset nst=2 basefreq=empirical variant=hky; lset tratio=estimate;

set criterion=likelihood; lset rates=gamma ncat=4 shape=0.2;

set criterion=likelihood; lset rates=gamma ncat=4 shape=estimate;

set criterion=likelihood; lset pinvar=estimate;

set criterion=likelihood; lset pinvar=0;

set criterion=likelihood; lset pinvar=estimate; lset rates=gamma ncat=4 shape=estimate;

set criterion=likelihood;
charpartition codons = firstpos:1-.\3, secondpos:2-.\3, thirdpos:3-.\3;
lset rates=sitespec siterates=partition:codons;

At this point, any command that causes likelihoods to be computed will make use of the charpartition named

charpartition genes=g1:1-300, g2:301-600, g3:601-700;
nj;
lscore 1/rates=sitespec siterates=partition:genes;
lset rates=sitespec siterates=previous;
hsearch;

The second way to use previously estimated site-specific rates is to define them explicitly in a rate set. In the following example 1st, 2nd, and 3rd positions are assigned a rate of 2, 1, and 3, respectively. Character sets are used to define which characters represent the codon positions.

charset 1stpos = 2-457\3 660-896\3;
charset 2ndpos = 3-457\3 661-896\3;
charset 3rdpos = 4-457\3 662-.\3;
rateset codonrates = 2.0:1stpos, 1.0:2ndpos, 3.0:3rdpos;
lscore / rates=sitespec siterates = rateset:codonrates;

Sullivan, J.; Swofford, D. L., and Naylor, G. J. P. The effect of taxon sampling on estimating rate heterogeneity parameters of maximum-likelihood models. Molecular Biology and Evolution. 1999; 16:1347-1356.

begin paup; nj; set crit=like; lset allprobs=yes; describetrees 1/plot=no xout=internal; end;

#NEXUS
[!user defined distances]
Begin distances;
  Dimensions ntax=4;
  format nodiagonal;
  matrix
  t1
  t2 4
  t3 3 4
  t4 2 3 4
  ;
end;

[! nj with user defined distances]
Begin paup;
  dset distance=user;
  nj;
end;

A more detailed description of the distance block is given in the command reference PDF document.

t1 aaaaaccg
t2 tgca-gtt
t3 tgcaagtt

The p-distance, or dissimilarity, between sequences t1 and t3 is pretty easy to calculate. That is, 6 of the 8 comparisons do not match, therefore the p-distance between t1 and t3 is 3/4 or .75. If you chose to ignore missing sites, the comparison between sequences t1 and t2 would be equally straightforward; 6 of the 7 comparisons do not match, giving a p-distance of .85714. Deciding to distribute the missing comparisons to the unambiguous changes tells PAUP* to look at all the "a" pairs between sequence t1 and t2. For the example above these are:

1 a-t
1 a-g
1 a-c
1 a-a

Distributing the changes proportionally to each unambiguous change would give 1/4 to each "a" comparison. Therefore if we tallied the number of comparisons between sequence t1 and t2 we would get a matrix that looked like this:

      a     c     g     t
a   1.25  1.25  1.25  1.25
c         0     1.00  1.00
g               0     1.00
t                     0

To get the p-distance we add up the off diagonals to get 6.75 differences out of 8 comparisons or .84375.
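The arithmetic in this example can be reproduced with a short script. The sketch below treats "-" and "?" as missing and, when asked to distribute, spreads each missing comparison over the unambiguous pairs that share the known base, in proportion to their counts. It follows the worked example above and is not a description of PAUP*'s actual code; the function name and the option values are invented.

```python
from collections import Counter

MISSING = {"-", "?"}  # assumption: both symbols are treated as missing


def p_distance(seq1, seq2, missing="ignore"):
    """Proportion of differing sites; missing='ignore' or 'distribute'."""
    pairs = Counter()   # counts of unambiguous (base1, base2) comparisons
    partial = []        # sites where exactly one sequence is missing
    for x, y in zip(seq1.lower(), seq2.lower()):
        if x in MISSING and y in MISSING:
            continue
        if x in MISSING or y in MISSING:
            partial.append((x, y))
        else:
            pairs[(x, y)] += 1

    total = sum(pairs.values())
    diff = sum(n for (x, y), n in pairs.items() if x != y)

    if missing == "distribute":
        for x, y in partial:
            # Unambiguous pairs that share the base observed at this site.
            if x in MISSING:
                same = {p: n for p, n in pairs.items() if p[1] == y}
            else:
                same = {p: n for p, n in pairs.items() if p[0] == x}
            weight = sum(same.values())
            if weight == 0:
                continue  # nothing to distribute over; skip the site
            total += 1
            diff += sum(n for (a, b), n in same.items() if a != b) / weight
    return diff / total


t1, t2, t3 = "aaaaaccg", "tgca-gtt", "tgcaagtt"
d13 = p_distance(t1, t3)                         # 6/8
d12_ignore = p_distance(t1, t2)                  # 6/7
d12_spread = p_distance(t1, t2, "distribute")    # 6.75/8
```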

begin paup;
  set criterion=likelihood;
  lset nst=6 basefreq=empirical;
  lset pinvar=estimate;
  lset rates=gamma ncat=4 shape=estimate;
  hsearch nreps=10 addseq=random swap=tbr;
end;

Notes: This analysis would be expected to take a

begin paup;
  log file=log.txt start;

  set criterion=parsimony;
  hsearch nreps=10 addseq=random swap=tbr;
  savetrees file=mp.tre brlens;

  set criterion=distance;
  dset distance=logdet objective=me;
  hsearch nreps=10 addseq=random swap=tbr;
  savetrees file=me.tre brlens;

  set criterion=likelihood;
  lset nst=2 basefreq=empirical rates=gamma ncat=4;
  lset tratio=estimate shape=estimate;
  lscore 1;
  lset tratio=previous shape=previous;
  hsearch nreps=1 swap=tbr start=1;
  savetrees file=ml.tre brlens;

  log stop;
end;

Notes: This PAUP block infers phylogeny using three different optimality criteria and stores all the output in a log file named log.txt. The first analysis uses the criterion of maximum parsimony to obtain a tree (or set of trees), which are then saved to a tree file named mp.tre. The second analysis uses the minimum evolution criterion in conjunction with LogDet/paralinear pairwise distances and saves the resulting tree(s) in a tree file named me.tre. The third analysis makes use of the maximum likelihood criterion in conjunction with the HKY-gamma substitution model. Estimates of the tratio (the transition/transversion ratio) parameter and the gamma shape parameter are obtained using the LogDet tree already in memory. Then, these two parameters are fixed at these estimated values for the duration of the heuristic search. The tree(s) resulting from the hsearch command are saved in the tree file ml.tre.

Each phylogeny method has its Achilles Heel. Maximum parsimony can be misled if there is too much heterogeneity in substitution rates among lineages (the classic "long edges attract" problem) in the underlying true phylogeny. Minimum evolution using LogDet distances can be misled if there is too much site-to-site rate heterogeneity, or if some of the pairwise distances are undefined (use the "showdist" command to check). Maximum likelihood under the HKY-gamma model can be misled if parameters that are assumed to be constant across the phylogeny (such as the tratio or base frequencies) actually vary among lineages in the true phylogeny.

Because of these inherent weaknesses in individual methods, it is a good idea to try several methods that have strengths in different areas. If you get the same tree under all methods, then you are in good shape because apparently there are no major pitfalls in your data. Of course, there may be a major unknown pitfall affecting all methods, but there is not much you can do about that.

You may get trees that are not identical, but are also not significantly different (in terms of data support) from one another. The Kishino-Hasegawa test can be used to see whether one tree is supported significantly less by the data than a second tree.

The last possibility is that you get truly different trees from the different methods. In this case, it is in your best interest to examine these trees carefully for evidence that a particular method has fallen victim to its particular Achilles Heel. For example, if your log.txt file shows that there is strong rate heterogeneity in your data (let's say the shape parameter is estimated to be 0.01), then the LogDet and parsimony trees fall under a certain degree of suspicion compared to the likelihood tree, which should be relatively immune to this pitfall since the model used allows for rate heterogeneity. If the parsimony tree differs from the LogDet and likelihood trees, look for evidence of long branch (edge) attraction in the parsimony tree. If the LogDet tree differs from the parsimony and likelihood trees, see if the base frequencies vary considerably between tip taxa (a useful tool for this purpose is the basefreq command). In other words, use PAUP* as a tool for discovering what evolutionary factors are at work in your particular set of sequences, and use this knowledge to make an intelligent choice between the alternatives presented to you by different phylogeny methods.