Why Mesquite was made
We give two answers, the practical and the poetic, and a comment on the relationship between MacClade and Mesquite.
The practical answer
Mesquite represents a new approach to computing for evolutionary biology.
In recent years there has been a proliferation of computer programs
for phylogenetic analysis, each designed for some particular analysis
(e.g., see Felsenstein's
compilation of
programs). As these often involve unique file formats and user interfaces,
it is difficult for users to move from one to another. Users tend to
become constrained to a few familiar analyses, since any given program
can't do
everything, and each program has costs in learning. As a programmer
one would like to respond by making a program that does everything,
but there are now
too many analyses available or conceivable for a single programmer
or programming team to keep up with. We have seen the impact of these constraints
with MacClade: some users perform
particular analyses in MacClade not because they are the most appropriate
analyses for their questions, but simply because they are available
in a familiar program. We would like to add more flexibility to MacClade,
but
in a monolithic program this can be difficult to do, and even if easy,
there are more proposed methods than we could maintain in MacClade.
Hence, our goal was to design a general system for phylogenetic computing
to which different programmers could contribute modules. Bringing different
analytical tools into a common system increases possible analyses more
than additively. In the end, the system has grown beyond being strictly
phylogenetic,
including capabilities for calculations involving characteristics of
many organisms (e.g. population genetics and morphometrics) that need
not involve
phylogeny.
A second goal of Mesquite is to provide a graphical user interface that
will operate, more or less without modification, under different operating
systems (being written in Java).
Modularity and Flexibility
"Modularity" in computer programming might follow different models.
It could follow the "Mr. Potato
Head" model, in which there is a central program to which different
peripheral calculations can be attached in specific places. This allows
useful, but limited, flexibility. Or, modularity could follow the "Lego" model,
in which building blocks are attached to other building blocks, and
so on indefinitely. This allows nearly unlimited flexibility. Mesquite's
modularity
is somewhat of a hybrid between these: there is a (small) central starting
point to which modules attach, but from there modules can be attached
to modules attached to modules, indefinitely, leading to considerable
flexibility in the
analyses that can be constructed.
To give an idea of the flexibility, consider the calculation of the parsimony score of a tree, the treelength. A treelength-calculating module takes a tree as input and responds by returning its length. Such a module belongs to the general class of modules that return a number when passed a tree. Other modules belonging to this class ("NumberForTree") could return the likelihood of the tree, or a measure of the asymmetry of the tree's branching, or a measure of the tree's discordance with a containing species tree. A Tree Legend module can be written (and has been) that displays the treelength in a legend in the tree window, but the Legend module is designed so that the user can choose to display any other number for the tree, such as its likelihood, asymmetry, or discordance. If a programmer creates a new module to calculate a number for a tree, such as the longest branch-length path from root to tip, and a user installs the module, then the longest path measurement would automatically become another option for the tree legend.
The Tree Legend is not the only place where analyses could use numbers for trees. A charting module could display the numbers calculated for a whole series of trees, or a tree search module could use the numbers to find a tree with minimum or maximum values for the number. When such modules are made, they can automatically have access to whatever NumberForTree modules are available. Thus, the chart could show treelength, or likelihood, or asymmetry, or discordance, or longest path. Likewise, the tree search module could seek to optimize any of those. If a programmer makes a new module to analyze numbers for trees, then suddenly all existing NumberForTree modules have a new context in which they can be analyzed. If a new NumberForTree module is made, it will appear as a new option under each of the modules making use of NumberForTree. Hence the number of alternative analyses rises as the product of numbers of modules of different interacting types.
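The arrangement described in the two paragraphs above can be sketched in plain Java. This is an illustration only, not Mesquite's actual API: apart from the class name "NumberForTree", which the text mentions, every name and signature here (`TreeNode`, `LongestPath`, `TipCount`, `TreeLegend`, `TreeSearch`, `name()`, `calculate()`) is a hypothetical stand-in, and the tree is reduced to nodes carrying a branch length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a tree: each node has a branch length and children.
class TreeNode {
    final double branchLength;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(double branchLength, TreeNode... kids) {
        this.branchLength = branchLength;
        children.addAll(Arrays.asList(kids));
    }
}

// The shared contract: any module that returns a number when passed a tree.
// (Assumed signature; Mesquite's real class differs.)
interface NumberForTree {
    String name();
    double calculate(TreeNode root);
}

// One calculation: the longest branch-length path from root to tip.
class LongestPath implements NumberForTree {
    public String name() { return "Longest path"; }
    public double calculate(TreeNode node) {
        double deepest = 0.0;
        for (TreeNode child : node.children) {
            deepest = Math.max(deepest, calculate(child));
        }
        return node.branchLength + deepest;
    }
}

// Another calculation: the number of tips (terminal nodes).
class TipCount implements NumberForTree {
    public String name() { return "Tip count"; }
    public double calculate(TreeNode node) {
        if (node.children.isEmpty()) return 1;
        double tips = 0;
        for (TreeNode child : node.children) tips += calculate(child);
        return tips;
    }
}

// A consumer: displays whichever number the user chose.
class TreeLegend {
    static String render(TreeNode tree, NumberForTree chosen) {
        return chosen.name() + ": " + chosen.calculate(tree);
    }
}

// A second consumer: returns the tree that minimizes the chosen number.
class TreeSearch {
    static TreeNode minimize(List<TreeNode> trees, NumberForTree chosen) {
        TreeNode best = null;
        for (TreeNode t : trees) {
            if (best == null || chosen.calculate(t) < chosen.calculate(best)) best = t;
        }
        return best;
    }
}

class NumberForTreeSketch {
    public static void main(String[] args) {
        // ((A:1,B:2):1,C:4) and (A:1,B:1), each with a zero-length root branch
        TreeNode t1 = new TreeNode(0,
                new TreeNode(1, new TreeNode(1), new TreeNode(2)), new TreeNode(4));
        TreeNode t2 = new TreeNode(0, new TreeNode(1), new TreeNode(1));
        List<NumberForTree> calculators = Arrays.asList(new LongestPath(), new TipCount());

        // Every calculator is usable by every consumer, with no special wiring.
        for (NumberForTree calc : calculators) {
            System.out.println(TreeLegend.render(t1, calc));
            TreeNode best = TreeSearch.minimize(Arrays.asList(t1, t2), calc);
            System.out.println("  minimized by " + (best == t1 ? "t1" : "t2"));
        }
    }
}
```

Adding a third `NumberForTree` implementation would make it available to both `TreeLegend` and `TreeSearch` without changing either consumer, which is the multiplicative effect described above.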
Of course, the trees used had to come from somewhere. One module might supply the trees stored in a file, another might simulate trees using a simple Markovian model of speciation and extinction, another might simulate trees as gene trees coalescing within a species tree. Characters likewise might come from a stored matrix, or might be simulated by a stochastic model of evolution, or might represent reshufflings of existing characters. This means that any calculations using trees or characters can either do their calculations on observed data and reconstructed trees, or can derive null distributions under stochastic models. The calculations don't have to do anything special to achieve this flexibility; they simply let the user choose the sources of trees and characters.
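The idea that a calculation need not know where its trees come from can be sketched as follows. This is a minimal illustration under stated assumptions, not Mesquite's real classes: `SimpleTree`, `TreeSource`, `StoredTrees`, `SimulatedTrees`, and `HeightSummary` are hypothetical names, and the simulation is a simplified pure-birth process (a randomly chosen tip splits at each step, with no extinction and no branch lengths) standing in for the richer models mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical minimal binary tree: a tip has no children.
class SimpleTree {
    SimpleTree left, right;
    boolean isTip() { return left == null; }
    int tipCount() { return isTip() ? 1 : left.tipCount() + right.tipCount(); }
    // Number of edges from this node down to its deepest tip.
    int height() { return isTip() ? 0 : 1 + Math.max(left.height(), right.height()); }
}

// Assumed contract for anything that can supply trees on request.
interface TreeSource {
    int numberOfTrees();
    SimpleTree getTree(int index);
}

// Source 1: trees already in hand (for example, read from a file elsewhere).
class StoredTrees implements TreeSource {
    private final List<SimpleTree> trees;
    StoredTrees(List<SimpleTree> trees) { this.trees = trees; }
    public int numberOfTrees() { return trees.size(); }
    public SimpleTree getTree(int index) { return trees.get(index); }
}

// Source 2: trees simulated by repeatedly splitting a randomly chosen tip.
class SimulatedTrees implements TreeSource {
    private final int tips, count;
    private final long seed;
    SimulatedTrees(int tips, int count, long seed) {
        this.tips = tips; this.count = count; this.seed = seed;
    }
    public int numberOfTrees() { return count; }
    public SimpleTree getTree(int index) {
        Random rng = new Random(seed + index); // same index always gives the same tree
        SimpleTree root = new SimpleTree();
        List<SimpleTree> leaves = new ArrayList<>();
        leaves.add(root);
        while (leaves.size() < tips) {
            SimpleTree parent = leaves.remove(rng.nextInt(leaves.size()));
            parent.left = new SimpleTree();
            parent.right = new SimpleTree();
            leaves.add(parent.left);
            leaves.add(parent.right);
        }
        return root;
    }
}

// A calculation that only sees the TreeSource interface, never the origin.
class HeightSummary {
    static double meanHeight(TreeSource source) {
        double sum = 0;
        for (int i = 0; i < source.numberOfTrees(); i++) sum += source.getTree(i).height();
        return sum / source.numberOfTrees();
    }
}

class TreeSourceSketch {
    public static void main(String[] args) {
        // An "observed" three-tip tree, built by hand.
        SimpleTree cherry = new SimpleTree();
        cherry.left = new SimpleTree();
        cherry.right = new SimpleTree();
        SimpleTree observed = new SimpleTree();
        observed.left = cherry;
        observed.right = new SimpleTree();

        TreeSource stored = new StoredTrees(Arrays.asList(observed));
        TreeSource simulated = new SimulatedTrees(3, 1000, 42L);

        // The same calculation gives an observed value or a null expectation.
        System.out.println("Observed height: " + HeightSummary.meanHeight(stored));
        System.out.println("Mean simulated height: " + HeightSummary.meanHeight(simulated));
    }
}
```

Because `HeightSummary` depends only on the `TreeSource` interface, swapping the stored trees for simulated ones requires no change to the calculation itself; the choice of source is left to the caller.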
(For more details about modularity, see How Mesquite works.)
A community of programmers
Our hope is that the building-block style of the Mesquite system will encourage
programmers to write modules for their own favorite analyses. Another
attraction of the Mesquite system is that many of the details of reading
and writing
of files, user interface and graphical display are already taken care
of, and the programmer might worry only about a single calculation.
The system is built in Java and is therefore platform independent.
It is also
possible
for programmers to link in code written in C, C++, or some other language.
We have attempted to design the system so that a programmer's efforts can
be recognized as an independent, citable contribution. Modules or suites
of modules can have their own names and manuals, and can be distributed and
cited separately. They simply run within the Mesquite system.
Mesquite source code is available for download.
This allows other programmers to modify existing source to create
new modules.
The poetic answer
The goals of Mesquite are these:
To change the economics of imagination in evolutionary biology. There are three ways we envision Mesquite stimulating imaginative ideas and their successful spread:
- Stimulating the creation of ideas: analyses. With multiple alternative
modules available for various parts of an analysis, and with modules
specializing in questions from various branches of evolutionary
biology (e.g., phylogenetics,
molecular evolution, population genetics, geometric morphometrics)
the diversity and scope of analyses that can be constructed by
combining different modules
is great. Individual users can carry their imaginations through
to an analysis that no one has tried previously. Indeed, Mesquite,
by offering the alternatives
to be combined, doesn't merely provide analytical tools for questions
that have existed: it suggests and provokes new questions.
- Stimulating the creation of ideas: biology. Like MacClade, Mesquite emphasizes visualization and exploration. An idea, whether a particular hypothesis about the evolutionary history of a group or a stochastic model of a process, can be followed through to its consequences and visualized. A biologist can ask "What if this were the phylogenetic tree?" and a character's evolution can be reconstructed or simulated on this tree, and the results visualized. A biologist can ask "What if the population had population sizes fluctuating in this way?",
and coalescence can be simulated, and the results visualized. In providing
users with the tool to ask "What if?" questions, Mesquite provides an extension of the imagination. Such tools are vital in a field whose ideas have consequences that are difficult to predict or grasp without the aid of a computer.
- Enhancing the efficient distribution of ideas: programs. The imagination of theoreticians and programmers has produced many valuable ideas for approaches and methods, and many valuable programs to implement them. However, some of the ideas haven't been translated into programs, and many of the programs haven't been explored and used as much as they deserve. We don't know, as a field, how many important ideas will lie unused for decades until they are rediscovered. By allowing the programmer to focus on the precise idea proposed (Mesquite providing much of the housekeeping code for the programmer), Mesquite may allow some ideas that might otherwise never have been implemented to be realized as tools. By providing a fairly user-friendly context in which modules can operate, Mesquite may encourage some programs to be used more broadly and more easily than otherwise.
To continue to promote a phylogenetic perspective in evolutionary biology. The
last few decades have seen the realization of the importance of viewing
organismal diversity and evolution in the light of phylogeny. This
revolution is analogous to and as fundamental to its field as
the revolution in cosmology from a Newtonian view of
space to an Einsteinian
view of space (Maddison and Pérez, 2000). Just as mass curves space,
phylogeny has curved the space of biological diversity, providing a
distortion on the distribution of traits
of organisms we see around us.
MacClade and Mesquite
are both designed to provide a corrective lens, to help us to see organisms
and their traits in their natural orientation within
this curved space along the phylogeny. Mesquite's modularity allows
this perspective to be extended to fields such
as morphometrics, in which a phylogenetic perspective has relatively
recently begun to suffuse the field.
Which to use, Mesquite or MacClade?
Version 4 of MacClade (Maddison
& Maddison, 2000) was released in October 2000, and the MacOS
X-compatible version 4.04 in July 2002. The reader might wonder
why we have
been working
on two different programming efforts, and whether they are intended
for different uses. Although Mesquite's extensibility means
that eventually it could take on all of the functions of MacClade,
in fact for the near future Mesquite will not. Some calculations
and functions of MacClade's tree window might not be available
in Mesquite for a while, including particular charts (e.g.,
Changes
and Stasis), equivocal cycling, some of the parsimony options
(irreversible, stratigraphic, Dollo), and some options for
tree printing
(e.g., saving the tree as a graphics file or to the clipboard). The most
significant advances of MacClade 4 over MacClade 3 are in the
data editor, where editing of molecular sequences is much more
sophisticated, with tools for manual sequence alignment and
on-the-fly
translation to amino acids. MacClade's data editor might maintain
important advantages over that in Mesquite for a while.
In addition, for many of its functions MacClade will remain faster
and easier to use than Mesquite. The speed advantage is due primarily
to its being in native code instead of Java. MacClade, being a
non-extensible program written for a single operating system,
has its components more tightly integrated than Mesquite's modules
can be, and its user interface tailored for the MacOS. This means
that users may find MacClade easier and simpler to use than Mesquite.
While we have worked hard to make Mesquite easy to understand
and use, its modular nature means it is unlikely to be as simple
to the user as MacClade.
Thus, MacClade will continue to be used and useful, even though Mesquite
is based on a newer architecture. MacClade has its strengths,
and Mesquite will have different strengths. We are using MacClade
4 with our own data (when we get time to work on our own data...),
and expect to continue using it indefinitely.
We imagine that in the long-term future MacClade will give way
to Mesquite as Mesquite matures. For the next several years, however,
the two will coexist and be complementary.
References
Maddison, D.R. and W.P. Maddison. 2000. MacClade version 4: Analysis
of phylogeny and character evolution. Sinauer Associates, Sunderland,
Massachusetts.
Maddison, W. and T. Pérez. 2000. Biodiversidad
y lecciones de la historia. In: Enfoques contemporáneos para
el estudio de la biodiversidad [Hernández, H.M., A. García
Aldrete, F. Álvarez and M. Ulloa, editors]. Instituto de Biología,
UNAM, Mexico. Pp. 201-220.