Proposed Research Plan


Taxonomic sampling. A list of about 70 taxa is shown in Table 1. The list includes some generic duplication because it shows all relevant species for which data are available; we will not use more than one exemplar per genus in this study. This list was compiled after careful consideration of existing datasets and analyses in Hasebe et al. (53) and Pryer et al. (109). Sampling ensures representation of all major fern clades, with particular emphasis on basal groups identified in Pryer et al. (109). Vouchered DNA extracts are available for many taxa (ca. 20-30 of those required for this study) in the Wolf (Utah State Univ.), Pryer (Field Museum), and Hasebe (Univ. of Tokyo) labs. No funds are requested for foreign travel. Living material is available for other taxa from several sources:

  1. Various U.S. botanical gardens (Univ. California; New York Botanical Garden; Missouri Botanical Garden);
  2. Large private fern gardens (e.g., Barbara Joe Hoshizaki, Los Angeles; Phyllis and Ed Bates, San Diego);
  3. Various botanical contacts, all knowledgeable about ferns, who have already provided living material to Smith for sequencing and other purposes, and who have agreed to provide material from their future collecting trips (institutions and countries of possible visitation/field work are in parentheses): Dr. Henk van der Werff (Missouri Botanical Garden; Ecuador, Peru, Madagascar, and Vietnam); Dr. Blanca León (USDA-ARS; Peru); Mr. Donald Hodel (Univ. California Extension; Tahiti, New Caledonia, Honduras); Dr. John Game (Univ. California, Berkeley; Cook Islands, Fiji, Hawaii); Dr. Daniel Palmer (Hawaii); Dr. Kenneth Wilson (California State University, Northridge; Hawaii, Ecuador); Dr. Hanna Tuomisto (Univ. Turku, Finland; Amazonian Peru, Ecuador, and Colombia); Mr. Alexandre Salino (Univ. Federal de Minas Gerais, Brasil; Brazil); Dr. Ramón Riba (Universidad Autonoma Metropolitana, Mexico City; Mexico); Ms. Mónica Palacios-Rios (Instituto de Ecología, Xalapa; Mexico); Dr. Patrick Brownsey (Museum of New Zealand; New Zealand); Ms. Patricia Sánchez (Univ. California, Berkeley; Society Islands-Moorea; Colombia); Dr. Raul Rivero (Selby Botanical Garden; Costa Rica, Venezuela); Dr. Jefferson Prado (Instituto de Botanica, São Paulo; Brazil); Mr. Mateo Rutherford (Univ. California, Berkeley; Cuba, Costa Rica, Colombia); Dr. Iván Valdespino (Panama); Mr. Julián Mostacero (Venezuela); Dr. Robbin Moran (New York Botanical Garden; Ecuador); Dr. Dedy Darnaedi (Indonesian Institute of Sciences, Bogor; Indonesia); Drs. Masahiro Kato and Mitsuyasu Hasebe (Univ. Tokyo; China, Malaysia, Indonesia).


Vouchers for sequence and morphological data will be deposited at UC, F, and UTC. In addition, both Wolf (142) and Pryer (unpubl.) have successfully extracted fern DNA from herbarium material. Morphological information will be obtained for the same taxa for which we will collect sequence data, and we will follow, for the most part, a relatively strict exemplar method (rationale, assumptions, benefits, and drawbacks described in detail in 109). In doing so, we will construct comparable morphological and molecular data sets, each with the same set of taxa. Outgroups: All recent land plant phylogenetic analyses that have included pteridophyte representatives indicate that ferns shared a most recent common ancestor with seed plants. As in Pryer et al. (109), gymnosperm representatives will serve as outgroup taxa in the phylogenetic analyses.


Molecular techniques. DNA will be isolated from freeze-dried fresh collections following Doyle and Doyle (34), with modifications by Soltis et al. (128), and from herbarium material following Rogers and Bendich (113). PCR will be used to symmetrically amplify portions of cp and nr DNA for sequencing templates (direct sequencing). PCR products will be purified with Millipore Ultrafree-MC filter units, and sequencing reactions will be prepared using Prism Amplitaq-FS dye terminator kits (Applied Biosystems). Sequencing reactions will be purified with an ethanol precipitation protocol and run (both strands, at least 85% overlap) on an Applied Biosystems (ABI) Prism 377 automated DNA sequencer, which has a capacity for large-scale sequence analysis. Sequence fragments will be verified and assembled into contigs using Sequencher version 3.0 (GeneCodes, Ann Arbor, MI) on a Power-Mac. Sequences can be aligned in Sequencher and exported into Nexus or Phylip formats. Pryer and Wolf have devised a pragmatic collaborative strategy for division of labor of molecular work.
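The double-strand criterion above can be checked mechanically once reads have been assembled. The following is a minimal Python sketch, not part of the laboratory protocol. It assumes that "85% overlap" means the fraction of contig positions covered by at least one read from each strand, and that read placements are available as (start, end) coordinates; the function names and the toy coordinates are hypothetical.

```python
def covered(length, intervals):
    """Mark which contig positions are covered by any read.

    intervals: iterable of (start, end) pairs, 0-based, end exclusive.
    Returns a list of booleans of the given length.
    """
    mask = [False] * length
    for start, end in intervals:
        # Clip each read to the contig boundaries before marking it.
        for i in range(max(0, start), min(length, end)):
            mask[i] = True
    return mask


def double_strand_fraction(length, forward_reads, reverse_reads):
    """Fraction of positions covered by both a forward and a reverse read.

    Assumption: this is one possible reading of the "both strands,
    at least 85% overlap" criterion; other definitions are possible.
    """
    fwd = covered(length, forward_reads)
    rev = covered(length, reverse_reads)
    both = sum(1 for f, r in zip(fwd, rev) if f and r)
    return both / length


# Toy read placements (hypothetical coordinates, for illustration only).
toy_fraction = double_strand_fraction(
    1000, forward_reads=[(0, 600), (500, 1000)], reverse_reads=[(100, 950)]
)
meets_criterion = toy_fraction >= 0.85
```

In this toy case the forward reads span the whole contig and the reverse read spans 850 of 1000 positions, so the contig just meets the threshold.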


Choice of molecules for analysis. Because different genes can have different evolutionary rates, datasets that include multiple genes have the potential to be informative at various levels of phylogeny (24). In addition, because the role of molecular phylogenies as a prime basis for classification is controversial, and because the complex relationship between molecular phylogenies and taxonomic practice is still evolving (22), major taxonomic conclusions should not be proposed on the basis of a single gene sequence; congruent evidence from multiple, independent genealogical sources is preferred (14). Of the genes that are known and well studied at present, those we have selected for analysis are the most likely to yield important phylogenetic results when studying higher-level vascular plant taxa. In addition to the rbcL data that are already in hand, this study will generate sequence data for two chloroplast genes (16S and atpB) and two nuclear genes (18S and 26S, 5' end) for approximately 50-60 taxa (Table 1).


Morphological data set. Pryer et al. (109; Appendix 2) considered 115 vegetative and reproductive morphological/anatomical characters of the fern sporophyte and gametophyte. A full morphological data set of 77 parsimony-informative characters was developed for the 50 taxa included in that study. Pryer et al. (109) amply demonstrated the utility of taking morphology into account in formulating hypotheses on phylogenetic relationships of ferns, since morphology was able to provide increased support at the base of the tree where rbcL alone could not. Eight characters that were autapomorphic in that study will be considered for the taxa sampled here, as well as a few additional characters that have since been compiled by Pryer and Smith, which are mostly relevant to basal taxa identified in that analysis. The morphological matrix will also be revised to include the additional basal taxa that will be sequenced. We do not intend simply to "plug in" new taxa and their character scores into the existing matrix, but also to reevaluate many of our character definitions, character state choices, and the theoretical decisions that we made for that initial morphological analysis [e.g., the scoring of missing data vs. missing characters (86, 99, 98, 132); character state ordering for multistate characters (57, 58, 79a); and character homology decisions (25, 106, 107, 114-117)]. The advantage of a well-supported hypothesis of phylogenetic relationships that is also based on an extensive morphological data set is that it in turn can lead to a modern classification of ferns that uses morphological synapomorphies to circumscribe the monophyletic groups that are recognized. This cannot be achieved with a phylogeny based only on molecules. In future studies, we plan to coordinate our analyses with those of paleobotanists to add information from the fern fossil record (26, 30, 67a).


Data combinability issues. Because a sequence-based molecular phylogeny is necessarily a gene phylogeny, it may not agree with the organismal phylogeny due to such biological processes as introgression, lineage sorting, and gene duplication (33, 60, 79). Phylogenetic trees derived from different data sets may also differ due to sampling error or to the use of an inappropriate evolutionary model for a given data set (9, 112). The fit between the assumptions made in a phylogenetic analysis and the evolutionary processes that generated the character data is of greatest importance (67b). Because our primary interest is the phylogeny of organisms rather than genes, this problem of differential phylogenetic history among data sets argues for the use of multiple data sets. Increased efforts are now being made to include both morphological and molecular data sets (or multiple molecular data sets) in phylogenetic studies (32, 35, 79, 97, 101). However, there is controversy over how best to integrate phylogenetic information from disparate sources (2, 9, 11, 21, 35, 67b, 68, 79, 94, 111, 112, 130, 141). Three different approaches can be taken when dealing with multiple data sets: total evidence, separate analysis, and conditional combination. The last of these involves performing heterogeneity tests, e.g., the Kishino-Hasegawa ML test (71), T-PTP tests (36), and the Rodrigo et al. method (112); most of these have not been empirically examined (but see 79), so there is little consensus on how powerful they are at detecting heterogeneity. We will perform independent analyses to detect any strongly supported conflicts among data sets or strongly different rates of evolution among genes. We will also explore published heterogeneity tests before combining data. Conditional combination (excluding some taxa, differential character weighting) will be used if any strong conflicts are identified. Conflicts between morphological and molecular results in Pryer et al. (109) did not reflect serious contradictions, but rather different levels of resolution of the two sorts of data in different parts of the tree; combined analyses demonstrated that the phylogenetic signals from morphology and rbcL were complementary. Our goal is to obtain a best estimate of the organismal phylogeny through a careful analysis that takes into account the controversial issues just mentioned, so that we can have confidence in the overall phylogenetic framework proposed and in the patterns of character evolution that we will infer.
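Where a combined analysis is judged appropriate, the separate data sets must be joined into a single matrix while keeping track of which characters came from which source, so that partitions can still be analyzed separately or weighted differently. A minimal Python sketch of that bookkeeping follows. The function name, the toy sequences, and the padding of absent taxa with "?" are illustrative assumptions; the plan above calls for the same taxa in every data set, in which case no padding would occur.

```python
def concatenate(partitions):
    """Combine named alignments into one matrix.

    partitions: list of (name, {taxon: aligned sequence}) pairs.
    Taxa absent from a partition are padded with '?' (assumption:
    missing data are scored as '?', as in common Nexus usage).
    Returns (combined, ranges), where ranges holds
    (name, first, last) with 1-based inclusive character positions.
    """
    taxa = sorted({taxon for _, aln in partitions for taxon in aln})
    combined = {taxon: "" for taxon in taxa}
    ranges = []
    offset = 0
    for name, aln in partitions:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"partition {name!r} is not aligned")
        length = lengths.pop()
        for taxon in taxa:
            combined[taxon] += aln.get(taxon, "?" * length)
        ranges.append((name, offset + 1, offset + length))
        offset += length
    return combined, ranges


# Toy partitions (hypothetical taxa and sequences, for illustration only).
toy_parts = [
    ("rbcL", {"taxonA": "ACGT", "taxonB": "ACGA"}),
    ("18S", {"taxonA": "GG", "taxonB": "GC", "taxonC": "GG"}),
]
matrix, char_sets = concatenate(toy_parts)
```

The returned ranges correspond to the character sets one would declare for each partition when excluding, weighting, or testing data sets individually.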


Phylogenetic analysis methods. Sequence alignment of protein-coding genes (atpB, rbcL) is easily done by eye, since there are no indel regions. For nuclear and chloroplast ribosomal DNA (18S, 26S, and 16S), however, sequences will be aligned both manually and with the Clustal W multiple sequence alignment program (136); both primary and secondary structures will be used to assist in homology assignments (72, 88). Regions of the alignments for which homology of residues cannot be reasonably assumed will be excluded from phylogenetic analyses. Various weighting and coding options will be explored, including use of gaps as a fifth state (59). Effects of base compositional biases will also be considered (12, 20). Phylogenetic analyses will be done using maximum parsimony (PAUP: 132) and maximum likelihood methods (PHYLIP: 38 and fastDNAml: 54, 102, 103), evaluating discrepancies in the context of the assumptions underlying each method. Morphological analyses will be restricted to parsimony methods, though maximum likelihood procedures are being investigated (P.O. Lewis, pers. comm.). Bootstrap (37, 61, 124) and decay (6, 27) analyses will be conducted to provide a measure of support for the phylogenetic results.
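The bootstrap resamples alignment columns with replacement to build pseudoreplicate data sets, each of which is then analyzed in the same way as the original. A minimal Python sketch of the resampling step only is given here. The function name and toy sequences are hypothetical, and the tree search on each replicate (done in PAUP or PHYLIP in this plan) is not shown.

```python
import random


def bootstrap_columns(alignment, rng):
    """Return one bootstrap pseudoreplicate of an alignment.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    rng: a random.Random instance, so that replicates are reproducible.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    # Draw site indices with replacement; the same draw is applied to
    # every taxon so that each sampled column stays intact.
    sites = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        taxon: "".join(seq[i] for i in sites)
        for taxon, seq in alignment.items()
    }


# Toy alignment (hypothetical sequences, for illustration only).
toy = {"taxonA": "ACGTAC", "taxonB": "ACGTTC", "taxonC": "ATGTAC"}
replicate = bootstrap_columns(toy, random.Random(1))
```

Support for a clade is then the proportion of replicate trees in which that clade appears.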


Comparative methods analyses. Comparative biology searches for evidence of correlated evolution for two or more traits across a group of species. Comparative analyses depend on understanding the phylogenetic relationships of the taxa under consideration (8, 49). Explicit phylogenetic analyses, which include morphological data, not only permit generation of hypotheses of relationship among taxa, but also permit the study of character evolution (24, 23, 39, 122). A phylogeny allows estimation of the number of times that a trait evolved and the direction or temporal sequence of character transformation. Establishing the order of evolution of traits is critical in choosing among alternative evolutionary explanations. Taking advantage of MacClade's (87) interactive environment for exploring phylogeny, we will optimize morphological characters onto our best estimate of phylogeny according to parsimony. These studies can provide insight into the origin of characters that were key events in the early evolution and diversification of ferns.
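Estimating how many times a trait evolved on a given tree can be illustrated with the Fitch down-pass for an unordered character. The Python sketch below is only an illustration of parsimony optimization under stated assumptions: a fully bifurcating tree written as nested tuples, unordered states, and hypothetical taxon names and scores. It is not a description of how MacClade implements its reconstructions.

```python
def fitch_changes(tree, states):
    """Fitch down-pass for one unordered character on a binary tree.

    tree: a leaf name (str) or a 2-tuple of subtrees.
    states: dict mapping leaf name -> observed character state.
    Returns (state_set, n_changes): the candidate state set at this node
    and the minimum number of state changes required below it.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch_changes(left, states)
    right_set, right_changes = fitch_changes(right, states)
    shared = left_set & right_set
    if shared:
        # The two descendants can agree, so no extra change is needed here.
        return shared, left_changes + right_changes
    # No state in common: take the union and count one additional change.
    return left_set | right_set, left_changes + right_changes + 1


# Toy tree and binary character (hypothetical taxa and scores).
toy_tree = ((("A", "B"), "C"), ("D", "E"))
toy_states = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 1}
root_states, n_changes = fitch_changes(toy_tree, toy_states)
```

For this toy character the minimum is two changes and the root state is ambiguous (either 0 or 1), which is the kind of result that bears on whether a trait arose once or repeatedly.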


Cladograms can help to identify not only the context in which a feature evolved, but also whether it may have been strictly correlated with the evolution of some other character or performance advantage (49, 83, 84, 85, 104). Based on our findings, we will investigate statistical tests that can determine whether different characters that exist in discrete states show evidence for correlated evolution in the context of phylogeny. For example, we may want to test the hypothesis that species diversification in ferns was correlated with the "streamlining" (narrower stalk, fewer spores, etc.) of the sporangium. Tests have been developed to determine whether character changes are clustered (125) or concentrated in particular regions of the phylogeny (84, 87). A more recent test by Pagel (104) assesses whether a pattern of association across the taxa is evidence for correlated evolutionary change in two discrete characters. It can be used to test highly specific hypotheses about the temporal order and direction of changes in two variables.


N.B. Numbers in parentheses refer to literature citations listed in References.