Review of March 27, 2002 Draft Guidelines for Action Effectiveness
Research Proposals for FCRPS Offsite Mitigation Habitat Measures
April 19, 2002 | document ISRP 2002-5
[Report as memorandum to Doug Marker]
TO: Doug Marker, Fish and Wildlife Division Director,
Northwest Power Planning Council
FROM: ISRP
SUBJECT: Review of March 27, 2002 Draft Guidelines for
Action Effectiveness Research Proposals for FCRPS Offsite Mitigation
Habitat Measures by C. Paulsen, S. Katz, T. Hillman, A. Giorgi, C. Jordon,
M. Newsom, and J. Geiselman
At your and BPA's request, the ISRP reviewed the March 27, 2002 Draft
"Guidelines for Action Effectiveness Research Proposals for FCRPS
Offsite Mitigation Habitat Measures" (Paulsen et al.) that BPA would
like to reference in its solicitation letter for the mainstem and
systemwide solicitation. The Action Agencies (Bonneville Power
Administration, United States Army Corps of Engineers, and the Bureau of
Reclamation) and the National Marine Fisheries Service (NMFS) have
developed the proposed guidelines for sponsors and reviewers of action
effectiveness research projects. For this review, the ISRP was asked
specifically to address three questions, namely:
- Do the guidelines clearly identify the general approach and scope of
research projects needed under the BO RPA action 183?
- Do the project specific guidelines bound the projects at an
appropriate level?
- Are the statistical design requirements for detectability of effects
at an appropriate level for the action categories targeted?
The short answers, to all three questions, are "No," but the
ISRP recognizes that the challenge to designing an adequate monitoring
program is very great, and the ISRP also recognizes the present document
for evaluating effectiveness of actions as an important step in the right
direction. Generally speaking, the ISRP was gratified to see that this
sort of attention is beginning to be paid to planning for monitoring, and
we hope that these discussions will continue until a fully adequate
overall monitoring design is defined, and detailed guidance and criteria
are developed for monitoring proposals. In the interim, if the present
document is released, we think its effects will be incrementally positive,
but this document does not provide sufficient guidance to ensure
generation of the right mix of monitoring proposals, and it does not
provide a comprehensive set of criteria for review of such proposals.
The document makes clear that a carefully developed design will be
needed in order to generate monitoring data capable of answering the
check-in questions required by the BiOp, but the document does not provide
that design or explain how such a design will arise. The ISRP is concerned
that the premise of the document is that an adequate design may be arrived
at by some unspecified, bottom-up process during the course of reviewing
and funding many independent projects, whereas we believe that the design
requirements, especially with respect to documenting effects on salmon
survival, are not likely to be met without strong top-down articulation of
a design and strong top-down coordination of its implementation.
It will prove frustrating to all concerned parties if the hope is that
the review process by the ISRP will somehow, by itself, provide the
top-down coordination. For the next round of reviewing, this would put the
ISRP in the position of essentially rejecting all the proposals on the
grounds that there is no overarching design. A proper overall design has
to be developed first. It would be a good idea for the ISRP to review that
design, once a draft is available. When an acceptable design has been
agreed upon, then the ISRP can use it as one point of reference in
evaluating individual action effectiveness monitoring proposals.
FOUR KINDS OF MONITORING UNDER CONSIDERATION
It must be understood that the contemplated monitoring is intended to
provide data to answer four quite distinct kinds of questions, which
respectively present quite different kinds of requirements for data
quality, design of controls, and variables actually measured. It would be
a mistake to attempt a "one size fits all" approach to guidance
for the four respective data uses. The four data uses are:
- (1) Determination of salmon survival response to mitigation actions for
BiOp check-in;
- (2) Determination of salmon stock status (abundances and survival rates)
for BiOp check-in;
- (3) Determination of response of habitat variables to mitigation
actions; and
- (4) General monitoring of status and trends in habitat variables.
Use (1) seems to be the highest priority from the perspective of the
document under review, though use (2) will also be very important in the
decision making process under the BiOp check-in. Uses (1) and (2) are
explicitly defined in the BiOp, and the required data quality should be
evaluated with reference to the BiOp and in consultation with the office
in NMFS that will eventually use the data to arrive at the determinations.
Coincidentally, use (1) presents the greatest difficulties for design, and
probably presents the highest demands for data quality (as will be
discussed below).
Use (3), as described in the present document, apparently is proposed
as a surrogate for addressing use (1) when direct answers to the question
behind use (1) are not feasible. The motivation for this is not clearly
explained. Page 1 states "Because habitat actions may require time
beyond the BiOp planning horizon to manifest fish survival effects, there
is a need to establish cause-and-effect relationships between tributary
actions and physical/environmental effects that may be more
immediate." However, the implicit bottom line is still fish survival
(and reproduction). The ISRP is aware of this problem and the tension it
creates. We have strongly and explicitly encouraged principal investigators
to collect surrogate data on habitat variables that may respond more quickly
to mitigation actions than the salmon populations themselves. We have done
so under a working assumption that salmon will also respond to some of
these habitat changes. We have also been explicit that salmon
survival data will have to be collected to ground truth the habitat/salmon
response linkage. In the end, no amount of cause-and-effect studies
relating habitat actions to habitat responses will substitute for
cause-and-effect studies relating actions to fish responses or
cause-and-effect studies relating fish responses to habitat responses.
Without good measurement of salmon survival, the exercise is empty for
purposes of substituting use (3) for use (1).
Use (4) is not described at length in the document. It is nonetheless
implicit: there will doubtless be a desire to use available monitoring
data to ground truth some of the assumed maps of habitat quality in the
Ecosystem Diagnosis and Treatment (EDT) model, to verify some of the expert
opinion underlying EDT's assumed relationships between salmon production
and habitat quality, and to calibrate the relation between habitat
variables and salmon survival so as to validate use (3) for purposes of
the BiOp check-in.
FURTHER COMMENTS ON DETERMINING SURVIVAL RESPONSE TO MITIGATION, AND
DETERMINING STOCK ABUNDANCE AND SURVIVAL
This guidance document treats a data-gathering program that is
primarily directed at a specific pair of questions, which have already
been articulated by another set of parties who will be the data users. The
questions originate in the RPA section of the 2000 Hydro Biological
Opinion, where there was a commitment to collect data that would be the
basis for determinations, at a 5-year and 8-year check-in, to reassess the
status of the listed stocks, and to verify that mitigation activities have
had the intended effect. The key quantities, both for the stock status
evaluation and for the verification of mitigation, are the stage-by-stage
survival rates of the salmon.
Presumably, the resolution in survival rate estimation needed for the
status evaluation, and the magnitude of the effect that is going to be
looked for in survival rate responses to mitigation activities, are
spelled out (or at least implied) in the BiOp itself. The authors of the
present document should consult with the appropriate office in NMFS to
establish what resolution in survival rate estimation is needed in order
to comply with the letter and/or spirit of the BiOp. This may be a
situation analogous to the Habitat Conservation Plan for the mid-Columbia
PUDs, which spelled out exactly the required (and very stringent)
resolution for their survival rate estimation to be in compliance.
The present document does not explain why the proposed blanket
performance characteristics (20% type I error rate, 0.8 power, for
detecting a 10% change in 5 years, referencing only the Oregon monitoring
plan) should be the appropriate resolution for the specific need of the
BiOp check-in. How was this determined? As discussed below, it seems
unlikely that these criteria are equally and simultaneously applicable to
all indicator variables, including for example sediment particle size,
water temperature, fish survival rates, and fish abundance statistics.
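To illustrate why a single criterion is unlikely to suit every indicator, the sketch below computes the approximate number of samples per group that the stated criterion (20% type I error rate, 0.8 power, 10% change) implies for indicators with different coefficients of variation (CV). It assumes, hypothetically, that the criterion is applied as a two-sided, two-sample comparison of "before" and "after" means with a normal approximation; the draft does not say how the Oregon plan operationalizes the test, and a trend test over five annual values would give different numbers. The CV values are illustrative, not measured.

```python
import math
from statistics import NormalDist


def samples_per_group(cv, rel_change=0.10, alpha=0.20, power=0.80):
    """Approximate n per group to detect a relative change in a mean.

    Hypothetical framing: two-sided, two-sample z-test on means with a
    common standard deviation sigma = cv * mean (normal approximation).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)
    # n = 2 * (z_alpha + z_beta)^2 * (sigma / delta)^2, delta = rel_change * mean
    n = 2 * (z_alpha + z_beta) ** 2 * (cv / rel_change) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Illustrative CVs only; real values differ by indicator and site.
    for label, cv in [("low-variability indicator", 0.10),
                      ("moderate-variability indicator", 0.30),
                      ("high-variability indicator", 1.00)]:
        print(f"{label} (CV={cv:.2f}): n per group = {samples_per_group(cv)}")
```

Under this framing, an indicator with a CV of 0.10 needs about 10 samples per group, while one with a CV of 1.00 needs about 900, so the same blanket criterion implies very different field efforts for different variables.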
Once we establish what resolution is required, by the BiOp, for
measuring salmon survival rates for purposes of status evaluation at the
check-in, the calculation of the design parameters (how many fish need to
be marked, when and where, and how high a detection efficiency in
resighting is needed, when and where) to achieve this resolution is a
familiar exercise for professional statisticians. Then we would ask,
simply, whether there is a design, calculated to deliver that resolution.
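As one hedged illustration of that kind of design calculation, the sketch below simulates a hypothetical single-release study with two downstream detection sites and reports the mean and coefficient of variation of the estimated survival to the first site for a given release size. It uses the closed-form three-occasion Cormack-Jolly-Seber estimator for this simple layout, in which the detection probability at the first site is estimated from fish detected at the last site. The survival, detection, and release values are illustrative assumptions, not values from the BiOp or the draft guidelines; a real design would span many reaches, release groups, and years.

```python
import random
from statistics import fmean, pstdev


def simulate_release(n_released, s1, p1, lam, rng):
    """Simulate one release and return the estimate of survival to site 1.

    s1  : true survival from release to detection site 1
    p1  : detection probability at site 1
    lam : joint probability of surviving past site 1 and being detected
          at the last site (survival and detection are not separable there)
    Assumes detection at site 1 does not affect later survival.
    """
    n1 = 0   # fish detected at site 1
    m12 = 0  # fish detected at site 1 and at the last site
    z = 0    # fish missed at site 1 but detected at the last site
    for _ in range(n_released):
        if rng.random() >= s1:
            continue  # died before reaching site 1
        seen1 = rng.random() < p1
        seen_last = rng.random() < lam
        if seen1:
            n1 += 1
        if seen_last:
            if seen1:
                m12 += 1
            else:
                z += 1
    if m12 == 0:
        return None  # p1 is not estimable in this replicate
    p1_hat = m12 / (m12 + z)  # fraction of last-site fish also seen at site 1
    return n1 / (n_released * p1_hat)


def survival_cv(n_released, s1=0.8, p1=0.5, lam=0.6, reps=200, seed=1):
    """Monte Carlo mean and CV of the survival estimate for one release size."""
    rng = random.Random(seed)
    estimates = [e for e in (simulate_release(n_released, s1, p1, lam, rng)
                             for _ in range(reps)) if e is not None]
    mean = fmean(estimates)
    return mean, pstdev(estimates) / mean


if __name__ == "__main__":
    for n in (500, 2000, 4000):
        mean, cv = survival_cv(n)
        print(n, round(mean, 3), round(cv, 3))
```

Repeating the run over a grid of release sizes and detection probabilities shows which combinations reach a required precision, which is the kind of calculation a system-wide design would carry out for every reach.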
The ISRP does not believe it reasonable to hope that this design
calculation, carried out separately in a flurry of independent
projects, will fortuitously converge on the correct overall design. We are
fully convinced that an overall design, with the proper characteristics,
must be drafted first, for the entire system. This overall design would in
effect constitute part of the specification of an RFP, and then individual
project proponents could submit their plan for satisfying the design
requirements in their proposed piece of the data gathering operation. Some
capable group with statistical expertise (perhaps the Paulsen et al. team)
should in fact produce the master design that the individual projects can
follow.
Once we establish what resolution is required, by the BiOp, for
measuring changes in salmon survival rates for purposes of verifying
effectiveness of mitigation actions, a much more difficult calculation
must be initiated to determine the design parameters that can deliver this
resolution. The difficulty here is that the intention is to measure a
change in response to a "treatment." In other words, the bottom
line is a comparison between a "treatment" and a
"control." Because of the known large interannual variation in
salmon survival rates, a "before/after" comparison alone will not provide
an effective "control." Therefore, there will be a pervasive need
to establish actual control sites with concurrent measurements. This will
not be easy either, since there are inevitable site-specific differences.
Probably, the most promising general design strategy will be to combine
before/after measurements at treatment/control sites, using the
"before" baseline measurements at all sites to attempt to factor
out the site-specific differences, and then compare the "before/after
(in time)" change at the control sites with the "before/after
(treatment)" change at the treatment sites. Even so, some care will
have to go into the selection of control sites to ensure that there is not
too much site specific difference in responses to background temporal
variation.
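The combined before/after, treatment/control comparison described above is often called a BACI (before-after-control-impact) design. The minimal sketch below computes its effect estimate on the log scale, so the result is a ratio of survival changes, and contrasts it with a naive before/after ratio at the treatment site alone. The survival values are invented for illustration: both sites share a poor "after" period (for example, from ocean conditions), which the naive comparison confounds with the treatment effect. A real analysis would also need replicate sites or years and an estimate of uncertainty.

```python
import math
from statistics import fmean


def log_mean(values):
    """Mean of natural logs, so effects combine multiplicatively."""
    return fmean(math.log(v) for v in values)


def baci_ratio(treat_before, treat_after, ctrl_before, ctrl_after):
    """BACI effect as a ratio: (treatment-site change) / (control-site change)."""
    treat_change = log_mean(treat_after) - log_mean(treat_before)
    ctrl_change = log_mean(ctrl_after) - log_mean(ctrl_before)
    return math.exp(treat_change - ctrl_change)


def naive_ratio(treat_before, treat_after):
    """Before/after ratio at the treatment site only (no control)."""
    return math.exp(log_mean(treat_after) - log_mean(treat_before))


# Hypothetical survival rates, two years per period.
ctrl_before, ctrl_after = [0.020, 0.020], [0.010, 0.010]
treat_before, treat_after = [0.030, 0.030], [0.018, 0.018]

print(round(naive_ratio(treat_before, treat_after), 3))  # 0.6
print(round(baci_ratio(treat_before, treat_after, ctrl_before, ctrl_after), 3))  # 1.2
```

Here the naive comparison suggests a 40% decline at the treated site, while the BACI contrast, which removes the shared decline seen at the control, indicates a 20% relative improvement.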
The ISRP does not believe that it is reasonable to expect successful
specifications for such a complex and sophisticated design to arise
spontaneously from the collective of individual independent projects,
guidance or no guidance. Some capable, authoritative, and knowledgeable
group is really going to have to take responsibility for drafting an
adequate design. Even with such a draft design in existence, it will be an
institutional challenge, given the way the Fish and Wildlife Program
operates, to ensure implementation. Ultimately, the success of the
implementation will depend on the combined results of many sites and many
projects, where project A may be making measurements that serve as a
control for project B, etc. Considerable top-down planning and management
is needed to make this happen. If a technically correct overall design is
adopted as part of the specifications of the eventual solicitation,
conformity to the design could be an important review criterion that the
ISRP could apply to the individual proposals.
When some group is constituted to draft a design for the effectiveness
studies, it would be a good thing if the group communicated with the NMFS
parties who will be making the "check-in" determinations. Plans
for data gathering always benefit from communication between the data
providers and the data users.
FURTHER COMMENTS ON DETERMINING HABITAT RESPONSE TO MITIGATION
Use (3), as described in the present document, presents great
difficulties with respect to defining the appropriate resolution of
estimates. The idea is to measure responses of habitat variables to
mitigation, where it is hoped that the habitat response is a predictor of
eventual salmon response. So the design must consider both the habitat
effect size that is likely to be sufficient to eventually cause a
meaningful salmon response and the habitat effect size that is expected to
result from the mitigation action.
In this context, it seems quite implausible that the proposed blanket
performance characteristics stated in the present document (20% type I
error rate, 0.8 power, for detecting a 10% change in 5 years) are really the
correct resolution for this suite of needs. How could one decide that all
variables, from sediment particle size, to water temperature, to fish
survival rates, should be measured with the same level of resolution?
After all, some variables are inherently more difficult and expensive to
measure to a given level of resolution, and some variables will turn out
to be much more important than others for answering the actual question.
So, one would think that the actual demands for resolution should be
subjected to some cost-benefit analysis, variable by variable. Where is
the paper trail of that thought process?
FURTHER COMMENTS ON MONITORING HABITAT STATUS AND TRENDS
Because use (4) is the least specific in its data demands, and is in
some sense "exploratory," the proposed generic guidelines for
resolution stated in the document may be a reasonable starting point, but
only a starting point, with respect to this use. The proposed blanket
performance characteristics are a 20% type I error rate and 0.8
power for detecting a 10% change in 5 years. The document references
an Oregon monitoring plan as the source of this proposed resolution. In
many respects, use (4) is more directed at "effectiveness
research" than at "effectiveness monitoring." As such, it
should be understood essentially as a Tier III enterprise in the
classification scheme of the 2000 BiOp.
We would note that the proposed statement of performance (limits on
type I and type II errors) is framed in terms of hypothesis testing, which
is not the way the data actually will be used. The data will be used
predominantly for estimating effect size, so it might be more natural to
state (and justify) the desired resolution in those terms.
Most research analyses traditionally approached by classical
statistical testing of a null hypothesis can be formulated alternately in
terms of estimation of effects and accuracy and precision of estimates.
Mechanical application of classical hypothesis testing is prone to distort
design priorities when the real interest is in the size of an effect (see
Johnson, 1999). It should be noted also that a study may be planned to
achieve a certain expected precision for an anticipated effect size, but
the realized precision will depend in part on aspects of an unknown future
such as the actual effect size and the actual behavior of confounding
variables.
A design that is optimized for one indicator variable might not be
optimal for other indicator variables that in fact are equally, or more,
important. Effective guidance will have to give more consideration to the
spectrum of questions that are being asked, the scientific hypotheses that
are current candidates for answers, and the indicators that will be
measured to judge among the scientific hypotheses.
Planning for research projects could be conducted in terms of desired
precision of estimates of the differences (or ratios) of key indicator
variables on study (treatment) sites and reference sites. The ability to
estimate the size of an effect, in this sense, may depend on more
stringent data quality standards than would be required just to detect a
trend.
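Planning in terms of precision rather than detection can be sketched as follows: choose a target width for the confidence interval on the ratio of treatment-site to reference-site means, then find the number of samples per group that achieves it. This minimal sketch assumes, for illustration only, lognormally distributed indicator values with a common coefficient of variation at both sites, independent samples, and a normal approximation on the log scale; none of these assumptions comes from the draft guidelines.

```python
import math
from statistics import NormalDist


def ratio_ci_halfwidth(n_per_group, cv, conf=0.95):
    """Approximate CI half-width for log(treatment mean / reference mean).

    Assumes lognormal data with the same CV at both sites, so the
    log-scale standard deviation is sqrt(log(1 + cv^2)).
    """
    sigma_log = math.sqrt(math.log(1 + cv ** 2))
    z = NormalDist().inv_cdf(0.5 + conf / 2)  # two-sided critical value
    se = sigma_log * math.sqrt(2 / n_per_group)
    return z * se


def n_for_ratio_precision(cv, target_ratio_error=0.10, conf=0.95):
    """Smallest n per group whose CI on the ratio is within the target.

    A target of 0.10 means the interval runs from the estimated ratio
    divided by 1.10 to the estimated ratio times 1.10 (log half-width log(1.10)).
    """
    target = math.log(1 + target_ratio_error)
    n = 2
    while ratio_ci_halfwidth(n, cv, conf) > target:
        n += 1
    return n


if __name__ == "__main__":
    for cv in (0.2, 0.3, 0.6):
        print(cv, n_for_ratio_precision(cv))
```

Stating the requirement this way ties the design directly to how well an effect size will be known, which is how the data will actually be used.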
Because use (4) has an exploratory component, the guidance should offer
some room for a variety of possible analytical models and research
designs, such as bioequivalence testing, models fitted with a predictor
variable selected along a gradient, and modern methods such as the Akaike
Information Criterion (AIC; described, for example, in Burnham and
Anderson 1998) for selecting among alternative predictive models.
Gradient designs are valuable when adequate reference areas are not
available. In that case, a gradient of conditions can be represented in the
study, and a model can be fitted to each indicator variable as a function
of a collection of predictor (independent) variables over their observed
ranges.
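To make the gradient and model-selection ideas concrete, the sketch below fits two candidate models for a single indicator observed at sites spread along a gradient of one predictor: a constant (no-gradient) model and a straight-line model. It compares them by AIC computed from least-squares fits, AIC = n log(RSS/n) + 2K, with K counting the regression coefficients plus the error variance, and converts AIC differences to Akaike weights, following the general approach in Burnham and Anderson (1998). The predictor, site values, and candidate models are hypothetical; with samples this small, the small-sample correction AICc would usually be preferred.

```python
import math
from statistics import fmean


def fit_constant(x, y):
    """Intercept-only model; returns residual sum of squares and K."""
    m = fmean(y)
    rss = sum((yi - m) ** 2 for yi in y)
    return rss, 2  # K = mean + error variance


def fit_line(x, y):
    """Ordinary least-squares straight line; returns RSS and K."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return rss, 3  # K = intercept + slope + error variance


def aic(rss, n, k):
    """AIC for a least-squares fit assuming normal errors."""
    return n * math.log(rss / n) + 2 * k


def akaike_weights(aics):
    """Relative support for each model, given its AIC value."""
    best = min(aics.values())
    rel = {name: math.exp(-0.5 * (a - best)) for name, a in aics.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


# Hypothetical sites along a gradient (e.g., percent riparian canopy)
# and an indicator measured at each site (e.g., summer water temperature, C).
canopy = [5, 15, 25, 35, 45, 55, 65, 75]
temp_c = [19.8, 19.1, 18.7, 17.9, 17.6, 16.8, 16.5, 15.7]

n = len(canopy)
models = {"constant": fit_constant(canopy, temp_c),
          "linear": fit_line(canopy, temp_c)}
aics = {name: aic(rss, n, k) for name, (rss, k) in models.items()}
weights = akaike_weights(aics)
print({name: round(w, 3) for name, w in weights.items()})
```

In practice the candidate set would include several predictors and functional forms chosen in advance from the competing scientific hypotheses.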
Bioequivalence testing corresponds to testing the reverse null
hypothesis (Dixon and Garrett 1992, Erickson and McDonald 1995, McDonald
and Erickson 1994, USEPA 1988, 1989). This approach to hypothesis testing
makes much more sense than the standard classical hypothesis testing of a
null hypothesis in the current setting. A brief introduction is given
here. Successful application requires thorough understanding of the
relationships between indicator variables, classification variables,
dependent variables, and independent variables.
The Action Agencies (Bonneville Power Administration, United States
Army Corps of Engineers, and the Bureau of Reclamation) and the National
Marine Fisheries Service (NMFS) wish to provide guidelines for sponsors
and reviewers of action effectiveness research projects. Projects are
being proposed to "fix" problems in tributary habitat. In this sense,
they are not research projects without effect on the environment.
Projects are both action management (at least for the study areas) and
research. Evaluation is both adaptive management and analysis of research
results.
The reverse null hypothesis is: the study area is defective in one or
more indicator variables (Table 3 in the draft guidelines) when compared
to a reference area or standard value. The alternative hypothesis against
which this is tested is: the study area is bioequivalent to the reference
area or the standard. The fact that a project is proposed indicates a
prima facie belief that the area under study is deficient in one or more
of the indicator variables as listed in Table 3 of the document.
For example, the reverse null hypothesis for depth of fines might be:
the depth of fines in the study area is greater than 120% of depth of
fines in the reference area. This would be tested against the alternative:
the depth of fines in the study area is less than 120% of depth of fines
in the reference area.
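The depth-of-fines example can be sketched as a one-sided test in the spirit of McDonald and Erickson (1994). The reverse null hypothesis (mean depth of fines in the study area is at least 1.2 times that in the reference area) is rejected, and the study area declared bioequivalent, when an upper one-sided confidence bound on the difference (study mean minus 1.2 times reference mean) falls below zero. This minimal sketch assumes independent samples from the two areas and uses a large-sample normal approximation for the critical value; with the small samples typical of field studies, a t-based (Welch) critical value would be more appropriate. The measurements and the default 5% type I error rate are hypothetical.

```python
import math
from statistics import NormalDist, fmean, variance


def bioequivalent(study, reference, margin=1.2, alpha=0.05):
    """Test H0: mean(study) >= margin * mean(reference) (deficient)
    against H1: mean(study) < margin * mean(reference) (bioequivalent).

    Returns (reject_h0, upper_bound), where upper_bound is an approximate
    one-sided (1 - alpha) upper confidence bound on
    mean(study) - margin * mean(reference).
    """
    diff = fmean(study) - margin * fmean(reference)
    se = math.sqrt(variance(study) / len(study)
                   + margin ** 2 * variance(reference) / len(reference))
    z = NormalDist().inv_cdf(1 - alpha)  # large-sample approximation
    upper_bound = diff + z * se
    return upper_bound < 0, upper_bound


# Hypothetical depth-of-fines measurements (same units in both areas).
reference = [10.0, 11.0, 9.0, 10.5, 9.5]
restored = [10.2, 9.8, 10.0, 10.4, 9.6]
unrestored = [15.2, 14.8, 15.0, 15.4, 14.6]

print(bioequivalent(restored, reference)[0])    # True: deficiency ruled out
print(bioequivalent(unrestored, reference)[0])  # False: cannot reject H0
```

Note where the burden of proof falls: a noisy or undersampled study area stays classified as deficient, because a wide confidence bound cannot fall below zero.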
In the example, the implicit definition is given that the study area is
bioequivalent to the reference area with respect to this indicator
variable if depth of fines is less than 120% of the depth of fines on the
reference area. The management action under evaluation in the research
project is judged to remedy the deficiency in the study area if the
reverse null hypothesis can be rejected.
The advantage of bioequivalence testing is in adaptive management. A
management action (or actions) in a research project is judged to be
successful if the affected area is bioequivalent to the reference area at
the end of the project. Burden of proof is placed on the principal
investigators and the analysis, not on the action agencies. The action
agencies do not have to weigh test results that are statistically
significant but of no practical importance against non-significant
results for large effects that may be of great practical importance.
Research projects would be judged successful if the treatments result in
movement to, or toward, bioequivalence.
NOTES ON STYLE AND CLARITY
The present document suffers from a certain obscurity, owing both to an
excess of jargon and to ambiguity about the intended audience. The purported
audience is the community of researchers who potentially will submit
proposals to the program, but who are not statistical specialists. For
that audience, the present document would probably be too theoretical and
technical (i.e., not adequate as an instruction manual). Realistically,
the present document is more of a white paper, where the actual audience
is research planners, administrators, and reviewers such as the ISRP. As a
white paper, the document describes some aspects of the nature of the
problem, and it presents some ideas about a strategy to address the
problem. In that context, the theoretical and technical tone is fairly
appropriate, but still some effort could be made to improve clarity.
The document uses much poorly defined or undefined terminology.
Bureaucratic language of the BiOp is used repeatedly without being
described in other, more common words. The document refers the reader to
Hillman and Giorgi (2002) for help with terms, etc., but this reference is
not listed in the Literature Cited section. Is this an internal,
unpublished document?
CONCLUSION
The ISRP concludes that the draft RME guidance document is a useful
first step at developing necessary guidelines for planners, investigators,
and reviewers, although in the present form it is too narrow and
insufficiently targeted to actual information needs for universal
evaluation of action effectiveness research proposals. Such guidance is
sorely needed. Incremental revisions will be helpful to focus on the
audience(s), objectives, and intended outcomes. It is recommended that
this draft be revised in two ways. First, revision as a scoping document
for planners and administrators is needed to provide clear top-down
guidance that actually stipulates overall design specifications to address
the need for collecting data to answer the BiOp check-in questions about
effectiveness of mitigation actions on salmon survival. This document
would focus on the first two questions posed to the ISRP, with abbreviated
reference to the third question (and reference to a second document).
Second, revision is needed as a more methodology-oriented document intended
for use in a bottom-up fashion by researchers and technicians, giving
detailed guidance on alternative methods, statistical approaches, and
statistical design requirements. This document would focus on the third
question posed to the ISRP. It would present the scope and guidelines in
an abbreviated fashion (with reference to the first document). The ISRP
recommends further discussion among authors and potential researchers and
the community of data users before consensus on the specific methods is
solidified for this second document. The first document is perhaps the
most appropriate and serviceable as an accompaniment for a solicitation;
the second document would be made available as reference for those
preparing and reviewing proposals.
REFERENCES
Burnham, K.P., and D.R. Anderson. 1998. Model selection and inference: A
practical information-theoretic approach. Springer-Verlag, New York.
Dixon, P.M., and K.A. Garrett. 1992. Statistical issues for field
experimenters. Technical Report. Savannah River Laboratory, University of
Georgia, Drawer E, Aiken, SC 29802.
Erickson, W.P., and L.L. McDonald. 1995. Tests for bioequivalence of
control media and test media in studies of toxicity. Environmental
Toxicology and Chemistry 14:1247-1256.
Johnson, D.H. 1999. The insignificance of statistical significance
testing. Journal of Wildlife Management 63:763-772.
McDonald, L.L., and W.P. Erickson. 1994. Testing for bioequivalence in
field studies: Has a disturbed site been adequately reclaimed? Pages
183-197 in D.J. Fletcher and B.F.J. Manly, editors. Statistics in Ecology
and Environmental Monitoring. Otago Conference Series No. 2. University of
Otago Press, Dunedin, New Zealand.
U.S. EPA. 1988. Guidance document for conducting terrestrial field
studies. Ecological Effects Branch, Hazard Evaluation Division, Office
of Pesticide Programs, U.S. Environmental Protection Agency, 401 M Street,
S.W., Washington, DC 20460.
U.S. EPA. 1989. Methods for evaluating the attainment of cleanup
standards. Volume 1: Soils and solid media. Statistical Policy
Branch (PM-223), Office of Policy, Planning, and Evaluation, U.S.
Environmental Protection Agency, 401 M Street, S.W., Washington, DC 20460.