In this section, the committee presents its conclusions and recommendations on defining future evaluation objectives, strengthening the output assessment, and improving use of the APR to capture data for future evaluations. The goal is to address aspects of the process that might be reconsidered to improve future evaluations and to ensure that evaluation results optimally inform NIDRR's efforts to maximize the impact of its research grants.
Defining Future Evaluation Objectives
The primary focus of the summative evaluation was on assessing the quality of research and development outputs produced by grantees. The evaluation did not include in-depth examination or comparison of the larger context of the funding programs, grants, or projects within which the outputs were produced. Although capacity building is a major thrust of NIDRR's center and training grants, assessment of training outputs, such as the number of trainees moving into research positions, was also beyond the scope of the committee's charge.
NIDRR's program mechanisms vary substantially in both size and duration, with grant amounts ranging from under $100,000 (fellowship grants) to more than $4 million (center grants) and durations ranging from 1 to more than 5 years. Programs also differ in their objectives, so the expectations of the grantees under different programs vary widely. For example, a Switzer training grant is designed to increase the number of qualified researchers active in the field of disability and rehabilitation research. In contrast, center grants and Model System grants have multiple objectives that include research, technical assistance, training, and dissemination. Model System grants (BMS, TBIMS, SCIMS) have the added expectation of contributing patient-level data to a pooled set of data on the targeted condition.
The number of grants to be reviewed was predetermined by the committee's charge as 30, which represented about one-quarter of the pool of 111 grants from which the sample was drawn. The committee's task included drawing a sample of grants that reflected NIDRR's program mechanisms. The number of grants reviewed for any of the nine program mechanisms included in the sample was small—the largest number for any single program was 10 (FIP). Therefore, the committee made no attempt to compare the quality of outputs by program mechanism.
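A sample of this kind can be drawn by stratified proportional allocation across program mechanisms. The Python sketch below illustrates the idea; only the overall figures (30 grants sampled from a pool of 111) come from this report, while the per-mechanism pool sizes and the largest-remainder allocation method are illustrative assumptions, not the committee's actual procedure.

```python
import math
import random

# Hypothetical pool sizes per program mechanism (the real distribution is
# not reproduced here); only the totals -- 30 sampled from 111 -- come
# from the report.
pool_sizes = {"FIP": 35, "RERC": 20, "RRTC": 18, "DRRP": 14,
              "Model Systems": 12, "Switzer": 12}
SAMPLE_SIZE = 30
TOTAL = sum(pool_sizes.values())  # 111

def proportional_allocation(pool_sizes, sample_size, total):
    """Allocate sample slots to strata via largest-remainder rounding."""
    quotas = {m: n * sample_size / total for m, n in pool_sizes.items()}
    alloc = {m: math.floor(q) for m, q in quotas.items()}
    shortfall = sample_size - sum(alloc.values())
    # Hand the leftover slots to the strata with the largest fractional parts.
    for m in sorted(quotas, key=lambda m: quotas[m] - alloc[m],
                    reverse=True)[:shortfall]:
        alloc[m] += 1
    return alloc

alloc = proportional_allocation(pool_sizes, SAMPLE_SIZE, TOTAL)

# Draw the per-mechanism samples at random from each (hypothetical) pool.
rng = random.Random(0)
sample = {m: rng.sample(range(pool_sizes[m]), k) for m, k in alloc.items()}
```

With these invented pool sizes, the largest stratum (FIP) receives 10 slots and the allocations sum exactly to 30, mirroring the proportions described above.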
NIDRR directed the committee to review two outputs for each of the grantee's projects. A grantee with a single project had two outputs reviewed, a grantee with three projects had six outputs reviewed, and so on. Although larger grants with more projects also had more outputs reviewed, the evaluation design did not consider grant size, duration, or the relative importance of a given project within a grant.
The committee was asked to produce an overall grant rating based on the outputs reviewed. Results at the grant level are subject to more limitations than those at the output level because of the general lack of information about how the outputs did or did not interrelate; whether, and if so how, grant objectives were accomplished; and the relative priority placed on the various outputs. In addition, for larger, more complex grants, such as center grants, a number of expectations for the grants, such as capacity building, dissemination, outreach, technical assistance, and training, are unlikely to be adequately reflected in the committee's approach, which focused exclusively on specific outputs. The relationship of outputs to grants is more complex than this approach could address.
Recommendation 6-3: NIDRR should determine whether assessment of the quality of outputs should be the sole evaluation objective.
Considering other evaluation objectives might offer NIDRR further opportunities to continuously assess and improve its performance and achieve its mission. Alternative designs would be needed to evaluate the quality of grants or to allow comparison across program mechanisms. For example, if one goal of an evaluation were to assess the larger outcomes of grants (i.e., the overall impact of their full set of activities), in addition to the methods used in the current output assessment, the evaluation would need to include interviewing grantees about their original objectives to learn about how the grant was implemented and any changes that may have occurred in the projected pathway, how various projects were tied into the overall grant objectives, and how the outputs demonstrated the achievement of the grant and project objectives. The evaluation would also involve conducting bibliometric or other analyses of all publications and examining documentation of the grant's activities and self-assessments, including cumulative APRs over time. Focusing at the grant level would provide evidence of movement along the research and development pathway (e.g., from theory to measures, from prototype testing to market), as well as allow for assessment of other aspects of the grant, such as training and technical assistance and the possible synergies of multiple projects within one grant.
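One of the bibliometric analyses mentioned above can be made concrete with a small example. The sketch below computes an h-index from a list of per-publication citation counts; the counts are invented for illustration, and real bibliometric work would of course draw citation data from a bibliographic database rather than a hand-typed list.

```python
def h_index(citations):
    """Return the h-index: the largest h such that at least h of the
    publications have h or more citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Invented citation counts for one grant's publications (illustration only).
print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([0, 0, 0]))         # 0
```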
If the goal of an evaluation were to assess and compare the impact of program mechanisms, the methods might vary across different program mechanisms depending on the expectations for each, but would include those mentioned above and also stakeholder surveys to learn about the specific ways in which individual grants have affected their intended audiences. With regard to sampling methods, larger grant sample sizes that allowed for generalization and comparison across program mechanisms would be needed. An alternative would be to increase the grant sample size in a narrower area by focusing on grants working in specific research areas across different program mechanisms or on grants with shared objectives (e.g., product development, knowledge translation, capacity building).
NIDRR's own pressing questions would of course drive future evaluations, but other levels of analysis on which NIDRR might focus include the portfolio level (e.g., Model System grants, research and development, training grants), which NIDRR has addressed in the past; the program priority level (i.e., grants funded under certain NIDRR funding priorities) to answer questions regarding the quality and impact of NIDRR's priority setting; and institute-level questions aimed at evaluating the net impact of NIDRR grants to test assumptions embedded in NIDRR's logic model. For example, NIDRR's logic model targets adoption and use of new knowledge leading to changes/improvements in policy, practice, behavior, and system capacity for the ultimate benefit of persons with disabilities (National Institute on Disability and Rehabilitation Research, 2006). The impact of NIDRR grants might also be evaluated by comparing grant proposals that were and were not funded. Did applicants that were not funded by NIDRR go on to receive funding from other agencies for projects similar to those for which they did not receive NIDRR funding? Were they successful in achieving their objectives with that funding? What outputs were produced?
The number of outputs reviewed should depend on the unit of analysis. At the grant level, it might be advisable to assess all outputs to examine their development, their interrelationships, and their impacts. A case study methodology could be used for related subsets of outputs. If NIDRR aimed its evaluation at the program mechanism or portfolio level, sampling grants and assessing all outputs would be the preferred method. For output-level evaluation, having grantees self-nominate their best outputs, as was done for the present evaluation, is a good approach.
Although assessing grantee outputs is valuable, the committee believes that the most meaningful results would come from assessing outputs in the context of a more comprehensive grant-level and program mechanism-level evaluation. More time and resources would be required to trace a grant's progress over time toward accomplishing its objectives; to understand its evolution, which may have altered the original objectives; and to examine the specific projects that produced the various outputs. However, examining more closely the inputs and grant implementation processes that produced the outputs would yield broader implications for the value of grants, their impact, and future directions for NIDRR.
Strengthening the Output Assessment
The committee was able to develop and implement a quantifiable expert review process for evaluating the outputs of NIDRR grantees, based on criteria used in assessing federal research programs in the United States and other countries. With refinements, this method could be applied even more effectively to the evaluation of future outputs. In implementing it, however, the committee encountered challenges related to the diversity of outputs, the timing of evaluations, sources of information, and reviewer expertise.
Diversity of Outputs
The quality rating system used for the summative evaluation worked well for publications in particular, which made up 70 percent of the outputs reviewed. Using the four criteria outlined earlier in this chapter, the reviewers were able to identify varying levels of quality and the characteristics associated with each. However, the quality criteria were not as easily applied to such outputs as websites, conferences, and interventions; these outputs require more individualized criteria for assessing specialized technical elements, and sometimes more in-depth evaluation methods. A single set of criteria, however broad and flexible, could not be applied with equal rigor and appropriateness to every type of output.
Timing of Evaluations
The question arises of when best to perform an assessment of outputs. Technical quality can be assessed immediately, but assessment of the impact of outputs requires the passage of time between the release of the outputs and their eventual impact. Evaluation of outputs during the final year of an award may not allow sufficient time for the outputs to have full impact. For example, some publications will be forthcoming at this point, and others will not have had sufficient time to have an impact. The trade-off of waiting a year or more after the end of a grant before performing an evaluation is the likelihood that staff involved with the original grant may not be available, recollection of grant activities may be compromised, and engagement or interest in demonstrating results may be reduced. However, publications can be tracked regardless of access to the grantee. Outputs other than publications, such as technology products, could undergo an interim evaluation to enable examination of the development of outputs.
Sources of Information
Committee members were provided with structured briefing books containing the outputs to be reviewed. They were also provided with supplemental information on which members could draw as necessary to assign quality scores. These other sources included information submitted through the grantees' APRs and information provided in a questionnaire developed by the committee (presented in Appendix B). The primary source of information used by committee members in assigning scores was direct review of the outputs themselves. The supplemental information played a small role in assessing publications, whereas for outputs such as newsletters and websites, this information sometimes provided needed context and additional evidence helpful in assigning quality scores. However, it is important to note that the supplemental information represented grantees' self-reports, which may have been susceptible to social desirability bias. Therefore, committee members were cautious in using this information to serve as the basis for boosting output scores. Moreover, the APR is designed as a grant monitoring tool rather than as a source of information for a program evaluation, and the information it supplied was not always sufficient to inform the quality ratings.
To illustrate the limitations of the information available to the committee, the technical quality of a measurement instrument was difficult to assess if there was insufficient information about its conceptual base or its development and testing. Likewise, for conferences, workshops, and websites, it would have been preferable for the grantee to identify the intended audience so that the committee might have better assessed whether the described dissemination activities were successful in reaching that audience. For the output categories of tools, technology, and informational products, grantees sometimes provided a publication that did not necessarily describe the output. In addition, some outputs were difficult to assess when no corroborating evidence was provided to support grantees' claims about technical quality, advancement of the field, impact, or dissemination efforts.
The committee did not use standardized reporting guidelines, such as CONSORT (Schulz et al., 2010) or PRISMA (Moher et al., 2009), used by journals in their peer review processes for selecting manuscripts for publication. The committee members generally assumed that publications that had been peer reviewed warranted a minimum score of 4 for technical quality. (In some cases, peer-reviewed publications were ultimately given technical quality scores above or below 4 following committee discussion.) Had reporting guidelines been used in the review of research publications, it is possible that the committee's ratings would have changed.
The committee was directed to assess the quality of four types of prespecified outputs. While the most common output type was publications, NIDRR grants produce a range of other outputs, including tools and measures, technology devices and standards, and informational products. These outputs vary widely in their complexity and the investment needed to produce them. For example, a newsletter is a more modest output than a new technology or device. To assess the quality of outputs, the committee members used criteria based on the cumulative literature reviewed and their own expertise in diverse areas of rehabilitation and disability research, medicine, and engineering, as well as their expertise in evaluation, economics, knowledge translation, and policy. However, the committee's combined expertise did not include every possible content area in the broad field of disability and rehabilitation research.
Recommendation 6-4: If future evaluations of output quality are conducted, the process developed by the committee should be implemented with refinements to strengthen the design related to the diversity of outputs, timing of evaluations, sources of information, and reviewer expertise.
Corresponding to the above points, these refinements include the following.
Diversity of outputs
The dimensions of the quality criteria should be tailored and appropriately operationalized for different types of outputs, such as devices, tools, and informational products (including newsletters, conferences, and websites) and should be field tested with grants under multiple program mechanisms and refined as needed.
For example, the technical quality criterion includes the dimension of accessibility and usability. The questionnaire asked grantees to provide evidence of these traits. However, the dimensions should be better operationalized for different types of outputs. For tools, such as measurement instruments, the evidence to be provided should pertain to pilot testing and psychometrics. For informational products, such as websites, the evidence should include, for example, results of user testing, assessment of usability features, and compliance with Section 508 standards (regulations from the 1998 amendment to the Rehabilitation Act of 1973 requiring that federal agencies' electronic and information technology be accessible to people with disabilities). For technology devices, the evidence should document the results of research and development tests related to such attributes as human factors, ergonomics, universal design, product reliability, and safety.
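One small slice of such accessibility evidence can even be gathered automatically. The Python sketch below, using only the standard library, flags `<img>` elements that lack alt text; it is illustrative only, since Section 508 conformance (and the WCAG criteria associated with it) covers far more than alternative text, and the sample markup is invented.

```python
from html.parser import HTMLParser

class AltTextChecker(HTMLParser):
    """Collect the src of every <img> tag lacking a non-empty alt
    attribute -- one automatable check among the many needed for a full
    Section 508 accessibility review."""
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img" and not dict(attrs).get("alt"):
            self.missing.append(dict(attrs).get("src", "<no src>"))

checker = AltTextChecker()
checker.feed('<img src="chart.png" alt="Outcome chart"><img src="logo.png">')
print(checker.missing)  # ['logo.png']
```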
The quality criterion related to dissemination provides other clear examples of the need for further specification and operationalization of the dimensions. For example, the dissemination of technology devices should be assessed by examining progress toward commercialization; grantees' partnerships with relevant stakeholders, including consumers and manufacturers; and the delivery of information through multiple media types and sources tailored to intended audiences for optimal reach and accessibility.
Timing of evaluations
The committee suggests that the timing of an output evaluation should vary by the output type. Publications would best be assessed at least 2 years after the end of the grant. However, plans for publications and dissemination and the audience for scientific papers could be included in the final report. As stated earlier, other outputs developed during the course of the grant should be evaluated on an interim basis to assess the development and evolution of products. Outputs that have the potential to generate change in practice or policy may require more time to pass before impact materializes and can be measured, and so would best be evaluated on an interim basis as well.
Sources of information
A more proactive technical assistance approach is needed to ensure that grantees provide the data necessary to assess the specific dimensions of each quality criterion. As stated earlier, the information supplied in the APR and the questionnaire was not always sufficient to inform the quality ratings. (See also the above discussion of information requested on the grantee questionnaire and the discussion below of the APR.)
The committee suggests that for future output evaluations, NIDRR consider developing an accessible pool of experts in different technical areas who can be called upon to review selected grants and outputs. In addition, it is essential that future review panels include scientists with disabilities. Consumers could also play a vital role as review panel members by addressing key criteria related to impact and dissemination.
Improving Use of the Annual Performance Report
NIDRR's APR system has many strengths, but the committee identified some improvements the agency should consider in building greater potential for use of these data in evaluations. The APR system (Research Triangle International, 2009) includes the grant abstract, funding information, descriptions of the research and development projects, and outcome domains targeted by projects, as well as a range of variables for reporting on the four different types of grantee outputs, as shown in Table 6-5. The system is tailored to different program mechanisms as needed. All of the descriptive information listed above, plus the output-specific variables listed in Table 6-5, were utilized in the committee's evaluation. The data were provided in electronic databases and in the form of individual grant reports.
Table 6-5: Data Elements Related to Outputs That Are Covered in the APR
The APR data set NIDRR provided to the committee at the outset of its work was helpful in profiling the grants for sampling and in listing all of the grantees' projects and outputs. It also eased the nomination process: the committee could generate comprehensive lists of all reported projects and outputs, making the task of output selection less burdensome for the grantees. Grantees that had more recent outputs originating from their NIDRR grants also had the option of nominating those as their top two for the committee's review.
NIDRR also provided grantees' narrative APRs from the last year of their grants, as well as their final reports. These narratives were highly useful to the committee for compiling descriptions of the grants.3
The purpose of this paper is to describe a new qualitative research method, summative analysis, which the researcher has been developing over the past four years to effectively manage, organize, and clarify large bodies of qualitative, textual data. Summative analysis is a collaborative analytic technique that enables a wide range of researchers, academics, and scientists to come together through group analysis sessions to explore the details of textual data. It uses consensus-building activities to reveal major issues inherent in data. The method has purposefully been called summative analysis rather than collaborative analysis or any other term, in spite of its collaborative aspect, to emphasize the value placed on the form of working: Summative analysis prepares people to grasp an essentialized understanding of text.
To emphasize the versatility of the method, the author has worked with others to apply summative analysis across a variety of research studies, settings, and textual data types, such as
• focus groups with people at risk of breast cancer to consider the value of electronic decision aids for supporting their decisions for care (Rapport, Iredale, et al., 2006);
• biographies from general practitioners and community pharmacists to clarify situated practice and patient-centered approaches to work across community workspaces (Rapport, Doel & Elwyn, 2007; Rapport, Doel, & Wainwright, 2008); and
• interviews with gastroenterologists and nurse managers to examine innovation in endoscopy services across 40 National Health Service (NHS) trusts in England (Rapport, Jerzembek, et al., 2009).
The author begins the paper by describing the choice of summative analysis as a method and its various uses before presenting the most recent use of the method: a Holocaust study that aimed to explore ‘how notions of health and well-being were in evidence in survivor testimonies and what narrative forms they took’. The Holocaust study aimed to clarify what survivor testimonies might tell us about the ongoing health needs and expectations of people who have suffered extraordinary, traumatic events like the Holocaust. The overall objective of the study was to consider how health professionals might best support survivors and their families, through well-attuned understanding of their mental and physical health needs and through a greater understanding of the value of the health narrative in assisting with the clinical encounter.
This paper does not present the Holocaust study as an empirical study per se but, rather, in relation to the analytic method, using examples taken from the study to illustrate different stages of the method and to situate that knowledge in very practical and real terms. Having described the Holocaust study, the author describes the four stages of summative analysis using examples from the study. The paper concludes with a discussion of the wider implications of summative analysis for qualitative researchers and social scientists so that others might consider its place in their own research portfolio.
Choosing summative analysis
Qualitative researchers and social scientists now have an array of choices regarding the analysis of qualitative data and the representation of their data as results. Consequently, it is important that principled, informed, and strategic decisions are made, in line with the specific purposes of the research in question. Qualitative researchers must also reflect on the methodological issues related to the construction of each kind of data representation. For example, constructing a standard realist tale from data can often be effective, as Van Maanen (1998) has noted, if one wishes to take account of experiential authority, the research subject's point of view (in the form of closely edited quotations), and the interpretive omnipotence of the researcher. Sparkes (2002) has commented that when well crafted, realist tales can provide useful, compelling, detailed, and complex depictions of social worlds. Summative analysis cannot offer interpretive omnipotence, nor can it enable the researcher to push the ethnographer's intent into the background whilst giving the author unchallenged authority in writing (Sparkes, 2002; Van Maanen, 1998).