The following is the text of Department of Defense Polygraph Institute report DODPI94-R-0009, Psychophysiological Detection of Deception Accuracy Rates Obtained Using the Test for Espionage and Sabotage. This report, dated August 1995, is available from the Defense Technical Information Center (DTIC) as report #ADA330774, and was also reprinted in the American Polygraph Association quarterly, Polygraph, Vol. 27, No. 1 (1998), pp. 68-73.
Note, however, that while the DTIC report is 48 pages long, the Polygraph reprint is only six pages. As the DTIC report was not available for comparison purposes, it is not clear whether the Polygraph reprint actually provides the full text of the DTIC report.
The text presented here is taken from the Polygraph reprint. Page numbers are provided between curly braces for citation purposes. Corrections have been added in brackets.
Department of Defense Polygraph Institute
Research Division Staff
The accuracy of decisions for determining deception in a mock screening situation, using
three psychophysiological detection of deception (PDD) formats has been compared [Department
of Defense Polygraph Institute (DoDPI) Research Division Staff, 1997]. Two of the formats were
counterintelligence scope polygraph (CSP) formats; one in which probable lie comparison (PLC)
questions are asked and the other in which directed lie comparison (DLC) questions are asked. The
third format was the test for espionage and sabotage (TES) in which: (a) the number of issues
being tested is reduced; (b) the number of repetitions of the questions used to calculate question
scores is restricted to three; (c) between-test stimulation is eliminated; (d) the order of questions
within the question sequence is constant; (e) each relevant question is compared to the same
comparison questions; (f) the pretest is brief, more standardized and follows a logical sequence of
information presentation; and (g) DLC questions are asked in place of the standard PLC questions.
The decisions of the examiners who administered the TES format were significantly more accurate (83.3%) at identifying the programmed guilty (PG) examinees than were the decisions of the examiners who administered either the CSP-PLC (55.6%) or the CSP-DLC (58.6%) format. There were no significant differences among the accuracies of the examiners' decisions at identifying the innocent examinees.
This study replicated the procedures utilized in the previous study, but only the TES format was administered. In addition, the data were evaluated using a scoring method that was developed using the data collected during the previous study (DoDPI Research Division Staff, 1997).
Eighty-eight examinees were recruited by a local employment agency under contract to the
Department of Defense Polygraph Institute and were paid $30.00 for their participation.
Individuals who met the following criteria were excluded from participation: (a) less than 19 or
more than 60 years of age, (b) not in good health, (c) pregnant, or (d) did not have the equivalent
of a high school diploma. Thirty-four male (M = 27.2, SD = 9.7) and 54 female (M = 27.8, SD =
10.2) examinees were scheduled for testing. Thirty-three examinees were PG.
Ten certified examiners (9 males and 1 female) were selected by and from the Office of the
Secretary of the Air Force (OSAF) to conduct the examinations. Selection of the examiners was
determined by the agencies. Although examiner selection was not random (selection criteria
generally involve availability and experience), the examiners were considered representative of the
CSP examiner population. The examiners had been trained in the administration of the TES and,
for one month, had been conducting government examinations using the format. Examiners
conducted two practice examinations before conducting an examination for the project. Each
examiner completed two 4-hour examinations (morning and afternoon) on four days and one
4-hour examination on one day for a total of nine examinations each. The examiners were not given
any information regarding the base rates. They did not receive feedback regarding the accuracy of
their decisions until the end of the study, and they were blind as to whether the examinee was PG.
The examiners used standard analog field polygraphs manufactured by either Lafayette or Stoelting. Standard respiratory, electrodermal, and cardiovascular responses were recorded. The electrodermal component was operated in the manual mode. The examinations were conducted individually in large (6.2 m x 6.2 m) rooms in a building located on Fort McClellan. The scenarios used to program examinees guilty were enacted in another building located approximately two miles from the examination building. There were neither video recording devices nor one-way mirrors in the examination rooms. The examinations were audiotaped.
The PG examinees enacted one of four mock scenarios. Each scenario was representative of one of the four relevant questions. The espionage scenario required one examinee to steal a classified document from an office and to give the document to a second examinee. The second examinee received the document and placed it inside a vehicle located in the parking lot. Examinees who enacted the sabotage scenario stole either a classified document or a classified computer disk. The examinee then either put the document through a paper shredder or cut the disk into pieces with a pair of scissors. An examinee who enacted the unauthorized contact scenario was asked to meet with a German agent who was sitting in a car in the parking lot. The agent requested that the examinee obtain some classified information to be given to the agent at a later time. During the enactment of the unauthorized disclosure scenario, the scenario setter was called out of his office midway through briefing the examinee regarding some classified computer information. A third person, who appeared to be fixing a window screen, entered the office and engaged the examinee in conversation regarding what the examinee had been told. All PG examinees received $100.00 as payment for their participation in the "crime." In addition, all PG examinees wrote a statement indicating that "for the purposes of this project" they had engaged in espionage, sabotage, unauthorized contact, or unauthorized disclosure, depending on which scenario they enacted.
Scoring and Decision Criteria
Scoring procedures developed during a previous study (DoDPI Research Division Staff, 1997) were used to evaluate the data. If the original decision was conclusive--significant responding (SR) or no significant responding (NSR)--then the decision was final. If a conclusive decision could not be made then the physiological responses to the first presentation of each relevant question were reevaluated by comparing them to the physiological responses only to the first presentation of the second comparison question. If, after the rescore, a conclusive decision was not possible, then the test was considered inconclusive (INC).
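The two-stage rule described above can be sketched in a few lines of Python. This is only an illustration of the decision logic as stated in the text, not the Institute's scoring software: the function name `final_decision` and the `rescore` callback are hypothetical, and the numeric rescoring itself (first presentation of each relevant question against the first presentation of the second comparison question) is left abstract because the report does not give the cutoffs.

```python
from typing import Callable

# Decisions the text treats as conclusive.
CONCLUSIVE = {"SR", "NSR"}

def final_decision(original: str, rescore: Callable[[], str]) -> str:
    """Return the final decision under the two-stage rule.

    `original` is the examiner's original decision ("SR", "NSR" or "INC").
    `rescore` is a hypothetical callback standing in for the rescoring
    step; it is only called when the original decision is not conclusive.
    """
    if original in CONCLUSIVE:
        # A conclusive original decision is final.
        return original
    rescored = rescore()
    # If the rescore is still not conclusive, the test is inconclusive.
    return rescored if rescored in CONCLUSIVE else "INC"

print(final_decision("SR", lambda: "NSR"))   # original decision stands
print(final_decision("INC", lambda: "NSR"))  # rescore resolves the test
print(final_decision("INC", lambda: "INC"))  # remains inconclusive
```

Passing the rescore as a callback mirrors the text: the reevaluation is performed only when the original decision was not conclusive.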
During each session, ten examinees were given information regarding the research project, their participation, and the PDD examination. If they agreed to participate, they signed a consent form indicating that they were voluntarily participating in the research project. The examinees were taken in groups of two either to another building to be programmed guilty, or to the testing site. The PG examinees received information regarding the purpose of the scenario they were to participate in and signed an additional consent form indicating that they agreed to participate in the scenario. After they enacted one of the scenarios, they were transported to the testing site. The transportation of the examinees to the testing site was timed so the examiners were not able to discern which examinees were innocent and which were programmed guilty.
The examinations were conducted according to guidelines provided to the examiners. Each
examiner provided a numeric score and a decision (SR, INC, NSR) based on the numeric score,
for each test. An NSR decision concluded the subtest. If the decision was INC, the examiner
briefly discussed the questions with the examinee to determine if the examinee understood the
questions. Then, the test was administered again. If, based on the data from the second test, the
examiner's decision was INC, then the decision for that subtest was INC. When the examiner
rendered an SR decision, the examiner confronted the examinee with the results.
Programmed guilty examinees were instructed to confess their guilt if they were confronted by the examiner, but not to reveal any details of their activities. Once a PG examinee confessed, the examination was concluded. However, an innocent examinee who responded significantly to the relevant questions--a false positive (FP) decision--was questioned by the examiner to determine if there was a legitimate real-world explanation for the examinee's physiological responses to the relevant questions. The examiner recorded any information provided by the examinee and concluded the examination. Two examiners, otherwise not involved with the study, independently evaluated the information obtained from the examinees who received FP decisions. If the two examiners agreed that the information was significant enough to justify the examinee's physiological responding--a false positive decision with justification (FPWJ)--then that examinee's data were not included in the original data analyses.
If the decision for the first subtest was either NSR or INC, the examiner conducted the second subtest. If, however, the decision for the first subtest was SR, then the second subtest was not conducted. All of the examinees tested during a session were debriefed simultaneously. Examinees who participated in mock scenarios returned the $100.00.
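The flow of an examination through its subtests, as described in this section, can be summarized in a short Python sketch. It is an illustration under stated assumptions rather than the examiners' actual guidelines: the function names and the `scripted` helper are hypothetical, each subtest is modeled as a callable that returns the decision for one administration of the test, and the report does not say how subtest decisions are combined into an overall result, so the sketch simply returns the list of subtest decisions.

```python
from typing import Callable, List

def run_subtest(administer: Callable[[], str]) -> str:
    """Administer one subtest, repeating the test once after an INC."""
    decision = administer()
    if decision == "INC":
        # The questions are briefly discussed and the test is given again;
        # the decision from this second test stands, even if it is INC.
        decision = administer()
    return decision

def run_examination(subtests: List[Callable[[], str]]) -> List[str]:
    """Run the subtests in order, stopping after an SR decision."""
    decisions = []
    for administer in subtests:
        decision = run_subtest(administer)
        decisions.append(decision)
        if decision == "SR":
            # The examinee is confronted; later subtests are not conducted.
            break
    return decisions

def scripted(*outcomes: str) -> Callable[[], str]:
    """Hypothetical helper: replay a fixed sequence of test outcomes."""
    it = iter(outcomes)
    return lambda: next(it)

# First subtest is INC then NSR on retest; second subtest is SR.
print(run_examination([scripted("INC", "NSR"), scripted("SR")]))
# First subtest is SR, so the second subtest is never administered.
print(run_examination([scripted("SR"), scripted("NSR")]))
```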
Data reduction and analyses
The data from 82 examinees were included in the analyses. The remaining six examinees were excluded for the following reasons: one PG examinee confessed to the examiner prior to the examination, one PG examinee was unable to understand the instructions of the scenario setter, two examinations were incomplete, and two examinees received FPWJ decisions.
If the scoring based on the physiological responding during an initial test resulted in an
INC decision and a second test was conducted, unless otherwise indicated, only the result of the
second test was included in the analyses. The percentages of innocent examinees and PG
examinees correctly identified were calculated. Chi-square tests were conducted to determine if the
numbers of correct decisions in identifying innocent and PG examinees were significantly different
from chance. The significance criterion was set at .05.
Excluding the one inconclusive decision, 98% of the innocent examinees and 83.3% of the
PG examinees were correctly identified. The numbers of correct decisions, inconclusive decisions
and errors made by the examiners are presented in Table 1. Both the number of innocent examinees
correctly identified [χ² (1, N = 51) = 47.08, p < .001] and the number of PG examinees correctly
identified [χ² (1, N = 30) = 13.33, p < .001] were significantly greater than chance. When the
two FPWJ examinees are included in the analysis of the accuracy of decisions identifying innocent
examinees, 94.3% of the innocent examinees were correctly identified. Including the FPWJs, the
number of innocent examinees correctly identified was significantly greater than chance [χ² (1, N
= 53) = 41.68, p < .001]. There were a total of five (10%) innocent examinees and five (17%) PG
examinees for whom the initial test results were inconclusive. After retesting, the results remained
inconclusive for only one innocent examinee (1.9%).
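The reported chi-square values can be checked with a short Python sketch. Two assumptions are made here that the text does not state outright: the counts of correct decisions (50 of 51, 25 of 30, and 50 of 53) are inferred from the reported percentages and N values, and "chance" is taken to mean an equal split between correct and incorrect decisions. Under those assumptions a one-degree-of-freedom goodness-of-fit test gives statistics that round to the values reported above (47.08, 13.33 and 41.68). The function name is hypothetical.

```python
from math import erfc, sqrt

def chi_square_vs_chance(correct: int, total: int):
    """Goodness-of-fit chi-square (1 df) against an assumed 50/50 split."""
    expected = total / 2           # expected count in each cell under chance
    errors = total - correct
    stat = ((correct - expected) ** 2 + (errors - expected) ** 2) / expected
    # With 1 degree of freedom, the upper-tail p-value is erfc(sqrt(stat / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

# Counts inferred from the reported percentages and N values (an assumption).
cases = {
    "innocent, FPWJ excluded": (50, 51),
    "programmed guilty": (25, 30),
    "innocent, FPWJ included": (50, 53),
}
for label, (correct, total) in cases.items():
    stat, p = chi_square_vs_chance(correct, total)
    print(f"{label}: chi-square(1, N = {total}) = {stat:.2f}, "
          f"p < .001: {p < 0.001}")
```

The p-value uses only the standard library; `scipy.stats.chisquare` would give the same statistic if SciPy is available.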
Excluding the one inconclusive decision, 98% of the innocent examinees and 83.3% of the PG examinees were correctly identified. These results mirror the findings of the previous study, with respect to the accuracy of decisions obtained using the TES format. Although many questions remain regarding the generalizability of the TES format to field situations, the TES format appears to have greater validity than the format currently used by the federal government.
Further testing is required to answer some of the questions raised by the current and the previous studies (DoDPI Research Division Staff, 1997): (a) does the caveat "during this project" affect the accuracy of the decisions identifying innocent and PG examinees, (b) does the effect of the question caveat impact on the generalizability of the format to field situations, (c) does the reduced number of relevant issues addressed during the test contribute to the increase in the accuracy of identifying PG examinees, and (d) will the results generalize when different issues are addressed and different relevant questions are utilized.
Department of Defense Polygraph Institute Research Division Staff (1997). A Comparison of Psychophysiological Detection of Deception Accuracy Rates Obtained Using the Counterintelligence Scope Polygraph (CSP) and the Test for Espionage and Sabotage (TES) Question Formats. Polygraph, 26 (2), 79-106.