Test Scoring & Questionnaires
-
Formula for calculating the median for USRI reports
Introduction
Universal Student Ratings of Instruction (USRI) course evaluations gather feedback from classes to help instructors, departments, and faculties improve curriculum and instruction. As part of the compiled results, an 'average' rating is reported. This article describes the formula used to calculate the median for USRI reports.

Applicability
This article was written for instructors and support staff at the University of Alberta.

Details
The arithmetic mean of a set of numbers is commonly called the average of the numbers, and its method of calculation is well known. A synonym for 'average' is 'typical', which may be a helpful word to use given the computational baggage associated with 'average'. Average also becomes a troublesome word when it brings to mind 'mediocre' as a synonym.

The median is another 'typical' value which may be used to represent a larger set of numbers. A simple definition of the median is that it is the middle value in a set of numbers that have been ordered by magnitude. This definition causes a problem when there is no middle value because there is an even number of values, or when several numbers have the same value around the point where the middle should occur. A more general definition is that the median is the 50th percentile of the frequency distribution formed by counting the number of times each value occurs.

When the distribution of numbers is symmetrical, such as that of the so-called bell curve, the mean and median are equal. When the distribution is not symmetrical, debate arises concerning which is the more typical number. An example often used concerns average income, where the arithmetic mean may be quite different from the income of the typical wage earner because of the skewed distribution of values. In this case, and often in the case of course/instructor rating scales, the median more closely approximates the income of the typical worker or the rating given by the typical student.

Calculating the median using the idea of a grouped frequency distribution recognizes that, for example, a 5-point rating scale constrains responses to a small set of discrete values even though the underlying attribute being measured is really a continuous scale. Evidence of this is observed in the collection of students' ratings of instruction when some students mark two consecutive values in an attempt to communicate that they are not sure whether to award, for example, a 4 or a 5. We have also observed on a number of occasions that a respondent will make a mark between the 4 and the 5. Neither of these types of responses provides valid data, but they do illustrate the presence of a continuous scale underlying the small set of discrete values.

Calculation of the median in such situations proceeds as follows. Suppose the distribution of responses given by a class of 25 students is: 1 Strongly Disagree, 1 Disagree, 4 Neutral, 8 Agree, and 11 Strongly Agree, and the values 1 through 5 are assigned as ratings corresponding to Strongly Disagree through Strongly Agree. The mean is 4.08. The median is computed as the value attributed to the 50th percentile point in the distribution of ratings given by the 25 respondents. Six responses are Neutral or lower, while 14 indicate Agree or lower. The point 12.5 (50% of 25) is thus in the interval corresponding to Agree, which ranges from 3.5 to 4.5 when the distribution is considered to be continuous rather than consisting of the discrete values 1 through 5.
We need to travel (12.5 - 6) = 6.5 out of 8 units along the interval between 3.5 and 4.5. The median is therefore computed as 3.5 + (6.5 / 8) = 4.31, a value that more closely reflects the consensus of the raters: almost 0.25 of a 'rating' higher than the mean.

The above can be summarized by the formula:

Median = L + I x ((N/2 - F) / f)

where:
L = lower limit of the interval containing the median (3.5 in the example above)
I = width of the interval containing the median (1.0 in the example above)
N = total number of respondents (25 in the example above)
F = cumulative frequency below the lower limit of that interval (6 in the example above)
f = number of cases in the interval containing the median (8 in the example above).

Keywords: USRI, reports, Calculating the Median, tsqs, median, calculate, formula
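The same calculation can be expressed in a few lines of code. The sketch below is illustrative only; the function name and the choice of Python are not part of the USRI system. It assumes responses have been tallied as counts for the scale values 1 through 5.

```python
def grouped_median(counts):
    """Median of a 1..5 rating scale treated as a grouped (continuous) distribution.

    counts[k] is the number of respondents who chose rating k+1
    (e.g., [1, 1, 4, 8, 11] for the worked example above).
    """
    n = sum(counts)                      # N: total number of respondents
    half = n / 2.0                       # the 50th-percentile point
    cumulative = 0
    for k, f in enumerate(counts):
        if cumulative + f >= half:       # the median falls in this interval
            lower = (k + 1) - 0.5        # L: lower limit of the interval
            width = 1.0                  # I: width of the interval
            return lower + width * (half - cumulative) / f
        cumulative += f                  # F: cumulative frequency below L

# The worked example: 1 SD, 1 D, 4 N, 8 A, 11 SA  ->  4.3125 (reported as 4.31)
print(grouped_median([1, 1, 4, 8, 11]))
```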
-
USRI Reference Data
Introduction
Universal Student Ratings of Instruction (USRI) course evaluations gather feedback from classes to help instructors, departments, and faculties improve curriculum and instruction. This article describes the use of the Reference Data included in USRI results.

Applicability
This article was written for deans, chairs, students and instructors at the University of Alberta.

Details
The columns of reference data display statistics from Tukey's box-and-whisker plot analysis (John W. Tukey, Exploratory Data Analysis, Addison-Wesley Publishing Company, Inc., 1977). The values displayed are derived from all the classes in the indicated reference group. These statistics are chosen to achieve two main objectives:
1. summarize skewed distributions of data, and
2. identify outliers from the general population if they exist.

The median value (the middle of a ranked set of numbers) is generally preferred over the mean to identify the centre of a skewed distribution of scores. This is the value below which 50 percent of the medians from other classes lie. Please note that data for the items in the current set of mandated questions are accumulated from Academic Year 2005/06 and beyond. If an item (question) has not been used at least 15 times by the indicated reference group since then, the reference data cells will be filled with the text "too few uses". It is theoretically possible for all median scores in a single year to be above, or below, the Reference Group median.

The 25th and 75th percentiles provide information about the spread of scores around the median. By definition, twenty-five percent of the scores are above the 75th percentile and twenty-five percent are below the 25th percentile. Since this occurs by definition, these values should not be used to determine whether a particular score is good or bad.

The lower Tukey Fence, which is the 25th percentile minus 1.5 times the distance from the 25th to the 75th percentile, defines a reasonable limit beyond which a score can be considered an outlier. Outliers are scores that appear to be outside the usual distribution of scores for the population being tabulated, i.e., for the indicated reference group. Given the nature of the USRI data, the upper fence will usually be above 5.0 and, therefore, need not be reported.

Please note that some items can be expected to elicit higher ratings because they are closer to 'apple pie' types of items, i.e., we would expect the item to be rated quite positively. This is illustrated by the campus-wide results accumulated in the years 2000-2004 for the two items shown below (the 25%, 50%, and 75% columns are the Reference Data percentiles).

Item                                                         Tukey Fence   25%   50%   75%
The instructor treated students with respect.                    3.4       4.3   4.6   4.8
Overall, the quality of the course content was excellent.        2.9       3.8   4.1   4.3

This suggests that the median obtained for the first item in a particular class can be expected to be about 0.5 of a rating above that for the second item, simply because that has been found to be the case in results from thousands of classes surveyed at the University of Alberta. Note that the 25th percentile for the first item corresponds to the 75th percentile for the second item. Also, the reference group used for a particular class consists of all classes in the indicated department or faculty.

One of the most consistent findings of researchers studying students' ratings of instruction is that the ratings obtained for items such as those addressing general satisfaction with a course or instructor depend on the discipline in which the course is taught.
Franklin and Theall (1995) reported that "Professors in fine arts, humanities, and health-related professions are more highly rated than their science, engineering and math-related colleagues." There appears to be a combination of reasons for these differences, including differences in the characteristics of the students, in the nature of the subject matter, and in the course objectives that are emphasized in different disciplines. The sizes of the differences, and the conclusion that they are not necessarily related to characteristics of the instructors in the different disciplines, lead to the advice that "we must continue to be very cautious about, if not prohibited from using, the results of student evaluations to make comparisons across disciplines" (Marincovich, 1995).

For example, the item "Overall, this instructor was excellent." illustrates that results at the University of Alberta are consistent with the research studies. The reference data from some of the departments in which a large number of classes have been surveyed appear in the following table.

Department                               Tukey Fence   25%   50%   75%
Physics                                      2.4       3.7   4.1   4.5
Computing Science                            2.5       3.7   4.1   4.5
Electrical & Computer Engineering            2.7       3.9   4.2   4.6
Mathematical & Statistical Sciences          2.8       3.9   4.2   4.6
Earth & Atmospheric Sciences                 3.0       4.0   4.3   4.6
Biological Sciences                          3.1       4.0   4.3   4.6
English                                      2.8       4.0   4.4   4.7
Modern Languages & Cultural Studies          2.9       4.0   4.4   4.8
History & Classics                           3.4       4.2   4.5   4.7
Elementary Education                         2.7       4.0   4.5   4.8
Drama                                        2.9       4.1   4.7   4.9

References
Franklin, J., and Theall, M. "The Relationship of Disciplinary Differences and the Value of Class Preparation Time to Student Ratings of Teaching." In N. Hativa and M. Marincovich (eds.), Disciplinary Differences in Teaching and Learning: Implications for Practice. San Francisco: Jossey-Bass, 1995.
Marincovich, M. "Concluding Remarks: On the Meaning of Disciplinary Differences." In N. Hativa and M. Marincovich (eds.), Disciplinary Differences in Teaching and Learning: Implications for Practice. San Francisco: Jossey-Bass, 1995.

Keywords: tsqs, reference data, USRI reports
-
Additional Options for Generating Class lists with GPSCOR and MRSCOR
Introduction
When completing the GPSCOR/MRSCOR Request for Service Form, the option to generate additional files may be selected as follows:
1. a file which is suitable for uploading to eClass (this is done by marking the bubble next to eClass and writing your Class ID on the line immediately below), and
2. a comma-delimited file (*.csv) that is easily opened using Excel.
This article explains the files that are generated by making these requests. For general information on the options available for generating class lists, please see the relevant section in 19000110253 for GPSCOR and in 19000110413 for MRSCOR. Documentation on uploading the eClass file is available here: Moodle, Canvas.

Applicability
This article was written for instructors and support staff at the University of Alberta.

Details
eClass Specific
The eClass file will contain each student's last name, first name, ID number, user ID, and total score in the format expected by eClass (using comma delimiters). Normally, two files will be generated when you request an eClass file:
- The first is a file with the suffix .log, which contains a report of the matching process used to associate the data from the scanner file with the class list obtained from the Registrar's database.
- The second file is the end product of the matching process. It will normally have the suffix _w.csv.
If answer sheets are scored which we did not (or could not) match to the Registrar's class list, a third file will be created with the suffix _u.csv.

Please note that the first line in the *_w.csv file is a title line describing the data. It contains the text: Last Name,First Name,Student No,User Id,OMR Score. You will usually find it helpful to replace "OMR Score" with a more meaningful title. If the list of incorrect responses is requested, the column heading for this field will be "Items Wrong", while the column heading "Scored Responses" will appear if the scored (R/w) responses have been requested.

The matching process is an iterative one that should generally result in an accurate association of the data in the scanner file with the student identification information obtained from the Registrar's database. The following steps are used (and may be proofed by examining the *.log file):
1. Answer sheets having names and ID numbers which correspond exactly with the information in the database are identified, matched, and eliminated from further consideration.
2. Answer sheets having ID numbers that are identical and names which are reasonably similar to those in the database are identified, matched, and eliminated from further consideration.
3. Answer sheets having ID numbers which match the last 6 digits and names which are reasonably similar to those in the database are identified, matched, and eliminated from further consideration.
4. Answer sheets having names which are reasonably similar to those in the database are identified, matched, and eliminated from further consideration.
5. Identification information on any remaining answer sheets is manually examined and associated with students remaining in the Registrar's class list if such an association appears reasonable.
6. Answer sheets that, in the judgement of the TSQS operator, cannot be matched to the official class list are not included in the eClass score file. These records, if any, will be identified in the *.log file under the heading "Number of remaining unmatched scanner records" and then placed in a separate file with the suffix *_u.csv.
The final information appearing in the *.log file is a listing of any students in the Registrar's class list who do not have an associated answer sheet identified in the scanner file. This list, along with the list produced in step #6 in particular, should be reviewed before uploading the score file to eClass.

Excel Specific
The contents of the Excel file depend on which program is used to create it. If the Scansort program is used, the file will contain any biographical information that was read from the answer sheets (such as name, ID number, special codes) plus all the scores derived from the keys that were used to process the job. If Examtab is used to create the Excel file, only one column will be created for the score, which will be the score derived from the key for the appropriate test form. In this case, an option also exists to include a column indicating which key was applied to the respective answer sheets. The first field in the Excel file contains the sheet number that was printed on the answer sheets when they were processed.

Relevant to both types of files
There are two optional text fields that may be included in these files. You may request either or both of these fields; the default is neither of them.
The first option, obtained by marking the Wrongs bubble, is a list of item numbers corresponding to incorrect responses by the student. The field begins with W=, which is followed by the numbers of the items that were answered incorrectly. If a student has no incorrect responses, the field contains W=NONE.
The second is a list of scored responses. This field begins with R= and is followed by alphabetic values corresponding to the student's responses. If a response is correct, the character will be in upper case; if a response is incorrect, the character will be in lower case. If the responses were scanned as numeric values, the numbers will be replaced by their corresponding alphabetic characters. If a response was not given for an item, '-' will appear in the corresponding position. If more than one response was given, '*' will indicate this. If a correct response was not provided for an item (on the key), the corresponding character in the output will be '.'; a run of '...' is used to indicate that three or more consecutive items were not keyed.
If an operator was asked to confirm the match, a '?' will appear between the identification number and the name in the log file.

Keywords: scoring reports, results, scoring, class list, eClass file, .csv, excel file, tsqs
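As an illustration of working with the *_w.csv file, the short sketch below reads the file and replaces the "OMR Score" title with a more meaningful one, as suggested above. The file name and the new column title are hypothetical; only the header fields listed earlier are taken from this article.

```python
import csv

# Hypothetical file name; substitute the *_w.csv file produced for your class.
with open("PHYS100_w.csv", newline="") as infile:
    rows = list(csv.reader(infile))

# The first row is the title line: Last Name,First Name,Student No,User Id,OMR Score
header = rows[0]
header[header.index("OMR Score")] = "Midterm 1"   # a more meaningful title (your choice)

with open("PHYS100_renamed.csv", "w", newline="") as outfile:
    csv.writer(outfile).writerows(rows)
```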
-
Student Perspectives of Teaching (SPOT) - Team Teaching Format
Introduction
This article describes the use of the Student Perspectives of Teaching (SPOT) system in team teaching, where multiple instructors are evaluated on a single questionnaire.

Applicability
This article was written for instructors and support staff at the University of Alberta.

Details
A special format is available in the SPOT system to support evaluation of courses which involve multiple instructors. The questionnaire is arranged such that the questions that apply to the overall course appear first. These are followed by questions that apply to the instructor, and the instructor-related questions are repeated for each instructor involved in the course. This format avoids burdening students by asking the same course-related questions multiple times, and it draws attention to the idea that the number of questions asked about a particular instructor should decrease as the number of instructors increases. When these questionnaires are processed, the data are separated such that a separate report is generated for each of the instructors appearing on the questionnaire. Results for the common, course-related questions are included on each instructor's report.

Keywords: Course evaluations, USRI, Team Teaching, reports, SPOT, TSQS
-
TSQS Charges and Rates
Introduction
This article describes the charges and rates for Test Scoring & Questionnaire Services (TSQS). Also included is some information on how charges are assessed, including sections on Operator Charges, Scanning Charges, and Web Surveys.

Applicability
This article was written for instructors and support staff at the University of Alberta.

Details
Different rates are charged for Test Scoring & Questionnaire Services depending on the source of funding. University rates are offered to clients when the bill is to be debited against a University-administered account. Bills paid in any other manner are assessed the External rate. Current rates are as follows:

Item              Unit             University   External
Operator Time     Hour                27.00       36.00
Scanning Charge   Sheet/Booklet        0.06        0.08
Analyst Time      Hour                60.00       60.00
Reports           Printed sheet        0.20        0.20

A minimum of 10 minutes is assessed for each occasion of each activity.

Operator Charges
Charges for operator time are assessed for:
- scanning answer sheets,
- generating reports (beyond the first set, which is normally included in the scanning charge), and
- routine processing of Web Surveys.
Beyond the minimum charge, operator time can be inflated by the following situations:
Generation of IDQ Questionnaires:
- unusual requests (such as producing a questionnaire containing many unique questions),
- submitting separate requisitions for each class, and
- requisitions causing uncertainty that require a number of phone calls in order to discover what is being requested.
Scanning Time:
- poorly marked forms (including those marked with liquid-ink pens, miscoded ID numbers, and/or cross-outs rather than erasures),
- jobs submitted in disarray (e.g., forms not oriented in a consistent direction, damaged forms), and
- use of MRSCOR (Multiple Response Scoring Program).

Scanning Charges
Use of the term 'booklet' refers to documents that consist of two (2) or more sheets that are processed as a single entity. Booklets are assessed the basic scanning charge, plus one cent for each sheet comprising the booklet. Volume discounts are offered on the scanning charges for single-sheet documents, based on the total number of sheets scanned for a given account within each month. Typically, up to 400 sheets can be processed in ten (10) minutes; large jobs requiring minimal operator intervention are processed at approximately 4000 sheets per hour.

Web Surveys
Analyst rates are assessed for designing web surveys that are deemed non-routine, while operator rates are assessed for those that build on existing templates (i.e., are simple modifications of surveys that have been created earlier). New (non-routine) surveys will usually require two (2) or more hours of analyst time. In addition, operator rates are assessed for e-mailing reminders and for downloading responses from the web server. The scanning sheet charge is applied for each response record that is retrieved.

Keywords: TSQS, Charges, rate schedule, rates, cost, price
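To illustrate how these rates combine, the sketch below gives a rough estimate of the cost of a simple single-sheet scanning job at University rates, assuming the 10-minute minimum of operator time and the rate of roughly 400 sheets per ten minutes quoted above. The function and the example job are hypothetical; volume discounts and report printing are ignored, and actual invoices are prepared by TSQS.

```python
import math

OPERATOR_RATE = 27.00      # $/hour, University rate
SHEET_CHARGE = 0.06        # $/sheet, University rate

def estimate_scan_cost(sheets):
    """Rough cost of scanning single-sheet answer forms at University rates.

    Assumes ~400 sheets per 10 minutes of operator time, with a
    10-minute minimum; ignores volume discounts and report printing.
    """
    blocks_of_ten_minutes = max(1, math.ceil(sheets / 400))
    operator_cost = blocks_of_ten_minutes * (10 / 60) * OPERATOR_RATE
    return operator_cost + sheets * SHEET_CHARGE

# e.g., a 300-student class: about $4.50 of operator time + $18.00 in sheet charges
print(round(estimate_scan_cost(300), 2))   # 22.5
```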
-
MRSCOR (Multiple Response Scoring Program)
Introduction
Test Scoring and Questionnaire Services (TSQS) offers two types of scoring: GPSCOR (General Purpose Scoring Program) and MRSCOR (Multiple Response Scoring Program). This article provides some additional information on MRSCOR, which is used in situations where students are allowed or expected to respond with more than one answer per question.

Applicability
This article was written for instructors and support staff at the University of Alberta.

Details
MRSCOR provides a variety of scoring options. (In all cases, only a single key sheet is used.)
1. Scores may be computed by focusing only on the responses that the student made. In this case, the number of correct responses (R) is counted, the number of incorrect responses (W) is counted, and a final score is reported which would normally be R - W. (The option is also provided to apply a weight other than one to either R or W.) The total, unweighted score possible in this case is equal to the number of answers marked on the key sheet. Note that this means that the possible score for a question depends on the number of responses marked for that question on the key.
2. Scores may also be computed by summing the number of correct behaviours performed by the student, i.e., summing both the number of times that a correct answer is marked and the number of times a choice is correctly left blank. The total, unweighted score possible in this case is equal to the number of questions on the test multiplied by the question length (the number of response choices) of each question.
3. A third alternative is a two-step process. After using the second option (above), a program can be run that converts the output file into a GPSCOR (General Purpose Scoring Program) file. In the conversion process, complete questions are marked right or wrong (depending on whether or not the complete pattern of responses and omits matches the key that has been provided). In this case, the total score possible is equal to the number of questions on the test.
Please refer to the instructions for completing the Optical Mark Reader Request for Service Form in KB0012170.

Keywords: MRSCOR, multiple answers scoring, scoring options, TSQS
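A small sketch may make the first two scoring options concrete. The example key, student responses, and five response choices below are invented for illustration; MRSCOR itself works from scanned key and answer sheets, not from code like this.

```python
def score_marked_only(key, responses):
    """Option 1: count right and wrong marks among the student's responses (R - W)."""
    right = sum(1 for choice in responses if choice in key)
    wrong = len(responses) - right
    return right - wrong          # weights of 1 applied to both R and W

def score_behaviours(key, responses, choices="ABCDE"):
    """Option 2: one point for each choice correctly marked or correctly left blank."""
    return sum(1 for c in choices if (c in key) == (c in responses))

# One question with choices A-E; the key says A and C should be marked.
key = {"A", "C"}
student = {"A", "D"}              # marked A (correct) and D (incorrect), missed C

print(score_marked_only(key, student))   # 1 - 1 = 0, out of a possible 2
print(score_behaviours(key, student))    # A right, B right, C missed, D wrong, E right = 3 of 5
```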
-
TSQS Quick Links
Introduction
This article contains links to access TSQS (Test Scoring & Questionnaire Services) online services.

Applicability
Any user of TSQS could potentially benefit from this article.

Details
The following links are intended primarily for use by support staff:
- Retrieve Block IDs
- Retrieve Class IDs
- Search Catalogs
- Order IDQ questionnaires online
- Generate a Test Scoring Request for Service Form

The following links are intended primarily for use by instructors:
- Retrieve results from the U of A's Universal Students' Ratings of Instruction
- Retrieve Personal Statistical Reports for USRI surveys
- Access the CoursEval survey system to complete surveys or view available feedback

Keywords: TSQS, Test Scoring Questionnaire Services, Links, reports, results, USRI
-
Features of IDQs
Introduction
This article includes information on features available when generating Instructor Designed Questionnaires (IDQs) and administrative reports. Such features include:
- block-ids (which are used to simplify repeated requests for the same set of questions and to identify questions which should be included in an administrative copy of an IDQ report),
- team teaching evaluations, and
- group definitions used for comparative ratings.

Applicability
This article was written for instructors and support staff at the University of Alberta.

Details
Block-ids for Common or Core Questions
Block-ids are 4-character codes associated with specific subsets of questions. When selecting questions for a given questionnaire, it is possible to include one or more of these subsets of items by simply specifying the appropriate block-id. This feature simplifies the task of requesting a particular set of questions that have been identified for repeated usage over a number of occasions. Please refer to the block-id pages in order to determine the names of the block-ids available for a particular catalog and to display the questions associated with particular block-ids. Departments or individuals wishing to make use of this feature should contact Test Scoring and Questionnaire Services (TSQS), who will assist with defining and implementing the desired block-id.

Block-ids for Administrative Reports
In addition to the Instructor Report, which is intended for the use of the individual instructor, three types of reports may be generated for administrative purposes:
- Administrative Copies of the Instructor Report
- Aggregate Report
- Administrator's Summary
These administrative reports summarize the responses to specific subsets of questions on the questionnaire by building on the idea of core questions and associated block-ids. As many as three Administrative Copies of the Instructor Report, each summarizing the responses to a different subset of questions from the questionnaire, may be requested. A mailing address is associated with each administrative block-id at the time it is defined to the IDQ system, making it possible to automate the process of:
- identifying a subset of questions on a questionnaire,
- summarizing the responses to the subset on one of the Administrative Copies, and
- returning the special report to the person, e.g., the department chair, who was associated with the block-id when it was defined.
When the first character of a block-id is the digit 1, 2, or 3, the associated questions are summarized on the corresponding first, second, or third Administrative Copy of the report. As this is the purpose of such special block-ids (those beginning with a 1, 2, or 3), it is understood that when an instructor includes one or more of these block-ids on an IDQ Requisition form, permission is granted to generate a report summarizing the responses to the block-id's questions as well as to mail the report to the designated recipient. The Aggregate Report and the Administrator's Summary may also be generated by special request from the appropriate administrator (the report recipient associated with the block-id indicated on the Requisition Form).

Team Teaching Evaluation
The University of Alberta's version of the IDQ system supports the collection of students' ratings for classes which involve team-teaching. This feature allows for questions that concern the course, per se, and an additional set that are relevant to each of a number of instructors of a single class.
The IDQ system addresses this situation in all phases, from generating a questionnaire (which includes the course-related questions, plus as many sets of instructor-related questions as possible on a single form) to providing separate reports for each of the individual instructors appearing on the combined form. For further information about this feature, please refer to KB0012144.

Reference Groups for Comparative Ratings
The IDQ system accumulates ratings from each class for each question appearing on a questionnaire, in order to provide comparative data on ensuing reports. By default, the comparative data are derived from the reference group consisting of all classes to which the particular question has been administered. Alternatively, you may request that the reference group be restricted to classes having sizes similar to yours. The IDQ system automatically groups classes according to size as follows:
- 1 - 15 students
- 16 - 35 students
- 36 - 100 students
- 100+ students
For some of the catalogs, rating data are also grouped and accumulated according to the values of three other variables: Faculty, Department, and Course Level. You may request any combination of these variables, including Class Size, to be used in defining the reference group for extraction of comparative ratings. Course Level has six (6) values:
- 100-level
- 200-level
- 300-level
- 400-level
- Grad-level
- Service
Grad-level is assumed to be any course in the 500 or 600 series. The first five (5) of these levels can thus be determined by examining the course number when this is provided at the time the questionnaires are generated. If the data are to be isolated as belonging to a Service course rather than to one of the first five levels, this must be noted at the time the IDQ Requisition form is submitted to TSQS (i.e., before the questionnaires are printed).

Keywords: IDQ system, Block ids, questions, course evaluations, block, ID, id, block-id, repeat question, team-teaching, compare, TSQS
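The size groups and course levels described above can be summarized with a small sketch. The function names are hypothetical; the groupings themselves (1-15, 16-35, 36-100, and 100+ students, and the 100-level through Grad-level categories) are the ones described in this article.

```python
def class_size_group(n_students):
    """Return the IDQ class-size reference group for a class of n_students."""
    if n_students <= 15:
        return "1 - 15 students"
    if n_students <= 35:
        return "16 - 35 students"
    if n_students <= 100:
        return "36 - 100 students"
    return "100+ students"

def course_level(course_number):
    """Return the course level inferred from the course number.

    'Service' courses cannot be detected from the number alone; they must be
    flagged on the IDQ Requisition form before the questionnaires are printed.
    """
    if course_number >= 500:            # the 500 and 600 series are Grad-level
        return "Grad-level"
    return f"{(course_number // 100) * 100}-level"

print(class_size_group(42))    # 36 - 100 students
print(course_level(343))       # 300-level
```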
-
Instructor Designed Questionnaire (IDQ) Reports
Introduction
This article describes the reports generated by the Instructor Designed Questionnaire (IDQ) system and the Universal Student Ratings of Instruction (USRI) system.

Applicability
This article was written for instructors and support staff at the University of Alberta.

Details
A one-page report is generated for each class from which students' ratings have been collected. The Instructor Report contains the text of each of the rating questions appearing on the questionnaire. The questions are reported in the sequence in which they were printed on the questionnaire. If a question is a unique question, xxx is printed to indicate that the question is not catalogued. Following the text of each question, the numbers of students responding to the rating scale Strongly Disagree (SD), Disagree (D), Neutral (N), Agree (A), and Strongly Agree (SA) are reported. These frequencies are followed by the median rating of the responses.

Reference Data: For the catalogued questions, the median is followed by four reference data values based on Tukey's box-and-whisker plot analysis (John W. Tukey, Exploratory Data Analysis, Addison-Wesley Publishing Company, Inc., 1977). Ratings from other classes are generally accumulated over a number of years in order to obtain a large enough sample to provide reasonably stable values for the reference data. If there has been insufficient usage of the question to generate the reference values, this will be indicated by the appearance of the text "too few uses for rating". Reference data are not reported for unique questions, even in those situations in which a large number of classes have requested the same unique (uncatalogued) question. The group from which the reference data are computed is indicated at the bottom of the report.

The reference values are provided to help you assess your medians in relation to those of your colleagues. In the example provided, the RANKS OF MEDIANS FROM OTHER CLASSES indicate that 25 percent of the classes rated on the catalogued question "Overall, this instructor was excellent" obtained median ratings below 3.2, 50 percent of the classes were given median ratings below 3.6, and 75 percent of the classes were given median ratings below 3.9. Since 25% of the classes obtain medians above the 75th percentile and 25% obtain medians below the 25th percentile by definition, these values should not be used to determine whether a particular score is good or bad.

The column titled Tukey Fence refers to Tukey's inner fence statistic from his box-and-whisker values. It is computed as 1.5 times (75th percentile minus 25th percentile), which is then subtracted from the 25th percentile. (Please note that the numbers displayed have been rounded to one decimal place while the calculations use non-rounded numbers.) This value identifies a point below which scores (medians) may be considered outliers, i.e., scores which appear to be outside the usual distribution of scores for the reference group being tabulated. In the example below, the obtained median for the catalogued question "In-class time was used effectively" is 2.8 while the Tukey Fence value is 3.0. In this case, an obtained median of 2.8 could be considered a low score relative to the reference group under consideration. Please note that the precision of the instructor's median ratings is affected by the number of students responding to the question, while the precision of the reference values is affected by the number of classes that have been rated on the particular question.
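The lower fence calculation described above is simple to reproduce. The following sketch is illustrative only; it assumes you already have the 25th and 75th percentile reference values for a question, whereas the actual reference data are computed by TSQS from all classes in the reference group.

```python
def lower_tukey_fence(p25, p75):
    """Lower inner fence: 25th percentile minus 1.5 times the interquartile distance."""
    return p25 - 1.5 * (p75 - p25)

# Using one row of the department table in the USRI Reference Data article
# ("Overall, this instructor was excellent.", Computing Science): 25% = 3.7, 75% = 4.5.
fence = lower_tukey_fence(3.7, 4.5)
print(round(fence, 1))    # 2.5; a class median below this value may be considered an outlier
```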
Administrative Copies of the Instructor Report: The IDQ system provides the option of generating additional copies of the Instructor Report for administrative purposes. Only the questions referenced in the associated block-id will be reported on these administrative copies.

Aggregate Report: The Aggregate Report, illustrated in the example, is presented in the same format as the Instructor Report. It is provided to accommodate aggregation of the responses from a number of classes into a single report. This option might be most appropriate for situations involving a number of sections within a course, where a subset of questions focuses on evaluating features associated with the course rather than with individual instructors. Provided that the subset of questions has been requested via inclusion of the appropriate block-id on the IDQ Requisition form, it is possible for an administrator to request that the responses gathered from the various sections be aggregated into a single Aggregate Report. There are two differences in presentation between this report and the Instructor Report:
- The tally of responses to the questions is presented as percentages rather than raw frequency counts.
- The option is provided for selecting the Reference Group to consist of only the individual classes which have been included in the aggregation for the report.
Please note that use of this last option will often produce a relatively small reference group, resulting in unstable estimates of the quartile ranks (if the reference group does not consist of more than 15 classes, the message "too few uses" will appear rather than the desired reference data). When using this option, the information gleaned from the percentile ranks of the Reference Group medians is not so much concerned with indicating how the obtained median compares with medians from other classes, but rather with indicating the variability of the medians of the classes which have been aggregated for inclusion in the report. The Aggregate Report begins with a list describing the classes which have been included in the aggregation; the statistical report appears on the following page.

Administrator's Summary: The Administrator's Summary, illustrated in the following example, is subject to the same constraints as the Aggregate Report concerning the need for a subset of questions to be associated with an appropriate block-id. It also allows for the possibility of restricting the Reference Group to only the classes which are listed in the given report. The Administrator's Summary is provided in response to requests for a report which summarizes the results for a number of classes in a condensed format. As illustrated in the example, this is accomplished by reporting the results for each class on a single line, with the medians for each question aligned in a column under the corresponding reference data. For convenience, an additional page is provided which indicates the text of the questions associated with each of the question numbers appearing in the main body of the report. A distinctive feature of the Administrator's Summary is that it shows in bold-face those medians which exceed the 75th percentile of the class medians in the Reference Group, provided that the class median is computed from a minimum of 6 responses. In addition to the printed copy of this report, a file may be requested which contains the data in a tab-delimited format suitable for importing into Microsoft Excel.
Keywords: IDQ Reports, USRI, instructor reports, course evaluations, reference data
-
Itemanal and Examtab Programs for Item Analyses
Introduction
Test Scoring & Questionnaire Services (TSQS) supports two programs that may be used for test (item) analysis:
- Itemanal
- Examtab
This article describes the two programs and explains which scanner programs they may be used with.

Applicability
This article was written for instructors and support staff at the University of Alberta.

Details
Itemanal
Of the packages available at TSQS, Itemanal produces the most comprehensive test and item analysis. It begins by displaying the key that was used to score the test. A value of * is used to indicate that no correct answer was supplied for an item on the key that was used. The following pages display the FREQuency of occurrence of each SCORE, the Z-score that would be associated with that raw SCORE if the distribution were normally distributed, the PERCENTILE associated with the SCORE (which is the CUMulative PERCENTAGE of scores occurring below the current score plus half of the current scores), and the CUMulative PERCENTAGE of examinees obtaining scores at or below the current value. Finally, the cutting points are reported which will be used to divide the class into high, middle, and low groups. By default, the program attempts to place 27% of the class into each of the high and low groups, with the remainder in the middle. The success in identifying 27% will depend on the size of the class and the presence of tied scores at the cutting points.

After reviewing the distribution of scores in the frequency table and histogram and the overall test statistics that appear in the sample output, you may find it useful to start at the back of the report, where a scattergram is displayed which summarizes the biserial correlations and difficulty levels of all the items. Items having low values of either (say, below .3) and, particularly, items having negative correlations should then be given special attention in examining the item-wise reports. If the class size is large enough, plots are also provided that display the success rates (difficulty levels) of quintile groups for each item. These, too, can alert you to items that are behaving strangely. In the sample output, the plot for Item 34 illustrates a nicely behaving item, while those for Items 13 and 37 indicate a need for closer examination. In the item-wise reports, items for which the HIGH group chooses a particular incorrect response more often than they choose the correct response should, especially, be examined more closely. Also, items for which the LOW group apparently gives the correct response more frequently than the HIGH group, or for which the HIGH group chooses a particular incorrect response more often than does the LOW group, should be closely examined.

Examtab
Examtab produces significantly less information than Itemanal. It has none of the graphical output, the frequency distribution does not include the percentile column, and the item-wise report consists of only one line per item. As may be discovered by comparing the output from the two programs, the Point Biserial reported by Examtab is the correlation between the score on the item and the score on the remainder of the test (rather than the total test score). Main Parameters 'E' and 'F' on the Request for Service sheet should both be zero if you intend to use this program.

Additional Considerations (Notes)
These programs are designed to be used with the GPSCOR scanner program and the General Purpose Answer sheets that can be processed by GPSCOR.
In addition, the Itemanal program can be used with data files produced by the MRSCOR scanner program by treating each response to each item as a True/False item. Appropriate output will be returned along with your processed answer sheets if you indicate which of these programs you want run under the "Data Processing Required:" option on the Request for Service sheet.

A popular practice in a number of departments is to create multiple forms of a test for purposes of administration in large (crowded) classes. This is usually done by reordering the items in the test. In order to analyse all forms of a test as a single set, mapping instructions must be provided so that the items can be aligned to correspond from one form to another before proceeding with the item analysis. If these mapping instructions are not provided, a separate analysis will be performed for each version of the test. The answer sheets from the different forms of the test do not need to be sorted into separate groups, but if they are submitted as one group, a version number must be coded on each key and the students must be instructed to code the same (appropriate) number on their answer sheets using the Special Codes section on the answer sheet. We recommend that the numbers assigned to the different versions be several digits in length in order to guard against students making undetected errors when coding this number. During scanning, the answer sheets are scored against the keys for all forms of the test. When the subsequent analyses are done, the appropriate version of the test and the corresponding score are selected for the reports.

Keywords: tsqs, examtab, itemanal, statistical analysis, reports, test, item, analysis
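As a rough illustration of the Point Biserial statistic that Examtab reports (the correlation between an item score and the score on the remainder of the test), a minimal sketch follows. The data and function name are invented for illustration; they are not part of Examtab or Itemanal, which compute these statistics from the scanned answer sheets.

```python
from statistics import mean, pstdev

def point_biserial_item_rest(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and the rest-of-test score.

    The rest-of-test score is the total score minus the item itself, matching
    Examtab's use of the remainder of the test rather than the total score.
    """
    rest = [t - i for i, t in zip(item_scores, total_scores)]
    n = len(item_scores)
    mi, mr = mean(item_scores), mean(rest)
    cov = sum((i - mi) * (r - mr) for i, r in zip(item_scores, rest)) / n
    return cov / (pstdev(item_scores) * pstdev(rest))

# Five hypothetical examinees: right/wrong on one item, and their total test scores.
item = [1, 0, 1, 1, 0]
total = [38, 22, 35, 30, 18]
print(round(point_biserial_item_rest(item, total), 2))
```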