DEVELOPMENT OF PHYSICS PROBLEMS FOR THE FINAL ASSESSMENT AT HIGH SCHOOL LEVEL WITH RASCH MODEL ANALYSIS

The aim of this research is to develop Physics problems that are appropriate for the final assessment at High School level, using Rasch Model analysis. The research method is Research and Development with the ADDIE model. The Physics problems developed refer to the USBN blueprint issued by BSNP. The problems were validated by Physics experts, then tested on students, and their quality was analyzed using Rasch Model analysis. Validation by the Physics experts gave a score of 86.25%, which meets the very good criteria and shows that the problems are feasible in terms of Physics material. The problems that passed the validation stage were then tested on several students and analyzed using the Rasch Model. The results of the analysis show that the problems have good validity and reliability, and that they have varying degrees of difficulty. The results also show that the problems are not biased toward a certain group. Based on these results, it can be concluded that the Physics problems developed are appropriate for the final assessment at High School level.


INTRODUCTION
Physics is one of the fundamental sciences; it deals with life phenomena and can be used to predict an event by manipulating one or more variables (Hırça, 2013; Hodosyová et al., 2015; Suyidno et al., 2018). Physics is a part of science that integrates with other fields such as Mathematics, Chemistry, and Biology. In Indonesia, at the High School level, Physics is one of the vocational subjects for students who major in Mathematics and Natural Sciences. At the end of a learning process, students take several assessments that evaluate the knowledge they have absorbed (Maba, 2017; Tighe-mooney et al., 2016). Assessment in Indonesia is regulated by Minister of Education and Culture Regulation No. 23 of 2016 (Maulana & Ningtiyas, 2019). There are three types of assessment, one of which is assessment by the education unit. In Indonesia, this assessment is known as the National Standardized School Examination (Ujian Sekolah Berstandar Nasional or USBN).
Even though it has the word "National" in its name, the implementation of the USBN is not the same as that of the National Examination. The National Examination is organized nationally, so the problems used in one school must be the same as those used in other schools; this is not the case for the USBN. Observations show that the problems used for the USBN do not come from the national government. They are divided into two groups: about 20%-25% of the problems are developed by the Teacher Subject Deliberations (Musyawarah Guru Mata Pelajaran or MGMP) in particular cities or districts, while the remaining 75%-80% are developed by teachers in their respective schools.
Based on the observations, the problems developed by each school's teachers and used in previous USBNs were not of good quality, because they were taken from unreliable sources such as blogs. This raises a big question: "Are the problems used in past USBNs really able to measure what should be measured?" Therefore, the development of Physics problems for the implementation of the USBN is needed.
The Rasch Model is an analytical technique that can determine not only the quality of the items but also the quality of the test participants (Ishak et al., 2018). It is a one-parameter logistic analysis technique that refers to the difficulty level of an item in determining the quality of respondents (students) (Motta et al., 2015; Sasmoko et al., 2018). This is clearly different from conventional analysis techniques, which use the number of questions answered correctly as the criterion for determining the quality of students.
The Rasch Model provides several advantages (Simpelaere et al., 2017). In addition to showing whether the problems developed are valid and reliable (Czuba et al., 2016; Ee et al., 2018; Hassan et al., 2017) and giving the level of difficulty of each problem, the Rasch Model can provide information about the quality of the test instrument as a whole (Susongko, 2016), the quality of students, and the interactions between the test and the students (Cecilio-fernandes et al., 2017; Kean, Bisson et al., 2018; Lang et al., 2019; Morán et al., 2018; Wang et al., 2017). The quality of respondents (students with low, medium, and high ability) can also be determined more accurately, because it is judged from the difficulty level of the questions answered, not from the number of questions answered correctly (Lo et al., 2015; Mamat et al., 2018; Zubairi & Kassim, 2006). The Rasch Model is also capable of identifying problems that have the potential to be biased toward certain groups, which helps educators and problem developers decide whether to improve an item later (Baghaei et al., 2017; Ismail et al., 2015; Mclaughlin et al., 2016; Sumintono, 2018). It is also useful for detecting students with inappropriate response patterns (Imran et al., 2017; Ireland et al., 2018; Li et al., 2016). An inappropriate response pattern is a pattern of answers that does not match the student's ability (Maat & Rosli, 2016; Mohammad et al., 2016). It is used to gauge the consistency of students' thinking and to identify students who are careless or lucky and students who cheat on the test (Saidi & Siew, 2019).
Based on the background described above, the aim of this research is to develop Physics problems that are appropriate for the final assessment at High School level, using Rasch Model analysis.
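For reference, the one-parameter logistic form usually associated with the dichotomous Rasch Model can be sketched as follows. This is a minimal illustration and not part of the analysis reported in this study, which relied on Winsteps; the function name `rasch_probability` and the example values are assumptions made for illustration only.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct answer under the dichotomous Rasch model.

    Both arguments are expressed in logits. The function name is
    illustrative and does not come from the study.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# A student whose ability equals the item difficulty has a 50% chance of success.
print(rasch_probability(0.0, 0.0))
# A student one logit below the item difficulty has a lower chance of success.
print(rasch_probability(-1.0, 0.0))
```

Because the probability depends only on the difference between ability and difficulty, two students with the same raw score can still be distinguished by which items they answered correctly.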

METHODS
This research was conducted from January 2019 to April 2019 in the Research and Development of Physics Education Laboratory, Universitas Negeri Jakarta. The trial of the problems was carried out in several high schools in Bekasi city, with 100 students as research subjects. This is a research and development study using the ADDIE model, which consists of five core steps: Analysis, Design, Development, Implementation, and Evaluation (Muruganantham, 2015). In the analysis step, several activities were carried out: observing the current USBN implementation, analyzing the problems used in the USBN in previous years, and conducting a literature study of the USBN blueprint issued by BSNP. In the design step, a framework and design for the Physics problems were prepared, together with a blueprint of the problems to be developed and a scoring rubric for the description-type problems. In the development step, the Physics problems for the USBN were developed and then reviewed by Physics experts to assess their quality. Input from the experts was used to revise the problems before they were tested in the field. In the implementation step, the problems that had been declared feasible in terms of Physics material were tried out on several students. The results of the trial were then analyzed using Rasch Model analysis to obtain information about the quality of the instrument as a whole, the quality of the problems, and the quality of the students who took part in the trial. Finally, in the evaluation step, a conclusion was drawn as to whether the problems are suitable for use in the Physics USBN.
In this study, validation data from the Physics experts were obtained using a questionnaire that had been validated beforehand. The questionnaire used a Likert scale of 1-4. The validation results were processed and analyzed to determine the feasibility of the problems in terms of the material. The results of the student trial were processed and analyzed with Rasch Model analysis using the Winsteps application.
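How the questionnaire ratings were converted into a percentage score is not detailed here. One common convention, assumed for this sketch, divides the sum of the ratings by the maximum possible sum. The function name `likert_percentage` and the ratings below are hypothetical and are not data from this study.

```python
def likert_percentage(ratings, max_rating=4):
    """Convert Likert ratings to a percentage of the maximum possible score.

    Assumed convention: 100 * sum(ratings) / (number of ratings * max_rating).
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    return 100.0 * sum(ratings) / (len(ratings) * max_rating)


# Hypothetical ratings from one validator on a 1-4 scale.
example_ratings = [4, 3, 4, 3]
print(likert_percentage(example_ratings))  # 87.5
```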

Product
In this study, the research product is a set of Physics problems intended for the implementation of the USBN at High School level. The problems refer to the USBN blueprint issued by BSNP.

Theoretical Validation
The problems were then validated by two experts in terms of Physics material. Theoretical validation by Physics experts aims to determine the quality of the problems with respect to the Physics material. The results of theoretical validation by the material experts are shown in the following graph.

Figure 2. Theoretical validation from Physics experts
The graph above shows that the quality of the problems meets the very good criteria, with an average score of 86.25%. In more detail, the material expert validation gave an average score of 85.42% for the "material problems" section, 85.00% for the "problem construction" section, and 90.63% for the "language" section, each of which meets the very good criteria. In short, the validation results show that the Physics problems are of very decent quality for use in the USBN at High School level for the Physics subject.

Field Test and Analysis
The trial was carried out by asking students to answer the Physics problems, which had been arranged into one complete instrument. The students' answers were then processed and analyzed using the Rasch Model. The analysis covered the quality of the response pattern between respondents (students) and items (problems), the overall quality of the instrument, the overall quality of the students, the validity of the items, the difficulty level of the items, individual ability, and the degree of fit of individuals in responding (answering questions). The summary statistics give overall information about the quality of the response pattern, the quality of the instrument, and the interaction between persons and items. They show an average person measure of -0.88 logit, which represents the overall ability of respondents in working on the items. An average value below 0.0 logit indicates that the respondents' ability tends to be lower than the difficulty level of the items. In addition, person reliability and item reliability are used to determine the reliability of respondents and items. A value below 0.67 means weak reliability; 0.67 to 0.80 is sufficient; 0.80 to 0.90 is good; 0.91 to 0.94 is very good; and more than 0.94 is excellent. Person reliability has a value of 0.56, which means the consistency of the respondents' answers is weak, while item reliability has a value of 0.89, which means the quality of the items in the test instrument is good.
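The reliability bands listed above can be written as a small lookup, sketched here. The function name is hypothetical, and the handling of boundary values (for example exactly 0.80, or values between 0.90 and 0.91) is an assumption, since the bands as stated do not specify it.

```python
def reliability_category(value: float) -> str:
    """Map a reliability coefficient to the bands listed in the text.

    Boundary handling is an assumption: each band includes its upper limit,
    and the unspecified gap between 0.90 and 0.91 is assigned to "very good".
    """
    if value < 0.67:
        return "weak"
    if value <= 0.80:
        return "sufficient"
    if value <= 0.90:
        return "good"
    if value <= 0.94:
        return "very good"
    return "excellent"


print(reliability_category(0.56))  # weak (person reliability reported above)
print(reliability_category(0.89))  # good (item reliability reported above)
```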
Item fit explains whether an item functions normally in measuring. If an item does not fit, this indicates that students may have a misconception about the item. This information is used as a reference to improve the quality of items that do not fit. The outfit mean-square, outfit z-standard, and point measure correlation are the criteria used to assess item quality. An item is declared valid if it meets at least one of the criteria; if none is met, the item is not good and needs to be repaired or replaced. The fit criteria for the items are: (1) an accepted Outfit Mean Square (MNSQ) value in the range 0.5 < MNSQ < 1.5; (2) an accepted Outfit Z-Standard (ZSTD) value in the range -2.0 < ZSTD < +2.0; and (3) a Point Measure Correlation (Pt Measure Corr) value in the range 0.4 < Pt Measure Corr < 0.85. In this research, every item fulfilled at least one of the item fit criteria. This shows that the problems are valid and able to measure what should be measured.
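The three fit criteria and the "at least one criterion" rule described above can be expressed as a short check. This is only a sketch: the function names are hypothetical, and the statistics passed in the example are invented values, not output from this study.

```python
def item_fit_flags(outfit_mnsq: float, outfit_zstd: float, pt_measure_corr: float) -> dict:
    """Evaluate each of the three fit criteria stated in the text."""
    return {
        "mnsq": 0.5 < outfit_mnsq < 1.5,
        "zstd": -2.0 < outfit_zstd < 2.0,
        "pt_measure_corr": 0.4 < pt_measure_corr < 0.85,
    }


def item_is_acceptable(outfit_mnsq: float, outfit_zstd: float, pt_measure_corr: float) -> bool:
    """An item is kept if it satisfies at least one criterion."""
    return any(item_fit_flags(outfit_mnsq, outfit_zstd, pt_measure_corr).values())


# Hypothetical statistics for two items.
print(item_is_acceptable(1.10, 0.8, 0.45))   # True: all three criteria hold
print(item_is_acceptable(1.80, 2.5, 0.20))   # False: no criterion holds
```

Returning the individual flags alongside the overall decision makes it easier to see which criterion an item fails when it is sent back for revision.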
The Rasch Model approach also provides an interpretation of the difficulty level through the item measure. In the Winsteps application, the difficulty level of a problem can be seen from its logit value. A high logit value indicates a high level of difficulty, and vice versa. Positive logit values indicate difficult items and negative values indicate easy items. The standard deviation (SD) can be combined with the logit value to classify the difficulty level: from 0.00 logit to +1SD is the group of difficult problems; more than +1SD is the group of very difficult problems; from 0.00 logit to -1SD is the group of easy problems; and less than -1SD is the group of very easy problems. In the test instrument developed, problems number 38, 11, 34, 40, 1, 2, and 20 are very difficult; problems number 39, 3, 25, 31, 12, 21, 32, 35, 10, 13, 36, 8, 19, and 14 are difficult; problems number 9, 24, 7, 18, 30, 37, 12, 22, 28, 5, and 33 are easy; and problems number 23, 27, 29, 6, 4, 16, 26, and 25 are very easy.
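The grouping rule for item difficulty can be sketched as a function of the item logit and the SD of the item measures, with the lower cut-off taken as -1SD. The function name is hypothetical, the SD used in the example is an invented value (the item SD is not reported here), and placing items at exactly 0.00 logit in the "easy" group is an assumption.

```python
def difficulty_category(logit: float, sd: float) -> str:
    """Classify an item by its logit measure relative to 0.00 and +/- 1 SD.

    Assumption: values exactly on a cut-off fall into the group closer to 0.00,
    and an item at exactly 0.00 logit is counted as "easy".
    """
    if logit > sd:
        return "very difficult"
    if logit > 0.0:
        return "difficult"
    if logit >= -sd:
        return "easy"
    return "very easy"


# Hypothetical item measures with an assumed SD of 1.0 logit.
for measure in (1.5, 0.4, -0.4, -1.5):
    print(measure, difficulty_category(measure, sd=1.0))
```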
Besides item analysis, the Rasch Model can also be used to analyze respondents' ability (person measure). Through Rasch Model analysis, respondents can be divided into several levels of ability. The average logit value is used as the reference for grouping respondents, and the standard deviation (SD) is combined with it to classify ability levels. The range from "average logit - 1SD" to "average logit + 1SD" is the group of respondents with moderate ability; more than "average logit + 1SD" is the group of highly capable respondents; and less than "average logit - 1SD" is the group of low-ability respondents. The results show that the average logit value is -0.88 and the SD value is 0.53. Thus the group of highly capable respondents has logit values greater than -0.30; the group with moderate ability has logit values ranging from -1.46 to -0.30; and the group with low ability has logit values of less than -1.46.
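The grouping of respondents can be sketched in the same way. The function below is a hypothetical helper that uses the cut-off values reported above (-1.46 and -0.30 logit) as defaults; the person measures in the example are invented, and treating values exactly on a cut-off as "moderate" is an assumption.

```python
def ability_group(measure: float, low_cut: float = -1.46, high_cut: float = -0.30) -> str:
    """Group a respondent by person measure (in logits).

    Defaults are the cut-offs reported in the text; values exactly on a
    cut-off are treated as "moderate" (an assumption).
    """
    if measure > high_cut:
        return "high"
    if measure < low_cut:
        return "low"
    return "moderate"


# Hypothetical person measures.
for measure in (0.10, -0.88, -2.00):
    print(measure, ability_group(measure))
```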
Person fit is used to detect individuals who have inappropriate response patterns. An inappropriate response pattern is a pattern of answers that does not match the respondent's ability. It is used to gauge the consistency of respondents' thinking and to identify respondents who are careless or lucky and respondents who cheat on the test. The outfit mean-square, outfit z-standard, and point measure correlation are the criteria used to assess individual fit (person fit). A respondent is declared to fit if at least one of the criteria is fulfilled; if none is fulfilled, the respondent is considered inconsistent, careless, lucky, or cheating. The person fit results show that there were no respondents who were inconsistent, careless, lucky, or cheating. However, the scalogram shows that respondent 97 was careless, because this respondent could not answer the third easiest item (number 16) but could answer the hardest item (number 38). In addition, respondent 11 was lucky, being able to answer the most difficult item (number 38).
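The kind of scalogram reading described above, where a respondent misses one of the easiest items yet answers the hardest one, can be approximated with a simple screen. This is a rough heuristic for illustration and not the procedure used by Winsteps; the function name, the item labels, and the response data are all hypothetical.

```python
def unexpected_patterns(responses, items_by_difficulty, n_easy=3):
    """Flag respondents who answer the hardest item but miss an easy one.

    responses: dict mapping respondent id -> dict of item id -> 0 or 1.
    items_by_difficulty: item ids ordered from easiest to hardest.
    Returns a list of (respondent id, missed easy items).
    """
    easiest = items_by_difficulty[:n_easy]
    hardest = items_by_difficulty[-1]
    flagged = []
    for person, answers in responses.items():
        missed_easy = [item for item in easiest if answers.get(item, 0) == 0]
        if answers.get(hardest, 0) == 1 and missed_easy:
            flagged.append((person, missed_easy))
    return flagged


# Hypothetical items (easiest to hardest) and responses.
items = ["easy1", "easy2", "easy3", "mid", "hard"]
data = {
    "p1": {"easy1": 1, "easy2": 1, "easy3": 1, "mid": 1, "hard": 0},
    "p2": {"easy1": 1, "easy2": 1, "easy3": 0, "mid": 0, "hard": 1},
}
print(unexpected_patterns(data, items))  # [('p2', ['easy3'])]
```

A flag from such a screen only suggests where to look; whether a respondent was careless or lucky still has to be judged from the full response pattern.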

Discussion
In this study, the research product is a set of Physics problems intended for the implementation of the USBN at High School level. The problems refer to the USBN blueprint issued by BSNP.
The problems were then validated by several experts with respect to the Physics material. The results of the material expert validation show that the quality of the problems meets the very good criteria, with an average score of 86.25%. In more detail, the validation gave an average score of 85.42% for the "material problems" section, 85.00% for the "problem construction" section, and 90.63% for the "language" section, each of which meets the very good criteria. In short, the validation results show that the Physics problems are of very decent quality for use in the USBN at High School level for the Physics subject.
The Physics problems that had been developed and validated by several experts in terms of Physics material were then tested in the field. The trial was carried out by asking several students to answer the problems. Each answer was processed and analyzed using the Rasch Model to find out not only the quality of the items but also the quality of the test participants.
The summary statistics show an average person measure of -0.88 logit, which represents the overall ability of respondents in working on the items. An average value below 0.0 logit indicates that the respondents' ability tends to be lower than the difficulty level of the items. In addition, person reliability and item reliability are used to determine the reliability of respondents and items. A value below 0.67 means weak reliability; 0.67 to 0.80 is sufficient; 0.80 to 0.90 is good; 0.91 to 0.94 is very good; and more than 0.94 is excellent. Person reliability has a value of 0.56, which means the consistency of the respondents' answers is weak, while item reliability has a value of 0.89, which means the quality of the items in the test instrument is good.
Item fit explains whether an item functions normally in measuring. If an item does not fit, this indicates that students may have a misconception about the item. This information is used by researchers as a reference to improve the quality of the problems when items that do not fit are found. Based on the results of the study, every item fulfilled at least one of the item fit criteria. This shows that the problems are valid and able to measure what should be measured.
The Rasch Model can also be used to analyze respondents. Based on the data, the students with high ability include students number 17 and 23. The students with moderate ability include numbers 31 and 42, while the students with low ability are numbers 13 and 19.
Person fit is used to detect individuals who have inappropriate response patterns. An inappropriate response pattern is a pattern of answers that does not match the respondent's ability. It is used to gauge the consistency of respondents' thinking and to identify respondents who are careless or lucky and respondents who cheat on the test. The person fit results show that there were no respondents who were inconsistent, careless, lucky, or cheating. However, the scalogram shows that respondent 97 was careless, because this respondent could not answer the third easiest item (number 16) but could answer the hardest item (number 38). In addition, respondent 11 was lucky, being able to answer the most difficult item (number 38).
The results of the analysis show that the Physics problems developed can be declared feasible for the implementation of the USBN at High School level. In addition to providing an analysis of the quality of the problems, the Rasch Model also provides information on the quality of the students who take the test. This information helps educators improve the quality of their students. This is in line with previous studies, which explained that the Rasch Model provides more in-depth information about the quality of students (Zamri & Nordin, 2015), thus helping educators evaluate the learning process in the classroom (Mursidi & Soeharto, 2016; Rahmani, 2018; Suranata et al., 2018). In addition, the reported quality of the items (problems) is more accountable because it is analyzed using the difficulty level of each item.

CONCLUSION
Based on the results and discussion, it can be concluded that the Physics problems developed are appropriate for the final assessment at High School level.

ACKNOWLEDGMENT
Thanks are given to several parties who helped in carrying out this research. Thank you to Mr. Esmar Budi and Mr. Iwan Sugihartono as Physics material expert validators. We also thank all education personnel and members of SMA Negeri 1 Bekasi, SMA Negeri 2 Bekasi, SMA Negeri 4 Bekasi, SMA Negeri 9 Bekasi, SMA Negeri 14 Bekasi, and SMA Negeri 20 Bekasi for facilitating the trial of this research with several of their students.