Evaluating Analytic Rubric Quality for Assessing Pre-service Biology Teachers' Creative Thinking Skills
DOI: https://doi.org/10.15294/jpii.v14i3.21851
Keywords: analytic rubric, creative thinking skills, divergent thinking, pre-service biology teachers
Abstract
This study validates an analytic rubric designed to assess divergent thinking as a core dimension of pre-service biology teachers' creative thinking. The research addressed the absence of psychometrically sound assessment tools in Indonesian biology education, where creativity remains underdeveloped despite policy mandates, such as Kurikulum Merdeka, and international benchmarks, like PISA 2022. The study pursued two objectives: (1) theoretical validation of the rubric through expert judgment and inter-rater reliability, and (2) empirical validation through exploratory and confirmatory factor analyses. Eight subject-matter experts evaluated the rubric's descriptors, and 218 pre-service biology teachers completed divergent-thinking tasks based on human physiology scenarios. Content validity was calculated using Aiken's V, inter-rater agreement with Kendall's W, and construct validity and reliability through EFA and CFA. Findings indicated strong content validity (Aiken's V = 0.78–1.00) and fair to good inter-rater reliability (Kendall's W = 0.50–0.79). Factor analyses confirmed a unidimensional structure, with CFA demonstrating good model fit (RMSEA = 0.04; CFI = 0.97). All factor loadings exceeded 0.30, and composite reliability was ≥0.70 for three dimensions, though solution variety was marginally reliable. Notably, feasibility emerged as more stable than variety, suggesting that appropriateness is a stronger indicator of creativity in applied STEM contexts. The validated rubric provides theoretical insights into the structure of divergent thinking and practical tools for formative assessment in biology education. Future studies should extend validation across institutions, examine measurement invariance, and explore predictive validity to ensure broader applicability.
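The two agreement statistics reported above can be illustrated with a short sketch. Aiken's V for a single item is V = Σs / (n(c − 1)), where s is each rating minus the lowest category, n the number of raters, and c the number of categories; Kendall's W for m raters ranking k items (without the tie correction) is W = 12S / (m²(k³ − k)), where S is the sum of squared deviations of the item rank totals from their mean. The data, rater counts, and scale below are hypothetical, not taken from the study:

```python
def aikens_v(ratings, categories=5):
    """Aiken's V for one item: V = sum(s) / (n * (c - 1)),
    where s = rating minus the lowest category (assumed to be 1)."""
    n = len(ratings)
    s = sum(r - 1 for r in ratings)
    return s / (n * (categories - 1))


def kendalls_w(rankings):
    """Kendall's coefficient of concordance W for m raters ranking
    k items; this simple version assumes no tied ranks."""
    m = len(rankings)
    k = len(rankings[0])
    # Sum of ranks assigned to each item across raters
    totals = [sum(r[j] for r in rankings) for j in range(k)]
    mean_total = m * (k + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (k ** 3 - k))


# Hypothetical example: 8 experts rate one descriptor on a 1-5 scale
print(round(aikens_v([5, 4, 5, 5, 4, 5, 4, 5], categories=5), 2))  # 0.91

# Hypothetical example: 3 raters rank 4 student responses
print(round(kendalls_w([[1, 2, 3, 4], [1, 3, 2, 4], [2, 1, 3, 4]]), 2))  # 0.78
```

Values near 1 indicate near-perfect expert endorsement (V) or rater concordance (W); the study's observed ranges (V = 0.78–1.00, W = 0.50–0.79) fall in the acceptable-to-strong region on both scales.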
References
Abdellatif, R., & El-Wakeel, H. (2025). Assessing creative outcomes in studio-based learning: a comparative assessment of analytical rubrics. International Journal of Design Creativity and Innovation, 13(1), 41–66.
Aiken, L. R. (1985). Three Coefficients for Analyzing the Reliability and Validity of Ratings. Educational and Psychological Measurement, 45(1), 131–142.
Alajami, A. (2020). Beyond originality in scientific research: Considering relations among originality, novelty, and ecological thinking. Thinking Skills and Creativity, 38, 100723.
Amelia, R. N., Listiaji, P., Dewi, N. R., Heriyanti, A. P., Atmaja, B. D., Shoba, T. M., & Sajidi, I. (2024). Developing and Validating a Rubric for Measuring Skills in Designing Science Experiments for Prospective Science Teachers. Jurnal Inovasi Pendidikan IPA, 10(1), 32–46.
Anderson, R. C., & Graham, M. (2021). Creative potential in flux: The leading role of originality during early adolescent development. Thinking Skills and Creativity, 40, 100816.
Arabacı, D., & Baki, A. (2023). An analysis of the gifted and non-gifted students' creativity within the context of problem-posing activity. Journal of Pedagogical Research.
Asli, N. F., Matore, M. E. E. M., & Yunus, M. M. (2024). Construct validity of primary trait writing rubrics based on assessment use argument (AUA) validation framework. Heliyon, 10(22), e40053.
Azmi, C., Hadiyanto, H., & Rusdinal, R. (2023). National Curriculum Education Policy "Curriculum Merdeka And Its Implementation." International Journal of Educational Dynamics, 6(1), 303–309.
Azwar, S. (2012). Reliabilitas dan Validitas (4th ed.). Pustaka Pelajar. https://pustakapelajar.co.id/product/reliabilitas-dan-validitas/
Baer, J. (2012). Domain Specificity and the Limits of Creativity Theory. The Journal of Creative Behavior, 46(1), 16–29.
Barak, M., & Levenberg, A. (2016). Flexible thinking in learning: An individual differences measure for learning in technology-enhanced environments. Computers & Education, 99, 39–52.
Barth, P., & Stadtmann, G. (2021). Creativity assessment over time: Examining the reliability of cat ratings. Journal of Creative Behavior, 55(2), 396–409.
Beghetto, R. A., & Karwowski, M. (2017). Toward Untangling Creative Self-Beliefs. In The Creative Self (pp. 3–22). Elsevier.
Bollen, K. A., & Long, J. S. (1993). Testing Structural Equation Models. Sage.
Breckler, S. J. (1990). Applications of covariance structure modeling in psychology: Cause for concern? Psychological Bulletin, 107(2), 260–273.
Brookhart, S. M. (2018). Appropriate Criteria: Key to Effective Rubrics. Frontiers in Education, 3.
Browne, M. W., & Cudeck, R. (1993). Alternative Ways of Assessing Model Fit. In Testing Structural Equation Models (Vol. 154). SAGE Publications, Inc. https://us.sagepub.com/en-us/nam/testing-structural-equation-models/book3893#contents
Chan, J., & Schunn, C. D. (2023). The Importance of Separating Appropriateness into Impact and Feasibility for the Psychology of Creativity. Creativity Research Journal, 35(4), 629–644.
Chowdhury, F. (2018). Application of rubrics in the classroom: A vital tool for improvement in assessment, feedback and learning. International Education Studies, 12(1), 61.
Connell, J., Carlton, J., Grundy, A., Taylor Buck, E., Keetharuth, A. D., Ricketts, T., Barkham, M., Robotham, D., Rose, D., & Brazier, J. (2018). The importance of content and face validity in instrument development: lessons learnt from service users when developing the Recovering Quality of Life measure (ReQoL). Quality of Life Research, 27(7), 1893–1902.
Cooper, G. (2023). Examining science education in ChatGPT: An exploratory study of generative artificial intelligence. J Sci Educ Technol, 32(3), 444–452.
DiStefano, C., & Hess, B. (2005). Using Confirmatory Factor Analysis for Construct Validation: An Empirical Review. Journal of Psychoeducational Assessment, 23(3), 225–241.
Doll, W. J., Xia, W., & Torkzadeh, G. (1994). A confirmatory factor analysis of the end-user computing satisfaction instrument. MIS Quarterly, 18(4), 453.
Elangovan, N., & Sundaravel, E. (2021). Method of preparing a document for survey instrument validation by experts. MethodsX, 8, 101326.
Elkington, S., & Chesterton, P. (2025). Embedding assessment flexibilities for future authentic learning. Teaching in Higher Education, 30(3), 700–716.
Elosua, P. (2022). Validity evidences for scoring procedures of a writing assessment task. A case study on consistency, reliability, unidimensionality and prediction accuracy. Assessing Writing, 54, 100669.
Embretson, S. E., & Reise, S. P. (2013). Item Response Theory. Psychology Press.
Fleiss, J. L., Levin, B., & Paik, M. C. (2003). Statistical Methods for Rates and Proportions (1st ed.). Wiley.
Forthmann, B., Paek, S. H., Dumas, D., Barbot, B., & Holling, H. (2020). Scrutinizing the basis of originality in divergent thinking tests: On the measurement precision of response propensity estimates. British Journal of Educational Psychology, 90(3), 683–699.
Garcimartín, C. F., Pastor, V. M. L., Nieto, T. F., & Alcalá, D. H. (2024). Creating Assessment Rubrics for Final Teacher Education Degree Projects: A Qualitative Case Study. The Qualitative Report.
Gerbing, D. W., & Anderson, J. C. (1992). Monte Carlo Evaluations of Goodness of Fit Indices for Structural Equation Models. Sociological Methods & Research, 21(2), 132–160.
Ghadi, I., Alwi, N. H., Abu Bakar, K., & Talib, O. (2012). Construct validity examination of critical thinking dispositions for undergraduate students in University Putra Malaysia. Higher Education Studies, 2(2), 138.
Gorsuch, R. L. (2013). Factor Analysis. Psychology Press.
Goudarzian, A. H. (2023). Challenges and recommendations of exploratory and confirmatory factor analysis: A narrative review from a nursing perspective. Journal of Nursing Reports in Clinical Practice, 1(3), 133–137.
Guilford, J. P. (1959). Three faces of intellect. American Psychologist, 14(8), 469–479.
Gunawan, Ferdianto, F., Mulyatna, F., & Untarti, R. (2025). The profile of creative thinking process: Prospective mathematics teachers. Jurnal Eduscience (JES), 2(12). https://jurnal.ulb.ac.id/index.php/eduscience/article/view/6915
Hadi, S. (2001). Metodologi Research Jilid III. Andi. https://onesearch.id/Record/IOS2726.slims-67126?widget=1
Hair, J. F., Ringle, C. M., & Sarstedt, M. (2013). Partial Least Squares Structural Equation Modeling: Rigorous Applications, Better Results and Higher Acceptance. Long Range Planning, 46(1–2), 1–12.
Hammitt, J. K., & Zhang, Y. (2013). Combining experts' judgments: Comparison of algorithmic methods using synthetic data. Risk Analysis, 33(1), 109–120.
Han, C., Zheng, B., Xie, M., & Chen, S. (2024). Raters' scoring process in assessment of interpreting: an empirical study based on eye tracking and retrospective verbalisation. The Interpreter and Translator Trainer, 18(3), 400–422.
Hattie, J. (1985). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9(2), 139–164.
Hayati, K., Ulfa Tenri Pada, A., & Mawarpury, M. (2023). Content validity of collective efficacy questionnaire for natural disasters based on Aceh Local wisdom. E3S Web Conf., 447, 4004.
Hettithanthri, U., Hansen, P., & Munasinghe, H. (2023). Exploring the architectural design process assisted in conventional design studio: a systematic literature review. International Journal of Technology and Design Education, 33(5), 1835–1859.
Hu, L., & Bentler, P. M. (1998). Fit indices in covariance structure modeling: Sensitivity to underparameterized model misspecification. Psychological Methods, 3(4), 424–453.
Imbler, A. C., Clark, S. K., Young, T. A., & Feinauer, E. (2023). Teaching second-grade students to write science expository text: Does a holistic or analytic rubric provide more meaningful results? Assessing Writing, 55, 100676.
Isbell, T., & Goomas, D. T. (2014). Computer-assisted rubric evaluation: Enhancing outcomes and assessment quality. Community College Journal of Research and Practice, 38(12), 1193–1197.
Jönsson, A., & Panadero, E. (2017). The Use and Design of Rubrics to Support Assessment for Learning. In D. Carless, S. M. Bridges, C. K. Y. Chan, & R. Glofcheski (Eds.), Scaling up Assessment for Learning in Higher Education (Vol. 5, pp. 99–111). Springer Singapore.
Karunarathne, W., & Calma, A. (2024). Assessing creative thinking skills in higher education: deficits and improvements. Studies in Higher Education, 49(1), 157–177.
Kern, F. B., Wu, C., & Chao, Z. C. (2024). Assessing novelty, feasibility and value of creative ideas with an unsupervised approach using GPT‐4. British Journal of Psychology.
Kind, P. M., & Kind, V. (2007). Creativity in science education: Perspectives and challenges for developing school science. Studies in Science Education, 43(1), 1–37.
Koswara, D., Dallyono, R., Suherman, A., & Hyangsewu, P. (2021). The analytical scoring assessment usage to examine Sundanese students' performance in writing descriptive texts. Cakrawala Pendidikan, 40(3), 573–583.
Lange, R. T. (2011). Inter-rater Reliability. In J. S. Kreutzer, J. DeLuca, & B. Caplan (Eds.), Encyclopedia of Clinical Neuropsychology (p. 1348). Springer New York.
Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28(4), 563–575.
Lee, J. E., Recker, M., & Yuan, M. (2020). The Validity and Instructional Value of a Rubric for Evaluating Online Course Quality: An Empirical Study. Online Learning, 24(1).
Legendre, P. (2005). Species associations: the Kendall coefficient of concordance revisited. JABES, 10(2), 226–245.
Lertsakulbunlue, S., & Kantiwong, A. (2024). Development of peer assessment rubrics in simulation-based learning for advanced cardiac life support skills among medical students. Advances in Simulation, 9(1), 25.
Marzano, G. (2022). Sustaining Creativity and the Arts in the Digital Age. IGI Global.
Mezo, P. G., & Short, M. M. (2012). Construct validity and confirmatory factor analysis of the Self-Control and Self-Management Scale. Canadian Journal of Behavioural Science / Revue Canadienne Des Sciences Du Comportement, 44(1), 1–8.
Mooi, E., & Sarstedt, M. (2011). A Concise Guide to Market Research. Springer Berlin Heidelberg.
Morris, G., & Sharplin, E. (2013). The assessment of creative writing in Senior Secondary English: A colloquy concerning criteria. English in Education, 47(1), 49–65.
Mrangu, L. (2022). Rubric as assessment tool for lecturers and students in higher education institution. Acta Pedagogia Asia, 1(1), 26–33.
Mui So, W. W., & Hoi Lee, T. T. (2011). Influence of teachers' perceptions of teaching and learning on the implementation of Assessment for Learning in inquiry study. Assessment in Education: Principles, Policy & Practice, 18(4), 417–432.
Natalya, L., & Purwanto, C. V. (2018). Exploratory and Confirmatory Factor Analysis of the Academic Motivation Scale (AMS)–Bahasa Indonesia. Makara Human Behavior Studies in Asia, 22(1), 29.
Neupane, S. M., & Bhattarai, P. C. (2024). Constructing the scale to measure entrepreneurial traits by using the modified delphi method. Heliyon, 10(7), e28410.
Nkhoma, C., Nkhoma, M., Thomas, S., & Quoc Le, N. (2020). The Role of Rubrics in Learning and Implementation of Authentic Assessment: A Literature Review. 237–276.
Nsabayezu, E., Iyamuremye, A., Mukiza, J., Habimana, J. C., Mbonyiryivuze, A., Gakub, E., Nsengimana, T., & Niyonzima, F. N. (2022). Teachers' and students' perceptions towards the utilization of formative assessment rubric for supporting students' learning of organic chemistry. Journal of Educational Sciences, 45(1), 124–134.
OECD. (2024). PISA 2022 Results (Volume III): Factsheets – Indonesia. https://www.oecd.org/content/dam/oecd/en/publications/reports/2024/06/pisa-2022-results-volume-iii-country-notes_72b418f8/indonesia_cf276198/a7090b49-en.pdf
Olson, J. M., & Krysiak, R. (2021). Rubrics as Tools for Effective Assessment of Student Learning and Program Quality (pp. 173–200).
Pada, A. U. T., Kartowagiran, B., & Subali, B. (2015). Content validity of creative thinking skills assessment. Proceeding of International Conference On Research, Implementation And Education Of Mathematics And Sciences. https://core.ac.uk/download/pdf/33519344.pdf
Pada, A. U. T., Kartowagiran, B., & Subali, B. (2016). Separation index and fit items of creative thinking skills assessment. REiD, 2(1), 1–12.
Pada, A. U. T., Mustakim, S. S., & Subali, B. (2018). Construct validity of creative thinking skills instrument for biology student teachers in the subject of human physiology. Jurnal Penelitian Dan Evaluasi Pendidikan, 22(2), 119–129.
Panadero, E., Delgado, P., Zamorano, D., Pinedo, L., Fernández-Ortube, A., & Barrenetxea-Mínguez, L. (2025). Putting excellence first: How rubric performance level order and feedback type influence students' reading patterns and task performance. Learning and Instruction, 99, 102168.
Pancorbo, G., Primi, R., John, O. P., Santos, D., Abrahams, L., & De Fruyt, F. (2020). Development and psychometric properties of rubrics for assessing social-emotional skills in youth. Studies in Educational Evaluation, 67, 100938.
Ramazanzadeh, N., Ghahramanian, A., Zamanzadeh, V., Valizadeh, L., & Ghaffarifar, S. (2023). Development and psychometric testing of a clinical reasoning rubric based on the nursing process. BMC Med Educ, 23(1), 98.
Reckase, M. D. (1979). Unifactor Latent Trait Models Applied to Multifactor Tests: Results and Implications. Journal of Educational Statistics, 4(3), 207–230.
Reddy, Y. M., & Andrade, H. (2010). A review of rubric use in higher education. Assessment & Evaluation in Higher Education, 35(4), 435–448.
Rosnawati, R., Kartowagiran, B., & Jailani, J. (2015). A formative assessment model of critical thinking in mathematics learning in junior high school. REiD, 1(2), 186–198.
Runco, M. A. (1985). Reliability and convergent validity of ideational flexibility as a function of academic achievement. Perceptual and Motor Skills, 61(3_suppl), 1075–1081.
Runco, M. A., & Alabbasi, A. M. A. (2024). Interactions among dimensions of divergent thinking as predictors of creative activity and accomplishment. Thinking Skills and Creativity, 53, 101583.
Said-Metwaly, S., Noortgate, W. Van den, & Kyndt, E. (2017). Approaches to Measuring Creativity: A Systematic Literature Review. Creativity. Theories – Research – Applications, 4(2), 238–275.
Scanlon, D., MacPhail, A., Walsh, C., & Tannehill, D. (2023). Embedding assessment in learning experiences: enacting the principles of instructional alignment in physical education teacher education. Curriculum Studies in Health and Physical Education, 14(1), 3–20.
Schilling, L. S., Dixon, J. K., Knafl, K. A., Grey, M., Ives, B., & Lynn, M. R. (2007). Determining content validity of a self-report instrument for adolescents using a heterogeneous expert panel. Nursing Research, 56(5), 361–366.
Shafiei, S. (2024). A proposed analytic rubric for consecutive interpreting assessment: implications for similar contexts. Language Testing in Asia, 14(1), 13.
Shook, C. L., Ketchen, D. J., Hult, G. T. M., & Kacmar, K. M. (2004). An assessment of the use of structural equation modeling in strategic management research. Strategic Management Journal, 25(4), 397–404.
Sireci, S., & Faulkner-Bond, M. (2014). Validity evidence based on test content. Psicothema, 1(26), 100–107.
Sireci, S. G. (1995). The central role of content representation in test validity. The Construct of Content Validity: Theories and Applications. https://files.eric.ed.gov/fulltext/ED387508.pdf
Stevens, D. D., & Levi, A. J. (2005). Introduction to Rubrics: An Assessment Tool to Save Grading Time, Convey Effective Feedback and Promote Student Learning. Stylus Publishing, LLC. https://eric.ed.gov/?id=ED515062
Subali, B., & Suyata, P. (2013). Standardisasi penilaian berbasis sekolah. Jurnal Penelitian Dan Evaluasi Pendidikan, 17(1), 1–18.
Sudaryanto, M., & Akbariski, H. S. (2021). Students' competence in making language skill assessment rubric. REiD, 7(2), 156–167.
Sumekto, D. R., & Setyawati, H. (2018). Students' Descriptive Writing Performance: The Analytic Scoring Assessment Usage. Cakrawala Pendidikan.
Sureeyatanapas, P., Sureeyatanapas, P., Panitanarak, U., Kraisriwattana, J., Sarootyanapat, P., & O'Connell, D. (2024). The analysis of marking reliability through the approach of gauge repeatability and reproducibility (GR&R) study: a case of English-speaking test. Language Testing in Asia, 14(1), 1.
Thakral, P. P., Yang, A. C., Addis, D. R., & Schacter, D. L. (2021). Divergent thinking and constructing future events: dissociating old from new ideas. Memory, 29(6), 729–743.
van Dalen, D. B. (1973). Understanding Educational Research: An Introduction. McGraw-Hill.
Wang, C., Zhang, M., Sesunan, A., & Yolanda, L. (2023). Driving education reform in Indonesia through technology: Exploring the current status of the Merdeka Belajar program. Oliver Wyman. https://www.oliverwyman.com/our-expertise/insights/2023/dec/technology-driven-education-reform-indonesia.html
Wang, Y., & Hou, Q. (2018). Insight or Originality: A Spray in the River of Creative Thinking. OALib, 05(09), 1–6.
Weiss, S., & Wilhelm, O. (2022). Is Flexibility More than Fluency and Originality? Journal of Intelligence, 10(4), 96.
Wiersma, W. (2000). Research Methods in Education: An Introduction (7th ed.). Allyn and Bacon. https://books.google.co.id/books/about/Research_Methods_in_Education.html?id=MAUmAQAAIAAJ&redir_esc=y
Williams, R. L. (1999). Operational definitions and assessment of higher-order cognitive constructs. Educational Psychology Review, 11(4), 411–427.
Yildiz, C., & Yildiz, T. G. (2021). Exploring the relationship between creative thinking and scientific process skills of preschool children. Thinking Skills and Creativity, 39, 100795.

