URL
Stage
Normal Science
Paradigm framing
Artificial intelligence; Natural language processing; Large language models; Institutional review board review; Research ethics
Highlights
This preprint presents a pilot study exploring the use of ChatGPT for pre-institutional review board (IRB) review of clinical research documents. It evaluates the accuracy and reproducibility of ChatGPT in extracting information from Japanese-language research protocols and informed consent forms, comparing GPT-4 with GPT-4o and customized with standard prompts. The results show promising baseline accuracy and reproducibility, suggesting that large language models (LLMs) could be useful in this context.

This work falls under normal science: it operates within the existing paradigm of AI and NLP, applying existing tools to a specific problem (pre-IRB review) without challenging fundamental assumptions or proposing radically new theories, and it contributes to the ongoing refinement and application of established techniques within the field. Although the authors mention future integration of advanced methods such as retrieval-augmented generation (RAG) and fine-tuning, the current study does not employ these techniques and instead assesses the baseline capabilities of existing LLMs. This focus on incremental improvement within the current paradigm reinforces the classification as normal science.
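The evaluation design described above (two models, two prompt styles, repeated extraction runs scored for accuracy and reproducibility) can be illustrated with a minimal sketch. The code below is not the authors' implementation; the model names, prompts, agreement metric, and input file name are assumptions used only to show the shape of such a comparison with the OpenAI Python SDK.

```python
# Minimal sketch (assumed, not the authors' code): compare two models and two
# prompt styles for extracting items from a Japanese research document, with
# repeated runs to gauge reproducibility.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

STANDARD_PROMPT = "Extract the study title, sponsor, and target sample size."
CUSTOM_PROMPT = (
    "You are assisting a pre-IRB review. From the Japanese document below, "
    "extract the study title, sponsor, and target sample size. "
    "Answer in English as 'field: value' lines; write 'not stated' if absent."
)

def extract(model: str, prompt: str, document: str) -> str:
    """Run one extraction pass and return the raw model answer."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # reduce run-to-run variation
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": document},
        ],
    )
    return response.choices[0].message.content.strip()

def reproducibility(model: str, prompt: str, document: str, runs: int = 5) -> float:
    """Fraction of repeated runs that agree with the most common answer."""
    answers = [extract(model, prompt, document) for _ in range(runs)]
    _, most_common_count = Counter(answers).most_common(1)[0]
    return most_common_count / runs

if __name__ == "__main__":
    # Hypothetical input file standing in for a protocol or consent form.
    with open("consent_form_ja.txt", encoding="utf-8") as f:
        document = f.read()
    for model in ("gpt-4", "gpt-4o"):
        for name, prompt in (("standard", STANDARD_PROMPT), ("custom", CUSTOM_PROMPT)):
            score = reproducibility(model, prompt, document)
            print(f"{model} / {name} prompt: reproducibility {score:.2f}")
```

In a study like the one summarized here, accuracy would additionally be scored against human-annotated reference answers; the snippet only measures run-to-run agreement.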