Response Letter to `Journal of Biomedical Informatics` Submission
Paper ID: JBI-25-1401
Paper Title: An Optimized Code-Free AI Approach for Efficient and Accurate Literature Screening in Bone Organoid Research
We thank you for your valuable comments.
We have submitted three files: the revised manuscript, a tracked-changes file (highlighting the differences between the revised and the original manuscript), and this file, i.e., the response letter (a complete response to the editor and reviewers). In the following, we respond to each of your concerns and recommendations.
(EC: Editor's Comment, RC: Reviewer's Comment, AC: Author's Response)
Response to the Editor
EC0: xxxx (the first point is the editor's overall assessment of the manuscript)
AC0: We thank the editor for the kind comments and have responded in a point-by-point manner.
EC1: xxxx (a point the editor did not understand and asks the authors to clarify)
AC1: xxxxx (when answering, it is best to quote the specific lines of the manuscript as support)
Reviewer #1
RC1.0: About knowledge discovery in biomedical literature
Many researchers and clinicians are not familiar with AI approaches for literature review and screening. This paper proposes an AI-based approach that allows users to screen literature and discover knowledge with 90% less work and with very high accuracy, F1, precision, and so on. The work is evaluated on a specific bone subject, bone organoids. The proposed system is a user-friendly, AI-based, code-free system. It provides an improvement and enhancement to LitSuggest, an already published system. The paper assumes the reader is familiar with the LitSuggest system. The results are good and it is good work.
AC1.0: We thank the reviewer for the kind comments.
RC1.1: The paper is unnecessarily long and bulky. In my opinion, I would like to see it single-spaced, with a total of 7 to 10 pages of relevant information. For example, Sections 2.1 and 2.2 start talking about collecting articles from PubMed; why? What do we want to do? The Introduction starts by introducing the problem, and then Sections 2.1 and 2.2 immediately start collecting papers; why?
The paper should give a clear algorithm and sequence of steps, in simple language, describing the approach step by step.
AC1.1: (Supplementing information) Thank you for your suggestion; accordingly, we have added the suggested information... The added information is as follows (see details in lines xx-xx of the revised manuscript):
Add: "xxxxxx..."
RC1.2: The graphical abstract (on page 4) is not very helpful; I do not see any value in it.
RC1.3: Table 1 is mentioned and referenced on page 7, but the table itself is on page 18; this is inconvenient.
RC1.4: The paper is too long because of the double spacing and so much wasted space, which makes it hard to read and navigate.
Reviewer #2
RC2.0: Dear authors, thank you for your study. Please revisit the following suggestions.
RC2.1: TECHNICAL CORRECTNESS
Overall Soundness: The methodology is rigorous, metrics are appropriate, and claims are supported by data.
Minor Gaps: Class imbalance, threshold justification, and model transparency could be clarified.
Suggested Improvements:
Add AUPRC/balanced accuracy for imbalanced data (see the sketch after this list).
Disclose the base model architecture of LitSuggest.
Discuss threshold selection criteria (e.g., the cost of false negatives).
Include a sample dataset in the supplementary materials.
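To make the first item concrete, here is a minimal, hypothetical sketch of how AUPRC and balanced accuracy could be computed, assuming scikit-learn and purely synthetic labels and scores (none of the variable names or numbers below come from the manuscript):

```python
# Hypothetical sketch only: synthetic labels/scores, not the study's data.
import numpy as np
from sklearn.metrics import average_precision_score, balanced_accuracy_score

rng = np.random.default_rng(0)
y_true = rng.choice([0, 1], size=1000, p=[0.975, 0.025])  # ~2.5% relevant, mimicking heavy imbalance
scores = np.clip(0.8 * y_true + rng.normal(0.1, 0.15, size=1000), 0.0, 1.0)  # illustrative model scores

auprc = average_precision_score(y_true, scores)    # threshold-free, imbalance-aware summary
y_pred = (scores >= 0.5).astype(int)               # one example decision threshold
bal_acc = balanced_accuracy_score(y_true, y_pred)  # mean of per-class recalls

print(f"AUPRC = {auprc:.3f}, balanced accuracy = {bal_acc:.3f}")
```

Unlike plain accuracy, both metrics penalize a classifier that simply rejects every article, which is exactly the failure mode a heavily imbalanced screening corpus invites.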
RC2.2: NOVELTY/ORIGINALITY
Limitations in originality are listed as follows:
1. LitSuggest Is Not New
The platform itself (Allot et al., 2021) was pre-existing; the novelty lies in workflow optimization and domain-specific validation.
2. No Comparison to Other AI Tools
Lacks benchmarking against alternatives (e.g., ChatGPT, ASReview).
Without this, claims of superiority are suggestive but not proven.
Strong methodological innovations, but built on an existing platform. First in bone organoids, but some aspects (e.g., trend analysis) have precedents.
Areas for Improvement to Strengthen Novelty
Compare with Other Tools (e.g., "How much better is this than ChatGPT screening?").
Expand Beyond PubMed (e.g., Scopus, preprint servers) to show broader applicability.
User Study (e.g., "Do biologists find this easier to use than traditional tools?").
Recommendations for the Authors:
Emphasize boundary refinement and score-based bibliometrics as key advances.
Address limitations by comparing with other tools or testing on additional datasets.
RC2.3: REFERENCE TO PRIOR WORK
The manuscript builds solidly on prior work (e.g., LitSuggest, active learning) but could better articulate its advancements by:
Explicitly comparing to competing tools (ASReview, DistillerSR).
Citing recent ML/bibliometric advances (semi-supervised learning, LDA).
Adding a comparative table/analysis to highlight novelty.
Impact if Revised: Would elevate the paper from "incremental improvement" to "clear advancement" in AI-driven literature screening.
RC2.4: QUALITY OF ART
Weaknesses (Areas for Improvement)
I. Technical Depth & Transparency
Black-Box Model: LitSuggest's underlying algorithm (SVM? BERT?) is not detailed.
Class Imbalance: High accuracy (98.83%) may be inflated; AUPRC/balanced accuracy should be reported.
PubMed Limitation: Excludes preprints, Scopus, and non-English studies.
II. Missing Benchmarking
No Comparison to Alternatives:
How does it perform vs. ASReview, DistillerSR, or ChatGPT?
Is boundary refinement better than uncertainty sampling?
Limited User Study: Do biologists actually find it easier to use?
III. Discussion of Limitations
Abstract-Only Screening: May miss full-text nuances.
Manual Label Dependency: Initial human input is required.
Generalizability: Tested on bone organoids; how well does it transfer to cancer or neuroscience?
IV. Suggested Revisions
To Improve Technical Soundness
Disclose LitSuggest's Base Model (e.g., "Uses a fine-tuned BERT architecture").
Add AUPRC/Confusion Matrix for imbalanced data.
Test on Additional Databases (e.g., Scopus, bioRxiv).
To Strengthen Novelty Claims
Compare with ASReview/ChatGPT in a table.
Cite Semi-Supervised Learning papers to justify small training sets.
Clarify Boundary Refinement's Innovation (vs. standard active learning).
To Enhance Readability
Glossary for ML Terms (e.g., "F1-score") for non-CS readers.
Flowchart of LitSuggest's UI (since "code-free" is a key selling point).
RC2.5: QUALITY OF EXPERIMENTAL RESULTS
Limitations & Areas for Improvement
1. Statistical Concerns
A. Class Imbalance Issues
84% of articles scored 0-0.1 (irrelevant), while only 2.5% scored 0.9-1 (relevant).
Accuracy (98.83%) is misleadingly high: with so few relevant articles, even a classifier that labels everything irrelevant would achieve accuracy in the high nineties. Report balanced accuracy, AUPRC, or MCC (Matthews Correlation Coefficient) instead.
B. No Confidence Intervals
Precision/recall values are point estimates; bootstrapping or cross-validation would show their variability (see the sketch below).
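A minimal sketch of the bootstrap idea, assuming scikit-learn and synthetic stand-ins for the study's labels and predictions (the helper name bootstrap_ci is illustrative, not from the manuscript):

```python
# Hypothetical sketch only: percentile-bootstrap CIs for precision/recall.
import numpy as np
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(1)
y_true = rng.choice([0, 1], size=1000, p=[0.975, 0.025])       # synthetic imbalanced labels
y_pred = np.where(rng.random(1000) < 0.9, y_true, 1 - y_true)  # synthetic predictions, 10% corrupted

def bootstrap_ci(metric, y_true, y_pred, n_boot=2000, alpha=0.05):
    """Percentile bootstrap confidence interval for a classification metric."""
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), size=len(y_true))   # resample with replacement
        stats.append(metric(y_true[idx], y_pred[idx], zero_division=0))
    return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])

print("precision 95% CI:", bootstrap_ci(precision_score, y_true, y_pred))
print("recall 95% CI:", bootstrap_ci(recall_score, y_true, y_pred))
```

Reporting such intervals alongside the point estimates would show how much the metrics depend on the particular sample of screened articles.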
2. Data & Model Transparency
A. Small Training Set (n=80)
Only 40 positive/40 negative labeled articles were used initially.
No discussion of sample size adequacy (e.g., power analysis).
B. Black-Box Model Details
LitSuggest's base algorithm (SVM? BERT?) is not specified.
Hyperparameters (learning rate, batch size) are omitted.
3. External Validity
A. PubMed-Only Data
Excludes preprints (bioRxiv), Scopus, or Embase, risking selection bias.
No multilingual studies; non-English literature could be missed.
B. No Human-in-the-Loop Evaluation
Does real-world usability improve for biologists? A user study (e.g., time saved vs. manual screening) would strengthen impact.
Recommended Improvements
1. Address Statistical Weaknesses
Add AUPRC/balanced accuracy for imbalanced data.
Report 95% CIs for precision/recall via bootstrapping.
Perform cross-validation (e.g., 5-fold) to assess stability (see the sketch after this list).
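For the cross-validation item, here is a minimal sketch under stated assumptions: scikit-learn, a TF-IDF plus logistic-regression classifier standing in for LitSuggest's undisclosed model, and placeholder abstracts and labels (nothing below is taken from the study):

```python
# Hypothetical sketch only: stratified 5-fold CV with a stand-in classifier.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

abstracts = ["bone organoid scaffold culture ...", "unrelated cardiology trial ..."] * 50  # placeholder texts
labels = np.array([1, 0] * 50)                                                            # placeholder labels

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # preserves the class ratio in every fold
f1_per_fold = cross_val_score(model, abstracts, labels, cv=cv, scoring="f1")

print("F1 per fold:", np.round(f1_per_fold, 3), "| mean:", round(float(f1_per_fold.mean()), 3))
```

The per-fold spread, reported next to the mean, directly answers the stability question raised above.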
2. Enhance Reproducibility
Release labeled dataset (even a subset) in supplementary materials.
Disclose LitSuggest's architecture (e.g., "BERT-based classifier").
3. Strengthen External Validity
Test on additional databases (Scopus, Web of Science).
Include a usability study with biologists (e.g., "How intuitive is LitSuggest?").
4. Clarify Limitations
Explicitly discuss:
Abstract-only screening risks (vs. full-text).
PubMed bias (missing gray literature/non-English papers).
RC2.6: APPROPRIATENESS TO JOURNAL
It fits JBI.
RC2.7: IMPORTANCE TO THE FIELD
RC 2。7. 对外地的重要性
This manuscript presents a robust, user-friendly AI methodology for literature screening, with demonstrated novelty in boundary refinement and score-based analysis. Its generalizability is supported by validation across related domains, though broader database integration could enhance impact. The focus on methodological innovation rather than a specific clinical problem aligns well with the journal's scope, making it a strong candidate for publication after minor clarifications.
本文提出了一种用于文献筛选的强大的,用户友好的 AI 方法,在边界细化和基于评分的分析方面具有新奇。它的普遍性得到了相关领域验证的支持,尽管更广泛的数据库集成可能会增强影响。对方法创新的关注,而不是一个具体的临床问题,与该杂志的范围很好地吻合,使其成为一个强有力的候选人后,轻微的澄清出版。
RC2.8: ORGANIZATION AND CLARITY
RC 2。8: 组织和清晰度
The manuscript is well-structured and logically organized, but some sections could be streamlined or clarified to improve readability and impact. Below is a detailed analysis:
该手稿结构良好,组织合理,但有些部分可以简化或澄清,以提高可读性和影响力。下面是详细的分析:
1. Overly Long Introduction
Problem statement could be condensed (currently ~4 paragraphs).
LitSuggest background could be moved to Methods or Supplementary Materials.
2. Methods Section: Too Detailed for Casual Readers
Highly technical subsections (e.g., "Boundary Refinement") may lose non-ML audiences.
Recommendation: Add a simplified workflow summary early in Methods.
3. Results: Some Redundancy
Performance metrics are repeated in text, tables, and figures.
Recommendation: Use bullet points or subheaders to highlight key findings.
4. Discussion: Could Better Link to Prior Work
Comparison to other tools (ASReview, ChatGPT) is missing.
Recommendation: Add a dedicated paragraph with benchmarking results.
5. Minor Language Issues
Occasional passive voice (e.g., "was performed" → "we performed").
Long sentences (e.g., >30 words) could be split.
Suggested Revisions:
1. Streamline the Introduction
Current: 4+ paragraphs on literature screening challenges.
Revised: Focus on 3 key points:
Problem: Information overload in bone organoid research.
Gap: Lack of accessible, code-free AI tools.
Solution: This study's optimized LitSuggest workflow.
2. Simplify the Methods Section
Add a workflow summary box (e.g., "Step 1: Data collection → Step 2: Model training").
Move technical details (e.g., hyperparameters) to Supplementary Materials.
3. Improve Results Readability
Use subheaders (e.g., "3.1 Model Performance", "3.2 Validation Results").
Replace repetitive text with a performance summary table.
4. Strengthen the Discussion
Add a benchmarking table comparing this work to ASReview, DistillerSR, etc.
Explicitly state limitations (e.g., PubMed-only data) in a dedicated subsection.
5. Proofread for Conciseness
Example revision:
Before: "The model was trained using 40 positive and 40 negative examples, which were manually labeled by two independent reviewers."
After: "Two reviewers labeled 80 articles (40 relevant, 40 irrelevant) for training."
RC2.9: LENGTH
Section | Word Count (Est.) | Evaluation
Abstract | ~250 | Concise and clear.
Introduction | ~1,200 | Too long (could be 800).
Methods | ~2,500 | Overly detailed (move some to Supplements).
Results | ~3,000 | Redundant metrics (can summarize in tables).
Discussion | ~2,000 | Well-balanced.
Total | ~8,950 | Could be trimmed to ~7,500 without losing substance.
Where to Cut Without Losing Value
1. Introduction (Reduce by ~30%)
Current: Spends ~4 paragraphs on general literature screening challenges.
Revised Focus:
1 paragraph on the problem (information overload in bone organoids).
1 paragraph on gaps (lack of user-friendly AI tools).
1 paragraph on this study's solution.
2. Methods (Move Technical Details to Supplements)
Trim:
Hyperparameter tuning → Supplementary Table.
LitSuggest architecture details → Supplement (unless novel).
Keep: Workflow, optimization strategies, validation steps.
3. Results (Summarize Repetitive Metrics)
Instead of: Repeating precision/recall in text + tables + figures...
Use:
Bullet points for key findings.
One consolidated performance table (training + validation).
4. Discussion (Keep Intact but Add Benchmarking)
Already well-structured, but:
Add a short table comparing to ASReview/DistillerSR/ChatGPT.