ACL’23 Peer Review Form

Please read the detailed explanation of the form before entering your review.

1. In-Depth Review

The answers to the following questions are mandatory, and will be shared with both the committee and the authors.

What is this paper about and what contributions does it make?

Please describe what problem or question this paper addresses, and the main contributions that it makes towards a solution or answer.

The following kinds of contributions are all welcome at ACL: computationally-aided linguistic analysis, NLP engineering experiments, reproduction studies, new data resources (particularly for low-resource languages), approaches for data and compute efficiency, position papers, surveys, and publicly available software and pre-trained models.

Reasons to accept

What are the strengths of this paper and what would be the main benefits to the NLP community if this paper were to be presented at the conference or accepted into Findings?

Reasons to reject

What are the weaknesses of this paper and what would be the main risks of having this paper presented at the conference (other than lack of space to present better papers) or accepted into Findings?

Please be sure to follow the conference policies on what should not be considered weaknesses. The authors and meta-reviewers will be aware of these guidelines.


2. Questions and Additional Feedback for the Author(s)

The following review elements are optional. They will be shared with both the committee and the authors, but are primarily for the authors.

Questions for the Author(s)

Please write any questions you have for the author(s) that you would like answered in the author response, particularly those that are relevant to your overall recommendation. Please letter these questions (we prefer letters over numbers since reviewers will have numbers), so that the authors and other reviewers can easily refer to them in the discussion (e.g., Question 2C is Question C from Reviewer 2).

Some common questions are covered in the Responsible NLP Checklist; please check the authors’ answers there first.

Question A:

Question B:

Question C:

Missing References

Please list any references that should be included in the bibliography or need to be discussed in more depth. If you believe that this work is not novel or misses previous work or baselines, please give the full citation(s) below. Remember that contemporaneous work (published in the three months before the deadline) is not required to be cited and should not be held against the authors.

Typos, Grammar, Style, and Presentation Improvements

Please list any typographical or grammatical errors, as well as any stylistic issues that should be improved. In addition, if there is anything in the paper that you found difficult to follow, please suggest how it could be better organized, motivated, or explained. Be sure to include line numbers for easy reference.


3. Reproducibility, Ethics Review, Anonymity Requirement and Overall Recommendation

The answers to the following questions, except for the overall recommendation, will be shared with the committee only, not the authors.

Overall Recommendation

Soundness

Should this paper be accepted to any of the venues associated with ACL 2023? For this question, do not consider novelty/excitement (we will ask you about this more subjective issue separately).

Rather, please focus on the more objective question of how sound and thorough this study is. Does the paper clearly state scientific claims and provide adequate support for them? Please consider:

  • For experimental papers: the depth and/or breadth of the research questions investigated, the technical soundness of the experiments, and the methodological validity of the evaluation.
  • For position papers and surveys: whether the current state of the field is adequately represented and the main counter-arguments acknowledged.
  • For resource papers: whether the data collection methodology, the resulting data, and the differences from existing resources are described in sufficient detail.

Please adjust your baseline to an average *ACL paper of the given type. A long conference paper necessarily cannot be as thorough as a 30-page journal paper, and a short paper cannot be as thorough as a long paper.

Please adhere to the following score definitions. If you believe that the paper is not relevant to ACL, and hence the judgement of soundness or excitement is not relevant, please select “1” in both questions in this rubric.

  • 5 = Excellent: This study is one of the most thorough I have seen, given its type.
  • 4 = Strong: This study provides sufficient support for all of its claims/arguments. Some extra experiments could be nice, but not essential.
  • 3 = Good: This study provides sufficient support for its major claims/arguments, some minor points may need extra support or details.
  • 2 = Borderline: Some of the main claims/arguments are not sufficiently supported, there are major technical/methodological problems.
  • 1 = Poor: This study is not yet sufficiently thorough to warrant publication or is not relevant to ACL.

Excitement (Long paper)

How excited/enthusiastic are you for this paper to be accepted to ACL 2023? Excitement is a more subjective category than soundness, and it could come from one or more sources, including:

  • your perception of the novelty of this paper: you would like to see it accepted because its contributions change its subfield. This includes not only conceptual breakthroughs, but experimental evidence for common intuitions or assumptions, corrections or negative results for a widely held belief/practice, evidence of previously unknown issues with specific datasets or metrics that change how they should be used, etc.
  • your feeling that the paper is interesting to you personally: you’ve learned a lot from it, you see something that would be useful practically, help to establish cross-disciplinary connections etc. You are likely to grab your closest colleagues and say “you’ve got to read this paper”.
  • your perception of the paper’s potential impact: you believe that it could be very influential in e.g. lowering the barriers to performing certain work, reducing computation or annotation costs, or providing a resource/artifact enabling applications that were impossible before. A contribution can be impactful for a broad or narrow community: it does not have to target a popular task/architecture or be backed by an industry PR campaign.

Please adhere to the score definitions below when scoring papers.

  • 5 = Transformative: This paper is likely to change its subfield or computational linguistics broadly. It should be considered for a best paper award. This paper changes the current understanding of some phenomenon, shows a widely held practice to be erroneous in some way, enables a promising direction of research for a (broad or narrow) topic, or creates an exciting new technique.
  • 4.5 = Exciting: It changed my thinking on this topic. I would fight for it to be accepted.
  • 4 = Strong: This paper deepens the understanding of some phenomenon or lowers the barriers to an existing research direction.
  • 3.5 = Leaning positive: While worthy of acceptance, the work it describes is not particularly interesting and/or novel, so it will not be a big loss if people don’t see it in this conference.
  • 3 = Ambivalent: It has merits (e.g., it reports state-of-the-art results, the idea is nice), but there are key weaknesses (e.g., it describes incremental work), and it can significantly benefit from another round of revision. However, I won’t object to accepting it if my co-reviewers champion it.
  • 2.5 = Leaning negative: I am leaning towards rejection, but I can be persuaded if my co-reviewers think otherwise.
  • 2 = Mediocre: This paper makes marginal contributions (vs non-contemporaneous work), so I would rather not see it in the conference.
  • 1.5 = Weak: I am pretty confident that it should be rejected.
  • 1 = Poor: I cannot identify the contributions of this paper, or I believe the claims are not sufficiently backed up by evidence. I would fight to have it rejected.

Excitement (Short Paper)

How excited/enthusiastic are you for this paper to be accepted to ACL 2023? Excitement is a more subjective category than soundness, and it could come from one or more sources, including:

  • your perception of the novelty of this paper: you would like to see it accepted because its contributions change its subfield. This includes not only conceptual breakthroughs, but experimental evidence for common intuitions or assumptions, corrections or negative results for a widely held belief/practice, evidence of previously unknown issues with specific datasets or metrics that change how they should be used, etc.
  • your feeling that the paper is interesting to you personally: you’ve learned a lot from it, you see something that would be useful practically, help to establish cross-disciplinary connections etc. You are likely to grab your closest colleagues and say “you’ve got to read this paper”.
  • your perception of the paper’s potential impact: you believe that it could be very influential in e.g. lowering the barriers to performing certain work, reducing computation or annotation costs, or providing a resource/artifact enabling applications that were impossible before. A contribution can be impactful for a broad or narrow community: it does not have to target a popular task/architecture or be backed by an industry PR campaign.

This is a short paper, so please adjust your baseline accordingly. Short papers are not meant to describe a big conceptual breakthrough, but they can still be novel/impactful by e.g. providing a core resource for a new subfield/language, a well-argued counter-perspective, a significant negative result for something that is commonly used, etc.

Please adhere to the score definitions below when scoring papers.

  • 5 = Transformative: This paper is likely to change its subfield or computational linguistics broadly. It should be considered for a best paper award. This paper changes the current understanding of some phenomenon, shows a widely held practice to be erroneous in some way, enables a promising direction of research for a (broad or narrow) topic, or creates an exciting new technique.
  • 4.5 = Exciting: It changed my thinking on this topic. I would fight for it to be accepted.
  • 4 = Strong: This paper deepens the understanding of some phenomenon or lowers the barriers to an existing research direction.
  • 3.5 = Leaning positive: While worthy of acceptance, the work it describes is not particularly interesting and/or novel, so it will not be a big loss if people don’t see it in this conference.
  • 3 = Ambivalent: It has merits (e.g., it reports state-of-the-art results, the idea is nice), but there are key weaknesses (e.g., it describes incremental work), and it can significantly benefit from another round of revision. However, I won’t object to accepting it if my co-reviewers champion it.
  • 2.5 = Leaning negative: I am leaning towards rejection, but I can be persuaded if my co-reviewers think otherwise.
  • 2 = Mediocre: This paper makes marginal contributions (vs non-contemporaneous work), so I would rather not see it in the conference.
  • 1.5 = Weak: I am pretty confident that it should be rejected.
  • 1 = Poor: I cannot identify the contributions of this paper, or I believe the claims are not sufficiently backed up by evidence. I would fight to have it rejected.

Reviewer Confidence

How confident are you in your assessment of this paper?

  • 5 = Positive that my evaluation is correct. I read the paper very carefully and I am very familiar with related work.
  • 4 = Quite sure. I tried to check the important points carefully. It’s unlikely, though conceivable, that I missed something that should affect my ratings.
  • 3 = Pretty sure, but there’s a chance I missed something. Although I have a good feel for this area in general, I did not carefully check the paper’s details, e.g., the math, experimental design, or novelty.
  • 2 = Willing to defend my evaluation, but it is fairly likely that I missed some details, didn’t understand some central points, or can’t be sure about the novelty of the work.
  • 1 = Not my area, or paper was hard for me to understand. My evaluation is just an educated guess.

Recommendation for Best Paper Award

Do you think this paper should be considered for a Best Paper Award? There will be separate Best Paper Awards for long and for short papers. In addition, we will have several Outstanding Paper Awards.

  • Yes
  • No

Justification for Award Recommendations
Please describe briefly why you think this paper should receive an award. Your comments will not be shared with the authors. However, if the paper receives an award, it is possible that some of your comments may be made public (but remain anonymous) in the award citation.

Reproducibility and Ethics

Reproducibility

How do you rate the paper’s reproducibility? Will members of the ACL community be able to reproduce or verify the results in this paper?

  • 5 = Could easily reproduce the results.
  • 4 = Could mostly reproduce the results, but there may be some variation because of sample variance or minor variations in their interpretation of the protocol or method.
  • 3 = Could reproduce the results with some difficulty. The settings of parameters are underspecified or subjectively determined; the training/evaluation data are not widely available.
  • 2 = Would be hard pressed to reproduce the results. The contribution depends on data that are simply not available outside the authors’ institution or consortium; not enough details are provided.
  • 1 = Could not reproduce the results no matter how hard they tried.
  • N/A = Doesn’t apply, since the paper does not include empirical results.

Checklist feedback

Are the authors’ answers to the Responsible NLP Checklist useful for evaluating the submission? Note that this question is for us to collect feedback regarding the usefulness of the checklist, and is not about evaluating the paper.

Ethical Concerns

Independent of your judgement of the quality of the work, please consider any ethical implications. Please review the relevant Ethics review questions and the Ethics FAQ, as needed. Should this paper be sent for an in-depth ethics review?

  • No
  • Yes

We have a small ethics committee that can review papers that are especially challenging with respect to ethical issues. If this seems to be such a paper, please explain why here, and we will try to ensure that it receives a separate review.

Anonymity

Do you know the identity of some authors of this paper?

  • 5 = Yes, I have seen a non-anonymized version of the paper (including the case where only the title and authors are posted), posted online by authors or others after December 20, 2022.
  • 4 = Yes. I have seen a non-anonymized version of the paper, posted online on or before December 20, 2022.
  • 3 = Yes. I know the authors’ identities via other means (e.g., being a senior area chair of another conference to which the paper was submitted).
  • 2 = Not sure, but I have a good guess. While I have not seen a non-anonymized version of the paper online, I have a pretty good guess of the authors based on the paper content.
  • 1 = No. I don’t know who the authors are.

Notice that only option 5 may be in violation of the anonymity policy. The reviewer should provide a detailed review regardless of the answer to this question. Note that for ICLR submissions, if they were deanonymized upon acceptance/rejection, this does not count as a violation if there was another publicly accessible version. For papers withdrawn from ICLR before December 20, this also does not count as an anonymity violation.

If you choose 3, 4, or 5 for the Anonymity question, please provide more details (e.g., the URL of the version posted online).


4. Changes after the Rebuttal Period

The answers to the following questions will be shared with the committee only, not the authors.

Author Response

Have you read the author response?

  • 4 = N/A: this is before the rebuttal period.
  • 3 = N/A: the authors did not provide a response during the rebuttal period.
  • 2 = Yes: I have read the response.
  • 1 = No: I have not read the response.

Review Update

After reading the author response and having discussions with other reviewers, have you changed your scores?
We are aware that there was some confusion about what the scores meant; please see our clarification. If you changed your scores because you now interpret them differently, and not due to the response or discussion, please choose “No, I didn’t change my mind”.

  • 6 = N/A: this is before the rebuttal period.
  • 5 = N/A: the authors did not provide a response during the rebuttal period.
  • 4 = Yes, the response/discussion changed my mind on ‘soundness’.
  • 3 = Yes, the response/discussion changed my mind on ‘excitement’.
  • 2 = Yes, the response/discussion changed my mind on both.
  • 1 = No, I didn’t change my mind.

5. Suitability for Media Dissemination

The answers to the following questions will be shared with the committee only, not the authors. This should also be completed after the rebuttal period.

Recommendation for Media Dissemination

We plan to invite some authors to write lay summaries of their work and share those summaries with journalists. Do you think the paper might be of particular public interest?

  • Yes
  • No

Public interest justification: if yes, please describe your reason briefly.


6. Confidential Information

The answers to the following questions will be shared with the committee only, not the authors.

Confidential Comments to the Area Chair and Peer Reviewers

Enter any information that you want to share with the area chair and the other reviewers assigned to this paper: for instance, a very strong (negative) opinion on the paper that might offend the authors in some way, or something that would expose your identity to the authors.

Confidential Comments to Senior Area Chairs and PC Chairs

Is there anything you want to say to the Senior Area Chairs and PC Chairs only? For example, anything that you don’t want the other reviewers and the area chair to see?