In language modeling, which approach excels at capturing long-range dependencies in text data?
N-gram models
Transformer-based neural language models
Overlook minor misbehaviors
Impose harsh punishments for any infraction
