Evaluating NER HuggingFace models for a domain
Hi everyone,
I'm comparing off-the-shelf NER systems to see how they perform on literary-historical data (more specifically: a set of books from the 17th to the 19th century). I'm not training or improving the models, just running the ones that are available to decide whether they can later be used in historical research and information extraction contexts.
I think I understand how to evaluate tools such as spaCy and NLTK: transform the output labels into the formats required by, e.g., the Python packages nervaluate and seqeval. Both return the quantitative metrics (F1, precision, recall, ...) needed to evaluate how the models perform on this data type/domain.
I'm not experienced with HuggingFace/transformers and it's quite hard to find sources that have done this before (or is it just me?). I'm wondering if there's an "elegant" way to evaluate these models for a domain. Does it make sense to do it the same way as I evaluated spaCy and NLTK, as specified below?
At my disposal: a small gold standard dataset labelled with the "location" entity (IOB2).
Steps:
- align tokenizations and labels of the gold standard and the model output to create two lists of equal length (using the package pytokenizations).
- map labels (e.g.: spaCy's "LOC" & "GPE" become "LOCATION", as in the gold standard data).
- calculate metrics using nervaluate/seqeval.
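To make the three steps concrete, here is a minimal sketch in plain Python. It rests on several assumptions: the predictions are character-offset spans shaped like the dicts that the transformers "ner" pipeline returns with `aggregation_strategy="simple"` (`entity_group`, `start`, `end`); every gold token occurs verbatim in the raw text; and alignment is done by character offsets rather than with pytokenizations. The sentence, the hard-coded predictions, `LABEL_MAP`, and all function names are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical mapping from model labels to the gold label set.
LABEL_MAP: Dict[str, str] = {"LOC": "LOCATION", "GPE": "LOCATION"}


def token_offsets(text: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Locate each gold token in the raw text, scanning left to right."""
    offsets, cursor = [], 0
    for tok in tokens:
        start = text.index(tok, cursor)  # assumes the token appears verbatim
        offsets.append((start, start + len(tok)))
        cursor = start + len(tok)
    return offsets


def spans_to_iob2(offsets, entities, label_map) -> List[str]:
    """Steps 1 and 2: project predicted character spans onto gold tokens."""
    tags = ["O"] * len(offsets)
    for ent in entities:
        label = label_map.get(ent["entity_group"])
        if label is None:  # label not in the gold scheme, e.g. PER
            continue
        first = True
        for i, (s, e) in enumerate(offsets):
            if s < ent["end"] and e > ent["start"]:  # token overlaps the span
                tags[i] = ("B-" if first else "I-") + label
                first = False
    return tags


def iob2_to_spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Collect (start, end, label) entity spans from an IOB2 tag list."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes last span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start = label = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def prf(gold_tags: List[str], pred_tags: List[str]) -> Tuple[float, float, float]:
    """Step 3: strict entity-level precision, recall and F1."""
    gold, pred = iob2_to_spans(gold_tags), iob2_to_spans(pred_tags)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy gold data (IOB2) and hard-coded stand-ins for model output.
text = "He sailed from New York to Amsterdam ."
tokens = ["He", "sailed", "from", "New", "York", "to", "Amsterdam", "."]
gold_tags = ["O", "O", "O", "B-LOCATION", "I-LOCATION", "O", "B-LOCATION", "O"]
entities = [
    {"entity_group": "PER", "start": 0, "end": 2},
    {"entity_group": "LOC", "start": 15, "end": 23},
]

pred_tags = spans_to_iob2(token_offsets(text, tokens), entities, LABEL_MAP)
precision, recall, f1 = prf(gold_tags, pred_tags)
print(pred_tags)
print(precision, recall, round(f1, 3))
```

Because `gold_tags` and `pred_tags` end up as two equal-length IOB2 lists, they could also be wrapped in an outer list (one inner list per sentence) and handed to seqeval's `classification_report` instead of the hand-rolled `prf`. Projecting onto the gold tokens means the model's subword tokenization never has to be aligned explicitly, though whether that is good enough for noisy historical spelling would need checking on real data.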
It seems so convoluted to apply this methodology, but I haven't been able to find a better way. Am I overlooking or not grasping something? Is there an amazing evaluation package or research on evaluation methods of transformers which I don't know about?
Thank you for your help!