[D] Named Entity Recognition (NER) Libraries
Named Entity Recognition with a small dataset
Can you make more examples? Annotating NER data isn't that hard: there are user-friendly annotation tools, and you (or an exploited student) can annotate far more than 400 lines in a single day. Alternatively, is your problem really so unique that your 400 lines are all there is? Is there really no way to find a larger existing dataset somewhere in the world?
Also, NER tends to rely on two types of training data: contextual examples (annotated sentences, e.g. in BIO format) and gazetteers (databases of known entity names). If you have few examples, perhaps you can get a good, large gazetteer for your problem domain?
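To make the two kinds of training data concrete, here is a minimal sketch in plain Python that uses a tiny gazetteer to produce BIO tags for a tokenized sentence. The function name `bio_tag`, the `GAZETTEER` entries, and the labels `COMPANY` and `MATERIAL` are made up for illustration; a real gazetteer would come from your own domain data, and whitespace tokenization is a simplifying assumption.

```python
def bio_tag(tokens, gazetteer, max_len=4):
    """Label tokens with BIO tags by longest-match lookup in a gazetteer.

    gazetteer maps a tuple of lowercase tokens to an entity label.
    """
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so "Acme Steel" beats "Acme".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            label = gazetteer.get(key)
            if label is not None:
                tags[i] = "B-" + label          # first token of the entity
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label      # continuation tokens
                i += n
                matched = True
                break
        if not matched:
            i += 1                              # token stays "O" (outside)
    return tags


# Hypothetical gazetteer entries, for illustration only.
GAZETTEER = {
    ("acme", "steel"): "COMPANY",
    ("acme",): "COMPANY",
    ("stainless", "steel"): "MATERIAL",
}

tokens = "We ordered stainless steel from Acme Steel".split()
tags = bio_tag(tokens, GAZETTEER)
for tok, tag in zip(tokens, tags):
    print(f"{tok}\t{tag}")
```

Output like this can serve as rough "silver" annotations to bootstrap a model, though gazetteer matches ignore context and would still need manual review.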
Hi everyone, I have to cluster a large chunk of textual conversational business data to find relevant topics in it.
Since there is a lot of abstract info in every text, like phone numbers, URLs, numbers, emails, names, etc., I have done some basic NER using regex and spaCy NER to tag such info and make the texts more generic and canonicalized.
But there are some things like product names, raw materials, brand/model, company, etc. that couldn't be tagged. Also, the accuracy of regex and spaCy NER isn't high enough.
Can anyone suggest a good Python NER library that is accurate and fast enough, preferably has pre-trained models, and can tag diverse fields?
Thanks.
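For readers following along, the regex-based canonicalization step described in the post might look roughly like the sketch below. The patterns, the placeholder names (`<EMAIL>`, `<URL>`, `<PHONE>`, `<NUM>`), and the function `canonicalize` are assumptions for illustration, not the poster's actual code, and the patterns are deliberately simple rather than exhaustive.

```python
import re

# Order matters: more specific patterns run first so that, for example,
# digits inside a URL or phone number are not replaced by <NUM> beforehand.
PATTERNS = [
    ("<EMAIL>", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("<URL>", re.compile(r"(?:https?://|www\.)\S+")),
    ("<PHONE>", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
    ("<NUM>", re.compile(r"\d+(?:[.,]\d+)*")),
]


def canonicalize(text):
    """Replace emails, URLs, phone numbers, and numbers with placeholders."""
    for placeholder, pattern in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


sample = ("Call +1 (555) 010-9999 or mail bob@example.com, "
          "see https://example.com/p?id=42 for 3 items")
result = canonicalize(sample)
print(result)
```

Replacing such spans with fixed placeholders before clustering keeps incidental values from dominating the text representation; names, products, and brands still need a statistical or gazetteer-based tagger, which is the gap the question is about.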