Both of your questions have a similar answer. Both the named entity recognition and part-of-speech tagging pipelines use machine learning models. Such models make mistakes, and these mistakes are hard to correct in general, because a model is not a deterministic set of rules. The accuracy of a model depends on several factors, including:
- the size of the training data;
- the quality of the training data;
- the size of the model.
Taking your named entity recognition example, the en_core_web_sm model is (as the name suggests) a small model. It uses a relatively small convolutional network and does not use static embeddings pretrained on a large corpus. Since the model is so limited, it may have picked up a pattern like: capitalized words are typically names, except when they occur at the beginning of a sentence (since all sentence-initial words are capitalized). That may be why it fails to annotate your example correctly. However, if you use the en_core_web_lg model instead, you will see that it returns the correct annotation:
('Dumbledore', 'PERSON')
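For reference, here is a minimal sketch of how you could check this yourself. The example sentence is made up (your original input isn't shown here), and it assumes you have downloaded the model with python -m spacy download en_core_web_lg:

import spacy

# Assumes the model was installed with:
#   python -m spacy download en_core_web_lg
nlp = spacy.load("en_core_web_lg")

# Hypothetical example sentence; substitute your own text.
doc = nlp("Dumbledore greeted the students in the Great Hall.")

for ent in doc.ents:
    print((ent.text, ent.label_))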
You'd have to dive deeper to understand why it works in this case, but en_core_web_lg is a larger model that uses pretrained word embeddings. It may be, for example, that Dumbledore occurs in the set of word embeddings and its vector is similar to the vectors of other names, allowing the model to extrapolate: since the vector of Dumbledore is similar to those of names it has seen in the training data, Dumbledore must also be a name.
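You can partially test this intuition yourself. This is only an illustrative sketch, not a definitive diagnostic: whether Dumbledore actually has a vector, and what the similarity scores come out to, depends on the vector table that ships with the model.

import spacy

nlp = spacy.load("en_core_web_lg")  # ships with static word vectors

token = nlp("Dumbledore")[0]
print(token.has_vector)  # True if the vector table contains this word

if token.has_vector:
    # Compare against a common first name and an ordinary noun;
    # a name-like vector should be closer to the name.
    print(token.similarity(nlp("Harry")[0]))
    print(token.similarity(nlp("table")[0]))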
Similar reasoning applies to your second question: the models are a trade-off between size, speed, and accuracy. In this case, too, en_core_web_lg consistently predicts 'VERB' as the tag for finished.
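Again, a small sketch you could adapt to verify this; the sentence is a made-up example containing finished:

import spacy

nlp = spacy.load("en_core_web_lg")

# Hypothetical sentence; try your own examples as well.
doc = nlp("She finished the assignment before dinner.")

for token in doc:
    print(token.text, token.pos_, token.tag_)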
So what does this mean in practice? First, models make mistakes. Second, if the error rate is not acceptable, you may want to look at larger models (such as md/lg/trf) or, if you are working in a very specific domain, at annotating more training data. Finally, do not underestimate the power of a set of rules. If you are working in a particular domain, say processing Harry Potter novels, you could get a lot of mileage out of a small set of rules to recognize names, since it is a finite set (using e.g. the entity ruler), as in the sketch below.
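For illustration, here is a minimal sketch of that rule-based approach using spaCy's entity ruler; the patterns are just a couple of hypothetical entries, not a complete list:

import spacy

nlp = spacy.load("en_core_web_sm")

# Add the entity ruler before the statistical NER component so that
# rule-based matches take precedence over the model's predictions.
ruler = nlp.add_pipe("entity_ruler", before="ner")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Dumbledore"},
    {"label": "PERSON", "pattern": [{"LOWER": "hermione"}, {"LOWER": "granger"}]},
])

doc = nlp("Dumbledore whispered something to Hermione Granger.")
print([(ent.text, ent.label_) for ent in doc.ents])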
Named Entity Recognition for Resume Parsing
I've had a good experience using spaCy, but I only use it for names, although, by default, it will also attempt to extract organizations and locations.
I use it in a non-English language, and it does require some rather extensive text pre-formatting, but it is very accurate (certainly more than 70%) - even when extracting names that are in other languages - and very fast too.
I'm sure you can fine-tune it to extract other types of entities, and they even have a visual tool to assist with that which looks pretty awesome.
By comparison, I used BERT for a multi-label classification project; it took way longer and was way more complicated to set up.