Both of your questions can be answered in a similar way. Both the named entity recognition and part-of-speech tagging pipelines use machine learning models. Such models make mistakes, and these mistakes are hard to correct in general, because a model is not a deterministic set of rules. The accuracy of a model depends on several factors, including:
- the size of the training data;
- the quality of the training data;
- the size of the model.
Taking your named entity recognition example, the en_core_web_sm model is (as the name suggests) a small model. It uses a relatively small convolutional network and does not use static embeddings pretrained on a large corpus. Since the model is relatively limited, it may have picked up patterns like: words that are capitalized are typically names, except when they occur at the beginning of a sentence (since all sentence-initial words are capitalized). This may be why it fails to annotate your example correctly. However, if you use the en_core_web_lg model instead, you will see that it returns the correct annotation:
('Dumbledore', 'PERSON')
You'd have to dive deeper to understand why it works in this case, but en_core_web_lg is a larger model that uses pretrained word embeddings. So it may, for example, be the case that Dumbledore occurs in the set of word embeddings and that its vector is similar to those of other names, allowing the model to extrapolate that, since the vector of Dumbledore resembles those of names it has seen in the training data, Dumbledore must also be a name.
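As a quick way to check this yourself, here is a minimal sketch (the exact sentence from your question isn't reproduced here, so the example sentence is my own):

import spacy

# assumes the large model is installed: python -m spacy download en_core_web_lg
nlp = spacy.load("en_core_web_lg")
doc = nlp("Dumbledore raised his wand.")  # stand-in for your sentence
print([(ent.text, ent.label_) for ent in doc.ents])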
Similar reasoning applies to your second question: the models are a trade-off between size, speed, and accuracy. In this case too, en_core_web_lg consistently predicts 'VERB' as the tag for finished.
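Again as a sketch only (using a stand-in sentence, since your original one isn't shown here):

import spacy

nlp = spacy.load("en_core_web_lg")
doc = nlp("She finished the book yesterday.")  # stand-in sentence containing "finished"
for token in doc:
    print(token.text, token.pos_, token.tag_)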
So what does this mean in practice? First, models make mistakes. Second, if the error rate is not acceptable, you may want to look at larger models (such as md/lg/trf), or, if you are working in a very specific domain, at annotating more training data. Finally, do not underestimate the power of a set of rules. If you are working in a particular domain, say processing Harry Potter novels, you could get a lot of mileage out of making a small set of rules to recognize names, since it is a finite set (using e.g. the attribute ruler).
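As a sketch of the rule-based route: the attribute ruler mentioned above sets token attributes, while for adding entities by rule spaCy 3 provides the entity_ruler component. The names and patterns below are purely illustrative:

import spacy

nlp = spacy.load("en_core_web_sm")
# add the entity ruler before the statistical NER so its matches take precedence
ruler = nlp.add_pipe("entity_ruler", before="ner")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Dumbledore"},
    {"label": "PERSON", "pattern": [{"LOWER": "hermione"}, {"LOWER": "granger"}]},
])
doc = nlp("Dumbledore spoke with Hermione Granger.")
print([(ent.text, ent.label_) for ent in doc.ents])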
As per the spaCy documentation for Named Entity Recognition, here is the way to extract named entities:
import spacy
nlp = spacy.load('en') # install 'en' model (python3 -m spacy download en)
doc = nlp("Alphabet is a new startup in China")
print('Name Entity: {0}'.format(doc.ents))
Result
Name Entity: (China,)
To make "Alphabet" a 'Noun' append it with "The".
doc = nlp("The Alphabet is a new startup in China")
print('Name Entity: {0}'.format(doc.ents))
Name Entity: (Alphabet, China)
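If you also want to see which labels are assigned (whether Alphabet comes out as an organization depends on the model and version, so treat this as a sketch):

import spacy

nlp = spacy.load('en')
doc = nlp("The Alphabet is a new startup in China")
# print each entity together with its label
for ent in doc.ents:
    print(ent.text, ent.label_)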
In spaCy version 3, Transformers from Hugging Face are fine-tuned for the same operations that spaCy provided in previous versions, but with better results.
Transformers are currently (2020) the state of the art in Natural Language Processing. Roughly speaking, we first had static representations (one-hot encoding -> word2vec -> GloVe | fastText), then recurrent architectures (recurrent neural networks, recursive neural networks, gated recurrent units, long short-term memory, bi-directional LSTMs, etc.), and now Transformers plus attention (BERT, RoBERTa, XLNet, XLM, CTRL, ALBERT, T5, BART, GPT, GPT-2, GPT-3). This is just to give context for why you should consider Transformers; there is a lot of other stuff I didn't mention, like Fuzz, Knowledge Graphs, and so on.
Install the dependencies:
sudo apt install libncurses5
pip install spacy-transformers --pre -f https://download.pytorch.org/whl/torch_stable.html
pip install spacy-nightly # I'm using 3.0.0rc2
Download a model:
python -m spacy download en_core_web_trf # English transformer pipeline, RoBERTa base
Here's a list of available models.
And then use it as you would normally do:
import spacy
text = 'Type something here which can be related to something, e.g Stack Over Flow organization'
nlp = spacy.load('en_core_web_trf')
document = nlp(text)
print(document.ents)
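Continuing from the snippet above, you can also print the entity labels and check that the loaded pipeline actually contains a transformer component (component names may differ between versions, so this is a sketch):

# print each entity together with its label
print([(ent.text, ent.label_) for ent in document.ents])

# list the pipeline components; en_core_web_trf should include a 'transformer' component
print(nlp.pipe_names)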
References:
Learn about Transformers and Attention.
Read a summary about the different Transformers architectures.
Learn about the Transformers fine-tuning done by spaCy.