🌐
spaCy
spacy.io › api › entityrecognizer
EntityRecognizer · spaCy API Documentation
A transition-based named entity recognition component. The entity recognizer identifies non-overlapping labelled spans of tokens.
🌐
spaCy
spacy.io › usage › linguistic-features
Linguistic Features · spaCy Usage Documentation
spaCy features an extremely fast statistical entity recognition system, that assigns labels to contiguous spans of tokens. The default trained pipelines can identify a variety of named and numeric entities, including companies, locations, ...
🌐
GeeksforGeeks
geeksforgeeks.org › python › python-named-entity-recognition-ner-using-spacy
Python | Named Entity Recognition (NER) using spaCy - GeeksforGeeks
July 12, 2025 - These "named entities" include proper nouns like people, organizations, locations and other meaningful categories such as dates, monetary values and products. By tagging these entities, we can transform raw text into structured data that can ...
🌐
Medium
medium.com › @sanskrutikhedkar09 › mastering-information-extraction-from-unstructured-text-a-deep-dive-into-named-entity-recognition-4aa2f664a453
Mastering Information Extraction from Unstructured Text: A Deep Dive into Named Entity Recognition with spaCy | by Sanskrutikhedkar | Medium
October 27, 2023 - Named Entity Recognition (NER): SpaCy can identify named entities in text, such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.
🌐
spaCy
spacy.io › usage › spacy-101
spaCy 101: Everything you need to know · spaCy Usage Documentation
A named entity is a “real-world object” that’s assigned a name – for example, a person, a country, a product or a book title. spaCy can recognize various types of named entities in a document, by asking the model for a prediction.
🌐
Analytics Vidhya
analyticsvidhya.com › home › named entity recognition (ner) in python with spacy
Named Entity Recognition (NER) in Python with Spacy
May 1, 2025 - A. SpaCy NER (Named Entity Recognition) is a feature of the spaCy library used for natural language processing. It automatically identifies and categorizes named entities (e.g., persons, organizations, locations, dates) in text data.
🌐
Sematext
sematext.com › home › blog › entity extraction with spacy
Entity Extraction with spaCy
🌐
Kaggle
kaggle.com › code › abhisarangan › ner-using-spacy
NER using Spacy
🌐
spaCy
spacy.io › universe › project › video-spacys-ner-model-alt
Named Entity Recognition (NER) using spaCy · spaCy Universe
spaCy is a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency parsing, word vectors and more.
Top answer
1 of 1
1

Both of your questions can be answered in a similar way. Both the named entity recognition and part-of-speech tagging pipelines use machine learning models. Such models make mistakes, and these mistakes are hard to correct in general, because a model is not a deterministic set of rules. The accuracy of a model depends on several factors, including:

  • the size of the training data;
  • the quality of the training data;
  • the size of the model.

Taking your named entity recognition example, the en_core_web_sm model is (as the name suggests) a small model. It uses a relatively small convolutional network and does not use static embeddings pretrained on a large corpus. Since the model is relatively limited, it may have picked up patterns like: capitalized words are typically names, except when they occur at the beginning of a sentence (since all sentence-initial words are capitalized). This may be why it fails to annotate your example correctly. However, if you use the en_core_web_lg model instead, you will see that it returns the correct annotation:

('Dumbledore', 'PERSON')

You'd have to dive deeper to understand why it works in this case, but en_core_web_lg is a larger model that uses pretrained word embeddings. So it may be, for example, that Dumbledore occurs in the set of word embeddings and its vector is similar to those of other names; since the vector of Dumbledore resembles the vectors of names the model has seen in its training data, the model can extrapolate that Dumbledore must also be a name.
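
Here is a minimal sketch of that comparison (not part of the original answer). The example sentence below is an assumption, since the question's original text is not shown here, and both model packages must be installed beforehand with python -m spacy download.

import spacy

text = "Dumbledore greeted the students."  # assumed example sentence

for model_name in ("en_core_web_sm", "en_core_web_lg"):
    nlp = spacy.load(model_name)  # the model package must already be installed
    doc = nlp(text)
    print(model_name, [(ent.text, ent.label_) for ent in doc.ents])

# The small model may miss 'Dumbledore', while the large model, which ships
# with pretrained word vectors, is more likely to tag it as PERSON.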

Similar reasoning applies to your second question: the models are a trade-off between size, speed, and accuracy. In this case, too, en_core_web_lg consistently predicts 'VERB' as the tag for finished.

So what does this mean in practice? First, models make mistakes. Second, if the error rate is not acceptable, you may want to look at larger models (such as md/lg/trf), or, if you are working in a very specific domain, annotate more training data. Finally, do not underestimate the power of a set of rules. If you are working in a particular domain, say processing Harry Potter novels, you could get a lot of mileage out of a small set of rules to recognize names, since it is a finite set (using e.g. the attribute ruler).
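
A minimal sketch of that rule-based idea (not from the original answer): the answer mentions the attribute ruler, but for adding entity spans from a fixed list of names, spaCy's entity_ruler component is one common choice. The names and label below are illustrative assumptions.

import spacy

nlp = spacy.load("en_core_web_sm")

# Add a rule-based entity ruler before the statistical NER component so that
# its spans take precedence over the model's predictions.
ruler = nlp.add_pipe("entity_ruler", before="ner")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Dumbledore"},        # assumed finite list of names
    {"label": "PERSON", "pattern": "Hermione Granger"},
])

doc = nlp("Dumbledore and Hermione Granger walked into the Great Hall.")
print([(ent.text, ent.label_) for ent in doc.ents])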

Top answer
1 of 2
24

As per the spaCy documentation for Named Entity Recognition, here is the way to extract named entities:

import spacy
nlp = spacy.load('en_core_web_sm')  # install the model first: python -m spacy download en_core_web_sm
doc = nlp("Alphabet is a new startup in China")
print('Name Entity: {0}'.format(doc.ents))

Result
Name Entity: (China,)

To get "Alphabet" recognized as an entity, prepend "The":

doc = nlp("The Alphabet is a new startup in China")
print('Name Entity: {0}'.format(doc.ents))

Name Entity: (Alphabet, China)
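
As a small follow-up (not part of the original answer), doc.ents only shows the entity spans; each span also carries a label_ attribute with the predicted entity type, and spacy.explain gives a human-readable description of that label.

for ent in doc.ents:
    print(ent.text, ent.label_, spacy.explain(ent.label_))

# Typical output might look like:
# Alphabet ORG Companies, agencies, institutions, etc.
# China GPE Countries, cities, states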

2 of 2
1

In spaCy version 3, Transformers from Hugging Face are fine-tuned for the same operations that spaCy provided in previous versions, but with better results.

Transformers are currently (2020) the state of the art in Natural Language Processing. Roughly, we first had static embeddings (one-hot encoding -> word2vec -> GloVe | fastText), then recurrent architectures (recurrent neural networks, recursive neural networks, gated recurrent units, long short-term memory, bi-directional LSTMs, etc.), and now Transformers + attention (BERT, RoBERTa, XLNet, XLM, CTRL, ALBERT, T5, BART, GPT, GPT-2, GPT-3). This is just to give context for why you should consider Transformers; there is plenty I haven't mentioned, like fuzzy matching, knowledge graphs, and so on.

Install the dependencies:

sudo apt install libncurses5
pip install spacy-transformers --pre -f https://download.pytorch.org/whl/torch_stable.html
pip install spacy-nightly # I'm using 3.0.0rc2

Download a model:

python -m spacy download en_core_web_trf # English transformer pipeline, RoBERTa base

Here's a list of available models.

And then use it as you would normally do:

import spacy

text = 'Type something here which can be related to something, e.g Stack Over Flow organization'

nlp = spacy.load('en_core_web_trf')
document = nlp(text)

print(document.ents)
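
As an optional follow-up sketch (not from the original answer), displacy can render the recognized entities inline, which makes it easy to inspect what the transformer pipeline found.

from spacy import displacy

# In a Jupyter notebook, render the entity annotations inline:
displacy.render(document, style="ent")

# Or, from a script, serve the visualization locally (by default on port 5000):
# displacy.serve(document, style="ent")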

References:

Learn about Transformers and Attention.

Read a summary about the different Transformer architectures.

Learn about the Transformers fine-tune done by Spacy.

🌐
Medium
medium.com › ubiai-nlp › fine-tuning-spacy-models-customizing-named-entity-recognition-for-domain-specific-data-3d17c5fc72ae
Fine-Tuning SpaCy Models: Customizing Named Entity Recognition for Domain-Specific Data | by Wiem Souai | UBIAI NLP | Medium
February 6, 2024 - As an open-source library, SpaCy provides pre-trained models for essential tasks like part-of-speech tagging, named entity recognition, and dependency parsing. Its distinguishing features include exceptional speed and memory efficiency, enabling ...
🌐
Kaggle
kaggle.com › code › curiousprogrammer › entity-extraction-and-classification-using-spacy
Entity Extraction and Classification using SpaCy
🌐
Dataiku
developer.dataiku.com › latest › tutorials › machine-learning › code-env-resources › spacy-resources › index.html
Load and re-use a spaCy named-entity recognition model - Dataiku Developer Guide
Named-entity recognition (NER) is concerned with locating and classifying named entities mentioned in unstructured text into pre-defined categories such as person names, organizations, locations etc. The training of a NER model might be costly. Fortunately, you could rely on pre-trained models ...
🌐
Data Science Duniya
ashutoshtripathi.com › 2020 › 04 › 27 › named-entity-recognition-ner-using-spacy-nlp-part-4
Named Entity Recognition NER using spaCy | NLP | Part 4 – Data Science Duniya
November 16, 2021 - Named Entity Recognition NER works ... comes with an extremely fast statistical entity recognition system that assigns labels to contiguous spans of tokens....
🌐
Analytics Vidhya
analyticsvidhya.com › home › custom named entity recognition using spacy v3
Custom Named Entity Recognition using spaCy v3 - Analytics Vidhya
October 14, 2024 - In this article, you will learn to develop custom named entity recognition which helps to train our custom NER pipeline using spacy v3.
🌐
Machine Learning Plus
machinelearningplus.com › nlp › training-custom-ner-model-in-spacy
Training Custom NER models in SpaCy to auto-detect named entities [Complete Guide]
April 4, 2022 - Named-entity recognition (NER) is the process of automatically identifying the entities discussed in a text and classifying them into pre-defined categories. Categories could be entities like ‘person’, ‘organization’, ‘location’ ...
🌐
Dataknowsall
dataknowsall.com › blog › ner.html
An Accessible Guide to Named Entity Recognition
March 5, 2024 - Spacy has a wonderful ability to render NER tags in line with the text, a fantastic way to see what's being recognized in the context of the original article. NER models as they come trained are fantastic if you're a reporter covering Washington, DC.
🌐
spaCy
spacy.io › usage › training
Training Pipelines & Models · spaCy Usage Documentation
The weight values are estimated based on examples the model has seen during training. To train a model, you first need training data – examples of text, and the labels you want the model to predict. This could be a part-of-speech tag, a named entity or any other information.