Both of your questions can be answered in a similar way. The named entity recognition and part-of-speech tagging pipelines both use machine learning models. Such models will make mistakes, and these mistakes are hard to correct in general, because a model is not a deterministic set of rules. The accuracy of a model depends on several factors, including:

  • the size of the training data;
  • the quality of the training data;
  • the size of the model.

Taking your named entity recognition example, the en_core_web_sm model is (as the name suggests) a small model. It uses a relatively small convolutional network and does not use static embeddings pretrained on a large corpus. Since the model is relatively limited, it may have picked up patterns like: words that are capitalized are typically names, except when they occur at the beginning of a sentence (since all sentence-initial words are capitalized). This may be the reason that it fails to annotate your example correctly. However, if you use the en_core_web_lg model instead, you will see that it returns the correct annotation:

('Dumbledore', 'PERSON')

You'd have to dive deeper to understand why it works in this case, but en_core_web_lg is a larger model that uses pretrained word embeddings. It may, for example, be the case that Dumbledore occurs in the set of word embeddings and that its vector is similar to those of other names, allowing the model to extrapolate that, since the vector of Dumbledore is similar to those of names it has seen in the training data, Dumbledore must also be a name.
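
For illustration, here is a minimal sketch comparing the two pipelines; the exact sentence from the question is not shown above, so the example text below is an assumption:

import spacy

# Hypothetical example sentence; substitute the sentence from your own question.
text = "Dumbledore spoke quietly to the students."

# Both pipelines must be installed first, e.g.:
#   python -m spacy download en_core_web_sm
#   python -m spacy download en_core_web_lg
for model_name in ("en_core_web_sm", "en_core_web_lg"):
    nlp = spacy.load(model_name)
    doc = nlp(text)
    print(model_name, [(ent.text, ent.label_) for ent in doc.ents])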

Similar reasoning applies to your second question: the models are a trade-off between size, speed, and accuracy. In this case, too, en_core_web_lg consistently predicts 'VERB' as the tag for finished.
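
As an illustration only (the original sentence containing finished is not quoted, so the sentence below is an assumption), you can inspect the coarse-grained part-of-speech tags directly:

import spacy

nlp = spacy.load("en_core_web_lg")
# Hypothetical sentence; token.pos_ holds the coarse-grained tag such as 'VERB'.
doc = nlp("She finished the report before lunch.")
print([(token.text, token.pos_) for token in doc])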

So what does this mean in practice? First, models make mistakes. Second, if the error rate is not acceptable, you may want to look at larger models (such as md/lg/trf) or, if you are working in a very specific domain, at annotating more training data. Finally, do not underestimate the power of a set of rules. If you are working in a particular domain, say processing Harry Potter novels, you could get a lot of mileage out of a small set of rules to recognize names, since it is a finite set (using e.g. the attribute ruler).
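
As one hedged illustration of the rule-based idea, here is a sketch that uses spaCy's entity_ruler component (a related rule-based component that adds entity spans from patterns) with made-up patterns for the Harry Potter example; the names and the example sentence are assumptions:

import spacy

nlp = spacy.load("en_core_web_sm")
# Add a rule-based pass before the statistical NER so the listed names always win.
ruler = nlp.add_pipe("entity_ruler", before="ner")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Dumbledore"},                            # single-token pattern
    {"label": "PERSON", "pattern": [{"TEXT": "Harry"}, {"TEXT": "Potter"}]},  # multi-token pattern
])

doc = nlp("Dumbledore met Harry Potter near the lake.")
print([(ent.text, ent.label_) for ent in doc.ents])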

🌐
spaCy
spacy.io › usage › spacy-101
spaCy 101: Everything you need to know · spaCy Usage Documentation
Using spaCy’s built-in displaCy visualizer, here’s what our example sentence and its named entities look like: ... U.K. GPE startup for ... To learn more about entity recognition in spaCy, how to add your own entities to a document and how to train and update the entity predictions of a model, see the usage guides on named entity recognition and training pipelines.
🌐
spaCy
spacy.io › api › entityrecognizer
EntityRecognizer · spaCy API Documentation
A transition-based named entity recognition component. The entity recognizer identifies non-overlapping labelled spans of tokens.

🌐
spaCy
spacy.io › usage › linguistic-features
Linguistic Features · spaCy Usage Documentation
A named entity is a “real-world object” that’s assigned a name – for example, a person, a country, a product or a book title. spaCy can recognize various types of named entities in a document, by asking the model for a prediction.
🌐
GeeksforGeeks
geeksforgeeks.org › python › python-named-entity-recognition-ner-using-spacy
Python | Named Entity Recognition (NER) using spaCy - GeeksforGeeks
July 12, 2025 - Efficient pipeline processing: ... tagging, dependency parsing and named entity recognition. Customizability: We can train custom models or manually defining new entities. Here is the step by step procedure to do NER using spaCy:...
🌐
Analytics Vidhya
analyticsvidhya.com › home › named entity recognition (ner) in python with spacy
Named Entity Recognition (NER) in Python with Spacy
May 1, 2025 - ... A named entity is basically ... object, or geographic entity. For example, named entities would be Roger Federer, Honda city, Samsung Galaxy S10....
🌐
spaCy
spacy.io › universe › project › video-spacys-ner-model-alt
Named Entity Recognition (NER) using spaCy · spaCy Universe
spaCy is a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency parsing, word vectors and more.
🌐
spaCy
spacy.io › usage › training
Training Pipelines & Models · spaCy Usage Documentation
The function Example.from_dict takes a dictionary with keyword arguments specifying the annotations, like tags or entities. Using the resulting Example object and its gold-standard annotations, the model can be updated to learn a sentence of three words with their assigned part-of-speech tags. Here’s another example that shows how to define gold-standard named entities.
Find elsewhere
🌐
Kaggle
kaggle.com › code › abhisarangan › ner-using-spacy
NER using Spacy
🌐
Medium
medium.com › ubiai-nlp › fine-tuning-spacy-models-customizing-named-entity-recognition-for-domain-specific-data-3d17c5fc72ae
Fine-Tuning SpaCy Models: Customizing Named Entity Recognition for Domain-Specific Data | by Wiem Souai | UBIAI NLP | Medium
February 6, 2024 - As an open-source library, SpaCy provides pre-trained models for essential tasks like part-of-speech tagging, named entity recognition, and dependency parsing. Its distinguishing features include exceptional speed and memory efficiency, enabling it to handle substantial volumes of text in real-time effectively.
Top answer
1 of 2
24

As per the spaCy documentation for Named Entity Recognition, here is the way to extract named entities:

import spacy
nlp = spacy.load('en')  # install 'en' model (python3 -m spacy download en)
# Note: 'en' was a shortcut link in older spaCy; in spaCy v3+ load a full pipeline
# name instead, e.g. spacy.load('en_core_web_sm') after
# `python -m spacy download en_core_web_sm`.
doc = nlp("Alphabet is a new startup in China")
print('Name Entity: {0}'.format(doc.ents))

Result
Name Entity: (China,)

To make "Alphabet" a 'Noun', prepend "The" to it:

doc = nlp("The Alphabet is a new startup in China")
print('Name Entity: {0}'.format(doc.ents))

Name Entity: (Alphabet, China)
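
If you also want the entity labels and not just the spans, printing ent.label_ for each entity in the snippet above works as well (a small addition, not part of the original answer):

print([(ent.text, ent.label_) for ent in doc.ents])
# e.g. [('Alphabet', 'ORG'), ('China', 'GPE')] - exact labels depend on the model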

2 of 2
1

In spaCy version 3, Transformers from Hugging Face can be fine-tuned for the same operations that spaCy provided in previous versions, but with better results.

Transformers are currently (2020) the state of the art in Natural Language Processing. Roughly speaking, we first had (one-hot encoding -> word2vec -> GloVe | fastText), then (recurrent neural networks, recursive neural networks, gated recurrent units, long short-term memory, bi-directional long short-term memory, etc.), and now Transformers + attention (BERT, RoBERTa, XLNet, XLM, CTRL, ALBERT, T5, BART, GPT, GPT-2, GPT-3). This is just to give context for why you should consider Transformers; there is plenty I did not mention, such as Fuzz, Knowledge Graphs and so on.

Install the dependencies:

sudo apt install libncurses5
pip install spacy-transformers --pre -f https://download.pytorch.org/whl/torch_stable.html
pip install spacy-nightly # I'm using 3.0.0rc2

Download a model:

python -m spacy download en_core_web_trf # English Transformer pipeline, RoBERTa base

Here's a list of available models.

And then use it as you would normally do:

import spacy

text = 'Type something here which can be related to something, e.g Stack Over Flow organization'

nlp = spacy.load('en_core_web_trf')
document = nlp(text)

print(document.ents)

References:

Learn about Transformers and Attention.

Read a summary of the different Transformer architectures.

Learn about the Transformer fine-tuning done by spaCy.

🌐
Towards Data Science
towardsdatascience.com › home › latest › named entity recognition with spacy and the mighty roberta
Named Entity Recognition with Spacy and the Mighty roBERTa | Towards Data Science
March 5, 2025 - An opensource library for ... spaCy successfully identified CNN as an Organisation (ORG), Amy Schneider as a PERSON, Oakland, and California as Geo-Political Entity (GEP), etc....
🌐
Medium
medium.com › @sanskrutikhedkar09 › mastering-information-extraction-from-unstructured-text-a-deep-dive-into-named-entity-recognition-4aa2f664a453
Mastering Information Extraction from Unstructured Text: A Deep Dive into Named Entity Recognition with spaCy | by Sanskrutikhedkar | Medium
October 27, 2023 - Named Entity Recognition (NER): SpaCy can identify named entities in text, such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.
🌐
Python Humanities
ner.pythonhumanities.com › 01_02_introduction_to_spacy.html
2. Introduction to spaCy — Introduction to Named Entity Recognition
For our purposes right now, I simply want to print off each entity’s text (the string itself) and its corresponding label (note the _ after label). I will be explaining this process in much greater detail in the next two notebooks. ... As we can see the small spaCy statistical machine learning model has correctly identified that Martin J.
🌐
Medium
medium.com › analytics-vidhya › named-entity-recognition-with-spacy-2ecfa4114162
Named Entity Recognition (NER) with spaCy | by Sanidhya Singh | Analytics Vidhya | Medium
May 2, 2022 - spaCy supports the following entity types for models trained on the OntoNotes 5. ... Let’s take a look at an example, we are loading the “en_core_web_lg” model for NER. The model is English multi-task CNN trained on OntoNotes, with GloVe ...
🌐
CodeSignal
codesignal.com › learn › courses › linguistics-for-token-classification-in-spacy › lessons › unveiling-the-essentials-of-entity-recognition-with-spacy
Unveiling the Essentials of Entity Recognition with spaCy
However, the model we are using, en_core_web_sm, supports Named Entity Recognition. When you call nlp on a text, spaCy first tokenizes the text to produce a Doc object. Doc is then processed in several different steps – this is also known as the processing pipeline.
🌐
Medium
medium.com › mlearning-ai › named-entity-recognition-with-spacy-fd834ff84b86
Named Entity Recognition with spaCy | by FS Ndzomga | MLearning.ai | Medium
March 5, 2023 - The model also includes additional entity types, such as product names, languages, and nationalities. While spaCy’s pre-trained NER model is quite powerful, it may not always be sufficient for certain tasks or domains. In such cases, it may be necessary to train a custom NER model using your own annotated dataset. To train a custom NER model in spaCy, you need to follow a few steps: Prepare the training data: You need to create an annotated dataset containing examples of the named entities you want to recognize.
🌐
Textanalysisonline
textanalysisonline.com › spacy-named-entity-recognition-ner
spaCy Named Entity Recognizer (NER) - API & Demo | Text Analysis Online | TextAnalysis
Getting started with spaCy · Word Tokenize · Word Lemmatize · Pos Tagging · Sentence Segmentation · Noun Chunks Extraction · Named Entity Recognition · LanguageDetector · Language Detection Introduction · LangId Language Detection · Custom · Custom Service ·
🌐
Medium
medium.com › @mjghadge9007 › building-your-own-custom-named-entity-recognition-ner-model-with-spacy-v3-a-step-by-step-guide-15c7dcb1c416
Building Your Own Custom Named Entity Recognition (NER) Model with spaCy V3: A Step-by-Step Guide | by Mayur Ghadge | Medium
September 23, 2024 - In this blog post, I’ll take ... using spaCy v3. We’ll explore why custom NER is essential, how it outperforms ready-made NER libraries, and guide you through building your own NER model from annotated data. ... In the world of Natural Language Processing (NLP), extracting valuable information from text data is a fundamental task. Named Entity Recognition (NER) is the ...