Both of your questions can be answered in a similar way. Both the named entity recognition and part-of-speech tagging components use machine learning models. Such models will make mistakes, and these mistakes are hard for us to correct in general, because a model is a set of learned weights rather than an explicit set of rules that can be edited. The accuracy of a model depends on several factors, including:

  • the size of the training data;
  • the quality of the training data;
  • the size of the model.

Taking your named entity recognition example, the en_core_web_sm model is (as the name suggests) a small model. It uses a relatively small convolutional network and does not use static embeddings pretrained on a large corpus. Since the model is relatively limited, it may have picked up patterns like: capitalized words are typically names, except when they occur at the beginning of a sentence (since all sentence-initial words are capitalized). This may be the reason that it fails to annotate your example correctly. However, if you use the en_core_web_lg model instead, you will see that it returns the correct annotation:

('Dumbledore', 'PERSON')

You'd have to dive deeper to understand why it works in this case. But en_core_web_lg is a larger model that uses pretrained word embeddings. So it may, for example, be the case that Dumbledore occurs in the set of word embeddings and that its vector is similar to the vectors of names seen in the training data, allowing the model to infer that Dumbledore must also be a name.
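
To check this on your own text, a minimal sketch along the following lines may help. It assumes both en_core_web_sm and en_core_web_lg have been installed (e.g. via python -m spacy download), and the sentence is a hypothetical stand-in, since the sentence from the question is not reproduced here. The helper name entities_by_model is made up for illustration.

```python
import spacy

# Hypothetical sentence; replace it with the sentence from your own data.
TEXT = "Dumbledore finished his speech."


def entities_by_model(text, model_names=("en_core_web_sm", "en_core_web_lg")):
    """Return {model_name: [(entity text, entity label), ...]}.

    A value of None means the pipeline package is not installed.
    """
    results = {}
    for name in model_names:
        try:
            nlp = spacy.load(name)
        except OSError:
            # spacy.load raises OSError when the package cannot be found.
            results[name] = None
            continue
        doc = nlp(text)
        results[name] = [(ent.text, ent.label_) for ent in doc.ents]
    return results


if __name__ == "__main__":
    for name, ents in entities_by_model(TEXT).items():
        print(name, ents)
```

The exact predictions depend on the pipeline version you have installed, so treat the printed output as something to inspect rather than a fixed result.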

Similar reasoning applies to your second question: the models are a trade-off between size, speed and accuracy. In this case too, en_core_web_lg consistently predicts 'VERB' as the tag for finished.
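
The same kind of side-by-side check works for the tagger. This is only a sketch: the sentence is hypothetical, the helper name tags_for_word is made up, and both pipelines are assumed to be installed. It prints the coarse-grained tag (token.pos_, where 'VERB' would appear) next to the fine-grained tag (token.tag_).

```python
import spacy

# Hypothetical sentence; substitute the sentence from your own data.
TEXT = "She finished the book yesterday."


def tags_for_word(text, word, model_name):
    """Return [(token text, coarse POS, fine-grained tag)] for each occurrence
    of `word`, or None if the pipeline package is not installed."""
    try:
        nlp = spacy.load(model_name)
    except OSError:
        # The pipeline package is missing; report that instead of failing.
        return None
    doc = nlp(text)
    return [(t.text, t.pos_, t.tag_) for t in doc if t.lower_ == word.lower()]


if __name__ == "__main__":
    for name in ("en_core_web_sm", "en_core_web_lg"):
        print(name, tags_for_word(TEXT, "finished", name))
```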

So what does this mean in practice? First, models make mistakes. Second, if the error rate is not acceptable, you may want to look at larger models (such as md/lg/trf) or, if you are working in a very specific domain, annotate more training data. Finally, do not underestimate the power of a set of rules. If you are working in a particular domain, say processing Harry Potter novels, you could get a lot of mileage out of a small set of rules to recognize names, since the names form a finite set (using e.g. the entity ruler for names, or the attribute ruler for token-level tags).
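
A minimal sketch of the rule-based route, using spaCy's entity ruler (the component that adds spans to doc.ents). It runs on a blank English pipeline, so no trained model is needed; the patterns and the sentence are illustrative assumptions, not a complete list of names.

```python
import spacy

# A blank pipeline has only a tokenizer, which is enough for pattern matching.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Illustrative patterns: an exact phrase and a case-insensitive token pattern.
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Dumbledore"},
    {"label": "PERSON", "pattern": [{"LOWER": "harry"}, {"LOWER": "potter"}]},
])

doc = nlp("Dumbledore spoke to Harry Potter.")
ents = [(ent.text, ent.label_) for ent in doc.ents]
print(ents)
```

When combining rules with a trained pipeline, one option is to add the ruler before the statistical component, e.g. nlp.add_pipe("entity_ruler", before="ner"), so that the rule-based entities are set first and the model predicts around them.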
