🌐
Medium
medium.com › @sanskrutikhedkar09 › mastering-information-extraction-from-unstructured-text-a-deep-dive-into-named-entity-recognition-4aa2f664a453
Mastering Information Extraction from Unstructured Text: A Deep Dive into Named Entity Recognition with spaCy | by Sanskrutikhedkar | Medium
October 27, 2023 - SpaCy, our trusty companion in this adventure, offers a plethora of pre-trained models tailored for diverse language processing tasks. Among them are the agile ‘en_core_web_sm,’ the balanced ‘en_core_web_md,’ and the robust ‘en_core_web_lg,’ each catering to specific needs and preferences. These models, with their components like tokenization rules, Part-of-Speech Tagging, Named Entity Recognition, and more, form the bedrock of our exploration into unstructured data.
🌐
spaCy
spacy.io › usage › linguistic-features
Linguistic Features · spaCy Usage Documentation
entities labeled as MONEY, and then uses the dependency parse to find the noun phrase they are referring to – for example "Net income" → "$9.4 million". ... For more examples of how to write rule-based information extraction logic that takes advantage of the model’s predictions produced by the different components, see the usage guide on combining models and rules.
🌐
spaCy
spacy.io › universe › project › video-spacys-ner-model-alt
Named Entity Recognition (NER) using spaCy · spaCy Universe
spaCy is a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency parsing, word vectors and more.
🌐
Kaggle
kaggle.com › code › curiousprogrammer › entity-extraction-and-classification-using-spacy
Entity Extraction and Classification using SpaCy
🌐
Quanteda
spacyr.quanteda.io › reference › spacy_extract_entity.html
Extract named entities from texts using spaCy — spacy_extract_entity • spacyr
This function extracts named entities from texts, based on the entity tag ent attributes of documents objects parsed by spaCy (see https://spacy.io/usage/linguistic-features#section-named-entities).
🌐
spaCy
spacy.io › api › entityrecognizer
EntityRecognizer · spaCy API Documentation
A transition-based named entity recognition component. The entity recognizer identifies non-overlapping labelled spans of tokens.
🌐
GeeksforGeeks
geeksforgeeks.org › python › python-named-entity-recognition-ner-using-spacy
Python | Named Entity Recognition (NER) using spaCy - GeeksforGeeks
July 12, 2025 - By tagging these entities, we can transform raw text into structured data that can be analyzed, indexed or used in applications. ... Optimized performance: spaCy is built for high-speed text processing making it ideal for large-scale NLP tasks.
🌐
spaCy
spacy.io › usage › spacy-101
spaCy 101: Everything you need to know · spaCy Usage Documentation
A named entity is a “real-world object” that’s assigned a name – for example, a person, a country, a product or a book title. spaCy can recognize various types of named entities in a document, by asking the model for a prediction.
🌐
Sematext
sematext.com › home › blog › entity extraction with spacy
Entity Extraction with spaCy
🌐
CRAN
cran.r-project.org › web › packages › spacyr › vignettes › using_spacyr.html
A Guide to Using spacyr
If a user’s only goal is entity or noun phrase extraction, then two functions make this easy without first parsing the entire text:

spacy_extract_entity(txt)
##   doc_id           text ent_type start_id length
## 1     d2          Smith   PERSON        2      1
## 2     d2      two years     DATE        4      2
## 3     d2 North Carolina      GPE        7      2

spacy_extract_nounphrases(txt)
##   doc_id                             text  root_text start_id root_id length
## 1     d1 fast natural language processing processing        5       8      4
## 2     d2 Mr.
🌐
Textanalysisonline
textanalysisonline.com › spacy-named-entity-recognition-ner
spaCy Named Entity Recognizer (NER) - API & Demo | Text Analysis Online | TextAnalysis
🌐
spaCy
spacy.io
spaCy · Industrial-strength Natural Language Processing in Python
spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. If your application needs to process entire web dumps, spaCy is the library you want to be using.
🌐
Stack Overflow
stackoverflow.com › questions › 60621365 › spacy-extract-named-entity-relations-from-trained-model
python - Spacy Extract named entity relations from trained model - Stack Overflow
My goal is to extract the number of cases of a given disease/virus from a news article, and then later also the number of deaths. I now use this newly created model trying to find the dependencies between CASES and CARDINAL: ... import plac import spacy TEXTS = [ "Net income was $9.4 million compared to the prior year of $2.7 million.
Top answer (1 of 3, score 9)

The issue with model accuracy

No model is 100% accurate, and even a bigger model doesn't help with recognizing dates. The reported accuracy values (F-score, precision, recall) for the NER models are all around 86%.

document_string = """ 
Electronically signed : Wes Scott, M.D.; Jun 26 2010 11:10AM CST 
 The patient was referred by Dr. Jacob Austin.   
Electronically signed by Robert Clowson, M.D.; Janury 15 2015 11:13AM CST 
Electronically signed by Dr. John Douglas, M.D.; Jun 16 2017 11:13AM CST 
The patient was referred by 
Dr. Jayden Green Olivia.   
"""  

With the small model, two date items are labelled as 'PERSON':

import spacy

# Small English model; the 'en' shortcut was removed in spaCy v3
nlp = spacy.load('en_core_web_sm')
sents = nlp(document_string)
[ee for ee in sents.ents if ee.label_ == 'PERSON']
# Out:
# [Wes Scott,
#  Jun 26,
#  Jacob Austin,
#  Robert Clowson,
#  John Douglas,
#  Jun 16 2017,
#  Jayden Green Olivia]

With the larger model en_core_web_md, the results are even worse in terms of precision, as there are three misclassified entities:

nlp = spacy.load('en_core_web_md')
sents = nlp(document_string)
[ee for ee in sents.ents if ee.label_ == 'PERSON']
# Out:
#[Wes Scott,
# Jun 26,
# Jacob Austin,
# Robert Clowson,
# Janury,
# John Douglas,
# Jun 16 2017,
# Jayden Green Olivia]
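
The precision claim can be checked by hand from the two output lists above. The following is a minimal sketch, assuming the five actual names in the document are the only correct PERSON entities; the `precision` helper and the `actual_people` set are illustrative and not part of spaCy.

```python
def precision(predicted, correct):
    """Fraction of predicted entities that are actually correct."""
    true_positives = sum(1 for p in predicted if p in correct)
    return true_positives / len(predicted)

# Assumption: these five names are the only real people in document_string
actual_people = {
    "Wes Scott", "Jacob Austin", "Robert Clowson",
    "John Douglas", "Jayden Green Olivia",
}

# PERSON entities as listed in the two outputs above
small_model = ["Wes Scott", "Jun 26", "Jacob Austin", "Robert Clowson",
               "John Douglas", "Jun 16 2017", "Jayden Green Olivia"]
medium_model = ["Wes Scott", "Jun 26", "Jacob Austin", "Robert Clowson",
                "Janury", "John Douglas", "Jun 16 2017", "Jayden Green Olivia"]

small_precision = precision(small_model, actual_people)    # 5 of 7 correct
medium_precision = precision(medium_model, actual_people)  # 5 of 8 correct
print(f"small: {small_precision:.3f}, medium: {medium_precision:.3f}")
```

Both models find all five names, so recall is unchanged; only the number of false positives differs.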

I also tried other models (xx_ent_wiki_sm, en_core_web_md) and they don't bring any improvement either.

What about using rules to improve accuracy?

In this small example, not only does the document have a clear structure, but the misclassified entities are all dates. So why not combine the initial model with a rule-based component?

The good news is that in spaCy:

it's possible to combine statistical and rule-based components in a variety of ways. Rule-based components can be used to improve the accuracy of statistical models

(from https://spacy.io/usage/rule-based-matching#models-rules)

So, by following the example and using the dateparser library (a parser for human-readable dates), I've put together a rule-based component that works very well on this example:

from spacy.tokens import Span
import dateparser

def expand_person_entities(doc):
    new_ents = []
    for ent in doc.ents:
        if ent.label_ != "PERSON":
            # Non-person entities pass through unchanged
            new_ents.append(ent)
            continue
        # Only look for a title if the person is not the first token
        prev_token = doc[ent.start - 1] if ent.start != 0 else None
        if prev_token is not None and prev_token.text in ("Dr", "Dr.", "Mr", "Mr.", "Ms", "Ms."):
            # If the person is preceded by a title, include the title in the entity
            new_ents.append(Span(doc, ent.start - 1, ent.end, label=ent.label))
        elif dateparser.parse(ent.text) is None:
            # Keep the entity only if it cannot be parsed as a date
            new_ents.append(ent)
    doc.ents = new_ents
    return doc

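# Add the component after the named entity recognizer (spaCy v2 API).
# In spaCy v3, register the function with @Language.component and pass its name to add_pipe.
# nlp.remove_pipe('expand_person_entities')
nlp.add_pipe(expand_person_entities, after='ner')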

doc = nlp(document_string)
[(ent.text, ent.label_) for ent in doc.ents if ent.label_=='PERSON']
# Out:
# [('Wes Scott', 'PERSON'),
#  ('Dr. Jacob Austin', 'PERSON'),
#  ('Robert Clowson', 'PERSON'),
#  ('Dr. John Douglas', 'PERSON'),
#  ('Dr. Jayden Green Olivia', 'PERSON')]
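
The two rules in the component above (absorb a preceding title, drop anything that parses as a date) can also be tried without a spaCy pipeline. The following is a rough sketch using only the standard library: `looks_like_date` is a hypothetical stand-in for `dateparser.parse` that knows just two formats, and the preceding tokens in `candidates` are assumed for illustration rather than taken from spaCy's tokenizer.

```python
from datetime import datetime

TITLES = {"Dr", "Dr.", "Mr", "Mr.", "Ms", "Ms."}
# Hypothetical, much narrower replacement for dateparser.parse
DATE_FORMATS = ("%b %d %Y", "%B %d %Y")

def looks_like_date(text):
    """Return True if text matches a known date format, with or without a year."""
    # Also try with a dummy year appended, to cover strings such as "Jun 26"
    for candidate in (text, f"{text} 2000"):
        for fmt in DATE_FORMATS:
            try:
                datetime.strptime(candidate, fmt)
                return True
            except ValueError:
                continue
    return False

def filter_person_candidates(candidates):
    """candidates: (previous_token, entity_text) pairs labelled PERSON."""
    kept = []
    for prev_token, text in candidates:
        if prev_token in TITLES:
            # Rule 1: a preceding title becomes part of the entity
            kept.append(f"{prev_token} {text}")
        elif not looks_like_date(text):
            # Rule 2: anything that parses as a date is not a person
            kept.append(text)
    return kept

# PERSON candidates from the small model; the previous tokens are assumed
candidates = [
    (":", "Wes Scott"),
    (";", "Jun 26"),
    ("Dr.", "Jacob Austin"),
    ("by", "Robert Clowson"),
    ("Dr.", "John Douglas"),
    (";", "Jun 16 2017"),
    ("Dr.", "Jayden Green Olivia"),
]
people = filter_person_candidates(candidates)
print(people)
```

As in the original component, the title check runs first, so an entity preceded by a title is kept without being tested as a date.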
Answer 2 of 3 (score 1)

Try this:

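import spacy
en = spacy.load('en_core_web_sm')  # the 'en' shortcut was removed in spaCy v3

# Read the whole input file and run the pipeline on it
with open('input.txt') as f:
    sents = en(f.read())
# Keep only the entities labelled as people
people = [ee for ee in sents.ents if ee.label_ == 'PERSON']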
🌐
Analytics Vidhya
analyticsvidhya.com › home › named entity recognition (ner) in python with spacy
Named Entity Recognition (NER) in Python with Spacy
May 1, 2025 - It automatically identifies and categorizes named entities (e.g., persons, organizations, locations, dates) in text data. spaCy NER is valuable for information extraction, entity recognition in documents, and improving the understanding of text ...
🌐
Medium
manivannan-ai.medium.com › spacy-named-entity-recognizer-4a1eeee1d749
spaCy Named Entity Recognizer. How to extract the entity from text… | by Manivannan Murugavel | Medium
March 29, 2019 - The library is published under the MIT license and currently offers statistical neural network models for English, German, Spanish, Portuguese, French, Italian, Dutch and multi-language NER, as well as tokenization for various other languages. spaCy v2.0 features new neural models for tagging, parsing and entity recognition.