spacy entity extraction example

entities labeled as MONEY, and then uses the dependency parse to find the noun phrase they are referring to – for example "Net income" → "$9.4 million". ... For more examples of how to write rule-based information extraction logic that takes ...
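The idea in this snippet, pairing a MONEY entity with the noun phrase it refers to through the dependency parse, can be sketched roughly as follows. This is a minimal sketch and not the documentation's own code: `money_relations` is a hypothetical helper, and the `Doc` is built by hand (words, heads, dependency labels and entity tags are all assumed values) so that it runs without downloading a trained pipeline. With a real pipeline, `doc = nlp(text)` would supply the parse and the entities instead.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab


def money_relations(doc):
    """Pair each MONEY entity with the subject phrase it describes (hypothetical helper)."""
    relations = []
    for ent in doc.ents:
        if ent.label_ != "MONEY":
            continue
        root = ent.root  # token of the entity whose head lies outside the entity
        if root.dep_ in ("attr", "dobj"):
            # Look for a subject attached to the left of the governing verb
            subjects = [tok for tok in root.head.lefts if tok.dep_ == "nsubj"]
            if subjects:
                subj = subjects[0]
                phrase = doc[subj.left_edge.i : subj.i + 1]
                relations.append((phrase.text, ent.text))
    return relations


# Hand-annotated example sentence (assumed parse, for illustration only)
words = ["Net", "income", "was", "$", "9.4", "million"]
spaces = [True, True, True, False, True, False]
heads = [1, 2, 2, 5, 5, 2]  # absolute index of each token's head
deps = ["compound", "nsubj", "ROOT", "quantmod", "compound", "attr"]
ents = ["O", "O", "O", "B-MONEY", "I-MONEY", "I-MONEY"]

doc = Doc(Vocab(), words=words, spaces=spaces, heads=heads, deps=deps, ents=ents)
relations = money_relations(doc)
print(relations)
```

Only the "subject + verb + amount" pattern is handled here; amounts inside prepositional phrases would need an extra branch.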

Medium

medium.com › @sanskrutikhedkar09 › mastering-information-extraction-from-unstructured-text-a-deep-dive-into-named-entity-recognition-4aa2f664a453

Mastering Information Extraction from Unstructured Text: A Deep Dive into Named Entity Recognition with spaCy | by Sanskrutikhedkar | Medium

October 27, 2023 - Named Entity Recognition (NER): SpaCy can identify named entities in text, such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.

Videos

02:54

YouTube

How to Extract NER (Named Entity Recognition) Using Spacy - YouTube

Clinical Named Entity Recognition in Python with Spacy - YouTube

February 23, 2022

23:51

YouTube

Custom NER with spaCy v3 Tutorial | Free NER Data Annotation | ...

December 30, 2021

youtube.com

Entity Recognition Extract information from Job posting using ...

15:40

YouTube

How to Train a spaCy NER model (Named Entity Recognition for DH ...

December 4, 2020

spaCy

spacy.io › usage › spacy-101

spaCy 101: Everything you need to know · spaCy Usage Documentation

Whenever possible, spaCy tries ... also encodes all strings to hash values – in this case for example, “coffee” has the hash 3197928453018144401....
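A minimal sketch of the string-to-hash lookup mentioned in this snippet, assuming spaCy is installed. It uses a blank English pipeline, so no trained model is needed, and the sample sentence is an assumption.

```python
import spacy

# A blank pipeline only tokenizes, which is enough to populate the vocabulary
nlp = spacy.blank("en")
doc = nlp("I love coffee")

# The StringStore maps in both directions: string -> hash and hash -> string
coffee_hash = nlp.vocab.strings["coffee"]
coffee_text = nlp.vocab.strings[coffee_hash]

print(coffee_hash, coffee_text)
```

The reverse lookup works only because "coffee" was seen while processing the text; a hash for a string the vocabulary has never stored cannot be resolved back.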

Sematext

sematext.com › home › blog › entity extraction with spacy

Entity Extraction with spaCy

Kaggle

kaggle.com › code › curiousprogrammer › entity-extraction-and-classification-using-spacy

Entity Extraction and Classification using SpaCy

Robertorocha

robertorocha.info › how-to-extract-entities-from-raw-text-with-spacy-3-approaches-using-canadian-data

How to extract entities from raw text with Spacy: 3 approaches using Canadian data – Roberto Rocha

TL;DR: Use the en_core_web_trf transformer model with Spacy to get much more accurate named entity recognition with multilingual text.

spaCy

spacy.io › api › entityrecognizer

EntityRecognizer · spaCy API Documentation

The transition-based algorithm used encodes certain assumptions that are effective for “traditional” named entity recognition tasks, but may not be a good fit for every span identification problem. Specifically, the loss function optimizes for whole entity accuracy, so if your inter-annotator ...

spaCy

spacy.io › universe › project › video-spacys-ner-model-alt

Named Entity Recognition (NER) using spaCy · spaCy Universe

spaCy is a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency parsing, word vectors and more.

Analytics Vidhya

analyticsvidhya.com › home › named entity recognition (ner) in python with spacy

Named Entity Recognition (NER) in Python with Spacy

May 1, 2025 - The Indian Space Research Organisation ORG the national space agency ORG India GPE Bengaluru GPE Department of Space ORG India GPE ISRO ORG DOS ORG · So, now we can see that all the Named Entities in this particular text are extracted.

Quanteda

spacyr.quanteda.io › reference › spacy_extract_entity.html

Extract named entities from texts using spaCy — spacy_extract_entity • spacyr

spacy_extract_entity( x, output = c("data.frame", "list"), type = c("all", "named", "extended"), multithread = TRUE, ... ) x · a character object or a TIF-compliant corpus data.frame (see https://github.com/ropenscilabs/tif) output · type of returned object, either "list" or "data.frame".

GeeksforGeeks

geeksforgeeks.org › python › python-named-entity-recognition-ner-using-spacy

Python | Named Entity Recognition (NER) using spaCy - GeeksforGeeks

July 12, 2025 - Customizability: We can train custom models or manually define new entities. Here is the step by step procedure to do NER using spaCy:

Towards Data Science

towardsdatascience.com › home › latest › named entity recognition with spacy and the mighty roberta

Named Entity Recognition with Spacy and the Mighty roBERTa | Towards Data Science

March 5, 2025 - The first function _print_entities_ is used to perform the named entity extraction from a given text and pipeline (traditional spaCy or spaCy transformer in our case).

Stack Overflow

stackoverflow.com › questions › 48200524 › named-entity-recognition-in-spacy

python - Named entity recognition in Spacy - Stack Overflow

The issue with model accuracy

The problem with all models is that none of them reaches 100% accuracy, and even using a bigger model doesn't help to recognize dates. Here are the accuracy values (F-score, precision, recall) for the NER models; they are all around 86%.

document_string = """ 
Electronically signed : Wes Scott, M.D.; Jun 26 2010 11:10AM CST 
 The patient was referred by Dr. Jacob Austin.   
Electronically signed by Robert Clowson, M.D.; Janury 15 2015 11:13AM CST 
Electronically signed by Dr. John Douglas, M.D.; Jun 16 2017 11:13AM CST 
The patient was referred by 
Dr. Jayden Green Olivia.   
"""

With the small model, two date items are labelled as 'PERSON':

import spacy

# The 'en' shortcut no longer loads in spaCy v3; use the full package name
nlp = spacy.load('en_core_web_sm')
sents = nlp(document_string)
[ee for ee in sents.ents if ee.label_ == 'PERSON']
# Out:
# [Wes Scott,
#  Jun 26,
#  Jacob Austin,
#  Robert Clowson,
#  John Douglas,
#  Jun 16 2017,
#  Jayden Green Olivia]

With a larger model en_core_web_md the results are even worse in terms of precision, as there are three misclassified entities.

nlp = spacy.load('en_core_web_md')
sents = nlp(document_string)
[ee for ee in sents.ents if ee.label_ == 'PERSON']
# Out:
#[Wes Scott,
# Jun 26,
# Jacob Austin,
# Robert Clowson,
# Janury,
# John Douglas,
# Jun 16 2017,
# Jayden Green Olivia]

I also tried other models (xx_ent_wiki_sm, en_core_web_md) and they don't bring any improvement either.
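The precision drop between the two outputs above can be checked by hand. The sketch below uses only the standard library; the gold set is an assumption (the five names that are genuinely people in the sample document), and matching is done on exact entity text.

```python
# Assumed gold standard: the five real person names in document_string
gold = {"Wes Scott", "Jacob Austin", "Robert Clowson",
        "John Douglas", "Jayden Green Olivia"}

# PERSON entities as reported above for each model
predicted_sm = ["Wes Scott", "Jun 26", "Jacob Austin", "Robert Clowson",
                "John Douglas", "Jun 16 2017", "Jayden Green Olivia"]
predicted_md = ["Wes Scott", "Jun 26", "Jacob Austin", "Robert Clowson",
                "Janury", "John Douglas", "Jun 16 2017", "Jayden Green Olivia"]


def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F1 over entity strings."""
    predicted = set(predicted)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


p_sm, r_sm, f_sm = precision_recall_f1(predicted_sm, gold)
p_md, r_md, f_md = precision_recall_f1(predicted_md, gold)
print(f"small:  precision={p_sm:.3f} recall={r_sm:.3f} f1={f_sm:.3f}")
print(f"medium: precision={p_md:.3f} recall={r_md:.3f} f1={f_md:.3f}")
```

Both models find every real person (recall 1.0), so the difference is entirely in precision: 5 of 7 predictions are correct for the small model versus 5 of 8 for the medium one.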

What about using rules to improve accuracy?

In this small example, not only does the document have a clear structure, but the misclassified entities are also all dates. So why not combine the initial model with a rule-based component?

The good news is that in Spacy:

it's possible to combine statistical and rule-based components in a variety of ways. Rule-based components can be used to improve the accuracy of statistical models

(from https://spacy.io/usage/rule-based-matching#models-rules)

So, by following the example and using the dateparser library (a parser for human readable dates) I've put together a rule-based component that works very well on this example:

import dateparser
from spacy.language import Language
from spacy.tokens import Span

TITLES = ("Dr", "Dr.", "Mr", "Mr.", "Ms", "Ms.")

# spaCy v3 registers custom components by name; in v2, drop the decorator
# and pass the function itself to nlp.add_pipe
@Language.component("expand_person_entities")
def expand_person_entities(doc):
    new_ents = []
    for ent in doc.ents:
        if ent.label_ != "PERSON":
            new_ents.append(ent)
            continue
        # If the person is preceded by a title, include the title in the entity
        if ent.start != 0 and doc[ent.start - 1].text in TITLES:
            new_ents.append(Span(doc, ent.start - 1, ent.end, label=ent.label))
        # If the entity can be parsed as a date, it's not a person: drop it
        elif dateparser.parse(ent.text) is None:
            new_ents.append(ent)
    doc.ents = new_ents
    return doc

# Add the component after the named entity recognizer
# nlp.remove_pipe('expand_person_entities')
nlp.add_pipe("expand_person_entities", after="ner")

doc = nlp(document_string)
[(ent.text, ent.label_) for ent in doc.ents if ent.label_=='PERSON']
# Out:
# [('Wes Scott', 'PERSON'),
#  ('Dr. Jacob Austin', 'PERSON'),
#  ('Robert Clowson', 'PERSON'),
#  ('Dr. John Douglas', 'PERSON'),
#  ('Dr. Jayden Green Olivia', 'PERSON')]
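The "if it parses as a date, it's not a person" rule can also be tried out without spaCy or dateparser. The sketch below swaps in `datetime.strptime` from the standard library, so it is a much stricter check than dateparser and not a replacement for it. `looks_like_date` and the `DATE_FORMATS` list are hypothetical, and the month directives assume an English locale.

```python
from datetime import datetime

# Hypothetical format list; %b and %B assume English month names
DATE_FORMATS = ("%b %d %Y", "%B %d %Y")


def looks_like_date(text):
    """Return True if text matches a known format, with or without a year."""
    # Try the text as-is, then with a padded year so "Jun 26" can match
    for candidate in (text, f"{text} 2000"):
        for fmt in DATE_FORMATS:
            try:
                datetime.strptime(candidate, fmt)
                return True
            except ValueError:
                continue
    return False


# PERSON candidates as produced by the en_core_web_md run above
candidates = ["Wes Scott", "Jun 26", "Jacob Austin", "Robert Clowson",
              "Janury", "John Douglas", "Jun 16 2017", "Jayden Green Olivia"]
people = [name for name in candidates if not looks_like_date(name)]
print(people)
```

The misspelled "Janury" survives this strict filter because it matches none of the listed formats, which is one reason a more tolerant parser, or a fuzzy month match, may be preferable in practice.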

2 of 3

Try this:

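import spacy

# The 'en' shortcut no longer loads in spaCy v3; use the full package name
en = spacy.load('en_core_web_sm')

# Read the whole file, closing it once done
with open('input.txt') as f:
    sents = en(f.read())
people = [ee for ee in sents.ents if ee.label_ == 'PERSON']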

GitHub

github.com › explosion › spaCy › discussions › 12451

Entity Recognition from Search Queries · explosion/spaCy · Discussion #12451

Author explosion