According to the spaCy documentation for Named Entity Recognition, this is how to extract named entities:
import spacy
nlp = spacy.load('en') # install the 'en' model first (python3 -m spacy download en); 'en' is a spaCy v2 shortcut, v3 needs the full package name such as 'en_core_web_sm'
doc = nlp("Alphabet is a new startup in China")
print('Name Entity: {0}'.format(doc.ents))
Result
Name Entity: (China,)
To make spaCy treat "Alphabet" as a noun, prepend "The" to it.
doc = nlp("The Alphabet is a new startup in China")
print('Name Entity: {0}'.format(doc.ents))
Name Entity: (Alphabet, China)
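To see which type spaCy assigned to each entity, and not only its text, every span in doc.ents exposes .text and .label_. A minimal sketch, assuming a helper named entity_pairs (a hypothetical name, not part of spaCy):

```python
def entity_pairs(doc):
    """Return (text, label) pairs for every entity span in a spaCy Doc.

    `entity_pairs` is a hypothetical helper name, not a spaCy function.
    It only reads the `doc.ents`, `.text` and `.label_` attributes.
    """
    return [(ent.text, ent.label_) for ent in doc.ents]
```

With a loaded pipeline, entity_pairs(nlp("The Alphabet is a new startup in China")) returns one pair per entity. The exact labels depend on the model and its version.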
In spaCy version 3, Transformer models from Hugging Face are fine-tuned for the same tasks that spaCy offered in previous versions, with better results.
Transformers are currently (2020) the state of the art in Natural Language Processing. Roughly, we had (one-hot encoding -> word2vec -> GloVe | fastText), then (recurrent neural network, recursive neural network, gated recurrent unit, long short-term memory, bi-directional long short-term memory, etc.), and now Transformers + Attention (BERT, RoBERTa, XLNet, XLM, CTRL, ALBERT, T5, BART, GPT, GPT-2, GPT-3). This is just to give context for why you should consider Transformers; I know there is a lot I didn't mention, such as Fuzz, Knowledge Graphs and so on.
Install the dependencies:
sudo apt install libncurses5
pip install spacy-transformers --pre -f https://download.pytorch.org/whl/torch_stable.html
pip install spacy-nightly # I'm using 3.0.0rc2
Download a model:
python -m spacy download en_core_web_trf # English Transformer pipeline, RoBERTa base
Here's a list of available models.
And then use it as you would normally do:
import spacy
text = 'Type something here which can be related to something, e.g Stack Over Flow organization'
nlp = spacy.load('en_core_web_trf')
document = nlp(text)
print(document.ents)
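If the transformer package may not be installed on every machine, one option is to fall back to a smaller pipeline. This is only a sketch: load_first_available is a hypothetical helper, and it relies on spacy.load raising OSError when a package cannot be found.

```python
def load_first_available(names, loader=None):
    """Return the first pipeline in `names` that loads successfully.

    `load_first_available` is a hypothetical helper. `loader` defaults to
    `spacy.load`, which raises OSError when a package is not installed.
    """
    if loader is None:
        import spacy  # imported lazily so the helper can be tried with a stub
        loader = spacy.load
    last_error = None
    for name in names:
        try:
            return loader(name)
        except OSError as err:
            last_error = err  # package missing, try the next candidate
    raise OSError(f"none of {list(names)} could be loaded") from last_error
```

For example, nlp = load_first_available(["en_core_web_trf", "en_core_web_sm"]) prefers the transformer pipeline and uses the small one only when the first is missing.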
References:
Learn about Transformers and Attention.
Read a summary of the different Transformer architectures.
Learn about the Transformers fine-tune done by Spacy.
The statistical pipeline components like ner provide their labels under .labels:
import spacy
nlp = spacy.load("en_core_web_sm")
nlp.get_pipe("ner").labels
This might not be the most general answer, but for en_core_web_sm this returns the named entity types.
model = spacy.load("en_core_web_sm")
list(model.meta['accuracy']['ents_per_type'].keys()) # .meta is the public accessor for the _meta dict
['ORG', 'CARDINAL', 'DATE', 'GPE', 'PERSON', 'MONEY', 'PRODUCT', 'TIME', 'PERCENT', 'WORK_OF_ART', 'QUANTITY', 'NORP', 'LOC', 'EVENT', 'ORDINAL', 'FAC', 'LAW', 'LANGUAGE']
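The metadata layout differs between spaCy versions, so a lookup that tries more than one key is a little more robust. A hedged sketch, assuming the per-type scores sit under "performance" in v3 metadata and under "accuracy" in v2 metadata (entity_types_from_meta is a hypothetical helper name):

```python
def entity_types_from_meta(meta):
    """List the entity labels recorded in a pipeline's metadata dict.

    Assumption: per-type scores live under meta["performance"] (spaCy v3)
    or meta["accuracy"] (spaCy v2), in both cases keyed by "ents_per_type".
    """
    for section in ("performance", "accuracy"):
        per_type = (meta.get(section) or {}).get("ents_per_type")
        if per_type:
            return sorted(per_type)  # sorted label names
    return []  # no per-type scores recorded
```

Call it as entity_types_from_meta(nlp.meta). Since it only reports labels that were scored during evaluation, nlp.get_pipe("ner").labels remains the more direct route.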
This is expected, since spaCy cannot process a DataFrame as-is. You need to do some work before you can print the entities. Start by identifying the column that contains the text you want to run nlp on, then extract its values as a list. Let's suppose that column is named Question.
for i in df['Question'].tolist():
    doc = nlp(i)  # parse one row's text
    for entity in doc.ents:
        print(entity.text)
This will iterate over each text (row) in your dataframe and print the entities.
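If you would rather keep the entities than print them, the same loop can collect them per row. A minimal sketch, assuming nlp is an already loaded pipeline; entities_per_text is a hypothetical helper name, while nlp.pipe is spaCy's streaming interface for processing many texts:

```python
def entities_per_text(texts, nlp):
    """Return one list of (text, label) pairs per input string.

    `nlp` is any loaded spaCy pipeline. `nlp.pipe` processes the texts
    as a stream, which is usually faster than calling `nlp` row by row.
    """
    return [
        [(ent.text, ent.label_) for ent in doc.ents]
        for doc in nlp.pipe(texts)
    ]
```

The result has one entry per row, so it can be stored next to the text, for example df['entities'] = entities_per_text(df['Question'].tolist(), nlp), where the entities column name is only an illustration.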
You need to loop through the individual strings within your dataframe. The NLP parser and the entity extraction expect a string.
For example:
for row in range(len(df)):  # assumes the default 0..n-1 index
    doc = nlp(df.loc[row, "text_column"])
    for entity in doc.ents:
        print(entity.text)
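Because the parser expects a string, cells that hold missing values (typically float NaN or None in a pandas column) will raise an error. One way to guard against that, as a sketch with a hypothetical helper name iter_string_cells:

```python
def iter_string_cells(values):
    """Yield (position, text) for cells that hold non-empty strings.

    Missing values in a pandas column typically appear as float NaN or
    None, which a spaCy pipeline cannot parse, so they are skipped here.
    """
    for position, value in enumerate(values):
        if isinstance(value, str) and value.strip():
            yield position, value
```

Usage would look like: for row, text in iter_string_cells(df["text_column"].tolist()): doc = nlp(text). Note that row here is the position in the list, which matches the dataframe index only when the index is the default one.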