The NER label scheme varies by language, and depends heavily on what kind of training data was available. You need to check the "Label Scheme" entry on the model page, which should have an NER section. For example, here's Japanese.
Videos
In spaCy 3 do nlp.get_pipe('ner').labels.
You can do the following :
nlp.entity.labels
It outputs the following tuple :
('CARDINAL',
'DATE',
'EVENT',
'FAC',
'GPE',
'LANGUAGE',
'LAW',
'LOC',
'MONEY',
'NORP',
'ORDINAL',
'ORG',
'PERCENT',
'PERSON',
'PRODUCT',
'QUANTITY',
'TIME',
'WORK_OF_ART')
Most labels have definitions you can access using spacy.explain(label).
For NORP: "Nationalities or religious or political groups"
For more details you would need to look into the annotation guidelines for the resources listed in the model documentation under https://spacy.io/models/.
The whole list is as below. As of February 2023, there are 18 labels in the English model.
PERSON: People, including fictional.
NORP: Nationalities or religious or political groups.
FAC: Buildings, airports, highways, bridges, etc.
ORG: Companies, agencies, institutions, etc.
GPE: Countries, cities, states.
LOC: Non-GPE locations, mountain ranges, bodies of water.
PRODUCT: Objects, vehicles, foods, etc. (Not services.)
EVENT: Named hurricanes, battles, wars, sports events, etc.
WORK_OF_ART: Titles of books, songs, etc.
LAW: Named documents made into laws.
LANGUAGE: Any named language.
DATE: Absolute or relative dates or periods.
TIME: Times smaller than a day.
PERCENT: Percentage, including ”%“.
MONEY: Monetary values, including unit.
QUANTITY: Measurements, as of weight or distance.
ORDINAL: “first”, “second”, etc.
CARDINAL: Numerals that do not fall under another type.
Source: Mikael Davidsson on Medium.
the entity label is an attribute of the token (see here)
import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_lg')
s = "His friend Nicolas is here."
doc = nlp(s)
print([t.text if not t.ent_type_ else t.ent_type_ for t in doc])
# ['His', 'friend', 'PERSON', 'is', 'here', '.']
print(" ".join([t.text if not t.ent_type_ else t.ent_type_ for t in doc]) )
# His friend PERSON is here .
Edit:
In order to handle cases were entities can span several words the following code can be used instead:
s = "His friend Nicolas J. Smith is here with Bart Simpon and Fred."
doc = nlp(s)
newString = s
for e in reversed(doc.ents): #reversed to not modify the offsets of other entities when substituting
start = e.start_char
end = start + len(e.text)
newString = newString[:start] + e.label_ + newString[end:]
print(newString)
#His friend PERSON is here with PERSON and PERSON.
Update:
Jinhua Wang brought to my attention that there is now a more built-in and simpler way to do this using the merge_entities pipe. See Jinhua's answer below.
A more elegant modification to @DBaker's solution above when entities can span several words:
import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_lg')
nlp.add_pipe("merge_entities")
s = "His friend Nicolas J. Smith is here with Bart Simpon and Fred."
doc = nlp(s)
print([t.text if not t.ent_type_ else t.ent_type_ for t in doc])
# ['His', 'friend', 'PERSON', 'is', 'here', 'with', 'PERSON', 'and', 'PERSON', '.']
print(" ".join([t.text if not t.ent_type_ else t.ent_type_ for t in doc]) )
# His friend PERSON is here with PERSON and PERSON .
You can check the documentation on Spacy here. It uses the built in Pipeline for the job and has good support for multiprocessing. I believe this is the officially supported way to replace entities by their tags.