Are you committed to using NLTK/Python? I ran into the same problems as you, and had much better results using Stanford's named-entity recognizer: http://nlp.stanford.edu/software/CRF-NER.shtml. The process for training the classifier using your own data is very well-documented in the FAQ.

If you really need to use NLTK, I'd hit up the mailing list for some advice from other users: http://groups.google.com/group/nltk-users.

Hope this helps!

Answer from jjdubs on Stack Overflow
Python Programming
pythonprogramming.net › named-entity-recognition-nltk-tutorial
Named Entity Recognition with NLTK
There are two major options with NLTK's named entity recognition: either recognize all named entities, or recognize named entities as their respective type, like people, places, locations, etc. ... import nltk from nltk.corpus import state_union from nltk.tokenize import PunktSentenceTokenizer ...
Artiba
artiba.org › blog › named-entity-recognition-in-nltk-a-practical-guide
Named Entity Recognition in NLTK: A Practical Guide | Artificial Intelligence
Learn how to perform Named Entity Recognition with NLTK, including setup, code examples, and key advantages and limitations in natural language processing
MLK
machinelearningknowledge.ai › home › beginner’s guide to named entity recognition (ner) in nltk library
Beginner's Guide to Named Entity Recognition (NER) in NLTK Library - MLK - Machine Learning Knowledge
June 2, 2021 - In the output, we can see that the classifier has added category labels such as PERSON, ORGANIZATION, and GPE (geo-political entity) wherever it found a named entity. ... import nltk from nltk import word_tokenize,pos_tag text = "NASA awarded Elon Musk’s SpaceX a $2.9 billion contract to build the lunar lander." tokens = word_tokenize(text) tag=pos_tag(tokens) print(tag) ne_tree = nltk.ne_chunk(tag) print(ne_tree)
Medium
fouadroumieh.medium.com › nlp-entity-extraction-ner-using-python-nltk-68649e65e54b
NLP Entity Extraction/NER using python NLTK | by Fouad Roumieh | Medium
October 13, 2023 - Now, we have the tagged_tokens that contain both the tokens and their respective part-of-speech tags, which is a crucial preprocessing step for named entity recognition, let’s see the final step which is the actual Entity Extraction. To extract the entities all we need is to call “ne_chunk” to chunk the given list of tagged tokens: entities = nltk.ne_chunk(tagged_tokens) print(entities)
Spot Intelligence
spotintelligence.com › home › how to implement named entity recognition in python with spacy, bert, nltk & flair
How To Implement Named Entity Recognition In Python With SpaCy, BERT, NLTK & Flair
December 26, 2023 - To use the named entity recognition (NER) functionality in the NLTK library, you will need to have NLTK installed on your machine. You can install NLTK using pip, the Python package manager, ...
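The installation step this snippet alludes to is typically the following (assuming `pip` and `python` on your PATH point at the same Python 3 installation):

```shell
# Install the library itself.
pip install nltk

# Fetch the models the NER pipeline needs. On NLTK >= 3.8.2 some
# resources use *_tab / *_eng names instead, so adjust if the
# downloader complains about a missing package.
python -m nltk.downloader punkt averaged_perceptron_tagger maxent_ne_chunker words
```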
NLTK
nltk.org › howto › relextract.html
NLTK :: Sample usage for relextract
For example, assuming that we can recognize ORGANIZATIONs and LOCATIONs in text, we might want to also recognize pairs (o, l) of these kinds of entities such that o is located in l. The sem.relextract module provides some tools to help carry out a simple version of this task. The tree2semi_rel() function splits a chunk document into a list of two-member lists, each of which consists of a (possibly empty) string followed by a Tree (i.e., a Named Entity):
Nanonets
nanonets.com › blog › named-entity-recognition-with-nltk-and-spacy
A complete guide to Named Entity Recognition (NER) in 2025
January 20, 2025 - In this section, we’ll be using ... ... nltk is a leading python-based library for performing NLP tasks such as preprocessing text data, modelling data, parts of speech tagging, evaluating models and more....
Wellsr
wellsr.com › python › python-named-entity-recognition-with-nltk-and-spacy
Python Named Entity Recognition with NLTK & spaCy - wellsr.com
August 14, 2020 - To download and install all the modules and objects required to support the NLTK library, you’ll need to run the following command inside your Python application: ... MaxEnt Chunker which stands for maximum entropy chunker. To call the maximum entropy chunker for named entity recognition, you need to pass the parts of speech (POS) tags of a text to the ne_chunk() function of the NLTK library.
NLTK
nltk.org › book › ch07.html
7. Extracting Information from Text
Named entity recognition is a task that is well-suited to the type of classifier-based approach that we saw for noun phrase chunking. In particular, we can build a tagger that labels each word in a sentence using the IOB format, where chunks are labeled by their appropriate type.
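The IOB labelling described here can be inspected with `nltk.chunk.tree2conlltags`, which flattens a chunk tree into (word, POS, IOB-tag) triples. The tiny tree below is built by hand, so no trained model or downloaded data is needed:

```python
from nltk import Tree
from nltk.chunk import tree2conlltags

# A hand-built chunk tree: a two-token PERSON, two plain tokens,
# and a one-token GPE.
tree = Tree("S", [
    Tree("PERSON", [("Loretta", "NNP"), ("Lynch", "NNP")]),
    ("spoke", "VBD"),
    ("in", "IN"),
    Tree("GPE", [("Brooklyn", "NNP")]),
])

print(tree2conlltags(tree))
# [('Loretta', 'NNP', 'B-PERSON'), ('Lynch', 'NNP', 'I-PERSON'),
#  ('spoke', 'VBD', 'O'), ('in', 'IN', 'O'), ('Brooklyn', 'NNP', 'B-GPE')]
```

B- marks the beginning of a chunk, I- a continuation, and O a token outside any chunk.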
Top answer (1 of 7, 36 votes)

nltk.ne_chunk returns a nested nltk.tree.Tree object, so you would have to traverse the Tree object to get to the NEs.

Take a look at Named Entity Recognition with Regular Expression: NLTK

>>> from nltk import ne_chunk, pos_tag, word_tokenize
>>> from nltk.tree import Tree
>>> 
>>> def get_continuous_chunks(text):
...     chunked = ne_chunk(pos_tag(word_tokenize(text)))
...     continuous_chunk = []
...     current_chunk = []
...     for subtree in chunked:
...         if type(subtree) == Tree:
...             # still inside a named entity: collect its tokens
...             current_chunk.append(" ".join(token for token, pos in subtree.leaves()))
...         elif current_chunk:
...             # just left a named entity: flush what was collected
...             named_entity = " ".join(current_chunk)
...             if named_entity not in continuous_chunk:
...                 continuous_chunk.append(named_entity)
...             current_chunk = []
...     if current_chunk:
...         # flush an entity that ends the sentence
...         named_entity = " ".join(current_chunk)
...         if named_entity not in continuous_chunk:
...             continuous_chunk.append(named_entity)
...     return continuous_chunk
... 
>>> my_sent = "WASHINGTON -- In the wake of a string of abuses by New York police officers in the 1990s, Loretta E. Lynch, the top federal prosecutor in Brooklyn, spoke forcefully about the pain of a broken trust that African-Americans felt and said the responsibility for repairing generations of miscommunication and mistrust fell to law enforcement."
>>> get_continuous_chunks(my_sent)
['WASHINGTON', 'New York', 'Loretta E. Lynch', 'Brooklyn']


>>> my_sent = "How's the weather in New York and Brooklyn"
>>> get_continuous_chunks(my_sent)
['New York', 'Brooklyn']
Answer 2 of 7 (22 votes)

You can also extract the label of each named entity in the text using this code:

import nltk

# `sentence` holds the same example text used in the answer above
for sent in nltk.sent_tokenize(sentence):
    for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
        if hasattr(chunk, 'label'):
            print(chunk.label(), ' '.join(c[0] for c in chunk))

Output:

GPE WASHINGTON
GPE New York
PERSON Loretta E. Lynch
GPE Brooklyn

You can see that WASHINGTON, New York, and Brooklyn are tagged GPE, which stands for geo-political entity, and Loretta E. Lynch is tagged PERSON.

Analytics India Magazine
analyticsindiamag.com › aim › deep tech › guide to named entity recognition with spacy and nltk
Guide to Named Entity Recognition with spaCy and NLTK | AIM
July 15, 2024 - In NLP data preprocessing, tagging plays a crucial part. Named Entity Recognition is a type of tagging that assigns a tag to each named entity.
YouTube
youtube.com › watch
Custom Named Entity Recognition using Python - YouTube
#machinelearning #naturallanguageprocessing #python
Named-entity recognition seeks to locate and classify named entities mentioned in unstructured text into p...
Published   August 9, 2020
DataCamp
campus.datacamp.com › courses › introduction-to-natural-language-processing-in-python › named-entity-recognition
NER with NLTK | Python
# Tokenize the article into sentences: sentences sentences = ____ # Tokenize each sentence into words: token_sentences token_sentences = [____ for sent in ____] # Tag each tokenized sentence into parts of speech: pos_sentences pos_sentences = [____ for sent in ____] # Create the named entity chunks: chunked_sentences chunked_sentences = ____ # Test for stems of the tree with 'NE' tags for sent in chunked_sentences: for chunk in sent: if hasattr(chunk, "label") and ____ == "____": print(chunk)
Towards Data Science
towardsdatascience.com › home › latest › quick guide to entity recognition and geocoding with r
Named Entity Recognition with NLTK and SpaCy
March 5, 2025 - Now that I have a list of locations that occur at least once in the text, I can also check to see if there are any occurrences that the annotator has missed. I can do this in a number of ways, but I have included a function that simplifies the process, which requires a dataframe, a vector of locations to look for, and the column name where the text is located.