🌐
spaCy
spacy.io
spaCy · Industrial-strength Natural Language Processing in Python
spaCy is a free open-source library for Natural Language Processing in Python. It features NER, POS tagging, dependency parsing, word vectors and more.
API
Usage
Models: Downloadable trained pipelines and weights for spaCy
Universe

software library for natural language processing

spaCy (/speɪˈsiː/ spay-SEE) is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython. The library is published under the MIT license and its main … Wikipedia
Factsheet
spaCy
Original author Matthew Honnibal
Developers Explosion AI, various
🌐
PyPI
pypi.org › project › spacy
spacy · PyPI
To install additional data tables for lemmatization and normalization you can run pip install spacy[lookups] or install spacy-lookups-data separately. The lookups package is needed to create blank models with lemmatization data, and to lemmatize in languages that don't yet come with pretrained models and aren't powered by third-party libraries. When using pip it is generally recommended to install packages in a virtual environment to avoid modifying system state:
python -m venv .env
source .env/bin/activate
pip install -U pip setuptools wheel
pip install spacy
      » pip install spacy
    
Published   Nov 17, 2025
Version   3.8.11
Homepage   https://spacy.io
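
As a quick sanity check after installing, a minimal first run might look like the sketch below (it assumes the en_core_web_sm pipeline has already been downloaded with python -m spacy download en_core_web_sm; the sample sentence is arbitrary):

import spacy

# Load the small English pipeline (must already be downloaded)
nlp = spacy.load("en_core_web_sm")

# Process a sample sentence and inspect tokens and entities
doc = nlp("spaCy is an open-source NLP library written in Python and Cython.")
print([token.text for token in doc])
print([(ent.text, ent.label_) for ent in doc.ents])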
Discussions

python - Spacy nlp = spacy.load("en_core_web_lg") - Stack Overflow
Also look at stackoverflow.com/questions/56446478/spacy-en-model-issue/… - I just answered a similar question. ... On a Linux system, run the commands below in a terminal (skip the first two if you are not using a virtual environment):
python -m venv .env
source .env/bin/activate
...
More on stackoverflow.com
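
The error behind questions like this is usually that spacy.load cannot find the pipeline package in the current environment. One common pattern, sketched here rather than taken from the thread itself, is to fall back to a programmatic download via spacy.cli.download (part of spaCy's public API); en_core_web_lg is the large English pipeline:

import spacy
from spacy.cli import download

MODEL = "en_core_web_lg"

try:
    nlp = spacy.load(MODEL)
except OSError:
    # Pipeline is not installed in this environment; fetch it, then retry
    download(MODEL)
    nlp = spacy.load(MODEL)

print(nlp.pipe_names)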
🌐 stackoverflow.com
Should I learn Python if I want to use spaCy?
I use {spacyr} + {reticulate} for NLP work that involves using spaCy. This is because I like support for Python from the RStudio IDE and rmarkdown. … More on reddit.com
🌐 r/rstats
8
3
July 5, 2020
Install spacy in python
Hey, I tried to install spacy in Python. It worked out and I can find it in the pip list in my terminal. But now if I try to run my Python programme there is an error that says “no module named spacy”. I don’t know what to do because it is installed. Have you any idea? More on discuss.python.org
🌐 discuss.python.org
0
August 30, 2021
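
The usual cause of “no module named spacy” right after a seemingly successful pip install is that pip installed into a different interpreter or environment than the one running the program. A small check from inside Python (nothing spaCy-specific is assumed beyond the package name):

import sys

# Which interpreter is actually running this script?
print(sys.executable)

try:
    import spacy
    # spaCy is visible to this interpreter; show version and install location
    print(spacy.__version__, spacy.__file__)
except ModuleNotFoundError:
    print("spaCy is not installed for this interpreter; "
          "try: python -m pip install spacy with the same interpreter")

Installing with python -m pip install spacy ties the install to the same interpreter that will later import it.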
Is it better to use NLTK or Spacy for text pre processing?
Both serve their own purpose I guess, but also consider Stanza More on reddit.com
🌐 r/LanguageTechnology
19
13
November 3, 2022
Top answer
1 of 5
31

I stumbled across the same question and the model path can be found using the model class variable to a loaded spacy model.

For instance, having completed the model download at the command line as follows:
python -m spacy download en_core_web_sm

then within the python shell:

import spacy
model = spacy.load("en_core_web_sm")
model._path

This will show you where the model has been installed in your system.

If you want to download to a different location, I believe you can write the following at the command line:
python -m spacy.en.download en_core_web_sm --data-path /some/dir

Hope that helps
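
As a short follow-up to the answer above: the loaded object also exposes a meta dictionary with the pipeline's language, name, and version, which can be handy alongside _path when debugging installs (a sketch, assuming en_core_web_sm is installed):

import spacy

nlp = spacy.load("en_core_web_sm")

# Where the installed package data lives
print(nlp._path)

# Pipeline metadata shipped with the package
print(nlp.meta["lang"], nlp.meta["name"], nlp.meta["version"])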

2 of 5
12

I can't seem to find any evidence that spacy pays attention to the $SPACY_DATA_DIR environment variable, nor can I get the above --data-path or model.path (--model.path?) parameters to work when trying to download models to a particular place. For me this was an issue as I was trying to keep the models out of a Docker image so that they could be persisted or be updated easily without rebuilding the image.

I eventually came to the following solution for using pre-trained models:

  1. Run the download code as normal (i.e. python -m spacy download en_core_web_lg)
  2. In Python: import spacy and then nlp = spacy.load('en_core_web_lg')
  3. Now save this to the place you want it: nlp.to_disk('path/to/dir')

You can now load this from the local file via nlp=spacy.load('path/to/dir'). There's a suggestion in the documentation that you can download the models manually:

You can place the model data directory anywhere on your local file system. To use it with spaCy, simply assign it a name by creating a shortcut link for the data directory. But I can't make sense of what this means in practice (have submitted an 'issue' to spaCy).

Hope this helps anyone else trying to do something similar.
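
Put together, the three steps in that answer amount to the sketch below; the /models path is just an illustrative location (for example a mounted Docker volume), not anything spaCy prescribes:

import spacy

# 1. Load a pipeline that was downloaded the usual way
nlp = spacy.load("en_core_web_lg")

# 2. Serialize it to a directory you control (hypothetical path)
nlp.to_disk("/models/en_core_web_lg")

# 3. Later, or in another container, load it straight from that path
nlp = spacy.load("/models/en_core_web_lg")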

🌐
Real Python
realpython.com › natural-language-processing-spacy-python
Natural Language Processing With spaCy in Python – Real Python
February 1, 2025 - This free and open-source library for natural language processing (NLP) in Python has a lot of built-in capabilities and is becoming increasingly popular for processing and analyzing data in NLP.
🌐
GitHub
github.com › explosion › spaCy
GitHub - explosion/spaCy: 💫 Industrial-strength Natural Language Processing (NLP) in Python
💫 Industrial-strength Natural Language Processing (NLP) in Python - explosion/spaCy
Starred by 32.9K users
Forked by 4.6K users
Languages   Python 54.1% | MDX 31.2% | Cython 10.5% | JavaScript 2.6% | Sass 0.8% | TypeScript 0.4%
Find elsewhere
🌐
Domino Data Lab
domino.ai › blog › natural-language-in-python-using-spacy
Using spaCy for natural language processing (NLP) in Python
August 13, 2025 - Oftentimes teams turn to various libraries in Python to manage complex NLP tasks. spaCy is an open-source Python library that provides capabilities to conduct advanced natural language processing analysis and build models that can underpin document ...
🌐
GeeksforGeeks
geeksforgeeks.org › nlp › spacy-for-natural-language-processing
spaCy for Natural Language Processing - GeeksforGeeks
July 23, 2025 - spaCy is an open-source library for advanced Natural Language Processing (NLP) in Python.
🌐
Wikipedia
en.wikipedia.org › wiki › SpaCy
spaCy - Wikipedia
May 9, 2025 - spaCy (/speɪˈsiː/ spay-SEE) is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython. The library is published under the MIT license and its main developers are Matthew ...
🌐
scispacy
allenai.github.io › scispacy
scispacy | SpaCy models for biomedical text processing
scispaCy is a Python package containing spaCy models for processing biomedical, scientific or clinical text.
🌐
Penn Libraries
guides.library.upenn.edu › penntdm › python › spacy
SpaCy Package - Text Analysis - Guides at Penn Libraries
These models are powerful engines of spaCy that perform several NLP-related tasks, such as part-of-speech tagging, named entity recognition, and dependency parsing. You can download these models for the English language by executing the following code:
# [Mac Terminal]
python3 -m spacy download en_core_web_lg
python3 -m spacy download en_core_web_sm
# [Jupyter Notebook]
!python3 -m spacy download en_core_web_sm
!python3 -m spacy download en_core_web_lg
# [Conda install]
conda install -c conda-forge spacy-model-en_core_web_sm
conda install -c "conda-forge/label/broken" spacy-model-en_core_web_sm
conda install -c "conda-forge/label/cf202003" spacy-model-en_core_web_sm
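
Once one of those models is downloaded, the three tasks the guide mentions can be exercised in a few lines like the sketch below (it assumes en_core_web_sm; the example sentence is arbitrary):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# Part-of-speech tagging and dependency parsing
for token in doc:
    print(token.text, token.pos_, token.dep_, token.head.text)

# Named entity recognition
for ent in doc.ents:
    print(ent.text, ent.label_)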
🌐
Reddit
reddit.com › r/rstats › should i learn python if i want to use spacy?
r/rstats on Reddit: Should I learn Python if I want to use spaCy?
July 5, 2020 -

I am no expert by any means in R but I can get it to do what I want. However, I find myself intrigued by spaCy and its possibilities. I mainly work with unstructured text data and that will continue to be the case.

Is this something that is better in python than in R due to spaCy or have I been sucked in by glitzy marketing?

I am aware that there is a spaCy wrapper for R. Will I lose much functionality by using this? Is it better to use and understand it within python first?

Sorry for the barrage of questions I am just a bit of a novice and keen to learn.

Top answer
1 of 4
7
I agree with both of the previous commenters in that:

I use {spacyr} + {reticulate} for NLP work that involves using spaCy. This is because I like support for Python from the RStudio IDE and rmarkdown. It's miles better for R + Python work than its closest neighbor, Jupyter Notebook.

But at the same time, even as someone who also mostly uses R, I agree with Python's approach to NLP. Actually, if I'm being more precise, it's not really R vs. Python but more functional vs. object-oriented approaches to NLP. The difference is between working with words as observations represented as rows in dataframes versus working with words as objects that have special attributes.

Let me elaborate a bit more on my point #2. Consider a case where you want to know whether the head of a word is a verb. In {spacyr}, this means finding the value of the head column at the row of the word, then finding the row of the head word using that value, then finding the value of the POS column at that row. This approach involves a lot of slice() and filter() and subsetting. In Python's spaCy, on the other hand, this just involves getting the head attribute of the word and then the POS attribute of that head. If that word is stored in a variable called my_word, checking whether the head of that word is a verb is simply my_word.head.pos_ == "VERB"

So at least in the case of analyzing dependency relations, the difference is between working with indices of a data frame of words vs. working with pointers to words that are accessible as an attribute of word objects. I favor the latter object-oriented approach in Python because it's faster, less code, more readable, and more intuitive. But that's probably partly because my work involves a lot of analyzing dependency relations.

tl;dr #1: Depends on the task, but the object-oriented approach of Python is overall better IMO. But it's not an either-or: you can do all that work within the comfort of the RStudio IDE and your R-centered workflow with {reticulate} and R Markdown.

Just to be clear though, functional vs. object-oriented approach to NLP is also not really a black-and-white issue, much like "R vs. Python". In fact, the part that I love the most about the object-oriented NLP workflow in Python is list comprehension, and that's about as functional programming as you can get. Working with a list of word objects is IMO the perfect marriage between functional and object-oriented programming, as it vastly improves the quality of my work.

tl;dr #2: If you're going to learn spaCy (the Python way), learn Python's list comprehension too.
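
A minimal sketch of the head/POS check described above, assuming en_core_web_sm is installed (the sentence and variable names are illustrative only):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The dog chased the ball across the yard.")

# Each token is an object; its syntactic head is just another attribute
for my_word in doc:
    if my_word.head.pos_ == "VERB":
        print(my_word.text, "-> head:", my_word.head.text)

# The list-comprehension version the answer recommends learning
print([w.text for w in doc if w.head.pos_ == "VERB"])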
2 of 4
5
As someone who prefers R to Python 9 times out of 10, Python has much better resources for NLP, including most online tutorials. It’s not super difficult to learn and it’ll serve you well in future work. (Granted, pandas can be much more confusing than tidyverse but no one says you have to do all your data processing in Python :p) spaCy is a great library with a solid community of users. I’d definitely recommend learning it.
🌐
DataCamp
datacamp.com › cheat-sheet › spacy-cheat-sheet-advanced-nlp-in-python
spaCy Cheat Sheet: Advanced NLP in Python | DataCamp
August 1, 2021 - spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python.
🌐
Python.org
discuss.python.org › python help
Install spacy in python - Python Help - Discussions on Python.org
August 30, 2021 - Hey, I tried to install spacy in Python. It worked out and I can find it in the pip list in my terminal. But now if I try to run my Python programme there is an error that says “no module named spacy”. I don’t know what …
🌐
Pythonhumanities
spacy.pythonhumanities.com › 01_01_install_and_containers.html
1. The Basics of spaCy — Introduction to spaCy 3
!pip install spacy · !python -m spacy download en_core_web_sm · Now that we’ve installed spaCy let’s import it to make sure we installed it correctly. import spacy · Great! Now, let’s make sure we downloaded the model successfully with the command below.
🌐
Annameier
annameier.net › spacy-no-internet
How to make spaCy work on a non-networked computer (or, reflections on how much we take the internet for granted) – Anna A. Meier
March 2, 2019 - If you’ve used Python before, you’ve probably used pip, which comes with Python. Conda is an alternative, but we’ll work with pip from here on out. The basic command in bash for installing a package on an internet-connected machine is (for spaCy; note that the pip package name, unlike R package names, is not case-sensitive):
🌐
QuantInsti
blog.quantinsti.com › spacy-python
Natural Language Processing in Python Using spaCy
September 8, 2022 - spaCy is a powerful Python library for natural language processing. In this guide, we look at tokenisation, named entity recognition, pos tagging, and more using spaCy and Python.
🌐
Domino Data Lab
domino.ai › home › data science & machine learning dictionary | domino data lab › what is spacy? | domino data lab
What is spaCy? | Domino Data Lab
June 10, 2025 - spaCy is a free, open-source Python library that provides advanced capabilities for natural language processing (NLP) on large volumes of text at high speed.