spaCy: nlp = spacy.load("en_core_web_lg") (Stack Overflow)
Should I learn Python if I want to use spaCy?
I stumbled across the same question and found that the model path can be read from the _path attribute of a loaded spaCy model.
For instance, having completed the model download at the command line as follows:
python -m spacy download en_core_web_sm
then within the python shell:
import spacy
model = spacy.load("en_core_web_sm")
model._path
This will show you where the model has been installed in your system.
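Since a spaCy model is installed as an ordinary pip package, its location can also be looked up without loading the (possibly large) model into memory. The sketch below uses only the Python standard library; find_package_dir is a hypothetical helper name, not a spaCy function. It returns the package root, so it may point one level above what _path reports, because the loaded model data can live in a versioned subfolder inside the package.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_package_dir(package_name: str) -> Optional[Path]:
    """Return the directory of an installed Python package, or None.

    Hypothetical helper: spaCy model packages such as en_core_web_sm are
    regular pip packages, so the import machinery can locate them without
    importing or loading the model.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        # Not installed, or a single-file module rather than a package directory.
        return None
    return Path(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    model_dir = find_package_dir("en_core_web_sm")
    print(model_dir if model_dir else "en_core_web_sm is not installed")
```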
If you want to download to a different location, I believe you can run the following at the command line:
python -m spacy.en.download en_core_web_sm --data-path /some/dir
Hope that helps
I can't seem to find any evidence that spaCy pays attention to the $SPACY_DATA_DIR environment variable, nor can I get the above --data-path or model.path (--model.path?) parameters to work when trying to download models to a particular place. This was an issue for me because I was trying to keep the models out of a Docker image, so that they could be persisted or updated easily without rebuilding the image.
I eventually came to the following solution for using pre-trained models:
- Run the download command as normal, i.e.
python -m spacy download en_core_web_lg
- In Python:
import spacy
nlp = spacy.load('en_core_web_lg')
- Now save this to the place you want it:
nlp.to_disk('path/to/dir')
You can now load it from the local directory via nlp = spacy.load('path/to/dir').
There's a suggestion in the documentation that you can download the models manually:
"You can place the model data directory anywhere on your local file system. To use it with spaCy, simply assign it a name by creating a shortcut link for the data directory."
But I can't make sense of what this means in practice (I have submitted an issue to spaCy).
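For the Docker use case above, one way to keep the models outside the image is to save them with to_disk into a mounted volume and decide at runtime where to load from. This is a minimal sketch, assuming a volume mounted at a directory of your choice; MY_SPACY_MODEL_DIR, resolve_model, load_model and save_model are hypothetical names, and spaCy itself does not read that variable (the code below reads it). It relies only on spacy.load accepting either an installed package name or a directory path, and on nlp.to_disk.

```python
import os
from pathlib import Path
from typing import Union

# Hypothetical variable name: spaCy does not read it, this code does.
MODEL_DIR_ENV = "MY_SPACY_MODEL_DIR"


def resolve_model(name: str, env_var: str = MODEL_DIR_ENV) -> Union[str, Path]:
    """Return <env dir>/<name> if env_var is set, otherwise the package name.

    spacy.load() accepts both an installed package name and a directory
    written by nlp.to_disk(), so either return value can be passed to it.
    """
    base = os.environ.get(env_var)
    if base:
        return Path(base) / name
    return name


def load_model(name: str):
    import spacy  # imported lazily so resolve_model works without spaCy installed

    return spacy.load(resolve_model(name))


def save_model(name: str, target_dir: Union[str, Path]) -> Path:
    """Load an installed model package once and write it to target_dir/name."""
    import spacy

    nlp = spacy.load(name)
    out = Path(target_dir) / name
    out.parent.mkdir(parents=True, exist_ok=True)  # make sure the parent directory exists
    nlp.to_disk(out)
    return out
```

For example, run save_model("en_core_web_lg", "/models") once, then start the container with MY_SPACY_MODEL_DIR=/models and call load_model("en_core_web_lg"). Updating the model then means replacing the directory on the volume rather than rebuilding the image.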
Hope this helps anyone else trying to do something similar.
On a Linux system, run the commands below in a terminal. If you are not using a virtual environment, skip the first two commands:
python -m venv .env
source .env/bin/activate
pip install -U spacy
python -m spacy download en_core_web_lg
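The downloaded language model can then be found at the following location (this example is for a system-wide Python 3.6 install; the exact path depends on your Python version and environment):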
/usr/local/lib/python3.6/dist-packages/en_core_web_lg --> /usr/local/lib/python3.6/dist-packages/spacy/data/en_core_web_lg
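The path above is specific to that machine. A quick way to check where pip would place a model package for the interpreter you are actually running (system Python or the .env virtual environment) is sketched below with the standard library; in_virtualenv and expected_model_dir are hypothetical helper names, and pip can still install elsewhere, for example with --user or on distributions that patch the default install scheme.

```python
import sys
import sysconfig
from pathlib import Path


def in_virtualenv() -> bool:
    """True when running inside a venv created with `python -m venv`."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


def expected_model_dir(package_name: str) -> Path:
    """Directory where pip's default scheme installs a pure-Python package
    for the interpreter running this script."""
    return Path(sysconfig.get_paths()["purelib"]) / package_name


if __name__ == "__main__":
    print("virtual environment:", in_virtualenv())
    print("model would be installed at:", expected_model_dir("en_core_web_lg"))
```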
For more information, refer to the documentation at https://spacy.io/usage
Hope it was helpful.
Commands to install spaCy model packages (check here for details about en_core_web_lg, which is ~800MB):
python -m spacy download en
python -m spacy download en_core_web_sm
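The same download can be driven from a script, skipping it when the package is already importable. A sketch, assuming spaCy is installed in the interpreter that runs it; ensure_model and download_command are hypothetical helpers that simply shell out to python -m spacy download. Use full package names such as en_core_web_sm: the short name en relied on shortcut links, which spaCy v3 removed.

```python
import importlib.util
import subprocess
import sys
from typing import List


def download_command(model: str) -> List[str]:
    """Build `python -m spacy download <model>` for the current interpreter,
    so the package lands in the active virtual environment if there is one."""
    return [sys.executable, "-m", "spacy", "download", model]


def ensure_model(model: str) -> None:
    """Download a spaCy model package only if it cannot already be imported."""
    if importlib.util.find_spec(model) is not None:
        return  # already installed, so skip the download
    subprocess.run(download_command(model), check=True)
```

Calling ensure_model("en_core_web_sm") at the start of a script or container entrypoint downloads the model on the first run only.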
I am no expert by any means in R but I can get it to do what I want. However, I find myself intrigued by spaCy and its possibilities. I mainly work with unstructured text data and that will continue to be the case.
Is this something that is better done in Python than in R because of spaCy, or have I been sucked in by glitzy marketing?
I am aware that there is a spaCy wrapper for R. Will I lose much functionality by using it? Is it better to learn and understand spaCy within Python first?
Sorry for the barrage of questions; I am just a bit of a novice and keen to learn.

