Is spaCy really ready for production?
Hi,
First — I'm sorry to hear that this has been a bad experience. Thanks for taking the time to write this up. There's often a shortage of this sort of challenging feedback, and it's necessary for improvement.
I will say that the versions over the last couple of months have been a bit less stable than we want. This is an awkward transitional period: we're getting spaCy ready for neural networks in 2.0, and fixing a lot of long-standing issues. We're also getting ready to push v1.7.0.
In response to your specific comments:
- spaCy tries to use as much of the same code for training and runtime as possible, so there are no plans to have a separate "runtime only" mode.
- What isn't compiling out of the box?
- I've never built a Debian package myself. I think this makes sense, and I should add it to our test service.
- Here's how we're fixing the model download problem: we're introducing thin wrappers around the data assets so that you'll be able to install them as pip packages. You'll therefore be able to serve them the same way you serve your other pip dependencies. You can also point pip to a location on your file system, run a pip service, etc. But this isn't the only way to get the data installed.
  Ultimately spaCy just needs to find the files on the filesystem somewhere. I had imagined that production users would either copy the data inside the spaCy package themselves, or create a package of spaCy that included the data they needed by default. I realise this wasn't clear, but it's hard to know what a different production environment might need. You can also point spaCy to a location on your file system with the util.set_data_path() function.
- I've been installing v1.6.0 from PyPI regularly, without issue, so I'm not sure what's different in your setup. I wonder whether one of us could be pulling from a cached version? We do have a CI process which builds an sdist on a server, and then the test installs from there. We plan to keep improving the test infrastructure.
- It's tricky to interpret what semver should mean when the data changes. I think if we bumped the major version for something like a change to the stop words, we'd have no way to communicate deeper breaking changes. This is especially true as more languages are added. If we change an Estonian lemmatizer rule, should we increment the version?
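To make the "just needs to find the files on the filesystem" point concrete, here is a minimal sketch of that kind of lookup. It is not spaCy's actual code: the function names, the MODEL_DATA_PATH environment variable, and the data/ subdirectory are all hypothetical, chosen only to illustrate the idea of an explicit override falling back to data bundled with the package.

```python
import os
from pathlib import Path
from typing import Iterable, List, Optional


def default_search_paths(package_dir: Path, env_var: str = "MODEL_DATA_PATH") -> List[Path]:
    """Build the search order: explicit override first, then bundled data.

    Both the environment variable name and the "data" subdirectory are
    assumptions made for this sketch, not spaCy conventions.
    """
    paths: List[Path] = []
    override = os.environ.get(env_var)
    if override:
        paths.append(Path(override))
    paths.append(Path(package_dir) / "data")
    return paths


def find_model_dir(name: str, search_paths: Iterable[Path]) -> Optional[Path]:
    """Return the first existing directory called `name` under the search paths."""
    for base in search_paths:
        candidate = Path(base) / name
        if candidate.is_dir():
            return candidate
    return None
```

With a layout like this, a production deployment could either ship the data inside the package directory or set the override to a shared location, without changing any calling code.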
Thanks again, Matthew Honnibal

