I am currently using SequenceMatcher.ratio() (from Python's difflib) in a program I am working on, and while the function itself does exactly what I need, the runtime is an issue. On the two files I'm testing with, 500 x 2000 lines, it takes about 1 minute. On the actual target documents, 20000 x 20000 lines, it will take around 4000 minutes, or roughly 3 days, as best as I can figure.
I can't use quick_ratio() or real_quick_ratio() because the accuracy of the comparisons matters. Per the documentation, both are "always at least as large as ratio()"; in other words, they are upper bounds and can report two words as more similar than ratio() would.
If anyone knows of any similar functions or other ways of approaching this problem (measuring how similar two words are, relatively quickly), I could really use the help. The only alternatives my boss and I have at the moment are multiprocessing or pushing it into a distributed environment, and just brute-forcing the slow version I have now.
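
To make the upper-bound property concrete: because quick_ratio() and real_quick_ratio() never return less than ratio(), they could in principle serve as a cheap pre-filter rather than a replacement. The sketch below is only an illustration under an assumption that is not stated above, namely that only pairs at or above some similarity cutoff are of interest. The function name similar_pairs and the cutoff value 0.8 are hypothetical. Every pair that passes the filter is still scored with the exact ratio(), so the reported values should be identical to the brute-force ones.

```python
from difflib import SequenceMatcher


def similar_pairs(words_a, words_b, cutoff=0.8):
    """Return (a, b, ratio) for every pair whose exact ratio() is >= cutoff.

    `cutoff` is a hypothetical threshold chosen for illustration only.
    """
    results = []
    matcher = SequenceMatcher()
    for b in words_b:
        # SequenceMatcher caches information about the second sequence,
        # so set it once and vary only the first sequence in the inner loop.
        matcher.set_seq2(b)
        for a in words_a:
            matcher.set_seq1(a)
            # Both quick variants are upper bounds on ratio(): if either one
            # is below the cutoff, the exact ratio() must be below it too.
            if (matcher.real_quick_ratio() >= cutoff
                    and matcher.quick_ratio() >= cutoff):
                score = matcher.ratio()  # exact value, only for survivors
                if score >= cutoff:
                    results.append((a, b, score))
    return results
```

Something like similar_pairs(lines_from_file_1, lines_from_file_2, cutoff=0.8) would be the intended call. How much time this saves depends entirely on how many pairs the cheap checks reject, and if every pair's exact ratio is needed regardless of its value, the filter does not help at all.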