Ways to speed up SVR training in scikit-learn
Exporting Python Scikit-learn model to trading platform in C++
How well do Support Vector Machines scale to very large training datasets?
It should be fine with an online linear SVM that is explicitly formulated for these kinds of problems. LibLinear is one of these, and I think they have Java hooks, although I'm not sure about that. There are also Pegasos and Bottou's stochastic quasi-Newton work.
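If you're staying in scikit-learn anyway, something along these lines is the analogue (a rough sketch on made-up data, not a drop-in solution): SGDRegressor with the epsilon-insensitive loss is the SGD/Pegasos-style counterpart of a linear SVR, and partial_fit lets you stream the data in chunks instead of loading everything at once.

```python
# Sketch of an online linear SVR in scikit-learn (synthetic data, illustrative only).
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.randn(100_000, 50)                        # stand-in for a large training set
y = X @ rng.randn(50) + 0.1 * rng.randn(100_000)

scaler = StandardScaler()
# epsilon-insensitive loss == linear SVR objective, trained by SGD
model = SGDRegressor(loss="epsilon_insensitive", epsilon=0.1, alpha=1e-4)

# Stream the data in chunks; no kernel matrix is ever built or held in memory.
chunk = 10_000
for start in range(0, X.shape[0], chunk):
    Xb = X[start:start + chunk]
    yb = y[start:start + chunk]
    Xb = scaler.partial_fit(Xb).transform(Xb)     # incremental scaling, approximate for early chunks
    model.partial_fit(Xb, yb)
```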
If you're using a kernel SVM, the bottleneck is at minimum the computation of the kernel matrix, which is O(N²), but on larger datasets you can expect training to behave closer to O(N³). If you want a reference for this, google "The Tradeoffs of Large Scale Learning" by Bottou.
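If you want to see that scaling for yourself, a crude timing loop like this (synthetic data, purely illustrative) shows the fit time growing much faster than linearly as the sample count doubles:

```python
# Time a kernel SVR on growing subsets to observe the superlinear scaling.
import time
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = rng.randn(20_000, 20)
y = np.sin(X[:, 0]) + 0.1 * rng.randn(20_000)

for n in (1_000, 2_000, 4_000, 8_000):
    t0 = time.time()
    SVR(kernel="rbf", C=1.0).fit(X[:n], y[:n])
    print(f"n={n:>6}  fit time: {time.time() - t0:.1f}s")
```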
You can make approximate kernel SVMs work on large-scale data, but it involves approximating the kernel by a set of projections (random Fourier features, Nyström, and the like), and it's probably more hassle than it's worth for an initial test.
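For reference, scikit-learn does ship that projection idea as kernel approximation; a rough sketch on synthetic data (not from the original answer) would be random Fourier features followed by a fast linear SVR on the projected features:

```python
# Approximate RBF-kernel SVR: random Fourier features + linear SVR (illustrative sketch).
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVR

rng = np.random.RandomState(0)
X = rng.randn(50_000, 20)
y = np.sin(X[:, 0]) + 0.1 * rng.randn(50_000)

# 500 random projections approximate the RBF kernel; training stays linear in N.
approx_rbf_svr = make_pipeline(
    RBFSampler(gamma=0.1, n_components=500, random_state=0),
    LinearSVR(C=1.0, max_iter=5_000),
)
approx_rbf_svr.fit(X, y)
```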
As for the runtime of the linear SVM, I'm guessing blindly here: you haven't given enough information about your machine's specs or the data, but probably somewhere between an hour and a couple of days. The numbers get better if your feature vectors are sparse.
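On the sparse point: if you keep the features in a scipy CSR matrix, scikit-learn's linear solvers accept it directly and the cost tracks the number of non-zeros rather than the full dense dimensionality. A toy example (synthetic sparse data, purely illustrative):

```python
# Linear SVR-style model trained directly on a sparse CSR matrix.
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)
X = sp.random(200_000, 10_000, density=0.001, format="csr", random_state=0)
y = rng.randn(200_000)

SGDRegressor(loss="epsilon_insensitive").fit(X, y)
```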