You are not really doing time-series prediction. You are trying to predict each element of Y from a single element of X, which means that you are just solving a standard kernelized regression problem.
Another problem: when computing the RBF kernel over a range of vectors [[0],[1],[2],...], you will get a band of positive values along the diagonal of the kernel matrix, while values far from the diagonal will be close to zero. The test-set portion of your kernel matrix is far from the diagonal and will therefore be very close to zero, which would cause all of the SVR predictions to be close to the bias term.
For time series prediction I suggest building the training set as
x[0]=Y[0:K]; y[0]=Y[K]
x[1]=Y[1:K+1]; y[1]=Y[K+1]
...
that is, try to predict future elements of the sequence from a window of previous elements.
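The windowing scheme above can be sketched in a few lines of NumPy (the helper name and the toy series are mine, purely for illustration):

```python
import numpy as np

def make_windows(y, k):
    """Build (input, target) pairs from a 1-D series: each input is a
    window of k past values and the target is the value that follows it."""
    X = np.array([y[i:i + k] for i in range(len(y) - k)])
    targets = np.array([y[i + k] for i in range(len(y) - k)])
    return X, targets

series = np.arange(10, dtype=float)  # toy series 0..9
X, Y = make_windows(series, k=3)
# X[0] = [0, 1, 2], Y[0] = 3; X[1] = [1, 2, 3], Y[1] = 4; ...
```

Each row of `X` is then an ordinary feature vector, so any regressor (SVR included) can be trained on it.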
Answer from user1149913 on Stack Overflow
In the context of support vector regression, the fact that your data is a time series is mainly relevant from a methodological standpoint -- for example, you can't do a k-fold cross validation, and you need to take precautions when running backtests/simulations.
Basically, support vector regression is a discriminative regression technique much like any other discriminative regression technique. You give it a set of input vectors and associated responses, and it fits a model to try and predict the response given a new input vector. Kernel SVR, on the other hand, applies one of many transformations to your data set prior to the learning step. This allows it to pick up nonlinear trends in the data set, unlike e.g. linear regression. A good kernel to start with would probably be the Gaussian RBF -- it will have a hyperparameter you can tune, so try out a couple values. And then when you get a feeling for what's going on you can try out other kernels.
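A minimal sketch of trying a few values of the RBF hyperparameter, using scikit-learn (an assumption on my part; any SVR implementation with a Gaussian kernel works the same way), on a synthetic nonlinear data set:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic nonlinear data: y = sin(x) plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(200)

# Try a couple of values of the RBF hyperparameter gamma and compare fits.
for gamma in (0.01, 0.1, 1.0):
    model = SVR(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)
    print(gamma, round(model.score(X, y), 3))
```

Too small a `gamma` underfits (the kernel is nearly flat); larger values let the model capture the nonlinearity, which is exactly what a linear regression could not do here.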
With a time series, an important step is determining what your "feature vector" x will be; each component of x is called a "feature" and can be calculated from present or past data, and the response y will be the future change over some time period of whatever you're trying to predict. Take a stock for example. You have prices over time. Maybe your features are a.) the 200MA-30MA spread and b.) 20-day volatility, so you calculate each x at each point in time, along with the corresponding y, the (say) following week's return on that stock. Thus, your SVR learns how to predict the following week's return based on the present MA spread and 20-day vol. (This strategy won't work, so don't get too excited ;)).
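The feature/target construction described above might look like this in pandas; the simulated prices, the window lengths, and the 5-trading-day "week" are all assumptions for illustration, not a recipe:

```python
import numpy as np
import pandas as pd

# Simulated daily prices (a stand-in for real market data).
rng = np.random.default_rng(1)
prices = pd.Series(100 * np.exp(np.cumsum(0.001 + 0.01 * rng.standard_normal(400))))
returns = prices.pct_change()

df = pd.DataFrame({
    # Feature a): spread between the 200-day and 30-day moving averages.
    "ma_spread": prices.rolling(200).mean() - prices.rolling(30).mean(),
    # Feature b): 20-day volatility of daily returns.
    "vol_20d": returns.rolling(20).std(),
    # Response y: return over the following 5 trading days.
    "fwd_week_ret": prices.shift(-5) / prices - 1,
}).dropna()

X, y = df[["ma_spread", "vol_20d"]].values, df["fwd_week_ret"].values
```

`X` and `y` are then in exactly the form any regression library expects.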
If the papers you read were too difficult, you probably don't want to try to implement an SVM yourself, as it can be complicated. IIRC there is a "kernlab" package for R that has a Kernel SVM implementation with a number of kernels included, so that would provide a quick way to get up and running.
My personal answer to the question as asked is "yes". You may view it as a pro or a con that there are an infinite number of choices of features to describe the past. Try to pick features that correspond to how you might concisely describe to someone what the market has just done [e.g. "the price is at 1.4" tells you nothing if it is not related to some other number]. As for the target of the SVM, the simplest are the difference in prices and the ratio of prices for two consecutive days. As these correspond directly to the fate of a hypothetical trade, they seem good choices.
I have to pedantically disagree with the first statement by Jason: you can do k-fold cross-validation in situations like that described by raconteur and it is useful (with a proviso I will explain). The reason it is statistically valid is that the instances of the target in this case have no intrinsic relationship: they are disjoint differences or ratios. If you choose instead to use data at higher resolution than the scale of the target, there would be reason for concern that correlated instances might appear in the training set and validation set, which would compromise the cross-validation (by contrast, when applying the SVM you will have no instances available whose targets overlap the one you are interested in).
The thing that does reduce the effectiveness of cross-validation is if the behavior of the market is changing over time. There are two possible ways to deal with this. The first is to incorporate time as a feature (I've not found this very useful, perhaps because the values of this feature in the future are all new). A well-motivated alternative is to use walk-forward validation (which means training your methodology on a sliding window of time and testing it on the period just after this window). If behaviour is changing over time, the saying attributed to Niels Bohr, "Prediction is very difficult, especially about the future", is especially appropriate. There is some evidence in the literature that the behaviour of financial markets does change over time, generally becoming more efficient, which typically means that successful trading systems deteriorate in performance over time.
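Walk-forward validation can be sketched as follows; the synthetic data, window lengths, and step size are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic regression problem standing in for real market features/targets.
rng = np.random.default_rng(2)
X = rng.standard_normal((300, 4))
y = X @ np.array([0.5, -0.2, 0.1, 0.0]) + 0.1 * rng.standard_normal(300)

# Train on a sliding window, test on the period immediately after it.
train_len, test_len, scores = 100, 20, []
for start in range(0, len(X) - train_len - test_len + 1, test_len):
    tr = slice(start, start + train_len)
    te = slice(start + train_len, start + train_len + test_len)
    model = SVR(kernel="rbf").fit(X[tr], y[tr])
    scores.append(model.score(X[te], y[te]))

print(f"mean out-of-sample R^2 over {len(scores)} folds: {np.mean(scores):.3f}")
```

Because each test block lies strictly after its training window, this mimics how the model would actually be deployed, and a downward drift in the fold scores over time is itself evidence that the underlying behaviour is changing.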
Good luck!
Here is a very good article: http://machinelearningmastery.com/time-series-forecasting-supervised-learning/
In a few words, define a window of size n and that is the size of your feature vector. Reshape the dataset and play.
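One vectorized way to do that reshaping in NumPy (using `sliding_window_view`, which is my choice here; a plain loop works just as well):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

series = np.arange(12, dtype=float)
n = 4                                 # window size = length of the feature vector
windows = sliding_window_view(series, n + 1)
X, y = windows[:, :n], windows[:, n]  # inputs: n past values; target: the next one
```

After this reshape the problem is ordinary supervised regression, so you can "play" with any model you like.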
I used this to solve the value error:
model = svm.SVR().fit(np.transpose(np.matrix(df['Dates'])),np.transpose(np.matrix(df['sie'])))
More Info: https://stackoverflow.com/questions/30813044/sklearn-found-arrays-with-inconsistent-numbers-of-samples-when-calling-linearre
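The error arises because scikit-learn expects a 2-D feature matrix, not a 1-D array; `reshape(-1, 1)` is the more idiomatic fix than wrapping with `np.matrix` (which is deprecated). A sketch with stand-in data, since the original `df['Dates']` and `df['sie']` columns aren't shown:

```python
import numpy as np
from sklearn import svm

dates = np.arange(30, dtype=float)   # stand-in for df['Dates']
values = 2.0 * dates + 1.0           # stand-in for df['sie']

# reshape(-1, 1) turns the 1-D array into an (n_samples, 1) feature matrix.
model = svm.SVR().fit(dates.reshape(-1, 1), values)
pred = model.predict(np.array([[15.0]]))
```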
