I would expect soft-margin SVM to be better even when training dataset is linearly separable. The reason is that in a hard-margin SVM, a single outlier can determine the boundary, which makes the classifier overly sensitive to noise in the data.
In the diagram below, a single red outlier essentially determines the boundary, which is the hallmark of overfitting

To get a sense of what soft-margin SVM is doing, it's better to look at it in the dual formulation, where you can see that it has the same margin-maximizing objective (margin could be negative) as the hard-margin SVM, but with an additional constraint that each lagrange multiplier associated with support vector is bounded by C. Essentially this bounds the influence of any single point on the decision boundary, for derivation, see Proposition 6.12 in Cristianini/Shaw-Taylor's "An Introduction to Support Vector Machines and Other Kernel-based Learning Methods".
The result is that soft-margin SVM could choose decision boundary that has non-zero training error even if dataset is linearly separable, and is less likely to overfit.
Here's an example using libSVM on a synthetic problem. Circled points show support vectors. You can see that decreasing C causes classifier to sacrifice linear separability in order to gain stability, in a sense that influence of any single datapoint is now bounded by C.

Meaning of support vectors:
For hard margin SVM, support vectors are the points which are "on the margin". In the picture above, C=1000 is pretty close to hard-margin SVM, and you can see the circled points are the ones that will touch the margin (margin is almost 0 in that picture, so it's essentially the same as the separating hyperplane)
For soft-margin SVM, it's easer to explain them in terms of dual variables. Your support vector predictor in terms of dual variables is the following function.

Here, alphas and b are parameters that are found during training procedure, xi's, yi's are your training set and x is the new datapoint. Support vectors are datapoints from training set which are are included in the predictor, ie, the ones with non-zero alpha parameter.
Answer from Yaroslav Bulatov on Stack OverflowVideos
I would expect soft-margin SVM to be better even when training dataset is linearly separable. The reason is that in a hard-margin SVM, a single outlier can determine the boundary, which makes the classifier overly sensitive to noise in the data.
In the diagram below, a single red outlier essentially determines the boundary, which is the hallmark of overfitting

To get a sense of what soft-margin SVM is doing, it's better to look at it in the dual formulation, where you can see that it has the same margin-maximizing objective (margin could be negative) as the hard-margin SVM, but with an additional constraint that each lagrange multiplier associated with support vector is bounded by C. Essentially this bounds the influence of any single point on the decision boundary, for derivation, see Proposition 6.12 in Cristianini/Shaw-Taylor's "An Introduction to Support Vector Machines and Other Kernel-based Learning Methods".
The result is that soft-margin SVM could choose decision boundary that has non-zero training error even if dataset is linearly separable, and is less likely to overfit.
Here's an example using libSVM on a synthetic problem. Circled points show support vectors. You can see that decreasing C causes classifier to sacrifice linear separability in order to gain stability, in a sense that influence of any single datapoint is now bounded by C.

Meaning of support vectors:
For hard margin SVM, support vectors are the points which are "on the margin". In the picture above, C=1000 is pretty close to hard-margin SVM, and you can see the circled points are the ones that will touch the margin (margin is almost 0 in that picture, so it's essentially the same as the separating hyperplane)
For soft-margin SVM, it's easer to explain them in terms of dual variables. Your support vector predictor in terms of dual variables is the following function.

Here, alphas and b are parameters that are found during training procedure, xi's, yi's are your training set and x is the new datapoint. Support vectors are datapoints from training set which are are included in the predictor, ie, the ones with non-zero alpha parameter.
In my opinion, Hard Margin SVM overfits to a particular dataset and thus can not generalize. Even in a linearly separable dataset (as shown in the above diagram), outliers well within the boundaries can influence the margin. Soft Margin SVM has more versatility because we have control over choosing the support vectors by tweaking the C.
This is analogous to "why use linear regression if ridge regression is available" - if your primary goal is predictive accuracy, you wouldn't.
Since SVMs are only used for predictive modeling, I don't think there is much real world use for hard margin. The soft margin problem includes the hard margin as a special case, so if you're doing everything properly, your hyperparameter tuning of the soft margin will find the hard margin solution if that maximizes the predictive power of the model.
That said, hard margin SVMs are useful to understand as a stepping stone to the soft margin case. So there is pedagogical and mental mapping use for them.
The real world problems aren't linearly separable usually, thus you can't use hard margin. However if you find a mapping corresponding to a kernel that make transformed data linearly separable in the Hilbert space you can use hard margin. This is equal to extract features from training set according to mentioned mapping and using feature data as new training set. If the mapping make feature data linearly separable you can use hard margin