Searching for the quoted text, it seems the book is Data Science for Business (Provost and Fawcett), and they're describing the soft-margin SVM. Their description of the hinge loss is wrong. The problem is that it doesn't penalize misclassified points that lie within the margin, as you mentioned.
In SVMs, smaller weights correspond to larger margins, so using this "version" of the hinge loss would have pathological consequences: we could achieve the minimum possible loss (zero) simply by choosing weights small enough that all points lie within the margin, even if every single point is misclassified. Because the SVM optimization problem contains a regularization term that encourages small weights (i.e. large margins), the solution would always be the zero weight vector. The solution would then be completely independent of the data, and nothing would be learned. Needless to say, this wouldn't make for a very good classifier.
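The book's exact formula isn't quoted here, so the "broken" loss below is only a guess at a version that penalizes misclassification beyond the margin but ignores points inside it; the point is just to demonstrate the degenerate minimum at $w = 0$. A minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))        # toy features
y = np.where(X[:, 0] > 0, 1, -1)     # toy labels in {-1, 1}

def broken_hinge(w, X, y):
    """Hypothetical 'book' version: only penalizes points that are
    misclassified AND lie beyond the margin, i.e. y * f(x) < -1."""
    f = X @ w
    return np.maximum(0, -y * f - 1).mean()

def correct_hinge(w, X, y):
    """Standard hinge loss: penalizes any point with y * f(x) < 1."""
    f = X @ w
    return np.maximum(0, 1 - y * f).mean()

w_zero = np.zeros(2)
print(broken_hinge(w_zero, X, y))    # 0.0 -- the 'broken' loss is already minimal
print(correct_hinge(w_zero, X, y))   # 1.0 -- the real hinge loss still penalizes w = 0
```

Under the broken loss, $w = 0$ (every point on the decision boundary, inside the margin) already achieves zero loss, so the regularizer settles there and nothing is learned.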
The correct expression for the hinge loss for a soft-margin SVM is:
$$\max \Big( 0, 1 - y f(x) \Big)$$
where $f(x)$ is the output of the SVM given input $x$, and $y$ is the true class ($-1$ or $1$). When the true class is $-1$ (as in your example), the hinge loss becomes $\max(0, 1 + f(x))$: zero for $f(x) \le -1$, and increasing linearly as $f(x)$ increases.
Note that the loss is nonzero for misclassified points, as well as correctly classified points that fall within the margin.
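To make that concrete, here is a quick sketch evaluating the correct hinge loss for the three regimes (the inputs are made up for illustration):

```python
def hinge(y, fx):
    """Hinge loss max(0, 1 - y * f(x)) for true class y in {-1, 1}."""
    return max(0.0, 1.0 - y * fx)

# True class y = 1 in each case; fx is the SVM output f(x).
print(hinge(1,  2.0))   # 0.0 -- correct side, outside the margin: no penalty
print(hinge(1,  0.4))   # 0.6 -- correct side, but inside the margin: penalized
print(hinge(1, -0.5))   # 1.5 -- misclassified: penalized more heavily
```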
For a proper description of soft-margin SVMs using the hinge loss formulation, see The Elements of Statistical Learning (section 12.3.2) or the Wikipedia article.
A hinge function can be expressed as
$$y_{i} = \gamma \max{\left(x_{i}-\theta, 0\right)} + \varepsilon_{i},$$
where:
- $\gamma$ is the change in slope after the hinge. In your example, this amounts to the slope following the hinge, since your hinge-only model (see below) assumes zero effect of $x$ on $y$ until the hinge.
- $\theta$ is the point (in $\boldsymbol{x}$) at which the hinge is located, and is a parameter estimated for the model. I believe your question is answered by considering that the location of the hinge is informed by the loss function.
- $\varepsilon_{i}$ is an error term with some distribution.
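To illustrate how the loss function pins down the hinge location, here is a minimal sketch on synthetic data (the function names and true parameter values are my own): for each candidate $\theta$ it profiles out $\gamma$ by least squares, then picks the $\theta$ with the smallest squared error.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
# True hinge-only model: theta = 4, gamma = 2, plus noise.
y = 2.0 * np.maximum(x - 4.0, 0.0) + rng.normal(scale=0.5, size=x.size)

def sse_for_theta(theta):
    """Sum of squared errors with gamma profiled out by OLS for fixed theta."""
    h = np.maximum(x - theta, 0.0)
    gamma = (h @ y) / (h @ h) if h.any() else 0.0
    return np.sum((y - gamma * h) ** 2)

thetas = np.linspace(0.5, 9.5, 181)
best = thetas[np.argmin([sse_for_theta(t) for t in thetas])]
print(best)  # close to the true hinge location of 4.0
```

The squared-error loss is what "informs" the hinge location here: a badly placed $\theta$ leaves large residuals on one side of the kink, so minimizing the loss over $(\gamma, \theta)$ jointly recovers where the slope change happens.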
Hinge functions can also be used to bend any line:
$$y_{i} = \alpha_{0} + \beta x_{i} + \gamma \max{\left(x_{i}-\theta, 0\right)} + \varepsilon_{i},$$
where:
- $\alpha_{0}$ is the model constant, and the intercept of the curve before the hinge (i.e. for $x < \theta$). Of course, if $\theta < 0$, then the curve intersects the $y$-axis after the hinge, so $\alpha_{0}$ will not necessarily be the $y$-intercept of the bent line.
- $\beta$ is the slope of the line relating $y$ to $x$.
- $\gamma$ is the change in slope after the hinge.
In addition, the hinge can be used to model how a functional relationship between $y$ and $x$ changes form, as in this model where the relationship becomes quadratic after the hinge:
$$y_{i} = \alpha_{0} + \beta x_{i} + \gamma \max{\left(x_{i}-\theta, 0\right)}^{2} + \varepsilon_{i},$$
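A quick sketch of evaluating this model at a few points (the parameter values are arbitrary, chosen only for illustration):

```python
import numpy as np

def linear_then_quadratic(x, a0, beta, gamma, theta):
    """Linear in x before theta; gains a quadratic term after the hinge."""
    return a0 + beta * x + gamma * np.maximum(x - theta, 0.0) ** 2

x = np.array([0.0, 1.0, 2.0, 3.0])
print(linear_then_quadratic(x, a0=1.0, beta=0.5, gamma=1.0, theta=1.0))
# [1.  1.5 3.  6.5]
```

Note that squaring the hinge makes the curve smooth at $\theta$ (the first derivative is continuous there), whereas the plain linear hinge produces a visible kink.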