Hinge loss is difficult to work with when the derivative is needed, because the derivative is a piecewise function: max has one non-differentiable point (the kink), and the hinge loss inherits it. This was a prominent issue with non-separable cases of SVM (and a good reason to use ridge regression instead).
Here's a slide (original source: Zhuowen Tu; apologies for the title typo):

Here the hinge loss is defined as max(0, 1 - v), where v is the output of the SVM's decision function for a given example. More can be found in the Wikipedia article on hinge loss.
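To make the definition concrete, here is a minimal sketch of the hinge loss as written above (max(0, 1 - v), with v the signed decision value for one example):

```python
def hinge(v):
    """Hinge loss max(0, 1 - v) for a signed decision value v."""
    return max(0.0, 1.0 - v)

print(hinge(2.0))   # correctly classified with margin: 0.0
print(hinge(0.0))   # exactly on the decision boundary: 1.0
print(hinge(-1.0))  # misclassified: 2.0
```

Note the loss is zero whenever v >= 1 (classified correctly with margin) and grows linearly as v decreases, which is exactly where the kink at v = 1 comes from.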
As for your equation: you can easily pick out the v in it, but without more context about those functions it's hard to say how to differentiate. Unfortunately I don't have access to the paper and cannot guide you any further...
I disagree with the earlier answer that this is difficult to calculate. If we have the function \begin{align*} \sum_{t\in\mathcal{T}} \max \{0, 1 - d(t) \, y(t, \theta)\} \end{align*} the gradient with respect to $\theta$ is \begin{align*} & \sum_{t\in\mathcal{T}}g(t) \\ & g(t) := \begin{cases} 0 & \text{ if }1 - d(t) y(t, \theta) < 0 \\ -d(t)\dfrac{\partial y}{\partial \theta} & \text{ otherwise} \\ \end{cases} \end{align*} Theoretically this is fine; it just means that the gradient is not continuous (at the kink itself, where $1 - d(t) y(t, \theta) = 0$, either one-sided value is a valid subgradient). The objective itself is still continuous, assuming that $d$ and $y$ are both continuous.
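A minimal sketch of the piecewise gradient above, assuming (hypothetically) a linear score $y(t, \theta) = \theta \, x(t)$ so that $\partial y / \partial \theta = x(t)$; the data points here are made up, and the actual $y$ in your paper may differ:

```python
def hinge_objective(theta, data):
    # data: list of (x_t, d_t) pairs, one per t in T
    return sum(max(0.0, 1.0 - d * (theta * x)) for x, d in data)

def hinge_gradient(theta, data):
    g = 0.0
    for x, d in data:
        if 1.0 - d * (theta * x) < 0.0:
            g += 0.0          # flat region of max(0, .): zero contribution
        else:
            g += -d * x       # -d(t) * dy/dtheta for a linear y
    return g

# Sanity check via central finite differences, at a theta away from any kink
data = [(2.0, 1.0), (-1.5, -1.0), (0.5, 1.0)]
theta, eps = 0.3, 1e-6
fd = (hinge_objective(theta + eps, data)
      - hinge_objective(theta - eps, data)) / (2 * eps)
print(hinge_gradient(theta, data), fd)  # the two should agree closely
```

Away from the kinks the analytic piecewise formula and the numerical derivative match; at a kink the finite difference would land between the two one-sided values, which is exactly the subgradient situation described above.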
In practice, it's not a problem either. Any automatic differentiation package (TensorFlow, PyTorch, JAX) will handle something like this automatically and correctly, by picking one of the one-sided derivatives at the non-smooth point.
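For instance, here is a tiny sketch using PyTorch (assuming it is installed; the scalar $d$, the linear score $y(\theta) = 2\theta$, and the starting value of $\theta$ are all made up for illustration):

```python
import torch

d = torch.tensor(1.0)                      # hypothetical d(t)
theta = torch.tensor(0.3, requires_grad=True)

def y(theta):
    # hypothetical linear score y(t, theta) = 2 * theta, so dy/dtheta = 2
    return 2.0 * theta

# hinge term max(0, 1 - d*y), written with clamp
loss = torch.clamp(1.0 - d * y(theta), min=0.0)
loss.backward()

# Here 1 - d*y = 0.4 > 0, so the piecewise formula gives
# grad = -d * dy/dtheta = -2, and autograd agrees:
print(theta.grad.item())
```

At the kink itself (where 1 - d*y is exactly 0), autodiff frameworks simply return one fixed one-sided derivative, which is a valid subgradient and works fine for gradient-based optimization.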