In the past weeks I've been working on some neural network models for a regression task. Given the temporal nature of my data, a recurrent model is a good fit, and I decided to use a GRU layer in my architecture.
If you're reading this, it's likely that you already know what a Gated Recurrent Unit is; if not, this and this will give you context.
Long story short, a GRU is a recurrent neural network defined by the following equations: