Hypernetworks: Neural Networks for Hierarchical Data

8 points - today at 4:55 PM

  • joefourier

    today at 9:43 PM

    Odd that the author didn't try giving a latent embedding to the standard neural network (or modulating the activations with a FiLM layer) and instead used static embeddings as the baseline. There's no real advantage to using a hypernetwork: they tend to be more unstable and difficult to train, and they scale poorly unless you train a low-rank adaptation.
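
    A minimal sketch of the alternative I have in mind (PyTorch; the names are illustrative, not from the post): a standard MLP conditioned on a learned per-group embedding through a FiLM layer.

        import torch
        import torch.nn as nn

        class FiLMBlock(nn.Module):
            """Feature-wise linear modulation: scale and shift hidden features using an embedding."""
            def __init__(self, hidden_dim, embed_dim):
                super().__init__()
                self.linear = nn.Linear(hidden_dim, hidden_dim)
                # Predict one (gamma, beta) pair per hidden feature from the embedding.
                self.film = nn.Linear(embed_dim, 2 * hidden_dim)

            def forward(self, x, z):
                gamma, beta = self.film(z).chunk(2, dim=-1)
                return torch.relu(gamma * self.linear(x) + beta)

        class FiLMNet(nn.Module):
            def __init__(self, n_groups, in_dim, hidden_dim=64, embed_dim=16):
                super().__init__()
                self.embed = nn.Embedding(n_groups, embed_dim)  # learned latent per group
                self.inp = nn.Linear(in_dim, hidden_dim)
                self.block = FiLMBlock(hidden_dim, embed_dim)
                self.out = nn.Linear(hidden_dim, 1)

            def forward(self, x, group_idx):
                z = self.embed(group_idx)           # (batch, embed_dim)
                h = torch.relu(self.inp(x))
                return self.out(self.block(h, z))

    The group-specific parameters stay small (just the embedding table), and the shared layers are trained on all of the data, which is what makes this more stable than generating full weight matrices.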

      • mkmccjr

        today at 10:30 PM

        Hello. I am the author of the post. The goal was to provide a pedagogical example of applying Bayesian hierarchical-modeling principles to real-world datasets. These datasets often contain inherent structure that is important to model explicitly (e.g., clinical trials run across multiple hospitals). Oftentimes a single pooled model cannot capture the resulting over-dispersion, but there is not enough data to split the results out by group (nor should you).
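
        For concreteness, the standard partial-pooling setup (my notation, not from the post), with hospitals indexed by j:

            y_{ij} \sim \mathcal{N}(\mu + \alpha_j, \sigma^2), \qquad \alpha_j \sim \mathcal{N}(0, \tau^2)

        Complete pooling corresponds to forcing \tau = 0, no pooling to leaving each \alpha_j unconstrained; partial pooling estimates \tau from the data, so each hospital's effect is shrunk toward the shared mean in proportion to how little data it has.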

        The idea behind hypernetworks is that they enable Gelman-style partial pooling, explicitly modeling the data-generating process while leveraging the flexibility of neural-network tooling. I'm curious to read more about your recommendations: their connection to the problems described here is not immediately obvious to me, but I would be happy to dig a bit deeper.
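
        For reference, a minimal sketch of the hypernetwork version (PyTorch; illustrative, not the exact code from the post): a small generator maps a per-group embedding to the weights of a one-layer target network, so groups share statistical strength through the generator while keeping group-specific parameters.

            import torch
            import torch.nn as nn

            class GroupHyperNet(nn.Module):
                def __init__(self, n_groups, in_dim, out_dim, embed_dim=16):
                    super().__init__()
                    self.in_dim, self.out_dim = in_dim, out_dim
                    self.embed = nn.Embedding(n_groups, embed_dim)
                    # Hypernetwork: embedding -> flattened weights and bias of the target layer.
                    self.weight_gen = nn.Linear(embed_dim, out_dim * in_dim)
                    self.bias_gen = nn.Linear(embed_dim, out_dim)

                def forward(self, x, group_idx):
                    z = self.embed(group_idx)                       # (batch, embed_dim)
                    W = self.weight_gen(z).view(-1, self.out_dim, self.in_dim)
                    b = self.bias_gen(z)
                    # Apply the generated per-group linear layer: y = W x + b.
                    return torch.bmm(W, x.unsqueeze(-1)).squeeze(-1) + b

        The shared generator plays the role of the population-level distribution: a prior or weight decay on the embeddings controls how strongly each group is shrunk toward the shared behavior.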

        I agree that hypernetworks come with challenges due to the fragility of maximum-likelihood estimates. In the follow-up post, I dug into how explicit Bayesian sampling addresses these issues.