dReLU Activation Function: Matching SwiGLU Performance with 90% Sparsity
Explore dReLU, a novel activation function that applies ReLU to both the gate and up projections of a gated feed-forward layer. It reaches roughly 90% activation sparsity and lower validation perplexity than SwiGLU, without compromising model convergence or performance.
Source: HackerNoon →
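The core idea can be sketched in a few lines: where SwiGLU computes SiLU(xW_gate) * (xW_up), dReLU applies ReLU to both projections, so the hidden activation is exactly zero whenever either side is non-positive. Below is a minimal NumPy sketch under that assumption; the function and weight names are illustrative, not from the original post or paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def swiglu_ffn(x, w_gate, w_up, w_down):
    # Standard SwiGLU FFN: SiLU(x W_gate) * (x W_up), then down-projection.
    gate = x @ w_gate
    silu = gate * (1.0 / (1.0 + np.exp(-gate)))  # SiLU (swish) activation
    return (silu * (x @ w_up)) @ w_down

def drelu_ffn(x, w_gate, w_up, w_down):
    # dReLU FFN: ReLU on BOTH the gate and the up projection, so the
    # hidden vector is exactly zero wherever either projection is <= 0.
    hidden = relu(x @ w_gate) * relu(x @ w_up)
    return hidden @ w_down

# Illustrative sparsity check with random weights (hypothetical shapes).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))
w_gate = rng.standard_normal((16, 64))
w_up = rng.standard_normal((16, 64))
w_down = rng.standard_normal((64, 16))

hidden = relu(x @ w_gate) * relu(x @ w_up)
# Each factor is non-positive about half the time for symmetric random
# inputs, so the product is exactly zero roughly three quarters of the
# time even before training; trained models push this toward ~90%.
sparsity = float((hidden == 0).mean())
```

The exact zeros are what make dReLU attractive for inference: entire rows of `w_down` can be skipped for zeroed hidden units, which is not possible with SiLU's small-but-nonzero outputs.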