The result seems very promising. Did you train it from scratch and rely on the scaling law, or did you prune it from a much larger model?