Release tinygrad 0.9.0 · tinygrad/tinygrad

Close to the new line limit of 8000 lines, sitting at 7958 lines.

tinygrad is much more usable now.

Just over 1200 commits since 0.8.0.

Release Highlights

New documentation: https://docs.tinygrad.org

gpuctypes has been brought in tree and is no longer an external dependency. [#3253]

AMD=1 and NV=1 experimental backends that do not require any userspace components like ROCm or CUDA. These backends should reduce the amount of Python time, especially in multi-GPU use cases.

PTX=1 for rendering directly to PTX instead of CUDA. [#3139] [#3623] [#3775]

Nvidia tensor core support. [#3544]

THREEFRY=1 for numpy-less random number generation using threefry2x32. [#2601] [#3785]

More stabilized multi-tensor API. With ring all-reduce: [#3000] [#3852]
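For readers unfamiliar with ring all-reduce, the following is a minimal pure-Python simulation of the general technique (a reduce-scatter phase followed by an all-gather phase). It is only a sketch: the function name `ring_all_reduce` and the list-of-lists representation of per-device buffers are assumptions made for illustration, and it is not tinygrad's implementation or API.

```python
def ring_all_reduce(buffers):
    """Sum all-reduce over a simulated ring; buffers[i] is device i's data.

    Illustrative only: real implementations move chunks between devices,
    here each "device" is just a Python list.
    """
    n = len(buffers)
    size = len(buffers[0])
    assert size % n == 0, "this sketch assumes the length divides evenly by the device count"
    chunk = size // n
    # Copy the inputs and split every device's buffer into n chunks.
    data = [[list(b[c * chunk:(c + 1) * chunk]) for c in range(n)] for b in buffers]

    # Phase 1, reduce-scatter: at each step device i sends chunk (i - step) % n
    # to its right neighbour, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, data[i][(i - step) % n][:]) for i in range(n)]
        for src, c, payload in sends:
            dst = (src + 1) % n
            data[dst][c] = [a + b for a, b in zip(data[dst][c], payload)]
    # Device i now holds the fully reduced chunk (i + 1) % n.

    # Phase 2, all-gather: at each step device i forwards the reduced chunk
    # (i + 1 - step) % n to its right neighbour, which overwrites its copy.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, data[i][(i + 1 - step) % n][:]) for i in range(n)]
        for src, c, payload in sends:
            dst = (src + 1) % n
            data[dst][c] = payload

    # Flatten the chunks back into one buffer per device.
    return [[x for c in d for x in c] for d in data]


example = [[1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60], [100, 200, 300, 400, 500, 600]]
print(ring_all_reduce(example))
```

Each device sends and receives only one chunk per step, so the traffic per device stays roughly constant as devices are added, which is the usual motivation for the ring layout.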

Core tinygrad has been refactored into 4 pieces, read more about it here.

Linearizer and codegen now support generating kernels with multiple outputs.

Lots of progress towards greater kernel fusion in the scheduler:

Fusing of ReduceOps with their elementwise children. This trains mnist and gpt2 with ~20% fewer kernels and makes llama inference faster.

New LoadOps.ASSIGN allows fusing optimizer updates with grad.

Schedule kernels in BFS order. This improves resnet and llama speed.

W.I.P. for fusing multiple reduces: [#4259] [#4208]
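To illustrate what scheduling kernels in BFS order can mean, here is a small sketch of a breadth-first topological sort (Kahn's algorithm with a FIFO queue) over a dependency graph. The kernel names, the `deps` dictionary format, and the function `bfs_schedule` are hypothetical and chosen for illustration; this is not taken from tinygrad's scheduler.

```python
from collections import deque


def bfs_schedule(deps):
    """Return kernels in breadth-first topological order.

    deps maps each kernel name to the list of kernels it depends on.
    Assumes every dependency is itself a key of deps and appears at most once.
    """
    children = {k: [] for k in deps}
    in_degree = {k: len(parents) for k, parents in deps.items()}
    for k, parents in deps.items():
        for p in parents:
            children[p].append(k)

    # Start from kernels with no unmet dependencies and process them FIFO,
    # so everything at one "depth" is scheduled before going deeper.
    queue = deque(k for k, d in in_degree.items() if d == 0)
    order = []
    while queue:
        k = queue.popleft()
        order.append(k)
        for c in children[k]:
            in_degree[c] -= 1
            if in_degree[c] == 0:
                queue.append(c)

    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order


# Hypothetical kernel graph, for illustration only.
graph = {
    "load_a": [],
    "load_b": [],
    "mul": ["load_a", "load_b"],
    "sum": ["mul"],
    "exp": ["load_a"],
}
print(bfs_schedule(graph))
```

Compared with a depth-first order, the FIFO queue tends to run independent kernels near each other before moving on to their consumers.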

MLPerf ResNet and BERT with a W.I.P. UNet3D

Llama 3 support with a new llama3.py that provides an OpenAI compatible API. [#4576]

NF4 quantization support in Llama examples. [#4540]

label_smoothing has been added to sparse_categorical_crossentropy. [#3568]
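As a reminder of what label smoothing does to a sparse categorical cross-entropy loss, here is a small dependency-free sketch of the standard formulation: the one-hot target is mixed with a uniform distribution over the classes. The function below is a standalone illustration that reuses the names from the note above; it is an assumption that tinygrad's version follows this same formulation, and its actual signature may differ.

```python
import math


def sparse_categorical_crossentropy(logits, targets, label_smoothing=0.0):
    """Mean cross-entropy for integer class targets.

    logits: list of rows of raw scores, targets: list of class indices.
    label_smoothing=eps puts (1 - eps) weight on the true class and spreads
    eps uniformly over all classes.
    """
    total = 0.0
    for row, y in zip(logits, targets):
        # Numerically stable log-softmax.
        m = max(row)
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        log_probs = [v - log_z for v in row]

        nll_true = -log_probs[y]                       # loss against the one-hot target
        nll_uniform = -sum(log_probs) / len(row)       # loss against the uniform target
        total += (1 - label_smoothing) * nll_true + label_smoothing * nll_uniform
    return total / len(targets)


confident = [[10.0, 0.0, 0.0]]
print(sparse_categorical_crossentropy(confident, [0]))
print(sparse_categorical_crossentropy(confident, [0], label_smoothing=0.1))
```

With smoothing enabled, a very confident correct prediction is penalized slightly, which discourages the logits from growing without bound.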

Known Issues

Using tinygrad in a conda env on macOS is known to cause problems with the METAL backend. See #2226.

See the full changelog: v0.8.0...v0.9.0

See the known issues: https://github.com/tinygrad/tinygrad/issues?q=is%3Aissue+is%3Aopen+label%3Abug+sort%3Aupdated-desc

Join the Discord!