METAL-TOOLCHAIN(7) Metal METAL-TOOLCHAIN(7)

metal-toolchain - metal compiler toolchain overview

The Metal toolchain consists of a set of programs targeting Apple GPUs. The goal of this document is to provide an overview of the toolchain behavior. Refer to the documentation of individual programs for more specific information.

Metal supports two compilation modes: split-compilation and traditional.

In the split-compilation mode, the toolchain targets the AIR virtual target. Final translation to the actual GPU binary code is performed at runtime. In the more traditional mode, the toolchain directly emits binary code compatible with the selected GPU target.

The architecture of the AIR virtual target is air64, which has several subarchitectures. Each subarchitecture is associated with a platform version.

The currently supported AIR architectures, together with their native platform versions, are:

air64_v16

iPhoneOS 8


air64_v18

iPhoneOS 9, macOS 10.11, tvOS 9


air64_v111

iPhoneOS 10, macOS 10.12, tvOS 10, watchOS 3


air64_v20

iPhoneOS 11, macOS 10.13, tvOS 11, watchOS 4


air64_v21

iPhoneOS 12, macOS 10.14, tvOS 12, watchOS 5


air64_v22

iPhoneOS 13, macOS 10.15, tvOS 13, watchOS 6


air64_v23

iPhoneOS 14, macOS 11, tvOS 14, watchOS 7


air64_v24

iPhoneOS 15, macOS 12, tvOS 15, watchOS 8


air64_v25

iPhoneOS 16, macOS 13, tvOS 16, watchOS 9


air64_v26

iPhoneOS 17, macOS 14, tvOS 17, watchOS 10, visionOS 1


air64_v27

iPhoneOS 18, macOS 15, tvOS 18, watchOS 11, visionOS 2
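
The list above is effectively a lookup table from platform version to AIR subarchitecture. As a minimal sketch, the iPhoneOS column can be written as a shell function. The name air_arch_for_iphoneos is hypothetical and not part of the toolchain; the other platforms would follow the same pattern using their own columns.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Metal toolchain): print the AIR
# subarchitecture whose native platform version is the given iPhoneOS
# major version, following the list in this document.
air_arch_for_iphoneos() {
    case "$1" in
        8)  echo air64_v16 ;;
        9)  echo air64_v18 ;;
        10) echo air64_v111 ;;
        11) echo air64_v20 ;;
        12) echo air64_v21 ;;
        13) echo air64_v22 ;;
        14) echo air64_v23 ;;
        15) echo air64_v24 ;;
        16) echo air64_v25 ;;
        17) echo air64_v26 ;;
        18) echo air64_v27 ;;
        *)  echo "unknown iPhoneOS version: $1" >&2; return 1 ;;
    esac
}

air_arch_for_iphoneos 14   # prints air64_v23
```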




Native GPU targets are in the <vendor>gpu_<arch> form, where <vendor> can be apple, amd, or intel; <arch> identifies the actual GPU architecture.
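
To illustrate the <vendor>gpu_<arch> naming scheme, the following sketch splits a native GPU target name into its two components using plain shell parameter expansion. The function name split_gpu_target is hypothetical; it only checks the shape of the name and does not verify that the architecture is one the toolchain actually knows.

```shell
#!/bin/sh
# Hypothetical helper: split a native GPU target name of the form
# <vendor>gpu_<arch> and print "<vendor> <arch>".
split_gpu_target() {
    case "$1" in
        applegpu_?*|amdgpu_?*|intelgpu_?*)
            vendor=${1%%gpu_*}   # text before the first "gpu_"
            arch=${1#*gpu_}      # text after the first "gpu_"
            echo "$vendor $arch"
            ;;
        *)
            echo "not a native GPU target: $1" >&2
            return 1
            ;;
    esac
}

split_gpu_target applegpu_g11g_8fstp   # prints: apple g11g_8fstp
```

An AIR architecture such as air64_v23 is rejected, since it does not follow the native GPU naming form.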

Known Apple GPU architectures are:

applegpu_gx2

applegpu_g4p

applegpu_g4g

applegpu_g5p

applegpu_g9p

applegpu_g9g

applegpu_g10p

applegpu_g11p

applegpu_g11m

applegpu_g11g

applegpu_g11g_8fstp

applegpu_g12p

applegpu_g13p

applegpu_g13g

applegpu_g13s

applegpu_g13c

applegpu_g13d

applegpu_g14p

applegpu_g14g

applegpu_g14s

applegpu_g14d

applegpu_g15p



Known AMD GPU architectures are:

amdgpu_gfx600

amdgpu_gfx600_nwh

amdgpu_gfx701

amdgpu_gfx704

amdgpu_gfx803

amdgpu_gfx802

amdgpu_gfx900

amdgpu_gfx904

amdgpu_gfx906

amdgpu_gfx1010_nsgc

amdgpu_gfx1010

amdgpu_gfx1011

amdgpu_gfx1012

amdgpu_gfx1030

amdgpu_gfx1032



Known Intel GPU architectures are:

intelgpu_skl_gt2r6

intelgpu_skl_gt2r7

intelgpu_skl_gt3r10

intelgpu_kbl_gt2r0

intelgpu_kbl_gt2r2

intelgpu_kbl_gt2r4

intelgpu_kbl_gt3r1

intelgpu_kbl_gt3r6

intelgpu_icl_1x6x8r7

intelgpu_icl_1x8x8r7



Having multiple architectures makes it possible to store multiple binaries inside the same universal binary, each targeting a different version of the same platform.

The AIR toolchain is able to target the following platforms:

iPhoneOS

Minimum supported version is iPhoneOS 8


macOS

Minimum supported version is macOS 10.11


tvOS

Minimum supported version is tvOS 9


watchOS

Minimum supported version is watchOS 3


visionOS

Minimum supported version is visionOS 1




Starting with air64_v23, all platforms are compatible with each other. For instance, you can link an air64_v23-apple-iphoneos14 object and an air64_v23-apple-macos11 object together.
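
A build script might want to check this condition before mixing objects built for different platforms. The sketch below encodes only the rule stated here, using the subarchitectures listed in this document. The function name is hypothetical, and the check says nothing about linking objects built for different AIR subarchitectures, which this document does not cover.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given AIR subarchitecture is
# air64_v23 or later, i.e. objects for different platforms can be linked.
# Note that air64_v111 precedes air64_v20 in the list, so the names are
# matched explicitly rather than compared numerically.
supports_cross_platform_link() {
    case "$1" in
        air64_v16|air64_v18|air64_v111|air64_v20|air64_v21|air64_v22)
            return 1 ;;
        air64_v23|air64_v24|air64_v25|air64_v26|air64_v27)
            return 0 ;;
        *)
            echo "unknown AIR subarchitecture: $1" >&2
            return 2 ;;
    esac
}

if supports_cross_platform_link air64_v23; then
    echo "cross-platform linking is possible"
fi
```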

The two main inputs of the AIR toolchain are Metal source files and Metal scripts. The canonical extension of Metal source files is .metal. The canonical extension of Metal scripts is .mtlp-json.

Metal scripts are consumed by tools emitting GPU binary code. Depending on the code being emitted, a Metal script might be required or not. For instance, a Metal script is required to emit a pipeline, but it is not required when emitting a dynamic library.

The AIR toolchain emits MetalLibs and MachOs. The former stores AIR binaries. The latter stores GPU binaries.

The AIR toolchain also emits universal binaries, which can contain both MetalLib and MachO slices at the same time.

The AIR toolchain provides two main compiler drivers: metal and metal-tt.

The primary goal of metal is to translate a set of source files into MetalLibs, MachOs, or universal binaries.

What is actually emitted depends on the selected target architectures. If more than one architecture is selected, a universal binary is emitted. Otherwise, if the target architecture is AIR, a MetalLib is emitted. If the target architecture is a GPU architecture, a MachO is emitted.

$ metal -arch air64_v23 foo.metal -o foo.metallib


Emits a MetalLib.

$ metal -arch applegpu_g13s foo.metal -N foo.mtlp-json -o foo.metallib


Emits a MachO.

$ metal -arch air64_v23 -arch applegpu_g13s foo.metal -N foo.mtlp-json -o foo.metallib


Emits a universal binary, with one MetalLib slice and one MachO slice.
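
The selection rule illustrated by these three commands can be sketched as a small shell function. This is only a model of the rule as described in this document, not the driver's actual logic; the function name output_kind is hypothetical, and it simply counts its arguments without checking for duplicates.

```shell
#!/bin/sh
# Hypothetical helper: given the architectures passed via -arch, print the
# kind of file the metal driver is described as emitting.
output_kind() {
    if [ "$#" -eq 0 ]; then
        echo "usage: output_kind ARCH..." >&2
        return 2
    fi
    if [ "$#" -gt 1 ]; then
        echo "universal binary"
        return 0
    fi
    case "$1" in
        air64*) echo "MetalLib" ;;   # AIR virtual target
        *gpu_*) echo "MachO" ;;      # native GPU target
        *)      echo "unknown architecture: $1" >&2; return 1 ;;
    esac
}

output_kind air64_v23
output_kind applegpu_g13s
output_kind air64_v23 applegpu_g13s
```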

The most efficient way to use the metal driver is to independently compile a bunch of source files, followed by a link step:

$ metal -arch air64_v23 -c foo.metal -o foo.air
$ metal -arch air64_v23 -c bar.metal -o bar.air
$ metal -arch air64_v23 foo.air bar.air -o foobar.metallib
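
For more than a couple of source files, the compile and link commands can be generated rather than typed by hand. The following sketch only prints the commands (a dry run) and uses no options beyond those shown above. The function name print_build_commands is hypothetical, the architecture is fixed to air64_v23 for brevity, and file names containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: print one compile command per .metal source file,
# followed by a single link command producing OUTPUT.
# Usage: print_build_commands OUTPUT SOURCE...
print_build_commands() {
    out=$1
    shift
    objs=
    for src in "$@"; do
        obj=${src%.metal}.air          # foo.metal -> foo.air
        echo "metal -arch air64_v23 -c $src -o $obj"
        objs="$objs $obj"
    done
    echo "metal -arch air64_v23$objs -o $out"
}

print_build_commands foobar.metallib foo.metal bar.metal
```

Piping the output to sh would run the commands; because each compile step is independent, the first lines could also be handed to a parallel job runner before the final link.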


Since the emission of GPU binaries starts from MetalLibs, a GPU architecture only needs to be specified at the link step:

$ metal -arch air64_v23 -c foo.metal -o foo.air
$ metal -arch air64_v23 -c bar.metal -o bar.air
$ metal -arch applegpu_g13s foo.air bar.air -N foobar.mtlp-json -o foobar.metallib


The metal driver must be told which architectures to target, which can be challenging when a large number of GPU architectures have to be targeted. The metal-tt driver solves this problem by automatically targeting all the GPU architectures supported by the toolchain:

$ metal -arch air64_v23 foo.metal -o foo.metallib-air64_v23
$ metal-tt foo.metallib-air64_v23 foo.mtlp-json -o foo.metallib


The produced foo.metallib contains one slice for each supported GPU architecture, plus the air64_v23 slice produced by metal.

A target is composed of a target architecture and a target platform.

Generally speaking, the target used by a compiler driver can be explicitly spelled out in the compiler driver command line. If the target is only partially spelled out -- e.g. the command line only specifies the target architecture -- the remaining components of the target are deduced by the compiler driver.

The deduction process is specific to each compiler driver, but it generally splits deduction into two steps: selection of an architecture, followed by selection of a platform.

The default architecture is air64.

The platform is selected starting from the system root. If the system root points to a Darwin SDK, the target platform is set to the one of the SDK.

For instance, assuming iPhoneOS16.0.sdk contains a valid iPhoneOS SDK, the target selected by the following command:

$ metal -isysroot iPhoneOS16.0.sdk foo.metal -o foo.metallib


Would be air64-apple-iphoneos16.0.
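
Targets written in this form can be taken apart with shell parameter expansion, which may be useful in build scripts that need the architecture or platform separately. The sketch below assumes the three-part <arch>-<vendor>-<platform><version> shape seen in the examples of this document; the function name parse_target is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split a target such as air64-apple-iphoneos16.0 and
# print "<arch> <vendor> <platform> <version>".
parse_target() {
    case "$1" in
        ?*-?*-?*) ;;
        *) echo "malformed target: $1" >&2; return 1 ;;
    esac
    arch=${1%%-*}                  # text before the first "-"
    rest=${1#*-}
    vendor=${rest%%-*}             # text between the first and second "-"
    os=${rest#*-}                  # platform name followed by its version
    platform=${os%%[0-9]*}         # strip everything from the first digit on
    version=${os#"$platform"}      # what remains is the version
    echo "$arch $vendor $platform $version"
}

parse_target air64-apple-iphoneos16.0   # prints: air64 apple iphoneos 16.0
```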

The system root can also be set using the SDKROOT environment variable. On Darwin, development tools are usually invoked using xcrun, which automatically sets SDKROOT to the selected SDK. Thus this command:

$ xcrun -sdk iphoneos metal foo.metal -o foo.metallib


Will target air64-apple-iphoneosX.Y, where X.Y is the iPhoneOS SDK target platform found by xcrun.

The metal-arch tool prints information about the architectures of the GPUs available on the current platform.

The metal-config tool prints information about the GPU architectures that can be targeted by the current toolchain.

To report bugs, please visit <https://developer.apple.com/bug-reporting/>.

metal(1), metal-arch(1), metal-config(1), metal-pipelines-script(5), metal-tt(1), xcrun(1)

Metal Shading Language Specification: <https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf>

2014-2024, The Metal Team

April 30, 2024 32023