
Gemma (2b-v2)

C++ port (Windows, Linux, macOS; CUDA- and Metal-accelerated) of https://github.com/google/gemma_pytorch.git.

Example Input & Outputs

Input:

Which is your favorite Lord of the Rings movie?

Output:

As an AI, I don’t have personal preferences like a favorite movie.

However, I can tell you that the Lord of the Rings movies are generally considered to be very well-made and beloved by many fans.

Here’s a breakdown of some common opinions:

  • The Fellowship of the Ring: Many consider this the most faithful adaptation of the book, capturing the epic scope and emotional depth.
  • The Two Towers: This film is praised for its action sequences and the introduction of new characters like Merry and Pippin.
  • The Return of the King: This film is often considered the most visually stunning and emotionally resonant, culminating in a satisfying conclusion to the story.

Ultimately, the best Lord of the Rings movie for you depends on your personal preferences.

Demo Code

#include "blace_ai.h"
#include <fstream>
#include <iostream>

// include the models you want to use
#include "gemma_v2_2b_v2_v1_ALL_export_version_v26.h"

using namespace blace;

int main() {
  ::workload_management::BlaceWorld blace;

  // construct model inference arguments; backends are listed in priority order
  ml_core::InferenceArgsCollection infer_args;
  infer_args.inference_args.backends = {
      ml_core::TORCHSCRIPT_CUDA_FP16, ml_core::TORCHSCRIPT_MPS_FP16,
      ml_core::TORCHSCRIPT_CUDA_FP32, ml_core::TORCHSCRIPT_MPS_FP32,
      ml_core::ONNX_DML_FP32,         ml_core::TORCHSCRIPT_CPU_FP32};

  std::vector<std::string> questions = {
      "What is the answer to life?", "Will AI rule the world?",
      "Which is your favorite Lord of the Rings movie?"};

  for (const auto &question : questions) {
    auto text_t = CONSTRUCT_OP(ops::FromTextOp(question));

    // sampling parameters: up to 200 output tokens, temperature 0
    // (deterministic), top-p 0.9, top-k 50
    auto output_len = CONSTRUCT_OP(ops::FromIntOp(200));
    auto temperature = CONSTRUCT_OP(ops::FromFloatOp(0.));
    auto top_p = CONSTRUCT_OP(ops::FromFloatOp(0.9));
    auto top_k = CONSTRUCT_OP(ops::FromIntOp(50));

    // construct inference operation
    auto infer_op = gemma_v2_2b_v2_v1_ALL_export_version_v26_run(
        text_t, output_len, temperature, top_p, top_k, 0, infer_args,
        util::getPathToExe().string());

    computation_graph::GraphEvaluator evaluator(infer_op);
    auto [return_code, answer] = evaluator.evaluateToString();
    std::cout << "Answer: " << answer << std::endl;

    // append each answer to the file (a truncating open here would keep
    // only the last answer)
    std::ofstream out("answer.txt", std::ios::app);
    out << answer << "\n";
    out.close();
  }

  return 0;
}

Tested on version v1.0.5 of the blace.ai SDK. It may also work on newer or older releases (check the blace.ai release notes for breaking changes).

Quickstart

  1. Download the blace.ai SDK and unzip it. In the bootstrap script build_run_demo.ps1 (Windows) or build_run_demo.sh (Linux/macOS), set the BLACE_AI_CMAKE_DIR environment variable to the cmake folder inside the unzipped SDK, e.g. export BLACE_AI_CMAKE_DIR="<unzip_folder>/package/cmake".
  2. Download the model payload(s) (.bin files) from below and place them in the same folder as the bootstrap scripts.
  3. Then run the bootstrap script with
    powershell build_run_demo.ps1 (Windows) or
    sh build_run_demo.sh (Linux and macOS).
    This will build and execute the demo.

Supported Backends

  • Torchscript CPU
  • Torchscript CUDA FP16 *
  • Torchscript CUDA FP32 *
  • Torchscript MPS FP16 *
  • Torchscript MPS FP32 *
  • ONNX CPU FP32
  • ONNX DirectML FP32 *

(*: Hardware Accelerated)

Artifacts

  • Torchscript Payload
  • Demo Project
  • Header

License