# Gemma (2b-v2)

C++ port (Windows, Linux, macOS; CUDA- and Metal-accelerated) of https://github.com/google/gemma_pytorch.git.
## Example Inputs & Outputs

| Inputs | Outputs |
|---|---|
| Which is your favorite Lord of the Rings movie? | As an AI, I don’t have personal preferences like a favorite movie. However, I can tell you that the Lord of the Rings movies are generally considered to be very well-made and beloved by many fans. Here’s a breakdown of some common opinions: […] Ultimately, the best Lord of the Rings movie for you depends on your personal preferences. |
## Demo Code

```cpp
#include "blace_ai.h"
#include <fstream>
#include <iostream>

// include the models you want to use
#include "gemma_v2_2b_v2_v1_ALL_export_version_v25.h"

using namespace blace;

int main() {
  ::workload_management::BlaceWorld blace;

  // construct model inference arguments
  ml_core::InferenceArgsCollection infer_args;
  infer_args.inference_args.backends = {
      ml_core::TORCHSCRIPT_CUDA_FP16, ml_core::TORCHSCRIPT_MPS_FP16,
      ml_core::TORCHSCRIPT_CUDA_FP32, ml_core::TORCHSCRIPT_MPS_FP32,
      ml_core::ONNX_DML_FP32, ml_core::TORCHSCRIPT_CPU_FP32};

  std::vector<std::string> questions = {
      "What is the answer to life?", "Will ai rule the world?",
      "Which is your favorite lord of the rings movie?"};

  for (const auto& str : questions) {
    auto text_t = CONSTRUCT_OP(ops::FromTextOp(str));

    auto output_len = CONSTRUCT_OP(ops::FromIntOp(200));
    auto temperature = CONSTRUCT_OP(ops::FromFloatOp(0.));
    auto top_p = CONSTRUCT_OP(ops::FromFloatOp(0.9));
    auto top_k = CONSTRUCT_OP(ops::FromIntOp(50));

    // construct inference operation
    auto infer_op = gemma_v2_2b_v2_v1_ALL_export_version_v25_run(
        text_t, output_len, temperature, top_p, top_k, 0, infer_args,
        util::getPathToExe().string());

    computation_graph::GraphEvaluator evaluator(infer_op);
    auto [return_code, answer] = evaluator.evaluateToString();
    std::cout << "Answer: " << answer << std::endl;

    // write the answer to a file
    std::ofstream out("answer.txt");
    out << answer;
    out.close();
  }

  return 0;
}
```
Tested with version v0.9.96 of the blace.ai SDK. It may also work on newer or older releases (check the blace.ai release notes for breaking changes).
## Quickstart

- Download the blace.ai SDK and unzip it. In the bootstrap script `build_run_demos.ps1` (Windows) or `build_run_demos.sh` (Linux/macOS), set the `BLACE_AI_CMAKE_DIR` environment variable to the `cmake` folder inside the unzipped SDK, e.g. `export BLACE_AI_CMAKE_DIR="<unzip_folder>/package/cmake"`.
- Download the model payload(s) (`.bin` files) from below and place them in the same folder as the bootstrap scripts.
- Then run the bootstrap script with `powershell build_run_demo.ps1` (Windows) or `sh build_run_demo.sh` (Linux and macOS). This will build and execute the demo.
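On Linux/macOS the steps above condense to the following commands (`<unzip_folder>` is a placeholder for wherever you extracted the SDK archive):

```shell
# point the build at the cmake folder inside the unzipped SDK
export BLACE_AI_CMAKE_DIR="<unzip_folder>/package/cmake"

# model payload (.bin) files must sit next to the bootstrap scripts
sh build_run_demo.sh   # builds and runs the demo
```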
## Supported Backends
| Torchscript CPU | Torchscript CUDA FP16 * | Torchscript CUDA FP32 * | Torchscript MPS FP16 * | Torchscript MPS FP32 * | ONNX CPU FP32 | ONNX DirectML FP32 * |
|---|---|---|---|---|---|---|
| ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
(*: Hardware Accelerated)
## Artifacts

| Torchscript Payload | Demo Project | Header |
|---|---|---|