blace.inference
blace.inference is our core library, which can easily be included in new and existing C++ projects. It consumes existing AI models such as .onnx or .pt (TorchScript) files as well as models deployed from the blace.hub model database.
Features
Native C++ library
The whole library and model inference engine is written in native, high-performance C++.
Local inference
All computations run locally on your users' hardware. The data never leaves the machine, which makes it suitable for data-sensitive environments and products.
Easy integration and deployment
Add the library to your project via CMake.
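As a rough sketch of what the CMake side could look like (the package and target names used here, blace and blace::inference, are assumptions for illustration and may differ from the actual distribution):

    # Hypothetical integration sketch -- package and target names are assumptions.
    cmake_minimum_required(VERSION 3.16)
    project(my_app CXX)

    # Locate the installed blace.inference package (name assumed).
    find_package(blace CONFIG REQUIRED)

    add_executable(my_app main.cpp)

    # Link against the (assumed) imported target exported by the package.
    target_link_libraries(my_app PRIVATE blace::inference)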
Support for major OS and hardware accelerators
Full support for Windows, macOS, and Ubuntu. Hardware acceleration on NVIDIA GPUs and Apple M1/M2/M3 Macs.
Support for different model types
blace.inference can consume a range of industry-standard model formats such as ONNX, TorchScript, and our proprietary format exported from blace.hub. This makes it possible to port your existing inference solutions to our framework.
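As a minimal sketch of that workflow, with hypothetical names (the blace_sketch namespace below is a stand-in for illustration, not the actual blace.inference API):

    // Hypothetical sketch -- class and function names are assumptions,
    // not the documented blace.inference API.
    #include <iostream>
    #include <string>
    #include <vector>

    namespace blace_sketch {
    struct Model {
        std::vector<float> infer(const std::vector<float>& input) {
            return input; // placeholder: a real engine would execute the network here
        }
    };
    inline Model load_model(const std::string& path) {
        (void)path; // a real loader would parse the .onnx / .pt / blace.hub file
        return Model{};
    }
    } // namespace blace_sketch

    int main() {
        // Different model formats would go through the same entry point.
        auto model = blace_sketch::load_model("model.onnx"); // or "model.pt"
        std::vector<float> input(224 * 224 * 3, 0.5f);
        auto output = model.infer(input);
        std::cout << "output elements: " << output.size() << "\n";
        return 0;
    }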
Encryption
Model files can be encrypted prior to deployment and decrypted on the fly by our engine, making it easy to protect your intellectual property.
50+ building blocks
blace.ai provides a wide range of nodes that can be used to programmatically assemble the execution graph. Executing this graph is highly performant because existing results are automatically cached.
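The caching idea can be sketched as follows; the Node type below is a hypothetical stand-in, not the library's actual node API:

    // Hypothetical sketch of a node graph with result caching --
    // names and structure are assumptions, not the blace.ai API.
    #include <functional>
    #include <iostream>
    #include <memory>
    #include <optional>

    // A node computes a value and caches it so repeated evaluations are cheap.
    struct Node {
        std::function<float()> compute;
        std::optional<float> cached;

        float eval() {
            if (!cached) cached = compute(); // recompute only if no cached result exists
            return *cached;
        }
    };

    int main() {
        auto a = std::make_shared<Node>(Node{[] { return 2.0f; }, std::nullopt});
        auto b = std::make_shared<Node>(Node{[] { return 3.0f; }, std::nullopt});

        // A downstream node pulls from upstream nodes; their results are reused from cache.
        Node sum{[a, b] { return a->eval() + b->eval(); }, std::nullopt};

        std::cout << sum.eval() << "\n"; // computes: 5
        std::cout << sum.eval() << "\n"; // served from cache: 5
        return 0;
    }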
Serializable
The computation graph can be fully serialized, making it possible to run computations on remote machines (or even in the cloud).
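A minimal sketch of the pattern, assuming a hypothetical byte-buffer representation (the real wire format is defined by the engine):

    // Hypothetical sketch -- the serialization format and function names are assumptions.
    #include <cstdint>
    #include <fstream>
    #include <iostream>
    #include <vector>

    // Stand-in: the serialized graph is treated as an opaque byte buffer here.
    using GraphBytes = std::vector<std::uint8_t>;

    // A real implementation would encode the graph's nodes, edges and parameters.
    GraphBytes serialize_graph() {
        return GraphBytes{0x01, 0x02, 0x03};
    }

    int main() {
        GraphBytes blob = serialize_graph();

        // The buffer could now be stored, or shipped to a remote worker or cloud
        // instance where the engine rebuilds the graph and runs it.
        std::ofstream out("graph.bin", std::ios::binary);
        out.write(reinterpret_cast<const char*>(blob.data()),
                  static_cast<std::streamsize>(blob.size()));

        std::cout << "serialized " << blob.size() << " bytes\n";
        return 0;
    }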
Separate process space
All computation can run in a separate process, so if you integrate our framework into your existing solution it will never interfere with your existing libraries.
Dynamic resource management
Load as many models into your process as you like – blace.inference automatically takes care of memory allocation, model loading and unloading, and resource management.
10k+ users
Products built with blace.ai have been successfully deployed to more than 10,000 users worldwide.