AI inference chat session from custom data using OpenCL and NOT CUDA!
While developing and procuring hardware for the PINGLEWARE.SUPPORT knowledge base, I was disappointed to find that OLLAMA and GPT4ALL, built on LLAMA.CPP, only supported CUDA devices, and only high-end NVIDIA GPU cards at that: the popular GT 730 is considered obsolete by NVIDIA even though it has proliferated on eCommerce websites like Amazon and Walmart. Digging deeper, I discovered that OpenCL is an open framework for interacting with the GPU, and nearly all GPU hardware supports it, including the popular GT 730. With the aid of ChatGPT, and following the example of @tensorflow-models/universal-sentence-encoder, where custom content is split into sentences and a query is compared against each sentence's tokenization score to find the best match, I created a tokenization and matching algorithm that works on custom content chunked by sentences and maintained in local text files.
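To make the matching idea concrete, here is a minimal C++ sketch of that approach. This is not CLINFERA's actual source; the function names and the naive token-overlap scoring are illustrative assumptions. Content is split into sentence chunks, the query and each sentence are tokenized, and the sentence sharing the most tokens with the query wins.

```cpp
// Minimal sketch of sentence chunking plus token-overlap matching.
// Illustrative only; not CLINFERA's real implementation.
#include <cctype>
#include <cstddef>
#include <iostream>
#include <set>
#include <string>
#include <vector>

// Split text into sentence chunks on '.', '!' and '?'.
static std::vector<std::string> splitSentences(const std::string& text) {
    std::vector<std::string> sentences;
    std::string current;
    for (char c : text) {
        current += c;
        if (c == '.' || c == '!' || c == '?') {
            if (current.find_first_not_of(" \t\n") != std::string::npos)
                sentences.push_back(current);
            current.clear();
        }
    }
    if (current.find_first_not_of(" \t\n") != std::string::npos)
        sentences.push_back(current);
    return sentences;
}

// Lower-case alphanumeric tokens; no stemming, so "query" != "queries".
static std::set<std::string> tokenize(const std::string& text) {
    std::set<std::string> tokens;
    std::string word;
    for (char c : text) {
        if (std::isalnum(static_cast<unsigned char>(c))) {
            word += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        } else if (!word.empty()) {
            tokens.insert(word);
            word.clear();
        }
    }
    if (!word.empty()) tokens.insert(word);
    return tokens;
}

// Score = number of query tokens that also appear in the sentence.
static std::size_t overlapScore(const std::set<std::string>& query,
                                const std::set<std::string>& sentence) {
    std::size_t score = 0;
    for (const auto& t : query)
        if (sentence.count(t)) ++score;
    return score;
}

int main() {
    const std::string corpus =
        "CLINFERA answers questions from local text files. "
        "It chunks custom content into sentences. "
        "Each query is scored against every sentence.";
    const std::string query = "How are queries scored?";

    const auto queryTokens = tokenize(query);
    std::string best;
    std::size_t bestScore = 0;
    for (const auto& sentence : splitSentences(corpus)) {
        const std::size_t score = overlapScore(queryTokens, tokenize(sentence));
        if (score > bestScore) {
            bestScore = score;
            best = sentence;
        }
    }
    std::cout << "Best match:" << best << "\n";
    return 0;
}
```

This naive scoring also illustrates the failure mode noted below: when no sentence shares tokens with the query, the best match is arbitrary, which is where wrong answers come from.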
When the engine finds a correct match, the answer is 100% accurate; when it misses, the wrong answers are highly hallucinatory. Response time, however, is quick in comparison with other inference engines.
CLINFERA has two modes:
- CLI
- API Server
CLI mode:
To use CLI mode, specify an option and its required parameters; use --help to list the available CLI options, as shown below.
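For example (assuming the binary is named clinfera, matching the project name):

```sh
# List every supported CLI option and its parameters.
./clinfera --help
```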
API Server mode:
To start the API server, run ./clinfera server host=IP_ADDRESS port=PORT_NO, where IP_ADDRESS is the address to bind and PORT_NO is the listening port.
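For example, to serve on localhost port 8080 (the address and port here are illustrative values only):

```sh
# Bind the API server to localhost on port 8080 (example values).
./clinfera server host=127.0.0.1 port=8080
```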