Local Vector Search with llama.cpp Embeddings and sqlite_vec
With “embeddings”, you can search for passages of text that have a similar meaning to another passage. You create an embedding for a passage by running it through a model that converts the text into a vector of numbers. Two passages with similar meanings will have similar embedding vectors.
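To make “similar embedding vectors” concrete, here is a minimal sketch of comparing two vectors with cosine similarity. The three-dimensional vectors are made up for illustration; real embeddings from the model below have 768 dimensions.
# cosine similarity: 1.0 means same direction, near 0.0 means unrelated
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

puts cosine_similarity([0.9, 0.1, 0.2], [0.8, 0.2, 0.3])  # ~0.98, similar
puts cosine_similarity([0.9, 0.1, 0.2], [-0.2, 0.9, 0.1]) # much lower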
Local llama.cpp Embedding Server
I already use llama.cpp to run large language models locally, and there is a nice tutorial showing how to use llama.cpp to generate embeddings with Snowflake’s embedding models.
I found a quantized version of the snowflake-arctic-embed-m-v1.5 model, and launched a server hosting the model with this command:
./build/bin/llama-server -m ~/Models/snowflake-arctic-embed-m-v1.5-q8_0.gguf --embeddings -c 512 -ngl 99
Here, --embeddings enables the embedding endpoint, -c 512 sets the context size, and -ngl 99 offloads all model layers to the GPU. This model generates a vector of 768 floating-point numbers. You can test it with this command:
curl -X POST "http://localhost:8080/embedding" --data '{"content":"Star Wars is better than Star Trek"}'
[{"index":0,"embedding":[[0.01127300038933754,-0.005094126798212528,0.08973372727632523,0.03633967041969299,0.04224010556936264,...
sqlite_vec
Now that we can generate embeddings, we need to store them in a vector database and search them. The general pipeline takes a query string, embeds it, matches the query vector against every vector in the database, and sorts the results by distance. A full scan sounds like it would be slow, but in practice it is fairly fast to search.
There is an impressive open source project called sqlite_vec that I leveraged to create a database of vectors and search them. I used the Ruby sqlite_vec gem to store vectors for a document.
Get an embedding from Ruby
First, we need to be able to call the llama.cpp embedding API from Ruby. Note that in the response above, the vector is nested one level deep inside the embedding field, which is why the method below unwraps it with .first.
# libs/embed.rb
require 'net/http'
require 'uri'
require 'json'

# POST content to the local llama.cpp server and return the
# embedding vector as an array of floats, or nil on failure.
def embedding(content)
  uri = URI.parse("http://localhost:8080/embedding")
  http = Net::HTTP.new(uri.host, uri.port)
  request = Net::HTTP::Post.new(uri.request_uri)
  request.body = { content: content }.to_json
  request["Content-Type"] = "application/json"
  response = http.request(request)
  if response.code.to_i >= 200 && response.code.to_i < 300
    json = JSON.parse(response.body)
    # the vector is nested one level deep: [{"embedding":[[...]]}]
    return json[0]["embedding"].first
  end
  nil
end
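A quick way to check the helper (assuming the llama-server from earlier is still running on port 8080) is to embed a string and confirm the vector has 768 dimensions:
# irb or a scratch script
require "./libs/embed"
vec = embedding("Star Wars is better than Star Trek")
puts vec.length # => 768 for snowflake-arctic-embed-m-v1.5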
Store the Vectors in SQLite
First, initialize an empty database:
# init.rb
require 'sqlite3'
require 'sqlite_vec'

# watch out! start from a clean slate every run
`rm -f embeddings.db`

db = SQLite3::Database.new('embeddings.db')
db.enable_load_extension(true)
SqliteVec.load(db)
db.enable_load_extension(false)

sqlite_version, vec_version = db.execute("select sqlite_version(), vec_version()").first
puts "sqlite_version=#{sqlite_version}, vec_version=#{vec_version}"

# one row per source file
db.execute("CREATE TABLE document (
  document_id integer primary key,
  document_path varchar(255),
  document_sha2 varchar(64)
)")

# one row per line of text in a file
db.execute("CREATE TABLE document_chunk (
  document_chunk_id integer primary key,
  document_id int,
  document_chunk varchar(255)
)")

# one 768-dimension vector per chunk; the rowid matches document_chunk_id
db.execute("CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[768])")
I made a script to import the documents in a folder named corpus into the database so we can query them later. Each line of a document becomes one chunk. I'm ignoring really short passages, though maybe that isn't needed. I'm also hashing each document so I can efficiently re-index files that change in the future. Warning: this script deletes the database and recreates it (it loads init.rb to rebuild the schema).
# import.rb
require "./libs/embed"
require 'digest'
require 'sqlite3'
require 'sqlite_vec'

# compute a file's SHA-256 without loading the whole file into memory
def file_sha256(file_path)
  sha = Digest::SHA2.new
  File.open(file_path) do |f|
    while chunk = f.read(256) # only load 256 bytes at a time
      sha << chunk
    end
  end
  sha.hexdigest
end

# watch out! this deletes the database and rebuilds the schema
load './init.rb'

db = SQLite3::Database.new('embeddings.db')
db.enable_load_extension(true)
SqliteVec.load(db)
db.enable_load_extension(false)

document_id = 1
document_chunk_id = 1

# Look in folder specified by argument, or 'corpus'
corpus_path = ARGV[0]&.strip || "corpus"
Dir[File.join(corpus_path, "**", "*")].each do |file_path|
  next unless File.file?(file_path) # the glob also returns directories
  db.transaction do
    sha256 = file_sha256(file_path)
    db.execute("INSERT INTO
      document(document_id, document_path, document_sha2)
      VALUES (?, ?, ?)", [document_id, file_path, sha256])
    IO.foreach(file_path) do |line|
      db.execute("INSERT INTO document_chunk(document_chunk_id, document_id, document_chunk)
        VALUES (?, ?, ?)", [document_chunk_id, document_id, line])
      begin
        raw = line.strip
        # skip blank lines and very short passages
        if raw != '' && raw.length > 16
          float_array = embedding(raw)
          db.execute("INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)",
                     [document_chunk_id, float_array.pack("f*")])
        end
      rescue => ex
        puts
        puts file_path
        puts raw
        puts ex
      end
      document_chunk_id += 1
    end
    document_id += 1
  end
end
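The hash stored in document_sha2 isn't used by the scripts above yet, but it is what makes cheap re-indexing possible: before importing a file, compare its current hash to the stored one and skip the file if nothing changed. A minimal sketch, assuming the file_sha256 helper from import.rb (skip_unchanged? is a hypothetical name, not part of the scripts above):
# hypothetical helper: true if this file is already indexed with the same content
def skip_unchanged?(db, file_path)
  stored = db.get_first_value(
    "SELECT document_sha2 FROM document WHERE document_path = ?", [file_path])
  stored == file_sha256(file_path)
end

# inside the import loop you would then do:
# next if skip_unchanged?(db, file_path)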
Finally, query the database with this script. You can try either the KNN or the cosine vector search method to see which works better; I didn't see much difference when searching natural language. If the server returns unit-normalized embeddings, that's expected: L2 distance and cosine distance rank normalized vectors identically.
# query.rb
require "./libs/embed"
require 'sqlite3'
require 'sqlite_vec'

db = SQLite3::Database.new('embeddings.db')
db.enable_load_extension(true)
SqliteVec.load(db)
db.enable_load_extension(false)

sqlite_version, vec_version = db.execute("select sqlite_version(), vec_version()").first
puts "sqlite_version=#{sqlite_version}, vec_version=#{vec_version}"

input_string = ARGV[0]
abort 'usage: ruby ./query.rb "search string here"' if input_string.nil?

# embed the query the same way the corpus was embedded
float_array = embedding(input_string)

# KNN search
# query = %(SELECT
#   rowid,
#   distance
#   FROM vec_items
#   WHERE embedding MATCH ?
#   ORDER BY distance
#   LIMIT 10)

# cosine distance search: smaller scores are closer matches
query = %(SELECT
  rowid,
  vec_distance_cosine(embedding, ?) as score
  FROM vec_items
  ORDER BY score
  LIMIT 10)
rows = db.execute(query, [float_array.pack("f*")])
abort "no matches found" if rows.empty?

# the rowids are integers from our own table, so it is safe
# to interpolate them into the IN clause
rowids = rows.map { |rowid, _score| rowid }.join(',')
result = db.execute(
  "SELECT
     document.document_path,
     document_chunk.document_chunk_id,
     document_chunk.document_chunk
   FROM document_chunk
   LEFT JOIN document ON document.document_id = document_chunk.document_id
   WHERE
     document_chunk_id IN (#{rowids})")

chunks = result.map do |document_path, document_chunk_id, document_chunk|
  [document_chunk_id, document_path + ": " + document_chunk]
end.to_h

# print results best match first
rows.each do |rowid, distance|
  puts
  puts "#{rowid} #{distance} #{chunks[rowid]}"
end
You can now search your corpus for related passages:
ruby ./query.rb "search string here"