Local Vector Search with llama.cpp Embeddings and sqlite_vec
With “embeddings”, you can search for passages of text that have a similar meaning to another passage. You create an embedding for a passage by running it through a model that converts the text into a vector of numbers. Two passages with similar meanings will have similar embedding vectors.
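To make “similar embedding vectors” concrete, here is a minimal sketch of comparing two vectors with cosine similarity. The three-dimensional vectors are made up for illustration; real embeddings from the model below have 768 dimensions.
# cosine similarity: 1.0 means same direction, near 0.0 means unrelated
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

puts cosine_similarity([0.9, 0.1, 0.2], [0.8, 0.2, 0.3])  # ~0.98, similar
puts cosine_similarity([0.9, 0.1, 0.2], [-0.2, 0.9, 0.1]) # much lower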
Local llama.cpp Embedding Server
I already use llama.cpp to run large language models locally, and there is a nice tutorial showing how to use llama.cpp to generate embeddings with Snowflake’s embedding models.
I found a quantized version of the snowflake-arctic-embed-m-v1.5 model, and launched a server hosting the model with this command:
./build/bin/llama-server -m ~/Models/snowflake-arctic-embed-m-v1.5-q8_0.gguf --embeddings -c 512 -ngl 99
Here, --embeddings enables the embedding endpoint, -c 512 sets the context size, and -ngl 99 offloads all model layers to the GPU. This model generates a vector of 768 floating-point numbers. You can test it with this command:
curl -X POST "http://localhost:8080/embedding" --data '{"content":"Star Wars is better than Star Trek"}'
[{"index":0,"embedding":[[0.01127300038933754,-0.005094126798212528,0.08973372727632523,0.03633967041969299,0.04224010556936264,...
sqlite_vec
Now that we can generate embeddings, we need to store them in a vector database and search them. The general pipeline takes a query string, embeds it, matches the query vector against every vector in the database, and sorts the results by distance. A full scan sounds like it would be slow, but in practice it is fairly fast to search.
There is an impressive open source project called sqlite_vec that I leveraged to create a database of vectors and search them. I used the Ruby sqlite_vec gem to store vectors for a document.
Get an embedding from Ruby
First, we need to be able to call the llama.cpp embedding API from Ruby. Note that in the response above, the vector is nested one level deep inside the embedding field, which is why the method below unwraps it with .first.
# libs/embed.rb
require 'net/http'
require 'uri'
require 'json'

# POST content to the local llama.cpp server and return the
# embedding vector as an array of floats, or nil on failure.
def embedding(content)
  uri = URI.parse("http://localhost:8080/embedding")
  http = Net::HTTP.new(uri.host, uri.port)
  request = Net::HTTP::Post.new(uri.request_uri)
  request.body = { content: content }.to_json
  request["Content-Type"] = "application/json"
  response = http.request(request)
  if response.code.to_i >= 200 && response.code.to_i < 300
    json = JSON.parse(response.body)
    # the vector is nested one level deep: [{"embedding":[[...]]}]
    return json[0]["embedding"].first
  end
  nil
end
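A quick way to check the helper (assuming the llama-server from earlier is still running on port 8080) is to embed a string and confirm the vector has 768 dimensions:
# irb or a scratch script
require "./libs/embed"
vec = embedding("Star Wars is better than Star Trek")
puts vec.length # => 768 for snowflake-arctic-embed-m-v1.5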
Store the Vectors in SQLite
First, initialize an empty database:
# init.rb
require 'sqlite3'
require 'sqlite_vec'

# watch out! start from a clean slate every run
`rm -f embeddings.db`

db = SQLite3::Database.new('embeddings.db')
db.enable_load_extension(true)
SqliteVec.load(db)
db.enable_load_extension(false)

sqlite_version, vec_version = db.execute("select sqlite_version(), vec_version()").first
puts "sqlite_version=#{sqlite_version}, vec_version=#{vec_version}"

# one row per source file
db.execute("CREATE TABLE document (
  document_id integer primary key,
  document_path varchar(255),
  document_sha2 varchar(64)
)")

# one row per line of text in a file
db.execute("CREATE TABLE document_chunk (
  document_chunk_id integer primary key,
  document_id int,
  document_chunk varchar(255)
)")

# one 768-dimension vector per chunk; the rowid matches document_chunk_id
db.execute("CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[768])")
I made a script to import the documents in a folder named corpus into the database so we can query them later. Each line of a document becomes one chunk. I'm ignoring really short passages, though maybe that isn't needed. I'm also hashing each document so I can efficiently re-index files that change in the future. Warning: this script deletes the database and recreates it (it loads init.rb to rebuild the schema).
# import.rb
require "./libs/embed"
require 'digest'
require 'sqlite3'
require 'sqlite_vec'

# compute a file's SHA-256 without loading the whole file into memory
def file_sha256(file_path)
  sha = Digest::SHA2.new
  File.open(file_path) do |f|
    while chunk = f.read(256) # only load 256 bytes at a time
      sha << chunk
    end
  end
  sha.hexdigest
end

# watch out! this deletes the database and rebuilds the schema
load './init.rb'

db = SQLite3::Database.new('embeddings.db')
db.enable_load_extension(true)
SqliteVec.load(db)
db.enable_load_extension(false)

document_id = 1
document_chunk_id = 1

# Look in folder specified by argument, or 'corpus'
corpus_path = ARGV[0]&.strip || "corpus"
Dir[File.join(corpus_path, "**", "*")].each do |file_path|
  next unless File.file?(file_path) # the glob also returns directories
  db.transaction do
    sha256 = file_sha256(file_path)
    db.execute("INSERT INTO
      document(document_id, document_path, document_sha2)
      VALUES (?, ?, ?)", [document_id, file_path, sha256])
    IO.foreach(file_path) do |line|
      db.execute("INSERT INTO document_chunk(document_chunk_id, document_id, document_chunk)
        VALUES (?, ?, ?)", [document_chunk_id, document_id, line])
      begin
        raw = line.strip
        # skip blank lines and very short passages
        if raw != '' && raw.length > 16
          float_array = embedding(raw)
          db.execute("INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)",
                     [document_chunk_id, float_array.pack("f*")])
        end
      rescue => ex
        puts
        puts file_path
        puts raw
        puts ex
      end
      document_chunk_id += 1
    end
    document_id += 1
  end
end
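The hash stored in document_sha2 isn't used by the scripts above yet, but it is what makes cheap re-indexing possible: before importing a file, compare its current hash to the stored one and skip the file if nothing changed. A minimal sketch, assuming the file_sha256 helper from import.rb (skip_unchanged? is a hypothetical name, not part of the scripts above):
# hypothetical helper: true if this file is already indexed with the same content
def skip_unchanged?(db, file_path)
  stored = db.get_first_value(
    "SELECT document_sha2 FROM document WHERE document_path = ?", [file_path])
  stored == file_sha256(file_path)
end

# inside the import loop you would then do:
# next if skip_unchanged?(db, file_path)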
Finally, query the database with this script. You can try either the KNN or the cosine vector search method to see which works better; I didn't see much difference when searching natural language. If the server returns unit-normalized embeddings, that's expected: L2 distance and cosine distance rank normalized vectors identically.
# query.rb
require "./libs/embed"
require 'sqlite3'
require 'sqlite_vec'

db = SQLite3::Database.new('embeddings.db')
db.enable_load_extension(true)
SqliteVec.load(db)
db.enable_load_extension(false)

sqlite_version, vec_version = db.execute("select sqlite_version(), vec_version()").first
puts "sqlite_version=#{sqlite_version}, vec_version=#{vec_version}"

input_string = ARGV[0]
abort 'usage: ruby ./query.rb "search string here"' if input_string.nil?

# embed the query the same way the corpus was embedded
float_array = embedding(input_string)

# KNN search
# query = %(SELECT
#   rowid,
#   distance
#   FROM vec_items
#   WHERE embedding MATCH ?
#   ORDER BY distance
#   LIMIT 10)

# cosine distance search: smaller scores are closer matches
query = %(SELECT
  rowid,
  vec_distance_cosine(embedding, ?) as score
  FROM vec_items
  ORDER BY score
  LIMIT 10)
rows = db.execute(query, [float_array.pack("f*")])
abort "no matches found" if rows.empty?

# the rowids are integers from our own table, so it is safe
# to interpolate them into the IN clause
rowids = rows.map { |rowid, _score| rowid }.join(',')
result = db.execute(
  "SELECT
     document.document_path,
     document_chunk.document_chunk_id,
     document_chunk.document_chunk
   FROM document_chunk
   LEFT JOIN document ON document.document_id = document_chunk.document_id
   WHERE
     document_chunk_id IN (#{rowids})")

chunks = result.map do |document_path, document_chunk_id, document_chunk|
  [document_chunk_id, document_path + ": " + document_chunk]
end.to_h

# print results best match first
rows.each do |rowid, distance|
  puts
  puts "#{rowid} #{distance} #{chunks[rowid]}"
end
You can now search your corpus for related passages:
ruby ./query.rb "search string here"