
Semantic search demo

A simple CLI and web tool to help you search your PDF files.

Dependencies

Ollama

You need a running Ollama server, either as a local instance or as a Podman / Docker container.

Local instance

  • install Ollama
  • run ollama serve
  • pull the selected embedding model:
    > ollama pull nomic-embed-text
    

Ollama: Podman / Docker image

  • Pull the image and start the container (replace podman with docker if you use Docker):
    > podman run -d -v models:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
    
  • Then pull the selected model (in our case nomic-embed-text) inside the container:
    > podman exec -ti ollama ollama pull nomic-embed-text
    

UV

Install UV from your package manager. You can also run the script directly without UV, but then you need to install the required dependencies (the pymupdf and ollama packages) manually.
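If you go the manual route, a quick check like the one below can tell you which of the two packages is still missing. This is only a sketch and not part of main.py; the helper name missing_packages is made up for illustration, and it assumes PyMuPDF is importable as either pymupdf (recent releases) or fitz (older ones).

```python
import importlib.util

# Import names to probe for each package. PyMuPDF is importable as "pymupdf"
# in recent releases and as "fitz" in older ones, so either name counts.
REQUIRED = {
    "pymupdf": ("pymupdf", "fitz"),
    "ollama": ("ollama",),
}


def missing_packages(required=REQUIRED):
    """Return the package names for which none of the import names is found."""
    missing = []
    for package, import_names in required.items():
        if not any(importlib.util.find_spec(name) for name in import_names):
            missing.append(package)
    return missing


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it with the same Python interpreter you plan to use for main.py, otherwise the result may not reflect the right environment.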

Running

Test

Check that the script can reach the Ollama server and can create and delete databases:

> uv run main.py test
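If the test fails, it can help to check the server separately from the script. The sketch below is not what main.py does internally (that is not shown here); it simply assumes Ollama listens on its default port 11434, as in the container command above, and answers a plain GET on the root URL. The function name ollama_reachable is hypothetical.

```python
import http.client
import urllib.request


def ollama_reachable(base_url="http://127.0.0.1:11434", timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP error statuses
        # and malformed responses.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_reachable())
```

A False result usually means the server is not running or the port is not published; it says nothing about whether the embedding model has been pulled.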

Create database

> uv run main.py create

Add files

> uv run main.py add-file db.pkl ~/docs/*.pdf
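The query output reports a page and a chunk number for each hit, so each page is evidently split into chunks before embedding. How main.py does the splitting is not documented here; the following is a minimal sketch of one possible approach, assuming fixed-size word chunks and 1-based numbering. The names chunk_words and iter_pdf_chunks and the 200-word limit are made up for illustration.

```python
def chunk_words(text, max_words=200):
    """Split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


def iter_pdf_chunks(path, max_words=200):
    """Yield (page_number, chunk_number, text) for every chunk in a PDF."""
    # Imported here so chunk_words stays usable without PyMuPDF installed.
    import pymupdf  # older PyMuPDF releases are imported as "fitz"

    with pymupdf.open(path) as document:
        for page_number, page in enumerate(document, start=1):
            chunks = chunk_words(page.get_text(), max_words)
            for chunk_number, chunk in enumerate(chunks, start=1):
                yield page_number, chunk_number, chunk
```

Each yielded chunk would then be sent to the embedding model and stored together with its document name, page and chunk number.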

Query (CLI)

> uv run main.py query db.pkl "balanced tree"
Querying: 'balanced tree' in database: db.pkl

Found 10 results:
============================================================

1. Distance: 15.2735
   Document: [Niklaus_Wirth]_Algorithms_+_Data_Structures_=_Programs.pdf
   Page: 236, Chunk: 1
   Text preview: constructed balanced tree? 2. What is the probability that an insertion requires rebalancing? Mathematical analysis of this complicated algorithm is still an open problem. Empirical tests support t...
----------------------------------------

2. Distance: 15.7531
   Document: [Niklaus_Wirth]_Algorithms_+_Data_Structures_=_Programs.pdf
   Page: 230, Chunk: 1
   Text preview: s balanced if and only if for every node the heights of its two subtrees differ by at most 1. Trees satisfying this condition are often called AVL-trees (after their inventors). We shall simply cal...
----------------------------------------
...
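Results are ordered by distance, smallest first. The metric main.py uses is not stated in this README; the sketch below assumes plain Euclidean distance between the query embedding and each stored chunk embedding, and a default of 10 results as in the output above. The function name nearest_chunks and the toy vectors are hypothetical.

```python
import math


def nearest_chunks(query_vector, entries, limit=10):
    """Return up to `limit` (distance, metadata) pairs, closest first.

    `entries` is a list of (vector, metadata) pairs; the metadata can be
    anything, for example a (document, page, chunk) tuple.
    """
    scored = [
        (math.dist(query_vector, vector), metadata)
        for vector, metadata in entries
    ]
    scored.sort(key=lambda item: item[0])
    return scored[:limit]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real ones have hundreds of dimensions.
    entries = [
        ([0.0, 0.0], ("a.pdf", 1, 1)),
        ([3.0, 4.0], ("a.pdf", 2, 1)),
        ([1.0, 0.0], ("b.pdf", 7, 2)),
    ]
    for distance, metadata in nearest_chunks([0.0, 0.0], entries, limit=2):
        print(f"Distance: {distance:.4f}  {metadata}")
```

A linear scan like this is fine for a handful of books; a much larger collection would likely call for a proper vector index.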

Query (web interface)

Start the server first:

> uv run main.py host db.pkl
   Starting web server...
   Database: db.pkl
   URL: http://127.0.0.1:5000
   Press Ctrl+C to stop
 * Serving Flask app 'main'
 * Debug mode: off
 WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:5000
Press CTRL+C to quit

Then visit http://localhost:5000/ and search there.

Important

If you intend to expose this publicly, use a production WSGI server instead of the one built into Flask. Also note that the tool is intended for internal use, so little effort went into securing the server.
