Semantic search demo
A simple CLI and web tool to help you search your PDF files.
Dependencies
Ollama
You need a running Ollama server, either as a local instance or as a Docker/Podman container.
Local instance
- install ollama
- run the server:
> ollama serve
- pull the selected embedding model:
> ollama pull nomic-embed-text
Ollama: Podman / Docker image
- Pull and start the docker/podman image:
> podman run -d -v models:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
- Then pull the selected model (in our case nomic-embed-text):
> podman exec -ti ollama ollama pull nomic-embed-text
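Once the model is pulled, the server can turn text into embedding vectors. The script does this through the ollama Python package. The sketch below instead talks to Ollama's REST endpoint /api/embed using only the standard library, so it runs without extra dependencies. The helper names (build_embed_request, embed) and the default URL are assumptions for illustration and are not taken from main.py.

```python
import json
import urllib.request

# Assumption: the server listens on the default port used in the podman command above.
OLLAMA_URL = "http://127.0.0.1:11434"
MODEL = "nomic-embed-text"


def build_embed_request(text, model=MODEL, base_url=OLLAMA_URL):
    """Build a POST request for Ollama's /api/embed endpoint (hypothetical helper)."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, model=MODEL, base_url=OLLAMA_URL, timeout=30.0):
    """Send the request and return the first embedding as a list of floats."""
    request = build_embed_request(text, model, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["embeddings"][0]
```

With a running server, embed("balanced tree") should return one vector for the query text.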
UV
Install UV with your package manager. The script also runs as-is without UV, but you then need to install the required dependencies (the pymupdf and ollama packages) manually.
Running
Test
Check that the script can reach the Ollama server and can create and delete databases:
> uv run main.py test
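If the test fails, it can help to check the server independently of the script. The following is a minimal sketch, not the check that main.py performs: it asks Ollama's /api/tags endpoint for the list of local models and reports whether the embedding model is among them. The function name ollama_has_model and its defaults are assumptions.

```python
import json
import urllib.error
import urllib.request


def ollama_has_model(model="nomic-embed-text",
                     base_url="http://127.0.0.1:11434",
                     timeout=3.0):
    """Return True if the server answers and lists `model`, otherwise False."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            tags = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Server unreachable, connection refused, or the reply was not JSON.
        return False
    names = [m.get("name", "") for m in tags.get("models", [])]
    # Ollama reports names with a tag, e.g. "nomic-embed-text:latest".
    return any(n == model or n.split(":")[0] == model for n in names)


if __name__ == "__main__":
    print("model available:", ollama_has_model())
```

A False result means either that the server is not reachable at that address or that the model has not been pulled yet.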
Create database
> uv run main.py create
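The db.pkl file name used in the commands below suggests that the database is a pickled Python object. Under that assumption, creating a database could look roughly like the sketch below. The schema (a dict with "model" and "chunks" keys) and the function names are hypothetical and may differ from what main.py actually stores.

```python
import pickle
from pathlib import Path


def create_db(path="db.pkl"):
    """Write an empty database to `path`, refusing to overwrite an existing file."""
    db_path = Path(path)
    if db_path.exists():
        raise FileExistsError(f"{db_path} already exists")
    # Hypothetical schema: the embedding model name plus a list of chunk records.
    empty = {"model": "nomic-embed-text", "chunks": []}
    with db_path.open("wb") as fh:
        pickle.dump(empty, fh)
    return db_path


def load_db(path="db.pkl"):
    """Read the database back from disk."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```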
Add files
> uv run main.py add-file db.pkl ~/docs/*.pdf
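The query output further down reports a page and a chunk number for every hit, so each page is evidently split into chunks before embedding. How main.py splits pages is not documented here. The sketch below shows one common approach, fixed-size character windows with a small overlap, applied to page text that has already been extracted (the script uses pymupdf for that step). The chunk size, the overlap, the 1-based numbering and the function names are all assumptions.

```python
def chunk_text(text, size=500, overlap=50):
    """Split `text` into chunks of at most `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    text = " ".join(text.split())  # collapse newlines and repeated whitespace
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


def page_records(pages):
    """Yield (page_number, chunk_number, chunk) for a list of page texts, counting from 1."""
    for page_no, page_text in enumerate(pages, start=1):
        for chunk_no, chunk in enumerate(chunk_text(page_text), start=1):
            yield page_no, chunk_no, chunk
```

Each yielded chunk would then be embedded and stored together with its document name, page and chunk number.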
Query (CLI)
> uv run main.py query db.pkl "balanced tree"
Querying: 'balanced tree' in database: db.pkl
Found 10 results:
============================================================
1. Distance: 15.2735
Document: [Niklaus_Wirth]_Algorithms_+_Data_Structures_=_Programs.pdf
Page: 236, Chunk: 1
Text preview: constructed balanced tree? 2. What is the probability that an insertion requires rebalancing? Mathematical analysis of this complicated algorithm is still an open problem. Empirical tests support t...
----------------------------------------
2. Distance: 15.7531
Document: [Niklaus_Wirth]_Algorithms_+_Data_Structures_=_Programs.pdf
Page: 230, Chunk: 1
Text preview: s balanced if and only if for every node the heights of its two subtrees differ by at most 1. Trees satisfying this condition are often called AVL-trees (after their inventors). We shall simply cal...
----------------------------------------
...
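Results are listed with the smallest distance first, so a lower value means a closer match. The README does not say which metric main.py uses. The sketch below assumes plain Euclidean distance between the query embedding and each stored chunk embedding, and a hypothetical record layout with an "embedding" key.

```python
import math


def top_k(query_vec, records, k=10):
    """Return the k records closest to query_vec as (distance, record) pairs, nearest first."""
    scored = [(math.dist(query_vec, rec["embedding"]), rec) for rec in records]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional vectors; real embeddings have many more dimensions.
    demo = [
        {"id": "a", "embedding": [0.0, 0.0]},
        {"id": "b", "embedding": [3.0, 4.0]},
        {"id": "c", "embedding": [1.0, 0.0]},
    ]
    for distance, rec in top_k([0.0, 0.0], demo, k=2):
        print(f"{rec['id']}: {distance:.4f}")
```

A linear scan like this is simple and is usually fast enough for a personal collection of documents.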
Query (web interface)
Start the server first:
> uv run main.py host db.pkl
Starting web server...
Database: db.pkl
URL: http://127.0.0.1:5000
Press Ctrl+C to stop
* Serving Flask app 'main'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://127.0.0.1:5000
Press CTRL+C to quit
Then visit http://localhost:5000/ and search there.
Important
If you intend to expose this publicly, use a production WSGI server instead of Flask's built-in development server. I only intend to use this internally, so not much effort went into securing the server.
