# Semantic search demo

![Web interface](./img/screenshot.png)

A simple CLI and web tool to help you search your PDF files.

## Dependencies

### Ollama

You need a running Ollama server, either as a local instance or as a Docker image.

#### Local instance

* Install [ollama](https://ollama.ai)
* Run `ollama serve`
* Pull the selected embedding model:

```bash
> ollama pull nomic-embed-text
```

#### Ollama: Podman / Docker image

* Pull and start the Docker/Podman [image](https://hub.docker.com/r/ollama/ollama):

```bash
> podman run -d -v models:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```

* Then pull the selected model inside the container (in our case `nomic-embed-text`):

```bash
> podman exec -ti ollama ollama pull nomic-embed-text
```

### UV

Install UV from your package manager. You can also run the script as-is without UV, but then you need to install the required dependencies manually (the `pymupdf` and `ollama` packages).

## Running

### Test

Check that the script can reach the Ollama server and create / delete databases:

```
> py -m uv run main.py test
```

### Create database

```
> uv run main.py create
```

### Add files

```
> uv run main.py add-file db.pkl ~/docs/*pdf
```

### Query (CLI)

```
> uv run main.py query db.pkl "balanced tree"
Querying: 'balanced tree' in database: db.pkl
Found 10 results:
============================================================

1. Distance: 15.2735
Document: [Niklaus_Wirth]_Algorithms_+_Data_Structures_=_Programs.pdf
Page: 236, Chunk: 1
Text preview: constructed balanced tree? 2. What is the probability that an insertion requires rebalancing? Mathematical analysis of this complicated algorithm is still an open problem. Empirical tests support t...
----------------------------------------

2. Distance: 15.7531
Document: [Niklaus_Wirth]_Algorithms_+_Data_Structures_=_Programs.pdf
Page: 230, Chunk: 1
Text preview: s balanced if and only if for every node the heights of its two subtrees differ by at most 1. Trees satisfying this condition are often called AVL-trees (after their inventors). We shall simply cal...
----------------------------------------
...
```

### Query (web interface)

Start the server first:

```bash
> uv run main.py host db.pkl
Starting web server...
Database: db.pkl
URL: http://127.0.0.1:5000
Press Ctrl+C to stop
 * Serving Flask app 'main'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
```

Then visit http://localhost:5000/ and search there.

**Important:** If you intend to expose this publicly, use a production WSGI server instead of the one built into Flask. Also, since I intend to use this only internally, not much effort went into securing the server.
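
If the `test` command fails, it can help to check separately from the script whether Ollama is reachable and whether the embedding model has been pulled. The following is a minimal sketch, not part of `main.py`: it assumes the default port `11434` used in the Podman command above and relies on Ollama's `/api/tags` endpoint, which lists locally available models. The function name `ollama_has_model` is hypothetical.

```python
import json
import urllib.error
import urllib.request


def ollama_has_model(
    model: str,
    base_url: str = "http://127.0.0.1:11434",  # assumed default Ollama address
    timeout: float = 3.0,
) -> bool:
    """Return True if an Ollama server at base_url lists `model` as pulled."""
    # Bypass any HTTP proxy settings, since the server is expected to be local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Server not running, wrong port, timeout, or a non-JSON response.
        return False
    if not isinstance(payload, dict):
        return False
    names = [m.get("name", "") for m in payload.get("models", [])]
    # Ollama reports names with a tag, e.g. "nomic-embed-text:latest".
    return any(n == model or n.split(":")[0] == model for n in names)
```

Calling `ollama_has_model("nomic-embed-text")` returns `False` both when the server is unreachable and when the model is missing, so a `False` result means either `ollama serve` (or the container) is not running, or the `ollama pull` step still needs to be done.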