diff --git a/README.md b/README.md
index 7521629..5ce0aaa 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,101 @@
 # Semantic search demo
 
-Start ollama docker (podman) image:
+A simple CLI and web tool to help you search your PDF files.
+
+## Dependencies
+
+### Ollama
+
+You need a running Ollama server, either as a local instance or from a Docker (Podman) image.
+
+#### Local instance
+
+* Install [Ollama](https://ollama.ai)
+* Run `ollama serve`
+* Pull the selected embedding model:
+  ```bash
+  > ollama pull nomic-embed-text
+  ```
+
+#### Ollama: Podman / Docker image
+
+* Start the Docker/Podman [image](https://hub.docker.com/r/ollama/ollama):
+  ```bash
+  > podman run -d -v models:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
+  ```
+* Then pull the selected model (in our case `nomic-embed-text`):
+  ```bash
+  > podman exec -ti ollama ollama pull nomic-embed-text
+  ```
+
+### UV
+
+Install UV from your package manager. You can also run the script without UV, but then you need to install the required dependencies (the `pymupdf` and `ollama` packages) manually.
+
+## Running
+
+### Test
+
+Check that the script can reach the Ollama server and create/delete databases:
 
 ```
-podman run -d -v models:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
-podman exec -ti ollama ollama pull nomic-embed-text
+> py -m uv run main.py test
 ```
 
-Needs `uv` to work
+### Create database
 
 ```
-py -m uv run main.py test
-```
\ No newline at end of file
+> uv run main.py create
+```
+
+### Add files
+
+```
+> uv run main.py add-file db.pkl ~/docs/*pdf
+```
+
+### Query (CLI)
+
+```
+> uv run main.py query db.pkl "balanced tree"
+Querying: 'balanced tree' in database: db.pkl
+
+Found 10 results:
+============================================================
+
+1. Distance: 15.2735
+   Document: [Niklaus_Wirth]_Algorithms_+_Data_Structures_=_Programs.pdf
+   Page: 236, Chunk: 1
+   Text preview: constructed balanced tree? 2. What is the probability that an insertion requires rebalancing? Mathematical analysis of this complicated algorithm is still an open problem. Empirical tests support t...
+----------------------------------------
+
+2. Distance: 15.7531
+   Document: [Niklaus_Wirth]_Algorithms_+_Data_Structures_=_Programs.pdf
+   Page: 230, Chunk: 1
+   Text preview: s balanced if and only if for every node the heights of its two subtrees differ by at most 1. Trees satisfying this condition are often called AVL-trees (after their inventors). We shall simply cal...
+----------------------------------------
+...
+```
+
+### Query (web interface)
+
+Start the server first:
+
+```bash
+> uv run main.py host db.pkl
+ Starting web server...
+ Database: db.pkl
+ URL: http://127.0.0.1:5000
+ Press Ctrl+C to stop
+ * Serving Flask app 'main'
+ * Debug mode: off
+ WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
+ * Running on http://127.0.0.1:5000
+Press CTRL+C to quit
+```
+
+Then visit http://localhost:5000/ and search there.
+
+**Important**
+
+If you intend to expose this publicly, use a production WSGI server instead of the one built into Flask. Also, since I intend to use this internally, not much effort went into securing the server.
\ No newline at end of file
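
Note on the sample query output in this diff: results are listed by ascending `Distance`, but the README does not say how that value is computed. The sketch below shows one plausible reading, assuming Euclidean distance between a query embedding and each stored chunk embedding. The names `euclidean_distance` and `rank_chunks`, the `(text, vector)` chunk layout, and the default limit of 10 are all hypothetical and are not taken from `main.py`.

```python
import math
from typing import List, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (L2) distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_chunks(
    query_vec: Sequence[float],
    chunks: Sequence[Tuple[str, Sequence[float]]],
    limit: int = 10,  # assumption: mirrors "Found 10 results" in the sample output
) -> List[Tuple[float, str]]:
    """Return up to `limit` (distance, text) pairs, closest chunk first."""
    scored = [(euclidean_distance(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0])
    return scored[:limit]


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for real embeddings.
    toy_chunks = [("far chunk", [3.0, 4.0]), ("near chunk", [1.0, 0.0])]
    for distance, text in rank_chunks([0.0, 0.0], toy_chunks):
        print(f"Distance: {distance:.4f}  {text}")
```

In a real run the vectors would come from the embedding model (`nomic-embed-text` in this README) rather than being typed by hand; a linear scan like this is only reasonable for small databases.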