Update README
99
README.md
@@ -1,14 +1,101 @@
# Semantic search demo

A simple CLI and web tool to help you search your PDF files.

## Dependencies

### Ollama

You need a running Ollama server, either as a local instance or from a Docker image.

#### Local instance

* Install [ollama](https://ollama.ai)
* Run `ollama serve`
* Pull the selected embedding model:
```bash
> ollama pull nomic-embed-text
```

#### Ollama: Podman / Docker image

* Start a container from the docker/podman [image](https://hub.docker.com/r/ollama/ollama):
```bash
> podman run -d -v models:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
```
* Then pull the selected model (in our case `nomic-embed-text`):
```bash
> podman exec -ti ollama ollama pull nomic-embed-text
```
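
Whichever way the server is started, one way to confirm that the model was pulled is to request an embedding over Ollama's HTTP API. The following is a minimal sketch, independent of `main.py`, that assumes the server listens on `127.0.0.1:11434` (the port published above) and uses the `/api/embeddings` endpoint; the helper names `build_embedding_request` and `embed` are made up for illustration.

```python
import json
import urllib.request


def build_embedding_request(text, model="nomic-embed-text",
                            base_url="http://127.0.0.1:11434"):
    """Build a POST request for Ollama's /api/embeddings endpoint."""
    # Assumption: the server is reachable at base_url and the model is pulled.
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, **kwargs):
    """Send the request and return the embedding as a list of floats."""
    request = build_embedding_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())["embedding"]
```

Calling `embed("balanced tree")` should return a list of floats if the server is up and the model is available; an HTTP error usually means the model has not been pulled yet.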

### UV

Install `uv` from your package manager. You can also run the script without `uv`, but then you need to install the required dependencies (the `pymupdf` and `ollama` packages) manually.
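
If you go the manual route, a quick check like the one below can tell you which packages are still missing before you run the script. This is only a sketch: the import names are assumptions (recent PyMuPDF releases import as `pymupdf`, older ones as `fitz`), and `missing_packages` is a hypothetical helper, not part of this project.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the two dependencies named in this README.
REQUIRED = ["pymupdf", "ollama"]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```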

## Running

### Test

Check that the script can reach the Ollama server and can create and delete databases:

```
> uv run main.py test
```
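
If the test fails, it can help to check the connection separately from the script. The sketch below is not what `main.py test` does internally (that is not shown here); it simply sends an HTTP GET to the server's base URL, assuming the default address `127.0.0.1:11434`. The function name `ollama_reachable` is hypothetical.

```python
import http.client
import urllib.request


def ollama_reachable(base_url="http://127.0.0.1:11434", timeout=2.0):
    """Return True if an HTTP GET to the Ollama base URL answers with 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts and HTTP errors all end up here.
        return False
```

`ollama_reachable()` returning `False` points at the server or the port mapping rather than at the database code.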

### Create database

```
> uv run main.py create
```

### Add files

```
> uv run main.py add-file db.pkl ~/docs/*pdf
```

### Query (CLI)

```
> uv run main.py query db.pkl "balanced tree"
Querying: 'balanced tree' in database: db.pkl

Found 10 results:
============================================================

1. Distance: 15.2735
Document: [Niklaus_Wirth]_Algorithms_+_Data_Structures_=_Programs.pdf
Page: 236, Chunk: 1
Text preview: constructed balanced tree? 2. What is the probability that an insertion requires rebalancing? Mathematical analysis of this complicated algorithm is still an open problem. Empirical tests support t...
----------------------------------------

2. Distance: 15.7531
Document: [Niklaus_Wirth]_Algorithms_+_Data_Structures_=_Programs.pdf
Page: 230, Chunk: 1
Text preview: s balanced if and only if for every node the heights of its two subtrees differ by at most 1. Trees satisfying this condition are often called AVL-trees (after their inventors). We shall simply cal...
----------------------------------------

...
```
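
The `Distance` values in the output above suggest that results are ranked by vector distance, with smaller values meaning a closer match. The metric the tool actually uses is not stated here, so the sketch below assumes plain Euclidean distance; `top_k`, the labels and the toy vectors are hypothetical and only illustrate the ranking idea.

```python
import math


def top_k(query_vector, entries, k=10):
    """Return the k entries closest to query_vector, nearest first.

    entries is a list of (label, vector) pairs; the result is a list of
    (distance, label) pairs sorted by ascending Euclidean distance.
    """
    scored = [(math.dist(query_vector, vector), label) for label, vector in entries]
    return sorted(scored)[:k]


# Toy 2-D vectors standing in for real embeddings (assumption for illustration).
entries = [
    ("chunk-a", [1.0, 0.0]),
    ("chunk-b", [0.0, 2.0]),
    ("chunk-c", [5.0, 5.0]),
]
for distance, label in top_k([0.0, 0.0], entries, k=2):
    print(f"{distance:.4f}  {label}")
```

A linear scan like this is enough for a small personal collection of PDFs; a dedicated vector index only starts to matter for much larger databases.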

### Query (web interface)

Start the server first:

```bash
> uv run main.py host db.pkl
Starting web server...
Database: db.pkl
URL: http://127.0.0.1:5000
Press Ctrl+C to stop
* Serving Flask app 'main'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://127.0.0.1:5000
Press CTRL+C to quit
```

Then visit http://localhost:5000/ and search there.

**Important**

If you intend to expose this publicly, use a production WSGI server instead of the one built into Flask. Also, since I intend to use this only internally, not much effort went into securing the server.