Additional Setup Information

Configuration Initialization

prismAId offers multiple ways to create review configuration files:

Web Initializer: Use the browser-based tool on the Review Configurator page to create TOML configuration files through a user-friendly interface.
Template Files: Ready-to-use configuration templates are available in the projects/templates directory for review, screening, and Zotero download tools.
Command Line Initializer: Use the binary with the -init flag to create a configuration file through an interactive terminal:
```
./prismaid -init
```

Terminal app for drafting project configuration file

Apache Tika Server for OCR (Optional)

For automatic OCR fallback when standard document conversion fails or returns empty text, you can set up an Apache Tika server. Tika is used automatically when conversion methods fail - you don’t call it separately.

What is Apache Tika?

Apache Tika is a content analysis toolkit that can extract text and metadata from over a thousand different file types. When configured with Tesseract OCR, it automatically serves as a fallback for:

Scanned PDF documents (when standard PDF extraction returns empty)
Image files (PNG, JPEG, TIFF, etc.)
Documents where standard extraction methods fail or return no text
Corrupted or non-standard files

Important: Tika is never called directly - it’s only used as an automatic fallback when standard conversion methods (like ledongthuc/pdf or pdfcpu for PDFs) fail or return empty text.

Quick Start with Included Script

prismAId includes a helper script (tika-service.sh) to easily manage a local Tika server using Podman or Docker:

# Start Tika server with OCR support
./tika-service.sh start

# Check if the server is running
./tika-service.sh status

# View server logs
./tika-service.sh logs

# Stop the server
./tika-service.sh stop

The server will be available at http://localhost:9998 by default.

Manual Setup with Docker/Podman

If you prefer to manage the container manually:

Using Docker:

# Pull and run Tika server with full OCR support
docker run -d -p 9998:9998 --name tika-ocr apache/tika:latest-full

# Check if it's running
docker ps | grep tika-ocr

# View logs
docker logs tika-ocr

# Stop the server
docker stop tika-ocr
docker rm tika-ocr

Using Podman:

# Pull and run Tika server with full OCR support
podman run -d -p 9998:9998 --name tika-ocr apache/tika:latest-full

# Check if it's running
podman ps | grep tika-ocr

# View logs
podman logs tika-ocr

# Stop the server
podman stop tika-ocr
podman rm tika-ocr

Testing Your Tika Server

Verify that the server is running correctly:

# Test with curl
curl http://localhost:9998/tika

# Or test with a file
curl -T sample.pdf http://localhost:9998/tika --header "Accept: text/plain"

If working correctly, you should receive a response from the server.

Using Tika with prismAId

Once the Tika server is running, provide its address when converting. Tika will automatically be used as fallback when standard methods fail:

# Convert PDFs - Tika used automatically as fallback when needed
./prismaid -convert-pdf ./papers -tika-server localhost:9998

The conversion will:

Try standard methods first (fast, local)
Only if they fail or return empty text → use Tika as fallback

See the Convert Tool documentation for more details.

System Requirements

RAM: 2-4 GB for the Tika container
Disk Space: ~1 GB for the Docker/Podman image
Software: Docker or Podman installed on your system

Troubleshooting

Server won’t start:

Ensure port 9998 is not already in use: lsof -i :9998 or netstat -an | grep 9998
Check Docker/Podman is running: docker info or podman info

Server is slow:

OCR processing is CPU-intensive and can take 10-60 seconds per page
Ensure adequate RAM is available (at least 4 GB free)
Consider processing documents in smaller batches

Connection refused:

Wait a few seconds after starting - the server needs time to initialize
Check firewall settings if accessing from another machine

Use in Jupyter Notebooks

When using versions <= 0.6.6 it is not possible to disable the prompt asking the user’s confirmation to proceed with the review, leading Jupyter notebooks to crash the python engine and to the impossibility to run reviews with single models (in ensemble reviews, on the contrary, confirmation requests are automatically disabled).

To overcome this problem, it is possible to intercept the IO on the terminal as it follows:

import pty
import os
import time
import select

def run_review_with_auto_input(input_str):
    master, slave = pty.openpty()  # Create a pseudo-terminal

    pid = os.fork()
    if pid == 0:  # Child process
        os.dup2(slave, 0)  # Redirect stdin
        os.dup2(slave, 1)  # Redirect stdout
        os.dup2(slave, 2)  # Redirect stderr
        os.close(master)
        import prismaid
        prismaid.RunReviewPython(input_str.encode("utf-8"))
        os._exit(0)

    else:  # Parent process
        os.close(slave)
        try:
            while True:
                rlist, _, _ = select.select([master], [], [], 5)
                if master in rlist:
                    output = os.read(master, 1024).decode("utf-8", errors="ignore")
                    if not output:
                        break  # Process finished

                    print(output, end="")

                    if "Do you want to continue?" in output:
                        print("\n[SENDING INPUT: y]")
                        os.write(master, b"y\n")
                        time.sleep(1)
        finally:
            os.close(master)
            os.waitpid(pid, 0)  # Ensure the child process is cleaned up

# Load your review (TOML) configuration
with open("config.toml", "r") as file:
    input_str = file.read()

# Run the review function
run_review_with_auto_input(input_str)