Fahd Mirza on AI, Cloud, DevOps and Databases: Install NVIDIA Ingest Locally and Use it with Thousands of Documents

This video shares step-by-step instructions to install NVIDIA Ingest locally and use it with PDF, Word, and PowerPoint documents.

Code:

Pre-requisites:
===============

-- Install Docker
-- Get an NGC API key from https://ngc.nvidia.com/
-- Get Early Access from https://developer.nvidia.com/nemo-microservices-early-access/join
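
A quick way to confirm the pre-requisites before starting is sketched below. It is only an illustration: the helper names are made up for this post, and NGC_API_KEY is an assumed environment variable name that you would export yourself.

```shell
# Sketch only: NGC_API_KEY is an assumed variable name, not one NV-Ingest requires here.

# Print the name of every command in the argument list that is not on PATH.
missing_cmds() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Succeed only when the NGC API key has been set to a non-empty value.
have_ngc_key() {
  [ -n "${NGC_API_KEY:-}" ]
}
```

Running `missing_cmds docker git conda` prints nothing when all three tools are installed, and `have_ngc_key || echo "export NGC_API_KEY first"` flags a missing key.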

Phase 1: Configure NV-INGEST Server
===================================

Step 1:

git clone https://github.com/nvidia/nv-ingest && cd nv-ingest

Step 2:

docker login nvcr.io

Username: $oauthtoken
Password: <Your NGC API Key>
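
If you prefer not to type the key at the prompt, the login can also be scripted. The sketch below assumes the key has been exported as NGC_API_KEY (a name chosen for this example) and feeds it to docker login through --password-stdin.

```shell
# Assumption: the key is stored in NGC_API_KEY (a name chosen for this sketch).
ngc_login() {
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "NGC_API_KEY is not set" >&2
    return 1
  fi
  # Single quotes keep $oauthtoken literal: it is the fixed username, not a shell variable.
  printf '%s' "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
}
```

Passing the key on stdin keeps it out of the shell history and the process list, which a --password argument would not.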

Step 3:

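Make sure NVIDIA is set as your default container runtime before running the docker compose command, then restart Docker (shown here for a systemd host) so the change takes effect:
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
sudo systemctl restart docker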
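
To double-check that the setting is active, you can ask Docker which runtime it uses by default. This is a minimal sketch; the function name is made up for illustration.

```shell
# Succeeds only when Docker reports "nvidia" as its default runtime.
nvidia_is_default_runtime() {
  [ "$(docker info --format '{{.DefaultRuntime}}' 2>/dev/null)" = "nvidia" ]
}
```

For example, `nvidia_is_default_runtime || echo "default runtime is not nvidia yet"` prints a warning when the configuration has not been picked up.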

Step 4:

docker compose up
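
The containers can take a while to come up, so it helps to wait until the service answers before moving on to the client. The helper below is a generic retry loop, not part of NV-Ingest; WAIT_INTERVAL is a hypothetical knob introduced for this sketch.

```shell
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND every WAIT_INTERVAL seconds (default 5) until it succeeds
# or roughly TIMEOUT_SECONDS have elapsed.
wait_until() {
  timeout=$1
  shift
  interval=${WAIT_INTERVAL:-5}
  elapsed=0
  until "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}
```

One possible use, after starting the stack in the background with `docker compose up -d`, is `wait_until 600 curl -sf http://localhost:7670/v1/health/ready`. Port 7670 comes from the client step in this post, but the health path is an assumption, so check the repository's documentation for the exact endpoint.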


Phase 2: Configure NV-INGEST Client
===================================

Step 1:


conda env create --name nv-ingest-dev --file ./conda/environments/nv_ingest_environment.yml
conda activate nv-ingest-dev

cd client
pip install .

Step 2:

nv-ingest-cli \
  --doc ./data/multimodal_test.pdf \
  --output_directory ./processed_docs \
  --task='extract:{"document_type": "pdf", "extract_method": "pdfium", "extract_tables": "true", "extract_images": "true"}' \
  --client_host=localhost \
  --client_port=7670
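
The command above handles one file. For a whole folder of PDFs, a simple loop can call the CLI once per document with the same flags. This is a rough sketch rather than the recommended way to batch: the function name and the NV_INGEST_CLI override are hypothetical, and the CLI may offer its own batching options (see its help output).

```shell
# ingest_dir INPUT_DIR OUTPUT_DIR
# Runs the CLI once per PDF found under INPUT_DIR, reusing the flags from Step 2.
# NV_INGEST_CLI is a hypothetical override (handy for dry runs); default is nv-ingest-cli.
ingest_dir() {
  in_dir=$1
  out_dir=$2
  cli=${NV_INGEST_CLI:-nv-ingest-cli}
  task='extract:{"document_type": "pdf", "extract_method": "pdfium", "extract_tables": "true", "extract_images": "true"}'
  find "$in_dir" -type f -name '*.pdf' | sort | while IFS= read -r doc; do
    # </dev/null stops the CLI from consuming the remaining file list on stdin.
    "$cli" --doc "$doc" --output_directory "$out_dir" --task="$task" \
      --client_host=localhost --client_port=7670 </dev/null \
      || echo "failed: $doc" >&2
  done
}
```

Setting `NV_INGEST_CLI=echo` before calling `ingest_dir ./data ./processed_docs` prints the commands instead of running them, which is a cheap way to preview a large batch. Failures are reported on stderr and the loop carries on with the next file.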

  
Where to find output?
======================

Once the ingestion steps above have completed, the processed_docs folder (the --output_directory passed to the CLI) should contain text and image subfolders. Each holds JSON-formatted extracted content and metadata.

  ls -R processed_docs
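
With many documents the recursive listing gets long, so a per-subfolder count is easier to read. The helper below is a small sketch with a made-up name; it assumes the extracted content is stored in files ending in .json.

```shell
# count_json DIR
# Prints "<subfolder> <number of .json files>" for each immediate subfolder of DIR.
count_json() {
  for sub in "$1"/*/; do
    [ -d "$sub" ] || continue
    n=$(find "$sub" -type f -name '*.json' | wc -l | tr -d ' ')
    echo "$(basename "$sub") $n"
  done
}
```

Calling `count_json processed_docs` prints one line per subfolder, for example one for image and one for text.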

Friday, January 3, 2025
