This video shows how to install MinerU, an LLM-powered tool that converts PDFs into machine-readable formats (e.g., Markdown, JSON) so their content can be easily extracted to build datasets.
Code:
git clone https://github.com/opendatalab/MinerU.git && cd MinerU
conda create -n MinerU python=3.10 && conda activate MinerU
pip install "magic-pdf[full]==0.7.0b1" --extra-index-url https://wheels.myhloli.com
magic-pdf --version
git lfs install
mkdir model
cd model
git clone https://huggingface.co/wanderkid/PDF-Extract-Kit
# Edit magic-pdf.json: set "models-dir" to the path of the downloaded models and "device-mode" to "cuda" to run on the GPU
wget https://github.com/opendatalab/MinerU/raw/master/demo/small_ocr.pdf
magic-pdf -p small_ocr.pdf
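
The magic-pdf.json edit in the steps above is done by hand in the video. It could also be scripted, roughly as in the sketch below. The helper name `set_magic_pdf_config` is hypothetical, and the key names `models-dir` and `device-mode` are assumed to match your copy of magic-pdf.json, so check the file before relying on it.

```shell
# Hypothetical helper (not part of MinerU): update the model directory and
# device in a magic-pdf.json file while leaving every other key untouched.
# Usage: set_magic_pdf_config <config-file> <models-dir> [device]
set_magic_pdf_config() {
    # Start from an empty JSON object if the config file does not exist yet.
    [ -f "$1" ] || echo '{}' > "$1"

    # Rewrite the two keys with Python's json module.
    # The device defaults to "cuda"; pass "cpu" on machines without a GPU.
    python3 - "$1" "$2" "${3:-cuda}" <<'EOF'
import json
import sys

path, models_dir, device = sys.argv[1], sys.argv[2], sys.argv[3]

with open(path) as f:
    cfg = json.load(f)

cfg["models-dir"] = models_dir   # assumed key name
cfg["device-mode"] = device      # assumed key name

with open(path, "w") as f:
    json.dump(cfg, f, indent=4)
EOF
}
```

For example, assuming the config file lives in your home directory and the models were cloned into the current directory: `set_magic_pdf_config "$HOME/magic-pdf.json" "$PWD/PDF-Extract-Kit/models"`.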