Biblio-glutton database and index (2024-04 Crossref dump)

This repository contains the Biblio-glutton (https://github.com/kermitt2/biblio-glutton) databases and indexes.

Because the Hugging Face (HF) upload size is limited to 20 GB, the compressed files are split into chunks.

To decompress them, 7-Zip is required.
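Before starting, it can help to confirm that the tools used in the steps of this README are installed. The following is a minimal sketch, not part of the original instructions: the helper name missing_tools is hypothetical, and the binary names 7z and git-lfs are assumptions (some distributions ship 7-Zip as 7za or 7zz instead).

```shell
#!/bin/sh
# Hypothetical helper: print every tool from the argument list
# that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# 7z unpacks the archives; git and git-lfs fetch them.
# The binary names are assumptions and may differ on your system.
missing=$(missing_tools 7z git git-lfs)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All required tools found."
fi
```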

The repository contains the following files:

  • biblio-glutton-index.7z.*: the compressed Elasticsearch index and engine.
  • data/db: the LMDB fast-storage databases:
    • data/db/crossref.zip: the Crossref dump (2025/04)
    • data/db/hal.7z*: the HAL identifiers
    • data/db/pmid.7z*: the PMID identifier mapping
    • data/db/unpayWall.7z*: the Unpaywall OA links (from an old 2018 dump; we plan to replace it with OpenAlex)

Getting started

The steps below assume you are working in /home/user/glutton.

  1. Clone the biblio-glutton application
git clone https://github.com/kermitt2/biblio-glutton

You should have biblio-glutton under a directory of the same name.

  2. Clone this repository
git lfs install
git clone https://huggingface.co/sciencialab/biblio-glutton-dbs

  3. Unpack the Index (make sure the archive name matches the file you downloaded)
7z x biblio-glutton-dbs/biblio-glutton-index.7z

You should have a new directory biblio-glutton-index organised as follows:

biblio-glutton-index
├── elastic
│   ├── elastico_singleNode
│   │    ├── config
│   │    ├── data
│   │    └── logs
│   └── elastico_singleNode.sh
└── elasticsearch-8.15.0

You need to edit elastico_singleNode.sh and the files under elastico_singleNode/config/ so that the absolute data and logs paths match your machine.
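For the config directory, the edit could be scripted roughly as follows. This is only a sketch under assumptions: it supposes the config file is a standard Elasticsearch YAML file (typically elasticsearch.yml, a name this README does not state) in which path.data and path.logs each sit on their own line. The helper name set_es_paths is hypothetical, and elastico_singleNode.sh still has to be edited by hand since its contents are not described here.

```shell
#!/bin/sh
# Hypothetical helper: set_es_paths CONFIG_FILE DATA_DIR LOGS_DIR
# Rewrites the "path.data:" and "path.logs:" lines of an Elasticsearch
# YAML config file, keeping a .bak copy of the original.
# Assumes each setting is written on a single line starting at column 0.
set_es_paths() {
    cfg=$1
    data_dir=$2
    logs_dir=$3
    cp "$cfg" "$cfg.bak"
    # "|" is the sed delimiter, so the directories must not contain
    # "|" or "&" (both are special in a sed replacement).
    sed -e "s|^path\.data:.*|path.data: $data_dir|" \
        -e "s|^path\.logs:.*|path.logs: $logs_dir|" \
        "$cfg.bak" > "$cfg"
}

# Example call (the file name elasticsearch.yml is an assumption):
# base=/home/user/glutton/biblio-glutton-index/elastic/elastico_singleNode
# set_es_paths "$base/config/elasticsearch.yml" "$base/data" "$base/logs"
```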

Then start the index with:

sh elastic/elastico_singleNode.sh

  4. Unpack the Database

It is better to keep the data outside the biblio-glutton application; for example, move it to /home/user/glutton/data.

mv biblio-glutton-dbs/data .
7z x data/db/*.7z
Here too, make sure the archive names match the files you downloaded.
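Since the archive names have to be matched by hand, a dry run that prints one extraction command per archive can make the step easier to check. The sketch below assumes that split archives follow 7-Zip's usual volume numbering (name.7z.001, name.7z.002, ...), which the file listing suggests but does not state; in that scheme only the first volume is passed to 7z. The helper name plan_extract is hypothetical, and extracting into the same directory is also an assumption: listing an archive with 7z l first shows which layout it contains.

```shell
#!/bin/sh
# Hypothetical helper: plan_extract DIR
# Prints one "7z x" command per archive found in DIR without running it.
# For a split archive only the first volume (.7z.001) is listed;
# 7-Zip picks up the remaining volumes on its own.
plan_extract() {
    dir=$1
    for f in "$dir"/*.7z "$dir"/*.7z.001 "$dir"/*.zip; do
        [ -e "$f" ] || continue   # skip patterns that matched nothing
        # -o sets the output directory (no space after the switch).
        printf '7z x "%s" -o"%s"\n' "$f" "$dir"
    done
}

# Example: review the commands first, then run them if they look right.
# plan_extract data/db
# plan_extract data/db | sh
```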

Then update the biblio-glutton/config/config.yml file of the biblio-glutton application so that it points to the database path.
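To find the path to put in the config file, it may help to list which directories actually contain an LMDB database after extraction. This sketch assumes the databases use LMDB's default on-disk layout, where each environment is a directory holding a data.mdb file; the helper name list_lmdb_envs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list_lmdb_envs ROOT
# Prints, sorted, every directory under ROOT that contains an LMDB
# data file (data.mdb is LMDB's default data file name).
list_lmdb_envs() {
    find "$1" -type f -name data.mdb | while IFS= read -r f; do
        dirname "$f"
    done | sort
}

# Example:
# list_lmdb_envs /home/user/glutton/data/db
```

If nothing is printed, the archives were probably extracted somewhere else or not at all.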
