Model Card for WpnSta/lner-xlm-roberta
This is a model fine-tuned for multilingual named entity recognition (NER), specifically for literary texts.
Model Details
Model Description
This model is a fine-tuned version of multilingual XLM-RoBERTa-base, trained on English, French, and Italian literary data.
- Developed by: WpnSta as part of an NLP training course
- Language(s) (NLP): English, French, Italian
- Finetuned from model: XLM-RoBERTa-base
Model Sources
- Repository: GitHub repository with training code
- Demo: Web Interface
Direct Use
This model is ready to use for predicting named entities in new text. It detects the following entity types, following the LitBank annotation schema (see the usage sketch after this list):
- PER (Person, character, also animals with active roles in the narrative)
- FAC (Facility, e.g. the house, the street)
- GPE (Geopolitical Entity, e.g. London, the village)
- LOC (Location, e.g. the river, the sea)
- ORG (Organisation, e.g. the army, the court)
- VEH (Vehicle, e.g. the ship, the coach)
- TIME (Temporal or historical reference, e.g. in the morning, Easter)
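A minimal usage sketch with the transformers token-classification pipeline, assuming the checkpoint is published on the Hugging Face Hub as WpnSta/lner-xlm-roberta with a standard token-classification head; the example sentence is illustrative only.

```python
from transformers import pipeline

# Load the fine-tuned NER model from the Hub (model id assumed from this card).
ner = pipeline(
    "token-classification",
    model="WpnSta/lner-xlm-roberta",
    aggregation_strategy="simple",  # merge sub-word pieces into whole entity spans
)

text = "Elizabeth walked along the river towards London in the morning."

# Each prediction carries a LitBank-style label such as PER, LOC, GPE, or TIME.
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```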
Limitations
The model was trained on literary texts from the 18th, 19th, and 20th centuries (see Training Data below). It will perform best on text in the same languages and from the same period; for more recent texts, a model trained on news data will probably perform better.
Training Data
The model was trained on approximately 620,000 tokens in English, French, and Italian. The following datasets were used: