Cross Lingual Cross Domain

You can try out the model at SGNLP.
For more information, please contact us at SGNLP-AISingapore.

Model Details

Model Name: Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model

  • Description: An implementation of the paper "Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model".
  • Paper: Unsupervised domain adaptation of a pretrained cross-lingual language model. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Nov, 2020 (pp. 3672-3678).
  • Author(s): Li, J., He, R., Ye, H., Ng, H. T., Bing, L., & Yan, R. (2020).
  • URL: https://www.ijcai.org/Proceedings/2020/508

How to Get Started With the Model

Install Python package

SGnlp is an initiative by AI Singapore's NLP Hub. It aims to bridge the gap between research and industry, promote translational research, and encourage adoption of NLP techniques in industry.

Besides Cross Lingual Cross Domain, various other NLP models are available in the Python package. You can try them out at SGNLP-Demo | SGNLP-Github.

pip install sgnlp

Examples

For a more complete code guide, please refer to the documentation.
Alternatively, you can try out the demo for Cross Lingual Cross Domain.

Example of the Unsupervised Feature Decomposition (UFD) model (German language):

from sgnlp.models.ufd import UFDModelBuilder, UFDPreprocessor

# Instantiate model builder and preprocessor
model_builder = UFDModelBuilder(
    source_domains=['books'],
    target_languages=['de'],
    target_domains=['dvd'])
preprocessor = UFDPreprocessor()

# Build pretrained model groups
model_groups = model_builder.build_model_group()


# Model predict ('books_de_dvd' model example)
instance = """Wolverine is BACK Der Film ist im Grunde wie alle Teile der X-Men für Comic-Fans auf jeden Fall ein muss.
              Hugh Jackman spielt seine Rolle wie immer so gut was ich von den ein oder anderen Darsteller leider nicht
              sagen kann. Story und Action sind aber genug Gründe um sich die Blu-ray zu kaufen."""

instance_features = preprocessor([instance])
output = model_groups['books_de_dvd'](**instance_features)
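The key 'books_de_dvd' used above appears to join the source domain, target language, and target domain with underscores. Assuming that pattern holds for every combination passed to the builder (an inference from this one example, not a documented guarantee), the keys for a larger configuration could be enumerated as in the sketch below. Note that `model_group_keys` is a hypothetical helper written for illustration and is not part of sgnlp.

```python
from itertools import product


def model_group_keys(source_domains, target_languages, target_domains):
    """Hypothetical helper: list '<source>_<language>_<target>' keys.

    Assumes the naming pattern seen in 'books_de_dvd'; the library itself
    may skip or name some combinations differently.
    """
    return [
        f"{src}_{lang}_{tgt}"
        for src, lang, tgt in product(source_domains, target_languages, target_domains)
    ]


# Same configuration as the example above
keys = model_group_keys(["books"], ["de"], ["dvd"])
print(keys)
```

Comparing such a list against the actual keys of `model_groups` is a quick way to check whether the assumed pattern matches what the builder returns.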


Training

The training datasets can be retrieved from the authors' GitHub repository (https://github.com/lijuntaopku/UFD/tree/main/data).

Training Results - For UFD

  • Training Time: (Unsupervised training) ~3 hours for 30 epochs on a single V100 GPU
  • Training Time: (Supervised training) ~3 hours for 60 epochs on a single V100 GPU

Model Parameters

  • Model Weights: refer to documentation for details
  • Model Config: refer to documentation for details
  • Model Inputs: Raw text.
  • Model Outputs: Array of logits with the size of number of classes.
  • Model Size: XLM-Roberta: ~2.2GB, Adaptor Domain: ~8.0MB, Adaptor Global: ~8.0MB, Feature Mapper: ~8.0MB, Classifier: ~9.1KB.
  • Model Inference Info: ~2 sec on Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz.
  • Usage Scenarios: Sentiment analysis for eCommerce with operations across multiple countries.
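
Because the model outputs raw logits, a softmax is needed to read them as class probabilities. The sketch below does this in plain Python on a made-up pair of logits. The two-class layout and the label order (index 0 = negative, index 1 = positive) are assumptions for illustration only and should be checked against the documentation before use.

```python
import math

# Assumed label order; verify against the model documentation.
LABELS = ["negative", "positive"]


def softmax(logits):
    """Convert a list of logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels=LABELS):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


example_logits = [-1.2, 2.3]  # made-up values, not real model output
label, confidence = predict_label(example_logits)
print(label, round(confidence, 3))
```

With real model output, the logits would first need to be converted to a plain list of floats, one list per input text.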

License

  • For non-commercial use: GNU GPLv3.
  • For commercial use: please contact us at SGNLP-AISingapore.