Cross Lingual Cross Domain
You can try out the model at SGNLP.
If you want to find out more information, please contact us at SGNLP-AISingapore.
Model Details
Model Name: Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model
- Description: This model is an implementation of the paper Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model.
- Paper: Unsupervised domain adaptation of a pretrained cross-lingual language model. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI 2020), pp. 3672-3678.
- Author(s): Li, J., He, R., Ye, H., Ng, H. T., Bing, L., & Yan, R. (2020).
- URL: https://www.ijcai.org/Proceedings/2020/508
How to Get Started With the Model
Install Python package
SGnlp is an initiative by AI Singapore's NLP Hub. It aims to bridge the gap between research and industry, promote translational research, and encourage adoption of NLP techniques in industry.
Various NLP models other than Cross Lingual Cross Domain are available in the Python package. You can try them out at SGNLP-Demo | SGNLP-Github.
```shell
pip install sgnlp
```
Examples
For a full code guide, please refer to this documentation.
Alternatively, you can also try out the demo for Cross Lingual Cross Domain.
Example of the Unsupervised Feature Decomposition (UFD) model (German language):
```python
from sgnlp.models.ufd import UFDModelBuilder, UFDPreprocessor

# Instantiate model builder and preprocessor
model_builder = UFDModelBuilder(
    source_domains=['books'],
    target_languages=['de'],
    target_domains=['dvd'])
preprocessor = UFDPreprocessor()

# Build pretrained model groups
model_groups = model_builder.build_model_group()

# Model predict ('books_de_dvd' model example)
# German review, roughly: "Wolverine is BACK. Like all the X-Men films, it is
# definitely a must for comic fans. Hugh Jackman plays his role as well as
# ever, which unfortunately cannot be said of some of the other actors. Story
# and action are reason enough to buy the Blu-ray."
instance = """Wolverine is BACK Der Film ist im Grunde wie alle Teile der X-Men für Comic-Fans auf jeden Fall ein muss.
Hugh Jackman spielt seine Rolle wie immer so gut was ich von den ein oder anderen Darsteller leider nicht
sagen kann. Story und Action sind aber genug Gründe um sich die Blu-ray zu kaufen."""
instance_features = preprocessor([instance])
output = model_groups['books_de_dvd'](**instance_features)
```
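The model builder returns one pretrained model per (source domain, target language, target domain) combination, keyed by name. As a rough illustration (the exact key construction is internal to sgnlp; this sketch only assumes the `<source>_<language>_<domain>` pattern visible in the `'books_de_dvd'` key above):

```python
from itertools import product

# Hypothetical illustration, not the sgnlp implementation: enumerate the
# model-group keys produced for a given builder configuration.
source_domains = ['books']
target_languages = ['de']
target_domains = ['dvd', 'music']

model_keys = [f"{s}_{l}_{d}"
              for s, l, d in product(source_domains, target_languages, target_domains)]
# e.g. ['books_de_dvd', 'books_de_music']
```

Each key can then be used to look up the corresponding model in `model_groups`, as shown in the prediction example above.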
Training
The training datasets can be retrieved from the authors' repository (GitHub).
Training Results - For UFD
- Training Time: (Unsupervised training) ~3 hours for 30 epochs on a single V100 GPU
- Training Time: (Supervised training) ~3 hours for 60 epochs on a single V100 GPU
Model Parameters
- Model Weights: refer to documentation for details
- Model Config: refer to documentation for details
- Model Inputs: Raw text.
- Model Outputs: Array of logits with size equal to the number of classes.
- Model Size: XLM-Roberta: ~2.2GB, Adaptor Domain: ~8.0MB, Adaptor Global: ~8.0MB, Feature Mapper: ~8.0MB, Classifier: ~9.1KB.
- Model Inference Info: ~2 sec on Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz.
- Usage Scenarios: Sentiment analysis for eCommerce with operations across multiple countries.
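Since the model outputs raw logits, a typical final step is a softmax followed by an argmax to pick the predicted class. A minimal plain-Python sketch (the `('negative', 'positive')` label names assume a binary sentiment setup and are illustrative, not part of the sgnlp API):

```python
import math

def logits_to_prediction(logits, labels=('negative', 'positive')):
    """Apply a numerically stable softmax to raw logits, then return the
    highest-probability label together with its probability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Example: a logit array favouring the second (positive) class.
label, confidence = logits_to_prediction([-1.2, 2.3])
```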
License
- For non-commercial use: GNU GPLv3.
- For commercial use: please contact us at SGNLP-AISingapore.