Cross Lingual Cross Domain
You can try out the model at SGNLP.
If you want to find out more information, please contact us at SGNLP-AISingapore.
Model Details
Model Name: Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model
- Description: This model is an implementation of the paper Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model.
- Paper: Unsupervised domain adaptation of a pretrained cross-lingual language model. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI 2020), pp. 3672-3678.
- Author(s): Li, J., He, R., Ye, H., Ng, H. T., Bing, L., & Yan, R. (2020).
- URL: https://www.ijcai.org/Proceedings/2020/508
How to Get Started With the Model
Install Python package
SGnlp is an initiative by AI Singapore's NLP Hub. It aims to bridge the gap between research and industry, promote translational research, and encourage adoption of NLP techniques in industry.
Various NLP models other than Cross Lingual Cross Domain are available in the Python package. You can try them out at SGNLP-Demo | SGNLP-Github.
```shell
pip install sgnlp
```
Examples
For a full code guide, please refer to this documentation.
Alternatively, you can also try out the demo for Cross Lingual Cross Domain.
Example of the Unsupervised Feature Decomposition (UFD) model (German language):
```python
from sgnlp.models.ufd import UFDModelBuilder, UFDPreprocessor

# Instantiate model builder and preprocessor
model_builder = UFDModelBuilder(
    source_domains=['books'],
    target_languages=['de'],
    target_domains=['dvd'])
preprocessor = UFDPreprocessor()

# Build pretrained model groups
model_groups = model_builder.build_model_group()

# Model predict ('books_de_dvd' model example)
# German review, roughly: "Wolverine is BACK. Like all the X-Men films, it is
# definitely a must for comic fans. Hugh Jackman plays his role as well as
# ever, which unfortunately cannot be said of some of the other actors. Story
# and action are reason enough to buy the Blu-ray."
instance = """Wolverine is BACK Der Film ist im Grunde wie alle Teile der X-Men für Comic-Fans auf jeden Fall ein muss.
Hugh Jackman spielt seine Rolle wie immer so gut was ich von den ein oder anderen Darsteller leider nicht
sagen kann. Story und Action sind aber genug Gründe um sich die Blu-ray zu kaufen."""
instance_features = preprocessor([instance])
output = model_groups['books_de_dvd'](**instance_features)
```
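The model builder returns one pretrained model per (source domain, target language, target domain) combination, keyed by name. As a rough illustration (the exact key construction is internal to sgnlp; this sketch only assumes the `<source>_<language>_<domain>` pattern visible in the `'books_de_dvd'` key above):

```python
from itertools import product

# Hypothetical illustration, not the sgnlp implementation: enumerate the
# model-group keys produced for a given builder configuration.
source_domains = ['books']
target_languages = ['de']
target_domains = ['dvd', 'music']

model_keys = [f"{s}_{l}_{d}"
              for s, l, d in product(source_domains, target_languages, target_domains)]
# e.g. ['books_de_dvd', 'books_de_music']
```

Each key can then be used to look up the corresponding model in `model_groups`, as shown in the prediction example above.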
Training
The training datasets can be retrieved from the authors' repository (GitHub).
Training Results - For UFD
- Training Time: (Unsupervised training) ~3 hours for 30 epochs on a single V100 GPU
- Training Time: (Supervised training) ~3 hours for 60 epochs on a single V100 GPU
Model Parameters
- Model Weights: refer to documentation for details
- Model Config: refer to documentation for details
- Model Inputs: Raw text.
- Model Outputs: Array of logits with size equal to the number of classes.
- Model Size: XLM-Roberta: ~2.2GB, Adaptor Domain: ~8.0MB, Adaptor Global: ~8.0MB, Feature Mapper: ~8.0MB, Classifier: ~9.1KB.
- Model Inference Info: ~2 sec on Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz.
- Usage Scenarios: Sentiment analysis for eCommerce with operations across multiple countries.
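Since the model outputs raw logits, a typical final step is a softmax followed by an argmax to pick the predicted class. A minimal plain-Python sketch (the `('negative', 'positive')` label names assume a binary sentiment setup and are illustrative, not part of the sgnlp API):

```python
import math

def logits_to_prediction(logits, labels=('negative', 'positive')):
    """Apply a numerically stable softmax to raw logits, then return the
    highest-probability label together with its probability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Example: a logit array favouring the second (positive) class.
label, confidence = logits_to_prediction([-1.2, 2.3])
```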
License
- For non-commercial use: GNU GPLv3.
- For commercial use: please contact us at SGNLP-AISingapore.