--- license: apache-2.0 tags: - eu - public procurement - cpv - sector - multilingual widget: - text: "Oppegรฅrd municipality, hereafter called the contracting authority, intends to enter into a framework agreement with one supplier for the procurement of fresh bread and bakery products for Oppegรฅrd municipality. The contract is estimated to NOK 1 400 000 per annum excluding VAT The total for the entire period including options is NOK 5 600 000 excluding VAT" --- # multilingual-cpv-sector-classifier This model is a fine-tuned version of [bert-base-multilingual-cased](https://huggingface.co/bert-base-multilingual-cased) on [the Tenders Economic Daily Public Procurement Data](https://simap.ted.europa.eu/en). It achieves the following results on the evaluation set: - F1 Score: 0.686 ## Model description The model takes procurement descriptions written in any of [104 languages](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages) and classifies them into 45 sector classes represented by [CPV(Common Procurement Vocabulary)](https://simap.ted.europa.eu/en_GB/web/simap/cpv) code descriptions as listed below. | Common Procurement Vocabulary | |:-----------------------------| | Administration, defence and social security services. ๐Ÿ‘ฎโ€โ™€๏ธ | | Agricultural machinery. ๐Ÿšœ | | Agricultural, farming, fishing, forestry and related products. ๐ŸŒพ | | Agricultural, forestry, horticultural, aquacultural and apicultural services. ๐Ÿ‘จ๐Ÿฟโ€๐ŸŒพ | | Architectural, construction, engineering and inspection services. ๐Ÿ‘ทโ€โ™‚๏ธ | | Business services: law, marketing, consulting, recruitment, printing and security. ๐Ÿ‘ฉโ€๐Ÿ’ผ | | Chemical products. ๐Ÿงช | | Clothing, footwear, luggage articles and accessories. ๐Ÿ‘– | | Collected and purified water. ๐ŸŒŠ | | Construction structures and materials; auxiliary products to construction (excepts electric apparatus). ๐Ÿงฑ | | Construction work. ๐Ÿ—๏ธ | | Education and training services. ๐Ÿ‘ฉ๐Ÿฟโ€๐Ÿซ | | Electrical machinery, apparatus, equipment and consumables; Lighting. โšก | | Financial and insurance services. ๐Ÿ‘จโ€๐Ÿ’ผ | | Food, beverages, tobacco and related products. ๐Ÿฝ๏ธ | | Furniture (incl. office furniture), furnishings, domestic appliances (excl. lighting) and cleaning products. ๐Ÿ—„๏ธ | | Health and social work services. ๐Ÿ‘จ๐Ÿฝโ€โš•๏ธ | | Hotel, restaurant and retail trade services. ๐Ÿจ | | IT services: consulting, software development, Internet and support. ๐Ÿ–ฅ๏ธ | | Industrial machinery. ๐Ÿญ | | Installation services (except software). ๐Ÿ› ๏ธ | | Laboratory, optical and precision equipments (excl. glasses). ๐Ÿ”ฌ | | Leather and textile fabrics, plastic and rubber materials. ๐Ÿงต | | Machinery for mining, quarrying, construction equipment. โ›๏ธ | | Medical equipments, pharmaceuticals and personal care products. ๐Ÿ’‰ | | Mining, basic metals and related products. โš™๏ธ | | Musical instruments, sport goods, games, toys, handicraft, art materials and accessories. ๐ŸŽธ | | Office and computing machinery, equipment and supplies except furniture and software packages. ๐Ÿ–จ๏ธ | | Other community, social and personal services. ๐Ÿง‘๐Ÿฝโ€๐Ÿคโ€๐Ÿง‘๐Ÿฝ | | Petroleum products, fuel, electricity and other sources of energy. ๐Ÿ”‹ | | Postal and telecommunications services. ๐Ÿ“ถ | | Printed matter and related products. ๐Ÿ“ฐ | | Public utilities. โ›ฒ | | Radio, television, communication, telecommunication and related equipment. ๐Ÿ“ก | | Real estate services. ๐Ÿ  | | Recreational, cultural and sporting services. ๐Ÿšด | | Repair and maintenance services. ๐Ÿ”ง | | Research and development services and related consultancy services. ๐Ÿ‘ฉโ€๐Ÿ”ฌ | | Security, fire-fighting, police and defence equipment. ๐Ÿงฏ | | Services related to the oil and gas industry. โ›ฝ | | Sewage-, refuse-, cleaning-, and environmental services. ๐Ÿงน | | Software package and information systems. ๐Ÿ”ฃ | | Supporting and auxiliary transport services; travel agencies services. ๐Ÿšƒ | | Transport equipment and auxiliary products to transportation. ๐ŸšŒ | | Transport services (excl. Waste transport). ๐Ÿ’บ ## Intended uses & limitations Input description should be written in any of [the 104 languages](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages) that MBERT supports. ## Training and evaluation data The whole data consists of 744,360 rows. Shuffled and split into train and validation sets by using 80%/20% manner. Each description represents a unique contract notice description awarded between 2011 and 2018. Both training and validation data have contract notice descriptions written in 22 European Languages. (Malta and Irish are extracted due to scarcity compared to whole data) ## Training procedure The training procedure has been completed on Google Cloud V3-8 TPUs. Thanks [Google](https://sites.research.google/trc/about/) for giving the access to Cloud TPUs ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 2e-05 - num_epochs: 3 - gradient_accumulation_steps: 8 - batch_size_per_device: 4 - total_train_batch_size: 32 ### Training results | Epoch | Step | F1 Score| |:-----:|:------:|:------:| | 1 | 18,609 | 0.630 | | 2 | 37,218 | 0.674 | | 3 | 55,827 | 0.686 | | Language| F1 Score| Test Size| |:-----:|:-----:|:-----:| | PL| 0.759| 13950| | RO| 0.736| 3522| | SK| 0.719| 1122| | LT| 0.687| 2424| | HU| 0.681| 1879| | BG| 0.675| 2459| | CS| 0.668| 2694| | LV| 0.664| 836| | DE| 0.645| 35354| | FI| 0.644| 1898| | ES| 0.643| 7483| | PT| 0.631| 874| | EN| 0.631| 16615| | HR| 0.626| 865| | IT| 0.626| 8035| | NL| 0.624| 5640| | EL| 0.623| 1724| | SL| 0.615| 482| | SV| 0.607| 3326| | DA| 0.603| 1925| | FR| 0.601| 33113| | ET| 0.572| 458||