SetFit with BAAI/bge-small-en-v1.5

This is a SetFit model for text classification. It uses BAAI/bge-small-en-v1.5 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
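
The first step can be illustrated without any model code. The sketch below shows the general pairing idea behind contrastive fine-tuning, not SetFit's exact sampler: it turns a handful of labeled texts into sentence pairs, giving same-label pairs a target similarity of 1.0 and different-label pairs a target of 0.0. The function name `make_contrastive_pairs` and the toy inputs are hypothetical.

```python
from itertools import combinations

def make_contrastive_pairs(texts, labels):
    """Return (text_a, text_b, target) triples for every unordered pair.

    target is 1.0 when both texts share a label, else 0.0.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(texts), 2):
        target = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((a, b, target))
    return pairs

# Toy inputs in the "column name: values" style used by this model.
texts = [
    "Year: 2020, 2019, 2018",
    "YEAR2: 2017",
    "shift: PM, AM",
]
labels = ["Year", "Year", "AM/PM"]

pairs = make_contrastive_pairs(texts, labels)
for a, b, target in pairs:
    print(f"{target:.1f}  {a!r} <-> {b!r}")
```

The embedding model is then fine-tuned so that same-label pairs end up close together and different-label pairs far apart; the classification head in step 2 is fitted on the resulting embeddings.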

Model Details

Model Description

Model Sources

Model Labels

Label Examples
Latitude
  • 'Latitude: 48,87217700, 48,85543800, 48,87416100, 48,87322500, 48,87422500, 48,84189000, 48,86617200, 48,87112100, 48,86552200, 48,87623100, 48,85609000, 48,85642700, 48,86853300, 48,87465400, 48,86995000, 48,85654000, 48,87022000, 48,86962600, 48,85663200, 48,83476200'
  • 'lat: 40.7940823884086, 40.7948509408039, 40.7667178072558, 40.7697032606755, 40.797533370163, 40.7902561000937, 40.7693045133578, 40.7942883045566, 40.7729752391435, 40.7903128889029, 40.7762126854894, 40.7725908847499, 40.7931811701082, 40.7917367820255, 40.7829723919744, 40.7742879599026, 40.7823507678183, 40.7919669739962, 40.7702795904962, 40.7698124821507'
  • 'lat: 83.92115933668057, 89.53277415300325, 85.37696959908148, 85.44622332365381, 84.28538158324413, 87.96664079539569, 86.11414393337242, 85.43864590316868, 87.65474214915454, 81.67725407101064, 90.47817498708324, 89.87993043195812, 81.56791356025577, 88.48808747114165, 89.3843538611984, 87.5218603199103, 83.99238693700401, 82.50195719071465, 85.84865551792468, 87.92121711225418'
Categorical
  • 'SUSPECT_RACE_DESCRIPTION: (null), WHITE, BLACK HISPANIC, BLACK, WHITE HISPANIC, ASIAN/PAC.ISL, AMER IND, MALE'
  • 'OFFICER_IN_UNIFORM_FLAG: Y, N, ('
  • 'SUSPECT_HAIR_COLOR: BLK, BRO, BLD, XXX, (null), GRY, WHI, BLN, RED, ZZZ, PLE, GRN, SDY, ORG, BK, BA, BR, XX'
Day of Month
  • 'Date.Day: 26, 24, 31, 7, 14, 21, 28, 5, 12, 19, 2, 9, 16, 23, 30, 4, 11, 18, 25, 1'
  • 'Incident.Date.Day: 2, 3, 4, 5, 6, 7, 8, 9, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23'
  • 'bibliography.publication.day: 1, 17, 16, 20, 29, 10, 14, 11, 9, 18, 19, 22, 25, 15, 6, 28, 27, 2, 12, 21'
Year
  • 'Year: 2020, 2019, 2018, 2017, 2016, 2015, 2014'
  • 'YEAR2: 2017'
  • 'artist.birth.year: 1930, 1852, 1898, 1760, 1935, 1964, 1967, 1940, 1947, 1938, 1728, 1868, 1927, 1917, 1878, 1895, 1904, 1912, 1899, 1767'
Integer
  • 'unfavorable: 51.0, 55.0, 47.0, 60.0, 61.0, 56.0, 58.0, 57.0, 59.0, 61.9, 62.0, 54.0, 52.0, 53.0, 66.0, 67.0, 63.0, 49.0, 52.42, 56.9'
  • 'Data.Totals.Violent.Rape: 281, 252, 218, 192, 397, 367, 341, 371, 396, 494, 637, 661, 660, 751, 811, 738, 794, 929, 954, 1037'
  • 'AVG: 93429, 78009, 76358, 76258, 75606, 73515, 72885, 71625, 71624, 70534, 69914, 69879, 69671, 68950, 68857, 68846, 68833, 68387, 68099, 67431'
Floating Point Number
  • 'dimensions.width: 0.0, 305.0, 250.0, 756.0, 2095.0, 480.0, 858.0, 558.0, 628.0, 302.0, 1226.0, 1270.0, 940.0, 2276.0, 864.0, 1420.0, 330.0, 267.0, 1225.0, 660.0'
  • 'Data.Fiber: 0.0, 0.2, 0.3, 0.4, 0.7, 0.1, 1.0, 0.6, 0.5, 1.9, 1.1, 2.3, 0.8, 1.6, 0.9, 1.2, 37.0, 4.5, 9.1, 1.5'
  • ' "Weight(Pounds)": 112.9925, 136.4873, 153.0269, 142.3354, 144.2971, 123.3024, 141.4947, 136.4623, 112.3723, 120.6672, 127.4516, 114.143, 125.6107, 122.4618, 116.0866, 139.9975, 129.5023, 142.9733, 137.9025, 124.0449'
Percentage
  • 'pct: 51.0, 48.0, 44.2, 49.0, 48.4, 49.2, 1.4, 47.0, 48.2, 1.6, 50.0, 42.0, 1.0, 40.0, 53.0, 43.0, 46.0, 52.0, 45.0, 47.3'
  • 'pct: 51.6, 41.4, 45.7, 46.8, 5.2, 46.0, 2.0, 48.0, 47.0, 44.0, 4.0, 5.0, 53.0, 55.5, 32.2, 54.7, 40.5, 54.3, 43.7, 45.0'
  • 'PCT.2: 95.5, 96.5, 94.0, 99.4, 97.6, 100.9, 101.0, 101.1, 96.9, 98.0, 97.9, 98.1, 94.8, 100.7, 99.3, 97.1, 98.9, 98.7, 96.1, 99.7'
Secondary Address
  • 'STOP_LOCATION_APARTMENT: (null), 2, 7, 4TH, 2FL, ROOF, ROOF T, BASEME, LOBBY, 17TH, 2 FLOO, 12, 1701, HALLWA, 1E, 5D, SIDEWA, FRONT, 12C, None'
U.S. State Abbreviation
  • 'abbrev: AL, AK, AZ, AR, CA, CO, CT, DE, DC, FL, GA, HI, ID, IL, IN, IA, KS, KY, LA, ME'
  • 'recipient_st: AK, AL, AR, AZ, CA, CO, CT, DC, FL, GA, HI, IA, ID, IL, IN, KA, KS, KY, LA, MA'
  • 'Incident.Location.State: WA, OR, KS, CA, CO, OK, AZ, IA, PA, TX, OH, LA, MT, UT, AR, IL, NV, NM, MN, MO'
Numeric identifier
  • 'pollster_id: 568, 1189, 1508, 1302, 1597, 396, 458, 1699, 1361, 169, 1075, 1406, 241, 1523, 399, 1351, 1528, 1365, 1347, 57'
  • 'SUPERVISING_OFFICER_COMMAND_CODE: 574, 863, 1, 861, 5, 6, 234, 849, 136, 7, 804, 750, 868, 9, 108, 13, 10, 181, 598, 230'
  • 'pollster_rating_id: 245, 609, 48, 437, 88, 599, 600, 263, 280, 314, 124, 357, 667, 317, 494, 325, 522, 556, 593, 216'
Month Number
  • 'mp_month: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12'
  • 'bibliography.publication.month: 6, 11, 3, 8, 1, 10, 7, 2, 4, 5, 9, 12'
  • 'Incident.Date.Month: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12'
Date
  • 'Date.Full: 8/26/1990, 3/24/1991, 3/31/1991, 4/7/1991, 4/14/1991, 4/21/1991, 4/28/1991, 5/5/1991, 5/12/1991, 5/19/1991, 5/26/1991, 6/2/1991, 6/9/1991, 6/16/1991, 6/23/1991, 6/30/1991, 7/7/1991, 7/14/1991, 7/21/1991, 7/28/1991'
  • 'STOP_FRISK_DATE: 1/16/2017, 2/8/2017, 2/20/2017, 2/21/2017, 2/17/2017, 2/25/2017, 3/3/2017, 3/16/2017, 3/31/2017, 4/2/2017, 4/4/2017, 3/24/2017, 4/6/2017, 4/18/2017, 5/6/2017, 5/10/2017, 5/17/2017, 5/7/2017, 5/24/2017, 6/8/2017'
  • 'disb_dt: 15-Sep-15, 16-Nov-15, 30-Sep-15, 18-Dec-15, 22-Oct-15, 3-Dec-15, 23-Nov-15, 29-Feb-16, 18-Mar-16, 27-Feb-16, 17-Feb-16, 25-Feb-16, 25-Jan-16, 14-Jan-16, 12-Jan-16, 22-Jan-16, 1-Jan-16, 3-Jan-16, 6-Jan-16, 11-Jan-16'
Full Name
  • 'candidate_name: Abigail A. Spanberger, Nicholas J. Freitas, Kara Eastman, Don Bacon, Tyler Schaeffer, Jill Schupp, Ann Wagner, Martin Schulte, Dana Balter, John Katko, Steve Williams, Christina Hale, Victoria Spartz, Kenneth Tucker, Joyce Ann Elliott, French Hill, Jared Forrest Golden, Dale John Crafts, Marie Newman, Mike Fricilone'
  • 'bibliography.author.name: Austen, Jane, Gilman, Charlotte Perkins, Carroll, Lewis, Shelley, Mary Wollstonecraft, Kafka, Franz, Twain, Mark, Wilde, Oscar, Douglass, Frederick, Ibsen, Henrik, Melville, Herman, Doyle, Arthur Conan, Dickens, Charles, Joyce, James, Swift, Jonathan, Stoker, Bram, Machiavelli, Niccolo, Tolstoy, Leo, graf, Grimm, Wilhelm, Vatsyayana, Unknown'
  • 'Name: Cristiano Ronaldo, L. Messi, Neymar, L. Suárez, M. Neuer, R. Lewandowski, De Gea, E. Hazard, T. Kroos, G. Higuaín, Sergio Ramos, K. De Bruyne, T. Courtois, A. Sánchez, L. Modrić, G. Bale, S. Agüero, G. Chiellini, G. Buffon, P. Dybala'
Day of Week
  • 'DAY2: Monday, Wednesday, Tuesday, Friday, Saturday, Thursday, Sunday'
  • 'DAY2: Monday, Wednesday, Tuesday, Friday, Saturday, Thursday, Sunday'
  • 'day: Sun, Sat, Thur, Fri'
Timestamp
  • 'created_at: 12/30/20 12:29, 11/2/20 21:26, 11/2/20 22:16, 11/2/20 21:32, 11/2/20 22:01, 11/2/20 22:18, 11/2/20 22:26, 11/2/20 23:31, 11/2/20 21:49, 10/31/20 17:22, 11/1/20 14:39, 11/2/20 08:22, 10/29/20 14:16, 10/31/20 08:36, 10/29/20 11:08, 10/29/20 09:00, 10/29/20 16:13, 10/29/20 16:14, 10/30/20 15:45, 10/28/20 09:24'
  • 'created_at: 12/21/22 09:28, 12/21/22 12:52, 12/16/22 18:27, 12/16/22 21:10, 12/14/22 10:39, 12/14/22 08:22, 12/15/22 18:31, 12/14/22 14:13, 12/13/22 09:36, 12/14/22 08:23, 12/14/22 15:40, 12/15/22 09:40, 12/7/22 10:47, 12/7/22 08:17, 12/7/22 17:56, 12/15/22 09:50, 11/30/22 09:25, 11/23/22 08:46, 12/1/22 09:39, 12/5/22 08:29'
  • 'created_at: 12/21/22 09:28, 12/21/22 12:52, 12/16/22 18:27, 12/16/22 21:10, 12/14/22 10:39, 12/14/22 08:22, 12/15/22 18:31, 12/14/22 14:13, 12/13/22 09:36, 12/14/22 08:23, 12/14/22 15:40, 12/15/22 09:40, 12/7/22 10:47, 12/7/22 08:17, 12/7/22 17:56, 12/15/22 09:50, 11/30/22 09:25, 11/23/22 08:46, 12/1/22 09:39, 12/5/22 08:29'
URL
Street Address
  • 'STOP_LOCATION_FULL_ADDRESS: 180 GREENWICH STREET, WALL STREET && BROADWAY, 75 GREENE STREET, 429 WEST BROADWAY, WEST STREET && CHAMBERS STREET, CHAMBERS STREET && WEST BROADWAY, CORTLANDT STREET && CHURCH STREET, 111 FULTON STREET, 25 CLIFF STREET, SPRING STREET && AVENUE OF THE AMERICAS, 130 CEDAR STREET, 225 LIBERTY STREET, BARCLAY STREET && WEST STREET, 153 GREENWICH STREET, BATTERY PLACE && STATE STREET, MERCER STREET && BROOME STREET, WEST STREET && CANAL STREET, BROADWAY && PRINCE STREET, WEST BROADWAY && AVENUE OF THE AMERICAS, 3 SOUTH STREET'
  • 'STOP_LOCATION_FULL_ADDRESS: 180 GREENWICH STREET, WALL STREET && BROADWAY, 75 GREENE STREET, 429 WEST BROADWAY, WEST STREET && CHAMBERS STREET, CHAMBERS STREET && WEST BROADWAY, CORTLANDT STREET && CHURCH STREET, 111 FULTON STREET, 25 CLIFF STREET, SPRING STREET && AVENUE OF THE AMERICAS, 130 CEDAR STREET, 225 LIBERTY STREET, BARCLAY STREET && WEST STREET, 153 GREENWICH STREET, BATTERY PLACE && STATE STREET, MERCER STREET && BROOME STREET, WEST STREET && CANAL STREET, BROADWAY && PRINCE STREET, WEST BROADWAY && AVENUE OF THE AMERICAS, 3 SOUTH STREET'
  • 'STOP_LOCATION_FULL_ADDRESS: 180 GREENWICH STREET, WALL STREET && BROADWAY, 75 GREENE STREET, 429 WEST BROADWAY, WEST STREET && CHAMBERS STREET, CHAMBERS STREET && WEST BROADWAY, CORTLANDT STREET && CHURCH STREET, 111 FULTON STREET, 25 CLIFF STREET, SPRING STREET && AVENUE OF THE AMERICAS, 130 CEDAR STREET, 225 LIBERTY STREET, BARCLAY STREET && WEST STREET, 153 GREENWICH STREET, BATTERY PLACE && STATE STREET, MERCER STREET && BROOME STREET, WEST STREET && CANAL STREET, BROADWAY && PRINCE STREET, WEST BROADWAY && AVENUE OF THE AMERICAS, 3 SOUTH STREET'
Country ISO Code
  • 'Champion Nationality: AUS, FRA, GBR, NZL, USA, SRB, SUI, SWE, CZE, ESP, GER, NED, CRO, BRA, RUS'
  • 'Runner-up Nationality: AUS, GBR, NZL, FRA, USA, RSA, CZE, ARG, GER, SUI, ESP, CRO, ROM, DEN, TCH, URS, CZ, SRB, CND, SWE'
Partial timestamp
  • 'created_at: 12/17/20 21:39, 6/14/21 15:36, 11/2/20 09:02, 11/2/20 12:49, 11/2/20 19:02, 11/2/20 14:04, 11/2/20 17:37, 11/2/20 18:39, 11/2/20 18:40, 11/4/20 09:17, 11/4/20 10:29, 11/4/20 10:32, 11/4/20 10:38, 11/4/20 10:39, 11/28/20 21:14, 11/2/20 21:25, 11/2/20 21:32, 11/2/20 22:12, 11/2/20 23:30, 11/2/20 23:33'
  • 'bibliography.publication.full: June, 1998, November, 1999, March, 1994, June 17, 2008, August 16, 2005, August 20, 2006, August 29, 2006, January 10, 2006, March, 2001, June, 2001, October 14, 1892, July, 1998, July, 2003, January, 1994, October 1997, August 16, 2013, February 11, 2006, June 9, 2008, January 1, 1870, April, 2001'
  • 'Rating.Experience: Below, Same, None, Above'
Longitude
  • 'Longitude: 2,77228900, 2,77461100, 2,77370600, 2,77423900, 2,77654400, 2,79937600, 2,78064700, 2,77697400, 2,78928200, 2,78032200, 2,77731200, 2,77121300, 2,77167600, 2,78236500, 2,76694300, 2,77139500, 2,76872200, 2,76741500, 2,77156700, 2,82065100'
  • 'Longitude: 6.85, 2.97, 2.53, -4.02, 10.87, 11.93, 12.7, 14.139, 14.426, 13.897, 14.83, 15.213, 15.064, 14.933, 14.962, 14.999, 12.02, 14.399, 23.336, 24.439'
  • 'long: 40.65531753386127, 35.52146509142811, 41.04610174058556, 37.25718863973695, 37.73038191275334, 38.78755702518432, 36.31538469187874, 38.3542649521305, 40.33741738725765, 36.831052736369664, 37.39711396680899, 38.28297641253209, 40.25037415629944, 39.12501528359793, 40.179108531876246, 38.165405118101205, 40.28234452941448, 37.1590112746327, 40.08056518798263, 38.45329795732872'
Country Name
  • 'Geography: United States'
  • 'location.citizenship: United States, Mexico, Switzerland, Spain, Hong Kong, Taiwan, Germany, Saudi Arabia, Japan, Sweden, France, Canada, Philippines, Indonesia, South Korea, Malaysia, Italy, Singapore, Nigeria, Brazil'
  • 'Nation: Afghanistan, Albania, Algeria, Andorra, Angola, Antigua and Barbuda, Argentina, Armenia, Australia, Austria, Azerbaijan, Bahamas, The, Bahrain, Bangladesh, Barbados, Belarus, Belgium, Belize, Benin, Bhutan'
Boolean
  • 'nationwide_batch: False'
  • 'chasing: False, True'
  • 'ID Workforce Status: True'
Short text
  • 'data.title: Backs, Illustration to Judith Shakespeare, Tri-Boro Barber Shop, Portrait of the Engraver Francesco Bartolozzi, Lady in Niche, Assassinations, Zebe, TRANSFERENCE ZONE, Five Sunsets in One Hour, Cartmel Fell, Composition: River in a Gorge, Winters Sleep, Christs Cross and Adams Tree, Space Construction with a Spiral, Toy Sailing Boats, the Round Pond, The Mutilated, The Cruise, Cut Bottle Relief, The Struggle, Figures in a Garden'
  • 'specific_location: None, on tree stump, on tree roots, under a tree, in b/w trees, tree, Branch, bush, hiding in bushes, On a rock, trash can, Under bench, near 65th St arch, in the European Beech, on tree knob, Behind fence, tree, tree near large rock on Bridle Path, Climbing tree, "FIELD", bottom of tree'
  • "Facility.Name: Southeast Alabama Medical Center, Marshall Medical Center South, Eliza Coffee Memorial Hospital, Mizell Memorial Hospital, Crenshaw Community Hospital, St Vincent's East, Dekalb Regional Medical Center, Shelby Baptist Medical Center, Callahan Eye Hospital, Helen Keller Memorial Hospital, Dale Medical Center, Floyd Cherokee Medical Center, Baptist Medical Center South, Jackson Hospital & Clinic Inc, East Alabama Medical Center, Tanner Medical Center-east Alabama, University Of Alabama Hospital, Community Hospital Inc, Cullman Regional Medical Center, Andalusia Health"
Slug
  • 'Slug County: baldwin-county-al, calhoun-county-al, coffee-county-al, colbert-county-al, covington-county-al, cullman-county-al, dale-county-al, dallas-county-al, etowah-county-al, jackson-county-al, jefferson-county-al, lee-county-al, limestone-county-al, madison-county-al, marshall-county-al, mobile-county-al, montgomery-county-al, perry-county-al, pike-county-al, randolph-county-al'
  • 'Slug Geography: united-states'
  • 'Slug Detailed Occupation: physicians, physicians-surgeons, lawyers-judges-magistrates-other-judicial-workers, medical-health-services-managers, chief-executives-legislators, veterinarians, social-community-service-managers, securities-commodities-financial-services-sales-agents, petroleum-mining-geological-engineers-including-mining-safety-engineers, economists, miscellaneous-social-scientists-including-survey-researchers-sociologists, natural-sciences-managers, geoscientists-and-hydrologists-except-geographers, detectives-criminal-investigators, judicial-law-clerks, other-psychologists, architectural-engineering-managers, education-administrators, astronomers-physicists, public-relations-and-fundraising-managers'
Postal Code
  • 'Code postal: 77700.0, nan'
Structured field
  • 'SUSPECT_HEIGHT: 5.8, 6.2, 5.1, 5, 5.11, 5.5, 5.4, 5.7, 6.1, 6, 6.3, 5.6, 5.9, 6.4, 5.2, 6.5, 5.3, 4.11, , 2.2'
  • 'SUSPECT_HEIGHT: 5.8, 6.2, 5.1, 5, 5.11, 5.5, 5.4, 5.7, 6.1, 6, 6.3, 5.6, 5.9, 6.4, 5.2, 6.5, 5.3, 4.11, , 2.2'
  • 'Score: 6-3, 7-5, 6-2, 6-3, 6-4, 6-2, 6-4, 6-2, 6-2, 6-4, 6-4, 7-5, 6-1, 3-6, 6-1, 3-6, 6-4, 6-3, 6-3, 4-6, 8-6, 8-6, 6-1, 6-3, 8-6, 6-2, 6-4, 6-3, 6-2, 4-6, 3-6, 6-4, 6-8, 1-6, 6-2, 6-2, 6-2, 4-6, 7-5, 6-4, 6-4, 1-6, 6-3, 3-6, 6-4, 6-4, 6-4, 6-3, 3-6, 6-0, 6-1, 7-5, 8-6, 6-4, 4-6, 6-2, 6-3, 7-5, 6-3, 6-0, 6-2, 6-8, 5-7, 8-6, 6-3, 10-8, 6-2, 8-6, 3-6, 6-1, 6-3, 6-2, 6-3, 7-5, 6-0'
Alphanumeric identifier
  • 'ID Geography: 04000US04, 04000US06, 04000US32, 04000US41'
  • 'ID County: 05000US01003, 05000US01015, 05000US01031, 05000US01033, 05000US01039, 05000US01043, 05000US01045, 05000US01047, 05000US01055, 05000US01071, 05000US01073, 05000US01081, 05000US01083, 05000US01089, 05000US01095, 05000US01097, 05000US01101, 05000US01105, 05000US01109, 05000US01111'
  • 'ID Geography: 01000US, 04000US04, 04000US06, 04000US32, 04000US41, 31000US31080, 31000US40140, 31000US41740, 31000US41860'
Color
  • 'color: Yellow, Black, White'
  • 'primary_fur_color: None, Gray, Cinnamon, Black'
  • 'highlight_fur_color: None, Cinnamon, White, Gray, Cinnamon, White, Gray, White, Black, Cinnamon, White, Black, Black, White, Black, Cinnamon, Gray, Black'
Month Name
  • 'MONTH2: January, February, March, April, May, June, July, August, September, October, November, December'
  • 'bibliography.publication.month name: June, November, March, August, January, October, July, February, April, May, September, December'
  • 'MONTH2: January, February, March, April, May, June, July, August, September, October, November, December'
Currency Code
  • 'cur_name: AFN, DZD, AOA, ARS, AMD, AZN, BDT, INR, BYR, XOF, BTN, BOB, BIF, KHR, XAF, CVE, CNY, COP, USD, CDF'
Time
  • 'STOP_FRISK_TIME: 14:26:00, 11:10:00, 11:35:00, 13:20:00, 21:25:00, 20:00:00, 19:58:00, 13:15:00, 8:16:00, 18:44:00, 22:30:00, 4:45:00, 18:30:00, 0:00:00, 9:58:00, 11:15:00, 13:00:00, 8:00:00, 14:57:00, 4:15:00'
  • 'STOP_FRISK_TIME: 14:26:00, 11:10:00, 11:35:00, 13:20:00, 21:25:00, 20:00:00, 19:58:00, 13:15:00, 8:16:00, 18:44:00, 22:30:00, 4:45:00, 18:30:00, 0:00:00, 9:58:00, 11:15:00, 13:00:00, 8:00:00, 14:57:00, 4:15:00'
  • 'STOP_FRISK_TIME: 14:26:00, 11:10:00, 11:35:00, 13:20:00, 21:25:00, 20:00:00, 19:58:00, 13:15:00, 8:16:00, 18:44:00, 22:30:00, 4:45:00, 18:30:00, 0:00:00, 9:58:00, 11:15:00, 13:00:00, 8:00:00, 14:57:00, 4:15:00'
Last Name
  • 'candidat: Bush, Perot, Clinton'
  • 'answer: Spanberger, Freitas, Eastman, Bacon, Schaeffer, Schupp, Wagner, Schulte, Balter, Katko, Williams, Hale, Spartz, Tucker, Elliott, Hill, Golden, Crafts, Newman, Fricilone'
U.S. State
  • 'Slug Geography: california'
  • 'state_name: Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, District of Columbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine'
  • 'state: Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland'
Street Name
  • 'STOP_LOCATION_STREET_NAME: GREENWICH STREET, WALL STREET, GREENE STREET, WEST BROADWAY, WEST STREET, CHAMBERS STREET, CORTLANDT STREET, FULTON STREET, CLIFF STREET, SPRING STREET, CEDAR STREET, LIBERTY STREET, BARCLAY STREET, BATTERY PLACE, MERCER STREET, BROADWAY, SOUTH STREET, THOMPSON STREET, JAY STREET, CHURCH STREET'
  • 'STOP_LOCATION_STREET_NAME: GREENWICH STREET, WALL STREET, GREENE STREET, WEST BROADWAY, WEST STREET, CHAMBERS STREET, CORTLANDT STREET, FULTON STREET, CLIFF STREET, SPRING STREET, CEDAR STREET, LIBERTY STREET, BARCLAY STREET, BATTERY PLACE, MERCER STREET, BROADWAY, SOUTH STREET, THOMPSON STREET, JAY STREET, CHURCH STREET'
  • 'STOP_LOCATION_STREET_NAME: GREENWICH STREET, WALL STREET, GREENE STREET, WEST BROADWAY, WEST STREET, CHAMBERS STREET, CORTLANDT STREET, FULTON STREET, CLIFF STREET, SPRING STREET, CEDAR STREET, LIBERTY STREET, BARCLAY STREET, BATTERY PLACE, MERCER STREET, BROADWAY, SOUTH STREET, THOMPSON STREET, JAY STREET, CHURCH STREET'
AM/PM
  • 'shift: PM, AM'
Occupation
  • 'Detailed Occupation: Physicians, Physicians & surgeons, Lawyers, & judges, magistrates, & other judicial workers, Medical & health services managers, Chief executives & legislators, Veterinarians, Social & community service managers, Securities, commodities, & financial services sales agents, Petroleum, mining & geological engineers, including mining safety engineers, Economists, Miscellaneous social scientists, including survey researchers & sociologists, Natural sciences managers, Geoscientists and hydrologists, except geographers, Detectives & criminal investigators, Judicial law clerks, Other psychologists, Architectural & engineering managers, Education administrators, Astronomers & physicists, Public relations and fundraising managers'
  • 'occupation: Operatives, Craftsmen, Sales, Other, Managers/admin, Professional/technical, Clerical/unskilled, Laborers, Transport, Service, nan, Household workers, Farm laborers, Farmers'
  • 'Detailed Occupation: Other managers, Cashiers, Retail salespersons, Driver/sales workers & truck drivers, Registered nurses'
Zip Code
  • 'recipient_zip: 995084442, 99503, 995163436, 352124572, 35216, 35976, 358021277, 352174710, 35203, 35233, 35805, 72716, 72201, 72035, 72015, 72223, 72019, 72113, 72758, 72227'
  • 'STOP_LOCATION_ZIP_CODE: (null), 20292, AVENUE, 5 AVEN, 10019, 22768, 10035, 10026, 10128, 24231, 10030, 10039, 23874, 11213, 11233, 100652, 10451, 23543, 100745, PROSPE'
  • 'zip_codes: nan, 12081.0, 10090.0, 12423.0, 12420.0'
Company Name
  • "company.name: Microsoft, Berkshire Hathaway, Telmex, F. Hoffmann-La Roche, Zara, Henderson Land Development, Oracle, Lin Yuan Group, Aldi, Sun Hung Kai Properties, Kingdom Holding Company, Koch industries, Cheung king, Walmart, Seibu Corporation, Las Vegas Sands, Aldi Nord, Tetra Pak, BMW, L'Oreal"
First Name
  • 'Top Name: Mary, Linda, Debra, Lisa, Michelle, Jennifer, Jessica, Samantha, Ashley, Hannah, Emily, Madison, Emma, Isabella, Sophia, Olivia, John, Robert, James, David'
Very short text
  • 'above_ground_sighter_measurement: None, FALSE, 4, 3, 30, 10, 6, 24, 8, 25, 5, 50, 70, 12, 2, 20, 7, 13, 15, 28'
  • 'review_reason_code: 2, 1, 4, None, 5, 3, 7, 3?, 8, D, ?, 3, 1, 1 or 2, D or 1, 7B, 1, 2, 1 OR 2, D OR 2, B, 4?'
  • 'status: N, Y, REMOVE, None, 1, ?, H, R, M, T'
License Plate
  • 'plate: AZIZ714, BATBOX1, BBOMBS, BEACHY1, BLK PWR5, BOT TAK, CHERIPI, CIO FTW, DAVES88, DMOBGFY, DOITFKR, EGGPUTT, F DIABDZ, FJ 666, FKK OFF, FKN BLAK, FLT ATCK, F LUPUS, HVNNHEL, H8DES'
URI
City Name
  • 'Incident.Location.City: Shelton, Aloha, Wichita, San Francisco, Evans, Guthrie, Chandler, Assaria, Burlington, Knoxville, Stockton, Freeport, Columbus, Des Moines, New Orleans, Huntley, Salt Lake City, Strong, Syracuse, England'

Evaluation

Metrics

| Label | Accuracy |
|:------|:---------|
| all   | 0.6705   |
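
Accuracy here is the fraction of evaluation samples whose predicted label matches the reference label. A minimal sketch with made-up labels follows (the evaluation set itself is not included in this card, so the lists are purely illustrative):

```python
def accuracy(predictions, references):
    """Fraction of positions where the prediction equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

# Hypothetical predictions and gold labels, for illustration only.
preds = ["Year", "Integer", "Date", "Boolean"]
gold = ["Year", "Year", "Date", "Boolean"]
score = accuracy(preds, gold)
print(score)
```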

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("quantisan/bge-small-en-v1.5-93dataset")
# Run inference
preds = model("variety: Western, Eastern")
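
The label examples and the inference call above suggest that each input is a column name followed by a comma-separated sample of its values. The preprocessing used to build the training set is not documented in this card, so the helper below is only a guess at that format: `format_column` and its `max_values` cutoff of 20 are assumptions, chosen because most examples list at most 20 values.

```python
def format_column(name, values, max_values=20):
    """Render a column as 'name: v1, v2, ...' using the first distinct values."""
    seen = []
    for value in values:
        text = str(value)
        if text not in seen:
            seen.append(text)
        if len(seen) == max_values:
            break
    return f"{name}: {', '.join(seen)}"

text = format_column("variety", ["Western", "Eastern", "Western"])
print(text)  # variety: Western, Eastern

# The resulting string could then be passed to the model loaded above:
# preds = model(text)
```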

Training Details

Training Set Metrics

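| Training set | Min | Median  | Max |
|:-------------|:----|:--------|:----|
| Word count   | 2   | 24.0542 | 111 |

| Label                   | Training Sample Count |
|:------------------------|:----------------------|
| Categorical             | 8                     |
| Timestamp               | 5                     |
| Date                    | 8                     |
| Integer                 | 8                     |
| Partial timestamp       | 4                     |
| Short text              | 8                     |
| Very short text         | 3                     |
| AM/PM                   | 1                     |
| Boolean                 | 8                     |
| City Name               | 1                     |
| Color                   | 3                     |
| Company Name            | 1                     |
| Country ISO Code        | 2                     |
| Country Name            | 8                     |
| Currency Code           | 1                     |
| Day of Month            | 4                     |
| Day of Week             | 4                     |
| First Name              | 1                     |
| Floating Point Number   | 8                     |
| Full Name               | 8                     |
| Last Name               | 2                     |
| Latitude                | 4                     |
| License Plate           | 1                     |
| Longitude               | 4                     |
| Month Name              | 6                     |
| Month Number            | 4                     |
| Occupation              | 3                     |
| Postal Code             | 1                     |
| Secondary Address       | 1                     |
| Slug                    | 8                     |
| Street Address          | 3                     |
| Street Name             | 3                     |
| Time                    | 3                     |
| U.S. State              | 8                     |
| U.S. State Abbreviation | 6                     |
| URI                     | 1                     |
| URL                     | 8                     |
| Year                    | 8                     |
| Zip Code                | 4                     |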
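
Statistics like these can be reproduced on other data with a few lines of Python. The sketch below assumes word counts come from splitting on whitespace, which may differ from how SetFit counted them, and uses three samples copied from the label examples purely as illustration. Note that a true median of integer word counts is either a whole number or ends in .5, so the 24.0542 reported under "Median" is probably a mean; the sketch computes both.

```python
from collections import Counter
from statistics import mean, median

# (text, label) pairs taken from the label examples, for illustration only.
samples = [
    ("shift: PM, AM", "AM/PM"),
    ("YEAR2: 2017", "Year"),
    ("Year: 2020, 2019, 2018, 2017, 2016, 2015, 2014", "Year"),
]

# Whitespace-separated token count per sample (an assumed definition of "word").
word_counts = [len(text.split()) for text, _ in samples]
stats = {
    "min": min(word_counts),
    "median": median(word_counts),
    "mean": mean(word_counts),
    "max": max(word_counts),
}

# Number of training samples per label.
label_counts = Counter(label for _, label in samples)

print(stats)
print(label_counts.most_common())
```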

Training Hyperparameters

  • batch_size: (8, 8)
  • num_epochs: (4, 4)
  • max_steps: -1
  • sampling_strategy: oversampling
  • body_learning_rate: (2e-05, 1e-05)
  • head_learning_rate: 0.01
  • loss: CosineSimilarityLoss
  • distance_metric: cosine_distance
  • margin: 0.25
  • end_to_end: False
  • use_amp: False
  • warmup_proportion: 0.1
  • l2_weight: 0.01
  • seed: 42
  • eval_max_steps: -1
  • load_best_model_at_end: True
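
To make the `loss: CosineSimilarityLoss` setting concrete: for a sentence pair with embeddings u and v and a target t (1.0 for same-label pairs, 0.0 otherwise), this kind of loss penalizes the squared difference between cos(u, v) and t. The pure-Python sketch below illustrates that idea on toy 3-dimensional vectors; it is not the Sentence Transformers implementation, and the vectors are invented.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_similarity_loss(pairs):
    """Mean squared error between cos(u, v) and the target over a batch."""
    errors = [(cosine_similarity(u, v) - target) ** 2 for u, v, target in pairs]
    return sum(errors) / len(errors)

batch = [
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], 1.0),  # same label, identical direction
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0.0),  # different label, orthogonal
    ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], 0.0),  # different label, but identical
]
loss = cosine_similarity_loss(batch)
print(loss)
```

Only the third pair contributes to the loss, because its embeddings coincide even though the labels differ. Two further notes on the list above, both based on the SetFit documentation rather than on this card: tuple values such as `batch_size: (8, 8)` give the setting for the embedding fine-tuning phase and the classifier training phase respectively, and `distance_metric`, `margin`, `head_learning_rate`, `end_to_end`, and `l2_weight` are described as applying only to triplet-style losses or to a differentiable PyTorch head, so they likely had no effect on this run.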

Training Results

Epoch Step Training Loss Validation Loss
0.0002 1 0.1131 -
0.0100 50 0.2113 -
0.0200 100 0.1906 -
0.0301 150 0.1843 -
0.0401 200 0.161 -
0.0501 250 0.1418 -
0.0601 300 0.131 -
0.0701 350 0.1224 -
0.0802 400 0.114 -
0.0902 450 0.1039 -
0.1002 500 0.0825 -
0.1102 550 0.0838 -
0.1202 600 0.0745 -
0.1303 650 0.0705 -
0.1403 700 0.0586 -
0.1503 750 0.0552 -
0.1603 800 0.0567 -
0.1703 850 0.0553 -
0.1804 900 0.0456 -
0.1904 950 0.0443 -
0.2004 1000 0.0422 -
0.2104 1050 0.0363 -
0.2204 1100 0.0397 -
0.2305 1150 0.0361 -
0.2405 1200 0.0269 -
0.2505 1250 0.0331 -
0.2605 1300 0.0278 -
0.2705 1350 0.0297 -
0.2806 1400 0.022 -
0.2906 1450 0.0265 -
0.3006 1500 0.0344 -
0.3106 1550 0.0218 -
0.3206 1600 0.0275 -
0.3307 1650 0.0275 -
0.3407 1700 0.0207 -
0.3507 1750 0.0156 -
0.3607 1800 0.0246 -
0.3707 1850 0.0154 -
0.3808 1900 0.0117 -
0.3908 1950 0.0201 -
0.4008 2000 0.0153 -
0.4108 2050 0.018 -
0.4208 2100 0.017 -
0.4309 2150 0.011 -
0.4409 2200 0.0158 -
0.4509 2250 0.015 -
0.4609 2300 0.0109 -
0.4709 2350 0.0151 -
0.4810 2400 0.0085 -
0.4910 2450 0.0121 -
0.5010 2500 0.0118 -
0.5110 2550 0.0083 -
0.5210 2600 0.0094 -
0.5311 2650 0.0078 -
0.5411 2700 0.0123 -
0.5511 2750 0.0085 -
0.5611 2800 0.0046 -
0.5711 2850 0.0081 -
0.5812 2900 0.0085 -
0.5912 2950 0.0064 -
0.6012 3000 0.0113 -
0.6112 3050 0.0087 -
0.6212 3100 0.0071 -
0.6313 3150 0.01 -
0.6413 3200 0.0093 -
0.6513 3250 0.0056 -
0.6613 3300 0.007 -
0.6713 3350 0.0076 -
0.6814 3400 0.0077 -
0.6914 3450 0.0038 -
0.7014 3500 0.0051 -
0.7114 3550 0.0063 -
0.7214 3600 0.004 -
0.7315 3650 0.0036 -
0.7415 3700 0.0043 -
0.7515 3750 0.0086 -
0.7615 3800 0.0051 -
0.7715 3850 0.0056 -
0.7816 3900 0.0042 -
0.7916 3950 0.0062 -
0.8016 4000 0.0058 -
0.8116 4050 0.0034 -
0.8216 4100 0.0062 -
0.8317 4150 0.0091 -
0.8417 4200 0.0056 -
0.8517 4250 0.0039 -
0.8617 4300 0.0072 -
0.8717 4350 0.0051 -
0.8818 4400 0.0025 -
0.8918 4450 0.0051 -
0.9018 4500 0.0049 -
0.9118 4550 0.0024 -
0.9218 4600 0.0026 -
0.9319 4650 0.0046 -
0.9419 4700 0.0024 -
0.9519 4750 0.0026 -
0.9619 4800 0.0045 -
0.9719 4850 0.0022 -
0.9820 4900 0.0042 -
0.9920 4950 0.0067 -
1.0 4990 - 0.0996
1.0020 5000 0.0044 -
1.0120 5050 0.0023 -
1.0220 5100 0.0025 -
1.0321 5150 0.004 -
1.0421 5200 0.002 -
1.0521 5250 0.0042 -
1.0621 5300 0.0028 -
1.0721 5350 0.006 -
1.0822 5400 0.0043 -
1.0922 5450 0.0065 -
1.1022 5500 0.0042 -
1.1122 5550 0.004 -
1.1222 5600 0.0045 -
1.1323 5650 0.0049 -
1.1423 5700 0.0042 -
1.1523 5750 0.0044 -
1.1623 5800 0.002 -
1.1723 5850 0.0037 -
1.1824 5900 0.0038 -
1.1924 5950 0.0071 -
1.2024 6000 0.0044 -
1.2124 6050 0.0031 -
1.2224 6100 0.0021 -
1.2325 6150 0.0019 -
1.2425 6200 0.002 -
1.2525 6250 0.0059 -
1.2625 6300 0.002 -
1.2725 6350 0.0036 -
1.2826 6400 0.0019 -
1.2926 6450 0.0041 -
1.3026 6500 0.0042 -
1.3126 6550 0.0062 -
1.3226 6600 0.002 -
1.3327 6650 0.0016 -
1.3427 6700 0.0019 -
1.3527 6750 0.0055 -
1.3627 6800 0.0042 -
1.3727 6850 0.0023 -
1.3828 6900 0.0018 -
1.3928 6950 0.0041 -
1.4028 7000 0.008 -
1.4128 7050 0.0021 -
1.4228 7100 0.0017 -
1.4329 7150 0.0021 -
1.4429 7200 0.0017 -
1.4529 7250 0.0035 -
1.4629 7300 0.002 -
1.4729 7350 0.0016 -
1.4830 7400 0.0014 -
1.4930 7450 0.0041 -
1.5030 7500 0.0053 -
1.5130 7550 0.0026 -
1.5230 7600 0.002 -
1.5331 7650 0.0017 -
1.5431 7700 0.0017 -
1.5531 7750 0.0016 -
1.5631 7800 0.0021 -
1.5731 7850 0.0039 -
1.5832 7900 0.0034 -
1.5932 7950 0.0061 -
1.6032 8000 0.0025 -
1.6132 8050 0.002 -
1.6232 8100 0.0017 -
1.6333 8150 0.0016 -
1.6433 8200 0.0015 -
1.6533 8250 0.0037 -
1.6633 8300 0.0015 -
1.6733 8350 0.0035 -
1.6834 8400 0.0023 -
1.6934 8450 0.0051 -
1.7034 8500 0.0041 -
1.7134 8550 0.0018 -
1.7234 8600 0.0016 -
1.7335 8650 0.0016 -
1.7435 8700 0.0013 -
1.7535 8750 0.0022 -
1.7635 8800 0.0015 -
1.7735 8850 0.0017 -
1.7836 8900 0.0035 -
1.7936 8950 0.0013 -
1.8036 9000 0.0015 -
1.8136 9050 0.0034 -
1.8236 9100 0.0013 -
1.8337 9150 0.0037 -
1.8437 9200 0.0037 -
1.8537 9250 0.0014 -
1.8637 9300 0.0014 -
1.8737 9350 0.0013 -
1.8838 9400 0.0016 -
1.8938 9450 0.0013 -
1.9038 9500 0.0038 -
1.9138 9550 0.0013 -
1.9238 9600 0.0038 -
1.9339 9650 0.0013 -
1.9439 9700 0.0012 -
1.9539 9750 0.0016 -
1.9639 9800 0.0053 -
1.9739 9850 0.0018 -
1.9840 9900 0.0036 -
1.9940 9950 0.0014 -
2.0 9980 - 0.1052
2.0040 10000 0.0012 -
2.0140 10050 0.0014 -
2.0240 10100 0.0013 -
2.0341 10150 0.0014 -
2.0441 10200 0.0012 -
2.0541 10250 0.0014 -
2.0641 10300 0.0013 -
2.0741 10350 0.0012 -
2.0842 10400 0.0013 -
2.0942 10450 0.0033 -
2.1042 10500 0.0013 -
2.1142 10550 0.003 -
2.1242 10600 0.0036 -
2.1343 10650 0.0013 -
2.1443 10700 0.0036 -
2.1543 10750 0.0037 -
2.1643 10800 0.0024 -
2.1743 10850 0.0038 -
2.1844 10900 0.0014 -
2.1944 10950 0.0012 -
2.2044 11000 0.0035 -
2.2144 11050 0.0015 -
2.2244 11100 0.0012 -
2.2345 11150 0.0012 -
2.2445 11200 0.0011 -
2.2545 11250 0.0035 -
2.2645 11300 0.0012 -
2.2745 11350 0.0011 -
2.2846 11400 0.0011 -
2.2946 11450 0.0011 -
2.3046 11500 0.0035 -
2.3146 11550 0.0012 -
2.3246 11600 0.0011 -
2.3347 11650 0.0011 -
2.3447 11700 0.0014 -
2.3547 11750 0.0011 -
2.3647 11800 0.0011 -
2.3747 11850 0.0012 -
2.3848 11900 0.0011 -
2.3948 11950 0.001 -
2.4048 12000 0.001 -
2.4148 12050 0.0011 -
2.4248 12100 0.0011 -
2.4349 12150 0.0011 -
2.4449 12200 0.001 -
2.4549 12250 0.0034 -
2.4649 12300 0.0011 -
2.4749 12350 0.0013 -
2.4850 12400 0.0012 -
2.4950 12450 0.0015 -
2.5050 12500 0.0011 -
2.5150 12550 0.0034 -
2.5251 12600 0.001 -
2.5351 12650 0.0011 -
2.5451 12700 0.0011 -
2.5551 12750 0.001 -
2.5651 12800 0.001 -
2.5752 12850 0.0034 -
2.5852 12900 0.0033 -
2.5952 12950 0.0011 -
2.6052 13000 0.001 -
2.6152 13050 0.001 -
2.6253 13100 0.0012 -
2.6353 13150 0.0011 -
2.6453 13200 0.0033 -
2.6553 13250 0.0034 -
2.6653 13300 0.001 -
2.6754 13350 0.001 -
2.6854 13400 0.0034 -
2.6954 13450 0.001 -
2.7054 13500 0.001 -
2.7154 13550 0.001 -
2.7255 13600 0.0009 -
2.7355 13650 0.001 -
2.7455 13700 0.001 -
2.7555 13750 0.0009 -
2.7655 13800 0.001 -
2.7756 13850 0.0009 -
2.7856 13900 0.0031 -
2.7956 13950 0.001 -
2.8056 14000 0.0031 -
2.8156 14050 0.0033 -
2.8257 14100 0.001 -
2.8357 14150 0.0009 -
2.8457 14200 0.0009 -
2.8557 14250 0.0009 -
2.8657 14300 0.001 -
2.8758 14350 0.001 -
2.8858 14400 0.0033 -
2.8958 14450 0.001 -
2.9058 14500 0.001 -
2.9158 14550 0.001 -
2.9259 14600 0.0033 -
2.9359 14650 0.001 -
2.9459 14700 0.0009 -
2.9559 14750 0.001 -
2.9659 14800 0.001 -
2.9760 14850 0.0009 -
2.9860 14900 0.0009 -
2.9960 14950 0.0009 -
3.0 14970 - 0.1077
3.0060 15000 0.0033 -
3.0160 15050 0.0009 -
3.0261 15100 0.0009 -
3.0361 15150 0.0009 -
3.0461 15200 0.0009 -
3.0561 15250 0.0008 -
3.0661 15300 0.001 -
3.0762 15350 0.0009 -
3.0862 15400 0.0009 -
3.0962 15450 0.0032 -
3.1062 15500 0.0009 -
3.1162 15550 0.0009 -
3.1263 15600 0.0009 -
3.1363 15650 0.0009 -
3.1463 15700 0.0008 -
3.1563 15750 0.0009 -
3.1663 15800 0.0009 -
3.1764 15850 0.0008 -
3.1864 15900 0.0008 -
3.1964 15950 0.0009 -
3.2064 16000 0.0009 -
3.2164 16050 0.0033 -
3.2265 16100 0.0031 -
3.2365 16150 0.0008 -
3.2465 16200 0.0008 -
3.2565 16250 0.0008 -
3.2665 16300 0.0008 -
3.2766 16350 0.0008 -
3.2866 16400 0.0008 -
3.2966 16450 0.0008 -
3.3066 16500 0.0009 -
3.3166 16550 0.0008 -
3.3267 16600 0.0032 -
3.3367 16650 0.0008 -
3.3467 16700 0.0008 -
3.3567 16750 0.0009 -
3.3667 16800 0.0031 -
3.3768 16850 0.0009 -
3.3868 16900 0.0008 -
3.3968 16950 0.0009 -
3.4068 17000 0.0009 -
3.4168 17050 0.0008 -
3.4269 17100 0.0009 -
3.4369 17150 0.0031 -
3.4469 17200 0.0032 -
3.4569 17250 0.0008 -
3.4669 17300 0.0008 -
3.4770 17350 0.0008 -
3.4870 17400 0.0008 -
3.4970 17450 0.0057 -
3.5070 17500 0.0032 -
3.5170 17550 0.0009 -
3.5271 17600 0.0052 -
3.5371 17650 0.0008 -
3.5471 17700 0.0009 -
3.5571 17750 0.0008 -
3.5671 17800 0.0008 -
3.5772 17850 0.0008 -
3.5872 17900 0.0008 -
3.5972 17950 0.0009 -
3.6072 18000 0.0032 -
3.6172 18050 0.0008 -
3.6273 18100 0.0008 -
3.6373 18150 0.0008 -
3.6473 18200 0.0008 -
3.6573 18250 0.0008 -
3.6673 18300 0.0008 -
3.6774 18350 0.0008 -
3.6874 18400 0.0008 -
3.6974 18450 0.0008 -
3.7074 18500 0.0008 -
3.7174 18550 0.0007 -
3.7275 18600 0.0008 -
3.7375 18650 0.0008 -
3.7475 18700 0.003 -
3.7575 18750 0.0008 -
3.7675 18800 0.0008 -
3.7776 18850 0.0008 -
3.7876 18900 0.0007 -
3.7976 18950 0.0008 -
3.8076 19000 0.0007 -
3.8176 19050 0.0007 -
3.8277 19100 0.0029 -
3.8377 19150 0.0007 -
3.8477 19200 0.0008 -
3.8577 19250 0.0031 -
3.8677 19300 0.0007 -
3.8778 19350 0.0007 -
3.8878 19400 0.0008 -
3.8978 19450 0.0008 -
3.9078 19500 0.0031 -
3.9178 19550 0.0008 -
3.9279 19600 0.0008 -
3.9379 19650 0.0007 -
3.9479 19700 0.0008 -
3.9579 19750 0.0008 -
3.9679 19800 0.0008 -
3.9780 19850 0.0008 -
3.9880 19900 0.0008 -
3.9980 19950 0.0007 -
4.0 19960 - 0.1050
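
A few numbers can be read off the log above. The sketch below hard-codes the four validation-loss rows and assumes that each step processes one batch of 8 sentence pairs in the embedding phase (an assumption about how steps map to pairs). It derives the steps per epoch and an approximate pair count, and finds the epoch with the lowest validation loss, which is presumably the checkpoint that `load_best_model_at_end: True` restores if validation loss is the selection metric.

```python
# (epoch, step, validation_loss) rows copied from the table above.
eval_rows = [
    (1.0, 4990, 0.0996),
    (2.0, 9980, 0.1052),
    (3.0, 14970, 0.1077),
    (4.0, 19960, 0.1050),
]
batch_size = 8  # embedding-phase batch size from the hyperparameters

steps_per_epoch = eval_rows[0][1]
total_steps = eval_rows[-1][1]
# Upper bound: the last batch of an epoch may be only partly filled.
pairs_per_epoch = steps_per_epoch * batch_size

# Lower validation loss is better, so take the minimum over the eval rows.
best_epoch, best_step, best_loss = min(eval_rows, key=lambda row: row[2])

print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps: {total_steps}")
print(f"pairs per epoch (approx.): {pairs_per_epoch}")
print(f"lowest validation loss: {best_loss} at epoch {best_epoch}")
```

Training loss keeps falling to below 0.001 while validation loss never improves on its epoch-1 value, which suggests that the embedding body overfits the small training set after the first epoch.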

Framework Versions

  • Python: 3.11.10
  • SetFit: 1.1.0
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.2
  • PyTorch: 2.4.1+cu124
  • Datasets: 3.0.1
  • Tokenizers: 0.20.0

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}

Model size: 33.4M params (Safetensors, tensor type F32)