metadata
base_model: Snowflake/snowflake-arctic-embed-m
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
- dot_accuracy@1
- dot_accuracy@3
- dot_accuracy@5
- dot_accuracy@10
- dot_precision@1
- dot_precision@3
- dot_precision@5
- dot_precision@10
- dot_recall@1
- dot_recall@3
- dot_recall@5
- dot_recall@10
- dot_ndcg@10
- dot_mrr@10
- dot_map@100
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:363
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: >-
What are some examples of algorithmic discrimination mentioned in the
context, and how do they impact different areas such as hiring and
healthcare?
sentences:
- >
For example, facial recognition technology that can contribute to
wrongful and discriminatory
arrests,31 hiring algorithms that inform discriminatory decisions, and
healthcare algorithms that discount
the severity of certain diseases in Black Americans. Instances of
discriminatory practices built into and
resulting from AI and other automated systems exist across many
industries, areas, and contexts. While automated
systems have the capacity to drive extraordinary advances and
innovations, algorithmic discrimination
protections should be built into their design, deployment, and ongoing
use. Many companies, non-profits, and federal government agencies are
already taking steps to ensure the public
is protected from algorithmic discrimination. Some companies have
instituted bias testing as part of their product
quality assessment and launch procedures, and in some cases this testing
has led products to be changed or not
launched, preventing harm to the public. Federal government agencies
have been developing standards and guidance
for the use of automated systems in order to help prevent bias.
Non-profits and companies have developed best
practices for audits and impact assessments to help identify potential
algorithmic discrimination and provide
transparency to the public in the mitigation of such biases. But there
is much more work to do to protect the public from algorithmic
discrimination and to use and design
automated systems in an equitable way. The guardrails protecting the
public from discrimination in their daily
lives should include their digital lives and impacts—basic safeguards
against abuse, bias, and discrimination to
ensure that all people are treated fairly when automated systems are
used. This includes all dimensions of their
lives, from hiring to loan approvals, from medical treatment and payment
to encounters with the criminal
justice system. Ensuring equity should also go beyond existing
guardrails to consider the holistic impact that
automated systems make on underserved communities and to institute
proactive protections that support these
communities. •
An automated system using nontraditional factors such as educational
attainment and employment history as
part of its loan underwriting and pricing model was found to be much
more likely to charge an applicant who
attended a Historically Black College or University (HBCU) higher loan
prices for refinancing a student loan
than an applicant who did not attend an HBCU. This was found to be true
even when controlling for
other credit-related factors.32
•
A hiring tool that learned the features of a company's employees
(predominantly men) rejected women applicants for spurious and
discriminatory reasons; resumes with the word “women’s,” such as
“women’s chess club captain,” were penalized in the candidate ranking.33
•
A predictive model marketed as being able to predict whether students
are likely to drop out of school was
used by more than 500 universities across the country. The model was
found to use race directly as a predictor,
and also shown to have large disparities by race; Black students were as
many as four times as likely as their
otherwise similar white peers to be deemed at high risk of dropping out.
These risk scores are used by advisors
to guide students towards or away from majors, and some worry that they
are being used to guide
Black students away from math and science subjects.34
•
A risk assessment tool designed to predict the risk of recidivism for
individuals in federal custody showed
evidence of disparity in prediction. The tool overpredicts the risk of
recidivism for some groups of color on the
general recidivism tools, and underpredicts the risk of recidivism for
some groups of color on some of the
violent recidivism tools. The Department of Justice is working to reduce
these disparities and has
publicly released a report detailing its review of the tool.35
- >
SECTION: APPENDIX: EXAMPLES OF AUTOMATED SYSTEMS
APPENDIX
Systems that impact the safety of communities such as automated traffic
control systems, electrical grid controls, smart city technologies, and
industrial emissions and environmental impact control algorithms; and
Systems related to access to benefits or services or assignment of
penalties such as systems that
support decision-makers who adjudicate benefits such as collating or
analyzing information or
matching records, systems which similarly assist in the adjudication of
administrative or criminal
penalties, fraud detection algorithms, services or benefits access
control algorithms, biometric
systems used as access control, and systems which make benefits or
services related decisions on a
fully or partially autonomous basis (such as a determination to revoke
benefits).
- >-
SECTION: SAFE AND EFFECTIVE SYSTEMS
SAFE AND EFFECTIVE SYSTEMS
WHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS
The expectations for automated systems are meant to serve as a blueprint
for the development of additional
technical standards and practices that are tailored for particular
sectors and contexts. In order to ensure that an automated system is
safe and effective, it should include safeguards to protect the
public from harm in a proactive and ongoing manner; avoid use of data
inappropriate for or irrelevant to the task
at hand, including reuse that could cause compounded harm; and
demonstrate the safety and effectiveness of
the system. These expectations are explained below. Protect the public
from harm in a proactive and ongoing manner
Consultation. The public should be consulted in the design,
implementation, deployment, acquisition, and
maintenance phases of automated system development, with emphasis on
early-stage consultation before a
system is introduced or a large change implemented. This consultation
should directly engage diverse impacted communities to consider concerns
and risks that may be unique to those communities, or disproportionately
prevalent or severe for them. The extent of this engagement and the form
of outreach to relevant stakeholders may differ depending on the specific
automated system and development phase, but should include subject
matter, sector-specific, and context-specific experts as well as experts
on potential impacts such as civil rights, civil liberties, and privacy
experts. For private sector applications, consultations before product
launch may need to be confidential. Government applications,
particularly law enforcement applications or applications that raise
national security considerations, may require confidential or limited
engagement based on system sensitivities and preexisting oversight laws
and structures. Concerns raised in this consultation should be
documented, and the automated system that developers were proposing to
create, use, or deploy should be reconsidered based on this feedback.
- source_sentence: >-
What are some key needs identified by panelists for the future design of
critical AI systems?
sentences:
- >
It included discussion of the technical aspects of designing
non-discriminatory technology, explainable AI, human-computer
interaction with an emphasis on community participation, and
privacy-aware design. Welcome:
•
Sorelle Friedler, Assistant Director for Data and Democracy, White House
Office of Science and
Technology Policy
•
J. Bob Alotta, Vice President for Global Programs, Mozilla Foundation
•
Navrina Singh, Board Member, Mozilla Foundation
Moderator: Kathy Pham Evans, Deputy Chief Technology Officer for Product
and Engineering, U.S. Federal Trade Commission. Panelists:
•
Liz O’Sullivan, CEO, Parity AI
•
Timnit Gebru, Independent Scholar
•
Jennifer Wortman Vaughan, Senior Principal Researcher, Microsoft
Research, New York City
•
Pamela Wisniewski, Associate Professor of Computer Science, University
of Central Florida; Director,
Socio-technical Interaction Research (STIR) Lab
•
Seny Kamara, Associate Professor of Computer Science, Brown University
Each panelist individually emphasized the risks of using AI in
high-stakes settings, including the potential for
biased data and discriminatory outcomes, opaque decision-making
processes, and lack of public trust and
understanding of the algorithmic systems. The interventions and key
needs various panelists put forward as
necessary to the future design of critical AI systems included ongoing
transparency, value sensitive and
participatory design, explanations designed for relevant stakeholders,
and public consultation. Various
panelists emphasized the importance of placing trust in people, not
technologies, and in engaging with
impacted communities to understand the potential harms of technologies
and build protection by design into
future systems. Panel 5: Social Welfare and Development. This event
explored current and emerging uses of technology to
implement or improve social welfare systems, social development
programs, and other systems that can impact
life chances. Welcome:
•
Suresh Venkatasubramanian, Assistant Director for Science and Justice,
White House Office of Science
and Technology Policy
•
Anne-Marie Slaughter, CEO, New America
Moderator: Michele Evermore, Deputy Director for Policy, Office of
Unemployment Insurance
Modernization, Office of the Secretary, Department of Labor
Panelists:
•
Blake Hall, CEO and Founder, ID.Me
•
Karrie Karahalios, Professor of Computer Science, University of
Illinois, Urbana-Champaign
•
Christiaan van Veen, Director of Digital Welfare State and Human Rights
Project, NYU School of Law's
Center for Human Rights and Global Justice
- >
20, 2021.
https://www.vice.com/en/article/88npjv/amazons-ai-cameras-are-punishing-drivers-for-mistakes-they-didnt-make
- >-
Jan. 11, 2022.
https://themarkup.org/machine-learning/2022/01/11/this-private-equity-firm-is-amassing-companies-that-collect-data-on-americas-children
77. Reed Albergotti. Every employee who leaves Apple becomes an
‘associate’: In job databases used by
employers to verify resume information, every former Apple employee’s
title gets erased and replaced with
a generic title. The Washington Post.
- source_sentence: >-
How do automated identity controls at airports ensure assistance for
individuals facing misidentification?
sentences:
- >-
SECTION: ALGORITHMIC DISCRIMINATION PROTECTIONS
ALGORITHMIC DISCRIMINATION Protections
You should not face discrimination by algorithms
and systems should be used and designed in an
equitable
way. Algorithmic discrimination occurs when automated systems contribute
to unjustified different treatment or impacts disfavoring people based
on their race, color, ethnicity, sex (including pregnancy, childbirth,
and related medical conditions, gender identity, intersex status, and
sexual orientation), religion, age, national origin, disability, veteran
status, genetic information, or any other classification protected by law.
Depending on the specific circumstances, such algorithmic
discrimination may violate legal protections. Designers, developers,
and deployers of automated systems should take proactive and
continuous measures to protect individuals and communities
from algorithmic discrimination and to use and design systems in
an equitable way. This protection should include proactive equity
assessments as part of the system design, use of representative data
and protection against proxies for demographic features, ensuring
accessibility for people with disabilities in design and development,
pre-deployment and ongoing disparity testing and mitigation, and
clear organizational oversight. Independent evaluation and plain
language reporting in the form of an algorithmic impact assessment,
including disparity testing results and mitigation information,
should be performed and made public whenever possible to confirm
these protections.
- >-
These critical protections have been adopted in some scenarios. Where
automated systems have been introduced to
provide the public access to government benefits, existing human paper
and phone-based processes are generally still
in place, providing an important alternative to ensure access. Companies
that have introduced automated call centers
often retain the option of dialing zero to reach an operator. When
automated identity controls are in place to board an
airplane or enter the country, there is a person supervising the systems
who can be turned to for help or to appeal a
misidentification. The American people deserve the reassurance that such
procedures are in place to protect their rights, opportunities,
and access.
- >
SECTION: APPENDIX: EXAMPLES OF AUTOMATED SYSTEMS
APPENDIX
Systems that impact the safety of communities such as automated traffic
control systems, electrical grid controls, smart city technologies, and
industrial emissions and environmental impact control algorithms; and
Systems related to access to benefits or services or assignment of
penalties such as systems that
support decision-makers who adjudicate benefits such as collating or
analyzing information or
matching records, systems which similarly assist in the adjudication of
administrative or criminal
penalties, fraud detection algorithms, services or benefits access
control algorithms, biometric
systems used as access control, and systems which make benefits or
services related decisions on a
fully or partially autonomous basis (such as a determination to revoke
benefits).
- source_sentence: >-
How should the availability of human consideration and fallback mechanisms
be determined in relation to the potential impact of automated systems on
rights, opportunities, or access?
sentences:
- >
In many scenarios, there is a reasonable expectation of human involvement
in attaining rights, opportunities, or access. When automated systems
make up part of the attainment process, alternative timely human-driven
processes should be provided. The use of a human alternative should be
triggered by an opt-out process. Timely and not burdensome human
alternative. Opting out should be timely and not unreasonably burdensome
in both the process of requesting to opt-out and the human-driven
alternative provided. Provide timely human consideration and remedy by a
fallback and escalation system in the event that an automated system
fails, produces error, or you would like to appeal or contest its
impacts on you.
Proportionate. The availability of human consideration and fallback,
along with associated training and safeguards against human bias, should
be proportionate to the potential of the automated system to
meaningfully impact rights, opportunities, or access. Automated systems
that have greater control over outcomes, provide input to high-stakes
decisions, relate to sensitive domains, or otherwise have a greater
potential to meaningfully impact rights, opportunities, or access should
have greater availability (e.g., staffing) and oversight of human
consideration and fallback mechanisms.
Accessible. Mechanisms for human consideration and fallback, whether
in-person, on paper, by phone, or otherwise provided, should be easy to
find and use. These mechanisms should be tested to ensure that users who
have trouble with the automated system are able to use human
consideration and fallback, with the understanding that it may be these
users who are most likely to need the human assistance. Similarly, it
should be tested to ensure that users with disabilities are able to find
and use human consideration and fallback and also request reasonable
accommodations or modifications.
Convenient. Mechanisms for human consideration and fallback should not
be unreasonably burdensome as compared to the automated system’s
equivalent.
- >-
SECTION: DATA PRIVACY
DATA PRIVACY
WHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS
The expectations for automated systems are meant to serve as a blueprint
for the development of additional
technical standards and practices that are tailored for particular
sectors and contexts. Data access and correction. People whose data is
collected, used, shared, or stored by automated
systems should be able to access data and metadata about themselves,
know who has access to this data, and
be able to correct it if necessary. Entities should receive consent
before sharing data with other entities and
should keep records of what data is shared and with whom. Consent
withdrawal and data deletion. Entities should allow (to the extent
legally permissible) withdrawal of data access consent, resulting in the
deletion of user data, metadata, and the timely removal of their data
from any systems (e.g., machine learning models) derived from that data.68
Automated system support. Entities designing, developing, and deploying
automated systems should
establish and maintain the capabilities that will allow individuals to
use their own automated systems to help
them make consent, access, and control decisions in a complex data
ecosystem. Capabilities include machine
readable data, standardized data formats, metadata or tags for
expressing data processing permissions and
preferences and data provenance and lineage, context of use and
access-specific tags, and training models for
assessing privacy risk. Demonstrate that data privacy and user control
are protected
Independent evaluation. As described in the section on Safe and
Effective Systems, entities should allow
independent evaluation of the claims made regarding data policies. These
independent evaluations should be
made public whenever possible. Care will need to be taken to balance
individual privacy with evaluation data
access needs.
- >-
SECTION: NOTICE AND EXPLANATION
NOTICE & EXPLANATION
WHY THIS PRINCIPLE IS IMPORTANT
This section provides a brief summary of the problems which the
principle seeks to address and protect
against, including illustrative examples. •
A predictive policing system claimed to identify individuals at greatest
risk to commit or become the victim of gun violence (based on automated
analysis of social ties to gang members, criminal histories, previous
experiences of gun violence, and other factors) and led to individuals
being placed on a watch list with no explanation or public transparency
regarding how the system came to its conclusions.85 Both police and
the public deserve to understand why and how such a system is making
these determinations. •
A system awarding benefits changed its criteria invisibly.
- source_sentence: >-
What topics were discussed during the meetings related to the development
of the Blueprint for an AI Bill of Rights?
sentences:
- >2-
GAI systems can produce content that is inciting, radicalizing, or
threatening, or that glorifies violence, with greater ease and scale
than other technologies. LLMs have been reported to generate dangerous
or violent recommendations, and some models have generated actionable
instructions for dangerous or unethical behavior.
9 Confabulations of falsehoods are most commonly a problem for
text-based outputs; for audio, image, or video content, creative
generation of non-factual content can be a desired behavior. 10 For
example, legal confabulations have been shown to be pervasive in current
state-of-the-art LLMs. See also, e.g.,
- >-
SECTION: LISTENING TO THE AMERICAN PEOPLE
APPENDIX
• OSTP conducted meetings with a variety of stakeholders in the private
sector and civil society. Some of these
meetings were specifically focused on providing ideas related to the
development of the Blueprint for an AI
Bill of Rights while others provided useful general context on the
positive use cases, potential harms, and/or
oversight possibilities for these technologies.
- >
Transgender travelers have described degrading experiences associated
with these extra screenings.43 TSA has recently announced plans to
implement a gender-neutral algorithm44
while simultaneously enhancing the security effectiveness capabilities
of the existing technology. •
The National Disabled Law Students Association expressed concerns that
individuals with disabilities were
more likely to be flagged as potentially suspicious by remote proctoring
AI systems because of their disability-specific access needs such as
needing longer breaks or using screen
readers or dictation software.45
•
An algorithm designed to identify patients with high needs for
healthcare systematically assigned lower
scores (indicating that they were not as high need) to Black patients
than to those of white patients, even
when those patients had similar numbers of chronic conditions and other
markers of health.46 In addition,
healthcare clinical algorithms that are used by physicians to guide
clinical decisions may include
sociodemographic variables that adjust or “correct” the algorithm’s
output on the basis of a patient’s race or
ethnicity, which can lead to race-based health inequities.47
model-index:
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-m
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: Unknown
type: unknown
metrics:
- type: cosine_accuracy@1
value: 0.7608695652173914
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.8695652173913043
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.9130434782608695
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.9782608695652174
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.7608695652173914
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.2898550724637682
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.18260869565217389
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.0978260869565217
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.7608695652173914
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.8695652173913043
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.9130434782608695
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.9782608695652174
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.8567216523715442
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.8190217391304349
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.8203804347826088
name: Cosine Map@100
- type: dot_accuracy@1
value: 0.7608695652173914
name: Dot Accuracy@1
- type: dot_accuracy@3
value: 0.8695652173913043
name: Dot Accuracy@3
- type: dot_accuracy@5
value: 0.9130434782608695
name: Dot Accuracy@5
- type: dot_accuracy@10
value: 0.9782608695652174
name: Dot Accuracy@10
- type: dot_precision@1
value: 0.7608695652173914
name: Dot Precision@1
- type: dot_precision@3
value: 0.2898550724637682
name: Dot Precision@3
- type: dot_precision@5
value: 0.18260869565217389
name: Dot Precision@5
- type: dot_precision@10
value: 0.0978260869565217
name: Dot Precision@10
- type: dot_recall@1
value: 0.7608695652173914
name: Dot Recall@1
- type: dot_recall@3
value: 0.8695652173913043
name: Dot Recall@3
- type: dot_recall@5
value: 0.9130434782608695
name: Dot Recall@5
- type: dot_recall@10
value: 0.9782608695652174
name: Dot Recall@10
- type: dot_ndcg@10
value: 0.8567216523715442
name: Dot Ndcg@10
- type: dot_mrr@10
value: 0.8190217391304349
name: Dot Mrr@10
- type: dot_map@100
value: 0.8203804347826088
name: Dot Map@100
SentenceTransformer based on Snowflake/snowflake-arctic-embed-m
This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: Snowflake/snowflake-arctic-embed-m
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
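For reference, the same forward pass can be expressed directly with transformers, assuming the checkpoint loads as a plain BertModel (as sentence-transformers repositories normally allow). This is a minimal sketch of the pipeline above, not the recommended interface:
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

repo = "northstaranlyticsma24/artic_ft_midterm"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModel.from_pretrained(repo)

batch = tokenizer(["An example sentence."], padding=True, truncation=True,
                  max_length=512, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state
# CLS-token pooling (pooling_mode_cls_token=True) followed by L2
# normalization, mirroring modules (1) and (2) above.
embedding = F.normalize(hidden[:, 0], p=2, dim=1)
print(embedding.shape)  # torch.Size([1, 768])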
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("northstaranlyticsma24/artic_ft_midterm")
# Run inference
sentences = [
'What topics were discussed during the meetings related to the development of the Blueprint for an AI Bill of Rights?',
'SECTION: LISTENING TO THE AMERICAN PEOPLE\nAPPENDIX\n• OSTP conducted meetings with a variety of stakeholders in the private sector and civil society. Some of these\nmeetings were specifically focused on providing ideas related to the development of the Blueprint for an AI\nBill of Rights while others provided useful general context on the positive use cases, potential harms, and/or\noversight possibilities for these technologies.',
' \nGAI systems can produce content that is inciting, radicalizing, or threatening, or that glorifies violence, \nwith greater ease and scale than other technologies. LLMs have been reported to generate dangerous or \nviolent recommendations, and some models have generated actionable instructions for dangerous or \n \n \n9 Confabulations of falsehoods are most commonly a problem for text-based outputs; for audio, image, or video \ncontent, creative generation of non-factual content can be a desired behavior. 10 For example, legal confabulations have been shown to be pervasive in current state-of-the-art LLMs. See also, \ne.g., \n \n7 \nunethical behavior.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
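Since the model is trained for retrieval, a common pattern is to rank candidate passages against a query. A minimal sketch (the query and passages below are illustrative placeholders, not taken from the training data):
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("northstaranlyticsma24/artic_ft_midterm")

query = "What safeguards should automated systems include?"
passages = [
    "Automated systems should include safeguards to protect the public from harm in a proactive and ongoing manner.",
    "Entities should keep records of what data is shared and with whom.",
]

# Embeddings are L2-normalized by the final Normalize() module, so cosine
# similarity and dot product produce identical rankings.
query_embedding = model.encode([query])
passage_embeddings = model.encode(passages)
scores = model.similarity(query_embedding, passage_embeddings)  # shape [1, 2]
ranked = scores[0].argsort(descending=True)
for idx in ranked.tolist():
    print(round(scores[0][idx].item(), 3), passages[idx])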
Evaluation
Metrics
Information Retrieval
- Evaluated with InformationRetrievalEvaluator
Metric | Value |
---|---|
cosine_accuracy@1 | 0.7609 |
cosine_accuracy@3 | 0.8696 |
cosine_accuracy@5 | 0.913 |
cosine_accuracy@10 | 0.9783 |
cosine_precision@1 | 0.7609 |
cosine_precision@3 | 0.2899 |
cosine_precision@5 | 0.1826 |
cosine_precision@10 | 0.0978 |
cosine_recall@1 | 0.7609 |
cosine_recall@3 | 0.8696 |
cosine_recall@5 | 0.913 |
cosine_recall@10 | 0.9783 |
cosine_ndcg@10 | 0.8567 |
cosine_mrr@10 | 0.819 |
cosine_map@100 | 0.8204 |
dot_accuracy@1 | 0.7609 |
dot_accuracy@3 | 0.8696 |
dot_accuracy@5 | 0.913 |
dot_accuracy@10 | 0.9783 |
dot_precision@1 | 0.7609 |
dot_precision@3 | 0.2899 |
dot_precision@5 | 0.1826 |
dot_precision@10 | 0.0978 |
dot_recall@1 | 0.7609 |
dot_recall@3 | 0.8696 |
dot_recall@5 | 0.913 |
dot_recall@10 | 0.9783 |
dot_ndcg@10 | 0.8567 |
dot_mrr@10 | 0.819 |
dot_map@100 | 0.8204 |
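These figures were produced with sentence-transformers' InformationRetrievalEvaluator. A sketch of how such an evaluation can be set up (the queries, corpus, and relevance judgments below are hypothetical placeholders; the actual held-out split is not published with this card):
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("northstaranlyticsma24/artic_ft_midterm")

# Hypothetical evaluation data: ids -> text, and query id -> relevant doc ids.
queries = {"q1": "What is algorithmic discrimination?"}
corpus = {
    "d1": "Algorithmic discrimination occurs when automated systems contribute to unjustified different treatment ...",
    "d2": "Entities should allow withdrawal of data access consent ...",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs)
results = evaluator(model)
print(results)  # includes cosine_accuracy@k, cosine_ndcg@10, cosine_map@100, ...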
Training Details
Training Dataset
Unnamed Dataset
- Size: 363 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 363 samples:
 | sentence_0 | sentence_1 |
---|---|---|
type | string | string |
details | min: 2 tokens, mean: 20.1 tokens, max: 36 tokens | min: 2 tokens, mean: 228.97 tokens, max: 512 tokens |
- Samples:

sentence_0: What are the five principles outlined in the Blueprint for an AI Bill of Rights intended to protect against?

sentence_1: SECTION: USING THIS TECHNICAL COMPANION
USING THIS TECHNICAL COMPANION
The Blueprint for an AI Bill of Rights is a set of five principles and associated practices to help guide the design, use, and deployment of automated systems to protect the rights of the American public in the age of artificial intelligence. This technical companion considers each principle in the Blueprint for an AI Bill of Rights and provides examples and concrete steps for communities, industry, governments, and others to take in order to build these protections into policy, practice, or the technological design process. Taken together, the technical protections and practices laid out in the Blueprint for an AI Bill of Rights can help guard the American public against many of the potential and actual harms identified by researchers, technologists, advocates, journalists, policymakers, and communities in the United States and around the world. This technical companion is intended to be used as a reference by people across many circumstances – anyone impacted by automated systems, and anyone developing, designing, deploying, evaluating, or making policy to govern the use of an automated system. Each principle is accompanied by three supplemental sections:
1. WHY THIS PRINCIPLE IS IMPORTANT: This section provides a brief summary of the problems that the principle seeks to address and protect against, including illustrative examples.
2. WHAT SHOULD BE EXPECTED OF AUTOMATED SYSTEMS: The expectations for automated systems are meant to serve as a blueprint for the development of additional technical standards and practices that should be tailored for particular sectors and contexts. This section outlines practical steps that can be implemented to realize the vision of the Blueprint for an AI Bill of Rights. The expectations laid out often mirror existing practices for technology development, including pre-deployment testing, ongoing monitoring, and governance structures for automated systems, but also go further to address unmet needs for change and offer concrete directions for how those changes can be made. Expectations about reporting are intended for the entity developing or using the automated system. The resulting reports can be provided to the public, regulators, auditors, industry standards groups, or others engaged in independent review, and should be made public as much as possible consistent with law, regulation, and policy, and noting that intellectual property, law enforcement, or national security considerations may prevent public release. Where public reports are not possible, the information should be provided to oversight bodies and privacy, civil liberties, or other ethics officers charged with safeguarding individuals’ rights. These reporting expectations are important for transparency, so the American people can have confidence that their rights, opportunities, and access as well as their expectations about technologies are respected.
3. HOW THESE PRINCIPLES CAN MOVE INTO PRACTICE: This section provides real-life examples of how these guiding principles can become reality, through laws, policies, and practices. It describes practical technical and sociotechnical approaches to protecting rights, opportunities, and access. The examples provided are not critiques or endorsements, but rather are offered as illustrative cases to help provide a concrete vision for actualizing the Blueprint for an AI Bill of Rights. Effectively implementing these processes requires the cooperation of and collaboration among industry, civil society, researchers, policymakers, technologists, and the public.

sentence_0: How does the technical companion suggest that automated systems should be monitored and reported on to ensure transparency and protect individual rights?

sentence_1: (the same USING THIS TECHNICAL COMPANION passage as in the first sample)

sentence_0: What is the significance of the number 14 in the given context?

sentence_1: 14
- Loss: MatryoshkaLoss with these parameters:
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [768, 512, 256, 128, 64],
    "matryoshka_weights": [1, 1, 1, 1, 1],
    "n_dims_per_step": -1
}
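A sketch of how this loss is typically constructed in sentence-transformers, mirroring the parameters above:
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m")

# MultipleNegativesRankingLoss treats the other in-batch pairs as negatives;
# MatryoshkaLoss applies it at each truncated embedding size.
base_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(
    model,
    base_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
)
Because the loss supervises every listed dimension, the resulting embeddings can later be truncated, e.g. by loading the model with truncate_dim=256, trading a small amount of retrieval quality for smaller vectors.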
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 20
- per_device_eval_batch_size: 20
- num_train_epochs: 5
- multi_dataset_batch_sampler: round_robin
All Hyperparameters
Click to expand
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 20
- per_device_eval_batch_size: 20
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
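Taken together, a training setup consistent with these hyperparameters might look as follows. This is a sketch: the dataset contents and the eval split are placeholders, since the actual 363 pairs are not published with this card.
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m")

# Placeholder (sentence_0, sentence_1) pairs matching the dataset schema above.
train_dataset = Dataset.from_dict({
    "sentence_0": ["What is algorithmic discrimination?"],
    "sentence_1": ["Algorithmic discrimination occurs when automated systems ..."],
})

loss = MatryoshkaLoss(
    model,
    MultipleNegativesRankingLoss(model),
    matryoshka_dims=[768, 512, 256, 128, 64],
)

args = SentenceTransformerTrainingArguments(
    output_dir="artic_ft_midterm",
    num_train_epochs=5,
    per_device_train_batch_size=20,
    per_device_eval_batch_size=20,
    eval_strategy="steps",
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # placeholder so eval_strategy="steps" is valid
    loss=loss,
)
trainer.train()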
Training Logs
Epoch | Step | cosine_map@100 |
---|---|---|
1.0 | 19 | 0.7434 |
2.0 | 38 | 0.7973 |
2.6316 | 50 | 0.8048 |
3.0 | 57 | 0.8048 |
4.0 | 76 | 0.8204 |
5.0 | 95 | 0.8204 |
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.1.1
- Transformers: 4.44.2
- PyTorch: 2.4.1+cu121
- Accelerate: 0.34.2
- Datasets: 3.0.0
- Tokenizers: 0.19.1
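To approximate this environment, the listed versions can be pinned at install time (the +cu121 suffix on PyTorch is a CUDA build tag and is omitted here):
pip install sentence-transformers==3.1.1 transformers==4.44.2 torch==2.4.1 accelerate==0.34.2 datasets==3.0.0 tokenizers==0.19.1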
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}