Upload 6 files
- README.md +2 -12
- app.py +117 -0
- french.txt +1 -0
- german.txt +10 -0
- requirements.txt +4 -0
- translated_file.txt +6 -0
README.md
CHANGED
@@ -1,12 +1,2 @@
-
-
-emoji: 📊
-colorFrom: red
-colorTo: purple
-sdk: gradio
-sdk_version: 4.32.2
-app_file: app.py
-pinned: false
----
-
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# gov-tech-lab
+
app.py
ADDED
@@ -0,0 +1,117 @@
+import os
+
+import gradio as gr
+from dotenv import load_dotenv
+from langchain_core.prompts import PromptTemplate
+from langchain_community.llms import OpenAI
+
+# Load OPENAI_API_KEY from a local .env file, if one exists.
+load_dotenv()
+
+if not os.getenv("OPENAI_API_KEY"):
+    raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or to the Space secrets.")
+system_prompt_1 = """
+You are an advanced AI assistant tasked with helping to develop a system that automatically transcribes texts into
+simplified languages, specifically FALC (Facile à Lire et à Comprendre) and "Leichte Sprache" (Easy Language).
+This system is intended to streamline the creation of accessible content for government websites, where the current
+manual process is time-consuming and limits the deployment of simplified language texts.
+
+Requirements:
+
+1. Input and Output Formats:
+- Input Formats: The AI tool must accept input in Rich Text Format (.rtf) and Free Text (.txt).
+- Output Formats: The output should be generated in the same format as the input file (i.e., if the input is .rtf, the
+output should be .rtf, and if the input is .txt, the output should be .txt).
+- Default Output Language: The output language must match the language detected in the input file.
+
+2. Language Simplification Rules:
+- The transcription must adhere to the rules of FALC and "Leichte Sprache," ensuring the content is simple, clear, and
+accessible.
+- Use simple vocabulary and avoid complex terms.
+- Construct short, straightforward sentences with one main idea per sentence.
+- Structure information clearly, using bullet points or numbered lists where applicable.
+- Incorporate illustrations, icons, or symbols to support textual information if needed.
+
+3. Accessibility Standards:
+- The final solution must comply with accessibility standards to ensure content is usable by individuals with intellectual
+disabilities and other target groups.
+- Ensure that the output is compatible with screen readers and other assistive technologies.
+
+4. Scalability and Efficiency:
+- The tool should significantly reduce the time required for the transcription process compared to the current manual
+methods.
+- It should be capable of handling large volumes of text efficiently to support widespread deployment across various
+government websites.
+
+5. User Collaboration:
+- The tool should allow for revisions and feedback from collaborators with intellectual disabilities to ensure the
+output meets the necessary standards of FALC and "Leichte Sprache."
+
+Instructions for AI Development:
+
+Implement a language detection mechanism to identify the language of the input text.
+Develop natural language processing (NLP) models trained specifically on FALC and "Leichte Sprache" guidelines to
+accurately transcribe complex texts into simplified language.
+
+Ensure the models are capable of maintaining the context and meaning of the original text while simplifying its language.
+Include features for user feedback and revisions to refine and improve the transcriptions based on real-world use and
+collaborator input.
+Test the tool rigorously to ensure it meets accessibility standards and performs well across different types of content
+and input formats.
+
+Your goal is to create an AI tool that makes the process of generating FALC and "Leichte Sprache" content more efficient,
+scalable, and accessible, ultimately facilitating better communication and inclusivity on government websites.
+
+User Text: {text}
+
+Transcribed text: """
+
+system_prompt_2 = """Please translate the following text into English.
+text: {text}
+"""
+
+def translate_text(file):
+    llm = OpenAI()
+    # gr.File may pass a path string or a tempfile-like object, depending on the Gradio version.
+    with open(getattr(file, "name", file), 'r', encoding='utf-8') as f:
+        file_text = f.read()
+
+    template_1 = PromptTemplate(input_variables=["text"], template=system_prompt_1)
+    prompt_1 = template_1.format(text=file_text)
+    file_translation = llm.invoke(prompt_1)
+
+    template_2 = PromptTemplate(input_variables=["text"], template=system_prompt_2)
+    prompt_2 = template_2.format(text=file_translation)
+    text_translation_op = llm.invoke(prompt_2)
+
+    # English translation of the original input (reuses template_2).
+    prompt_3 = template_2.format(text=file_text)
+    text_translation_ip = llm.invoke(prompt_3)
+
+    # Write the simplified text to a file that the UI offers for download.
+    output_file_path = "translated_file.txt"
+    with open(output_file_path, 'w', encoding='utf-8') as f:
+        f.write(file_translation)
+
+    return text_translation_ip, file_translation, text_translation_op, output_file_path
+
+iface = gr.Interface(
+    fn=translate_text,
+    inputs=[
+        gr.File(label="Upload Text File")
+
+    ],
+    outputs=[
+        gr.Textbox(label="English translation of the input text"),
+        gr.Textbox(label="Transcribed (simplified) text"),
+        gr.Textbox(label="English translation of the transcribed text"),
+        gr.File(label="Download transcribed text file")
+
+    ],
+    title="Text Transcription",
+    description="Upload a text file to transcribe it into simplified language (FALC or Leichte Sprache) using LangChain and OpenAI with predefined system prompts.",
+    allow_flagging="never"
+
+)
+
+iface.launch(debug=True)
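The system prompt in app.py asks for output in the same format as the input (.txt or .rtf), while translate_text always writes translated_file.txt. A minimal sketch of a helper that derives the output path from the input path follows. The name output_path_for, the "_simplified" suffix, and the per-call temporary directory are assumptions made for illustration and are not part of this commit. The helper only chooses a file name; emitting valid RTF markup would need additional handling.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Formats named in system_prompt_1.
SUPPORTED_SUFFIXES = {".txt", ".rtf"}


def output_path_for(input_path: str, out_dir: Optional[str] = None) -> Path:
    """Return a path for the simplified file that keeps the input's extension.

    Hypothetical helper: not part of app.py as committed.
    """
    src = Path(input_path)
    suffix = src.suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"Unsupported input format: {suffix or '(none)'}")
    # A fresh temporary directory per call keeps concurrent requests from
    # overwriting one shared output file.
    target_dir = Path(out_dir) if out_dir else Path(tempfile.mkdtemp())
    return target_dir / f"{src.stem}_simplified{suffix}"
```

Inside translate_text, the hard-coded "translated_file.txt" could then be replaced by str(output_path_for(path)), where path is the uploaded file's path.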
french.txt
ADDED
@@ -0,0 +1 @@
+La technologie joue un rôle crucial dans notre vie quotidienne. Elle nous permet de rester connectés avec nos proches, de travailler à distance, et d'accéder à une quantité infinie d'informations. Les avancées dans le domaine de l'intelligence artificielle et de la robotique transforment également de nombreux secteurs, de la santé à l'éducation, en passant par les transports. Il est important de comprendre et de maîtriser ces technologies pour pouvoir en tirer le meilleur parti et anticiper les défis futurs qu'elles pourraient poser.
german.txt
ADDED
@@ -0,0 +1,10 @@
+Titel: Die Schönheit der deutschen Sprache
+
+Die deutsche Sprache, reich an Geschichte und Kultur, bietet eine Vielzahl von Ausdrucksmöglichkeiten und Nuancen.
+Sie ist bekannt für ihre langen, zusammengesetzten Wörter und präzisen Begriffe.
+Ein Beispiel hierfür ist das Wort "Donaudampfschifffahrtsgesellschaftskapitän",
+das Kapitän einer Donaudampfschifffahrtsgesellschaft bedeutet.
+
+Die Literatur auf Deutsch ist ebenfalls beeindruckend. Dichter und Denker wie Goethe,
+Schiller und Kafka haben Werke geschaffen, die weltweit Anerkennung gefunden haben. Ihre Texte, voll von
+tiefen Gedanken und komplexen Charakteren, laden die Leser ein, in die Tiefen der menschlichen Seele einzutauchen.
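The system prompt calls for a language detection mechanism, and french.txt and german.txt are the two sample inputs. The sketch below shows one rough way to tell them apart by counting common stopwords. The stopword lists and the function name detect_language are illustrative assumptions, not code from this commit. A production system would more plausibly use a dedicated language-identification library.

```python
import re

# Tiny illustrative stopword lists (assumed for this sketch only).
STOPWORDS = {
    "fr": {"le", "la", "les", "de", "des", "et", "est", "un", "une", "dans", "nous", "pour", "en"},
    "de": {"der", "die", "das", "und", "ist", "ein", "eine", "von", "für", "wie", "sie", "auf", "in"},
}


def detect_language(text: str) -> str:
    """Guess 'fr' or 'de' by stopword hits; return 'unknown' if nothing matches."""
    # [^\W\d_]+ matches runs of Unicode letters, so accented words stay whole.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(word in stops for word in words)
        for lang, stops in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"
```

The result could be used to pick between a FALC-specific and a "Leichte Sprache"-specific prompt instead of leaving detection entirely to the model.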
requirements.txt
ADDED
@@ -0,0 +1,4 @@
+gradio
+langchain
+python-dotenv
+langchain-community
translated_file.txt
ADDED
@@ -0,0 +1,6 @@
+
+Titel: Die Schönheit der deutschen Sprache
+
+Die deutsche Sprache ist bekannt für ihre reiche Geschichte und Kultur. Sie bietet eine Vielzahl von Möglichkeiten, um sich auszudrücken und Nuancen zu betonen. Ein Beispiel hierfür ist das Wort "Donaudampfschifffahrtsgesellschaftskapitän", welches den Kapitän eines Donaudampfschifffahrtsgesellschaft bedeutet.
+
+Auch die Literatur auf Deutsch ist beeindruckend. Schriftsteller und Denker wie Goethe, Schiller und Kafka haben Werke geschaffen, die weltweit Anerkennung gefunden haben. Ihre Texte sind voller tiefer Gedanken und komplexer Charaktere, die den Leser einladen, in die Tiefen der menschlichen Seele einzutauchen.
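The system prompt asks for "short, straightforward sentences with one main idea per sentence". One rough way to check an output such as translated_file.txt against its input is to compare average sentence length in words. The function below is a sketch under that assumption. The name average_sentence_length and the naive split on ".", "!" and "?" are illustrative choices, and the metric is not an official FALC or "Leichte Sprache" criterion.

```python
import re


def average_sentence_length(text: str) -> float:
    """Return the mean number of whitespace-separated words per sentence."""
    # Naive sentence split on terminal punctuation; abbreviations and
    # quoted punctuation are not handled.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    word_counts = [len(s.split()) for s in sentences]
    return sum(word_counts) / len(sentences)
```

Running it on the text of german.txt and of translated_file.txt gives two numbers; a simplified output would be expected to score lower than its source.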