deepakaiplanet committed on
Commit 39ea455 · verified · 1 Parent(s): 15a6726

Upload 6 files

Files changed (6)
  1. README.md +2 -12
  2. app.py +117 -0
  3. french.txt +1 -0
  4. german.txt +10 -0
  5. requirements.txt +4 -0
  6. translated_file.txt +6 -0
README.md CHANGED
@@ -1,12 +1,2 @@
- ---
- title: Gov Tech Lab
- emoji: 📊
- colorFrom: red
- colorTo: purple
- sdk: gradio
- sdk_version: 4.32.2
- app_file: app.py
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # gov-tech-lab
+
app.py ADDED
@@ -0,0 +1,117 @@
+ import os
+
+ import gradio as gr
+ from dotenv import load_dotenv
+ from langchain.prompts import PromptTemplate
+ from langchain_community.llms import OpenAI
+
+ # Load OPENAI_API_KEY from a local .env file, if one exists.
+ load_dotenv()
+
+ # Fail early with a clear message instead of a TypeError on a missing key.
+ if not os.getenv("OPENAI_API_KEY"):
+     raise RuntimeError("OPENAI_API_KEY is not set")
+
+ system_prompt_1 = """
+ You are an advanced AI assistant tasked with helping to develop a system that automatically transcribes texts into
+ simplified languages, specifically FALC (Facile à Lire et à Comprendre) and "Leichte Sprache" (Easy Language).
+ This system is intended to streamline the creation of accessible content for government websites, where the current
+ manual process is time-consuming and limits the deployment of simplified language texts.
+
+ Requirements:
+
+ 1. Input and Output Formats:
+ - Input Formats: The AI tool must accept input in Rich Text Format (.rtf) and Free Text (.txt).
+ - Output Formats: The output should be generated in the same format as the input file (i.e., if the input is .rtf, the
+ output should be .rtf, and if the input is .txt, the output should be .txt).
+ - Default Output Language: The output language must match the language detected in the input file.
+
+ 2. Language Simplification Rules:
+ - The transcription must adhere to the rules of FALC and "Leichte Sprache," ensuring the content is simple, clear, and
+ accessible.
+ - Use simple vocabulary and avoid complex terms.
+ - Construct short, straightforward sentences with one main idea per sentence.
+ - Structure information clearly, using bullet points or numbered lists where applicable.
+ - Incorporate illustrations, icons, or symbols to support textual information if needed.
+
+ 3. Accessibility Standards:
+ - The final solution must comply with accessibility standards to ensure content is usable by individuals with intellectual
+ disabilities and other target groups.
+ - Ensure that the output is compatible with screen readers and other assistive technologies.
+
+ 4. Scalability and Efficiency:
+ - The tool should significantly reduce the time required for the transcription process compared to the current manual
+ methods.
+ - It should be capable of handling large volumes of text efficiently to support widespread deployment across various
+ government websites.
+
+ 5. User Collaboration:
+ - The tool should allow for revisions and feedback from collaborators with intellectual disabilities to ensure the
+ output meets the necessary standards of FALC and "Leichte Sprache."
+
+ Instructions for AI Development:
+
+ Implement a language detection mechanism to identify the language of the input text.
+ Develop natural language processing (NLP) models trained specifically on FALC and "Leichte Sprache" guidelines to
+ accurately transcribe complex texts into simplified language.
+
+ Ensure the models are capable of maintaining the context and meaning of the original text while simplifying its language.
+ Include features for user feedback and revisions to refine and improve the transcriptions based on real-world use and
+ collaborator input.
+ Test the tool rigorously to ensure it meets accessibility standards and performs well across different types of content
+ and input formats.
+
+ Your goal is to create an AI tool that makes the process of generating FALC and "Leichte Sprache" content more efficient,
+ scalable, and accessible, ultimately facilitating better communication and inclusivity on government websites.
+
+ User Text: {text}
+
+ Transcribed text: """
+
+ system_prompt_2 = """Please translate the following text into English.
+ text: {text}
+ """
+
+ def translate_text(file):
+     # OpenAI() reads the key from the OPENAI_API_KEY environment variable.
+     llm = OpenAI()
+
+     with open(file.name, 'r', encoding='utf-8') as f:
+         file_text = f.read()
+
+     # Step 1: transcribe the uploaded text into simplified language.
+     template_1 = PromptTemplate(input_variables=["text"], template=system_prompt_1)
+     file_translation = llm.invoke(template_1.format(text=file_text))
+
+     # Step 2: translate both the original and the simplified text into English.
+     template_2 = PromptTemplate(input_variables=["text"], template=system_prompt_2)
+     text_translation_ip = llm.invoke(template_2.format(text=file_text))
+     text_translation_op = llm.invoke(template_2.format(text=file_translation))
+
+     # Step 3: save the simplified text so it can be downloaded.
+     output_file_path = "translated_file.txt"
+     with open(output_file_path, 'w', encoding='utf-8') as f:
+         f.write(file_translation)
+
+     return text_translation_ip, file_translation, text_translation_op, output_file_path
+
+ iface = gr.Interface(
+     fn=translate_text,
+     inputs=[
+         gr.File(label="Upload Text File")
+     ],
+     outputs=[
+         gr.Textbox(label="English translation of the input content"),
+         gr.Textbox(label="Transcribed content"),
+         gr.Textbox(label="English translation of the transcribed content"),
+         gr.File(label="Download Transcribed Text File")
+     ],
+     title="Text Transcriber",
+     description="Upload a text file to transcribe it into simplified language and translate it into English using LangChain and OpenAI with predefined system prompts.",
+     allow_flagging="never"
+ )
+
+ iface.launch(debug=True)
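
Review note: the three-call flow in `translate_text` (simplify, then translate the input and the simplified output into English) can be sketched without Gradio or LangChain, which makes it testable offline. This is a minimal sketch, not the author's implementation: `run_pipeline`, the `llm` callable parameter, `output_dir`, and the two shortened template constants are hypothetical names introduced here for illustration.

```python
from pathlib import Path
from typing import Callable, Tuple

# Hypothetical, shortened stand-ins for system_prompt_1 and system_prompt_2.
SIMPLIFY_TEMPLATE = "Rewrite the text below in simplified language.\nUser Text: {text}\n\nTranscribed text: "
TRANSLATE_TEMPLATE = "Please translate the following text into English.\ntext: {text}\n"


def run_pipeline(
    input_path: str,
    llm: Callable[[str], str],
    output_dir: str = ".",
) -> Tuple[str, str, str, str]:
    """Return (input in English, simplified text, simplified text in English, output path).

    `llm` is any function mapping a prompt string to a completion string,
    so a real model client or a test stub can be passed in.
    """
    file_text = Path(input_path).read_text(encoding="utf-8")

    # Call 1: simplify the original text.
    simplified = llm(SIMPLIFY_TEMPLATE.format(text=file_text))
    # Call 2: translate the original text into English.
    input_in_english = llm(TRANSLATE_TEMPLATE.format(text=file_text))
    # Call 3: translate the simplified text into English.
    output_in_english = llm(TRANSLATE_TEMPLATE.format(text=simplified))

    # Save the simplified text under the same file name the app uses.
    output_path = Path(output_dir) / "translated_file.txt"
    output_path.write_text(simplified, encoding="utf-8")

    return input_in_english, simplified, output_in_english, str(output_path)
```

Passing the model in as a plain callable keeps the file handling and prompt wiring separate from the API client, so a stub such as `lambda prompt: prompt.upper()` is enough to exercise the flow in a test.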
french.txt ADDED
@@ -0,0 +1 @@
+ La technologie joue un rôle crucial dans notre vie quotidienne. Elle nous permet de rester connectés avec nos proches, de travailler à distance, et d'accéder à une quantité infinie d'informations. Les avancées dans le domaine de l'intelligence artificielle et de la robotique transforment également de nombreux secteurs, de la santé à l'éducation, en passant par les transports. Il est important de comprendre et de maîtriser ces technologies pour pouvoir en tirer le meilleur parti et anticiper les défis futurs qu'elles pourraient poser.
german.txt ADDED
@@ -0,0 +1,10 @@
+ Titel: Die Schönheit der deutschen Sprache
+
+ Die deutsche Sprache, reich an Geschichte und Kultur, bietet eine Vielzahl von Ausdrucksmöglichkeiten und Nuancen.
+ Sie ist bekannt für ihre langen, zusammengesetzten Wörter und präzisen Begriffe.
+ Ein Beispiel hierfür ist das Wort "Donaudampfschifffahrtsgesellschaftskapitän",
+ das Kapitän einer Donaudampfschifffahrtsgesellschaft bedeutet.
+
+ Die Literatur auf Deutsch ist ebenfalls beeindruckend. Dichter und Denker wie Goethe,
+ Schiller und Kafka haben Werke geschaffen, die weltweit Anerkennung gefunden haben. Ihre Texte, voll von
+ tiefen Gedanken und komplexen Charakteren, laden die Leser ein, in die Tiefen der menschlichen Seele einzutauchen.
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ gradio
+ langchain
+ python-dotenv
+ langchain-community
translated_file.txt ADDED
@@ -0,0 +1,6 @@
+
+ Titel: Die Schönheit der deutschen Sprache
+
+ Die deutsche Sprache ist bekannt für ihre reiche Geschichte und Kultur. Sie bietet eine Vielzahl von Möglichkeiten, um sich auszudrücken und Nuancen zu betonen. Ein Beispiel hierfür ist das Wort "Donaudampfschifffahrtsgesellschaftskapitän", welches den Kapitän eines Donaudampfschifffahrtsgesellschaft bedeutet.
+
+ Auch die Literatur auf Deutsch ist beeindruckend. Schriftsteller und Denker wie Goethe, Schiller und Kafka haben Werke geschaffen, die weltweit Anerkennung gefunden haben. Ihre Texte sind voller tiefer Gedanken und komplexer Charaktere, die den Leser einladen, in die Tiefen der menschlichen Seele einzutauchen.