YvesP committed on
Commit 2e02639 · 1 parent: 4141dcd

initial commit

This view is limited to 50 files because it contains too many changes.
Files changed (50):
  1. data/.DS_Store +0 -0
  2. data/AccomodationAndMealsForfaits_en.csv +0 -0
  3. data/AccomodationAndMealsForfaits_en.numbers +0 -0
  4. data/AccomodationAndMealsForfaits_fr.csv +31 -0
  5. data/AccomodationAndMealsForfaits_fr.numbers +0 -0
  6. data/BaremeTauxEloignement.csv +84 -0
  7. data/DeplacementsEtVoyages.docx +0 -0
  8. data/DeplacementsEtVoyagesRev.docx +0 -0
  9. data/ForfaitsRemboursements.csv +31 -0
  10. data/NonPrisEnCharge.csv +14 -0
  11. data/business_trips_content_en.docx +0 -0
  12. data/business_trips_content_fr.docx +0 -0
  13. data/business_trips_content_until_3_en.docx +0 -0
  14. data/business_trips_content_until_3_enfr.docx +0 -0
  15. data/business_trips_content_until_3_fr.docx +0 -0
  16. data/business_trips_content_until_9_en.docx +0 -0
  17. data/business_trips_plan_en.docx +0 -0
  18. data/business_trips_plan_until_3_en.docx +0 -0
  19. data/business_trips_plan_until_3_fr.docx +0 -0
  20. data/business_trips_plan_until_9_en.docx +0 -0
  21. data/transports.docx +0 -0
  22. data/transports_content_en.docx +0 -0
  23. data/transports_content_fr.docx +0 -0
  24. data/transports_plan.docx +0 -0
  25. data/transports_plan_en.docx +0 -0
  26. data/transports_plan_short_en.docx +0 -0
  27. data/transports_plan_short_fr.docx +0 -0
  28. data/~$ansports.docx +0 -0
  29. data/~$ansports_contenu.txt +0 -0
  30. data/~$placementsEtVoyages.docx +0 -0
  31. data/~$siness_trip_plan_until_3_fr.docx +0 -0
  32. data/~$siness_trips_content_until_3_fr.docx +0 -0
  33. data/~$siness_trips_content_until_9_en.docx +0 -0
  34. data/~$siness_trips_plan_until_9_en.docx +0 -0
  35. src/__pycache__/control.cpython-310.pyc +0 -0
  36. src/app.py +91 -0
  37. src/app2.py +16 -0
  38. src/control.py +49 -0
  39. src/control2.py +36 -0
  40. src/domain/__pycache__/container.cpython-310.pyc +0 -0
  41. src/domain/__pycache__/doc.cpython-310.pyc +0 -0
  42. src/domain/__pycache__/paragraph.cpython-310.pyc +0 -0
  43. src/domain/__pycache__/style.cpython-310.pyc +0 -0
  44. src/domain/container.py +136 -0
  45. src/domain/doc.py +71 -0
  46. src/domain/paragraph.py +27 -0
  47. src/domain/project.py +9 -0
  48. src/domain/style.py +121 -0
  49. src/domain/user.py +4 -0
  50. src/tools/__pycache__/llm.cpython-310.pyc +0 -0
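Several of the committed files are usually kept out of version control:
- the macOS Finder metadata file `.DS_Store`;
- compiled bytecode under `__pycache__/`;
- files whose names start with `~$`.

The `~$` files are temporary owner (lock) files that Microsoft Office creates while a document is open. Office replaces the first two characters of the document name with `~$`, which is why `transports.docx` appears as `~$ansports.docx`. Below is a minimal sketch of a pre-commit check that flags such paths. It assumes these three patterns are the only ones of interest, and `find_unwanted_files` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def find_unwanted_files(paths: list[str]) -> list[str]:
    """Return the paths that look like OS, editor or build artifacts.

    Hypothetical helper: the three rules below are illustrative, not exhaustive.
    """
    unwanted = []
    for raw in paths:
        path = PurePosixPath(raw)
        if path.name == ".DS_Store":  # macOS Finder metadata
            unwanted.append(raw)
        elif "__pycache__" in path.parts or path.suffix == ".pyc":  # Python bytecode
            unwanted.append(raw)
        elif path.name.startswith("~$"):  # Office owner/lock file
            unwanted.append(raw)
    return unwanted


if __name__ == "__main__":
    committed = [
        "data/.DS_Store",
        "data/transports.docx",
        "data/~$ansports.docx",
        "src/__pycache__/control.cpython-310.pyc",
        "src/app.py",
    ]
    print(find_unwanted_files(committed))
```

The flagged paths are the ones a `.gitignore` entry would normally cover.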
data/.DS_Store ADDED
Binary file (6.15 kB).

data/AccomodationAndMealsForfaits_en.csv ADDED
Binary file (1.91 kB).

data/AccomodationAndMealsForfaits_en.numbers ADDED
Binary file (160 kB).

data/AccomodationAndMealsForfaits_fr.csv ADDED
@@ -0,0 +1,31 @@
+Destination;Hebergement;Repas
+France;125;27
+Allemagne;150;35
+Arabie Saoudite;200;40
+Autriche;110;40
+Belgique;150;35
+Canada;150;30
+Chine;113;37
+Egypte;150;25
+Emirats Arabes Unis;160;46
+Espagne;130;30
+Etats-Unis;140;47
+Grèce;140;25
+Inde;160;47
+Irlande;180;30
+Italie;120;37
+Japon;150;25
+Maroc;110;25
+Mexique;130;27
+Norvège;160;40
+Pays-Bas;150;32
+Pologne;110;23
+Portugal;108;25
+Qatar;210;35
+Royaume-Uni;130;28
+Russie;180;50
+Singapour;170;42
+Suède;90;30
+Suisse;192;35
+Taiwan;123;37
+Turquie;150;28

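The raw diff of this file renders accented letters as `�` (for example `Gr�ce` for `Grèce`). That suggests the file is saved in a single-byte encoding such as Windows-1252 rather than UTF-8; this is an inference from the rendering, not something the commit states. Below is a minimal sketch that tries UTF-8 first and falls back to `cp1252`, then maps each destination to its `Hebergement` (accommodation) and `Repas` (meals) values. The function names are hypothetical.

```python
import csv
import io


def decode_bytes(raw: bytes) -> str:
    """Decode as UTF-8 if possible, otherwise fall back to Windows-1252 (assumed)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252")


def load_allowances(raw: bytes) -> dict[str, tuple[int, int]]:
    """Map each destination to its (Hebergement, Repas) values."""
    reader = csv.DictReader(io.StringIO(decode_bytes(raw)), delimiter=";")
    return {
        row["Destination"]: (int(row["Hebergement"]), int(row["Repas"]))
        for row in reader
    }


if __name__ == "__main__":
    # A few rows copied from the file, encoded the way the file appears to be.
    sample = "Destination;Hebergement;Repas\nFrance;125;27\nGrèce;140;25\nSuède;90;30\n"
    allowances = load_allowances(sample.encode("cp1252"))
    print(allowances["Grèce"])
```

Trying UTF-8 first is safe here: a Windows-1252 `è` (byte `0xE8`) followed by an ASCII letter is not valid UTF-8, so the fallback is triggered rather than producing silently wrong text.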
data/AccomodationAndMealsForfaits_fr.numbers ADDED
Binary file (163 kB).

data/BaremeTauxEloignement.csv ADDED
@@ -0,0 +1,84 @@
+Tableau 1
+Barème Taux d’Éloignement;
+Afrique du sud;10 %
+Algérie;15 %
+Allemagne;0 %
+Arabie saoudite;12 %
+Argentine;11 %
+Australie;3 %
+Autriche;0 %
+Belgique;0 %
+Bolivie;11 %
+Brésil;11 %
+Bulgarie;10 %
+Cameroun;13 %
+Canada;3 %
+Chili;9 %
+Chine;13 %
+Chypre;4 %
+Colombie;13 %
+Corée;11 %
+Croatie;7 %
+Danemark;0 %
+Djibouti;13 %
+E.A.U;9 %
+Egypte;16 %
+Equateur;12 %
+Espagne;0 %
+Estonie;7 %
+Etats unis;3 %
+Ethiopie;12 %
+Finlande;0 %
+Grande Bretagne;0 %
+Grèce;0 %
+Guadeloupe;3 %
+Guyane;7 %
+Hong Kong;8 %
+Hongrie;6 %
+Ile Maurice;8 %
+Inde;15 %
+Indonésie;17 %
+Irlande;0 %
+Israël;9 %
+Italie;0 %
+Japon;8 %
+Jordanie;10 %
+Kenya;13 %
+Koweït;11 %
+Laos;13 %
+Luxembourg;0 %
+Madagascar;13 %
+Malaisie;14 %
+Maroc;8 %
+Martinique;3 %
+Mauritanie;10 %
+Mexique;12 %
+Mozambique;14 %
+Nigeria;17 %
+Norvège;0 %
+Nouvelle Calédonie;4 %
+Pakistan;17 %
+Pérou;13 %
+Philippines;16 %
+Pologne;8 %
+Polynésie;5 %
+Portugal;0 %
+Qatar;9 %
+République Congo;14 %
+République tchèque;6 %
+Roumanie;11 %
+Russie;13 %
+Sénégal;10 %
+Serbie;11 %
+Singapour;6 %
+Slovaquie;6 %
+Sri Lanka;15 %
+Suède;0 %
+Suisse;0 %
+Taiwan;11 %
+Thaïlande;12 %
+Tunisie;7 %
+Turquie;10 %
+Ukraine;12 %
+Venezuela;13 %
+Vietnam;13 %

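The remoteness-rate table (*barème des taux d'éloignement*) starts with two preamble lines before the data: `Tableau 1` and the title row. Each rate is written as a French-formatted percentage such as `10 %`, where the space may be a regular or non-breaking space. Below is a minimal sketch that skips the preamble and converts each rate to a fraction. It assumes every data row has the form `country;N %`. The commit does not describe how these rates are applied to expenses, so the sketch stops at parsing; `parse_remoteness_rates` is a hypothetical name.

```python
import re

# "country;N %" with optional decimals; \s also matches non-breaking spaces in str patterns.
RATE = re.compile(r"^(?P<country>[^;]+);\s*(?P<value>\d+(?:[.,]\d+)?)\s*%\s*$")


def parse_remoteness_rates(text: str) -> dict[str, float]:
    """Return {country: rate as a fraction}; lines that do not match are skipped."""
    rates = {}
    for line in text.splitlines():
        match = RATE.match(line.strip())
        if match is None:  # preamble such as "Tableau 1" or the title row
            continue
        value = float(match["value"].replace(",", "."))
        rates[match["country"].strip()] = value / 100
    return rates


if __name__ == "__main__":
    sample = "Tableau 1\nBarème Taux d’Éloignement;\nAfrique du sud;10 %\nAllemagne;0 %\nIndonésie;17 %\n"
    print(parse_remoteness_rates(sample))
```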
data/DeplacementsEtVoyages.docx ADDED
Binary file (709 kB).

data/DeplacementsEtVoyagesRev.docx ADDED
Binary file (723 kB).

data/ForfaitsRemboursements.csv ADDED
@@ -0,0 +1,31 @@
+Destination;;Hébergement;;Repas
+France;;IDF 125€ / Province 100€;;27 €
+Allemagne;;150 € ;;35 €
+Arabie Saoudite;;200 € ;;40 €
+Autriche;;110 € ;;40 €
+Belgique;;150 € ;;35 €
+Canada;;150 € ;;30 €
+Chine;;113 € ;;37 €
+Egypte;;150 € ;;25 €
+Emirats Arabes Unis;;160 € ;;46 €
+Espagne;;130 € ;;30 €
+Etats-Unis;;140 € ;;47 €
+Grèce;;140 € ;;25 €
+Inde;;160 € ;;47 €
+Irlande;;180 € ;;30 €
+Italie;;120 € ;;37 €
+Japon;;150 € ;;25 €
+Maroc;;110 € ;;25 €
+Mexique;;130 € ;;27 €
+Norvège;;160 € ;;40 €
+Pays-Bas;;150 € ;;32 €
+Pologne;;110 € ;;23 €
+Portugal;;108 € ;;25 €
+Qatar;;210 € ;;35 €
+Royaume-Uni;;130 € ;;28 €
+Russie;;180 € ;;50 €
+Singapour;;170 € ;;42 €
+Suède;;90 € ;;30 €
+Suisse;;192 € ;;35 €
+Taiwan;;123 € ;;37 €
+Turquie;;150 € ;;28 €

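`ForfaitsRemboursements.csv` carries the same amounts as `AccomodationAndMealsForfaits_fr.csv`, but in a different layout:
- the doubled `;;` separators leave empty columns, likely from merged cells in the source spreadsheet;
- amounts carry a euro sign;
- the France row holds two accommodation amounts, `IDF` (Île-de-France) and `Province`.

Below is a minimal sketch that drops the empty fields and extracts every euro amount per field, assuming each data row has exactly three non-empty fields. The function names are hypothetical.

```python
import re

# A number followed by a euro sign, e.g. "150 €" or "125€" (comma decimals tolerated).
AMOUNT = re.compile(r"(\d+(?:[.,]\d+)?)\s*€")


def amounts(field: str) -> list[float]:
    """Extract every euro amount from a field such as 'IDF 125€ / Province 100€'."""
    return [float(a.replace(",", ".")) for a in AMOUNT.findall(field)]


def parse_row(line: str) -> tuple[str, list[float], list[float]]:
    """Split one data row into (destination, accommodation amounts, meal amounts).

    The doubled ';;' separators leave empty fields, which are dropped first.
    """
    fields = [f.strip() for f in line.split(";") if f.strip()]
    if len(fields) != 3:
        raise ValueError(f"expected 3 non-empty fields, got {fields!r}")
    destination, lodging, meals = fields
    return destination, amounts(lodging), amounts(meals)


if __name__ == "__main__":
    for line in ("France;;IDF 125€ / Province 100€;;27 €", "Allemagne;;150 € ;;35 €"):
        print(parse_row(line))
```

The header row parses to empty amount lists, so a caller would skip the first line.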
data/NonPrisEnCharge.csv ADDED
@@ -0,0 +1,14 @@
+Non pris en charge via Note de Frais
+"Matériel informatique : téléphone, chargeur, tablette, adaptateur prise, etc."
+"Outillage : balai, tournevis, disque de disqueuse, etc."
+"Mobilier/Aménagement de bureau : plantes, dalles, poufs, etc."
+"Fournitures de bureau : café, piles, etc."
+Conférence/Cotisation
+Séminaire/Réunion Team Building
+Doublon de clés
+Equipement de Protection Individuelle (EPI)
+Achat de bagagerie : neuf/perdu/endommagé
+Lavage et recharge Carte de Lavage : tous types de véhicule
+Consommation alcoolisée
+"Collation : confiserie, gâteau, boisson, etc."
+"Prestation de loisirs : Spa, piscine, massage, remontée mécanique, escape game, etc."

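`NonPrisEnCharge.csv` lists expenses that are not reimbursed through an expense report. Its rows that contain commas are wrapped in double quotes, so a standard CSV reader with the default comma delimiter returns every line as a single field. Below is a small sketch using the `csv` module, tested on lines copied from the file. `read_single_column` is a hypothetical name.

```python
import csv
import io


def read_single_column(text: str) -> tuple[str, list[str]]:
    """Return (header, items) for a one-column CSV whose items may contain commas."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]  # skip blank lines
    for row in rows:
        if len(row) != 1:
            raise ValueError(f"expected one column, got {len(row)}: {row!r}")
    header, *items = (row[0] for row in rows)
    return header, items


if __name__ == "__main__":
    sample = (
        "Non pris en charge via Note de Frais\n"
        '"Outillage : balai, tournevis, disque de disqueuse, etc."\n'
        "Consommation alcoolisée\n"
    )
    print(read_single_column(sample))
```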
data/business_trips_content_en.docx ADDED
Binary file (42.7 kB).

data/business_trips_content_fr.docx ADDED
Binary file (70 kB).

data/business_trips_content_until_3_en.docx ADDED
Binary file (42.7 kB).

data/business_trips_content_until_3_enfr.docx ADDED
Binary file (65.7 kB).

data/business_trips_content_until_3_fr.docx ADDED
Binary file (70.1 kB).

data/business_trips_content_until_9_en.docx ADDED
Binary file (48.9 kB).

data/business_trips_plan_en.docx ADDED
Binary file (50.5 kB).

data/business_trips_plan_until_3_en.docx ADDED
Binary file (36.1 kB).

data/business_trips_plan_until_3_fr.docx ADDED
Binary file (35.6 kB).

data/business_trips_plan_until_9_en.docx ADDED
Binary file (37.4 kB).

data/transports.docx ADDED
Binary file (41.2 kB).

data/transports_content_en.docx ADDED
Binary file (40.4 kB).

data/transports_content_fr.docx ADDED
Binary file (41.2 kB).

data/transports_plan.docx ADDED
Binary file (35.5 kB).

data/transports_plan_en.docx ADDED
Binary file (35.5 kB).

data/transports_plan_short_en.docx ADDED
Binary file (35.1 kB).

data/transports_plan_short_fr.docx ADDED
Binary file (35 kB).

data/~$ansports.docx ADDED
Binary file (162 Bytes).

data/~$ansports_contenu.txt ADDED
Binary file (162 Bytes).

data/~$placementsEtVoyages.docx ADDED
Binary file (162 Bytes).

data/~$siness_trip_plan_until_3_fr.docx ADDED
Binary file (162 Bytes).

data/~$siness_trips_content_until_3_fr.docx ADDED
Binary file (162 Bytes).

data/~$siness_trips_content_until_9_en.docx ADDED
Binary file (162 Bytes).

data/~$siness_trips_plan_until_9_en.docx ADDED
Binary file (162 Bytes).

src/__pycache__/control.cpython-310.pyc ADDED
Binary file (1.94 kB).

src/app.py ADDED
@@ -0,0 +1,91 @@
+import gradio as gr
+
+
+import src.control as ctrl
+
+
+"""
+==================================
+A. Component part
+==================================
+"""
+
+with gr.Blocks() as hrqa:
+
+    with gr.Row():
+
+        with gr.Column():
+            pass
+
+        with gr.Column(scale=10):
+            """
+            1. input docs components
+            """
+
+            gr.Markdown("# Questions sur le vivre ensemble en entreprise")
+
+            input_text_comp = gr.Textbox(
+                label="",
+                lines=1,
+                max_lines=3,
+                interactive=True,
+                placeholder="Posez votre question ici",
+            )
+            input_example_comp = gr.Radio(
+                label="Examples de questions",
+                choices=["Remboursement de frais de voiture", "Recommandations de transport"],
+            )
+            output_text_comp = gr.Textbox(
+                label="La réponse automatique",
+                lines=2,
+                max_lines=10,
+                interactive=False,
+                visible=False,
+            )
+            sources_comp = gr.CheckboxGroup(
+                label="Documents sources",
+                visible=False,
+                interactive=False,
+            )
+
+        with gr.Column():
+            pass
+
+
+    def input_text_fn1():
+        update_ = {
+            output_text_comp: gr.update(visible=True),
+        }
+        return update_
+
+    def input_text_fn2(input_text_):
+        answer, sources = ctrl.get_response(query=input_text_)
+        source_labels = [s['distance']+' '+s['paragraph']+' '+s['title']+' from '+s['doc'] for s in sources]
+        update_ = {
+            output_text_comp: gr.update(value=answer),
+            sources_comp: gr.update(visible=True, choices=source_labels, value=source_labels)
+        }
+        return update_
+
+
+    def input_example_fn(input_example_):
+        examples = {
+            "Remboursement de frais de voiture": "Comment sont remboursés mes frais kilométriques sur mes trajets "
+                                                 "professionnels?",
+            "Recommandations de transport": "Quelles sont les recommandations de l'entreprise? Vaut-il mieux voyager en "
+                                            "train ou en avion?"
+        }
+        update_ = {
+            input_text_comp: gr.update(value=examples[input_example_]),
+            output_text_comp: gr.update(visible=True),
+        }
+        return update_
+
+    input_text_comp\
+        .submit(input_text_fn1, inputs=[], outputs=[output_text_comp])\
+        .then(input_text_fn2, inputs=[input_text_comp], outputs=[output_text_comp, sources_comp])
+    input_example_comp\
+        .change(input_example_fn, inputs=[input_example_comp], outputs=[input_text_comp, output_text_comp])\
+        .then(input_text_fn2, inputs=[input_text_comp], outputs=[output_text_comp, sources_comp])
+
+hrqa.queue().launch()

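`input_text_fn2` builds each source label with `+`, which only works if `distance`, `paragraph`, `title` and `doc` are all strings. `control.py` compares a separate `distance_f` key numerically, which suggests `distance` is a pre-formatted string. However, `src/tools/retriever.py` is not part of this view, so the dictionary shape is an assumption. Below is a minimal sketch of a label helper using an f-string, which also accepts a float `distance`. `format_source_label` is a hypothetical name.

```python
from typing import Any


def format_source_label(source: dict[str, Any]) -> str:
    """Build the checkbox label shown under 'Documents sources'.

    Assumes the keys used in app.py; f-string formatting converts non-string
    values (e.g. a float distance) instead of raising TypeError like '+' would.
    """
    return f"{source['distance']} {source['paragraph']} {source['title']} from {source['doc']}"


if __name__ == "__main__":
    hit = {"distance": 0.42, "paragraph": "2.1", "title": "Transport", "doc": "plan.docx"}  # invented values
    print(format_source_label(hit))
```

Inside `input_text_fn2`, the list comprehension would then read `[format_source_label(s) for s in sources]`.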
src/app2.py ADDED
@@ -0,0 +1,16 @@
+from langchain.agents import create_csv_agent
+from langchain.agents import create_pandas_dataframe_agent
+import src.tools.llm as llm
+
+import pandas as pd
+
+path = '../data/AccomodationAndMealsForfaits_en.csv'
+#path = '../data/test_utf32.csv'
+df = pd.read_csv(path, encoding='utf32', sep=";")
+agent = create_pandas_dataframe_agent(llm.OpenAI(temperature=0), df, verbose=True)
+refund = agent.run("Quel est le remboursement pour un repas en Turkiye?")
+print(refund)
+
+
+
+pass

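`app2.py` reads the English allowance table as UTF-32 with `;` as separator. It then asks a pandas agent a question ("what is the meal refund in Turkiye?") that a table lookup can partly answer without an LLM. The English CSV appears here only as a binary file, so the sketch assumes it shares the three-column layout of the French version (`Destination;Hebergement;Repas`) and reuses French rows as sample data. It uses the standard `csv` module instead of pandas; `meal_allowance` is a hypothetical name.

```python
import csv
import io
from typing import Optional


def meal_allowance(raw: bytes, destination: str, encoding: str = "utf-32") -> Optional[int]:
    """Return the meal ('Repas') value for a destination, or None if it is not listed.

    Assumes a header row 'Destination;Hebergement;Repas', as in the French file.
    """
    text = raw.decode(encoding)  # the "utf-32" codec detects and strips a byte-order mark
    wanted = destination.strip().casefold()
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        if row["Destination"].strip().casefold() == wanted:
            return int(row["Repas"])
    return None


if __name__ == "__main__":
    # Sample rows copied from AccomodationAndMealsForfaits_fr.csv, encoded as UTF-32.
    sample = "Destination;Hebergement;Repas\nTurquie;150;28\nSuède;90;30\n"
    raw = sample.encode("utf-32")
    print(meal_allowance(raw, "Turquie"))
    print(meal_allowance(raw, "Turkiye"))  # exact matching does not know this spelling
```

The second call shows the limit of a literal lookup. A spelling such as `Turkiye` for `Turquie` is not resolved, which is one thing the LLM agent in `app2.py` can handle.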
src/control.py ADDED
@@ -0,0 +1,49 @@
+import chromadb
+
+import src.tools.retriever as rtrvr
+import src.tools.llm as llm
+from src.domain.doc import Doc
+
+chroma_client = chromadb.Client()
+
+plan_language = 'en'
+content_language = 'en'
+path_plan = '../data/business_trips_plan_until_9_en.docx'
+path_content = '../data/business_trips_content_until_9_en.docx'
+collection_name = "until_9"
+
+doc_plan = Doc(path_plan)
+doc_content = Doc(path_content)
+collection_ = rtrvr.init_collections(chroma_client, doc_plan, doc_content, collection_name)
+
+
+def get_response(query):
+    if plan_language == 'en':
+        query = llm.translate(query)
+    sources = rtrvr.similarity_search(collection=collection_, query=query)
+    sources = select_best_sources(sources)
+    sources_contents = [s['content'] for s in sources]
+    context = '\n'.join(sources_contents)
+    answer = llm.generate_paragraph(query=query, context=context, language=content_language)
+    if content_language == 'en':
+        answer = llm.translate(text=answer, language='fr')
+    return answer.lstrip(), sources
+
+
+def select_best_sources(sources: [], delta_1_2=0.1, delta_1_n=0.25, absolute=1.1) -> []:
+    best_sources = []
+    for idx, s in enumerate(sources):
+        if idx == 0 \
+                or (s['distance_f'] - sources[idx - 1]['distance_f'] < delta_1_2
+                    and s['distance_f'] - sources[0]['distance_f'] < delta_1_n) \
+                or s['distance_f'] < absolute:
+            best_sources.append(s)
+    return best_sources
+
+
+q1 = "Comment sont remboursés mes frais kilométriques sur mes déplacements avec mon véhicule personnel?"
+q2 = "Quels sont les moyens de transport recommandés par la société?"
+q3 = "est-ce que mes billets de cinéma peuvent être remboursés?"
+
+a2 = get_response(q3)
+print(a2)

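`select_best_sources` filters the retrieved sources as follows:
- it always keeps the closest hit;
- it keeps a later hit if it is within `delta_1_2` of the previous hit *and* within `delta_1_n` of the first;
- it keeps any hit whose distance is below `absolute`, whatever the gaps.

The sketch below restates the same rule as a standalone function with type hints so it can be run on invented distances. Like the committed code, it assumes `sources` is sorted by increasing `distance_f`, smaller meaning closer.

```python
from typing import Any


def select_best_sources(sources: list[dict[str, Any]], delta_1_2: float = 0.1,
                        delta_1_n: float = 0.25, absolute: float = 1.1) -> list[dict[str, Any]]:
    """Same selection rule as control.py, written with named intermediate conditions."""
    best_sources = []
    for idx, s in enumerate(sources):
        close_to_previous = idx > 0 and s["distance_f"] - sources[idx - 1]["distance_f"] < delta_1_2
        close_to_first = s["distance_f"] - sources[0]["distance_f"] < delta_1_n
        if idx == 0 or (close_to_previous and close_to_first) or s["distance_f"] < absolute:
            best_sources.append(s)
    return best_sources


if __name__ == "__main__":
    hits = [{"distance_f": d} for d in (1.20, 1.28, 1.36, 1.50)]  # invented distances
    print([h["distance_f"] for h in select_best_sources(hits)])
```

With these values, 1.50 is dropped because it is 0.14 away from its predecessor. In a list such as 1.20, 1.29, 1.38, 1.47, the last hit is dropped for the other reason: it is 0.27 away from the first.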
src/control2.py ADDED
@@ -0,0 +1,36 @@
+from langchain.agents import AgentType, initialize_agent
+from langchain.tools import BaseTool, StructuredTool, Tool, tool
+
+
+from src.control import *
+
+
+@tool
+def similarity_search(query: str) -> str:
+    """
+    useful for when you look for relevant content about business trip policy : transport, accomodation, etc.
+    """
+    query = llm.translate(query)
+    sources = rtrvr.similarity_search(collection=collection_, query=query)
+    sources = select_best_sources(sources)
+    sources_contents = [s['content'] for s in sources]
+    context = '\n'.join(sources_contents)
+    return context
+
+
+@tool
+def generate_answer(query_and_context: str) -> str:
+    """
+    useful for when you have a query and the relevant content to generate an answer
+    """
+    answer = llm.generate_paragraph2(query_and_context=query_and_context, language='en')
+    answer = llm.translate(text=answer, language='fr')
+    return answer.lstrip()
+
+
+tools = [similarity_search, generate_answer]
+
+agent = initialize_agent(tools, llm.openai_llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
+q1 = "Comment sont remboursés mes frais kilométriques sur mes déplacements avec mon véhicule personnel?"
+q2 = "Quels sont les moyens de transport recommandés par la société?"
+ans = agent.run(q2)

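Both `control2.py` (via `from src.control import *`) and `app.py` import `src.control`. That module ends with the module-level call `a2 = get_response(q3)`, so every import runs that demo query, including its calls into `src.tools.llm`, before the importing script continues. A common remedy is to move demo code under an `if __name__ == "__main__":` guard.

The sketch below shows the effect by executing the same stand-in source twice: once the way an import would (`__name__` is the module name) and once the way `python control.py` would (`__name__` is `"__main__"`). The stubbed `get_response` and the helper names are hypothetical and perform no retrieval or LLM call.

```python
import types

# Stand-in for control.py: the demo call sits behind a __main__ guard.
MODULE_SOURCE = '''
calls = []

def get_response(query):
    calls.append(query)  # stub: the real function retrieves sources and calls the LLM
    return query

if __name__ == "__main__":
    get_response("est-ce que mes billets de cinéma peuvent être remboursés?")
'''


def load_as_import(name: str, source: str) -> types.ModuleType:
    """Execute source as an import would: __name__ is the module name, not '__main__'."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


def load_as_script(source: str) -> dict:
    """Execute source as `python control.py` would: __name__ is '__main__'."""
    namespace = {"__name__": "__main__"}
    exec(source, namespace)
    return namespace


if __name__ == "__main__":
    print("calls on import:", load_as_import("control", MODULE_SOURCE).calls)
    print("calls as script:", load_as_script(MODULE_SOURCE)["calls"])
```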
src/domain/__pycache__/container.cpython-310.pyc ADDED
Binary file (3.88 kB).

src/domain/__pycache__/doc.cpython-310.pyc ADDED
Binary file (2.61 kB).

src/domain/__pycache__/paragraph.cpython-310.pyc ADDED
Binary file (998 Bytes).

src/domain/__pycache__/style.cpython-310.pyc ADDED
Binary file (1.57 kB).

src/domain/container.py ADDED
@@ -0,0 +1,136 @@
+from src.domain.paragraph import Paragraph
+
+INFINITE = 10000
+
+
+class Container:
+
+    def __init__(self, paragraphs: [Paragraph], title: Paragraph = None, level: int = 0, rank: int = 0, father=None,
+                 id_=0):
+        self.level = level
+        self.title = title
+        self.paragraphs = []
+        self.children = []
+        self.rank = rank
+        self.father = father  # if not father, then the container is at the top of the hierarchy
+        self.id_ = int(str(1) + str(father.id_) + str(id_))
+        if paragraphs:
+            self.paragraphs, self.children = self.create_children(paragraphs, level, rank + 1)
+
+    @property
+    def text(self):
+        text = ""
+        if self.title:
+            text = "Titre " + str(self.level) + " : " + self.title.text + '\n'
+        for p in self.paragraphs:
+            text += p.text + '\n'
+        for child in self.children:
+            text += child.text
+        return text
+
+    @property
+    def text_chunks(self, chunk=500):
+        text_chunks = []
+        text_chunk = ""
+        for p in self.paragraphs:
+            if chunk < len(text_chunk) + len(p.text):
+                text_chunks.append(text_chunk)
+                text_chunk = ""
+            else:
+                text_chunk += " " + p.text
+        if text_chunk and not text_chunk.isspace():
+            text_chunks.append(text_chunk)
+        for child in self.children:
+            text_chunks += child.text_chunks
+        return text_chunks
+
+    @property
+    def blocks(self):
+        block = {'content': "", 'rank': self.rank, 'level': self.level, 'title': ''}
+        if self.title:
+            block['title'] = self.title.text
+        for p in self.paragraphs:
+            block['content'] += p.text + '. '
+        blocks = [block]
+        for child in self.children:
+            blocks += child.blocks
+        return blocks
+
+    @property
+    def table_of_contents(self):
+        toc = []
+        if self.title:
+            toc += [{str(self.level): self.title.text}]
+        if self.children:
+            for child in self.children:
+                toc += child.table_of_contents
+        return toc
+
+    def move(self, position: int, new_father=None):
+        current_father = self.father  # should be added in the domain
+        current_father.children.remove(self)
+
+        self.rank = new_father.rank + 1 if new_father else 0
+        self.father = new_father
+        if position < len(new_father.children):
+            new_father.children.insert(position, self)
+        else:
+            new_father.children.append(self)
+
+    def create_children(self, paragraphs, level, rank) -> ([], []):
+        """
+        creates children containers or directly attached content
+        and returns the list of containers and contents of level+1
+        :return:
+        [Content or Container]
+        """
+        attached_paragraphs = []
+        container_paragraphs = []
+        container_title = None
+        children = []
+        in_children = False
+        level = INFINITE
+        child_id = 0
+
+        while paragraphs:
+            p = paragraphs.pop(0)
+            if not in_children and not p.is_structure:
+                attached_paragraphs.append(p)
+            else:
+                in_children = True
+                if p.is_structure and p.level <= level:  # if p is higher or equal in hierarchy
+                    if container_paragraphs or container_title:
+                        children.append(Container(container_paragraphs, container_title, level, rank, self, child_id))
+                        child_id += 1
+                    container_paragraphs = []
+                    container_title = p
+                    level = p.level
+
+                else:  # p is strictly lower in hierarchy
+                    container_paragraphs.append(p)
+
+        if container_paragraphs or container_title:
+            children.append(Container(container_paragraphs, container_title, level, rank, self, child_id))
+            child_id += 1
+
+        return attached_paragraphs, children
+
+    @property
+    def structure(self):
+
+        self_structure = {str(self.id_): {
+            'index': str(self.id_),
+            'canMove': True,
+            'isFolder': True,
+            'children': [p.id_ for p in self.paragraphs] + [child.id_ for child in self.children],
+            'canRename': True,
+            'data': {},
+            'level': self.level,
+            'rank': self.rank,
+            'title': self.title.text if self.title else 'root'
+        }}
+        paragraphs_structure = [p.structure for p in self.paragraphs]
+        structure = [self_structure] + paragraphs_structure
+        for child in self.children:
+            structure += child.structure
+        return structure

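`Container.text_chunks` has three issues:
- when adding a paragraph would exceed `chunk` characters, the current chunk is flushed and reset to an empty string, but the paragraph that triggered the flush is never appended, so its text is missing from every chunk;
- an empty chunk is emitted whenever that reset is followed by a paragraph longer than `chunk` on its own;
- because the method is declared with `@property`, callers can never pass a different `chunk`.

Below is a minimal standalone sketch of greedy chunking that keeps every paragraph. It assumes paragraphs are plain strings, and `chunk_paragraphs` is a hypothetical name. An over-long paragraph is kept whole rather than split.

```python
def chunk_paragraphs(paragraphs: list[str], chunk: int = 500) -> list[str]:
    """Greedily pack paragraphs into chunks of at most `chunk` characters.

    A paragraph longer than `chunk` becomes a chunk on its own (it is not split).
    """
    chunks: list[str] = []
    current = ""
    for text in paragraphs:
        candidate = f"{current} {text}" if current else text
        if current and len(candidate) > chunk:
            chunks.append(current)  # flush, then start the next chunk with this paragraph
            current = text
        else:
            current = candidate
    if current and not current.isspace():
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(chunk_paragraphs(["aaaa", "bbbb", "cc"], chunk=9))
```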
src/domain/doc.py ADDED
@@ -0,0 +1,71 @@
+import docx
+
+from src.domain.container import Container
+from src.domain.paragraph import Paragraph
+from src.domain.style import Style
+
+
+class Doc:
+
+    def __init__(self, path='', id_=None):
+
+        self.xdoc = docx.Document(path)
+        self.title = path.split('/')[-1]
+        self.id_ = id(self)
+        self.path = path
+        paragraphs = [Paragraph(xp, self.id_, i) for (i, xp) in enumerate(self.xdoc.paragraphs)]
+        self.container = Container(paragraphs, father=self)
+        self.styles = [Style(xs, self.id_, i) for (i, xs) in enumerate(self.xdoc.styles)]
+
+    def save_as_docx(self, path):
+        self.xdoc.save(path)
+
+    def apply_styles_from(self, ref_doc):
+
+        ref_doc_styles_names = [s.xstyle.name for s in ref_doc.styles]
+        common_styles = [s for s in self.styles if s.xstyle.name in ref_doc_styles_names]
+
+        for s in common_styles:
+            s.copy_from(ref_doc.xdoc.styles[s.xstyle.name])
+
+    @property
+    def structure(self):
+
+        return self.container.structure
+
+    @property
+    def blocks(self):
+
+        def from_list_to_str(index_list):
+            index_str = str(index_list[0])
+            for el in index_list[1:]:
+                index_str += '.' + str(el)
+            return index_str
+
+        current_index = []
+        blocks = []
+        for block in self.container.blocks:
+            block['doc'] = self.title
+            current_level = len(current_index)
+            if 0 < block['level']:
+                if block['level'] == current_level:
+                    current_index[-1] += 1
+                elif current_level < block['level']:
+                    current_index.append(1)
+                elif block['level'] < current_level:
+                    current_index = current_index[:block['level']]
+                    current_index[-1] += 1
+                block['paragraph'] = from_list_to_str(current_index)
+            else:
+                block['paragraph'] = "0"
+            blocks.append(block)
+        return blocks
+
+
+
+
+
+
+
+
+

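`Doc.blocks` numbers each block from its heading level (`1`, `1.1`, `1.2`, `2`, and so on), with `0` for the root block. When a heading skips a level, for example a Heading 3 directly under a Heading 1, the committed code appends a single `1`. The index then has fewer components than the heading level, so the next Heading 3 is numbered `1.1.1` instead of a sibling `1.1.2`.

Below is a standalone sketch of the same numbering over a plain list of levels. It pads skipped levels with `1`, which is a hypothetical choice. For headings that never skip a level, it yields the same labels as the committed loop. `number_blocks` is a hypothetical name.

```python
def number_blocks(levels: list[int]) -> list[str]:
    """Return a dotted section number for each block level; level 0 (root) gets '0'."""
    labels: list[str] = []
    index: list[int] = []
    for level in levels:
        if level <= 0:
            labels.append("0")
            continue
        if len(index) >= level:
            index = index[:level]  # same or shallower level: drop deeper counters
            index[-1] += 1
        else:
            index += [1] * (level - len(index))  # deeper: open one counter per new level
        labels.append(".".join(str(n) for n in index))
    return labels


if __name__ == "__main__":
    print(number_blocks([0, 1, 2, 2, 1, 3, 3]))
```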
src/domain/paragraph.py ADDED
@@ -0,0 +1,27 @@
+INFINITE = 10000
+
+
+class Paragraph:
+
+    def __init__(self, xparagraph, doc_id: int, id_: int):
+
+        self.xparagraph = xparagraph
+        self.id_ = int(str(2)+str(doc_id)+str(id_))
+        name = self.xparagraph.style.name
+        self.level = int(name.split(' ')[-1]) if 'Heading' in name else INFINITE
+        self.is_structure = self.level < INFINITE
+        self.text = self.xparagraph.text
+
+    @property
+    def structure(self):
+        structure = {str(self.id_): {
+            'index': str(self.id_),
+            'canMove': True,
+            'isFolder': False,
+            'children': [],
+            'title': self.text,
+            'canRename': True,
+            'data': {},
+            'level': self.level,
+        }}
+        return structure

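`Paragraph.__init__` treats any style whose name contains `Heading` as a heading and parses the last word as an integer. A style name such as `TOC Heading` (Word's style for a table-of-contents title) would therefore raise `ValueError`. Below is a minimal sketch that accepts only names of the exact form `Heading N`. It assumes the English built-in style names that python-docx reports; `heading_level` is a hypothetical helper, and `INFINITE` mirrors the module constant.

```python
import re

INFINITE = 10000  # same sentinel as paragraph.py: "not a heading"
HEADING = re.compile(r"Heading (\d+)")


def heading_level(style_name: str) -> int:
    """Return N for a style named exactly 'Heading N', otherwise INFINITE."""
    match = HEADING.fullmatch(style_name.strip())
    return int(match.group(1)) if match else INFINITE


if __name__ == "__main__":
    for name in ("Heading 1", "TOC Heading", "Normal"):
        print(name, heading_level(name))
```

In `Paragraph.__init__`, the level line would then become `self.level = heading_level(name)`.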
src/domain/project.py ADDED
@@ -0,0 +1,9 @@
+from src.domain.doc import Doc
+
+
+class Project:
+
+    def __init__(self, name: str, docs: [Doc]):
+
+        self.docs = docs
+        self.name = name

src/domain/style.py ADDED
@@ -0,0 +1,121 @@
+from docx.enum.style import WD_STYLE_TYPE
+class Style:
+
+    def __init__(self, xstyle, doc_id, id_):
+
+        self.id_ = int(str(doc_id)+str(id_))
+        self.xstyle = xstyle
+        #self.new_style = self.copy_from
+
+    def copy_from(self, xref): # need to be further developed
+
+        if xref.type == WD_STYLE_TYPE.PARAGRAPH:
+            self.xstyle.font.size = xref.font.size
+            self.xstyle.font.color.rgb = xref.font.color.rgb
+            self.xstyle.font.name = xref.font.name
+            self.xstyle.font.all_caps = xref.font.all_caps
+            # Read/write. Causes text in this font to appear in capital letters.
+            self.xstyle.font.bold = xref.font.bold
+            # Read/write. Causes text in this font to appear in bold.
+            self.xstyle.font.complex_script= xref.font.complex_script
+            # Read/write tri-state value. When True, causes the characters in
+            # the run to be treated as complex script regardless of their Unicode values.
+            # "complex script" refers to text written using a complex writing system such as Arabic, Hebrew, Tamil,
+            # Persian, and others.These scripts require special typesetting and handling because they have different
+            # writing directions, glyph connections, and letter shape variations. Word provides features that support
+            # these complex scripts, allowing users to easily create, edit, and format this type of text.
+            self.xstyle.font.cs_bold = xref.font.cs_bold
+            # Read/write tri-state value. When True, causes the complex script characters
+            # in the run to be displayed in bold typeface.
+            self.xstyle.font.cs_italic = xref.font.cs_italic
+            # Read/write tri-state value. When True, causes the complex script characters
+            # in the run to be displayed in italic typeface
+            self.xstyle.font.double_strike = xref.font.double_strike
+            # Read/write tri-state value. When True, causes the text in the run to appear with double strikethrough.
+            self.xstyle.font.emboss = xref.font.emboss
+            # Read/write tri-state value. When True, causes the text in the run to appear
+            # as if raised off the page in relief.
+            self.xstyle.font.hidden = xref.font.hidden
+            # Read/write tri-state value. When True, causes the text in the run to be hidden from display,
+            # unless applications settings force hidden text to be shown.
+            self.xstyle.font.highlight_color = xref.font.highlight_color
+            # A member of WD_COLOR_INDEX indicating the color of highlighting applied,
+            # or None if no highlighting is applied.
+            self.xstyle.font.imprint = xref.font.imprint
+            # Read/write tri-state value. When True,
+            # causes the text in the run to appear as if pressed into the page.
+            self.xstyle.font.italic = xref.font.italic
+            self.xstyle.font.math = xref.font.math
+            self.xstyle.font.no_proof = xref.font.no_proof
+            # Read/write tri-state value. When True, specifies that the contents of this run
+            # should not report any errors when the document is scanned for spelling and grammar.
+            self.xstyle.font.outline = xref.font.outline
+            # Read/write tri-state value. When True causes the characters in the run to appear as if they
+            # have an outline, by drawing a one pixel wide border around the inside and
+            # outside borders of each character glyph.
+            self.xstyle.font.rtl = xref.font.rtl
+            # Read/write tri-state value. When True causes the text in the
+            # run to have right-to-left characteristics.
+            self.xstyle.font.shadow = xref.font.shadow
+            self.xstyle.font.small_caps = xref.font.small_caps
+            self.xstyle.font.snap_to_grid = xref.font.snap_to_grid
+            # Read/write tri-state value. When True causes the run to use the document grid characters per line
+            # settings defined in the docGrid element when laying out the characters in this run.
+            # Snap to grid" is a layout feature that helps users align text boxes, images, or other objects precisely
+            # to a virtual gridline, ensuring consistent spacing and alignment of objects in a document. It improves the
+            # visual appearance of a document and makes it easier to read and understand. This feature is particularly
+            # useful for creating large documents such as reports, posters, and flyers, making them look more
+            # professional, organized, and readable."""
+            self.xstyle.font.spec_vanish = xref.font.spec_vanish
+            # Read/write tri-state value. When True, specifies that the given run shall always behave as if it is
+            # hidden, even when hidden text is being displayed in the current document. The property has a very narrow,
+            # specialized use related to the table of contents.
+            self.xstyle.font.strike = xref.font.strike
+            # Read/write tri-state value. When True causes the text in the run to appear with a single horizontal line
+            # through the center of the line.
+            self.xstyle.font.subscript = xref.font.subscript
+            # Boolean indicating whether the characters in this Font appear as subscript. None indicates the
+            # subscript/subscript value is inherited from the style hierarchy.
+            self.xstyle.font.superscript = xref.font.superscript
+            self.xstyle.font.underline = xref.font.underline
+            self.xstyle.font.web_hidden = xref.font.web_hidden
+            # Using the "Web hidden" property allows us to create multiple versions of a document where some content
+            # can be hidden, while other content can be displayed publicly. For example, in a resume, you can use the
+            # "Web hidden" property to hide private information such as phone numbers and addresses. This information
+            # will only be displayed when an employer chooses to view it.
+
+            self.xstyle.base_style = xref.base_style
+            # Style object this style inherits from or None if this style is not based on another style.
+            # self.xstyle.builtin = xref.builtin
+            self.xstyle.hidden = xref.hidden
+            # True if display of this style in the style gallery and list of recommended styles is suppressed.
+            # False otherwise. In order to be shown in the style gallery, this value must be False and quick_style
+            # must be True.
+            self.xstyle.locked = xref.locked
+            # True if this style is locked. not appear in the styles panel or the style gallery and cannot be applied
+            # to document content
+            self.xstyle.name = xref.name
+            self.xstyle.priority = xref.priority
+            # The integer sort key governing display sequence of this style in the Word UI. None indicates no setting
+            # is defined, causing Word to use the default value of 0. Style name is used as a secondary sort key to
+            # resolve ordering of styles having the same priority value.
+            # In Microsoft Word, "priority" is typically used to describe the importance of markers and comments to
+            # help authors and editors determine the urgency and priority of the feedback and changes being provided.
+            # For example, a document may use priority markers such as "high," "medium," "low," etc.
+            # to indicate issues that need to be addressed with a higher priority.
+
+            self.xstyle.quick_style = xref.quick_style
+            # True if this style should be displayed in the style gallery when hidden is False. Read/write Boolean.
+            # for example, Quick Styles can be found in the "Styles" group on the "Home" tab.
+            # self.xstyle.type = xref.type
+            self.xstyle.unhide_when_used = xref.unhide_when_used
+            # True if an application should make this style visible the next time it is applied to content.
+            # False otherwise. Note that python-docx does not automatically unhide a style having True for this
+            # attribute when it is applied to content.
+
+            # "unhide_when_used" can refer to a feature in Microsoft Excel. It is a cell format option that allows the
+            # cell to automatically show when it is being used and hide when it is not being used. This is useful when
+            # dealing with complex worksheets as it helps users manage and organize data better. When the user needs to
+            # edit or input data, the cell will automatically show, and once the user has completed the operation, the
+            # cell will automatically hide to better present the data.
+

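`Style.copy_from` copies each font property with its own assignment statement. The same copy can be driven by a tuple of attribute names with `getattr`/`setattr`, which keeps the list of copied properties in one place. Below is a minimal sketch that uses `types.SimpleNamespace` objects as stand-ins for python-docx `Font` objects, so it runs without a document. The attribute tuple is a subset of the properties in the diff, and `copy_attributes` is a hypothetical name. Nested values such as `font.color.rgb` would still need their own line.

```python
from types import SimpleNamespace
from typing import Iterable

# Subset of the Font properties copied in Style.copy_from.
FONT_ATTRIBUTES = ("size", "name", "bold", "italic", "underline", "all_caps", "small_caps", "strike")


def copy_attributes(target: object, source: object, names: Iterable[str]) -> None:
    """Copy each named attribute from source to target, including None (inherit) values."""
    for name in names:
        setattr(target, name, getattr(source, name))


if __name__ == "__main__":
    ref_font = SimpleNamespace(size=12, name="Calibri", bold=True, italic=None, underline=False,
                               all_caps=False, small_caps=False, strike=None)
    font = SimpleNamespace(**{n: None for n in FONT_ATTRIBUTES})
    copy_attributes(font, ref_font, FONT_ATTRIBUTES)
    print(font.name, font.bold)
```

With real styles, the call would look like `copy_attributes(self.xstyle.font, xref.font, FONT_ATTRIBUTES)`, since these `Font` properties are read/write.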
src/domain/user.py ADDED
@@ -0,0 +1,4 @@
+class User:
+
+    def __init__(self, username, ):
+        self.name = username

src/tools/__pycache__/llm.cpython-310.pyc ADDED
Binary file (2.06 kB).