AngryBacteria committed · Commit c76346e · verified · 1 Parent(s): 9757a41

Update README.md

Files changed (1)
  1. README.md +18 -2
README.md CHANGED
@@ -1,10 +1,26 @@
  ---
  title: README
- emoji: 🏢
+ emoji: 🚀
  colorFrom: blue
  colorTo: blue
  sdk: static
  pinned: false
  ---
+ This is the official repository for the bachelor thesis "Fine-tuning of large language models for the analysis of medical texts" at the [Bern University of Applied Sciences](https://www.bfh.ch/de/).

- Edit this `README.md` markdown file to author your organization card.
+ Medical documentation is a fundamental part of modern medicine. However, an estimated 80% of medical data is unstructured, which complicates the analysis and further
+ processing of information. Additionally, time-consuming documentation is one of the main sources of stress for physicians. The use of artificial intelligence,
+ especially Large Language Models (LLMs), offers significant potential for the analysis of medical texts thanks to their advanced language understanding.
+ We therefore investigated what can be achieved with open-source LLMs by fine-tuning them on task-specific data. In contrast to well-known proprietary models such as
+ GPT-4, locally deployable models keep all data inside your institution at all times. The developed models focus on unstructured German text, from which they
+ should extract relevant data. Additionally, the extracted entities should be normalized, and relevant relations and attributes should be identified. The
+ models should also be able to summarize clinical texts. The two main participants of the thesis are:
+ - Nicolas Gujer ([[email protected]](mailto:[email protected]))
+ - Jorma Steiner ([[email protected]](mailto:[email protected]))
+
+ The code for acquiring the data needed to fine-tune the models can be found in our [GitHub repository](https://github.com/AngryBacteria/ba-gujen1-steij14). Some of the datasets we used are not publicly available, and
+ you have to formally request access from the respective institutions. You can find more information on the individual models, such as their performance and how to use them, in the respective
+ model repositories. We used a mix of different kinds of data to fine-tune the models:
+ - Two annotated German medical datasets: [BRONCO150](https://www2.informatik.hu-berlin.de/~leser/bronco/index.html) and [Cardio:DE](https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/AFYQDY).
+ - 220 synthetic summaries
+ - Data from the coding systems ICD-10-GM, ATC and OPS