Spaces: BachelorThesis/README

AngryBacteria committed on May 25, 2024

Commit c76346e (verified) · 1 parent: 9757a41

Update README.md

Files changed (1): README.md +18 -2

@@ -1,10 +1,26 @@
 ---
 title: README
-emoji: 🏢
+emoji: 🚀
 colorFrom: blue
 colorTo: blue
 sdk: static
 pinned: false
 ---
-Edit this `README.md` markdown file to author your organization card.
+This is the official repository for the bachelor's thesis "Fine-tuning of large language models for the analysis of medical texts" at the [Bern University of Applied Sciences](https://www.bfh.ch/de/).
+Medical documentation is a fundamental part of modern medicine. However, an estimated 80% of medical data is unstructured, which complicates the analysis and further
+processing of information. Additionally, time-consuming documentation is one of the main sources of stress for physicians. The use of artificial intelligence,
+especially Large Language Models (LLMs), offers significant potential for the analysis of medical texts thanks to their advanced language understanding.
+That is why we investigated what can be achieved by fine-tuning open-source LLMs on task-specific data. Unlike proprietary models such as
+GPT-4, locally deployable models keep all data within your institution at all times. The developed models focus on unstructured German text, from which they
+should be able to extract relevant information. The extracted entities should also be normalized, and the relevant relations and attributes identified. Finally, the
+models should be able to summarize clinical texts. The two main participants of the thesis are:
+- Nicolas Gujer ([[email protected]](mailto:[email protected]))
+- Jorma Steiner ([[email protected]](mailto:[email protected]))
+
+The code for acquiring the data needed to fine-tune the models is available in our [GitHub repository](https://github.com/AngryBacteria/ba-gujen1-steij14). Some of the datasets we used are not publicly available;
+access must be formally requested from the respective institutions. More information on the individual models, such as their performance and usage, can be found in their respective
+model repositories. We fine-tuned the models on a mix of different kinds of data:
+- Two annotated German medical datasets: [BRONCO150](https://www2.informatik.hu-berlin.de/~leser/bronco/index.html) and [Cardio:DE](https://heidata.uni-heidelberg.de/dataset.xhtml?persistentId=doi:10.11588/data/AFYQDY)
+- 220 synthetic summaries
+- Data from the coding systems ICD-10-GM, ATC and OPS