import os

import streamlit as st
from openai import OpenAI

# The API key is read from the "apikey" environment variable.
client = OpenAI(api_key=os.environ.get("apikey"))

# System prompt: role description, the eighteen curation questions (used as
# JSON keys), and an example of the free-text "totalsubmission" summary.
# The example submission text is kept verbatim.
SYSTEM_PROMPT = (
    "You are a helpful Genomics paper assistant designed to output JSON for "
    "genetics experts at Labcorp. Your job is to take a research paper as "
    "input and answer the following questions in JSON, using the questions "
    "as keys. You will be given a specific variant to look for in the paper "
    "and the paper itself. If you know the different nomenclature for the "
    "same variant then assume we are talking about the same one. \n\n"
    "(i) Is the variant in the study? – genomic coordinates, HGVS naming "
    "conventions, legacy naming conventions and QA to ensure high "
    "specificity (True Positive rate). "
    "(ii) Where in the article is it located? – abstract/results/tables/"
    "multiple tables/tables and text etc./Figure(s) "
    "(iii) Is it located in multiple sections (Tables/supplementary/Figures) "
    "(iv) Is the study a case report study or a cohort study or a case "
    "control study? "
    "(v) Does the study describe an individual occurrence or as part of a "
    "family or several families with the variant? "
    "(vi) What was the extent of genotyping? WES/small panel/few select "
    "mutations/extensively genotyped to include CNVs – sub-criteria to scan "
    "each of these would be needed, "
    "(vii) Is the variant reported in affected individuals or unaffected "
    "controls? "
    "(viii) If seen in affected, is there a patient specific ID or case "
    "number or pedigree? "
    "(ix) Does the patient have a reported phenotype? (yes/no) – the reader "
    "can review the phenotype to determine correlation and/or applicability. "
    "(x) Are there any affected individuals without the variant? "
    "(xi) Are there any unaffected individuals with the variant? "
    "(xii) How many affected individuals are there? "
    "(xiii) Are all affected individuals from a single family or multiple "
    "families? (see (v) above) "
    "(xiv) Is the variant a de-novo event? "
    "(xv) Does it locate in a critically important domain? "
    "(xvi) Is the variant located in a mutational hotspot (yes/no) – the "
    "reader can review the study to determine the relevance. "
    "(xvii) Are there any functional studies reported? (yes/no) - the reader "
    "can review the study to determine correlation and/or applicability. "
    "(xviii) Any conclusion(s) drawn at the variant specific level? – LLM "
    "could parse sentences here. "
    "At the end, also include a key called totalsubmission whose value is a "
    "write-up like this: "
    "Quest Diagnostics Nichols Institute San Juan Capistrano 2023-02-22 "
    "Uncertain significance The frequency of this variant in the general "
    "population, 0.003 (105/35414 chromosomes, "
    "http://gnomad.broadinstitute.org), is uninformative in assessment of "
    "its pathogenicity. In the published literature, the variant has been "
    "reported in individuals with cystic fibrosis (CF) (PMID: 12752573 "
    "(2003), 16784904 (2007), 17272608 (2007)). It was also reported in an "
    "individual with congenital bilateral absence of the vas deferens "
    "(CVABD) (PMID: 19897426 (2010)), an individual atypical CF (PMID: "
    "16189704 (2005)), and multiple individuals with borderline/elevated "
    "sweat chloride results (PMID: 19014821 (2008), 22043142 (2010)). It was "
    "also identified in healthy, unaffected individuals (PMID: 16126774 "
    "(2005), 26755536 (2019)). An in vitro study found that this variant in "
    "combination with the CFTR p.Phe508del variant resulted in 12.1% of "
    "normal CFTR activity, however, the effect of this variant alone on CFTR "
    "protein function/activity was not established (PMID: 30888834 (2019)). "
    "Analysis of this variant using bioinformatics tools for the prediction "
    "of the effect of amino acid changes on protein structure and function "
    "yielded predictions that this variant is damaging. Based on the "
    "available information, we are unable to determine the clinical "
    "significance of this variant."
)


def generate_response(variant, research_paper):
    """Ask the model the curation questions about `variant` in `research_paper`.

    Returns the model's reply as a JSON-formatted string.
    """
    response = client.chat.completions.create(
        model="gpt-4o",
        # JSON mode: the reply is constrained to a single JSON object.
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": [{"type": "text", "text": SYSTEM_PROMPT}],
            },
            {
                "role": "user",
                "content": f" variant: {variant} \n research_paper: {research_paper}",
            },
        ],
        temperature=1,
        max_tokens=2500,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
    )
    return response.choices[0].message.content


# Streamlit UI: collect the variant and the paper text, then show the answers.
st.title("GPT4o Genomics Paper Assistant")
variant = st.text_input("Enter the variant you are looking for")
research_paper = st.text_area("Enter your research paper here", height=200)

if st.button("Generate Questions"):
    with st.spinner("Generating JSON Structure..."):
        response = generate_response(variant, research_paper)
    # st.json accepts either a JSON string or a serializable object.
    st.json(response)