fiboa-demo / fiboa /app.py
cholmes's picture
improvements for percent, area, crop querying
39bf231 verified
raw
history blame
11.3 kB
import ibis
import os
import streamlit as st
from langchain.chains import create_sql_query_chain
from langchain_community.utilities import SQLDatabase
from langchain_core.prompts.prompt import PromptTemplate
from langchain_openai import ChatOpenAI
from query import execute_prompt
# from data import DATA
if os.path.exists("duck.db"):
os.remove("duck.db")
if os.path.exists("duck.db.wal"):
os.remove("duck.db.wal")
geoparquet = "data.parquet"
con = ibis.connect("duckdb://duck.db", extensions = ["spatial"])
con.read_parquet(geoparquet, "crops").cast({"geometry": "geometry"})
# for code, url in DATA.items():
# tbl = con.read_parquet(url, code).cast({"geometry": "geometry"})
st.set_page_config(
page_title="fiboaGPT",
page_icon="🦜",
)
st.title("fiboaGPT")
new_prompt = PromptTemplate(input_variables=['dialect', 'input', 'table_info', 'top_k'],
template=
'''
Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query
and return the answer. Only limit for {top_k} when asked for "some" or "examples".
This duckdb database includes full support for spatial queries, so it will understand most PostGIS-type
queries as well. Remember that you must cast blob column to a geom type using ST_GeomFromWKB(geometry) AS geometry
before any spatial operations. Do not use ST_GeomFromWKB for non-spatial queries.
If you are asked to "map" or "show on a map", then be select the "geometry" column in your query.
If asked to show a "table", you must not include the "geometry" column from the query results.
Use the following format: return only the SQLQuery to run. DO NOT use the prefix with "SQLQuery:".
Do not include an explanation. Do only use SQL functions that DuckDB supports.
Use ONLY the column names that you can see in the table description.
Do NOT query for columns that do not exist. Pay attention to which column is in which table.
Tables include {table_info}. The data comes comes always from the table called "crops".
NEVER use the "testing" table. Pay close attention to this table schema.
The column "area" is in the unit hectares, you may need to convert it to other units, e.g. square meters.
There is no other column related to area information, especially not total_area or similar!
If you need to compute the total area, do it manually, with a SUM of the area column. You should always use the 'area' column - never use a 'total_area' column.
The column "perimeter" is in the unit meters, you may need to convert it to other units, e.g. kilometers.
The column "collection" contains the country codes for the Baltic states:
"ec_lt" for Latvia, "ec_lv" for Lithuania, "ec_es" for Estonia.
Many questions will be about specific crops. The following is the list of strings that will appear in the crop_type column:
arable_crops, cereal, common_soft_wheat, winter_common_soft_wheat, spring_common_soft_wheat, unspecified_season_common_soft_wheat, durum_hard_wheat, winter_durum_hard_wheat, spring_durum_hard_wheat, unspecified_season_durum_hard_wheat, rye, winter_rye, spring_rye, unspecified_season_rye, barley, winter_barley, spring_barley, unspecified_season_barley, oats, winter_oats, spring_oats, unspecified_season_oats, grain_maize_corn_popcorn, unspecified_season_grain_maize_corn_popcorn, rice, unspecified_season_rice, triticale, winter_triticale, spring_triticale, unspecified_season_triticale, millet_sorghum, winter_millet_sorghum, spring_millet_sorghum, teff, unspecified_season_millet_sorghum, spelt, winter_spelt, spring_spelt, unspecified_season_spelt, meslin, winter_meslin, spring_meslin, unspecified_season_meslin, emmer, winter_emmer, spring_emmer, unspecified_season_emmer, einkorn, winter_einkorn, spring_einkorn, unspecified_season_einkorn, canary_seed_canaryseed, unspecified_season_canary_seed_canaryseed, unspecified_cereals, winter_unspecified_cereals, spring_unspecified_cereals, summer_unspecified_cereals, unspecified_season_unspecified_cereals, other_cereals, unspecified_season_other_cereals, legumes_dried_pulses_protein_crops, beans, chickpeas, esparsette_onobrychis, fenugreek, lentils, peas, sweet_lupins, unspecified_legumes_dried_pulses_protein_crops, other_dry_pulses, potatoes, sweet_potatoes, fodder_roots, industrial_nonfood_crops, tobacco, hops, cotton, rapeseed_rape, winter_rapeseed_rape, spring_rapeseed_rape, summer_rapeseed_rape, unspecified_season_rapeseed_rape, sunflower, poppy, winter_poppy, summer_poppy, flax_linseed, flax_linen, flax_linseed_oil, oilseed_crops, guizotia_abyssinica_nyger, hemp_cannabis, finola, fibre_crops, aromatic_medicinal_culinary_plants_spices_herbs, actaea_baneberry_christopher_herbs, alchemilla_ladys_mantle, anethum_dill, angelica, anise_aniseed, artemisia, basil, black_cumin, borage, calendula_marigold, caraway, catnip, chamomile, chervil, coriander, ericaceae_heather, galium_bedstraw, hibiscus, lavender_lavandula, lemon_balm_melissa, lovage_maggiplant, mints_peppermint, moldavian_dragonhead, nasturtiums, nettles, oregano, parsly, piper_pepper, polygonum, rosemary, rubia_tinctorum_common_madder, saffron_crocus_sativus, silver_comb, st_johns_wort, stachys_hedgenettle_chinese_artichoke, tarragon, thyme, valerian, yarrow, unspecified_aromatic_medicinal_culinary_plants_spices_herbs, other_aromatic_medicinal_culinary_plants_spices_herbs, marian_thistles, phacelia, camelina, onobrychis_sainfoins, other_industrial_crops, fresh_vegetables, flowers_ornamental_plants, adonis, anemones_windflowers, asters, begonias, bluebells, bulrush, burnet, carnation, chrysanthemum, cornflowers, corsican_hellebore, dahlia, daisy_daisies, dandelions, echinacea_sun_hat, edelweiss, fiddleneck_amsinckia, fuchsias, galega, gentians, gladiolus_gladioli, goldenrod, iris, isatis_tinctoria_woad, lilies, lotus, lunaria_honesty_silver, malva, milk_star, miscanthus_silvergrass, monstera_adansonii_eyelet, moonseed, narcissus_daffodil, peony_peonies, primrose, rhododendron, roses, rudbeckia_coneflowers, safflower, salsify, sanvitalia_procumbens, serradella, silene_catchfly, silphium_rosinweeds, snapdragons, stonecrop, tagetes, thimbles, tulips, viola, violets_pansies, zinnias, unspecified_flowers_ornamental_plants, other_flowers_ornamental_plants, plants_harvested_green, temporary_grass, poaceae_grasses, elymus, festuca_fescue, cocksfoot_catgrass, festulolium, lolium_ryegrass, setaria, sod_turf, switchgrass, timothy, legumes_harvested_green, alfalfa_lucerne, arachis, clover, melilot, vetches, unspecified_legumes_harvested_green, green_silo_maize, other_plants_harvested_green, arable_land_seed_seedlings, fallow_land_not_crop, kitchen_gardens, strawberries, cucurbits, cucumber_pickle, honeydew, melon, pumpkin_squash_gourd, watermelon, zucchini_courgette, pseudocereal, amaranth, buckwheat, quinoa, soy_soybeans, fennel, topinambur_jerusalem_artichoke, sage_chia, asparagus, brassicaceae_cruciferae, mustard, brassica_oleracea_cabbage, bok_choy_pak_choi, broccoli, brussels_sprouts, cauliflower, chinese_cabbage, collard_greens, gai_lan, kale, kohlrabi, red_cabbage, savoy_cabbage, white_cabbage, other_brassica_oleracea_cabbage, cress, horseradish, swede_rutabaga, alliums, chives, garlic, leek, onions, scallion, shallot, rhubarb, purslane, celery, celeriac, leaf_celery, aubergine_eggplant, artichoke, tomato, root_vegetables, arctium_burdock, beetroot_beets, carrots_daucus, mangelwurzel_fodder_beet, parsnips, radish, sugar_beet, turnips, unspecified_root_vegetables, capsicum, bell_pepper_paprika, chili_pepper, salads_lettuce_leaf_vegetables, chard, chicory_chicories, endive, iceberg, lambs_lettuce_rapunzel, rocket_arugula, sorrel, spinach, other_salads_lettuce_leaf_vegetables, other_arable_land_crops, pasture_meadow_grassland_grass, permanent_crops_perennial, orchards_fruits, amelanchier_serviceberry, apples, apricots, cherry_cherries, feijoa, fig, kiwi, medlar_loquat, nectarine, pawpaw, peach, pears, plums, pomegranate, quinces, unspecified_orchards_fruits, berries_berry_species, aronia_chokeberries, blackberry, blackcurrant_cassis, blueberry, cranberry, currants, gooseberry_gooseberries_cranberries, hippophae_sea_buckthorns_seaberry, jostaberry, raspberry_raspberries, redcurrant, rose_hip_rosehip, rowan_rowanberries, tayberry, unspecified_berries_berry_species, nuts, almond, hazelnuts_hazel, pecan, pistachio, sweet_chestnuts, walnuts, citrus_plantations, olive_plantations, olives_for_oil_production, table_olives, vineyards_wine_vine_rebland_grapes, nurseries_nursery, shrubberries_shrubs, azaleas, chaenomeles_cathayensis, crataegus_hawthorn, elder_elderberry, honeysuckle, ricinus_castor, wire_bush, ginko, avocado, legumes_from_trees, carob, mesquite, tamarind, unspecified_permanent_crops, other_permanent_crops_plantations, mushrooms_energy_genetically_modified_crops, energy_crops, genetically_modified_crops, igniscum_candy, sida_virginia_mallow, truffle, other_mushrooms_energy_crops_genetically_modified_crops, greenhouse_foil_film, tree_wood_forest, afforestation_reforestation, aspen, birch, dogwood_cornus, eucalyptus, oak, populus, willows_osiers, unspecified_tree_wood_forest, other_tree_wood_forest, peat_turf, unmaintained, not_known_and_other
You should take any crop name the user mentions and convert it to one of the strings in the list above. Always query on the crop_type column, using one of the above, never query on 'crop' column.
If the user asks for 'percent' of crops or fields for one of the countries you must always calculate the percentage manually, by summing up the area manually. You total number of hectares to calculate the percentage from is 1583923 for Lithuania, 1788859 for latvia and 973945 for Estonia. If they don't specify a country use 4346727.
There is no 'percent' column, so when you calculate the percentage manually you must sum the crop area and then use the total area of the country.
one of the countries you must always calculate the percentage manually, by summing up the area manually. You can use 1583923 175015 km²
If the user asks for the 'top 10' (or other number) of a crop then sum by area and then sort by that sum.
Question: {input}
'''
)
# todo: if data get's updated, change "ec_es" to "ec_ee"
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0, api_key=st.secrets["OPENAI_API_KEY"])
# Create the SQL query chain with the custom prompt
db = SQLDatabase.from_uri("duckdb:///duck.db", view_support=True)
chain = create_sql_query_chain(llm, db, prompt=new_prompt, k=100)
'''
Ask me about fiboa data (here: all baltic states)!
Request "a map" to get map output, or table for tabular output, e.g.
- Show a map with the 10 largest fields
- Show a map with the largest field in Estonia
- Show a table of the total area by crop type
- Compute the total area of all fields in km² and compute the percentage the total area of the baltic states (175015 km²)
'''
example = "Which are the 10 largest fields?"
with st.container():
if prompt := st.chat_input(example, key="chain"):
st.chat_message("user").write(prompt)
with st.chat_message("assistant"):
execute_prompt(con, chain, prompt)
st.divider()
'''
Data sources: https://source.coop/fiboa |
Data License: CC-BY-SA-4.0 |
Software License: BSD
'''