Files changed (3)
  1. fiboa/app.py +7 -7
  2. fiboa/query.py +3 -4
  3. questions.md +0 -133
fiboa/app.py CHANGED
@@ -42,7 +42,7 @@ and return the answer. Only limit for {top_k} when asked for "some" or "examples
42
  This duckdb database includes full support for spatial queries, so it will understand most PostGIS-type
43
  queries as well. Remember that you must cast blob column to a geom type using ST_GeomFromWKB(geometry) AS geometry
44
  before any spatial operations. Do not use ST_GeomFromWKB for non-spatial queries.
45
- If you are asked to "map" or "show on a map", then be sure to select the "geometry", "area", and "crop" columns in your query.
46
  If asked to show a "table", you must not include the "geometry" column from the query results.
47
 
48
  Use the following format: return only the SQLQuery to run. DO NOT use the prefix with "SQLQuery:".
@@ -59,7 +59,7 @@ There is no other column related to area information, especially not total_area
59
  If you need to compute the total area, do it manually, with a SUM of the area column. You should always use the 'area' column - never use a 'total_area' column.
60
  The column "perimeter" is in the unit meters, you may need to convert it to other units, e.g. kilometers.
61
  The column "collection" contains the country codes for the Baltic states:
62
- "ec_lv" for Latvia, "ec_lt" for Lithuania, "ec_es" for Estonia.
63
  Be sure to always include the collection with the right country for any query about a specific country, including it in the WHERE clause.
64
 
65
  If the user asks for 'percent' of crops or fields for one of the countries you must always calculate the percentage manually, by summing up the area manually. You total number of hectares to calculate the percentage from is 1583923 for Lithuania, 1788859 for latvia and 973945 for Estonia. If they don't specify a country use 4346727. If you use one of these be sure to always include the right collection in the where clause.
@@ -94,15 +94,15 @@ chain = create_sql_query_chain(llm, db, prompt=new_prompt, k=100)
94
  Ask me about fiboa data (here: all baltic states)!
95
  Request "a map" to get map output, or table for tabular output, e.g.
96
 
97
- - Show a table of the top ten crops in the Baltics.
98
- - Show a map with the 10 largest sugar beet fields.
99
- - What is the percent of oats in each country?
100
  - Show a map with the largest field in Estonia
101
- - What are the quantiles of field size for Latvia?
 
 
102
 
103
  '''
104
 
105
- example = "How many berry fields are there in each country?"
106
  with st.container():
107
  if prompt := st.chat_input(example, key="chain"):
108
  st.chat_message("user").write(prompt)
 
42
  This duckdb database includes full support for spatial queries, so it will understand most PostGIS-type
43
  queries as well. Remember that you must cast blob column to a geom type using ST_GeomFromWKB(geometry) AS geometry
44
  before any spatial operations. Do not use ST_GeomFromWKB for non-spatial queries.
45
+ If you are asked to "map" or "show on a map", then be sure to select the "geometry" column in your query.
46
  If asked to show a "table", you must not include the "geometry" column from the query results.
47
 
48
  Use the following format: return only the SQLQuery to run. DO NOT use the prefix with "SQLQuery:".
 
59
  If you need to compute the total area, do it manually, with a SUM of the area column. You should always use the 'area' column - never use a 'total_area' column.
60
  The column "perimeter" is in the unit meters, you may need to convert it to other units, e.g. kilometers.
61
  The column "collection" contains the country codes for the Baltic states:
62
+ "ec_lt" for Latvia, "ec_lv" for Lithuania, "ec_es" for Estonia.
63
  Be sure to always include the collection with the right country for any query about a specific country, including it in the WHERE clause.
64
 
65
  If the user asks for 'percent' of crops or fields for one of the countries you must always calculate the percentage manually, by summing up the area manually. You total number of hectares to calculate the percentage from is 1583923 for Lithuania, 1788859 for latvia and 973945 for Estonia. If they don't specify a country use 4346727. If you use one of these be sure to always include the right collection in the where clause.
 
94
  Ask me about fiboa data (here: all baltic states)!
95
  Request "a map" to get map output, or table for tabular output, e.g.
96
 
97
+ - Show a map with the 10 largest sugar beet fields
 
 
98
  - Show a map with the largest field in Estonia
99
+ - Show a table of the top ten crops
100
+ - What are the top ten crops that have a field size over 10 hectares?
101
+ - Compute the total area of all fields in km² and compute its percentage of the total area of the Baltic states (175015 km²)
102
 
103
  '''
104
 
105
+ example = "Which are the 10 largest fields?"
106
  with st.container():
107
  if prompt := st.chat_input(example, key="chain"):
108
  st.chat_message("user").write(prompt)
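The prompt text in this file tells the model to compute percentages against fixed hectare totals (1583923 for Lithuania, 1788859 for Latvia, 973945 for Estonia, 4346727 when no country is given). A minimal sketch of that arithmetic follows. The names `COUNTRY_HECTARES`, `BALTIC_HECTARES`, and `percent_of_fields` are hypothetical and are not part of `fiboa/app.py`. The sketch keys on country names rather than collection codes.

```python
# Hectare totals copied from the prompt text; the dict and function names are hypothetical.
COUNTRY_HECTARES = {"Lithuania": 1583923, "Latvia": 1788859, "Estonia": 973945}

# The prompt's fallback total (4346727) is the sum of the three country totals.
BALTIC_HECTARES = sum(COUNTRY_HECTARES.values())


def percent_of_fields(crop_hectares, country=None):
    """Return crop_hectares as a percentage of the matching field-area total."""
    # Use the country's own total when one is named, otherwise the Baltic-wide total.
    total = COUNTRY_HECTARES[country] if country else BALTIC_HECTARES
    return 100.0 * crop_hectares / total
```

Dividing a per-country sum by the Baltic-wide total would understate each country's share, which is why the denominator is chosen per country here.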
fiboa/query.py CHANGED
@@ -8,7 +8,7 @@ from ibis import _
8
 
9
  def execute_prompt(con, chain, prompt):
10
  response = chain.invoke({"question": prompt})
11
- st.write(response.replace("testing", "crops"))
12
  gdf = as_geopandas(con, response)
13
 
14
  if len(gdf) == 0:
@@ -28,8 +28,7 @@ def execute_prompt(con, chain, prompt):
28
  })
29
 
30
  def as_geopandas(con, response):
31
- #import code; code.interact(local=locals())
32
- response = re.sub(";$", "", response).replace("testing", "crops")
33
  sql_query = f"CREATE OR REPLACE VIEW testing AS ({response})"
34
  con.raw_sql(sql_query)
35
  gdf = con.table("testing")
@@ -48,4 +47,4 @@ def as_geopandas(con, response):
48
  if dtype.startswith("datetime64"):
49
  gdf[col] = gdf[col].astype(str)
50
 
51
- return gdf
 
8
 
9
  def execute_prompt(con, chain, prompt):
10
  response = chain.invoke({"question": prompt})
11
+ st.write(response)
12
  gdf = as_geopandas(con, response)
13
 
14
  if len(gdf) == 0:
 
28
  })
29
 
30
  def as_geopandas(con, response):
31
+ response = re.sub(";$", "", response)
 
32
  sql_query = f"CREATE OR REPLACE VIEW testing AS ({response})"
33
  con.raw_sql(sql_query)
34
  gdf = con.table("testing")
 
47
  if dtype.startswith("datetime64"):
48
  gdf[col] = gdf[col].astype(str)
49
 
50
+ return gdf
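After this change, `as_geopandas` only strips a trailing semicolon before wrapping the generated SQL in a view. The string handling can be sketched in isolation as below. `wrap_as_view` and its `view_name` parameter are hypothetical names used for illustration, and the database call (`con.raw_sql`) is left out.

```python
import re


def wrap_as_view(response: str, view_name: str = "testing") -> str:
    """Strip a trailing semicolon and wrap the query in a view definition."""
    # Same pattern as in the diff: only a semicolon at the very end is removed.
    query = re.sub(";$", "", response)
    return f"CREATE OR REPLACE VIEW {view_name} AS ({query})"


# A semicolon at the end is dropped so the query can sit inside parentheses.
print(wrap_as_view("SELECT crop, area FROM crops;"))
```

One caveat of the `";$"` pattern: a response with a space after the semicolon passes through unchanged, so calling `response.strip()` first may be worth considering.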
questions.md DELETED
@@ -1,133 +0,0 @@
1
- # Core questions (no hcat)
2
-
3
- * What are the average field sizes of each country in the baltics?
4
- * How many fields are there in Latvia?
5
- * What's the total area of devoted to agriculture in Latvia?
6
- * Is the average field size larger in lithuania or latvia?
7
- * How many fields are there that are under 1 hectare?
8
- * Show a map with the 10 largest fields
9
- * What percent of fields are under 2 hectares?
10
- * Show a map with the largest field in Estonia
11
- * Show a map with the ten largest fields
12
- * Which country has the most area covered by fields?
13
- * what is the average field size of the largest 10 percent of fields?
14
- * what percent of fields are over 20 hectares?
15
- * how big on average are the largest 20% of fields?
16
- * can you print me a table that calculates deciles of field area?
17
- * can you print me a table that shows the average field area by decile?
18
-
19
-
20
-
21
- ## More coding / prompting needed
22
-
23
- ### Maps with more
24
- * Show a map with the largest fields
25
- *seems to have a limit of 100, but nothing shows up*
26
-
27
- ### Percent of total area
28
- * What precent of Latvia is used for agriculture?
29
- (just need to put in total area of latvia, etc. somewhere)
30
-
31
-
32
- ### Quantiles / deciles
33
- * Can you make a table with quantiles / deciles of the field sizes?
34
- ```
35
- SELECT NTILE(10) OVER (ORDER BY area) AS decile, MIN(area) AS min_size, MAX(area) AS max_size, AVG(area) AS avg_size FROM crops GROUP BY decile ORDER BY decile;
36
- ```
37
- *GROUP BY clause cannot contain window functions!*
38
- - need to teach the right duckdb calculation for this.
39
-
40
- ### Graphs / charts
41
- * Show a chart of field size by decile/quantile
42
- * Show a chart of field size by decile, with the most common crop for that decile (hcat)
43
-
44
- ### Admin 2 level questions
45
- - Need to pre-process admin 2 names for each row.
46
- * How many fields are there in each county of Estonia? 
47
- * What state/county has the highest percent of its land as agriculture? 
48
-
49
-
50
- # hcat / crop questions
51
-
52
- * Show a map with the ten largest sugar beet fields
53
- * What are the top ten crops by area for Lithuania?
54
- * What are the top ten crops by number of fields for Lithuania?
55
- * What are the top ten crops that have a field size over 10 hectares in the baltics?
56
- - sometimes gets this TODO: teach manual sum of rows for 'number of fields' and field count', 'what are the most common crops'
57
-
58
- * What is the percent of wheat in the baltics?
59
- * what percent of latvia agricultural area is corn?
60
- * what is the average field size of corn in latvia?
61
- * what crop has the smallest average field size in latvia?
62
- * what are the ten crops with the smallest average field sizes in latvia?
63
- * what are the ten crops with the smallest average field sizes (with at least 20 fields) in latvia?
64
- * What percent of latvia is strawberries?
65
- * what are the ten crops with the largest average field sizes (with at least 20 fields) in latvia?
66
- * how many fields plant vetches in Latvia?
67
- * how many fields of corn are there in each of the baltic states?
68
- * what's the total area of corn in each of the baltic states?
69
- * what percent of lithuania is corn?
70
- * what is the average field size for wheat in the baltics?
71
- * What is the most common crop on fields over 5 hectares?
72
- * what is the most common crop on fields over 10 hectares in estonia?
73
- * What percent of sugar beet fields are over 10 hectares?
74
- - 45.74898785425101
75
- * what are the ten most common crops by number of fields?
76
- * What are the top 5 flowers in the baltics?
77
- * what are the top 5 legumes by field count in the baltics?
78
- * What are the top 5 legumes in the baltics?
79
- * what are the average field sizes of peas in the baltics?
80
- * what are the average field sizes of beans in the baltics?
81
- * what percent of estonia is not fallow or pasture?
82
- * What percent of latvia is fallow?
83
- * what are the sizes of strawberry fields by quantile in latvia?
84
- * what are the sizes of wheat fields by quantile in latvia?
85
-
86
-
87
-
88
-
89
-
90
-
91
- ## More coding / prompting needed
92
-
93
-
94
- - *this is using the percent of the country, not the percent of the fields*
95
- * what is the percent of wheat in each country?
96
- - `SELECT collection, SUM(area) / 4346727 * 100 AS percent_wheat FROM crops WHERE crop_type IN ('common_soft_wheat', 'durum_hard_wheat') GROUP BY collection;` - should use percent of the country, not the total area. Also this doesn't return any results.
97
- * What are the average field sizes of the top ten crops by area?
98
- - Tried to teach it about not using windowed functions like row_number(), but seems like we need to explicitly train it for this type of query like we did for quantiles.
99
- * Which country has the largest area of arable crops (crop code starts with 3301)?
100
- * Which country has the largest area of grassland (crop code starts with 3302)?
101
- * Which country has the largest area of Permanent perenniel crops (crop code starts with 3303)?
102
- What is the most unique crop type for each country?
103
- - returned the values with the crop_type for each one.
104
- * What percent of sugar in the baltics is sugar beet?
105
-
106
- # Ideas needing more coding
107
-
108
- - show a chart or a graph
109
- - Natural language processing of responses, particularly when there's only one result.
110
- - decide on what to use of map, table, graph, answer
111
- - give more common names in output - ie clean up some of the weird quirks of eurocrops.- multi-step analysis, like get complex results from each country and then compare / analyze- pre-process country level stats
112
- - total area, total perimeter
113
- - by crop stats - total area, total percent of fields, total percent of overall land
114
- - add admin 2 (state) level attribute to each field
115
- - update the table when starting up? Save the new table?
116
- - do admin 2 level stats like country level ones- geospatial queries (should wait for duckdb 1.1 support)
117
- - like bounding box / polygon- joins with environmental data, etc.
118
-
119
-
120
-
121
-
122
-
123
-
124
-
125
-
126
-
127
-
128
-
129
-
130
-
131
-
132
-
133
-
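The deleted notes record that the decile query failed with "GROUP BY clause cannot contain window functions". One common way around this in standard SQL is to assign the decile in a subquery and aggregate in the outer query. The sketch below assumes a `crops` table with an `area` column, as in the notes. It uses Python's built-in `sqlite3` as a stand-in so it runs without extra packages, and it has not been run against the app's DuckDB database.

```python
import sqlite3

# Stand-in data: 20 fields with areas 1..20 (hypothetical values).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE crops (area REAL)")
con.executemany("INSERT INTO crops VALUES (?)", [(float(a),) for a in range(1, 21)])

# Compute NTILE in a subquery first, then group by its result in the outer query.
DECILE_SQL = """
SELECT decile, MIN(area) AS min_size, MAX(area) AS max_size, AVG(area) AS avg_size
FROM (SELECT area, NTILE(10) OVER (ORDER BY area) AS decile FROM crops) AS ranked
GROUP BY decile
ORDER BY decile
"""

rows = con.execute(DECILE_SQL).fetchall()
for row in rows:
    print(row)
```

The outer query only sees `decile` as an ordinary column, so the GROUP BY no longer contains a window function.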