Chris4K committed on
Commit
2113210
·
verified ·
1 Parent(s): 9e4f92e

Update README.md

  1. README.md +151 -1
README.md CHANGED
@@ -9,4 +9,154 @@ app_file: app.py
9
  pinned: false
10
  tags:
11
  - tool
12
- ---
9
  pinned: false
10
  tags:
11
  - tool
12
+ ---
13
+
14
+ # Advanced Named Entity Recognition (NER) Tool for smolagents
15
+
16
+ This repository contains an enhanced Named Entity Recognition tool built for the `smolagents` library from Hugging Face. This tool allows you to:
17
+
18
+ - Identify named entities (people, organizations, locations, dates, etc.) in text
19
+ - Choose from multiple NER models for different languages and use cases
20
+ - Configure different output formats and confidence thresholds
21
+ - Plug the tool into `smolagents` agents so they can extract entities from text
22
+
23
+ ## Installation
24
+
25
+ ```bash
26
+ pip install smolagents transformers torch gradio
27
+ ```
28
+
29
+ For faster inference on a GPU, also install `accelerate`:
30
+ ```bash
31
+ pip install smolagents transformers torch gradio accelerate
32
+ ```
33
+
34
+ ## Basic Usage
35
+
36
+ ```python
37
+ from ner_tool import NamedEntityRecognitionTool
38
+
39
+ # Initialize the NER tool
40
+ ner_tool = NamedEntityRecognitionTool()
41
+
42
+ # Analyze text with default settings
43
+ result = ner_tool("Apple Inc. is planning to open a new store in Paris, France next year.")
44
+ print(result)
45
+
46
+ # Analyze with custom settings
47
+ detailed_result = ner_tool(
48
+     text="Apple Inc. is planning to open a new store in Paris, France next year.",
49
+     model="Babelscape/wikineural-multilingual-ner",  # Different model
50
+     aggregation="detailed",  # More detailed output format
51
+     min_score=0.7  # Lower confidence threshold
52
+ )
53
+ print(detailed_result)
54
+ ```
55
+
56
+ ## Available Models
57
+
58
+ The tool includes several pre-configured models:
59
+
60
+ | Model ID | Description |
61
+ |----------|-------------|
62
+ | dslim/bert-base-NER | Standard NER (English) - Default |
63
+ | jean-baptiste/camembert-ner | French NER |
64
+ | Davlan/bert-base-multilingual-cased-ner-hrl | Multilingual NER |
65
+ | Babelscape/wikineural-multilingual-ner | WikiNeural Multilingual NER |
66
+ | flair/ner-english-ontonotes-large | OntoNotes English (fine-grained) |
67
+ | elastic/distilbert-base-cased-finetuned-conll03-english | CoNLL (fast) |
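As a rough illustration of choosing among the models in the table, the sketch below maps a language code to a model ID and falls back to a multilingual model. The `pick_model` helper and the language keys are assumptions made for this example; they are not part of the tool.

```python
# Model IDs come from the table above; the language-code keys are assumptions.
MODELS_BY_LANGUAGE = {
    "en": "dslim/bert-base-NER",
    "fr": "jean-baptiste/camembert-ner",
}
MULTILINGUAL_FALLBACK = "Babelscape/wikineural-multilingual-ner"


def pick_model(language_code: str) -> str:
    """Return a model ID for a two-letter language code (hypothetical helper)."""
    return MODELS_BY_LANGUAGE.get(language_code.lower(), MULTILINGUAL_FALLBACK)


print(pick_model("fr"))
print(pick_model("de"))  # no dedicated entry, so the multilingual model is used
```

The returned ID could then be passed as the `model` argument shown in Basic Usage.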
68
+
69
+ ## Output Formats
70
+
71
+ The tool supports three output formats, selected with the `aggregation` parameter:
72
+
73
+ 1. **Simple** - A flat list of the entities found, each with its type and confidence score
74
+ 2. **Grouped** - Entities grouped by their category (default)
75
+ 3. **Detailed** - A detailed analysis including the original text with entity markers
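To make the difference between the three formats concrete, here is a minimal sketch that renders the same entities in each style. The entity records, the function names, and the `[text|LABEL]` marker syntax are all assumptions for illustration; the tool's actual output may look different.

```python
from collections import defaultdict

text = "Apple Inc. is planning to open a new store in Paris, France next year."

# Hypothetical entity records with character offsets into `text`.
entities = [
    {"word": "Apple Inc.", "label": "ORG", "score": 0.99, "start": 0, "end": 10},
    {"word": "Paris", "label": "LOC", "score": 0.98, "start": 46, "end": 51},
    {"word": "France", "label": "LOC", "score": 0.97, "start": 53, "end": 59},
]


def format_simple(ents):
    """Flat list: one line per entity with type and score."""
    return [f"{e['word']} ({e['label']}, {e['score']:.2f})" for e in ents]


def format_grouped(ents):
    """Entities collected under their category."""
    groups = defaultdict(list)
    for e in ents:
        groups[e["label"]].append(e["word"])
    return dict(groups)


def format_detailed(source, ents):
    """Original text with inline entity markers."""
    marked = source
    # Work right to left so earlier offsets stay valid after each insertion.
    for e in sorted(ents, key=lambda ent: ent["start"], reverse=True):
        span = marked[e["start"]:e["end"]]
        marked = marked[:e["start"]] + "[" + span + "|" + e["label"] + "]" + marked[e["end"]:]
    return marked


print(format_simple(entities))
print(format_grouped(entities))
print(format_detailed(text, entities))
```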
76
+
77
+ ## Using with an Agent
78
+
79
+ ```python
80
+ from smolagents import CodeAgent, InferenceClientModel
81
+ from ner_tool import NamedEntityRecognitionTool
82
+
83
+ # Initialize the NER tool
84
+ ner_tool = NamedEntityRecognitionTool()
85
+
86
+ # Create an agent model
87
+ model = InferenceClientModel(
88
+     model_id="mistralai/Mistral-7B-Instruct-v0.2",
89
+     token="your_huggingface_token"
90
+ )
91
+
92
+ # Create the agent with our NER tool
93
+ agent = CodeAgent(tools=[ner_tool], model=model)
94
+
95
+ # Run the agent
96
+ result = agent.run(
97
+     "Analyze this text and identify all entities: 'The European Union and United Kingdom finalized a trade deal on Tuesday.'"
98
+ )
99
+ print(result)
100
+ ```
101
+
102
+ ## Interactive Gradio Interface
103
+
104
+ For an interactive experience, run the Gradio app:
105
+
106
+ ```bash
107
+ python gradio_app.py
108
+ ```
109
+
110
+ This provides a web interface where you can:
111
+ - Enter custom text or select from samples
112
+ - Choose different NER models
113
+ - Configure display formats and confidence thresholds
114
+ - See immediate results
115
+
116
+ ## Customization Options
117
+
118
+ ### Entity Confidence Score
119
+
120
+ - Use the `min_score` parameter to drop entities below a confidence threshold
121
+ - Range: 0.0 (include all) to 1.0 (only highest confidence)
122
+ - Default: 0.8
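The effect of the threshold can be sketched as a simple filter. This is an illustration only: the `filter_by_score` function and the sample scores are made up, and whether the tool compares with `>=` or `>` is an assumption.

```python
def filter_by_score(entities, min_score=0.8):
    """Keep entities whose confidence is at least min_score (assumed semantics)."""
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0.0 and 1.0")
    return [e for e in entities if e["score"] >= min_score]


# Hypothetical scores, chosen to show the effect of different thresholds.
sample = [
    {"word": "Paris", "label": "LOC", "score": 0.98},
    {"word": "Tuesday", "label": "DATE", "score": 0.75},
    {"word": "deal", "label": "MISC", "score": 0.41},
]

print([e["word"] for e in filter_by_score(sample)])       # default threshold of 0.8
print([e["word"] for e in filter_by_score(sample, 0.7)])  # lower threshold keeps more
print([e["word"] for e in filter_by_score(sample, 0.0)])  # 0.0 includes everything
```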
123
+
124
+ ### Entity Types
125
+
126
+ The tool can identify various entity types including:
127
+ - People (PER, PERSON)
128
+ - Organizations (ORG, ORGANIZATION)
129
+ - Locations (LOC, LOCATION, GPE)
130
+ - Dates and Times (DATE, TIME)
131
+ - Money and Percentages (MONEY, PERCENT)
132
+ - Products (PRODUCT)
133
+ - Events (EVENT)
134
+ - Works of Art (WORK_OF_ART)
135
+ - Laws (LAW)
136
+ - Languages (LANGUAGE)
137
+ - Facilities (FAC)
138
+ - Miscellaneous (MISC)
139
+
140
+ The exact entity types available depend on the chosen model.
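Because different models use different tags for the same category (for example `PER` versus `PERSON`), downstream code may want to normalize them. The sketch below is one possible approach; the `normalize_label` helper is hypothetical, and the mapping covers only a few of the tags listed above.

```python
# Partial mapping built from the tag list above; extend as needed.
LABEL_TO_CATEGORY = {
    "PER": "People",
    "PERSON": "People",
    "ORG": "Organizations",
    "ORGANIZATION": "Organizations",
    "LOC": "Locations",
    "LOCATION": "Locations",
    "GPE": "Locations",
    "MISC": "Miscellaneous",
}


def normalize_label(label: str) -> str:
    """Map a model-specific tag to a shared category name (hypothetical helper)."""
    # Token-level models often emit BIO-prefixed tags such as "B-PER" or "I-ORG".
    if label[:2] in ("B-", "I-"):
        label = label[2:]
    # Unknown tags are returned unchanged rather than guessed at.
    return LABEL_TO_CATEGORY.get(label.upper(), label)


print(normalize_label("B-PER"))
print(normalize_label("GPE"))
print(normalize_label("WORK_OF_ART"))
```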
141
+
142
+ ## Sharing Your Tool
143
+
144
+ You can share your tool on the Hugging Face Hub:
145
+
146
+ ```python
147
+ ner_tool.push_to_hub("your-username/advanced-ner-tool", token="your_huggingface_token")
148
+ ```
149
+
150
+ ## Limitations
151
+
152
+ - The first load of each model can be slow, since its weights have to be downloaded
153
+ - Larger models can require significant memory
154
+ - Entity recognition accuracy varies by model and language
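One common way to soften the loading cost is to cache each loaded model so that only the first call pays for it. The sketch below shows the pattern with a stand-in loader; `get_model` and `LOAD_CALLS` are hypothetical names, and whether the tool already caches models internally is not stated here.

```python
from functools import lru_cache

LOAD_CALLS = []  # records every uncached load, for demonstration only


@lru_cache(maxsize=4)
def get_model(model_id: str):
    """Load a model once per ID and reuse the result afterwards."""
    LOAD_CALLS.append(model_id)
    # Stand-in for an expensive load; a real loader would build the NER model here.
    return {"model_id": model_id}


first = get_model("dslim/bert-base-NER")
second = get_model("dslim/bert-base-NER")  # served from the cache
print(first is second, len(LOAD_CALLS))
```

A small `maxsize` keeps memory bounded when several large models are used in turn.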
155
+
156
+ ## Contributing
157
+
158
+ Contributions are welcome! Feel free to open an issue or submit a pull request.
159
+
160
+ ## License
161
+
162
+ MIT