steffenc committed
Commit 1f6940d · unverified · 2 parents: 89b6d9e dbcaf0f

Merge pull request #3 from macrocosm-os/features/random-stream
README.md CHANGED
@@ -1,18 +1,125 @@
- # chattensor-backend
- Backend for Chattensor app
-
- To run, you will need a bittensor wallet which is registered to the relevant subnet (1@mainnet or 61@testnet).
-
+ <picture>
+     <source srcset="./assets/macrocosmos-white.png" media="(prefers-color-scheme: dark)">
+     <img src="./assets/macrocosmos-white.png">
+ </picture>
+
+ <picture>
+     <source srcset="./assets/macrocosmos-black.png" media="(prefers-color-scheme: light)">
+     <img src="./assets/macrocosmos-black.png">
+ </picture>
+
+ <br/>
+ <br/>
+ <br/>
+
+ # Subnet 1 API
+ > Note: This project is still in development and is not yet ready for production use.
+
+ The official REST API for Bittensor's flagship subnet 1 ([prompting](https://github.com/opentensor/prompting)), built by [Macrocosmos](https://macrocosmos.ai).
+
+ Subnet 1 is a decentralized, open-source network containing around 1,000 highly capable LLM agents. These agents can perform a wide range of tasks, from simple math problems to complex natural language processing. As subnet 1 is constantly evolving, its capabilities are always expanding. Our goal is to provide a world-class inference engine, to be used by developers and researchers alike.
+
+ This API is designed to power applications and facilitate interaction between subnets by providing a simple, easy-to-use interface for developers which enables:
+ 1. **Conversation**: Chatting with the network (streaming and non-streaming)
+ 2. **Data cleaning**: Filtering empty and otherwise useless responses
+ 3. **Advanced inference**: Providing enhanced responses using SOTA ensembling techniques (WIP)
+
+ Validators can use this API to interact with the network and perform various tasks.
+ To run an API server, you will need a bittensor wallet which is registered as a validator on the relevant subnet (1@mainnet or 61@testnet).
+
+ NOTE: At present, miners are choosing not to stream their responses to the network. This means that the server cannot provide a streamed response to the client until the miner has finished processing the request. This is a temporary measure and will be resolved in the future.
+
+ ## How it works
+ The API server is a RESTful API that provides endpoints for interacting with the network. It is a simple [wrapper](./validators/sn1_validator_wrapper.py) around your subnet 1 validator, which uses the dendrite to query miners.
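+
+ In rough outline, a request flows like this (a simplified sketch, not the exact code; the `QueryValidatorParams(**...)` mapping is hypothetical and stands in for the request parsing done in `server.py` and the middlewares):
+
+ ```python
+ # Sketch: server.py wires aiohttp routes to the validator wrapper,
+ # and the wrapper queries miners through the validator's dendrite.
+ from aiohttp import web
+ from validators import S1ValidatorAPI, QueryValidatorParams
+
+ validator = S1ValidatorAPI()  # wraps your running subnet 1 validator
+
+ async def chat(request: web.Request) -> web.StreamResponse:
+     # Hypothetical mapping; the real field handling lives in server.py
+     params = QueryValidatorParams(**(await request.json()))
+     return await validator.query_validator(params)
+ ```
+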
  ## Install
- Create a new python environment and install the dependencies with the command
+ Create a new python environment and install the dependencies with the following commands.
+
+ (First time only)
  ```bash
+ python3.10 -m venv env
+ source env/bin/activate
  pip install -r requirements.txt
  ```
+
+ > Note: This project requires Python >= 3.10.
  > Note: Currently the prompting library is only installable on machines with CUDA devices (NVIDIA GPU).

+ ## Run
+
+ First, activate the virtual environment:
+
+ ```bash
+ source env/bin/activate
+ ```
+
+ Then run an API server on subnet 1 with the following command:
+
+ ```bash
+ EXPECTED_ACCESS_KEY=<ACCESS_KEY> python server.py --wallet.name <WALLET_NAME> --wallet.hotkey <WALLET_HOTKEY> --netuid <NETUID> --neuron.model_id mock --neuron.tasks math --neuron.task_p 1 --neuron.device cpu
+ ```
+
+ These flags ensure that the server uses no GPU memory and does not load the large models used by the incentive mechanism.
+
+ > Note: This command is subject to change as the project evolves.
+
+ We recommend running the server with a process manager like PM2, which keeps the server running and restarts it if it crashes.
+
+ ```bash
+ EXPECTED_ACCESS_KEY=<ACCESS_KEY> pm2 start server.py --interpreter python3 --name sn1-api -- --wallet.name <WALLET_NAME> --wallet.hotkey <WALLET_HOTKEY> --netuid <NETUID> --neuron.model_id mock --neuron.tasks math --neuron.task_p 1 --neuron.device cpu
+ ```
+
+ ## API Usage
+ At present, the API provides two endpoints: `/chat` (live) and `/echo` (test).
+
+ `/chat` is used to chat with the network and receive a response. The endpoint requires a JSON payload with the following fields:
+ - `k: int`: The number of responses to return
+ - `timeout: float`: The time in seconds to wait for a response
+ - `roles: List[str]`: The roles of the agents to query
+ - `messages: List[str]`: The messages to send to the network
+ - `prefer: str`: The preferred response to use as the default view. Should be one of `{'longest', 'shortest'}`
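+
+ For example, a request body might look like this (values are illustrative):
+
+ ```json
+ {
+     "k": 5,
+     "timeout": 15,
+     "roles": ["user"],
+     "messages": ["What is the capital of Texas?"],
+     "prefer": "longest"
+ }
+ ```
+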
+ Responses from the `/chat` endpoint are streamed back to the client as they are received from the network. Upon completion, the server appends a final JSON chunk to the stream with the following fields:
+ - `streamed_chunks: List[str]`: The streamed responses from the network
+ - `streamed_chunks_timings: List[float]`: The time taken to receive each streamed response
+ - `synapse: StreamPromptingSynapse`: The synapse used to query the network. This contains full context and metadata about the query.
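+
+ For example, a minimal Python client might consume the stream like this (a sketch only; it assumes the server is running locally on port 10000 and that `<ACCESS_KEY>` matches the server's `EXPECTED_ACCESS_KEY`):
+
+ ```python
+ import requests
+
+ payload = {"k": 5, "timeout": 15, "roles": ["user"], "messages": ["Hello!"]}
+
+ # Stream the response: plain-text chunks arrive first, followed by the
+ # final JSON chunk described above.
+ with requests.post(
+     "http://0.0.0.0:10000/chat/",
+     headers={"api_key": "<ACCESS_KEY>"},
+     json=payload,
+     stream=True,
+ ) as resp:
+     for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
+         print(chunk, end="", flush=True)
+ ```
+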
+ ## Testing
+
+ To test the API locally, you can use the following curl command:
+
+ ```bash
+ curl --no-buffer -X POST http://0.0.0.0:10000/chat/ -H "api_key: <ACCESS_KEY>" -d '{"k": 5, "timeout": 15, "roles": ["user"], "messages": ["What is today'\''s date?"]}'
+ ```
+ > Note: Use the `--no-buffer` flag to ensure that the response is streamed back to the client.
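+
+ The final JSON chunk follows the `TextStreamResponse` schema defined in `responses.py`; illustratively (all values made up), the tail of a streamed response looks something like:
+
+ ```json
+ {"streamed_chunks": ["Today ", "is ", "..."], "streamed_chunks_timings": [0.21, 0.35, 0.48], "uid": 42, "completion": "Today is ...", "timing": 3.2}
+ ```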
+
+ After verifying that the server responds to requests locally, you can test it from a remote machine.
+
+ ### Troubleshooting
+
+ If you do not receive a response from the server, check that the server is running and that the port is open on the server. You can open the port with the following command:
+
+ ```bash
+ sudo ufw allow 10000/tcp
+ ```
+
+ ---
+
+ ## Contributing
+ If you would like to contribute to the project, please read the [CONTRIBUTING.md](CONTRIBUTING.md) file for more information.
+
+ You can find out more about the project by visiting the [Macrocosmos website](https://macrocosmos.ai) or by joining us on our social channels:
+
+ ![Discord](https://img.shields.io/badge/Discord-%235865F2.svg?style=for-the-badge&logo=discord&logoColor=white)
+ [![Substack](https://img.shields.io/badge/Substack-%23006f5c.svg?style=for-the-badge&logo=substack&logoColor=FF6719)](https://substack.com/@macrocosmosai)
+ [![Twitter](https://img.shields.io/badge/Twitter-%231DA1F2.svg?style=for-the-badge&logo=twitter&logoColor=white)](https://twitter.com/MacrocosmosAI)
+ [![X](https://img.shields.io/badge/X-%23000000.svg?style=for-the-badge&logo=X&logoColor=white)](https://twitter.com/MacrocosmosAI)
+ [![LinkedIn](https://img.shields.io/badge/LinkedIn-0077B5?logo=linkedin&logoColor=white)](https://www.linkedin.com/in/MacrocosmosAI)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

assets/macrocosmos-black.png ADDED
assets/macrocosmos-white.png ADDED
responses.py ADDED
@@ -0,0 +1,27 @@
+ from pydantic import BaseModel, Field
+ from typing import List, Dict, Any
+
+
+ class TextStreamResponse(BaseModel):
+     streamed_chunks: List[str] = Field(
+         default_factory=list, description="List of streamed chunks."
+     )
+     streamed_chunks_timings: List[float] = Field(
+         default_factory=list, description="List of streamed chunk timings, in seconds."
+     )
+     uid: int = Field(0, description="UID of the queried miner.")
+     completion: str = Field(
+         "", description="The final completed string from the stream."
+     )
+     timing: float = Field(
+         0, description="Total timing of the request, in seconds."
+     )
+
+     def to_dict(self):
+         return {
+             "streamed_chunks": self.streamed_chunks,
+             "streamed_chunks_timings": self.streamed_chunks_timings,
+             "uid": self.uid,
+             "completion": self.completion,
+             "timing": self.timing,
+         }
server.py CHANGED
@@ -2,41 +2,11 @@ import asyncio
  import utils
  import bittensor as bt
  from aiohttp import web
- from aiohttp.web_response import Response
  from validators import S1ValidatorAPI, QueryValidatorParams, ValidatorAPI
  from middlewares import api_key_middleware, json_parsing_middleware

- """
- # test
- ```
- curl -X POST http://0.0.0.0:10000/chat/ -H "api_key: hello" -d '{"k": 5, "timeout": 3, "roles": ["user"], "messages": ["hello world"]}'
-
- curl -X POST http://0.0.0.0:10000/chat/ -H "api_key: hey-michal" -d '{"k": 5, "timeout": 3, "roles": ["user"], "messages": ["on what exact date did the 21st century begin?"]}'
-
- # stream
- curl --no-buffer -X POST http://129.146.127.82:10000/echo/ -H "api_key: hey-michal" -d '{"k": 3, "timeout": 0.2, "roles": ["user"], "messages": ["i need to tell you something important but first"]}'
- ```
-
- TROUBLESHOOT
- check if port is open
- ```
- sudo ufw allow 10000/tcp
- sudo ufw allow 10000/tcp
- ```
- # run
- ```
- EXPECTED_ACCESS_KEY="hey-michal" pm2 start app.py --interpreter python3 --name app -- --neuron.model_id mock --wallet.name sn1 --wallet.hotkey v1 --netuid 1 --neuron.tasks math --neuron.task_p 1 --neuron.device cpu
- ```
-
- basic testing
- ```
- EXPECTED_ACCESS_KEY="hey-michal" python app.py --neuron.model_id mock --wallet.name sn1 --wallet.hotkey v1 --netuid 1 --neuron.tasks math --neuron.task_p 1 --neuron.device cpu
- ```
- add --mock to test the echo stream
- """
-
-
- async def chat(request: web.Request) -> Response:
+
+ async def chat(request: web.Request) -> web.StreamResponse:
      """
      Chat endpoint for the validator.
      """
@@ -49,9 +19,8 @@ async def chat(request: web.Request) -> Response:
      return response


- async def echo_stream(request, request_data):
-     request_data = request["data"]
-     return await utils.echo_stream(request_data)
+ async def echo_stream(request: web.Request) -> web.StreamResponse:
+     return await utils.echo_stream(request)


  class ValidatorApplication(web.Application):
utils.py CHANGED
@@ -1,8 +1,10 @@
  import re
- import bittensor as bt
  import time
  import json
+ import asyncio
+ import bittensor as bt
  from aiohttp import web
+ from responses import TextStreamResponse
  from collections import Counter
  from prompting.rewards import DateRewardModel, FloatDiffModel

@@ -134,47 +136,45 @@ def guess_task_name(challenge: str):
      return "qa"


- async def echo_stream(request_data: dict):
+ async def echo_stream(request: web.Request) -> web.StreamResponse:
+     request_data = request["data"]
      k = request_data.get("k", 1)
-     exclude = request_data.get("exclude", [])
-     timeout = request_data.get("timeout", 0.2)
      message = "\n\n".join(request_data["messages"])

      # Create a StreamResponse
      response = web.StreamResponse(
-         status=200, reason="OK", headers={"Content-Type": "text/plain"}
+         status=200, reason="OK", headers={"Content-Type": "application/json"}
      )
-     await response.prepare()
+     await response.prepare(request)

      completion = ""
+     chunks = []
+     chunks_timings = []
+     start_time = time.time()
      # Echo the message k times with a timeout between each chunk
      for _ in range(k):
          for word in message.split():
              chunk = f"{word} "
              await response.write(chunk.encode("utf-8"))
              completion += chunk
-             time.sleep(timeout)
+             await asyncio.sleep(0.3)
              bt.logging.info(f"Echoed: {chunk}")

+             chunks.append(chunk)
+             chunks_timings.append(time.time() - start_time)
+
      completion = completion.strip()

      # Prepare final JSON chunk
-     json_chunk = json.dumps(
-         {
-             "uids": [0],
-             "completion": completion,
-             "completions": [completion.strip()],
-             "timings": [0],
-             "status_messages": ["Went well!"],
-             "status_codes": [200],
-             "completion_is_valid": [True],
-             "task_name": "echo",
-             "ensemble_result": {},
-         }
-     )
+     response_data = TextStreamResponse(
+         streamed_chunks=chunks,
+         streamed_chunks_timings=chunks_timings,
+         completion=completion,
+         timing=time.time() - start_time,
+     ).to_dict()

      # Send the final JSON as part of the stream
-     await response.write(f"\n\nJSON_RESPONSE_BEGIN:\n{json_chunk}".encode("utf-8"))
+     await response.write(json.dumps(response_data).encode("utf-8"))

      # Finalize the response
      await response.write_eof()
validators/base.py CHANGED
@@ -1,7 +1,7 @@
  from abc import ABC, abstractmethod
  from typing import List
  from dataclasses import dataclass
- from aiohttp.web import Response, Request
+ from aiohttp.web import Response, Request, StreamResponse


  @dataclass
@@ -31,10 +31,10 @@ class QueryValidatorParams:

  class ValidatorAPI(ABC):
      @abstractmethod
-     async def query_validator(self, params: QueryValidatorParams) -> Response:
+     async def query_validator(self, params: QueryValidatorParams) -> StreamResponse:
          pass


  class MockValidator(ValidatorAPI):
-     async def query_validator(self, params: QueryValidatorParams) -> Response:
+     async def query_validator(self, params: QueryValidatorParams) -> StreamResponse:
          ...
validators/sn1_validator_wrapper.py CHANGED
@@ -2,7 +2,8 @@ import json
  import utils
  import torch
  import traceback
- import asyncio
+ import time
+ import random
  import bittensor as bt
  from typing import Awaitable
  from prompting.validator import Validator
@@ -12,6 +13,16 @@ from prompting.dendrite import DendriteResponseEvent
  from .base import QueryValidatorParams, ValidatorAPI
  from aiohttp.web_response import Response, StreamResponse
  from deprecated import deprecated
+ from dataclasses import dataclass
+ from typing import List
+ from responses import TextStreamResponse
+
+
+ @dataclass
+ class ProcessedStreamResponse:
+     streamed_chunks: List[str]
+     streamed_chunks_timings: List[float]
+     synapse: StreamPromptingSynapse


  class S1ValidatorAPI(ValidatorAPI):
@@ -75,27 +86,39 @@ class S1ValidatorAPI(ValidatorAPI):
          return Response(status=500, reason="Internal error")

      async def process_response(
-         self, response: StreamResponse, uid: int, async_generator: Awaitable
-     ):
+         self, response: StreamResponse, async_generator: Awaitable
+     ) -> ProcessedStreamResponse:
          """Process a single response asynchronously."""
-         try:
-             chunk = None  # Initialize chunk with a default value
-             async for chunk in async_generator:  # most important loop, as this is where we acquire the final synapse.
-                 bt.logging.debug(f"\nchunk for uid {uid}: {chunk}")
-
-             # TODO: SET PROPER IMPLEMENTATION TO RETURN CHUNK
-             if chunk is not None:
-                 json_data = json.dumps(chunk)
-                 await response.write(json_data.encode("utf-8"))
-
-         except Exception as e:
-             bt.logging.error(
-                 f"Encountered an error in {self.__class__.__name__}:get_stream_response:\n{traceback.format_exc()}"
-             )
-             response.set_status(500, reason="Internal error")
-             await response.write(json.dumps({"error": str(e)}).encode("utf-8"))
-         finally:
-             await response.write_eof()  # Ensure to close the response properly
+         # Initialize chunk with a default value
+         chunk = None
+         # Accumulate streamed chunks and their timings
+         chunks = []
+         chunks_timings = []
+
+         start_time = time.time()
+         last_sent_index = 0
+         async for chunk in async_generator:
+             if isinstance(chunk, list):
+                 # Chunks are currently returned as string arrays, so we need to concatenate them
+                 concatenated_chunks = "".join(chunk)
+                 new_data = concatenated_chunks[last_sent_index:]
+
+                 if new_data:
+                     await response.write(new_data.encode("utf-8"))
+                     bt.logging.info(f"Received new chunk from miner: {chunk}")
+                     last_sent_index += len(new_data)
+                     chunks.extend(chunk)
+                     chunks_timings.append(time.time() - start_time)
+
+         if chunk is not None and isinstance(chunk, StreamPromptingSynapse):
+             # The last value yielded should be a synapse with the completion filled
+             return ProcessedStreamResponse(
+                 synapse=chunk,
+                 streamed_chunks=chunks,
+                 streamed_chunks_timings=chunks_timings,
+             )
+         else:
+             raise ValueError("The last chunk is not a StreamPromptingSynapse")

      async def get_stream_response(self, params: QueryValidatorParams) -> StreamResponse:
          response = StreamResponse(status=200, reason="OK")
@@ -105,7 +128,7 @@ class S1ValidatorAPI(ValidatorAPI):

          try:
              # Guess the task name of current request
-             task_name = utils.guess_task_name(params.messages[-1])
+             # task_name = utils.guess_task_name(params.messages[-1])

              # Get the list of uids to query for this step.
              uids = get_random_uids(
@@ -115,6 +138,8 @@ class S1ValidatorAPI(ValidatorAPI):

              # Make calls to the network with the prompt.
              bt.logging.info(f"Calling dendrite")
+             start_time = time.time()
+
              streams_responses = await self.validator.dendrite(
                  axons=axons,
                  synapse=StreamPromptingSynapse(
@@ -125,13 +150,24 @@ class S1ValidatorAPI(ValidatorAPI):
                  streaming=True,
              )

-             tasks = [
-                 self.process_response(uid, res)
-                 for uid, res in dict(zip(uids, streams_responses))
-             ]
-             results = await asyncio.gather(*tasks, return_exceptions=True)
-
-             # TODO: Continue implementation, business decision needs to be made on how to handle the results
+             uid_stream_dict = dict(zip(uids, streams_responses))
+
+             # Pick a single random miner stream to forward to the client
+             random_uid, random_stream = random.choice(list(uid_stream_dict.items()))
+             processed_response = await self.process_response(response, random_stream)
+
+             # Prepare final JSON chunk
+             response_data = TextStreamResponse(
+                 streamed_chunks=processed_response.streamed_chunks,
+                 streamed_chunks_timings=processed_response.streamed_chunks_timings,
+                 uid=random_uid,
+                 completion=processed_response.synapse.completion,
+                 timing=time.time() - start_time,
+             ).to_dict()
+
+             # Send the final JSON as part of the stream
+             await response.write(json.dumps(response_data).encode("utf-8"))
          except Exception as e:
              bt.logging.error(
                  f"Encountered an error in {self.__class__.__name__}:get_stream_response:\n{traceback.format_exc()}"
@@ -144,11 +180,4 @@ class S1ValidatorAPI(ValidatorAPI):
          return response

      async def query_validator(self, params: QueryValidatorParams) -> Response:
-         # TODO: SET STREAM AS DEFAULT
-         stream = params.request.get("stream", False)
-
-         if stream:
-             return await self.get_stream_response(params)
-         else:
-             # DEPRECATED
-             return await self.get_response(params)
+         return await self.get_stream_response(params)