def DOCUMENTATION_WRITER_SOP(
    task: str,
    module: str,
) -> str:
    """Build a standard operating procedure (SOP) prompt that instructs an LLM to write
    explicit, PyTorch-style Markdown documentation for the given code.

    Args:
        task: The source code to be documented.
        module: The name of the module or library the code belongs to.

    Returns:
        The fully assembled documentation-writing prompt.
    """
    documentation = f"""
    Create multi-page, explicit, professional, PyTorch-like documentation for the {module} code below.
    Follow the outline for the {module} library: provide many examples, teach the user about the code,
    and include examples for every function. Make the documentation roughly 10,000 words. Note that these
    are Markdown docs: put the arguments and methods in Markdown tables so the layout is visually seamless.

    Now write the professional documentation for this code. Describe the architecture, how the class works,
    why it works that way, and its purpose. Provide the arguments, their types, and three usage examples;
    in the examples show all of the code (imports, main example, etc.).

    BE VERY EXPLICIT AND THOROUGH, MAKE IT DEEP AND USEFUL

    ######## INSTRUCTIONS ########

    Step 1: Understand the purpose and functionality of the module or framework
    - Read and analyze the description provided in the documentation to understand the purpose and functionality of the module or framework.
    - Identify the key features, parameters, and operations performed by the module or framework.

    Step 2: Provide an overview and introduction
    - Start the documentation by providing a brief overview and introduction to the module or framework.
    - Explain the importance and relevance of the module or framework in the context of the problem it solves.
    - Highlight any key concepts or terminology that will be used throughout the documentation.

    Step 3: Provide a class or function definition
    - Provide the class or function definition for the module or framework.
    - Include the parameters that need to be passed to the class or function and provide a brief description of each parameter.
    - Specify the data types and default values for each parameter.

    Step 4: Explain the functionality and usage
    - Provide a detailed explanation of how the module or framework works and what it does.
    - Describe the steps involved in using the module or framework, including any specific requirements or considerations.
    - Provide code examples to demonstrate the usage of the module or framework.
    - Explain the expected inputs and outputs for each operation or function.

    Step 5: Provide additional information and tips
    - Provide any additional information or tips that may be useful for using the module or framework effectively.
    - Address any common issues or challenges that developers may encounter and provide recommendations or workarounds.

    Step 6: Include references and resources
    - Include references to any external resources or research papers that provide further information or background on the module or framework.
    - Provide links to relevant documentation or websites for further exploration.

    Example template for the documentation:

    ################################### EXAMPLE #####################################

    # Module/Function Name: MultiheadAttention

    class torch.nn.MultiheadAttention(embed_dim, num_heads, dropout=0.0, bias=True, add_bias_kv=False, add_zero_attn=False, kdim=None, vdim=None, batch_first=False, device=None, dtype=None):
        ```
        Creates a multi-head attention module for joint information representation from the different subspaces.

        Parameters:
        - embed_dim (int): Total dimension of the model.
        - num_heads (int): Number of parallel attention heads. The embed_dim will be split across num_heads.
        - dropout (float): Dropout probability on attn_output_weights. Default: 0.0 (no dropout).
        - bias (bool): If specified, adds bias to input/output projection layers. Default: True.
        - add_bias_kv (bool): If specified, adds bias to the key and value sequences at dim=0. Default: False.
        - add_zero_attn (bool): If specified, adds a new batch of zeros to the key and value sequences at dim=1. Default: False.
        - kdim (int): Total number of features for keys. Default: None (uses kdim=embed_dim).
        - vdim (int): Total number of features for values. Default: None (uses vdim=embed_dim).
        - batch_first (bool): If True, the input and output tensors are provided as (batch, seq, feature). Default: False.
        - device (torch.device): If specified, the tensors will be moved to the specified device.
        - dtype (torch.dtype): If specified, the tensors will have the specified dtype.
        ```

        def forward(query, key, value, key_padding_mask=None, need_weights=True, attn_mask=None, average_attn_weights=True, is_causal=False):
        ```
        Forward pass of the multi-head attention module.

        Parameters:
        - query (Tensor): Query embeddings of shape (L, E_q) for unbatched input, (L, N, E_q) when batch_first=False, or (N, L, E_q) when batch_first=True.
        - key (Tensor): Key embeddings of shape (S, E_k) for unbatched input, (S, N, E_k) when batch_first=False, or (N, S, E_k) when batch_first=True.
        - value (Tensor): Value embeddings of shape (S, E_v) for unbatched input, (S, N, E_v) when batch_first=False, or (N, S, E_v) when batch_first=True.
        - key_padding_mask (Optional[Tensor]): If specified, a mask indicating elements to be ignored in key for attention computation.
        - need_weights (bool): If specified, returns attention weights in addition to attention outputs. Default: True.
        - attn_mask (Optional[Tensor]): If specified, a mask preventing attention to certain positions.
        - average_attn_weights (bool): If true, returns averaged attention weights per head. Otherwise, returns attention weights separately per head. Note that this flag only has an effect when need_weights=True. Default: True.
        - is_causal (bool): If specified, applies a causal mask as the attention mask. Default: False.

        Returns:
        Tuple[Tensor, Optional[Tensor]]:
        - attn_output (Tensor): Attention outputs of shape (L, E) for unbatched input, (L, N, E) when batch_first=False, or (N, L, E) when batch_first=True.
        - attn_output_weights (Optional[Tensor]): Attention weights of shape (L, S) when unbatched or (N, L, S) when batched. Optional, only returned when need_weights=True.
        ```

        # Implementation of the forward pass of the attention module goes here

        return attn_output, attn_output_weights

    # Usage example:
    multihead_attn = nn.MultiheadAttention(embed_dim, num_heads)
    attn_output, attn_output_weights = multihead_attn(query, key, value)

    Note: The above template includes the class or function definition, parameters, description, and usage example.
    To replicate the documentation for any other module or framework, follow the same structure and provide the specific details for that module or framework.

    ############# DOCUMENT THE FOLLOWING CODE ########
    {task}
    """
    return documentation
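

# Minimal usage sketch (illustrative assumption, not part of the original function):
# build the SOP prompt for a small code snippet, then hand the returned string to
# whatever LLM client you use. The example snippet and module name below are made up
# purely for demonstration.
if __name__ == "__main__":
    example_code = """
    import torch.nn as nn

    class FeedForward(nn.Module):
        def __init__(self, dim: int, hidden_dim: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim, hidden_dim),
                nn.GELU(),
                nn.Linear(hidden_dim, dim),
            )

        def forward(self, x):
            return self.net(x)
    """

    prompt = DOCUMENTATION_WRITER_SOP(task=example_code, module="FeedForward")
    # The returned string is the complete documentation-writing prompt; send it to an LLM.
    print(prompt)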