We use the same model from Microsoft: microsoft/Phi-3-mini-4k-instruct-gguf.
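The session log below was produced with llama.cpp's interactive `main` example. A command along these lines reproduces the settings shown in the log (interactive mode, reverse prompt `User:`, the chat-with-bob prompt that ships with llama.cpp, and the sampling values printed below); the GGUF filename is the q4 file from the microsoft/Phi-3-mini-4k-instruct-gguf repository, so adjust the path to wherever your copy lives:

```shell
# Sketch of the invocation behind the log below (llama.cpp "main" example).
# -i enables interactive mode, -r sets the reverse prompt, -f loads the
# Bob transcript used as the initial prompt.
./main -m ./Phi-3-mini-4k-instruct-q4.gguf \
    -c 4096 -n 256 \
    --temp 0.8 --top-k 40 --top-p 0.95 --min-p 0.05 \
    -i -r "User:" -f prompts/chat-with-bob.txt
```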


system_info: n_threads = 4 / 8 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | 
main: interactive mode on.
Reverse prompt: 'User:'
sampling: 
    repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
    top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
    mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order: 
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature 
generate: n_ctx = 4096, n_batch = 2048, n_predict = 256, n_keep = 1
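The "sampling order" line above lists the pipeline llama.cpp applies to the logits before each token is drawn. At the values printed here, penalties (repeat_penalty = 1.0), tfs_z (1.0), and typical_p (1.0) are effective no-ops, so the active stages are top_k, top_p, min_p, and temperature. A minimal pure-Python sketch of those four stages (illustrative only, not llama.cpp's actual implementation):

```python
import math
import random

def sample(logits, top_k=40, top_p=0.95, min_p=0.05, temp=0.8, rng=random):
    """Draw one token id from raw logits, mimicking the active stages of
    the sampling order above: top_k -> top_p -> min_p -> temperature."""
    def softmax(vals):
        m = max(vals)
        e = [math.exp(v - m) for v in vals]
        s = sum(e)
        return [x / s for x in e]

    # top_k: keep only the k highest-logit token ids.
    ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    ids = ids[:top_k]

    # top_p (nucleus): smallest prefix whose cumulative probability >= top_p.
    probs = softmax([logits[i] for i in ids])
    kept, cum = [], 0.0
    for i, p in zip(ids, probs):
        kept.append(i)
        cum += p
        if cum >= top_p:
            break
    ids = kept

    # min_p: drop tokens whose probability is below min_p times the best one.
    probs = softmax([logits[i] for i in ids])
    ids = [i for i, p in zip(ids, probs) if p >= min_p * probs[0]]

    # Temperature is applied last in this order; then draw one token id.
    final = softmax([logits[i] / temp for i in ids])
    r = rng.random()
    for i, p in zip(ids, final):
        r -= p
        if r <= 0:
            return i
    return ids[-1]
```

With a strongly peaked distribution the truncation stages leave a single candidate, so the draw is deterministic; with near-ties the temperature stage decides between the survivors.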


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to the AI.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision.

User: Hello, Bob.
Bob: Hello. How may I help you today?
User: Please tell me the largest city in Europe.
Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
User: What is the largest city in Australia?
Bob: The largest city in Australia is Sydney.
User: What is the largest city in US?
Bob: The largest city in the United States by population is New York City.
User: thanks
Bob: You're welcome! If you have any more questions, feel free to ask.

Here's a transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User's requests immediately and with precision. Additionally, Bob is proficient in providing detailed historical and cultural contexts for the information he provides.

User:

llama_print_timings:        load time =     833.57 ms
llama_print_timings:      sample time =       3.84 ms /   127 runs   (    0.03 ms per token, 33055.70 tokens per second)
llama_print_timings: prompt eval time =   14225.02 ms /   121 tokens (  117.56 ms per token,     8.51 tokens per second)
llama_print_timings:        eval time =    9098.34 ms /   124 runs   (   73.37 ms per token,    13.63 tokens per second)
llama_print_timings:       total time =   70052.27 ms /   245 tokens
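The per-token figures in the timing block are just the raw counters divided out, so they can be sanity-checked by hand. A quick sketch, using the eval-time numbers from the log above:

```python
def throughput(total_ms, n_tokens):
    """Recompute llama.cpp's reported per-token latency and throughput
    from its raw counters (total milliseconds, number of tokens)."""
    ms_per_token = total_ms / n_tokens
    tokens_per_second = 1000.0 * n_tokens / total_ms
    return ms_per_token, tokens_per_second

# The "eval time" line above: 9098.34 ms over 124 runs.
ms_tok, tok_s = throughput(9098.34, 124)
print(f"{ms_tok:.2f} ms per token, {tok_s:.2f} tokens per second")
# -> 73.37 ms per token, 13.63 tokens per second (matches the log)
```

Note that prompt eval (8.51 tokens/s) is much slower per token here than generation (13.63 tokens/s) because the prompt is processed in a single large batch on first run, while the per-token eval numbers benefit from the already-populated KV cache.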