Performance on Apple M3 Chips (MPS)
Hi all,
I am experimenting with running the model on larger sets of data (e.g. >50 MB of JSON text) on an M3 Max.
The MPS backend does not seem to cope with such a load. I was under the impression that I would not need manual chunking, but my next approach would be to chunk the input, create embeddings, select the relevant chunks via ANN search, and then run the extraction model only on that subset, roughly like the sketch below.
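A minimal sketch of what I have in mind (assuming sentence-transformers and faiss-cpu; the embedding model name is just an example, nothing here is part of NuExtract):

```python
import faiss
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # example model, not prescribed

def top_k_chunks(chunks: list[str], query: str, k: int = 5) -> list[str]:
    """Embed all chunks, index them, and return the k most similar to the query
    (the query could simply be the extraction template's field descriptions)."""
    vecs = embedder.encode(chunks, normalize_embeddings=True)
    index = faiss.IndexFlatIP(vecs.shape[1])  # exact cosine here; swap in an
    index.add(vecs)                           # IVF/HNSW index for true ANN at scale
    q = embedder.encode([query], normalize_embeddings=True)
    _, ids = index.search(q, min(k, len(chunks)))
    return [chunks[i] for i in ids[0]]
```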
Am I on the right path? Any hints are welcome! :-)
I'm not sure I totally understand the problem. The bottleneck should come from the length of the individual input sequences.
Can you provide some more details about what your input looks like?
So my raw input is a set of JSON files with textual data. I chunk them with size 512 and overlap 128 (sketch below).
The sizes I get are anywhere between 100 and 100,000 chunks. So assuming I select only the top 5 chunks as input to the model, we are talking about 1,000-2,500 tokens on average.
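For concreteness, the chunking is just a sliding window over the token sequence (a sketch; the tokenizer is whatever you use upstream):

```python
def chunk_tokens(tokens: list[str], size: int = 512, overlap: int = 128) -> list[list[str]]:
    """Sliding window: `size`-token chunks that overlap by `overlap` tokens."""
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), step)]
```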
What is the task? Can each file be processed individually?
Yes, there will be one extraction template for each JSON file.
At the moment I am "falling back" to a local RAG approach using Ollama and Mixtral -- maybe I am using NuExtract in a way that was not intended?
Ok, and so each individual JSON file is at least 100*512 tokens in length? Or are you merging multiple JSON files?
Yes, correct: each JSON file can be anywhere between 100 KB and 100 MB in pure text size. I do not merge JSON files, but I do extract the plain-text portions before handing them to the model, so no JSON syntax is passed in (sketch below).
Also, as described above, I am chunking and selecting the top 7 chunks for input.
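The pre-extraction step is basically this (a sketch; "doc.json" is a placeholder path):

```python
import json

def text_values(node) -> list[str]:
    """Recursively collect string values from parsed JSON, dropping keys and syntax."""
    if isinstance(node, str):
        return [node]
    if isinstance(node, dict):
        return [t for v in node.values() for t in text_values(v)]
    if isinstance(node, list):
        return [t for v in node for t in text_values(v)]
    return []  # numbers, booleans, nulls are not "pure text"

with open("doc.json") as f:  # placeholder filename
    raw_text = "\n".join(text_values(json.load(f)))
```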
Ok, I see. Yeah, those are really long texts, so you will need some sort of chunking or retrieval. It's hard to say what the best approach is without understanding the specific problem better.
You can try our continuation example if you haven't already, but I suspect you could get a lot of error propagation since you will have a huge number of chunks. Probably some sort of retrieval step like you suggest will be best, assuming only specific parts of each document will be relevant to the extraction and you don't actually need the full context.
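For reference, the continuation idea amounts to threading each chunk's output into the next call, roughly like this (`extract()` is a hypothetical stand-in, not the actual NuExtract API):

```python
def extract_over_chunks(chunks: list[str], template: str, extract) -> str:
    """Run extraction chunk by chunk, seeding each call with the previous output.
    Mistakes in early chunks carry forward -- the error propagation mentioned above."""
    result = "{}"  # an empty extraction seeds the first chunk
    for chunk in chunks:
        result = extract(text=chunk, template=template, previous=result)
    return result
```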
Agreed! I will go with the chunking approach and see how far I can get. :-) The output was quite good on a smaller example, so now I need to find a way to speed things up.
I will also compare the output on my specific prompt and data against Mixtral on Ollama.
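For that comparison I will just hit Ollama's local REST API (a sketch; the prompt contents are a placeholder):

```python
import requests

prompt = "<extraction template + retrieved chunks go here>"  # placeholder

# Ollama listens on localhost:11434 by default; stream=False returns one JSON blob.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mixtral", "prompt": prompt, "stream": False},
)
print(resp.json()["response"])
```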