---
title: ReSym Space
emoji: 🐢
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 5.22.0
app_file: app.py
pinned: false
---

# ReSym Space

This is a space for testing the models from the [ReSym
artifacts](https://github.com/lt-asset/resym). Sadly, at the time of writing,
not all of ReSym is publicly available; specifically, the Prolog component
is [not available](https://github.com/lt-asset/resym/issues/2).

This space simply performs inference on the two pretrained models available as
part of the ReSym artifacts. It takes a variable name and some decompiled code
as input, and outputs the variable type and other information.

The examples are randomly selected from `vardecoder_test.jsonl`. As a result, the fields do not always parse correctly.

## Disclaimer

I'm not a ReSym developer and I may have messed something up. In particular,
the variable names that appear in the decompiled code must be listed in the
prompt, and I reused some of ReSym's own code to do this.

## Known Issues / Oddities | |
### `sub_40FD86`

We do not get the same results for `sub_40FD86`. In fact, we don't even create the same prompt. The prompt in `vardecoder_test.jsonl` is:

> What are the original name and data type of variables `v3`, `v4`, `v5`?

It's unclear why `a1`, `a2`, and `result` are not listed.

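To make the mismatch concrete, here is a minimal sketch of how such a prompt could be assembled. The question template is copied from the dataset prompt quoted above. The helper names `find_decompiler_vars` and `build_prompt`, the regular expression for decompiler-style placeholder names, and the sample code string are all my own assumptions for illustration, not ReSym's actual code.

```python
import re

# Hypothetical pattern for decompiler placeholder names (a1, v3, result).
# This is an illustrative assumption, not the rule ReSym uses.
DECOMPILER_VAR = re.compile(r"\b(?:a\d+|v\d+|result)\b")


def find_decompiler_vars(code: str) -> list[str]:
    """Return placeholder variable names in order of first appearance."""
    seen: dict[str, None] = {}
    for name in DECOMPILER_VAR.findall(code):
        seen.setdefault(name, None)
    return list(seen)


def build_prompt(variables: list[str]) -> str:
    """Format the question using the template seen in vardecoder_test.jsonl."""
    quoted = ", ".join(f"`{v}`" for v in variables)
    return f"What are the original name and data type of variables {quoted}?"


# Made-up snippet, not the real body of sub_40FD86.
code = (
    "__int64 sub_X(__int64 a1, int a2) "
    "{ __int64 result; int v3; v3 = a2; result = a1 + v3; return result; }"
)
found = find_decompiler_vars(code)
prompt = build_prompt(found)
print(prompt)
```

A collector like this one lists arguments and `result` alongside the `v` locals, which is the kind of difference from the dataset prompt described above.
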
### `first_token` weirdness | |
The [example inference
scripts](https://github.com/lt-asset/resym/blob/main/training_src/fielddecoder_inf.py)
take the first token of the expected output and append it to the prompt.
Technically this is data leakage, but since the first token is usually already
part of the prompt (a variable name or field expression), it's probably OK. It's
still pretty weird.

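The idea can be sketched roughly as follows. This is not the code from the linked script: splitting on a separator is only a stand-in for whatever tokenization the script really uses, and the function names and example strings are made up for illustration.

```python
def prime_with_first_token(prompt: str, expected_output: str, sep: str = ":") -> str:
    """Append the leading piece of the expected output to the prompt.

    Splitting on `sep` is an assumption made for illustration; the real
    script may derive the first token differently.
    """
    first_token = expected_output.split(sep, 1)[0]
    return prompt + first_token + sep


def leaks_new_information(prompt: str, expected_output: str, sep: str = ":") -> bool:
    """True if the first token is NOT already present in the prompt."""
    first_token = expected_output.split(sep, 1)[0]
    return first_token not in prompt


# Made-up strings, not taken from the dataset.
prompt = "Describe the field accessed by *(a1 + 8).\n"
expected = "*(a1 + 8): next, node*"

primed = prime_with_first_token(prompt, expected)
leaky = leaks_new_information(prompt, expected)
```

In this example `leaky` is `False` because the expression `*(a1 + 8)` already appears in the prompt, which is the situation where the priming is probably harmless.
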
### Indentation | |
Some decompilations in the dataset have whitespace for indentation included, and | |
some do not. | |
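If you want to compare an indented sample with an unindented one, one option is to strip leading whitespace from every line first. This is only a sketch; `strip_indentation` is a hypothetical helper, and I don't know which form the models handle better.

```python
def strip_indentation(code: str) -> str:
    """Remove leading whitespace from every line, keeping line order."""
    return "\n".join(line.lstrip() for line in code.splitlines())


indented = "int f() {\n    return 1;\n}"
flat = strip_indentation(indented)
```
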
### `field_access_driver` clang parser | |
ReSym uses a clang-based parsing tool to extract field accesses. The tool still
outputs the field accesses even if the code does not parse correctly. This
seems to be by design, so I am doing this too. Otherwise, most of the ReSym
examples would not work, because external functions and variables are not
properly declared.

Another oddity is that sometimes the field access driver will output a field | |
access expression of `""`. This appears to be a bug in the field access driver. | |
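A defensive filter for such entries might look like the sketch below. `drop_empty_accesses` is a hypothetical helper, not part of ReSym, and it assumes the driver's output has already been collected into a list of strings.

```python
def drop_empty_accesses(exprs: list[str]) -> list[str]:
    """Discard expressions that are empty or consist only of quote characters."""
    return [e for e in exprs if e.strip().strip('"').strip()]


# Made-up driver output for illustration.
raw = ["*(a1 + 8)", '""', "", "   "]
cleaned = drop_empty_accesses(raw)
```
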
### Other | |
* ReSym's parser fails for functions with a non-automatic name | |
## Todo | |
* Test field decoding more | |