matlok's Collections
Papers - University - Hong Kong University of Science and Te
Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss
Paper • 2404.02731 • Published • 1
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Paper • 2309.12284 • Published • 19
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper • 2404.03204 • Published • 7
Adapting LLaMA Decoder to Vision Transformer
Paper • 2404.06773 • Published • 17
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Paper • 2404.07972 • Published • 46
RegionGPT: Towards Region Understanding Vision Language Model
Paper • 2403.02330 • Published • 2
Dynamic Typography: Bringing Words to Life
Paper • 2404.11614 • Published • 44
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
Paper • 2404.14047 • Published • 45
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Paper • 2404.14700 • Published • 29
Interactive3D: Create What You Want by Interactive 3D Generation
Paper • 2404.16510 • Published • 18
LLaVA-OneVision: Easy Visual Task Transfer
Paper • 2408.03326 • Published • 60