Simeon Emanuilov PRO

s-emanuilov

https://unfoldai.com/

AI & ML interests

Software Engineer, PhD | Building production ML/DL systems and AI tools

Recent Activity

liked a model about 1 month ago

microsoft/Fara-7B

posted an update about 2 months ago

Converted PaddleOCR models to ONNX for easier deployment and faster inference. These have been working well in production at Monkt.com, so I figured I'd share them with the community. They are straight conversions of the original models and might save you some time if you're building OCR pipelines. https://huggingface.co/monkt/paddleocr-onnx

commented on a paper about 2 months ago

Stemming Hallucination in Language Models Using a Licensing Oracle

upvoted an article 3 months ago

SOTA OCR with Core ML and dots.ocr

Article • Oct 2, 2025 • 62

upvoted a collection 3 months ago

PP-OCRv5

PP-OCRv5 is the latest text recognition solution, supporting Simplified Chinese, Chinese Pinyin, Traditional Chinese, English, and Japanese • 13 items • Updated Sep 15, 2025 • 50

upvoted an article 4 months ago

mmBERT: ModernBERT goes Multilingual

Article • Sep 9, 2025 • 133

upvoted 2 papers 4 months ago

Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth

Paper • 2509.03867 • Published Sep 4, 2025 • 210

MMTEB: Massive Multilingual Text Embedding Benchmark

Paper • 2502.13595 • Published Feb 19, 2025 • 43

upvoted a collection 4 months ago

EmbeddingGemma

3 items • Updated Sep 11, 2025 • 105

upvoted an article 4 months ago

Welcome EmbeddingGemma, Google's new efficient embedding model

Article • Sep 4, 2025 • 267

upvoted a collection 6 months ago

Health AI Developer Foundations (HAI-DEF)

Groups models released for use in health AI by Google. Read more about HAI-DEF at http://goo.gle/hai-def • 16 items • Updated 17 days ago • 140

upvoted a collection 7 months ago

Tucan

A series of open-source Bulgarian language models fine-tuned for function calling and tool use. 2.6B, 9B, and 27B parameter variants. • 12 items • Updated Jul 1, 2025 • 1

upvoted an article 8 months ago

Train 400x faster Static Embedding Models with Sentence Transformers

Article • Jan 15, 2025 • 222

upvoted a paper 9 months ago

CoLLM: A Large Language Model for Composed Image Retrieval

Paper • 2503.19910 • Published Mar 25, 2025 • 15

upvoted 2 articles 9 months ago

Training and Finetuning Embedding Models with Sentence Transformers v3

Article • May 28, 2024 • 263

Training and Finetuning Reranker Models with Sentence Transformers v4

Article • Mar 26, 2025 • 177

upvoted a paper 10 months ago

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14, 2025 • 128

upvoted 3 articles 11 months ago

PaliGemma 2 Mix - New Instruction Vision Language Models by Google

Article • Feb 19, 2025 • 74

SigLIP 2: A better multilingual vision language encoder

Article • Feb 21, 2025 • 193

Merge Large Language Models with mergekit

Article • Jan 9, 2024 • 147

upvoted 3 papers 11 months ago

Fast Video Generation with Sliding Tile Attention

Paper • 2502.04507 • Published Feb 6, 2025 • 51

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 138

Executable Code Actions Elicit Better LLM Agents

Paper • 2402.01030 • Published Feb 1, 2024 • 184