Neural Network Optimization Engineer
Agency / HR resource: NEWHR (new.hr)
Work experience: 3 to 5 years
About the company
The company is developing a next-generation AI tool tailored for professional designers, illustrators, and marketers, setting a new standard in image generation. Its platform empowers creators to quickly produce and iterate on original images, vector art, illustrations, icons, and 3D graphics using AI. Trusted by industry leaders like Netflix, Asana, and Airbus, the platform is used by over 3 million users in 200+ countries, with more than 350 million images created to date.
The company’s mission is to become an essential, daily tool for every designer, giving creators full control over their creative process. It is focused on delivering innovative features that support large-scale creative projects and redefine the future of design with AI.
About the role
The company is looking for a Neural Network Optimization Engineer to improve the speed, efficiency, and processing capacity of neural network inference pipelines. The right candidate will have practical experience accelerating inference with tools like TensorRT and Triton and applying quantization techniques. You’ll work closely with ML teams to ensure our models deliver top performance and stability when deployed in real-world systems.
What you'll do
- Enhance neural network inference by reducing latency and boosting throughput.
- Apply quantization strategies (such as INT8 and FP8) to make more efficient use of computational resources.
- Conduct benchmarking and profiling to identify bottlenecks and improve execution on specific hardware.
- Partner with ML researchers to implement and maintain efficient models in production.
- Keep abreast of new advancements in inference acceleration, quantization methods, and related software frameworks.
Who we're looking for
- Proven experience in tuning neural network inference workloads for high performance.
- Proficiency with TensorRT, Triton Inference Server, and CUDA programming.
- Hands-on knowledge of model quantization techniques.
- Strong coding skills in Python and PyTorch.
- In-depth understanding of GPU architecture and optimization principles.
- Sharp analytical abilities to diagnose and resolve performance issues effectively.
What we offer
- Full-time, office-based position in London.
- Relocation support available for non-UK candidates.
- Competitive salary.
- Flexible working hours.
- Paid vacation (26 working days per year).
- Opportunities for professional growth and development.
- A supportive and collaborative work environment.
- The chance to work on exciting projects and make a significant impact on our brand.