24h
Ameridial
2.9
AI QA Trainer - LLM Evaluation - Freelance Project
United States
$6.00 - $65.00 Per Hour (Employer provided)
Are you an AI QA expert eager to shape the future of AI? Large-scale language models are evolving from clever chatbots into enterprise-grade platforms. With rigorous evaluation data, tomorrow's AI can democratize world-class education, keep pace with cutting-edge research, and streamline workflows for teams everywhere. That quality begins with you—we need your expertise to harden model reasoning and reliability.
We're looking for AI QA trainers who live and breathe model evaluation, LLM safety, prompt robustness, data quality assurance, multilingual and domain-specific testing, grounding verification, and compliance/readiness checks. You'll challenge advanced language models on tasks like hallucination detection, factual consistency, prompt-injection and jailbreak resistance, bias/fairness audits, chain-of-reasoning reliability, tool-use correctness, retrieval-augmentation fidelity, and end-to-end workflow validation—documenting every failure mode so we can raise the bar.
On a typical day, you will converse with the model on real-world scenarios and evaluation prompts, verify factual accuracy and logical soundness, design and run test plans and regression suites, build clear rubrics and pass/fail criteria, capture reproducible error traces with root-cause hypotheses, and suggest improvements to prompt engineering, guardrails, and evaluation metrics (e.g., precision/recall, faithfulness, toxicity, and latency SLOs). You'll also partner on adversarial red-teaming, automation (Python/SQL), and dashboarding to track quality deltas over time.
A bachelor's, master's, or PhD in computer science, data science, computational linguistics, statistics, or a related field is ideal. Experience shipping QA for ML/AI systems, safety or red-team experience, familiarity with test automation frameworks (e.g., PyTest), and hands-on work with LLM eval tooling (e.g., OpenAI Evals, RAG evaluators, W&B) all signal fit. Skills that stand out include evaluation rubric design, adversarial testing/red-teaming, regression testing at scale, bias/fairness auditing, grounding verification, prompt and system-prompt engineering, test automation (Python/SQL), and high-signal bug reporting. Clear, metacognitive communication ("showing your work") is essential.
Ready to turn your QA expertise into the quality backbone for tomorrow's AI? Apply today and start teaching the model that will teach the world.
We offer a pay range of $6 to $65 per hour, with the exact rate determined after evaluating your experience, expertise, and geographic location. Final offer amounts may vary from the pay range listed above. As a contractor, you'll supply a secure computer and high-speed internet; company-sponsored benefits such as health insurance and PTO do not apply.
Median: $35.50/hr