AI-Powered Speech Recognition App for Sports Coaches

Industry: Wellness and Sports
Technologies: AI, Android, AWS, iOS, React Native, Cloud

About Our Client

The Client is a sports technology startup.

Replacing Manual Stat Tracking With AI-Powered Voice Capture

As a startup led by a former amateur sports coach, the Client wanted to create a mobile app that would help coaches in the same field. During games and practices, coaches often record voice commentary on player actions, such as goals, assists, and fouls, using a standard voice recorder. After the event, they have to replay the recordings and spend additional time entering player statistics manually into their tracking software or writing them down.

The Client wanted to eliminate this inconvenient and time-consuming process by creating a mobile app that would not only record a coach’s commentary but also automatically extract player statistics from it, save them in a structured format, and make them available for review right after the event. To bring this idea to life, the Client turned to ScienceSoft.

AI-Powered Cross-Platform Mobile App for Sports Notation

ScienceSoft delivered an MVP of an AI-powered cross-platform mobile application for iOS and Android that turns coaches’ live voice commentary into structured player statistics. The MVP also supports in-app subscription purchases, enabling the Client to monetize it from the outset.

Using the app, coaches can set up the team roster with players’ names and positions, define the parameters they want to track, and record a voice sample so that the app can recognize their voice. After that, they can create an event, such as a game or practice, and use the built-in voice recorder to capture continuous commentary on player actions and achievements (e.g., goals, fouls, and assists). The AI pipeline recognizes the coach’s speech, converts it to text, extracts the metrics mentioned, and matches them to players on the team roster, allowing coaches to review each player’s performance after the event.

Once the coach starts recording, the system processes the captured audio through a multi-step AI pipeline that includes:

  • Voice activity detection to identify segments containing speech and remove silence, pauses, and other non-speech audio typical of live sports environments before further processing.
  • Speaker diarization to separate audio by speaker.
  • Speaker identification to detect the coach’s voice based on a previously recorded sample and filter out other speakers.
  • Automatic speech recognition (ASR) to convert speech into text.
  • Metric extraction to identify player actions and performance metrics in the recognized text.
  • Name matching to map extracted names to players on the team roster based on lexical and phonetic similarity.
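
The name matching step above can be illustrated with a minimal Python sketch. It is not ScienceSoft's implementation: it assumes a simplified Soundex code as the phonetic signal, Python's standard `difflib.SequenceMatcher` as the lexical signal, and an equal weighting of the two. The function names, the 0.6 threshold, and the sample roster are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional

# Standard American Soundex consonant groups (letter -> digit).
_CODES = {c: d for d, letters in {
    "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
}.items() for c in letters}


def soundex(name: str) -> str:
    """Return a 4-character Soundex code (one letter plus three digits)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    digits = []
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        if c not in "hw":  # h and w do not separate letters with equal codes
            prev = code
    return (letters[0].upper() + "".join(digits) + "000")[:4]


def match_player(heard: str, roster: List[str], threshold: float = 0.6) -> Optional[str]:
    """Return the roster name that best matches the recognized name, or None."""
    heard_codes = [soundex(t) for t in heard.split()]
    if not heard_codes:
        return None
    best_name, best_score = None, 0.0
    for player in roster:
        # Lexical similarity: character-level ratio between the two strings.
        lexical = SequenceMatcher(None, heard.lower(), player.lower()).ratio()
        # Phonetic similarity: share of heard tokens whose code is on this player.
        player_codes = {soundex(t) for t in player.split()}
        phonetic = sum(c in player_codes for c in heard_codes) / len(heard_codes)
        score = 0.5 * lexical + 0.5 * phonetic  # assumed equal weighting
        if score > best_score:
            best_name, best_score = player, score
    return best_name if best_score >= threshold else None


roster = ["Jackson Meyer", "Liam O'Brien", "Sofia Katz"]
print(match_player("Jaxon Mayer", roster))
```

In this sketch, a misrecognized spelling such as "Jaxon Mayer" still resolves to "Jackson Meyer" because both tokens share Soundex codes with the roster entry, while a word that is not a name at all falls below the threshold and is left unmatched.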

For automatic speech recognition, ScienceSoft used a server-side deployment of the OpenAI Whisper v3 model hosted in the Client’s own environment.

Vosk was also considered, but Whisper delivered better accuracy on game-specific audio data provided by the Client. Its multilingual support and built-in confidence indicators for recognized words also influenced the final choice, as they helped improve the recognition of player names and game-specific terms.

For extracting metrics from transcribed speech, ScienceSoft used Google Gemma 2 2B, a compact language model with about 2 billion parameters. Its size is sufficient for the required tasks and helps reduce inference costs, speed up processing, and lower infrastructure demands. To improve the extraction of player names and metrics, ScienceSoft’s data science team tested two approaches: optimizing the handling of low-confidence word predictions and fine-tuning the model on game-specific audio materials provided by the Client. The first approach delivered better results, helping achieve 97% accuracy in extracting player names and metrics during internal testing.
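
The case study does not describe how low-confidence words were handled, so the following Python sketch shows only one possible approach. It assumes the ASR step yields a list of words with a `probability` field (modeled on the word-level output of open-source Whisper when word timestamps are enabled), marks uncertain words, and passes the roster to the language model so it can resolve them. The marker syntax, the 0.5 threshold, and the prompt wording are hypothetical.

```python
from typing import Dict, List


def mark_uncertain(words: List[Dict], threshold: float = 0.5) -> str:
    """Join recognized words, wrapping low-confidence ones in a [?...] marker."""
    parts = []
    for w in words:
        token = w["word"].strip()
        if w["probability"] < threshold:
            token = f"[?{token}]"  # hypothetical marker for uncertain words
        parts.append(token)
    return " ".join(parts)


def build_prompt(words: List[Dict], roster: List[str], metrics: List[str]) -> str:
    """Build an extraction prompt that tells the model which words to double-check."""
    return (
        "Extract player actions from the coach's commentary.\n"
        f"Roster: {', '.join(roster)}\n"
        f"Tracked metrics: {', '.join(metrics)}\n"
        "Words marked [?like this] may be misrecognized; "
        "prefer the closest roster name or tracked metric.\n"
        f"Commentary: {mark_uncertain(words)}"
    )


# Sample input in the assumed word-level shape.
words = [
    {"word": " Jaxon", "probability": 0.31},
    {"word": " scores", "probability": 0.93},
    {"word": " a", "probability": 0.99},
    {"word": " goal", "probability": 0.97},
]
print(build_prompt(words, ["Jackson Meyer", "Sofia Katz"], ["goal", "assist", "foul"]))
```

The resulting prompt string would then be sent to the extraction model. Keeping the uncertainty marker in the text, rather than dropping the word, lets the model use the surrounding context to decide which player was meant.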

The solution’s AI pipeline is designed for continuous improvement. Any user corrections to recognized metrics are stored in a separate database table, creating a dataset that can later be used to identify pipeline inaccuracies and further improve the model quality through organized user feedback.
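
As an illustration of how such a feedback dataset might be used, the Python sketch below defines a correction record and counts the most frequent confusions. The field names and the `top_confusions` helper are hypothetical and do not reflect the actual table schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Correction:
    """One user correction to a recognized value (hypothetical schema)."""
    event_id: str
    field: str       # e.g. "player" or "metric"
    predicted: str   # what the pipeline produced
    corrected: str   # what the coach changed it to


def top_confusions(corrections: List[Correction], n: int = 3) -> List[Tuple[Tuple[str, str, str], int]]:
    """Return the n most frequent (field, predicted, corrected) triples."""
    counts = Counter((c.field, c.predicted, c.corrected) for c in corrections)
    return counts.most_common(n)


log = [
    Correction("game-1", "player", "Sofia Cats", "Sofia Katz"),
    Correction("game-2", "player", "Sofia Cats", "Sofia Katz"),
    Correction("game-2", "metric", "file", "foul"),
]
print(top_confusions(log, n=1))
```

Recurring triples point to systematic errors, such as a player name that is consistently misrecognized, and can be prioritized when tuning the pipeline.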

Because app usage varied significantly depending on when coaches recorded events, ScienceSoft implemented a serverless architecture based on AWS Lambda for the application’s back end. This allows the back end to scale down to zero during idle periods and rapidly scale up during peak activity. ScienceSoft deployed the AI inference layer separately from the main app because it requires more powerful servers for model processing. This also makes it easier to update the models or switch to more cost-effective infrastructure later without rebuilding the whole application.
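
The separation between the serverless back end and the inference layer can be sketched as a Lambda handler that turns uploaded recordings into queue messages for the model servers. This is an assumption-based illustration: the case study lists AWS Lambda, S3, and SQS but does not say how they are wired together. The S3-triggered flow, the message fields, and the injectable `send` callable are hypothetical.

```python
import json
from typing import Callable, Dict, List
from urllib.parse import unquote_plus


def build_jobs(event: Dict) -> List[Dict]:
    """Turn an S3 event notification into one inference job per uploaded object."""
    jobs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        jobs.append({
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 event notifications.
            "key": unquote_plus(s3["object"]["key"]),
        })
    return jobs


def handler(event: Dict, context: object, send: Callable[[str], None] = print) -> Dict:
    """Lambda entry point: enqueue each recording for the inference layer.

    `send` is a stand-in for the queue client. In a real deployment it would
    wrap a call such as boto3's SQS send_message(QueueUrl=..., MessageBody=...).
    """
    jobs = build_jobs(event)
    for job in jobs:
        send(json.dumps(job))
    return {"queued": len(jobs)}


# Local usage with a fake event and an in-memory queue.
sent: List[str] = []
event = {"Records": [{"s3": {"bucket": {"name": "recordings"},
                             "object": {"key": "coach-7/game+1.m4a"}}}]}
result = handler(event, None, send=sent.append)
print(result, sent)
```

Because the handler only enqueues work, it stays short-lived and inexpensive, while the GPU-backed workers that consume the queue can be sized, updated, or replaced independently.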

ScienceSoft also implemented Infrastructure as Code (IaC) to simplify deployment and support the Client’s future application management. To support secure and maintainable operation, ScienceSoft followed AWS best practices, using IAM-based access control, JWT authentication, RBAC, and logging of application activity and access events.
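
A role-based access check on JWT claims could look like the Python sketch below. It assumes the token has already been verified (signature, expiry, audience) and that roles come from the `cognito:groups` claim that Amazon Cognito adds to tokens for users in groups. The role names and permission strings are hypothetical.

```python
from typing import Dict, Set

# Hypothetical role-to-permission mapping.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "coach": {"event:create", "event:read", "stats:correct"},
    "assistant": {"event:read"},
}


def is_allowed(claims: Dict, permission: str) -> bool:
    """Check a permission against the roles in already-verified JWT claims."""
    groups = claims.get("cognito:groups", [])
    return any(permission in ROLE_PERMISSIONS.get(g, set()) for g in groups)


claims = {"sub": "user-123", "cognito:groups": ["assistant"]}
print(is_allowed(claims, "event:read"), is_allowed(claims, "stats:correct"))
```

Unknown roles and tokens without a group claim are denied by default, which keeps the check safe if new groups are added before the mapping is updated.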

Market-Ready AI-Powered Sports Coaching App Built for Growth

The Client received an iOS and Android MVP of its innovative speech recognition app for sports coaches. The application’s AI architecture was designed to balance accuracy, speed, and operating costs while remaining flexible for future improvement. It also established a practical foundation for scaling the product and continuously enhancing model quality based on user feedback. The MVP can be commercialized immediately via app stores and used to attract investment for further growth.

The Client is now gathering live user feedback and assessing revenue performance. Based on the results, the Client plans to continue cooperation with ScienceSoft on the next development stage, enhancing the app’s functionality to turn it into a full-fledged platform for managing amateur sports teams. Future releases are planned to add features such as game and practice scheduling, match streaming, and messaging, as well as support for a wider range of sports disciplines to reduce seasonality and expand market reach.

The Client gave positive feedback on the app’s clean design and intuitive user experience, the robustness of the solution’s architecture, and the team’s quality of work and delivery speed.

Technologies and Tools

React Native, Expo, AWS (AWS Lambda, Amazon EC2, Amazon DynamoDB, Amazon SQS, Amazon Cognito, AWS Amplify, Amazon S3), OpenAI Whisper, Google Gemma 2 2B, pyannote, SpeechBrain, vLLM.
