Data Scraping Solution to Let a Hedge Fund Automatically Detect Market News Updates

Industry
BFSI, Investment
Technologies
Selenium, .NET, MS SQL Server

About Our Client

The Client is a US investment firm with an international portfolio of about 150 companies. It operates across multiple industries, including healthcare and manufacturing.

Manual Analysis of Market News Hinders Timely Decision-Making

To make informed investment decisions, the Client analyzed financial market news every day across dozens of online sources, including the websites of financial news providers, market research firms, and regulatory bodies. The process was fully manual and time-consuming, which made it hard to keep pace with the market and could lead to missed opportunities. The Client envisioned a web scraping solution to simplify the process: it would automatically gather links containing new information about the target companies and present this selection to users for analysis. The Client's in-house IT team lacked the required expertise, and an outsourced team hired earlier had failed to deliver what the Client expected. The Client turned to ScienceSoft, trusting our experience in data management and analytics.

Building a Solution to Automate News Screening

First, ScienceSoft’s team analyzed the news sources the Client needed to screen. Then, they elicited data analysis and presentation requirements for the future solution, including data sorting and filtering capabilities, such as including or excluding specific sources or setting search periods.

Based on the analysis, we delivered a data scraping and analytics solution that supports the following flows and operations:

News screening requests

Using an Angular-based UI, users can start a screening session and specify the exact companies and institutions they are interested in, as well as the required search period.
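The request parameters described above (target companies and institutions, a search period), together with the source filters mentioned in the requirements (including or excluding specific sources), can be modeled as a small value object. The Python sketch below is purely illustrative: the `ScreeningRequest` and `NewsItem` classes and all their field names are hypothetical and do not reflect the Client's actual schema or UI code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NewsItem:
    """One scraped link (hypothetical shape, not the Client's schema)."""
    url: str
    source: str
    company: str
    published: date


@dataclass(frozen=True)
class ScreeningRequest:
    """Parameters a user submits to start a screening session (hypothetical)."""
    companies: frozenset[str]
    start: date
    end: date
    include_sources: frozenset[str] = frozenset()  # empty means "all sources"
    exclude_sources: frozenset[str] = frozenset()

    def __post_init__(self) -> None:
        # Reject requests that could never match anything.
        if not self.companies:
            raise ValueError("at least one company or institution is required")
        if self.start > self.end:
            raise ValueError("search period start must not be after its end")

    def matches(self, item: NewsItem) -> bool:
        """True if the item concerns a target company, falls inside the
        search period (both ends inclusive), and passes the source filters."""
        if item.company not in self.companies:
            return False
        if not (self.start <= item.published <= self.end):
            return False
        if self.include_sources and item.source not in self.include_sources:
            return False
        return item.source not in self.exclude_sources
```

Making the request immutable (`frozen=True`) and validating it once in `__post_init__` means every downstream filter can trust that the search period and company list are well-formed.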

Data scraping

Each website is crawled by a dedicated Selenium script that ScienceSoft created according to the specifics of that website: its structure (e.g., URL patterns, sitemaps), robots.txt file, HTML layout, login requirements, session timeouts and restrictions, and other parameters. The script imitates the behavior of a human navigating the website (e.g., logging in, clicking links) while gathering information. Selenium Grid runs the scripts for multiple websites simultaneously, which produces results much faster than screening the websites one by one. ScienceSoft also integrated the solution with a third-party service that enables CAPTCHA solving within 30 seconds. The service is available 24/7, which helps keep data scraping from being interrupted by CAPTCHA challenges.
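To illustrate two ideas from this paragraph, honoring robots.txt rules and screening several websites in parallel, here is a minimal Python sketch. It assumes the robots.txt body has already been downloaded, uses the standard library's `urllib.robotparser`, and substitutes a local `ThreadPoolExecutor` for Selenium Grid. The `Scraper` signature and the `run_scrapers` helper are hypothetical stand-ins for the per-site Selenium scripts, and CAPTCHA handling is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable
from urllib.robotparser import RobotFileParser

# Hypothetical signature for a per-site scraper: takes a start URL and
# returns the links it found. In the case study each site has its own
# Selenium script; here any callable with this shape will do.
Scraper = Callable[[str], list[str]]


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against the body of an already-downloaded robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def run_scrapers(
    jobs: dict[str, tuple[str, str, Scraper]],
    max_workers: int = 4,
) -> dict[str, list[str]]:
    """Run one scraper per site concurrently (a local stand-in for Selenium Grid).

    `jobs` maps a site name to (robots.txt body, start URL, scraper).
    Sites whose robots.txt disallows the start URL get an empty result.
    """
    results: dict[str, list[str]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {}
        for site, (robots_txt, start_url, scraper) in jobs.items():
            if allowed_by_robots(robots_txt, start_url):
                futures[site] = pool.submit(scraper, start_url)
            else:
                results[site] = []
        # Collect results; future.result() re-raises any scraper exception.
        for site, future in futures.items():
            results[site] = future.result()
    return results
```

Checking robots.txt before submitting a job keeps disallowed sites from ever reaching a worker; in a real Selenium Grid setup, each worker would instead open a remote browser session on a grid node.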

Data accumulation, analysis, and presentation

The gathered data is loaded into a Microsoft SQL Server database used by the .NET-based backend. The backend analyzes the data to filter out news that the user has already seen. In the UI, users can see the number of news items related to each company or institution they are interested in, click any news item to view its brief summary, and navigate to the news source.
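One possible way to filter out already-seen news and count new items per company is a `NOT EXISTS` query against a table recording which items each user has opened. This sketch uses an in-memory SQLite database in place of Microsoft SQL Server so it runs anywhere; the `news` and `seen` tables, their columns, and the helper functions are hypothetical, and `INSERT OR IGNORE` is SQLite syntax that would need a T-SQL equivalent.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for Microsoft SQL Server.
SCHEMA = """
CREATE TABLE news (
    id      INTEGER PRIMARY KEY,
    company TEXT NOT NULL,
    url     TEXT NOT NULL UNIQUE,
    summary TEXT
);
CREATE TABLE seen (
    user_id TEXT NOT NULL,
    news_id INTEGER NOT NULL REFERENCES news(id),
    PRIMARY KEY (user_id, news_id)
);
"""

# Count, per company, the news items this user has not opened yet.
UNSEEN_COUNTS = """
SELECT n.company, COUNT(*) AS new_items
FROM news AS n
WHERE NOT EXISTS (
    SELECT 1 FROM seen AS s
    WHERE s.news_id = n.id AND s.user_id = ?
)
GROUP BY n.company
ORDER BY n.company
"""


def unseen_counts(conn: sqlite3.Connection, user_id: str) -> list[tuple[str, int]]:
    """Return (company, number of unseen items) pairs, sorted by company."""
    return conn.execute(UNSEEN_COUNTS, (user_id,)).fetchall()


def mark_seen(conn: sqlite3.Connection, user_id: str, news_id: int) -> None:
    """Record that a user opened a news item; repeated calls are harmless."""
    conn.execute(
        "INSERT OR IGNORE INTO seen (user_id, news_id) VALUES (?, ?)",
        (user_id, news_id),
    )
    conn.commit()
```

Typical use: create the tables with `conn.executescript(SCHEMA)`, insert scraped rows into `news`, call `mark_seen` when a user opens an item, and call `unseen_counts` to build the per-company counters shown in the UI. The `UNIQUE` constraint on `url` also keeps the same link from being stored twice.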

Knowledge Transfer to the Client’s In-House Team

After trying the solution, the Client was fully satisfied with the result. Our team then conducted a series of knowledge transfer sessions, paying particular attention to adjusting the crawlers when the websites' source code changes. We also provided the Client with exhaustive software documentation.

News Screening Automated to Avoid Missed Investment Opportunities

With ScienceSoft's assistance, the Client received a web scraping solution that automatically screens financial market websites and indicates the sources that contain new information about the target companies or institutions. After the knowledge transfer, the Client's team can maintain the crawlers and adjust them to changes in the news websites' source code. The solution allowed the Client to significantly shorten the time spent searching for news updates and will help it make informed investment decisions based on recent market insights.

Technologies and Tools

Selenium, Selenium Grid, .NET, Angular, Microsoft SQL Server.

