Unit 3 – Question 1

The following is an example of Information Extraction, based on an excerpt from a news article about the Valencia MotoGP and Marc Marquez:

Marc Marquez was fastest in the final MotoGP warm-up session of the 2016 season at Valencia, heading Maverick Vinales by just over a tenth of a second.

After qualifying second on Saturday, behind a rampant Jorge Lorenzo, Marquez took charge of the 20-minute session from the start, eventually setting the best time of 1m31.095s at half-distance.

Through information extraction, the following basic facts can be pulled out of the free-flowing text and organized in a structured, machine-readable form:

  • Person: Marc Marquez
  • Location: Valencia
  • Event: MotoGP
  • Related mentions: Maverick Vinales, Yamaha, Jorge Lorenzo
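To make the idea concrete, here is a minimal rule-based sketch of turning the excerpt into a structured record. It assumes a small, hand-built gazetteer of names (hypothetical, chosen only for this example); real information extraction systems usually rely on trained named-entity recognizers instead. The `to_record` helper and the extra "Best time" field are likewise illustrative assumptions. Because the sketch only finds what appears in the excerpt, it cannot recover mentions such as Yamaha that come from elsewhere in the full article.

```python
import re

# Hypothetical hand-built gazetteer: a real IE system would normally use a
# trained named-entity recognizer rather than a fixed list of names.
GAZETTEER = {
    "Person": ["Marc Marquez", "Maverick Vinales", "Jorge Lorenzo"],
    "Location": ["Valencia"],
    "Event": ["MotoGP"],
}

def find_mentions(text: str) -> dict[str, list[tuple[int, str]]]:
    """Map each entity type to (offset, name) pairs for names found in text."""
    mentions: dict[str, list[tuple[int, str]]] = {}
    for label, names in GAZETTEER.items():
        for name in names:
            # \b word boundaries avoid matching inside longer words.
            match = re.search(r"\b" + re.escape(name) + r"\b", text)
            if match:
                mentions.setdefault(label, []).append((match.start(), name))
    return mentions

def first(pairs: list[tuple[int, str]]):
    """Return the earliest-occurring name, or None if nothing was found."""
    return min(pairs)[1] if pairs else None

def to_record(text: str) -> dict:
    """Turn the raw mentions into a structured, machine-readable record."""
    mentions = find_mentions(text)
    # Order people by first appearance; treat the first as the main subject.
    people = [name for _, name in sorted(mentions.get("Person", []))]
    lap = re.search(r"\b\d+m\d{2}\.\d{3}s\b", text)  # lap times like 1m31.095s
    return {
        "Person": people[0] if people else None,
        "Location": first(mentions.get("Location", [])),
        "Event": first(mentions.get("Event", [])),
        "Related mentions": people[1:],
        "Best time": lap.group() if lap else None,
    }

excerpt = (
    "Marc Marquez was fastest in the final MotoGP warm-up session of the "
    "2016 season at Valencia, heading Maverick Vinales by just over a tenth "
    "of a second. After qualifying second on Saturday, behind a rampant "
    "Jorge Lorenzo, Marquez took charge of the 20-minute session from the "
    "start, eventually setting the best time of 1m31.095s at half-distance."
)
record = to_record(excerpt)
```

Ordering people by their first offset is a simple heuristic for deciding who the passage is "about"; it works for this excerpt but would need something sturdier, such as coreference resolution, on longer texts.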

One common means of extracting data from documents is optical character recognition (OCR), which is employed in numerous industries. For instance, banks use OCR to capture account information, detect fraud, and keep operations flowing smoothly, while the legal system uses it to digitise printed documents. Healthcare employs OCR to extract data from reports such as X-rays and hospital records, and businesses use it to read serial codes from phones.

It is widely held that, while internet search engines are an essential application, email is the number one online activity for most users. However, surprisingly few advanced email technologies take advantage of the vast amount of information present in a user’s inbox.
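As a small illustration of the kind of information an inbox already contains, the sketch below reads the From, To, and Cc headers of raw messages with Python's standard `email` package and counts how often each pair of addresses appears on the same message, giving a rough "who corresponds with whom" network. This is not the tool described in this paper; the co-occurrence weighting and the sample addresses are assumptions made purely for illustration.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses
from itertools import combinations

def addresses(msg) -> set[str]:
    """Collect lower-cased addresses from the From, To and Cc headers."""
    fields = msg.get_all("From", []) + msg.get_all("To", []) + msg.get_all("Cc", [])
    return {addr.lower() for _, addr in getaddresses(fields) if addr}

def contact_graph(raw_messages: list[str]) -> Counter:
    """Count how often each pair of addresses shares a message (edge weights)."""
    edges: Counter = Counter()
    for raw in raw_messages:
        people = sorted(addresses(message_from_string(raw)))
        # Every pair of participants on one message gets one co-occurrence.
        for a, b in combinations(people, 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical sample inbox with two short messages.
inbox = [
    "From: Alice <alice@example.com>\nTo: Bob <bob@example.com>\n"
    "Subject: Draft\n\nSee attached.\n",
    "From: bob@example.com\nTo: alice@example.com, carol@example.com\n"
    "Subject: Re: Draft\n\nThanks.\n",
]
graph = contact_graph(inbox)
```

Here Alice and Bob share two messages, so their edge carries the highest weight; on a real mailbox, such weights could serve as a first signal of how closely two contacts are connected.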

Furthermore, several social networking companies have recently been formed to help connect friends and business associates. These companies aim to help businesses find employees, clients, and business partners by exploiting the topology of their social network. However, the networks these companies search are limited to the people who join them. Another company extracts university and company affiliations from news articles and websites to create databases of people, searchable by company, job title, and educational history, though it does not address social connections between people.

In light of these partial solutions, this paper describes a powerful data collection tool.

Data extraction is a vital process for automating the collection of structured data for further analysis. It pulls the necessary data from various sources, such as invoices, emails, or contracts. This data helps automate processes and provides valuable insights and analytics for decision-making.
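For a simple plain-text invoice, the extraction step can be sketched with regular expressions that pull out named fields and normalise the total for later analysis. The field names, patterns, and sample invoice below are hypothetical; real invoices vary widely, so production tools typically combine OCR layout information, templates, or trained models rather than a fixed set of patterns.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical patterns for one simple invoice layout.
FIELD_PATTERNS = {
    "invoice_number": r"Invoice\s*(?:No\.?|Number|#)\s*:?\s*([A-Z0-9-]+)",
    "date": r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})",
    "total": r"Total\s*(?:Due)?\s*:?\s*\$?([\d,]+\.\d{2})",
}

def extract_invoice_fields(text: str) -> dict[str, Optional[str]]:
    """Return the first match for each field, or None when it is missing."""
    record: dict[str, Optional[str]] = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        record[field] = match.group(1) if match else None
    return record

def parse_total(raw: Optional[str]) -> Optional[Decimal]:
    """Normalise a total such as '1,250.00' into a Decimal for analysis."""
    return Decimal(raw.replace(",", "")) if raw else None

invoice_text = """ACME Supplies Ltd.
Invoice No: INV-2041
Date: 2024-03-15
Total Due: $1,250.00
"""
fields = extract_invoice_fields(invoice_text)
amount = parse_total(fields["total"])
```

Returning None for a missing field, instead of raising an error, lets a later validation step flag incomplete documents for manual review.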

Data extraction has several advantages, such as better decision-making, cost savings, fewer manual errors, faster processes, and improved employee motivation. Moreover, data extraction software should offer certain functionalities if it is to have a meaningful impact on a business's workflow. Therefore, companies should consider the following factors when choosing a data extraction vendor:

  • Extraction of structured data from common document formats
  • Export of data into widely used applications
  • Data quality improvement
  • Advanced processing and enrichment
  • Real-time extraction
  • A user-friendly interface
