As you can imagine, an artificial intelligence project goes through different phases, each of them fundamental to achieving the proposed objectives. We follow the CRISP-DM methodology. In many cases, the boundaries between its stages are somewhat diffuse. The interesting thing about this cyclical process is that it allows us to iterate and improve our products and services stage by stage.
Here we present in a little more detail what this methodology consists of:
Understand the Business and the problem
The first step is to understand the business and the problem at hand. For this reason, we work hand in hand with our clients' businesses to fully understand their needs, and we iterate with them until the problem is solved. This is essential, and so is finding the key person within the company or organization who has the business knowledge, can help us understand it, and can help define its metrics.
It is also important in this first stage to define the different tasks and the people in charge of each of them. A data science project is still a project, and it needs to be managed as such.
The second point is the data mining phase. It consists of extracting the data from its different sources, building the data models, and finding patterns in them that serve as a guide to get closer to our final goal.
For this, we will need knowledge of databases, descriptive visualizations, and data extraction, modeling, and transformation processes.
The data that we find can be of three different types:
- Structured: They follow a fixed structure, generally a table, and are usually stored in relational databases.
- Unstructured: These are data that do not follow a common structure. This type of data is usually not stored in relational tables but in other types of storage. Examples are images, sound, and free text.
- Semi-structured: They are usually not stored in relational tables, but because they have some structure, they are much simpler to process than unstructured data. For example, data extracted from web pages follows a certain structure but is not organized as tables.
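To make the difference between structured and semi-structured data more concrete, here is a minimal sketch in Python. The records, field names (`customer_id`, `contact`, `tags`), and the `flatten` helper are all made up for illustration; they are assumptions, not part of any real dataset.

```python
import csv
import io
import json

# Structured data: every row has the same columns, as in a relational table.
# The values below are invented for illustration.
structured_csv = "customer_id,country,orders\n1,ES,3\n2,FR,5\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured data: records share some keys, but fields can be nested
# or missing, so they do not map directly onto a table.
semi_structured_json = """
[
  {"customer_id": 1, "contact": {"email": "a@example.com"}, "tags": ["vip"]},
  {"customer_id": 2, "contact": {}}
]
"""
records = json.loads(semi_structured_json)


def flatten(record):
    """Turn one nested record into a flat row with a fixed set of columns."""
    return {
        "customer_id": record["customer_id"],
        "email": record.get("contact", {}).get("email"),  # None when absent
        "n_tags": len(record.get("tags", [])),
    }


flat_rows = [flatten(r) for r in records]
print(rows)
print(flat_rows)
```

The semi-structured records only become a table after we decide which fields to keep and how to handle the missing ones, which is part of the extraction and transformation work described above.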
The third point is data cleaning. Here we talk about methods for detecting and treating missing data and outliers.
This phase is critical and is where the most time is spent in a data science project, often estimated at about 70-80% of the total.
It is essential because if our data is not properly prepared, we will not draw the correct conclusions. In addition, when using predictive models, if we feed them data that is incorrect or not clean, the model will be wrong. Therefore, this phase is fundamental.
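As a rough illustration of the two problems mentioned above, the sketch below fills missing values with the median and flags outliers with the interquartile range (IQR) rule. These are just two common choices among many, the `ages` values are invented, and the helper names are hypothetical.

```python
import statistics


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Invented example column: two missing values and one implausible age.
ages = [34, 29, None, 41, 38, 250, 31, None, 36]

filled = impute_missing(ages)
outliers = iqr_outliers(filled)

print(filled)
print(outliers)
```

Whether an outlier should be removed, corrected, or kept depends on the business context, so in practice a flagged value like the one above is a question for the domain expert rather than something to delete automatically.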
After the data is cleaned and modeled, we move on to exploration. In this phase, we seek to extract useful information, see how the data is distributed, understand the different numerical and categorical variables and what each of their levels means, calculate ratios that are easy to interpret, and generate graphs and correlations that give us clues as to where to go to solve the problem. This phase is very much detective work.
You also have to know that this phase is not static but iterative. As we extract useful information, we will come up with other possibilities and paths that lead us to repeat the data mining and data cleaning phases many times, until we are satisfied with the knowledge we have been able to extract.
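A first exploration pass can be as simple as the sketch below, which summarizes a numerical variable, counts the levels of a categorical one, and computes a ratio. The `orders` records and their fields are invented for illustration, and `describe` is a hypothetical helper.

```python
import statistics
from collections import Counter

# Invented, already-cleaned records.
orders = [
    {"channel": "web", "amount": 120.0, "items": 3},
    {"channel": "store", "amount": 80.0, "items": 2},
    {"channel": "web", "amount": 200.0, "items": 5},
    {"channel": "web", "amount": 40.0, "items": 1},
    {"channel": "store", "amount": 160.0, "items": 4},
]


def describe(values):
    """Basic summary of how a numerical variable is distributed."""
    return {
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.fmean(values),
        "max": max(values),
    }


amounts = [o["amount"] for o in orders]
items = [o["items"] for o in orders]

amount_summary = describe(amounts)                     # numerical variable
channel_counts = Counter(o["channel"] for o in orders)  # categorical levels
amount_per_item = sum(amounts) / sum(items)             # an intuitive ratio

print(amount_summary)
print(channel_counts)
print(amount_per_item)
```

Each of these numbers usually raises a new question, which is what sends us back to the earlier phases.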
The next point, selecting the relevant variables, is very interesting too. We will use different methods and models (some of which we will see in the advanced analytics block) that allow us to extract the variables that are most relevant and that explain most of our data.
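One very simple way to look for relevant variables is to rank them by the absolute Pearson correlation with the quantity we want to explain. The sketch below assumes this approach only for illustration; it is not necessarily one of the methods referred to above, and the feature names and values are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented candidate variables and target (for example, energy demand).
features = {
    "temperature": [10, 15, 20, 25, 30],
    "humidity": [80, 60, 75, 65, 70],
    "day_of_month": [3, 17, 9, 28, 12],
}
target = [100, 150, 210, 240, 310]

# Sort variable names from most to least linearly related to the target.
ranking = sorted(
    features,
    key=lambda name: abs(pearson(features[name], target)),
    reverse=True,
)

for name in ranking:
    print(name, round(pearson(features[name], target), 3))
```

Correlation only captures linear relationships and looks at one variable at a time, so a ranking like this is a starting point for discussion, not a final selection.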
This point is the part that everyone expects: applying artificial intelligence models. We are experts in generating predictive models based on digital twins, which allow us to reduce risks, predict future behavior, reduce costs, and increase profits. The interesting thing about digital twins is their applicability to different sectors, from industry to medicine, energy management, logistics, and even tourism or agriculture.
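To give a feel for what "predicting future behavior" means in its simplest form, here is a minimal sketch that fits a straight line by least squares on some training data and checks it on held-out data. It is deliberately far simpler than a digital twin, and the data and function names are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Invented data: temperature versus energy demand.
train_x, train_y = [10, 15, 20, 25], [100, 150, 210, 240]
test_x, test_y = [30, 35], [310, 340]  # held out, never used for fitting

model = fit_line(train_x, train_y)

# Mean absolute error on data the model has not seen.
errors = [abs(predict(model, x) - y) for x, y in zip(test_x, test_y)]
mae = sum(errors) / len(errors)

print(model)
print(mae)
```

Evaluating on data that was not used for fitting is what tells us whether the model is likely to be useful on future cases, whatever the complexity of the model.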
The last point of any process is to show the results. One of the most important skills in the whole process is knowing how to present and communicate the results to business people, so that they can draw their own conclusions, give us feedback, and make good decisions.
Remember that this whole process is not linear. After any stage, we can go back to a previous one, depending on what we need. We stay flexible in the process, but always aim to get the most out of the data.