
Home care: Laundry & Cleaning
peer-reviewed
Stain Removal Predictions via AI Driven Models
SELİN ERGUN*, GULSAH ACAR, OKAN YUZUAK, MEHMET SERHAN BODUROGLU
Hayat Kimya R&D Center Kocaeli, Turkey
*Corresponding author
ABSTRACT: This study investigates the cleaning efficacy of novel liquid laundry detergent formulations. Utilizing standardized stain sets from established suppliers, we systematically evaluated the performance of various formulations, developed from foundational principles, across a range of common stains. The core objective is to leverage this empirical data to develop a predictive machine learning model capable of forecasting detergent cleaning performance based on specific formulation parameters, aiming to accelerate and optimize detergent development.
Introduction
In the global home care sector, product development is directed towards enhancing consumer comfort through advancements in cleaning and hygiene. A primary strategic goal within the fast-moving consumer goods industry involves pioneering creative product and process innovations. To achieve market leadership and provide cost-effective, high-performing products, substantial research and development are undertaken. This encompasses the generation of new formulations, their performance evaluation relative to both internal standards and market competitors, and the identification of latent consumer demands through systematic field and customer analyses. Performance attributes, specifically stain removal efficacy, color fading, whiteness retention, and foam characteristics, are quantitatively assessed using spectrophotometric analysis and qualitatively validated through visual examination, enabling comprehensive product profiling. Such rigorous evaluation is routinely applied during the development and subsequent performance validation of novel formulations.
To reduce the labor and time these studies consume and to uncover relationships between the components, we aim to develop a system that uses the collected dataset to estimate soil removal performance for given parameters. Examples of statistical modelling and machine learning applied to cleaning processes exist in the literature (1, 2, 3). The system is also designed to report the cost of the forecasted formulas, using real prices retrieved through SAP. This cost integration directly addresses a crucial problem in product development: striking a balance between economic feasibility and stain removal performance. The system is therefore intended not only to predict performance but also to serve as a decision-support tool. By combining predicted efficacy with economic data, the system facilitates the strategic selection of formulations, reducing the time and resources required for experimental screening and commercialization. The price of a detergent fluctuates considerably due to several factors, which can be broken down into components. Raw material costs are the primary concern for a detergent developer and are subject to fluctuations in worldwide supply and demand, crude oil prices, and/or agricultural prices. The remaining costs are typical for any manufacturer (packaging, operations, etc.). Depending on size, formulation complexity, raw material procurement, and location, the production cost of a liquid laundry detergent could range from $0.4 to $1.2+ per liter.
Generation and evaluation of data
To address the challenge of optimizing laundry detergent performance, this research involved the experimental evaluation of numerous liquid detergent formulations against standardized stains. The primary aim of this project is to build a machine learning model that can accurately predict the “stain removal performance” of a detergent, given its chemical composition and other parameters, based on the performance data collected. "To remove stains" is the primary purpose of detergents, which are found in every household, and it is also the user's fundamental expectation.
For this study, 260 different detergent formulas were created. To cover a wide and representative chemical space pertinent to liquid detergents, these formulations systematically varied the type of surfactant, its concentration, and the amount of enzyme. Although the physical and chemical mechanisms of stain removal are well covered in the literature, measuring cleaning effectiveness for the agents used and the stain types remains the formula developer's most effective control method. Put simply, removing stains with a laundry detergent involves wetting the foreign material deposited on the fabric and then detaching it from the surface with the appropriate agent(s) through a chemical and/or physical process. Stain removal performance was assessed for each formulation in a controlled laboratory setting, with defined test conditions such as water hardness, temperature, and wash cycle characteristics pertinent to the detergent's use, as well as typical washing machine protocols. The measurements, stains, and tests were standardized. Performance tests followed the A.I.S.E. detergent guidelines (4); however, because of the study's maturity, a comprehensive set of 19 stains was employed. This set of 19 stains was chosen in earlier research to create a responsive set (5). Each formulation underwent four independent experimental repetitions to ensure reproducibility and statistical robustness of the measurements. All stain removal performance tests were carried out using the same test parameters; the washing test protocol is provided below.
Test Procedure for Washing:
- Cotton program, 40°C, 1 h 49 min, 1000 rpm
- Miele W 5872 Edition 111 washing machine, fuzzy logic control disabled
- 2.5 kg cotton and polyester ballast
- 65 mL liquid detergent
- 150 ppm CaCO3 water hardness (15 French degrees)
- 19-stain set, 5 x 5 cm swatches
- 4 repetitions
The stains indicated above are standard stains produced by test material suppliers such as CFT and WFK. Stain removal performance was evaluated colorimetrically using the CIELAB system. The Y values from the colorimetric measurement are the response examined for each designed formula. Five raw materials were used in varying proportions during formulation design, while ingredients that had no effect on performance were kept constant.

Table 1. Examples of some of the tested detergent formulations. Each column is an input and each row is a formulation.
By creating various detergent mixtures from scratch and testing them on a selected set of standardized stains, extensive performance data was gathered. The crucial step is to apply machine learning techniques to this data so that the cleaning performance of a new detergent can be predicted. Basic statistical checks were applied during data collection, and experiments flagged as outliers were repeated. Testing of the machine learning model began once the dataset had reached a sufficient level of maturity. The goal of these tests is to determine whether the machine can "learn" the relationship between the formulation parameters and the measured results. If it can, it will use this learning to "make predictions," which is the primary goal of this project.
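The article does not say which statistical markers were used to flag outlier experiments. As one possible illustration, the sketch below applies a modified z-score (based on the median and the median absolute deviation) to the four replicate Y values of a single formulation and stain. The threshold of 3.5, the function name, and the example readings are all assumptions made for this sketch, not the authors' procedure.

```python
from statistics import median

def flag_outliers(replicates, threshold=3.5):
    """Return indices of replicates whose modified z-score exceeds the threshold.

    Modified z-score = 0.6745 * |y - median| / MAD, where MAD is the
    median absolute deviation. The 3.5 cut-off is an assumed convention.
    """
    med = median(replicates)
    mad = median(abs(y - med) for y in replicates)
    if mad == 0:
        # No spread among the majority of readings, so nothing to scale by;
        # this simple sketch flags nothing in that case.
        return []
    return [i for i, y in enumerate(replicates)
            if 0.6745 * abs(y - med) / mad > threshold]

# Hypothetical Y readings for one formulation on one stain (4 repetitions).
readings = [71.2, 70.8, 71.5, 58.0]
print(flag_outliers(readings))  # the fourth reading (index 3) is flagged
```

A flagged repetition would then be a candidate for re-running, in line with the repeated experiments described above.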
As the tests were conducted and the data grew, known regression models were applied to assess the data. To determine whether the machine can learn from the provided data, 90% of the data was used to train the models and the remaining 10% was used to assess their prediction capability. Figure 1 shows that the prediction error converges at an earlier stage when active learning (AL) is employed.
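The 90/10 evaluation protocol can be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the helper names are hypothetical, and splitting at the level of formulations (so that all repetitions of a formula fall on the same side of the split) is an assumption, since the article does not state at which level the split was made.

```python
import random

def split_ids(ids, test_fraction=0.10, seed=0):
    """Shuffle the IDs and hold out test_fraction of them for evaluation."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# 260 formulations, as in the study; the IDs themselves are placeholders.
formulation_ids = range(1, 261)
train_ids, test_ids = split_ids(formulation_ids)
print(len(train_ids), len(test_ids))  # 234 26

# Hypothetical measured vs. predicted Y values for three held-out samples.
print(mean_absolute_error([70.0, 65.0, 80.0], [72.0, 65.0, 77.0]))
```

The MAE computed on the held-out 10% is the quantity a model would be judged on; any regression model can be slotted in between the split and the error calculation.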

Figure 1. Monitoring the MAE with 1-sample incremental training
AL: We start with a small subset of the available dataset and gradually grow the used set by adding examples identified by active learning along the x-axis. Performance improves as the used set grows.
Random : We start with the same small subset of the available dataset and gradually grow the used set by adding random samples along the x-axis. Performance improves as the used set grows.
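The comparison between the AL and Random curves can be illustrated with a toy experiment. The article does not state which query strategy or model was used, so everything here is an assumption: the model is a 1-nearest-neighbour regressor, the AL rule is a simple diversity-based query (pick the candidate farthest from all samples already labelled), and the response surface is synthetic.

```python
import random

def predict_1nn(train, x):
    """Predict y for x as the y of the nearest labelled point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mae(train, test):
    """Mean absolute error of the 1-NN model on the held-out test points."""
    return sum(abs(y - predict_1nn(train, x)) for x, y in test) / len(test)

def pick_farthest(train, pool):
    """Diversity query: the pool point farthest from any labelled point."""
    return max(pool, key=lambda p: min(abs(p[0] - t[0]) for t in train))

def run(pool, test, start, steps, strategy, seed=0):
    """Grow the labelled set one sample at a time and record the MAE."""
    rng = random.Random(seed)
    train, pool = list(start), list(pool)
    curve = [mae(train, test)]
    for _ in range(steps):
        chosen = pick_farthest(train, pool) if strategy == "al" else rng.choice(pool)
        pool.remove(chosen)
        train.append(chosen)  # stands in for running the requested experiment
        curve.append(mae(train, test))
    return curve

def response(x):
    """Synthetic response surface; not derived from the study's data."""
    return 50 + 30 * x * x

rng = random.Random(42)
data = [(x, response(x)) for x in (rng.random() for _ in range(120))]
start, pool, test = data[:3], data[3:100], data[100:]

al_curve = run(pool, test, start, 20, "al")
random_curve = run(pool, test, start, 20, "random")
for step in (0, 5, 10, 20):
    print(step, round(al_curve[step], 2), round(random_curve[step], 2))
```

Both curves start from the same small subset, and each step adds exactly one sample, which mirrors the 1-sample incremental training shown in the figure.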
Several machine learning models from the literature were tried, chosen according to the complexity of the dataset, and the Mean Absolute Error (MAE) was used to select the appropriate model or models. With suitable models selected, dataset enlargement studies continued in order to improve the algorithm's predictive ability. In other words, the algorithm requested experiments to enrich the regions of the data that it considered weak or insufficient. This continued for some time, because the studies were done gradually and the requests could change as new data was fed to the model. During data enrichment, improvements were also made to the model structure. In total, 260 detergent formulas were designed for this study. With 4 repetitions per formulation, the total number of experimental runs was 1040 (260 formulations x 4 repetitions). Each run yielded 19 data points, one per stain, giving the model a rich dataset of 19760 data points.
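The dataset size follows directly from the experimental design. The short sketch below reproduces the arithmetic and enumerates one record key per measurement; the tuple layout (formulation, repetition, stain) is an assumed way of organising the data, not a description of the authors' storage format.

```python
from itertools import product

N_FORMULATIONS = 260
N_REPETITIONS = 4
N_STAINS = 19

runs = N_FORMULATIONS * N_REPETITIONS
data_points = runs * N_STAINS
print(runs, data_points)  # 1040 19760

# One key per measured Y value: (formulation, repetition, stain).
record_keys = list(product(range(1, N_FORMULATIONS + 1),
                           range(1, N_REPETITIONS + 1),
                           range(1, N_STAINS + 1)))
print(len(record_keys))  # 19760
```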

Table 2. Performance of known models for the given data set.
The data enrichment procedure was carried out until the improvement in the model was deemed insignificant; data enrichment was then paused. The term "break" is used because this model is "alive" and can be improved at any time. Two of the project's milestones are comparing the experimental and predicted results and determining whether they are compatible.
The algorithm was asked to predict the results of 20 formulas that had not been tested before, and these predictions were compared with the experimental results. According to this comparison, the algorithm can predict the Y values, although the predicted values are not numerically identical to the measured ones. The trend of the experimental and predicted results, however, is the same. In other words, a formula with a low predicted cleaning performance also gives a poor experimental result. Examples are shown below on a stain basis.
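One way to quantify "same trend, different numbers" is to report a rank correlation alongside the MAE. The article does not report such a statistic, so the sketch below is only a suggestion: it computes Spearman's rank correlation in plain Python on hypothetical Y values for five formulations on one stain.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

# Hypothetical measured and predicted Y values (not the study's data).
experimental = [62.0, 70.5, 55.3, 80.1, 66.4]
predicted = [65.0, 72.0, 60.1, 77.0, 68.2]

mae = sum(abs(e - p) for e, p in zip(experimental, predicted)) / len(predicted)
print(round(mae, 2))                      # numerical gap between the two series
print(spearman(experimental, predicted))  # 1.0 when the ordering is identical
```

A rank correlation near 1 with a non-zero MAE would correspond to the situation described above: the model orders the formulas correctly even though the absolute values differ.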

Figure 2. Overlaid experimental and predicted results for 20 formulations
Development of interface
The next stage in the project is to develop an interface through which the detergent R&D team can use this algorithm, since the aim is a user-friendly prediction model that saves both energy and time. Apart from project management, cost information sharing, and performance estimation, this portal also offers the ability to search among the estimates based on certain criteria. For this purpose a "Formula Pool" was created, which is not limited to the estimates that individual users requested. In essence, the algorithm automatically grows the pool in the background according to user-defined criteria. Price details are also stored, so that a sizable pool is available whenever a search is needed.

Figure 3. A screenshot of the developed interface shows how many predictions are made each month.
User generated : A member of Hayat R&D asked the model «to predict»
Auto generated : Model makes «prediction» according to Formula Pool Settings
The server went online in August 2024. As shown in Figure 3, the auto-generated estimates allow users to scan formulas from a larger library. As previously mentioned, 19760 data points were gathered to create the model. Figure 3 displays the number of predicted data points by month. The model makes these predictions itself, based on the user-specified settings (ranges and increments for the five parameters). As mentioned before, one aim of this server is to scan through the estimates and find possible formulas that deliver a given performance at the intended cost. Adding real-time pricing data for every raw material in the formulation makes the predicted output more useful still. This integration produces a database of reasonably priced, capable candidates that can be used to expedite further research and development. An example highlights the benefit of this formula pool. One of the main problems in FMCG is an unforeseen raw material crisis brought on by cost or shortage. It is essential to manage such a crisis with the least possible harm and without sacrificing the product's ability to remove stains. The "filter" option of the formula pool lets the user run a thorough cost-benefit analysis of suggested formulations, facilitating cost and performance optimization.
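The formula pool idea (enumerate candidate formulas from ranges and increments, attach a predicted performance and a cost, then filter) can be sketched as below. This is an illustration under stated assumptions, not the portal's implementation: the ingredient names, ranges, prices, and the `predict_performance` placeholder are all hypothetical, and in the real system the prediction would come from the trained model and the prices from SAP.

```python
from itertools import product

def level_grid(start, stop, step):
    """Inclusive list of levels, built from integer steps to avoid float drift."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, 6) for i in range(n + 1)]

# Hypothetical pool settings: (min, max, increment) in % w/w for five inputs.
SETTINGS = {
    "surfactant_a": (4.0, 8.0, 2.0),
    "surfactant_b": (2.0, 6.0, 2.0),
    "surfactant_c": (0.0, 2.0, 1.0),
    "enzyme_a": (0.0, 1.0, 0.5),
    "enzyme_b": (0.0, 0.4, 0.2),
}
# Hypothetical raw material prices per kg (stand-ins for live price data).
PRICES = {"surfactant_a": 1.5, "surfactant_b": 2.0, "surfactant_c": 3.0,
          "enzyme_a": 20.0, "enzyme_b": 35.0}

def predict_performance(formula):
    """Placeholder for the trained model; a toy linear response, not real chemistry."""
    surfactants = sum(v for k, v in formula.items() if k.startswith("surfactant"))
    enzymes = sum(v for k, v in formula.items() if k.startswith("enzyme"))
    return 40.0 + 2.0 * surfactants + 10.0 * enzymes

def raw_material_cost(formula):
    """Raw material cost per kg of product from the % w/w of each ingredient."""
    return sum(PRICES[name] * pct / 100 for name, pct in formula.items())

def build_pool(settings):
    """Enumerate every combination of levels and score it."""
    names = list(settings)
    grids = [level_grid(*settings[name]) for name in names]
    pool = []
    for levels in product(*grids):
        formula = dict(zip(names, levels))
        pool.append({"formula": formula,
                     "predicted_y": predict_performance(formula),
                     "cost": raw_material_cost(formula)})
    return pool

def filter_pool(pool, min_y, max_cost):
    """Keep entries meeting both criteria, cheapest first."""
    hits = [p for p in pool if p["predicted_y"] >= min_y and p["cost"] <= max_cost]
    return sorted(hits, key=lambda p: p["cost"])

pool = build_pool(SETTINGS)
hits = filter_pool(pool, min_y=70.0, max_cost=0.50)
print(len(pool), len(hits))
```

In a raw material crisis, updating one entry of the price table and re-running the filter would show which candidates still meet the performance target at an acceptable cost.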
Conclusion
References and notes
- Biranje, S. S.; Nathany, A.; Mehra, N.; Adivarekar, R.; Optimisation of Detergent Ingredients for Stain Removal Using Statistical Modelling; J. Surfact. Deterg. 2015, 18, 949-956. DOI 10.1007/s11743-015-1722-6 https://aocs.onlinelibrary.wiley.com/doi/10.1007/s11743-015-1722-6
- Simeone, A.; Woolley, E.; Escrig, J.; Watson, N. J.; Intelligent Industrial Cleaning: A Multi-Sensor Approach Utilising Machine Learning-Based Regression; Sensors 2020, 20, 3642. DOI 10.3390/s20133642 https://www.mdpi.com/1424-8220/20/13/3642
- Jangir, K.; Gour, A.; Suniya, N. K.; Meena, S. K.; Parihar, K.; Multi-objective optimization of detergent pre-formulations using machine learning techniques; Journal of Indian Chemical Society 2023, 100. DOI 10.1016/j.jics.2022.100815 https://www.sciencedirect.com/science/article/abs/pii/S0019452222004770?via%3Dihub
- A.I.S.E. Laundry Detergent Testing Guidelines https://aise.eu/priorities/product-stewardship/detergents/detergent-test-protocol/
- Acar, G.; Yuzuak, O.; Vatansever, E. C.; Laundry Detergent Performance Tests by Six Sigma-Based Sustainable Approach [Poster presentation]; Sepawa Congress 2023.