Improving Text Search through ML (2023)

At this point, we assume you have read our Text Search Tutorial and accomplished the following steps.

  • Created and deployed a basic text search app in Vespa.
  • Fed the app with the MS MARCO full document dataset.
  • Compared and evaluated two different ranking functions.

We are now going to show you how to create a dataset that joins relevance information from the MS MARCO dataset with ranking features from Vespa, so that you can train ML models to improve your application. More specifically, you will accomplish the following steps in this tutorial:

  • Learn how to collect rank feature data from Vespa associated with a specific query.
  • Create a dataset that can be used to improve your app’s ranking function.
  • Propose sanity-checks to help you detect bugs in your data collection logic and ensure you have a properly built dataset at the end of the process.
  • Illustrate the importance of going beyond pointwise loss functions when dealing with Learning To Rank (LTR) tasks.

Collect rank feature data from Vespa

Vespa’s rank feature set contains a large set of low-level and high-level features. Those features are useful for understanding the behavior of your app and for improving your ranking function.

Default rank features

To access the default set of ranking features, set the query parameter ranking.listFeatures to true. For example, the query below selects the bm25 rank-profile developed in the previous tutorial and returns the rank features associated with each of the results returned.

$ vespa query \
    'yql=select id,rankfeatures from msmarco where userQuery()' \
    'query=what is dad bod' \
    'ranking=bm25' \
    'type=weakAnd' \
    'ranking.listFeatures=true'

The list of rank features returned by default can change in the future; the current list can be checked in the system test. For the request above, we get the following (edited) JSON back. Each result contains a field called rankfeatures holding the set of default ranking features:

{
  "root": {
    "children": [
      ...
      {
        "fields": {
          "rankfeatures": {
            ...
            "attributeMatch(id).totalWeight": 0.0,
            "attributeMatch(id).weight": 0.0,
            "elementCompleteness(body).completeness": 0.5051413881748072,
            "elementCompleteness(body).elementWeight": 1.0,
            "elementCompleteness(body).fieldCompleteness": 0.010282776349614395,
            "elementCompleteness(body).queryCompleteness": 1.0,
            "elementCompleteness(title).completeness": 0.75,
            "elementCompleteness(title).elementWeight": 1.0,
            "elementCompleteness(title).fieldCompleteness": 1.0,
            "elementCompleteness(title).queryCompleteness": 0.5,
            "elementCompleteness(url).completeness": 0.0,
            "elementCompleteness(url).elementWeight": 0.0,
            "elementCompleteness(url).fieldCompleteness": 0.0,
            "elementCompleteness(url).queryCompleteness": 0.0,
            "fieldMatch(body)": 0.7529285549778888,
            "fieldMatch(body).absoluteOccurrence": 0.065,
            ...
          }
        },
        "id": "index:msmarco/0/811ccbaf9796f92bfa343045",
        "relevance": 37.7705101001455,
        "source": "msmarco"
      },
      ...
    ],
    ...
  }
}
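Once the response is back, the rankfeatures map can be pulled out of each hit. The sketch below assumes the response has already been decoded from JSON into a Python dict with the structure shown above; extract_rank_features is a hypothetical helper, not part of any Vespa client library.

```python
import json


def extract_rank_features(response):
    """Return (hit id, rankfeatures dict) pairs for every hit in a Vespa result.

    Assumes the layout shown above: hits live under root.children and each hit
    carries its features in fields["rankfeatures"]. Hits without that field are
    skipped.
    """
    pairs = []
    for hit in response.get("root", {}).get("children", []):
        features = hit.get("fields", {}).get("rankfeatures")
        if features is not None:
            pairs.append((hit.get("id"), features))
    return pairs


# A trimmed-down version of the response above, used as sample input.
sample = json.loads("""
{"root": {"children": [
  {"id": "index:msmarco/0/811ccbaf9796f92bfa343045",
   "relevance": 37.7705101001455,
   "fields": {"rankfeatures": {"fieldMatch(body)": 0.7529285549778888}}}
]}}
""")

for hit_id, features in extract_rank_features(sample):
    print(hit_id, features["fieldMatch(body)"])
```

Skipping hits without a rankfeatures field keeps the helper usable on results that mix document hits with other entries.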

Choose and process specific rank features

If, instead of returning the complete set of rank features, you want to select specific ones, you can add a new rank-profile (let’s call it collect_rank_features) to our msmarco.sd schema definition and disable the default ranking features by adding ignore-default-rank-features to the new rank-profile. In addition, we can specify the desired features within the rank-features element. In the example below we explicitly configured Vespa to only return bm25(title), bm25(body), nativeRank(title) and nativeRank(body).

Note that using all available rank features comes with a computational cost, as Vespa needs to calculate all of them. Using many features is usually only advisable in second-phase ranking; see phased ranking with Vespa.

schema msmarco {
    document msmarco {
        field id type string {
            indexing: attribute | summary
        }
        field title type string {
            indexing: index | summary
            index: enable-bm25
        }
        field url type string {
            indexing: index | summary
        }
        field body type string {
            indexing: index
            index: enable-bm25
        }
    }
    document-summary minimal {
        summary id type string { }
    }
    fieldset default {
        fields: title, body, url
    }
    rank-profile default {
        first-phase {
            expression: nativeRank(title, body, url)
        }
    }
    rank-profile bm25 inherits default {
        first-phase {
            expression: bm25(title) + bm25(body) + bm25(url)
        }
    }
    rank-profile collect_rank_features inherits default {
        first-phase {
            expression: bm25(title) + bm25(body) + bm25(url)
        }
        second-phase {
            expression: random
        }
        match-features {
            bm25(title)
            bm25(body)
            bm25(url)
            nativeRank(title)
            nativeRank(body)
            nativeRank(url)
        }
    }
}

The random global feature will be useful in the next section, when we describe our data collection process.

After adding the collect_rank_features rank-profile to msmarco.sd, redeploy the app:

$ vespa deploy --wait 300 app

Create a training dataset

The MS MARCO dataset described in the previous tutorial provides us with more than 300,000 training queries, each of which is associated with a specific document id that is relevant to the query. In this section we want to combine the information contained in the (query, relevant_id) pairs with the information available in the Vespa ranking features to create a dataset that can be used to train ML models to improve the ranking function of our msmarco text app.

Before we describe the collection process in detail, we want to point out that the whole process can be replicated by the following call to the data collection script collect_training_data.py, available in this tutorial's repository:

The following routine requires that you have downloaded the full dataset.

$ ./src/python/collect_training_data.py msmarco collect_rank_features 99

The command above uses the data contained in the query file (msmarco-doctrain-queries.tsv.gz) and the relevance file (msmarco-doctrain-qrels.tsv.gz) that are part of the MS MARCO dataset, and sends queries to Vespa using the collect_rank_features rank-profile defined in the previous section, requesting 99 randomly selected documents for each query in addition to the relevant document associated with the query. All the data from the requests is then parsed and stored in the output folder, which is data in this case.
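A minimal sketch of how the two input files could be read and joined in Python. It assumes the queries file has one "qid<TAB>query text" line per query and that the qrels file uses TREC-style "qid 0 docid label" lines with a single relevant document per query; the helper names are hypothetical and not taken from collect_training_data.py.

```python
import gzip


def read_gz_lines(path):
    """Yield decoded text lines from a gzip-compressed file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        yield from f


def parse_queries(lines):
    """Map query id -> query text, assuming 'qid<TAB>query text' lines."""
    queries = {}
    for line in lines:
        qid, text = line.rstrip("\n").split("\t", 1)
        queries[qid] = text
    return queries


def parse_qrels(lines):
    """Map query id -> relevant doc id, assuming 'qid 0 docid label' lines."""
    qrels = {}
    for line in lines:
        qid, _, doc_id, label = line.split()
        if int(label) > 0:
            qrels[qid] = doc_id
    return qrels


def training_pairs(queries, qrels):
    """Yield (qid, query text, relevant doc id) for every query with a judgment."""
    for qid, text in queries.items():
        if qid in qrels:
            yield qid, text, qrels[qid]


# Usage with the downloaded MS MARCO files:
# queries = parse_queries(read_gz_lines("msmarco-doctrain-queries.tsv.gz"))
# qrels = parse_qrels(read_gz_lines("msmarco-doctrain-qrels.tsv.gz"))
# for qid, query, relevant_id in training_pairs(queries, qrels): ...
```

Parsing from an iterable of lines, rather than from a path, keeps the parsers easy to test without the real files.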

Data collection logic

Since we want to improve the first-phase ranking function of our application, our goal here is to create a dataset that will be used to train models that generalize well when used in the first-phase ranking of an actual Vespa instance running against possibly unseen queries and documents. This might seem obvious, but it turns out to be easy to neglect when making some data collection decisions.

The logic behind collect_training_data.py can be summarized by the pseudo-code below:

hits = get_relevant_hit(query, rank_profile, relevant_id)
if hits:
    hits.extend(get_random_hits(query, rank_profile, number_random_sample))
    data = annotate_data(hits, query_id, relevant_id)
    append_data(file, data)

For each query, we first send a request to Vespa to get the relevant document associated with the query. If the relevant document is matched by the query, Vespa returns it, and we expand the set of documents associated with the query by sending a second request to Vespa. The second request asks Vespa to return a number of random documents sampled from the set of documents matched by the query. We then parse the hits returned by Vespa and organize the data into a tabular form containing the rank features and a binary variable indicating whether the query-document pair is relevant.

We are only interested in collecting documents that are matched by the query, because those are the documents that would be presented to the first-phase model in a production environment. This means that we will likely leave some queries that contain information about relevant documents out of the collected dataset, but it creates a dataset that is closer to our stated goal. In other words, the dataset we collect is conditional on our match criteria.
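The match-conditional logic can be sketched as a small generator. To keep it runnable without a Vespa instance, the two request functions are passed in as arguments; the generator name and the shape of the hits are assumptions for illustration, not the actual code of collect_training_data.py.

```python
def collect_training_hits(training_pairs, rank_profile, number_random_sample,
                          get_relevant_hit, get_random_hits):
    """Yield (qid, relevant_id, hits) only for queries whose relevant doc was matched.

    training_pairs yields (qid, query, relevant_id); the two callables stand in
    for the Vespa requests described in this section.
    """
    for qid, query, relevant_id in training_pairs:
        hits = list(get_relevant_hit(query, rank_profile, relevant_id))
        if not hits:
            continue  # relevant document not matched: leave this query out
        hits.extend(get_random_hits(query, rank_profile, number_random_sample))
        yield qid, relevant_id, hits


# Fake request functions for a dry run: the second query never matches.
fake_relevant = lambda query, profile, rid: [] if query == "no match" else [{"id": rid}]
fake_random = lambda query, profile, n: [{"id": f"R{i}"} for i in range(n)]

pairs = [("1", "what is dad bod", "D1"), ("2", "no match", "D2")]
collected = list(collect_training_hits(pairs, "collect_rank_features", 3,
                                       fake_relevant, fake_random))
print([(qid, len(hits)) for qid, _, hits in collected])
```

Injecting the request functions means the skip-if-unmatched rule can be checked in isolation before any queries are sent to Vespa.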

Get relevant hit

The first Vespa request is contained in the function call get_relevant_hit(query, rank_profile, relevant_id), where the query parameter contains the desired query string, rank_profile is set to the collect_rank_features rank-profile defined earlier, and relevant_id is the document id that is said to be relevant to that specific query.

The body of the request is given by:


body = {
    "yql": "select id, rankfeatures from sources * where userQuery()",
    "query": query,
    "hits": 1,
    "recall": "+id:" + str(relevant_id),
    "ranking": {"profile": rank_profile, "listFeatures": "true"},
}

where the yql parameter, through userQuery(), instructs Vespa to match the string in the query parameter and to return the id of the documents along with the selected rank features defined in the collect_rank_features rank-profile. The hits parameter is set to 1 because we know there is only one relevant id for each query, so we ask Vespa to return only one document in the result set. The recall parameter allows us to specify the exact document id we want to retrieve.

Note that the recall parameter only works if the document is matched by the query, which is exactly the behavior we want in this case.
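A minimal sketch of get_relevant_hit using only the Python standard library. It assumes Vespa's query API is reachable at http://localhost:8080/search/ (adjust to your deployment) and accepts the body above as a JSON POST; the helper names relevant_hit_body and post_query are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint of a locally running Vespa container.
VESPA_SEARCH_URL = "http://localhost:8080/search/"


def relevant_hit_body(query, rank_profile, relevant_id):
    """Request body restricting the result to relevant_id via recall."""
    return {
        "yql": "select id, rankfeatures from sources * where userQuery()",
        "query": query,
        "hits": 1,
        "recall": "+id:" + str(relevant_id),
        "ranking": {"profile": rank_profile, "listFeatures": "true"},
    }


def post_query(body, url=VESPA_SEARCH_URL):
    """POST a JSON query body and return the decoded JSON result."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def get_relevant_hit(query, rank_profile, relevant_id):
    """Return the relevant hit as a one-element list, or [] if it was not matched."""
    result = post_query(relevant_hit_body(query, rank_profile, relevant_id))
    return result.get("root", {}).get("children", [])
```

Because of the recall restriction, an empty list means the relevant document was not matched by the query, which is the signal used to skip that query.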

The recall syntax to retrieve one document with id equal to 1 is "recall": "+id:1", and the syntax to retrieve more than one document, say documents with ids 1 and 2, is "recall": "+(id:1 id:2)".
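The two recall forms above can be generated from a list of ids. This is a small hypothetical helper that only reproduces the syntax shown in the previous paragraph.

```python
def recall_for_ids(ids):
    """Build a recall expression restricting the result set to the given ids."""
    ids = [str(i) for i in ids]
    if not ids:
        raise ValueError("at least one id is required")
    if len(ids) == 1:
        return "+id:" + ids[0]
    return "+(" + " ".join("id:" + i for i in ids) + ")"


print(recall_for_ids([1]))     # +id:1
print(recall_for_ids([1, 2]))  # +(id:1 id:2)
```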

If we wanted to retrieve the document even if it did not match the query, we could alter the request to use the following query specification:

body = {
    "yql": "select id, rankfeatures from sources * where true or userQuery()",
    "query": query,
    "hits": 1,
    "recall": "+id:" + str(relevant_id),
    "ranking": {"profile": rank_profile, "listFeatures": "true"},
}

Get random hits

The second Vespa request happens when we want to extend the dataset by adding randomly selected documents from the matched set. The request is contained in the function call get_random_hits(query, rank_profile, number_random_sample), where the only new parameter is number_random_sample, which specifies how many documents we should sample from the matched set.

The body of the request is given by

body = {
    "yql": "select id, rankfeatures from sources * where (userInput(@userQuery))",
    "userQuery": query,
    "hits": number_random_sample,
    "ranking": {"profile": rank_profile, "listFeatures": "true"},
}

where, apart from passing the query string through userInput(@userQuery) instead of userQuery(), the only changes with respect to get_relevant_hit are that we no longer need the recall parameter and that we set the number of hits returned by Vespa equal to number_random_sample.
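A minimal standard-library sketch of get_random_hits. It assumes a Vespa query endpoint at http://localhost:8080/search/ that accepts JSON POST requests; random_hits_body is a hypothetical helper that builds the body shown above.

```python
import json
import urllib.request

# Assumed endpoint of a locally running Vespa container.
VESPA_SEARCH_URL = "http://localhost:8080/search/"


def random_hits_body(query, rank_profile, number_random_sample):
    """Request body without a recall restriction, asking for number_random_sample hits."""
    return {
        "yql": "select id, rankfeatures from sources * where (userInput(@userQuery))",
        "userQuery": query,
        "hits": number_random_sample,
        "ranking": {"profile": rank_profile, "listFeatures": "true"},
    }


def get_random_hits(query, rank_profile, number_random_sample, url=VESPA_SEARCH_URL):
    """Return the hits ordered by the rank profile (random second phase here)."""
    body = random_hits_body(query, rank_profile, number_random_sample)
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return result.get("root", {}).get("children", [])
```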


Remember we had configured the second phase to use random scoring:

second-phase {
    expression: random
}

Using random as our second-phase ranking function ensures that the top documents returned by Vespa are randomly selected from the set of documents that were matched by the query.

Annotated data

Once we have both the relevant and the random documents associated with a given query, we parse the Vespa result and store it in a file with the following format:

bm25(body) bm25(title) nativeRank(body) nativeRank(title) docid qid relevant
25.792076 12.117309 0.322567 0.084239 D312959 3 1
22.191228 0.043899 0.247145 0.017715 D3162299 3 0
13.880625 0.098052 0.219413 0.036826 D2823827 3 0

where the value in the relevant column is 1 if document docid is relevant to the query qid and 0 otherwise.
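A minimal sketch of annotate_data and append_data producing the space-separated layout above. It assumes each hit exposes the document id in fields["id"] and the four features in fields["rankfeatures"]. It also drops a document that appears twice for the same query (the random sample may contain the relevant document again), which is our own choice rather than something stated in this tutorial.

```python
import io

FEATURES = ["bm25(body)", "bm25(title)", "nativeRank(body)", "nativeRank(title)"]
COLUMNS = FEATURES + ["docid", "qid", "relevant"]


def annotate_data(hits, query_id, relevant_id):
    """Turn Vespa hits into rows of features plus docid, qid and a 0/1 label."""
    rows, seen = [], set()
    for hit in hits:
        fields = hit["fields"]
        doc_id = fields["id"]
        if doc_id in seen:  # keep one row per (query, document) pair
            continue
        seen.add(doc_id)
        row = {name: fields["rankfeatures"][name] for name in FEATURES}
        row.update(docid=doc_id, qid=query_id, relevant=int(doc_id == relevant_id))
        rows.append(row)
    return rows


def append_data(out, rows, write_header=False):
    """Write rows as space-separated lines, features rounded to six decimals."""
    if write_header:
        out.write(" ".join(COLUMNS) + "\n")
    for row in rows:
        values = [f"{row[name]:.6f}" for name in FEATURES]
        values += [str(row["docid"]), str(row["qid"]), str(row["relevant"])]
        out.write(" ".join(values) + "\n")


hits = [{"fields": {"id": "D312959", "rankfeatures": {
    "bm25(body)": 25.792076, "bm25(title)": 12.117309,
    "nativeRank(body)": 0.322567, "nativeRank(title)": 0.084239}}}]
buffer = io.StringIO()
append_data(buffer, annotate_data(hits, "3", "D312959"), write_header=True)
print(buffer.getvalue(), end="")
```

In a real run, out would be a file opened in append mode, with the header written only once.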

Data collection sanity check

In the process of writing this tutorial and creating the data collection logic described above, we found it useful to develop a data collection sanity-check to help us catch bugs in our process. There is no unique right answer here, but our proposal is to use the dataset to train a model with the same features and functional form as the baseline you want to improve upon. If the dataset is well built and contains useful information about the task you are interested in, you should be able to get results at least as good as those obtained by your baseline on a separate test set.

In our case, the baseline is the ranking function used in our previous tutorial:

rank-profile bm25 inherits default {
    first-phase {
        expression: bm25(title) + bm25(body)
    }
}

Therefore, our sanity-check model will be a linear model containing only the two features above, i.e. a + b * bm25(title) + c * bm25(body), where a, b and c should be learned using our collected dataset.

We split our dataset into training and validation sets, train the linear model and evaluate it on the validation set. We then expect the difference observed on the collected validation set between the model and the baseline to be similar to the difference observed on a running instance of Vespa when applied to an independent test set. In addition, we expect the trained model to do at least as well as the baseline on a test set, given that the baseline model is contained in the set of possible trained models and is recovered when a=0, b=1 and c=1.
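To run this comparison on the collected validation set, one needs the linear score and the MRR over each query's collected documents. The sketch below assumes rows shaped like the annotated data (dicts with the feature columns, qid and relevant); the function names are hypothetical. MRR computed over the roughly 100 collected documents per query only approximates MRR on a full Vespa ranking.

```python
from collections import defaultdict


def linear_score(row, a, b, c):
    """a + b * bm25(title) + c * bm25(body) for one collected row."""
    return a + b * row["bm25(title)"] + c * row["bm25(body)"]


def mean_reciprocal_rank(rows, a, b, c):
    """MRR over queries, ranking each query's collected documents by linear_score."""
    by_query = defaultdict(list)
    for row in rows:
        by_query[row["qid"]].append(row)
    total = 0.0
    for docs in by_query.values():
        ranked = sorted(docs, key=lambda r: linear_score(r, a, b, c), reverse=True)
        for position, row in enumerate(ranked, start=1):
            if row["relevant"] == 1:
                total += 1.0 / position
                break
    return total / len(by_query) if by_query else 0.0


# The three example rows from the table above (query 3).
rows = [
    {"qid": "3", "bm25(title)": 12.117309, "bm25(body)": 25.792076, "relevant": 1},
    {"qid": "3", "bm25(title)": 0.043899, "bm25(body)": 22.191228, "relevant": 0},
    {"qid": "3", "bm25(title)": 0.098052, "bm25(body)": 13.880625, "relevant": 0},
]
print(mean_reciprocal_rank(rows, a=0, b=1, c=1))  # baseline weights
```

Evaluating the baseline weights (a=0, b=1, c=1) with the same function gives the reference number the trained weights should at least match.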

This is a simple procedure, but it did catch some bugs while we were writing this tutorial.For example, at one point we forgot to include


first-phase {
    expression: random
}

in the collect_rank_features rank-profile, leading to a biased dataset where the negative examples were actually quite relevant to the query. The trained model did well on the validation set but failed miserably on the test set when deployed to Vespa. This showed us that our dataset probably had a different distribution than what was observed on a running Vespa instance, and led us to investigate and catch the bug.

Beyond pointwise loss functions

The most straightforward way to train the linear model mentioned in the previous section would be to use a vanilla logistic regression, since our target variable relevant is binary. The most commonly used loss function in this case (binary cross-entropy) is referred to as a pointwise loss function in the LTR literature, as it does not take the relative order of documents into account. However, as we described in the previous tutorial, the metric that we want to optimize in this case is the Mean Reciprocal Rank (MRR). The MRR is affected by the relative order of the relevance scores we assign to the list of documents returned for a query, and not by their absolute magnitudes. This disconnect between the characteristics of the loss function and the metric of interest might lead to suboptimal results.
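A small worked example can make this disconnect concrete. The sketch below implements the two losses used later in this section, sigmoid cross-entropy (pointwise) and softmax cross-entropy (listwise), in plain Python rather than TF-Ranking, and evaluates them on one hypothetical query. Shifting every score by the same constant leaves the ranking, and therefore the reciprocal rank, unchanged; the listwise loss is unchanged too, while the pointwise loss moves.

```python
import math


def sigmoid_cross_entropy(scores, labels):
    """Pointwise loss: mean binary cross-entropy of each score taken on its own.

    Uses the numerically stable form max(s, 0) - s * y + log(1 + exp(-|s|)).
    """
    losses = [max(s, 0.0) - s * y + math.log1p(math.exp(-abs(s)))
              for s, y in zip(scores, labels)]
    return sum(losses) / len(losses)


def softmax_cross_entropy(scores, labels):
    """Listwise loss: cross-entropy between normalized labels and softmax(scores)."""
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    total = sum(labels)
    return -sum((y / total) * (s - log_norm) for s, y in zip(scores, labels))


# One query: the relevant document (label 1) already has the highest score.
scores, labels = [2.0, 1.0, 0.5], [1, 0, 0]
shifted = [s + 3.0 for s in scores]  # same ranking, hence same reciprocal rank

print(sigmoid_cross_entropy(scores, labels), sigmoid_cross_entropy(shifted, labels))
print(softmax_cross_entropy(scores, labels), softmax_cross_entropy(shifted, labels))
```

The pointwise loss penalizes the shifted scores because each non-relevant document now looks more "relevant" in absolute terms, even though nothing about the ordering changed.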

For ranking search results, it is preferable to use a listwise loss function when training our linear model, since it takes the entire ranked list into consideration when updating the model parameters. To illustrate this, we trained linear models using the TF-Ranking framework. The framework is built on top of TensorFlow and allows us to specify pointwise, pairwise and listwise loss functions, among other things. The following script was used to generate the results below (just remember to increase the number of training steps when using the script).

$ ./src/python/tfrank.py

The two rank-profiles below are obtained by training the linear model with a pointwise (sigmoid cross-entropy) and a listwise (softmax cross-entropy) loss function, respectively:

rank-profile pointwise_linear_bm25 inherits default {
    first-phase {
        expression: 0.22499913 * bm25(title) + 0.07596389 * bm25(body)
    }
}

rank-profile listwise_linear_bm25 inherits default {
    first-phase {
        expression: 0.13446581 * bm25(title) + 0.5716889 * bm25(body)
    }
}

It is interesting to see that the pointwise loss function puts more weight on the title relative to the body, while the opposite happens with the listwise loss function.

The figure below shows how frequently (over more than 5,000 test queries) those two ranking functions place the relevant document at each position from 1st to 10th in the list of documents returned by Vespa. Although there is not a huge difference between those models on average, we can clearly see in the figure below that the model based on a listwise loss function places more relevant documents in the first two positions of the ranked list than the pointwise model:

[Figure: position (1st to 10th) of the relevant document in the ranked list, pointwise vs. listwise model]
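The counts behind a figure like this can be computed from each model's ranked results. The sketch assumes that, for every test query, you already have the relevance labels of the returned documents in the order the rank-profile scored them; relevant_positions is a hypothetical helper.

```python
from collections import Counter


def relevant_positions(ranked_label_lists, max_position=10):
    """Count how often the relevant document lands at each position 1..max_position.

    Each element of ranked_label_lists holds the 0/1 relevance labels of one test
    query's results, already sorted by the model's score (best first). Queries
    whose relevant document falls outside the top max_position are not counted.
    """
    counts = Counter()
    for labels in ranked_label_lists:
        for position, label in enumerate(labels[:max_position], start=1):
            if label == 1:
                counts[position] += 1
                break
    return [counts[p] for p in range(1, max_position + 1)]


print(relevant_positions([[1, 0, 0], [0, 1, 0], [0, 0, 0], [1]], max_position=3))
# [2, 1, 0]
```

Running it once per rank-profile on the same test queries gives two histograms that can be plotted side by side.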

Overall, on average, there is not much difference between those models (with respect to MRR), which was expected given the simplicity of the models described here. The goal was simply to highlight the importance of choosing better loss functions when dealing with LTR tasks, and to give a quick start to those who want to try it in their own applications. We expect the difference in MRR between pointwise and listwise loss functions to increase as we move on to more complex models.


Next steps

In this tutorial we have looked at using a simple linear ranking function. Vespa integrates with several popular machine learning libraries which can be used for Machine Learned Ranking:

  • Ranking with XGBoost Models
  • Ranking with LightGBM Models
  • Ranking with Tensorflow Models
  • Ranking with ONNX Models

Article information

Author: Duncan Muller

Last Updated: 11/10/2022
