Bias in AI: A primer

While Artificial Intelligence (AI) systems can be highly accurate, they are imperfect and may make incorrect decisions or predictions. Several challenges need to be solved before the technology can be developed and adopted responsibly.
One major challenge is bias in AI systems. Bias in AI refers to systematic differences between a model’s predicted outputs and the true outputs. These deviations can lead to incorrect or unfair outcomes, which can seriously affect critical fields such as healthcare, finance, and criminal justice.
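To make "systematic difference" concrete, the sketch below computes the mean signed error (prediction minus truth), both overall and per group. It is a minimal illustration in plain Python; the function name `mean_signed_error`, the group labels, and the numbers are hypothetical. Errors that average out to roughly zero behave like noise, while a consistent offset for one group points to a systematic, biased deviation.

```python
from collections import defaultdict


def mean_signed_error(y_true, y_pred, groups):
    """Average of (prediction - truth), overall and for each group.

    Near zero: errors cancel out. Consistently positive or negative:
    the model systematically over- or under-predicts for that group.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += pred - truth
        counts[group] += 1
    per_group = {g: totals[g] / counts[g] for g in totals}
    overall = sum(totals.values()) / sum(counts.values())
    return overall, per_group


# Hypothetical toy data: true vs. predicted scores for groups "A" and "B".
y_true = [10, 12, 11, 10, 12, 11]
y_pred = [11, 11, 11, 8, 10, 9]
groups = ["A", "A", "A", "B", "B", "B"]

overall, per_group = mean_signed_error(y_true, y_pred, groups)
print(overall, per_group)  # -1.0 {'A': 0.0, 'B': -2.0}
```

Here the model's errors for group A cancel out, but it under-predicts every record in group B by 2, which is exactly the kind of systematic deviation the definition describes.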

AI algorithms are only as good as the data they are trained on. If the data is biased, the AI system is likely to be biased as well. This can lead to unfair and unequal treatment of particular groups of people based on race, gender, age, or sexual orientation. Adding to these ethical concerns is the potential for misuse of the technology by corporations or politicians.
Several types of bias can occur in AI.

These include:

  • Selection/sampling bias: This occurs when the full dataset correctly represents the population, but the sample of data used to build the model is only partially representative of it. For example, if a model is trained on data from a single demographic group, it may not accurately predict outcomes for other groups.
  • Algorithmic bias: This occurs when the algorithms used to build the model are inherently biased. For example, a model that uses gender as a predictor variable may be biased against a particular gender. As a result, the model learns an incorrect representation of the data, both of the relationships among the features and of their relationship with the target variable.
  • Human bias: This occurs when the data used to train the model is biased due to human error or prejudice. For example, if the people collecting or labeling a dataset record mostly positive examples, a model trained on this data may be biased toward predicting positive outcomes.
  • Representation bias: This occurs when certain groups or individuals are underrepresented in the overall dataset from which the training data is taken. For example, a model trained on data from a predominantly white population may not accurately predict outcomes for people of color. Here the data does not capture the diversity of the population, including its anomalies and outliers. This is distinct from sampling bias because it concerns the entire underlying dataset, not just the sample.
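The practical difference between sampling bias and representation bias is what you compare against. A minimal sketch, assuming each record carries a group label and that a reference breakdown of the population is available (the shares below are invented for illustration): compare the training sample with the full dataset to detect sampling bias, and compare the full dataset with the population to detect representation bias. The helper names `group_shares` and `share_gaps` are hypothetical.

```python
from collections import Counter


def group_shares(labels):
    """Fraction of records that belong to each group."""
    counts = Counter(labels)
    total = len(labels)
    return {group: count / total for group, count in counts.items()}


def share_gaps(observed, reference):
    """Observed share minus reference share for every group in the reference.

    Negative values mean the group is underrepresented; a group missing
    from `observed` is treated as having a share of 0.0.
    """
    return {group: observed.get(group, 0.0) - ref for group, ref in reference.items()}


# Hypothetical figures for illustration only.
population = {"X": 0.5, "Y": 0.3, "Z": 0.2}          # assumed reference breakdown
full_dataset = ["X"] * 50 + ["Y"] * 30 + ["Z"] * 20  # mirrors the population
training_sample = ["X"] * 40 + ["Y"] * 10            # group "Z" never sampled

# Sampling bias check: the sample vs. the full dataset it was drawn from.
print(share_gaps(group_shares(training_sample), group_shares(full_dataset)))
# Representation bias check: the full dataset vs. the population.
print(share_gaps(group_shares(full_dataset), population))
```

In this toy case the full dataset matches the population (no representation bias), but the training sample overweights X and omits Z entirely, which is sampling bias.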

Recognizing and addressing bias is important, as it can lead to unfair and potentially harmful outcomes. Some steps that can be taken to mitigate bias in general include:

  • Ensuring that the training data is representative of the entire population
  • Using algorithms that are less prone to bias
  • Regularly checking for discrimination in the model’s predictions
  • Ensuring that the model is fair and unbiased in its treatment of different groups

By taking these steps, we can work towards building machine learning models that are more accurate, fair, and unbiased.

Additionally, bias can occur in a model at various stages of the model-building process.

These include:

  • The data gathering stage: Bias could be present in the data itself, either because of underlying disparities between groups or because of the data gathering process. For instance, data collected through a survey targeting a specific demographic is not representative of the broader population.
  • The labeling stage: Bias could be introduced at the labeling stage if the labeling process is biased against certain groups. Individual human annotators can hold biases of varying strength, which can skew the labels the model learns from during training.
  • The modeling stage: Bias can also be introduced at the modeling stage, because different models learn from data in different ways. For instance, some models are better than others at learning from examples that occur infrequently, such as those from a minority group.

Let’s look at each of these types of bias in more detail and learn how to mitigate them.

Mitigating selection/sampling bias

Selection/sampling bias refers to systematic differences between the training data used to build the model and the true population. It can occur when the training sample represents only part of the population, resulting in a model that is inaccurate or generalizes poorly.

With this type of bias, the complete dataset represents the entire population or scenario, but one or more demographic groups end up underrepresented when the training sample is selected. There are several ways to mitigate sampling bias in AI. These include:

  • Stratified sampling: This sampling method ensures that the training data is representative of the entire population by dividing the population into smaller groups (strata) and selecting a representative sample from each group.
  • Oversampling or undersampling: If certain groups or classes are underrepresented in the training data, oversampling can be used to increase the number of examples from these groups. Undersampling can be used to decrease the number of samples from groups that are overrepresented.
  • Data augmentation: Data augmentation involves generating synthetic examples from the training data to increase the size of the dataset. This can help the model learn more about the relationships between different features and improve its generalizability.
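The stratified sampling idea from the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production sampler: the function name `stratified_sample`, the `region` field, and the 80/20 split are assumptions made for the example. Each stratum contributes the same fraction of its records, so group proportions in the sample track those in the data.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, strata_key, fraction, seed=0):
    """Draw roughly the same fraction (0 < fraction <= 1) from every stratum.

    Keeps each group's share of the sample close to its share of the data.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[record[strata_key]].append(record)

    sample = []
    for members in strata.values():
        # Keep at least one record per stratum so small groups are never dropped.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))  # sampling without replacement
    return sample


# Hypothetical dataset: 80 records from "north", 20 from "south".
data = [{"id": i, "region": "north" if i < 80 else "south"} for i in range(100)]
sample = stratified_sample(data, "region", fraction=0.1)
print(sorted(Counter(r["region"] for r in sample).items()))  # [('north', 8), ('south', 2)]
```

In practice, scikit-learn's `train_test_split` accepts a `stratify` argument that performs the same kind of proportional split on a label or group column.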

Mitigating algorithmic bias

Algorithmic bias refers to the inherent biases present in the algorithms used to build the model. These biases can arise from various sources, including the assumptions built into the algorithm and the data used to train the model.

There are several ways to mitigate algorithmic bias in machine learning. These include:

  • Using algorithms that are less prone to bias: Some algorithms may be more prone to bias than others on a given dataset. For example, decision trees and logistic regression can be more prone to bias than support vector machines or neural networks in some settings, although such rankings depend heavily on the data and the task. Choosing an algorithm that proves less prone to bias on your own data can help mitigate the risk of bias in the model.
  • Using a diverse and representative dataset: Training even a bias-prone algorithm on a diverse and representative dataset can reduce algorithmic bias, because the model is exposed to a wide range of examples in sufficient numbers.
  • Fairness metrics: Various fairness metrics can be used to evaluate the fairness of a machine learning model. These metrics can help identify bias in the model and guide efforts to mitigate it.
  • Leveraging debiasing techniques: There are a few techniques that can be used to reduce bias in machine learning models, such as preprocessing the data to remove sensitive variables, applying regularization to the model, and using counterfactual data.
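Two widely used group fairness metrics compare how often each group receives a positive prediction. The sketch below computes the selection rate per group, the demographic parity difference (highest minus lowest rate), and the disparate impact ratio (lowest over highest rate). It is a minimal plain-Python illustration with hypothetical function names and toy predictions; libraries such as Fairlearn and AIF360 provide tested implementations of these and other metrics.

```python
def selection_rates(y_pred, groups):
    """Share of positive (1) predictions within each group."""
    rates = {}
    for g in sorted(set(groups)):
        preds = [p for p, grp in zip(y_pred, groups) if grp == g]
        rates[g] = sum(preds) / len(preds)
    return rates


def demographic_parity_difference(y_pred, groups):
    """Highest minus lowest selection rate; 0 means every group is selected equally often."""
    rates = selection_rates(y_pred, groups).values()
    return max(rates) - min(rates)


def disparate_impact_ratio(y_pred, groups):
    """Lowest over highest selection rate; 1 means parity. Assumes some positive predictions."""
    rates = selection_rates(y_pred, groups).values()
    return min(rates) / max(rates)


# Hypothetical binary predictions (1 = approved) for applicants in groups "A" and "B".
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(selection_rates(y_pred, groups))                # {'A': 0.75, 'B': 0.25}
print(demographic_parity_difference(y_pred, groups))  # 0.5
print(disparate_impact_ratio(y_pred, groups))         # 0.3333333333333333
```

In US employment contexts, a disparate impact ratio below 0.8 is often flagged under the "four-fifths rule"; such thresholds are rules of thumb that prompt investigation, not proof that a model is fair or unfair.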

By taking these steps, it is possible to reduce algorithmic bias and build fairer and more accurate machine learning models.
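The list above mentions counterfactual data as a debiasing technique. One simple form is counterfactual data augmentation: for every training record, add a copy in which only the sensitive attribute is changed and the label is kept, so the model sees identical outcomes for records that differ only in that attribute. The sketch below is a minimal illustration; the function name `counterfactual_augment`, the field names, and the binary gender swap are assumptions for the example.

```python
def counterfactual_augment(records, attribute, swap):
    """Return the original records plus one counterfactual copy of each.

    Each copy has `attribute` replaced via the `swap` mapping while every
    other field, including the label, stays the same.
    """
    augmented = list(records)
    for record in records:
        twin = dict(record)  # shallow copy so the original record is untouched
        twin[attribute] = swap[record[attribute]]
        augmented.append(twin)
    return augmented


# Hypothetical hiring records; field names and the binary swap are assumptions.
rows = [
    {"gender": "female", "years_experience": 5, "hired": 1},
    {"gender": "male", "years_experience": 2, "hired": 0},
]
augmented_rows = counterfactual_augment(rows, "gender", {"female": "male", "male": "female"})
for row in augmented_rows:
    print(row)
```

This naive swap changes only the attribute itself. Other features that act as proxies for it (names, for example) are left as they are, so real counterfactual generation usually needs more care than a single field swap.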

Mitigating human bias

Human bias refers to prejudices or errors introduced into the training data by people. These biases can arise wherever humans are involved: in the design of the model or in how the data is collected, labeled, or processed.

There are several ways to mitigate human bias in AI. These include:

  • Automated data labeling techniques: Automated data labeling can help minimize human bias by reducing reliance on human annotators. Related techniques such as active learning let the model choose which examples are sent for labeling based on its current state (for example, the examples it is least certain about), so fewer examples need manual labels.
  • Using debiasing techniques: Various debiasing techniques can reduce bias in machine learning models. These include preprocessing the data to remove sensitive variables, applying regularization to the model, and using counterfactual data.
  • Monitoring and assessing the model’s performance: It is essential to monitor and evaluate the model’s performance to ensure that it is not exhibiting bias. This can be done using various fairness metrics and comparing the model’s predictions to the true outcomes.
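Comparing the model’s predictions to the true outcomes per group is the basis of the "equal opportunity" check: among people whose true label is positive, each group should be predicted positive at a similar rate (the true positive rate). The sketch below is a minimal plain-Python illustration with hypothetical function names and toy labels; groups with no true positives are skipped to avoid dividing by zero.

```python
def true_positive_rates(y_true, y_pred, groups):
    """Per group: among records whose true label is 1, the share predicted as 1."""
    rates = {}
    for g in sorted(set(groups)):
        preds_on_positives = [
            p for t, p, grp in zip(y_true, y_pred, groups) if grp == g and t == 1
        ]
        if preds_on_positives:  # skip groups with no true positives
            rates[g] = sum(preds_on_positives) / len(preds_on_positives)
    return rates


def equal_opportunity_difference(y_true, y_pred, groups):
    """Highest minus lowest true positive rate across groups; 0 means equal opportunity."""
    rates = true_positive_rates(y_true, y_pred, groups).values()
    return max(rates) - min(rates)


# Hypothetical labels and predictions for groups "A" and "B".
y_true = [1, 1, 1, 0, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A"] * 4 + ["B"] * 4

print(true_positive_rates(y_true, y_pred, groups))  # {'A': 1.0, 'B': 0.3333333333333333}
print(equal_opportunity_difference(y_true, y_pred, groups))
```

Both groups have three truly positive cases, yet the model recognizes all of them in A and only one in B. Tracking this gap over time is one concrete way to carry out the monitoring described above.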

Mitigating representation bias

Representation bias refers to the biases present in the training data due to the underrepresentation of certain groups or individuals. This can occur when certain groups or individuals are not included in the training data or are significantly underrepresented.

With this type of bias, the dataset itself does not fully represent the population or scenario, so the model learns from data in which some groups or scenarios are underrepresented, and its predictions become biased as a result. There are several ways to mitigate representation bias in machine learning. These include:

  • Using a diverse and representative dataset: Ensuring that the training data is representative of the entire population can help reduce representation bias by exposing the model to examples from different groups.
  • Using oversampling or undersampling: If certain groups or classes are underrepresented in the training data, oversampling can be used to increase the number of examples from these groups. Undersampling can be used to decrease the number of examples from groups that are overrepresented.
  • Using data augmentation: Data augmentation involves generating synthetic examples from the training data to increase the size of the dataset. This can help the model learn more about the relationships between different features and improve its generalizability.
  • Using fairness metrics: Various fairness metrics can be used to evaluate the fairness of a machine learning model. These metrics can help identify bias in the model and guide efforts to mitigate it.
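Random oversampling, one of the options listed above, can be sketched as follows: duplicate randomly chosen records from each smaller group until every group is as large as the largest one. This is a minimal plain-Python illustration; the function name `random_oversample`, the `group` field, and the 90/10 split are assumptions for the example.

```python
import random
from collections import Counter, defaultdict


def random_oversample(records, group_key, seed=0):
    """Duplicate randomly chosen records until every group matches the largest one."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for record in records:
        by_group[record[group_key]].append(record)

    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(members)
        # rng.choices samples with replacement, so some records appear several times.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


# Hypothetical imbalanced dataset: 90 majority records, 10 minority records.
data = [{"id": i, "group": "majority" if i < 90 else "minority"} for i in range(100)]
balanced = random_oversample(data, "group")
print(dict(Counter(r["group"] for r in balanced)))  # {'majority': 90, 'minority': 90}
```

Duplicating records can make a model overfit to the few minority examples it has, which is why synthetic approaches such as SMOTE (available, like `RandomOverSampler`, in the imbalanced-learn library) are often used instead. Neither approach helps if a group is missing from the data entirely; that requires collecting more representative data.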

In this blog, we focused on bias in AI and worked through some of the crucial questions around this topic, from the types of bias to methods for preventing and mitigating them. We hope you use the tips discussed to design fair and responsible AI systems.

Author

Manish Singh

Senior Specialist - Data Science, Fosfor

Manish Singh has 11+ years of progressive experience in executing data-driven solutions. He is adept at handling complex data problems, implementing efficient data processing, and delivering value. He is proficient in machine learning and statistical modelling algorithms/techniques for identifying patterns and extracting valuable insights. He has a remarkable track record of managing complete software development lifecycles and accomplishing mission-critical projects. And finally, he is highly competent in blending data science techniques with business understanding to transform data into business value seamlessly.
