# Top 8 Computer Vision Techniques Entwined With Deep Learning

Edge AI will be a crucial technology for bringing deep learning from the cloud to the edge. Here are 8 computer vision techniques entwined with deep learning.

1. Tumor Detection

In the medical field, computer vision and deep learning applications have proven highly useful, particularly in the precise diagnosis of brain tumors. If left untreated, brain tumors spread swiftly to other areas of the brain and spinal cord, making early discovery critical to the patient’s survival. Medical experts can employ computer vision software to speed up and simplify the detection procedure.

2. Medical Imaging

Computer vision has been utilized in a variety of healthcare applications to help doctors make better treatment decisions for their patients. Medical imaging, also known as medical image analysis, is a technique for seeing specific organs and tissues in order to provide a more precise diagnosis. With medical image analysis, physicians and surgeons may get a better look into the patient’s interior organs and spot any problems or anomalies. Medical imaging includes X-ray radiography, ultrasound, MRI, endoscopy, and other procedures.  

3. Cancer Detection

Deep-learning computer vision models have attained physician-level accuracy in diagnosing moles and melanomas. Skin cancer, for example, can be difficult to diagnose early since its symptoms often resemble those of other skin conditions. As a solution, scientists have used computer vision systems to successfully distinguish between cancerous and non-cancerous skin lesions. Research also points to various benefits of employing computer vision and deep learning to detect breast cancer: trained on a large library of photos containing both healthy and malignant tissue, such systems can help automate the detection process and limit the likelihood of human error.

4. Medical Training

Computer vision is frequently employed not just for medical diagnosis but also for medical skill development. Surgeons are no longer reliant solely on the conventional method of learning skills through hands-on experience in the operating room. Simulation-based surgical platforms have instead proven to be an excellent tool for teaching and testing surgical abilities. Surgical simulation gives trainees the opportunity to practice their skills before entering the operating room, with thorough feedback and performance evaluations that build a better understanding of patient care and safety before they operate on real patients.

5. Combating Covid-19

The Covid-19 pandemic has presented a major threat to healthcare systems worldwide. With governments all around the world attempting to battle the disease, computer vision can make a huge contribution to overcoming this obstacle. Thanks to rapid technological improvements, computer vision applications can help in the diagnosis, treatment, control, and prevention of Covid-19. Using programs such as COVID-Net, illness can be readily diagnosed in patients from digital chest X-ray images. The prototype program, built by DarwinAI in Canada, has reported Covid detection with 92.4 percent accuracy.

6. Health Monitoring

Medical practitioners are increasingly using computer vision and AI technologies to track their patients’ health and fitness. Doctors and surgeons can make better judgments in less time using these assessments, especially in emergency situations. For example, computer vision models can measure the volume of blood lost during surgery and flag when a patient is approaching a critical threshold. One such application is Gauss Surgical’s Triton, which monitors and calculates blood loss during surgery, helping surgeons determine how much blood the patient will require during or after the procedure.

7. Machine-assisted Diagnosis

These technologies can assist doctors in detecting malignancy by spotting tiny changes in tumors. By scanning medical images, such tools can aid in the discovery, prevention, and treatment of a variety of illnesses.

8. Timely Detection of Disease

For disorders such as cancers and tumors, prompt identification and treatment can be the difference between life and death, and early detection of symptoms increases the patient’s chances of survival. Computer vision applications are trained on large volumes of data, such as hundreds of photos, in order to detect even the tiniest differences with high accuracy. As a consequence, medical practitioners can spot minor alterations that would otherwise go unnoticed by the naked eye.


# Top 5 Computer Vision Use Cases In Agriculture In 2023

The agriculture sector is one of the most important industries in the world since it is the source of our food. As digital technologies revolutionize every industry, agriculture is no exception. Like every other sector, the agriculture sector also faces various challenges, including climate change, labor shortage, and the disruptions created by the pandemic.

Digital technologies such as computer vision can help the agricultural sector overcome these challenges and achieve efficiency, resiliency, and sustainability.

This article explores 5 computer vision use cases that can help agriculture tackle current challenges and excel in the future.

1. Crop monitoring with drones

Drone technology is being extensively used in the agriculture sector to overcome labor shortages and improve efficiency. The market for drones in agriculture is projected to reach $3.7 billion by 2027.

In precision-agriculture crop monitoring, drones are fitted with high-definition cameras, enabled with computer vision and thermal-imaging technology, to:

Detect crop condition and health

Monitor soil condition

Map the farmland according to the crop area

Detect abnormalities

These drones can be highly efficient and can cover a large area much faster and more accurately than human monitoring.
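One simple building block behind drone-based crop-health detection is a per-pixel vegetation index computed from the camera's color channels. The sketch below uses the Excess Green index (ExG = 2G − R − B), a common choice in the agronomy literature; the threshold value here is an illustrative assumption, not a calibrated one.

```python
import numpy as np

def excess_green(rgb):
    """Per-pixel Excess Green index: ExG = 2*G - R - B, on channels scaled to [0, 1]."""
    rgb = rgb.astype(np.float64) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 2.0 * g - r - b

def vegetation_fraction(rgb, threshold=0.1):
    """Fraction of pixels flagged as vegetation (ExG above an illustrative threshold)."""
    return float(np.mean(excess_green(rgb) > threshold))

# Synthetic 2x2 "field": two healthy green pixels, two bare-soil brown pixels
field = np.array([[[ 40, 180,  40], [ 50, 200,  60]],
                  [[120,  90,  60], [130, 100,  70]]], dtype=np.uint8)
print(vegetation_fraction(field))  # 0.5 -> half the area reads as vegetation
```

A real pipeline would run this on georeferenced drone imagery and map the low-vegetation patches back to field coordinates for inspection.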


However, investing in drones enabled with computer vision can be expensive; therefore it is important to study the business, short/long-term expectations, and ROI before purchasing such technologies.

2. Crop sorting and grading

Computer vision-enabled machines are being extensively used in sorting and grading the harvest. Since these jobs involve repetitive and time-consuming tasks, automating them can offer efficiency and speed.

Through machine vision systems, crops of different types can be identified and sorted based on order requirements. For example, some orders require large size potatoes, and some require medium-sized ones. A machine vision system can do this in a fraction of the time it would take to do it manually.

Machine vision systems can also sort products based on perishability to identify which batch to ship first and which ones to ship later.
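Once the vision system has measured each item, the size-and-perishability sorting described above reduces to a simple decision rule. The sketch below is a hypothetical post-measurement step (the size cutoffs and ripeness scores are illustrative, not industry values):

```python
def grade_potato(diameter_mm):
    """Assign a size grade from a measured diameter (cutoffs are illustrative)."""
    if diameter_mm >= 80:
        return "large"
    elif diameter_mm >= 50:
        return "medium"
    return "small"

def ship_order(items):
    """Sort a batch so the most perishable items (highest ripeness score) ship first."""
    return sorted(items, key=lambda item: item["ripeness"], reverse=True)

batch = [
    {"id": 1, "diameter_mm": 85, "ripeness": 0.2},
    {"id": 2, "diameter_mm": 55, "ripeness": 0.9},
    {"id": 3, "diameter_mm": 45, "ripeness": 0.5},
]
print([grade_potato(p["diameter_mm"]) for p in batch])  # ['large', 'medium', 'small']
print([p["id"] for p in ship_order(batch)])             # [2, 3, 1]
```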

Check out this computer vision-enabled apple grading machine.

Computer vision systems are also used in counting fruits and vegetables. Check out this example of a computer vision system counting apples directly from trees.

3. Pesticide spraying with drones

Spraying pesticides on crops is a common practice to protect the produce from pests and diseases. However, this can be a time-consuming process and if inhaled, it can be harmful to the farmer’s health. 

Automated drones can perform this task with higher precision and speed. Drones with spray guns and cameras enabled with computer vision can identify areas that need pesticide and spray accordingly in required amounts.

4. Computer vision phenotyping

Phenotyping refers to measuring and analyzing plants’ characteristics for research purposes. Information is gathered to learn how plants grow, what environment is best for specific plants, and insight into plant genetics. 

In the past, it was done manually, but now it is performed through AI and computer vision. As climate change threatens the agricultural sector, computer vision-enabled phenotyping enables breeders to learn more about plants to make them more resilient to the changing weather. It also helps farmers in finding the crop that would be most successful and sustainable. 

Watch this short video to learn how computer vision-enabled phenotyping works.

5. Livestock farming

Artificial intelligence is being widely used in the livestock farming market. The investment in AI is projected to significantly increase by 2026 and computer vision accounts for the largest chunk of that market.

Figure 1. Overview of AI livestock market increase from 2023 to 2026

Source: GMC

Computer vision technology combined with IoT can provide the following benefits for precision livestock farming:

Monitor the health of all livestock, including cattle, sheep, pigs, and poultry 

Examine the health of the livestock with high definition cameras

Monitor food supply for the livestock

Detect abnormal behavior of the livestock

Count livestock through drones

Send real-time information to the farmers for farm management planning and decision making

A recent study was conducted on a computer vision and deep learning system to monitor dairy cows with accuracy and with real-time data. The system successfully identified cows through pelt patterns, evaluated their position, understood the actions of the cows, and tracked movement.

Source: ScienceDirect


Shehmir Javaid

Shehmir Javaid is an industry analyst at AIMultiple. He has a background in logistics and supply chain management research and loves learning about innovative technology and sustainability. He completed his MSc in logistics and operations management at Cardiff University UK and a Bachelor’s in international business administration at Cardiff Metropolitan University UK.





Implementing Computer Vision – Face Detection

This article was published as a part of the Data Science Blogathon


Computer vision is the branch of artificial intelligence that aims to design intelligent algorithms with the ability to see the way humans do.

In this article, we’ll cover four of the main areas.

Face Detection

Object Detection

Facial Recognition

Object Tracking

In this first article, we will focus on the introduction of computer vision, and the face identification application based on the Python OpenCV library. In future articles, we will demonstrate the applications of object identification, face recognition, and object tracking in real-time video.

1. Introduction

2. Face Detection Algorithm

3. Face Detection Implementation

4. Alternative to OpenCV

5. Conclusions

6. References


The reader of this article will be able to understand how several computer vision applications work, their underlying operation and architecture, and the steps necessary to implement an application for real use.

Let’s now look at some of the other applications that can be developed in this area that we’ve already discussed earlier.

Face detection draws a small square around each face it finds, while face recognition also puts a name to those faces. We are going to build an implementation somewhat similar to this. Another example is Microsoft’s Kinect, integrated with the Xbox video game console, which performs motion detection.

You can also use computer vision to detect a person controlling a car as they move the steering wheel. Image-recognition techniques are needed whenever a robot must see what is in front of it in order to make a decision.

Another example is autonomous cars. Such a car carries a series of sensors; for instance, it needs to detect pedestrians to avoid hitting a person, and it must detect traffic signs and traffic lights. If the light is red it has to stop; if it is green it has to continue. Computer vision techniques make this possible, and the same technology used for face detection is also used for object detection.

Finally, we have DeepDream, which produces images generated by a neural network. These hallucinogenic images show animal-like traits in parts of the picture: the algorithm already holds information about animals, and it combines the characteristics of those animal images with another image, such as a landscape.

A related application is deepfakes, which are faces of people created by artificial intelligence.

Face Detection Algorithm

The Cascade Classifier is an algorithm that learns to classify a certain object. To start training, we need two sets of images: the first set contains the positive images, the faces you want to detect, and the second set contains the negative images, which are images of anything but faces.

If, for example, you want to detect cars, the positive images will be cars of various models and types, and the negative images will be any other kind of image. You need both sets of images to submit to the algorithm for training.

Training uses an algorithm from the machine learning area called AdaBoost. I won’t go into the details of how it works, but basically you apply this algorithm to both the positive and negative images, and its role is to select the most useful features.

The features are small rectangles with black and white regions; to classify a face, these features are applied to each subwindow of an image. The window slides across the image from left to right and from top to bottom.
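These rectangle features are cheap to evaluate because the cascade precomputes an integral image: the sum of any rectangle then costs only four lookups, regardless of its size. Below is a minimal sketch of that idea (an illustration of the principle, not the actual OpenCV internals):

```python
import numpy as np

def integral_image(img):
    """Cumulative table so that ii[y, x] = sum of img[0:y, 0:x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    """Sum of the h x w rectangle with top-left corner (y, x), in four lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def two_rect_feature(ii, y, x, h, w):
    """Haar-like feature: sum of the top half minus sum of the bottom half."""
    return rect_sum(ii, y, x, h // 2, w) - rect_sum(ii, y + h // 2, x, h // 2, w)

img = np.arange(16).reshape(4, 4)        # toy 4x4 "grayscale" image
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 2, 2))          # 0 + 1 + 4 + 5 = 10
print(two_rect_feature(ii, 0, 0, 4, 4))  # top two rows minus bottom two rows
```

The sliding window then evaluates many such features at every position, which is why the constant-time rectangle sum matters.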

Face Detection Implementation

We will use the Python OpenCV Library, which is one of the main tools on the market for developing Visual Computing applications.



Let’s now show our code in Python:


import cv2  # OpenCV import
from google.colab.patches import cv2_imshow  # cv2.imshow replacement for Colab

# Load the image containing people
img = cv2.imread('/content/imagem-computer-vision.jpg', cv2.IMREAD_UNCHANGED)
cv2_imshow(img)

# Load the pre-trained Haar cascade for frontal faces
detector_face = cv2.CascadeClassifier('/content/haarcascade_frontalface_default.xml')

# The cascade operates on single-channel images, so convert to grayscale
imagem_cinza = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2_imshow(imagem_cinza)

# Each detection is (x, y, width, height)
deteccoes = detector_face.detectMultiScale(imagem_cinza, scaleFactor=1.3, minSize=(30, 30))
deteccoes
# array([[1635,  156,  147,  147],
#        [ 284,  262,  114,  114],
#        [1149,  260,  129,  129],
#        [ 928,  491,  171,  171],
#        [ 222,  507,  151,  151]], dtype=int32)

# Draw a green rectangle around each detected face
for (x, y, l, a) in deteccoes:
    cv2.rectangle(img, (x, y), (x + l, y + a), (0, 255, 0), 2)
cv2_imshow(img)

We visualized the processing for the identification of faces through the Google Colab notebook:

`deteccoes` returns an array with 5 rows, one per detected face, and the four numbers in each row give the position and size of that face in the image:

len(deteccoes)  # Total faces = 5

Alternative to OpenCV


In selecting the alternatives to OpenCV, we adopted the following criteria:

Ease of adoption of the technology






Below is a list of my alternatives, following the criteria above:

1) Microsoft Computer Vision API

2) AWS Rekognition

3) Google Cloud Vision API

4) Scikit-Image

5) SimpleCV

6) Azure Face API

7) DeepDream

8) IBM Watson Visual Recognition

9) Clarifai

10) DeepPy


Conclusions

In this article, we used the Python OpenCV library as a tool that speeds up face identification in an agile and efficient way.

With the help of this article, a data scientist will be able to implement other computer vision applications, such as detecting mask usage, body temperature, or social distancing in supermarkets, as well as object identification, facial recognition, and real-time object tracking.



The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


Playing Super Mario Bros With Deep Reinforcement Learning

This article was published as a part of the Data Science Blogathon


From this article, you will learn how to play Super Mario Bros with Deep Q-Network and Double Deep Q-Network (with code!).

Photo by Cláudio Luiz Castro on Unsplash

Super Mario Bros is a well-known video game title developed and published by Nintendo in the 1980s. It is one of the classic game titles that has lived on through the years and needs no explanation. It is a 2D side-scrolling game, allowing the player to control the main character — Mario.

The game environment was taken from the OpenAI Gym, using the Nintendo Entertainment System (NES) Python emulator. In this article, I will show how to implement Reinforcement Learning using the Deep Q-Network (DQN) and Double Deep Q-Network (DDQN) algorithms with the PyTorch library, and examine each algorithm's performance. The experiments conducted with each algorithm were then evaluated.

Data understanding and preprocessing

The original observation space for Super Mario Bros is a 240 x 256 x 3 RGB image, and the action space is 256, which means the agent is able to take 256 different possible actions. In order to speed up training, we used gym’s wrapper functions to apply certain transformations to the original environment:

Repeating each action of the agent over 4 frames and reducing the video frame size, i.e. each state in the environment is 4 x 84 x 84 x 1 (a stack of 4 consecutive 84 x 84 grayscale frames)

Normalizing the pixel value to the range from 0 to 1

Reducing the number of actions to 5 (Right only), 7 (Simple movement) and 12 (Complex movement)
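Per frame, the first two transformations amount to the following arithmetic. This is a NumPy sketch using the standard luminance weights (the actual gym wrappers also resize to 84 x 84 and stack four consecutive frames; resizing is omitted here to keep the example dependency-free):

```python
import numpy as np

def preprocess_frame(frame):
    """Convert a 240 x 256 x 3 RGB frame to normalized grayscale in [0, 1]."""
    # Luminance-weighted grayscale conversion
    gray = frame[..., 0] * 0.299 + frame[..., 1] * 0.587 + frame[..., 2] * 0.114
    # Normalize pixel values from [0, 255] to [0, 1]
    return (gray / 255.0).astype(np.float32)

frame = np.random.randint(0, 256, size=(240, 256, 3), dtype=np.uint8)
state = preprocess_frame(frame)
print(state.shape)                                # (240, 256)
print(0.0 <= state.min() and state.max() <= 1.0)  # True
```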

Theoretical Results

Initially, I considered an experiment using tabular Q-learning, which uses a 2-D array to store the values of all possible state-action pairs. However, in this environment setting I realized that Q-learning is not feasible, since it would require storing an extremely large Q-table.

Therefore, this project used the DQN algorithm as the baseline model. DQN uses Q-learning to learn the best action to take in a given state, and a deep neural network to estimate the Q-value function.

The deep neural network I used is a 3-layer convolutional neural network followed by two fully connected linear layers with a single output for each possible action. This network plays the role of the Q-table in the Q-learning algorithm. The objective loss function is the Huber loss (smoothed mean absolute error) on Q-values, which combines MSE and MAE: quadratic for small errors and linear for large ones. The optimizer used to minimize the objective function is Adam.
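The Huber loss behaves like MSE for small errors and like MAE for large ones, which keeps outlier Q-value errors from dominating the gradient. A minimal sketch with δ = 1 (PyTorch’s `nn.SmoothL1Loss` computes the same quantity):

```python
def huber(error, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it."""
    abs_err = abs(error)
    if abs_err <= delta:
        return 0.5 * abs_err ** 2           # MSE-like region
    return delta * (abs_err - 0.5 * delta)  # MAE-like region

print(huber(0.5))  # 0.125  (quadratic regime)
print(huber(3.0))  # 2.5    (linear regime: 3 - 0.5)
```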

However, the DQN network has the problem of overestimation.

Figure 1: Illustration of how the DQN network overestimates Q-values

There are 2 main reasons for overestimation, as shown in Fig 1. The first reason is the maximization used to calculate the target value. Let the true action-values be denoted x(a₁), …, x(aₙ), and the noisy estimates made by DQN be Q(s, a₁; w), …, Q(s, aₙ; w). Mathematically,

E[maxᵢ Q(s, aᵢ; w)] ≥ maxᵢ x(aᵢ),

so the maximum over noisy estimates is, in expectation, an overestimate of the true maximum Q-value.
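This overestimation is easy to reproduce numerically: even when every true action-value is zero, taking the maximum over noisy estimates yields a positive expectation. A small simulation with synthetic values (not a trained network):

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_trials = 10, 10_000

true_q = np.zeros(n_actions)  # true action-values x(a_1) ... x(a_n), all zero
# DQN-style noisy estimates: true value plus zero-mean Gaussian noise
noisy_q = true_q + rng.normal(0, 1, size=(n_trials, n_actions))

true_max = true_q.max()                     # 0.0
estimated_max = noisy_q.max(axis=1).mean()  # E[max_i Q(s, a_i; w)]
print(true_max, round(estimated_max, 2))    # the estimate is clearly above 0
```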

The second reason is that overestimated Q-values are again being used to update the weights of the Q-Network through backward propagation. This made overestimation more severe.

The main drawback is that the overestimation done by DQN is non-uniform. Intuitively, the more frequently a specific state-action pair appears in the replay buffer, the more its value is overestimated.

To obtain more accurate Q-values, we would like to use the DDQN network on our problem and then compare the experimental results against the previous DQN network. To alleviate the overestimation caused by maximization, DDQN uses 2 Q-networks: an online network Q*, whose weights are updated through backpropagation and which selects the action, and a target network Q^, which evaluates the selected action. The DDQN Q-learning target is

y = r + γ Q^(s′, argmaxₐ Q*(s′, a)),

where Q* is the one whose weights are updated and Q^ simply copies the values of Q* every n steps.
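The difference between the two targets can be sketched with plain arrays standing in for the two networks’ Q-value outputs at the next state (illustrative numbers): in the standard double Q-learning formulation, DQN takes the max of the target network directly, while DDQN evaluates the target network at the action the online network prefers.

```python
import numpy as np

reward, gamma = 1.0, 0.9
q_online = np.array([0.9, 2.1, 1.0])  # online net Q*(s', .): selects the action
q_target = np.array([1.2, 1.5, 3.0])  # target net Q^(s', .): evaluates it

dqn_target = reward + gamma * q_target.max()                # 1 + 0.9 * 3.0  = 3.7
ddqn_target = reward + gamma * q_target[q_online.argmax()]  # 1 + 0.9 * 1.5  = 2.35
print(dqn_target, ddqn_target)  # DDQN's target is never larger than DQN's
```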

Experiment Results

There were 5 experiments conducted based on different movements of the agent using 2 algorithms, DQN and DDQN. The different movements are complex movements, simple movements, and right-only movements.

The parameters settings are as follows :

Observation space: 4 x 84 x 84 x 1

Action space: 12 (Complex Movement) or 7 (Simple Movement) or 5 (Right only movement)

Loss function: HuberLoss with δ = 1

Optimizer: Adam with lr = 0.00025

betas = (0.9, 0.999)

Batch size = 64

Dropout = 0.2

gamma = 0.9

Max memory size for experience replay = 30000

For epsilon greedy: Exploration decay = 0.99, Exploration min = 0.05

At the beginning of exploration, max = 1, the agent will take random action. After each episode, it will decay by the exploration decay rate until it reaches an exploration min of 0.05.
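The epsilon-greedy schedule above can be sketched directly, using the same decay rate and minimum as the parameter list:

```python
def epsilon_schedule(n_episodes, eps_max=1.0, decay=0.99, eps_min=0.05):
    """Multiplicative decay per episode, clamped at the exploration minimum."""
    eps = eps_max
    history = []
    for _ in range(n_episodes):
        history.append(eps)
        eps = max(eps * decay, eps_min)  # never drop below eps_min
    return history

eps = epsilon_schedule(500)
print(eps[0], eps[-1])  # 1.0 0.05 -> fully random at first, mostly greedy later
```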

Experiment 1

The first experiment conducted was to compare DDQN and DQN algorithms for the complex movement of the agent.

Experiment 2

Experiment 3

From the above 3 experiment results, we can see that in all cases DQN’s performance at episode 10,000 is approximately the same as DDQN’s performance at episode 2,000. So, we can conclude that the DDQN network helps to eliminate the overestimation problem caused by the DQN network.

Further experiments were conducted using DDQN and DQN for the 3 different movements.

Experiment 4

The fourth experiment conducted was using the DDQN algorithm on all 3 different movements.

Experiment 5

From the above 2 experiment results, we can conclude that the network is able to train better on right-only movement action space which only allows the agent to move to the right.

Codes

import torch
import torch.nn as nn
import random
from nes_py.wrappers import JoypadSpace
import gym_super_mario_bros
from tqdm import tqdm
import pickle
from gym_super_mario_bros.actions import RIGHT_ONLY, SIMPLE_MOVEMENT, COMPLEX_MOVEMENT
import gym
import numpy as np
import collections
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
import time
import pylab as pl
from IPython import display


class MaxAndSkipEnv(gym.Wrapper):
    """
    Each action of the agent is repeated over skip frames;
    return only every `skip`-th frame
    """
    def __init__(self, env=None, skip=4):
        super(MaxAndSkipEnv, self).__init__(env)
        # most recent raw observations (for max pooling across time steps)
        self._obs_buffer = collections.deque(maxlen=2)
        self._skip = skip

    def step(self, action):
        total_reward = 0.0
        done = None
        for _ in range(self._skip):
            obs, reward, done, info = self.env.step(action)
            self._obs_buffer.append(obs)
            total_reward += reward
            if done:
                break
        max_frame = np.max(np.stack(self._obs_buffer), axis=0)
        return max_frame, total_reward, done, info

    def reset(self):
        """Clear past frame buffer and init to first obs"""
        self._obs_buffer.clear()
        obs = self.env.reset()
        self._obs_buffer.append(obs)
        return obs


class MarioRescale84x84(gym.ObservationWrapper):
    """
    Downsamples/Rescales each frame to size 84x84 with greyscale
    """
    def __init__(self, env=None):
        super(MarioRescale84x84, self).__init__(env)
        self.observation_space = gym.spaces.Box(low=0, high=255, shape=(84, 84, 1), dtype=np.uint8)

    def observation(self, obs):
        return MarioRescale84x84.process(obs)

    @staticmethod
    def process(frame):
        if frame.size == 240 * 256 * 3:
            img = np.reshape(frame, [240, 256, 3]).astype(np.float32)
        else:
            assert False, "Unknown resolution."
        # luminance-weighted grayscale conversion on the RGB channels
        img = img[:, :, 0] * 0.299 + img[:, :, 1] * 0.587 + img[:, :, 2] * 0.114
        resized_screen = cv2.resize(img, (84, 110), interpolation=cv2.INTER_AREA)
        x_t = resized_screen[18:102, :]
        x_t = np.reshape(x_t, [84, 84, 1])
        return x_t.astype(np.uint8)


class ImageToPyTorch(gym.ObservationWrapper):
    """
    Each frame is converted to PyTorch tensors
    """
    def __init__(self, env):
        super(ImageToPyTorch, self).__init__(env)
        old_shape = self.observation_space.shape
        self.observation_space = gym.spaces.Box(low=0.0, high=1.0,
                                                shape=(old_shape[-1], old_shape[0], old_shape[1]),
                                                dtype=np.float32)

    def observation(self, observation):
        return np.moveaxis(observation, 2, 0)


class BufferWrapper(gym.ObservationWrapper):
    """
    Only every k-th frame is collected by the buffer
    """
    def __init__(self, env, n_steps, dtype=np.float32):
        super(BufferWrapper, self).__init__(env)
        self.dtype = dtype
        old_space = env.observation_space
        self.observation_space = gym.spaces.Box(old_space.low.repeat(n_steps, axis=0),
                                                old_space.high.repeat(n_steps, axis=0),
                                                dtype=dtype)

    def reset(self):
        self.buffer = np.zeros_like(self.observation_space.low, dtype=self.dtype)
        return self.observation(self.env.reset())

    def observation(self, observation):
        self.buffer[:-1] = self.buffer[1:]
        self.buffer[-1] = observation
        return self.buffer


class PixelNormalization(gym.ObservationWrapper):
    """
    Normalize pixel values to the range [0, 1]
    """
    def observation(self, obs):
        return np.array(obs).astype(np.float32) / 255.0


def create_mario_env(env):
    env = MaxAndSkipEnv(env)
    env = MarioRescale84x84(env)
    env = ImageToPyTorch(env)
    env = BufferWrapper(env, 4)
    env = PixelNormalization(env)
    return JoypadSpace(env, SIMPLE_MOVEMENT)


class DQNSolver(nn.Module):
    """
    Convolutional Neural Net with 3 conv layers and two linear layers
    """
    def __init__(self, input_shape, n_actions):
        super(DQNSolver, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(input_shape[0], 32, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.ReLU()
        )
        conv_out_size = self._get_conv_out(input_shape)
        self.fc = nn.Sequential(
            nn.Linear(conv_out_size, 512),
            nn.ReLU(),
            nn.Linear(512, n_actions)
        )

    def _get_conv_out(self, shape):
        o = self.conv(torch.zeros(1, *shape))
        return int(np.prod(o.size()))

    def forward(self, x):
        conv_out = self.conv(x).view(x.size()[0], -1)
        return self.fc(conv_out)


class DQNAgent:

    def __init__(self, state_space, action_space, max_memory_size, batch_size, gamma, lr,
                 dropout, exploration_max, exploration_min, exploration_decay, double_dqn, pretrained):

        # Define DQN Layers
        self.state_space = state_space
        self.action_space = action_space
        self.double_dqn = double_dqn
        self.pretrained = pretrained
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'

        # Double DQN network
        if self.double_dqn:
            self.local_net = DQNSolver(state_space, action_space).to(self.device)
            self.target_net = DQNSolver(state_space, action_space).to(self.device)
            if self.pretrained:
                # checkpoint file names were elided in the original article
                self.local_net.load_state_dict(torch.load("", map_location=torch.device(self.device)))
                self.target_net.load_state_dict(torch.load("", map_location=torch.device(self.device)))
            self.optimizer = torch.optim.Adam(self.local_net.parameters(), lr=lr)
            self.copy = 5000  # Copy the local model weights into the target network every 5000 steps
            self.step = 0
        # DQN network
        else:
            self.dqn = DQNSolver(state_space, action_space).to(self.device)
            if self.pretrained:
                self.dqn.load_state_dict(torch.load("", map_location=torch.device(self.device)))
            self.optimizer = torch.optim.Adam(self.dqn.parameters(), lr=lr)

        # Create memory
        self.max_memory_size = max_memory_size
        if self.pretrained:
            self.STATE_MEM = torch.load("")
            self.ACTION_MEM = torch.load("")
            self.REWARD_MEM = torch.load("")
            self.STATE2_MEM = torch.load("")
            self.DONE_MEM = torch.load("")
            with open("ending_position.pkl", 'rb') as f:
                self.ending_position = pickle.load(f)
            with open("num_in_queue.pkl", 'rb') as f:
                self.num_in_queue = pickle.load(f)
        else:
            self.STATE_MEM = torch.zeros(max_memory_size, *self.state_space)
            self.ACTION_MEM = torch.zeros(max_memory_size, 1)
            self.REWARD_MEM = torch.zeros(max_memory_size, 1)
            self.STATE2_MEM = torch.zeros(max_memory_size, *self.state_space)
            self.DONE_MEM = torch.zeros(max_memory_size, 1)
            self.ending_position = 0
            self.num_in_queue = 0

        self.memory_sample_size = batch_size

        # Learning parameters
        self.gamma = gamma
        self.l1 = nn.SmoothL1Loss().to(self.device)  # Also known as Huber loss
        self.exploration_max = exploration_max
        self.exploration_rate = exploration_max
        self.exploration_min = exploration_min
        self.exploration_decay = exploration_decay

    def remember(self, state, action, reward, state2, done):
        """Store the experiences in a buffer to use later"""
        self.STATE_MEM[self.ending_position] = state.float()
        self.ACTION_MEM[self.ending_position] = action.float()
        self.REWARD_MEM[self.ending_position] = reward.float()
        self.STATE2_MEM[self.ending_position] = state2.float()
        self.DONE_MEM[self.ending_position] = done.float()
        self.ending_position = (self.ending_position + 1) % self.max_memory_size  # FIFO tensor
        self.num_in_queue = min(self.num_in_queue + 1, self.max_memory_size)

    def batch_experiences(self):
        """Randomly sample 'batch size' experiences"""
        idx = random.choices(range(self.num_in_queue), k=self.memory_sample_size)
        STATE = self.STATE_MEM[idx]
        ACTION = self.ACTION_MEM[idx]
        REWARD = self.REWARD_MEM[idx]
        STATE2 = self.STATE2_MEM[idx]
        DONE = self.DONE_MEM[idx]
        return STATE, ACTION, REWARD, STATE2, DONE

    def act(self, state):
        """Epsilon-greedy action"""
        if self.double_dqn:
            self.step += 1
        if random.random() < self.exploration_rate:
            return torch.tensor([[random.randrange(self.action_space)]])
        if self.double_dqn:
            # Local net is used for the policy
            return torch.argmax(self.local_net('cpu').unsqueeze(0).unsqueeze(0)
        else:
            return torch.argmax(self.dqn('cpu').unsqueeze(0).unsqueeze(0)

    def copy_model(self):
        """Copy local net weights into target net for DDQN network"""
        self.target_net.load_state_dict(self.local_net.state_dict())

    def experience_replay(self):
        """Use the double Q-update or Q-update equations to update the network weights"""
        if self.double_dqn and self.step % self.copy == 0:
            self.copy_model()

        if self.memory_sample_size > self.num_in_queue:
            return

        # Sample a batch of experiences
        STATE, ACTION, REWARD, STATE2, DONE = self.batch_experiences()
        STATE =
        ACTION =
        REWARD =
        STATE2 =
        DONE =

        self.optimizer.zero_grad()
        if self.double_dqn:
            # Double Q-Learning target is Q*(S, A) <- r + γ max_a Q_target(S', a)
            target = REWARD + torch.mul((self.gamma * self.target_net(STATE2).max(1).values.unsqueeze(1)), 1 - DONE)
            current = self.local_net(STATE).gather(1, ACTION.long())  # Local net approximation of Q-value
        else:
            # Q-Learning target is Q*(S, A) <- r + γ max_a Q(S', a)
            target = REWARD + torch.mul((self.gamma * self.dqn(STATE2).max(1).values.unsqueeze(1)), 1 - DONE)
            current = self.dqn(STATE).gather(1, ACTION.long())

        loss = self.l1(current, target)
        loss.backward()        # Compute gradients
        self.optimizer.step()  # Apply the weight update

        self.exploration_rate *= self.exploration_decay
        # Makes sure that exploration rate is always at least 'exploration min'
        self.exploration_rate = max(self.exploration_rate, self.exploration_min)


def show_state(env, ep=0, info=""):
    """While testing show the mario playing environment"""
    plt.figure(3)
    plt.clf()
    plt.imshow(env.render(mode='rgb_array'))
    plt.title("Episode: %d %s" % (ep, info))
    plt.axis('off')
    display.clear_output(wait=True)
    display.display(plt.gcf())


def run(training_mode, pretrained, double_dqn, num_episodes=1000, exploration_max=1):
    env = gym_super_mario_bros.make('SuperMarioBros-1-1-v0')
    env = create_mario_env(env)  # Wraps the environment so that frames are grayscale
    observation_space = env.observation_space.shape
    action_space = env.action_space.n
    agent = DQNAgent(state_space=observation_space,
                     action_space=action_space,
                     max_memory_size=30000,
                     batch_size=32,
                     gamma=0.90,
                     lr=0.00025,
                     dropout=0.2,
                     exploration_max=1.0,
                     exploration_min=0.02,
                     exploration_decay=0.99,
                     double_dqn=double_dqn,
                     pretrained=pretrained)

    # Restart the environment for each episode
    num_episodes = num_episodes
    env.reset()

    total_rewards = []
    if training_mode and pretrained:
        with open("total_rewards.pkl", 'rb') as f:
            total_rewards = pickle.load(f)

    for ep_num in tqdm(range(num_episodes)):
        state = env.reset()
        state = torch.Tensor([state])
        total_reward = 0
        steps = 0
        while True:
            if not training_mode:
                show_state(env, ep_num)
            action = agent.act(state)
            steps += 1

            state_next, reward, terminal, info = env.step(int(action[0]))
            total_reward += reward
            state_next = torch.Tensor([state_next])
            reward = torch.tensor([reward]).unsqueeze(0)
            terminal = torch.tensor([int(terminal)]).unsqueeze(0)

            if training_mode:
                agent.remember(state, action, reward, state_next, terminal)
                agent.experience_replay()

            state = state_next
            if terminal:
                break

        total_rewards.append(total_reward)
        if ep_num != 0 and ep_num % 100 == 0:
            print("Episode {} score = {}, average score = {}".format(ep_num + 1, total_rewards[-1], np.mean(total_rewards)))
        num_episodes += 1

    print("Episode {} score = {}, average score = {}".format(ep_num + 1, total_rewards[-1], np.mean(total_rewards)))

    # Save the trained memory so that we can continue from where we stopped using 'pretrained' = True
    # (the save-file names were elided in the original article, hence the "" placeholders)
    if training_mode:
        with open("ending_position.pkl", "wb") as f:
            pickle.dump(agent.ending_position, f)
        with open("num_in_queue.pkl", "wb") as f:
            pickle.dump(agent.num_in_queue, f)
        with open("total_rewards.pkl", "wb") as f:
            pickle.dump(total_rewards, f)

        if agent.double_dqn:
  , "")
  , "")
        else:
  , "")
, "")
, "")
, "")
, "")
, "")

    env.close()


# For training
run(training_mode=True, pretrained=False, double_dqn=True, num_episodes=1, exploration_max=1)

# For Testing
run(training_mode=False, pretrained=True, double_dqn=True, num_episodes=1, exploration_max=0.05)



DDQN takes far fewer episodes to train than DQN, because the DDQN network helps eliminate the overestimation issue found in the DQN network. Both DQN and DDQN networks are able to train better on the right-only movement action space than on the simple and complex movement action spaces.
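The overestimation issue can be seen in a small standalone sketch, independent of the Mario code above: taking a max over one set of noisy Q-value estimates is positively biased, while the Double Q-learning trick of selecting the best action with one estimator and evaluating it with a second, independent one removes most of that bias. All names and values below are illustrative, not part of the tutorial's agent.

```python
import random

random.seed(0)

TRUE_Q = [0.0, 0.0, 0.0]  # all actions are equally valuable; the true max is 0
NOISE = 1.0

def noisy_estimates():
    """One set of Q-value estimates, each corrupted by zero-mean noise."""
    return [q + random.uniform(-NOISE, NOISE) for q in TRUE_Q]

trials = 10000
single_bias = 0.0
double_bias = 0.0
for _ in range(trials):
    q_a = noisy_estimates()  # "online" network estimates
    q_b = noisy_estimates()  # independent "target" network estimates
    # Q-learning target takes the max over one estimator -> positive bias
    single_bias += max(q_a)
    # Double Q-learning: select the argmax with one estimator, evaluate with the other
    best = max(range(len(q_a)), key=lambda a: q_a[a])
    double_bias += q_b[best]

single_bias /= trials
double_bias /= trials
print(single_bias, double_bias)  # single is around +0.5, double is near 0
```

Averaged over many trials, the single-estimator max is biased well above the true value of 0, while the double-estimator value stays near 0 — which is why DDQN's targets are more reliable.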

About the Author

Connect with me on LinkedIn Here.

Thanks for giving your time!

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


Difference Between Deep Learning And Neural Network

Deep learning and neural networks are both machine learning methods that are used to identify patterns and make predictions. While the two terms are often used interchangeably, there are important differences between them that can have significant implications for their use.

What is Deep Learning?

Deep learning is a broader category of machine learning that encompasses neural networks and other approaches. Deep learning involves training models to recognize patterns in data by processing multiple layers of information. These models can learn from vast amounts of data and can recognize patterns that are too complex for humans to identify.

What is Neural Network?

The term “Neural Network” describes a system of virtual neurons, or nodes, that is loosely modelled after the networks of neurons that make up the brains of various animals. Much of today’s AI has its roots in this technique; indeed, many current applications of AI are the result of the evolution of the special qualities of neural networks into techniques such as machine learning and deep learning.

Computer science, physics, information science, psychology, and engineering have all had a hand in developing and refining the neural network paradigm. Neural networks are networks of nodes whose functioning is inspired by animal neurons, though only in a very general way. They are widely employed in many fields today, from problem solving and consumer research to data validation, sales forecasting, and risk management.

Differences: Deep Learning and Neural Network

One of the key differences between neural networks and deep learning is complexity. Traditional neural networks are relatively simple compared to deep learning models: they typically consist of only one or a few layers of interconnected neurons. Such shallow networks are effective at recognizing simple patterns in data, but they struggle with complex data sets.

Deep learning models, on the other hand, can process multiple layers of data and can recognize complex patterns that are not immediately visible to humans. This makes them ideal for applications like image and speech recognition, where the data is highly complex and requires sophisticated processing.

Another important difference lies in training. Both kinds of model are trained using backpropagation combined with gradient descent: backpropagation computes the gradient of the error between the predicted output and the actual output, and gradient descent uses that gradient to adjust the weights of the connections between neurons.

In deep learning models, this same process is applied across many layers of neurons, which allows the model to learn more complex patterns in the data and to make more accurate predictions.
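As a minimal sketch of this training loop (a single linear neuron rather than a full network; the data and learning rate are made up for illustration), gradient descent repeatedly nudges the weight against the gradient of the squared error:

```python
# Illustrative data following y = 2x; the "network" is one weight w
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = 0.0
lr = 0.05

for _ in range(200):
    grad = 0.0
    for x, y in data:
        pred = w * x
        # d(MSE)/dw for this example, via the chain rule (backpropagation)
        grad += 2 * (pred - y) * x / len(data)
    # Gradient descent step: move the weight against the gradient
    w -= lr * grad

print(round(w, 3))  # prints 2.0
```

After a few hundred updates the weight converges to the true slope of 2; a deep network does exactly this, just with many weights across many layers.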

The following table highlights the major differences between Neural Network and Deep Learning −


Definition

Neural Network: A neural network, also called an artificial neural network, is an information-processing model that simulates the mechanism of learning in biological organisms. It is inspired by the way the nervous system operates: the nervous system contains cells referred to as neurons, and, similarly, neural networks consist of nodes that mimic the biological function of neurons.

Deep Learning: Deep learning, on the other hand, is a much broader concept than artificial neural networks and includes several related areas of machine learning. It is an approach to AI that enables computer systems to improve with experience and data.

Architecture

Neural Network: Neural networks are simple architectural models based on how the nervous system works, and are divided into single-layer and multi-layer neural networks. The simplest instantiation of a neural network is the perceptron. In a single-layer network, a set of inputs is mapped directly onto an output using a generalized variation of a linear function. In multi-layer networks, as the name suggests, the neurons are arranged in layers; a layer of neurons sandwiched between the input layer and the output layer is called a hidden layer.

Deep Learning: Deep learning architecture, on the other hand, is built from artificial neural networks with many such layers.

Applications

Neural Network: Neural networks allow modeling of non-linear processes, making them useful tools for problems such as classification, pattern recognition, clustering, prediction and analysis, control and optimization, machine translation, and decision making.

Deep Learning: Deep learning models can be applied to fields including speech recognition, natural language processing, self-driving vehicles, computer-aided diagnosis, voice assistants, sound creation, robotics, computer games, image recognition, brain-cancer detection, social-network filtering, pattern recognition, and biomedicine.

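The single-layer mapping described above — a set of inputs mapped directly onto an output through a variation of a linear function — can be sketched as a minimal perceptron. The weights and bias below are hand-chosen, illustrative values:

```python
def perceptron(inputs, weights, bias):
    """Single-layer perceptron: weighted sum of inputs through a step function."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if s > 0 else 0

# Hand-chosen weights that make the perceptron behave as a logical AND gate
and_gate = lambda a, b: perceptron([a, b], weights=[1.0, 1.0], bias=-1.5)
print([and_gate(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 0, 0, 1]
```

A single unit like this can only separate linearly separable patterns; stacking hidden layers of such units is what lets multi-layer and deep networks capture more complex structure.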

The main difference between deep learning and neural networks is the complexity of the models and the scale at which they are trained. Traditional neural networks are simpler and more limited in their capabilities, while deep learning models are more complex and can handle far richer data sets.

Both approaches have their strengths and weaknesses, and the choice between them will depend on the specific application and the type of data that is being analyzed.

Some Advanced OpenCV Operations For Your Computer Vision Project!

This article was published as a part of the Data Science Blogathon


Computer Vision is said to be one of the most interesting application fields of Machine Learning. As implied by the name, Computer Vision enables computers to detect, identify, and recognize objects and patterns in their 3D surroundings.

If you have not viewed my previous articles on Computer Vision and would like to do so, kindly navigate to the following hyperlinks:

This article will introduce you to more features of the OpenCV Library.

Source: Analytics Vidhya.


As we seek to explore more about the world of OpenCV, let us take a moment to recap, or brush up, on what we have learned up to this point.

We have obtained a short, general insight into what Machine Learning is.

We have been introduced to the OpenCV Library.


We loaded an image into system memory and observed an image in two different colour formats, viz., standard (default) and GRAYSCALE.

We understand the base form in which OpenCV represents images, i.e., a NumPy array of integers representing pixel intensities. Up to this point, we have only looked at a GRAYSCALE image.

We viewed a few important properties of the image array such as the contents, shape, and type of data.

OpenCV Python Programming.

Source: Pinterest.

Loading The Image.

The first and foremost step will be to load the image into our system memory as follows:

import cv2

# Load the image in GRAYSCALE color mode
image = cv2.imread("C:/Users/Shivek/Pictures/cd0c13629f217c1ab72c61d0664b3f99.jpg", 0)
cv2.imshow('Analytics Vidhya Computer Vision- Nature Gray', image)
cv2.waitKey()
cv2.destroyAllWindows()

Please Note: Replace C:/Users/Shivek/Pictures/cd0c13629f217c1ab72c61d0664b3f99.jpg with the location the image is stored on your personal computer (YOUR file path).

We have loaded the image in GRAYSCALE colour format.

(Explanation of the above code is omitted, as it is the same as in the previous article(s).)

Output to the above code block will be seen as below:

Obtaining and Understanding Image Properties.

Let us gain insight into our image by viewing its associated properties. We will start with the shape of the image:

print(image.shape)

Output is as below:

Notice that the shape attribute returns two values to us in the form of a tuple. These two values represent the height (y-axis) and width (x-axis) of the image, respectively. Essentially what I am saying is:

print("The Image Height Is: %d Pixels" % (image.shape[0]))
print("The Image Width Is: %d Pixels" % (image.shape[1]))

The output will be seen as below:

There are some situations in which the shape attribute returns a tuple of three integer values. The first value represents the height (y-axis), the second the width (x-axis), and the third the number of color channels.

In a GRAYSCALE image, each pixel holds one value that ranges from 0 to 255.
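The two shapes can be illustrated with small synthetic NumPy arrays (stand-ins for a loaded image, so no image file is needed):

```python
import numpy as np

# A synthetic 4x6 GRAYSCALE "image": one intensity value (0-255) per pixel
gray = np.zeros((4, 6), dtype=np.uint8)
print(gray.shape)   # (4, 6) -> height, width

# A synthetic 4x6 color "image": three channel values per pixel
color = np.zeros((4, 6, 3), dtype=np.uint8)
print(color.shape)  # (4, 6, 3) -> height, width, channels
```

This is exactly the structure cv2.imread returns: a 2-D array in GRAYSCALE mode, and a 3-D array in color mode.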

Looking at Image Color Channels.

For example and experience purposes, let us re-load our nature image and view its shape properties. However, in this particular instance, we are loading the image in standard color format.

import cv2

# Re-load the image in color mode
image = cv2.imread("C:/Users/Shivek/Pictures/cd0c13629f217c1ab72c61d0664b3f99.jpg", cv2.IMREAD_COLOR)
cv2.imshow('Analytics Vidhya Computer Vision- Nature Color', image)
cv2.waitKey()
cv2.destroyAllWindows()

Please Note: Replace the file path in the code, with the absolute location of the image on your personal computer.

The color image will be seen as below:

Our image in color is as downloaded, i.e., it has not been manipulated in any way whatsoever. Now let us proceed to use the shape attribute on this image.

print(image.shape)
print("Image Height: %d Pixels" % (image.shape[0]))
print("Image Width: %d Pixels" % (image.shape[1]))
print("Number Of Color Channels: %d" % (image.shape[2]))

Output to the above code will be seen as follows:

Notice that there is a third number present in the tuple. This is the number of color channels present in the image. The color format used here is BGR. Note that it is BGR, not RGB: by default, OpenCV represents color images in BGR channel order.

Each pixel in a color image therefore holds three values, one per channel, allowing for the colors to be mixed and manipulated. We will focus more on color images in future articles.
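The BGR-versus-RGB distinction can be demonstrated on a single synthetic pixel with NumPy (no image file required); reversing the channel axis is the same conversion cv2.cvtColor performs with the cv2.COLOR_BGR2RGB flag:

```python
import numpy as np

# One pixel of pure red as OpenCV stores it: (B, G, R) = (0, 0, 255)
bgr = np.array([[[0, 0, 255]]], dtype=np.uint8)

# Reversing the channel axis converts BGR order to RGB order
rgb = bgr[:, :, ::-1]
print(rgb[0, 0].tolist())  # [255, 0, 0] -> red in RGB order
```

Forgetting this conversion is a classic source of "blue-tinted" images when displaying OpenCV frames with libraries that expect RGB, such as matplotlib.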

Learning How to Resize an Image.

Coming back to our GRAYSCALE image, we will now attempt to resize the image, to make it cover a smaller surface area on our computer screen. To resize an image, we use the resize() method offered by the OpenCV Library.

import cv2

# Load the image in GRAYSCALE format
image = cv2.imread("C:/Users/Shivek/Pictures/cd0c13629f217c1ab72c61d0664b3f99.jpg", cv2.IMREAD_GRAYSCALE)
cv2.imshow('Analytics Vidhya Computer Vision- Nature Standard Image', image)
cv2.waitKey()
cv2.destroyAllWindows()

# Resize the image to pixel dimensions 350 by 350
resized_image = cv2.resize(image, dsize=(350, 350))
cv2.imshow('Analytics Vidhya Computer Vision- Nature Resized (350, 350)', resized_image)
cv2.waitKey()
cv2.destroyAllWindows()

Explanation of the second block of code:

resized_image = cv2.resize(src=image, dsize=(350, 350))

We make use of the resize() method to resize the image at hand; this could mean either increasing or decreasing its size. We chose to decrease the image size, i.e., make it smaller. This method takes in two primary arguments:

src: The source image (i.e., the image to be resized).

dsize: The pixel dimensions to which the image will be resized, passed as a (width, height) tuple. We have specified a width and height of 350 pixels each.

cv2.imshow('Analytics Vidhya Computer Vision- Nature Resized (350, 350)', resized_image) cv2.waitKey() cv2.destroyAllWindows()

Up to this point, one should be familiar with what the three lines of code above do: display the window, wait indefinitely for a key press, and terminate all windows upon user command.
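One caveat with a fixed dsize such as (350, 350) is that it distorts any non-square image. A small helper (a hypothetical function of my own, not part of OpenCV) shows how to compute a dsize that preserves the aspect ratio before calling cv2.resize:

```python
def scaled_dsize(width, height, target_width):
    """Compute a (width, height) dsize for cv2.resize that keeps the aspect ratio."""
    scale = target_width / width
    return (target_width, int(height * scale))

# A 1920x1080 image scaled down to 350 pixels wide keeps its proportions
print(scaled_dsize(1920, 1080, 350))  # (350, 196)
```

The returned tuple can be passed directly as the dsize argument, e.g. cv2.resize(image, dsize=scaled_dsize(1920, 1080, 350)).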

The last task for us would be to verify that the image size has been reduced:

print(resized_image.shape)

The output will be seen as follows:

This concludes my article on Advanced OpenCV Features. I do hope that you enjoyed reading through this article and have added new information to your knowledge base.

Please feel free to connect with me on LinkedIn.

Thank you for your time.


