Improved Minkowsky Metric for Image Region Partition

Author(s):  
Ionut Pirnog ◽  
Cristina Oprea ◽  
Constantin Paleologu ◽  
Dragos Nicolae Vizireanu
2008 ◽  
Vol 28 (6) ◽  
pp. 1073-1078 ◽  
Author(s):  
杨海涛 Yang Haitao ◽  
常义林 Chang Yilin ◽  
霍俊彦 Huo Junyan ◽  
熊联欢 Xiong Lianhuan ◽  
林四新 Lin Sixin

2012 ◽  
Vol 31 (6) ◽  
pp. 1628-1630
Author(s):  
Jia-jia OU ◽  
Bi-ye CAI ◽  
Bing XIONG ◽  
Feng LI

Author(s):  
Kholilatul Wardani ◽  
Aditya Kurniawan

The ROI (Region of Interest) Image Quality Assessment is an image quality assessment model based on the SSI (Structural Similarity Index), applied only to the specific image region to be assessed. The assessment value output by this model is 1, which means identical, or -1, which means not identical. In this research, the model is used to measure the image quality of Kinect sensor captures on a Mobile HD Robot after the Multiple Localized Filtering Technique has been applied. The filter is applied to each depth capture from the Kinect, with the aim of eliminating the structural noise that occurs in the Kinect sensor. Assessment is done by comparing the image quality of a given region before and after the filter is applied. The Kinect sensor is set up to capture a square black object measuring 10 cm x 10 cm, perpendicular to a homogeneous background (white, RGB 255,255,255). Kinect sensor data are collected through EWRF 3022 by a Visual Basic 6.0 program, 10 times per session at a rate of one capture per minute. The results of this trial show the same similarity index (value 1: identical) in the luminance, contrast, and structure components for the edge region of the specimen. This value indicates that, according to the ROI Image Quality Assessment model, the Multiple Localized Filtering Technique applied to the noise generated by the Kinect sensor has no effect on the image quality generated by the sensor.
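As a rough illustration of the comparison described above, the sketch below computes the luminance, contrast, and structure terms of the standard structural similarity formulation over a single rectangular region. It is not the authors' implementation: the helper names `crop` and `ssim_components`, the single-window (global) statistics, and the constants `k1 = 0.01`, `k2 = 0.03` with an 8-bit dynamic range are assumptions made for the example.

```python
import math

def crop(img, top, left, height, width):
    """Flatten a rectangular ROI of a 2D list-of-rows image into one pixel list."""
    return [v for row in img[top:top + height] for v in row[left:left + width]]

def ssim_components(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Return (luminance, contrast, structure) for two equal-length pixel lists.

    Statistics are taken over the whole ROI at once (no sliding window),
    which is a simplification assumed for this sketch.
    """
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    c3 = c2 / 2.0
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx, sy = math.sqrt(vx), math.sqrt(vy)
    luminance = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    contrast = (2 * sx * sy + c2) / (vx + vy + c2)
    structure = (cov + c3) / (sx * sy + c3)
    return luminance, contrast, structure

# Hypothetical 4x4 frames: a dark square on a white background, before and after filtering.
before = [[255, 255, 255, 255],
          [255, 0, 0, 255],
          [255, 0, 0, 255],
          [255, 255, 255, 255]]
after = [row[:] for row in before]  # filter left the region unchanged

roi_before = crop(before, 0, 0, 3, 3)
roi_after = crop(after, 0, 0, 3, 3)
print(ssim_components(roi_before, roi_after))
```

When the region is unchanged by the filter, all three terms come out at 1 (up to floating-point rounding), which matches the "identical" outcome reported in the abstract; a region whose structure is inverted drives the structure term negative.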


Author(s):  
Lianli Gao ◽  
Pengpeng Zeng ◽  
Jingkuan Song ◽  
Yuan-Fang Li ◽  
Wu Liu ◽  
...  

To date, visual question answering (VQA) (i.e., image QA and video QA) is still a holy grail in vision and language understanding, especially for video QA. Compared with image QA, which focuses primarily on understanding the associations between image region-level details and corresponding questions, video QA requires a model to jointly reason across both the spatial and long-range temporal structures of a video, as well as text, to provide an accurate answer. In this paper, we specifically tackle the problem of video QA by proposing a Structured Two-stream Attention network, namely STA, to answer a free-form or open-ended natural language question about the content of a given video. First, we infer rich long-range temporal structures in videos using our structured segment component and encode text features. Then, our structured two-stream attention component simultaneously localizes important visual instances, reduces the influence of background video, and focuses on the relevant text. Finally, the structured two-stream fusion component incorporates different segments of query- and video-aware context representation and infers the answers. Experiments on the large-scale video QA dataset TGIF-QA show that our proposed method significantly surpasses the best counterpart (i.e., with one representation for the video input) by 13.0%, 13.5%, 11.0% and 0.3 for the Action, Trans., FrameQA and Count tasks. It also outperforms the best competitor (i.e., with two representations) on the Action, Trans., and FrameQA tasks by 4.1%, 4.7%, and 5.1%.
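The abstract does not give the form of the attention component, so the following is only a generic sketch of the underlying idea: a question vector scores each video segment, and a softmax turns the scores into weights that downweight irrelevant (background) segments. Plain dot-product attention is swapped in here for illustration; the function names `softmax` and `attend` and the toy feature vectors are hypothetical and are not taken from STA.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, segments):
    """Dot-product attention of one query vector over a list of segment vectors.

    Returns the attention weights and the weighted sum of segments (context).
    """
    scores = [sum(q * s for q, s in zip(query, seg)) for seg in segments]
    weights = softmax(scores)
    dim = len(segments[0])
    context = [sum(w * seg[i] for w, seg in zip(weights, segments))
               for i in range(dim)]
    return weights, context

# Toy example: three 2-dimensional segment features and one question feature.
question = [1.0, 0.0]
video_segments = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, context = attend(question, video_segments)
print(weights, context)
```

In a two-stream setting, the same operation would be run in both directions (question over video segments, and video over question words), with the two context vectors fused afterwards; how STA performs that fusion is not specified in the abstract.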


1994 ◽  
Vol 3 (6) ◽  
pp. 868-872 ◽  
Author(s):  
Yian-Leng Chang ◽  
Xiaobo Li

2021 ◽  
Vol 16 (2) ◽  
pp. 170-178
Author(s):  
Ting Da

In this exploration, a photoelectric sensor circuit scheme is formulated based on the principle and system parameters of laser three-dimensional (3D) radar imaging technology. The sensing circuit of the avalanche photodiode (APD) converts the signal through a transresistance amplifier circuit. The LMH6629 is selected as a precision amplifier with low input noise voltage and low input error current, and a capacitor is used as the phase compensation element. For the power supply, a switching power supply and an LDO work together, which improves the efficiency of the power supply and reduces output current ripple. At the same time, semantic segmentation is carried out on the photoelectric images obtained. Based on the traditional spatial pyramid pooling algorithm, a fusion of mean intersection over union and the cross-entropy loss function is introduced to increase the weight of local image regions. In the experiment, Multisim software is used to simulate the circuit. The APD reverse bias voltage is set to 90 V, and the multiplication coefficient is 98.7. The feedback resistance, bandwidth, phase compensation capacitance and other parameters are then calculated. The output waveform of the transresistance amplifier shows obvious self-excited oscillation when no phase compensation capacitor is used. When the feedback capacitance reaches 0.8 pF, the oscillation is clearly reduced. Further calculation shows that the bandwidth of the transresistance amplifier is 230 MHz and that, when the noise floor of the oscilloscope is ignored, the noise of the APD power supply is mainly caused by the switching of the buck switching power supply. This noise is, however, suppressed by the back-end LDO device. After the loss function is introduced, the contour of the photoelectric image is preserved completely, and more accurate segmentation results are obtained.
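The abstract does not say how mean intersection over union and cross-entropy are fused, so the sketch below shows one common way to combine them: cross-entropy plus a weighted "soft" IoU penalty computed from predicted class probabilities. The additive form, the `iou_weight` parameter, and the function names are assumptions for illustration, not the author's formulation.

```python
import math

def cross_entropy(probs, labels):
    """Mean cross-entropy; probs[i] is the class-probability list for pixel i."""
    eps = 1e-12  # guards against log(0)
    return -sum(math.log(p[t] + eps) for p, t in zip(probs, labels)) / len(labels)

def soft_mean_iou(probs, labels, num_classes):
    """Mean IoU using probabilities as soft predictions (differentiable form)."""
    ious = []
    for c in range(num_classes):
        inter = sum(p[c] for p, t in zip(probs, labels) if t == c)
        pred_mass = sum(p[c] for p in probs)
        true_mass = sum(1 for t in labels if t == c)
        union = pred_mass + true_mass - inter
        if union > 0:  # skip classes absent from both prediction and labels
            ious.append(inter / union)
    return sum(ious) / len(ious)

def fused_loss(probs, labels, num_classes, iou_weight=1.0):
    """Cross-entropy plus a weighted (1 - soft mIoU) term (assumed fusion)."""
    iou_term = 1.0 - soft_mean_iou(probs, labels, num_classes)
    return cross_entropy(probs, labels) + iou_weight * iou_term

# Two pixels, two classes: a confident correct prediction and an uncertain one.
labels = [0, 1]
confident = [[0.9, 0.1], [0.1, 0.9]]
uncertain = [[0.5, 0.5], [0.5, 0.5]]
print(fused_loss(confident, labels, 2), fused_loss(uncertain, labels, 2))
```

The IoU term penalizes poor overlap per class rather than per pixel, so small regions such as object contours contribute as much as large ones, which is one plausible reading of the abstract's claim that the fusion increases the weight of local image regions.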

