For cloud users, timeout errors in business logs are often hard to deal with, because 1) the problem may lie in the cloud infrastructure layer or in the business software layer, so the scope of troubleshooting is very wide; and 2) such problems are often hard to reproduce, which makes it difficult to capture the scene when they occur. This article analyzes how to identify and troubleshoot the root causes of such problems.
Service timeout != network packet loss
Because business forms, implementation languages, and frameworks differ, the information printed in the business log varies. Common keywords include:
"SocketTimeout", "read timed out", "request timeout", etc.
In form these all look like network timeouts, but one concept must be clear: the direct cause of such a problem is that a request exceeded a configured timeout, and that setting may live on the client, on the server, or on an intermediate network node. Network packet loss can lead to a timeout, but it is not a necessary condition. The relationship between service timeout and network packet loss can be summarized as follows:
Network packet loss may cause a service timeout, but a service timeout is not necessarily caused by packet loss.
With this causal relationship clear, let's look at how to analyze a business timeout. If we simply equate service timeouts with network jitter and packet loss, the troubleshooting process will completely miss causes in the service software layer itself and can easily get stuck.
This article analyzes service timeouts from two angles: the cloud infrastructure layer and the business software layer. Generally speaking, packet loss at the infrastructure level is relatively easy to troubleshoot: Alibaba Cloud has complete low-level monitoring, so based on the time range of the errors in the business log, you can determine from the monitoring data whether there was an infrastructure network problem. Timeouts at the business layer are usually software-level settings; they are tied to the software implementation and the business form, and are often harder to troubleshoot.
Why network packet loss leads to service timeout
Network jitter can cause service timeouts mainly because it introduces varying amounts of extra delay. Take TCP, which carries most Internet traffic, as the example: a single packet loss has no effect on the integrity of the data, because the TCP protocol is designed to handle packet loss and out-of-order delivery, and all retransmission is handled inside the kernel TCP stack. The user-space process is not aware of this handling at all. The only side effect of packet loss is added delay; if that delay grows long enough to hit a timeout configured by the application process, the application side reports a service timeout.
Whether a timeout actually occurs when a packet is lost therefore depends on the application's timeout setting. Suppose that during a data transfer exactly one TCP packet is lost, triggering a timeout retransmission after 200 ms (a minimal socket-level sketch of both outcomes follows the list):
- If the application's timeout is set to 100 ms, the TCP stack has no chance to retransmit; the application treats the request as timed out and closes the connection;
- If the application's timeout is set to 500 ms, the TCP stack completes the retransmission, and this is transparent to the application process. All the application perceives is that this interaction took 200 ms longer than the baseline, which has very little impact on applications that are not highly latency-sensitive.
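To make this concrete, here is a minimal Java sketch (not taken from any real business code) of how an application-level read timeout interacts with a delayed response; the address, port, and timeout values are illustrative assumptions:

import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    public static void main(String[] args) throws Exception {
        Socket socket = new Socket();
        socket.connect(new InetSocketAddress("192.0.2.10", 8080), 3000); // hypothetical server
        socket.setSoTimeout(100); // 100 ms read timeout: a single 200 ms TCP retransmission already exceeds it
        try (InputStream in = socket.getInputStream()) {
            socket.getOutputStream().write("PING\n".getBytes());
            int firstByte = in.read(); // blocks until data arrives or the 100 ms timeout fires
            System.out.println("got byte: " + firstByte);
        } catch (SocketTimeoutException e) {
            // this is what the business log would record as a read/request timeout
            System.out.println("request timed out: " + e.getMessage());
        } finally {
            socket.close();
        }
    }
}

With the timeout raised to 500 ms, the same retransmission would be absorbed silently inside the kernel and the read would simply return about 200 ms later.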
How large is the delay?
Is there a quantitative reference for setting an application's timeout? Although RTT and RTO in TCP are dynamic, the delay introduced by TCP packet loss can still be summarized quantitatively.
There are two main types of delay caused by packet loss:
- TCP connection establishment timeout. If network jitter happens to drop the very first SYN packet of a TCP connection, a reasonably recent kernel will retransmit the SYN and retry the connection after 1 second (the initial RTO defined in the RFC 2988bis draft, later published as RFC 6298). One second is huge for an intranet environment: within a single Alibaba Cloud region and data center, normal RTT is on the order of a few milliseconds, so without packet loss, on the order of a hundred round trips could complete within that one second.
- TCP mid-stream packet loss. TCP has two mechanisms for handling packet loss: fast retransmission and timeout retransmission. Fast retransmission is usually quicker; it depends on the RTT and has no fixed value. Timeout retransmission (RTO, retransmission timeout) also depends on the RTT, but Linux defines a minimum RTO, TCP_RTO_MIN = 200 ms. So in a network with a small RTT, even if the RTT is below 1 ms, a timeout retransmission still waits at least 200 ms. The delay introduced by this kind of packet loss is comparatively small.
Besides packet loss, another common source of delay is TCP delayed ACK. It is part of the protocol design and has nothing to do with packet loss, but it is included here as part of the delay quantification. It is easily triggered by the combination of an interactive data flow and the Nagle algorithm. The minimum delayed ACK value defined in Linux, TCP_DELACK_MIN, is 40 ms.
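When this 40 ms delayed-ACK/Nagle interaction matters for latency-sensitive request/response traffic, a common mitigation is to disable the Nagle algorithm on the sending socket. A minimal Java sketch, with a hypothetical server address:

import java.net.Socket;

public class NoDelayDemo {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("192.0.2.10", 6379)) { // hypothetical server
            socket.setTcpNoDelay(true); // disable Nagle so small writes are sent immediately
                                        // instead of waiting behind a delayed ACK (up to 40 ms)
            socket.getOutputStream().write("PING\r\n".getBytes());
        }
    }
}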
Therefore, the following quantitative values can be summarized for reference:
- 40 ms: the minimum extra delay in an interactive data flow when TCP delayed ACK and the Nagle algorithm interact.
- 200 ms: the minimum timeout-retransmission delay after a packet loss in a normal network environment with a small RTT.
- 1 s: the retransmission delay when the first SYN of a connection is lost on newer kernels; this is the initial RTO value TCP_TIMEOUT_INIT as defined in the RFC 2988bis draft (RFC 6298).
- 3 s: the retransmission delay when the first SYN is lost on older kernels; the initial RTO value TCP_TIMEOUT_INIT as defined in RFC 1122.
Cloud infrastructure network packet loss
Packet loss in the infrastructure network can have many causes, which can be summarized as follows:
Cloud infrastructure network jitter
Network links, physical network devices, and the host virtualization network where ECS/RDS instances run can all run into occasional software or hardware problems. The cloud infrastructure is built with full redundancy so that problems can be quickly isolated, switched around, and recovered from.
Phenomenon: because the network devices are redundant and recovery is fast, this kind of problem usually appears as jitter at a single point in time, typically lasting seconds. During that window, new connections fail, established connections are interrupted, and the business may report timeouts.
Scope of impact: a network device usually serves many hosts, so the impact tends to be broad, for example many ECS instances losing their connections to RDS at the same time.
Packet loss due to cloud product rate limits
Many cloud network products, such as ECS, SLB, and NAT Gateway, are sold with specification and bandwidth options. When a product's traffic or connection count exceeds its specification or bandwidth limit, packets are dropped as well. This kind of packet loss is not a fault of the cloud vendor; it is a mismatch between the actual business traffic and the chosen product specification. Such problems can usually be identified from the monitoring that the cloud products provide.
Phenomenon: traffic or connections above the specification are discarded. The problem may appear intermittently or continuously, and is more likely during traffic peaks.
Scope of impact: usually only a single instance is affected. In the NAT Gateway SNAT scenario, however, multiple instances behind the SNAT may be affected.
Carrier network problems
In public-network scenarios, the packets exchanged between client and server often traverse multiple ASes (autonomous systems). A problem on an intermediate carrier link leads to end-to-end packet loss.
Phenomenon: with bidirectional MTR (see the example command after this subsection), you can see packet loss starting at some hop in the middle of the path.
Scope of impact: potentially large, affecting all traffic that crosses that AS link. For a single user, however, usually only specific instances are affected.
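To collect this kind of evidence, MTR is typically run in report mode from the client toward the server and again from the server back toward the client. In the example command below the destination address is a placeholder; -n disables DNS lookups, -r produces a report, and -c 100 sends 100 probes per hop:

mtr -n -r -c 100 203.0.113.10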
Timeouts caused by application-level timeout settings
The analysis above covered application timeouts caused by packet loss due to infrastructure anomalies or rate limiting. Generally speaking, once the packet-loss point is located, the root cause is essentially found. In some cases, however, none of the network monitoring shows any sign of packet loss, and the investigation has to continue on the application side. Because application forms vary widely, here are two typical examples explaining why the business log can report timeouts even when there is no network packet loss.
Example: timeouts when an ECS instance accesses a third-party API
Problem phenomenon
The user's ECS server needs to call a third-party server's API over HTTP, but the business log occasionally reports a "request timeout" error when accessing that API, and the root cause needs to be found.
Troubleshooting approach
- Following the basic approach, first check the monitoring for packet loss on the ECS instance and the link. The result: no packet loss matches the times of the timeout errors in the business log.
- In this case, the only way to dig further is to capture packets (an example capture command follows this list).
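Such a capture on the ECS instance might be started with a command like the following; the interface name, output path, third-party API address, and port are illustrative assumptions rather than values from this case:

tcpdump -i eth0 -s 0 -w /var/tmp/api_timeout.pcap host 198.51.100.20 and tcp port 80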
Capture analysis
After obtaining the capture, you can check for retransmissions with Wireshark's "Analyze > Expert Information" or with the following display filter.
tcp.analysis.retransmission
It turns out that no packet loss or retransmission is visible, which proves the problem is not caused by packet loss. If packet loss is not the cause, what is? We need to dig further into the capture.
Note the keyword "request timeout" in the business log, which literally means an outbound HTTP request timed out. In HTTP terms this can be read as: "the HTTP request was sent to the peer, but the peer did not return a complete response within a certain time." Following this idea, we can look for HTTP requests in the capture that have no HTTP response. In Wireshark, they can be filtered as follows:
http.request.method == "GET" && !http.response_in
Sure enough, some such HTTP requests show up. Select one of them and follow its TCP stream to see the whole exchange:
The capture yields the following findings:
- In the TCP stream, there are two HTTP requests on one TCP connection, so the ECS instance accesses the third-party API over a long-lived (keep-alive) connection.
- The first HTTP GET request (packet 735) received its response after 65 ms (packet 778).
- The second HTTP GET request (packet 780) has no corresponding HTTP response (this is the request matched by the filter above).
- In packet 954, the client actively closed the TCP connection with a FIN before receiving the HTTP response. This is very unusual behavior, and it is initiated by the client. Looking closely at the interval between the FIN and the second HTTP GET request, it is about 300 ms. The other unanswered HTTP streams show the same pattern, also roughly 300 ms.
At this point we have reason to infer that the ECS server actively closed the TCP connection with a FIN about 300 ms after sending an HTTP request to the third-party API. This is most likely a timeout configured in the client program, and the business program probably has its own retry logic after the timeout.
The user eventually confirmed that this timeout setting does exist in the business software.
Summary of problems
1) Is the 300 ms timeout reasonable?
From the capture, the RTT between the ECS instance and the peer API server is about 7 ms, which suggests access within the same city. For a single-digit-millisecond RTT, a 300 ms timeout leaves some margin and may even accommodate one timeout retransmission (200 ms).
2) What is the root cause?
The root cause is that the peer API server processes requests at an unstable speed. Some requests are handled and returned within tens of milliseconds, while others are not finished within 300 ms. This instability is probably related to the resource usage and load on the API server, but that side is a black box and has to be analyzed by the peer.
3) Solutions
1> The best solution is to work with the owner of the peer API server to find and eliminate the root cause.
2> The interim workaround is to increase the client timeout configured on the ECS side (see the sketch below).
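Where the timeout is configured depends on the HTTP client library the business code uses. The following is only a minimal sketch assuming plain java.net.HttpURLConnection, with a hypothetical URL and timeout values chosen for illustration:

import java.net.HttpURLConnection;
import java.net.URL;

public class ApiClientTimeoutDemo {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://api.example.com/v1/orders"); // hypothetical third-party API
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(3000); // connection establishment timeout
        conn.setReadTimeout(1000);    // raised well above the ~300 ms observed in the capture
        System.out.println("HTTP status: " + conn.getResponseCode());
        conn.disconnect();
    }
}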
Example: timeouts when an ECS instance accesses a self-built Redis over the intranet
Timeouts when ECS accesses RDS, a cloud cache, or a self-built database/cache are another class of problems. Let's use a case in which an ECS instance intermittently times out accessing a self-built Redis over the intranet to illustrate this type of problem.
Problem phenomenon
When the Redis client Jedis is used on ECS to access a Redis server self-built on ECS, the following error is occasionally reported:
redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
Troubleshooting approach
- This kind of problem is very commonly caused by slow Redis queries. The user had already checked for big keys and slow queries on the Redis side and found none that took a long time, so the problem needed to be confirmed further at the network level.
- Following the basic approach, first check the monitoring for packet loss on the ECS instance and the link. No packet loss was found matching the times of the "Read timed out" errors.
- The next step is again packet capture. Because the problem is intermittent, a circular (ring-buffer) capture has to be deployed on the client with tcpdump's -C/-W options, and the capture is stopped for inspection once the problem reproduces (an example command follows this list).
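A ring-buffer capture keeps only the most recent files on disk, so it can run for a long time while waiting for the problem to reproduce. In the example below the interface, file size and count, output path, and Redis port are illustrative assumptions: -C 100 rotates the output roughly every 100 MB and -W 10 keeps at most 10 files, overwriting the oldest.

tcpdump -i eth0 -s 0 -C 100 -W 10 -w /var/tmp/redis.pcap tcp port 6379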
Capture analysis
After getting the capture, first check for packet loss and retransmission; none is found. Unlike the previous example, there is no characteristic field to filter on, so the only option is to locate the packets around the time of the error in the Jedis log and look for clues there.
Locate the packets corresponding to the error time in the Jedis log and follow the TCP stream to see the whole exchange (the Jedis client is 9.20, the Redis server is 20.66):
The capture shows the following:
- There is no retransmission anywhere in the stream; at the network level everything looks normal.
- The client accesses the server over a long-lived connection, with many Redis requests and responses exchanged before the connection ends.
- In packet 181073, the server closed the connection with a FIN. In packet 181078, the client sent a TCP reset.
The information above is clearly not enough to explain the "Read timed out" error. The TCP reset sent by the client at the end may be a distraction: it is sent after the client receives the server's FIN. In a normal TCP four-way close, the client should reply with its own FIN after receiving the server's FIN to finish the connection. However, TCP has a linger option (SO_LINGER) that changes this behavior: with linger enabled and a timeout of zero, the closing side sends a reset directly, so both sides can tear the socket down immediately and the actively closing side avoids sitting in TIME_WAIT for up to 60 seconds. It looks as if the reset is caused by a linger setting on the client. Searching the Jedis source confirms there is such a setting in Connection.java, which explains why the client resets the TCP connection. This behavior is therefore expected.
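For illustration only, the following minimal sketch shows the linger behavior described above on a plain socket; it is not the actual Jedis code, and the server address is hypothetical:

import java.net.InetSocketAddress;
import java.net.Socket;

public class LingerResetDemo {
    public static void main(String[] args) throws Exception {
        Socket socket = new Socket();
        socket.connect(new InetSocketAddress("192.0.2.20", 6379), 2000); // hypothetical Redis server
        socket.setSoLinger(true, 0); // linger on with timeout 0: close() sends RST instead of FIN
                                     // and the socket skips the TIME_WAIT state
        socket.close();
    }
}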
Next, check whether the Redis commands in the exchange look normal. Following the TCP stream shows the data in ASCII form:
The client sends a DEL command and then a QUIT command; this can be checked against the individual packets:
- The client issued the DEL command in packet 181061.
- 3.9 ms later it received an ACK from the Redis server. Note that this is a pure ACK with a payload length of 0, meaning it carries no data.
- In packet 181070, the client issued a QUIT command, about 200 ms after the DEL command. The QUIT command asks the server to close the connection with the current client; the connection is closed once all pending replies (if any) have been written to the client.
- In packet 181072, 161 ms after the QUIT was issued, the Redis server replied ":1" and "+OK", where ":1" is the response to DEL and "+OK" is the response to QUIT.
- In packet 181073, the Redis server ended the long-lived TCP connection with a FIN.
As described above, the key to the connection being torn down is the client sending the QUIT command to the Redis server. As for why QUIT was sent precisely when the previous command had gone unanswered for about 200 ms, the most likely explanation is a timeout setting. The other TCP streams end their long-lived connections in a similar pattern, so we can essentially draw this conclusion.
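This pattern is consistent with a socket timeout configured on the Jedis client. A minimal sketch, assuming a Jedis version whose constructor accepts a timeout in milliseconds and a hypothetical server address:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.exceptions.JedisConnectionException;

public class JedisTimeoutDemo {
    public static void main(String[] args) {
        // the 200 ms here acts as the socket read timeout, matching the ~200 ms gap seen in the capture
        try (Jedis jedis = new Jedis("192.0.2.20", 6379, 200)) {
            jedis.del("some-key"); // if the server takes longer than 200 ms to reply, Jedis throws
                                   // JedisConnectionException wrapping SocketTimeoutException: Read timed out
        } catch (JedisConnectionException e) {
            System.out.println("redis call timed out: " + e.getMessage());
        }
    }
}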
This case is very similar to the first one. The difference is that in the capture the client does not simply FIN the connection after the timeout; instead it issues the Redis QUIT command and closes the connection only after the previous command has completed. Compared with the first case this is a more graceful approach, made possible because Redis provides a QUIT command and Jedis wraps this behavior internally. This case illustrates even more clearly how a specific service shapes connection behavior, and why the packet capture is needed to reconstruct the interaction between the Redis client and server.
Summary
This article discussed the two layers that need to be considered when troubleshooting timeout errors in business logs: the cloud infrastructure layer and the business software layer. A considerable share of these problems are caused by packet loss in the infrastructure network, so it is important to locate the packet-loss point using the network monitoring of the cloud products involved, while taking care not to simply equate service timeouts with packet loss. Timeouts caused by timeout settings in the business or peer software layer account for a smaller share, but they require a broader investigation and should not be overlooked as a cause.
Author: Huai Zhi
This is original content of the Yunqi community and may not be reproduced without permission.